seqrepo-rest-service's Introduction

This project exists only to ensure that the top-level biocommons namespace is correctly declared as a namespace package.

This trivial package was pushed to PyPI to reserve the biocommons namespace (contributions are welcome!).

Steps:

python3 -m venv venv
source venv/bin/activate
pip install wheel
python setup.py register
python setup.py sdist bdist bdist_egg bdist_wheel upload

seqrepo-rest-service's People

Contributors

andreasprlic, bmrobin, bpeterman, korikuzma, melissacline, reece, theferrit32

seqrepo-rest-service's Issues

docker build fails on Jinja2 package compatibility error

Steps to reproduce on main:

docker build -t seqrepo-rest-service:latest .

[...omitting long output...]

#10 20.25 error: Jinja2 3.0.3 is installed but Jinja2<3.0,>=2.10.1 is required by {'flask'}
------
executor failed running [/bin/sh -c python3 setup.py install]: exit code: 1

Add optional argv to main so it can be overridden programmatically

Add an argv parameter to cli.main and cli._parse_opts. The code currently calls ArgumentParser.parse_args() with no arguments, so it only reads sys.argv.

Parameterizing those would make it easier to run the service programmatically from another application.

def main():
    coloredlogs.install(level="INFO")
    if "SEQREPO_DIR" in os.environ:
        _logger.warn("SEQREPO_DIR environment variable is now ignored")
    opts = _parse_opts()

seqrepo exceptions when handling concurrent requests

Description

The same file exceptions related to seqrepo thread safety that are tracked in clingen-data-model/architecture#548 also occur when running seqrepo-rest-service. Under concurrent requests, the global seqrepo object in the Flask worker raises exceptions, and once one occurs, every later request that attempts to read from that file fails as well.

Stopping and starting the seqrepo-rest-service process resets it and temporarily resolves the problem. The issue is therefore with process state, not with the actual files on the filesystem.

The temporary solution to the above GitHub issue, which avoids modifying the seqrepo codebase, was to add mutual exclusion to the object that contains the SeqRepo object. This lets the application run concurrent threads except while they execute code under that object, which is suboptimal because it makes a fairly large critical section mutually exclusive.
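
The mutual-exclusion workaround described above can be sketched roughly as follows. This is an illustration of the general technique, not the code used in the linked issue: LockedReader, FakeReader, and the fetch signature are hypothetical names standing in for the wrapper object and the shared SeqRepo instance.

```python
import threading


class LockedReader:
    """Serialize every call into a non-thread-safe reader behind one lock."""

    def __init__(self, reader):
        self._reader = reader
        self._lock = threading.Lock()

    def fetch(self, *args, **kwargs):
        # Only one thread at a time may run inside the wrapped reader.
        with self._lock:
            return self._reader.fetch(*args, **kwargs)


class FakeReader:
    """Hypothetical stand-in for the shared sequence store."""

    def fetch(self, name, start, end):
        return "ACGTACGT"[start:end]


shared = LockedReader(FakeReader())
results = []


def worker():
    results.append(shared.fetch("example", 0, 4))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The whole fetch call is the critical section, which is exactly the cost noted above: requests proceed concurrently only outside the wrapped object.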

Steps to reproduce

Run the server from one shell:

$ seqrepo-rest-service /path/to/seqrepo/2021-01-29

Send requests from a second shell:

$ bash -c 'for i in $(seq 1 100); do curl "http://127.0.0.1:5001/seqrepo/1/sequence/refseq%3ANM_000551.3" & done'

Provide search interface

There is no search interface in the current implementation: a sequence either exists or doesn't (404).

For this issue, design and implement an endpoint to search for sequences. Examples:

  • Find all versions of an unversioned sequence
  • Find all sequences from a specific namespace

Because searching could return large sets, the implementation will need to consider whether/how to handle paging and state (sessions).
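
One possible shape for such a search, sketched under stated assumptions: the ALIASES records, the search function, and its offset/limit parameters are all hypothetical, and a real implementation would query the alias database rather than an in-memory list. Offset-based paging is used here because it needs no server-side session state.

```python
# Hypothetical alias records standing in for the real alias database.
ALIASES = [
    {"namespace": "refseq", "alias": "NM_000551.2"},
    {"namespace": "refseq", "alias": "NM_000551.3"},
    {"namespace": "refseq", "alias": "NC_000001.11"},
    {"namespace": "GRCh38", "alias": "chr1"},
]


def search(namespace=None, accession=None, offset=0, limit=2):
    """Return one page of matching aliases plus the offset of the next page."""

    def matches(rec):
        if namespace is not None and rec["namespace"] != namespace:
            return False
        # Compare the unversioned part, e.g. "NM_000551" for "NM_000551.3".
        if accession is not None and rec["alias"].rsplit(".", 1)[0] != accession:
            return False
        return True

    hits = [rec for rec in ALIASES if matches(rec)]
    page = hits[offset:offset + limit]
    # next_offset is None when the current page is the last one.
    next_offset = offset + limit if offset + limit < len(hits) else None
    return {"total": len(hits), "results": page, "next_offset": next_offset}


all_versions = search(accession="NM_000551")
first_page = search(namespace="refseq")
second_page = search(namespace="refseq", offset=first_page["next_offset"])
```

The client carries next_offset from one call to the next, so the server keeps no per-client state. The trade-off is that results can shift between pages if aliases are added while a client is paging.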

GA4GH Aliases Bug

When fetching aliases, the ga4gh computed identifier has a GS prefix, which is not listed in the documentation.

This bug can be seen from the readme example:

  $ curl -f "http://0.0.0.0:5000/seqrepo/1/metadata/GRCh38:1"
    {
    "added": "2016-08-27T21:17:00Z",
    "aliases": [
        "GRCh38:1",
        "GRCh38:chr1",
        "GRCh38.p1:1",
            ... 
        "GRCh38.p9:chr1",
        "MD5:6aef897c3d6ff0c78aff06ac189178dd",
        "NCBI:NC_000001.11",
        "refseq:NC_000001.11",
        "SEGUID:FCUd6VJ6uikS/VWLbhGdVmj2rOA",
        "SHA1:14251de9527aba2912fd558b6e119d5668f6ace0",
        "VMC:GS_Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
        "sha512t24u:Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
        "ga4gh:GS.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO"
    ],
    "alphabet": "ACGMNRT",
    "length": 248956422
    }

I might be wrong, but I believe that the prefix should instead be SQ?
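
For reference, a minimal sketch of how a sha512t24u-based identifier can be built, assuming the digest is a SHA-512 hash truncated to 24 bytes and base64url-encoded, as the alias name suggests. The function names are hypothetical, and the default prefix is the SQ value proposed above rather than a confirmed behavior of the service.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(data).digest()
    # 24 bytes encode to exactly 32 base64 characters, so no padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def ga4gh_sequence_id(sequence: str, prefix: str = "SQ") -> str:
    """Build an identifier of the form ga4gh:<prefix>.<digest>."""
    # The prefix is a parameter so the GS and SQ forms can be compared.
    return f"ga4gh:{prefix}.{sha512t24u(sequence.encode('ascii'))}"


print(ga4gh_sequence_id("ACGT"))
print(ga4gh_sequence_id("ACGT", prefix="GS"))
```

Only the prefix differs between the two printed identifiers; the digest part is the same, which matches the readme output where the sha512t24u and ga4gh aliases share one digest.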

Service maxes out number of open files

This service worked well for me for a couple of days, but then it hit the OS limit on simultaneously open files and failed.

It reports: Failed to open FASTA index ... "Too many open files"

Running lsof | wc -l shows 65,863 open files, including a huge number of db.sqlite3 handles as well as individual FASTA files.

Is there a workaround for this?
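
One general technique for keeping the number of open handles bounded is a small least-recently-used cache that closes the oldest handle when a limit is exceeded. The sketch below illustrates that idea only; BoundedHandleCache and its opener argument are hypothetical and are not part of seqrepo or this service, and whether the leak originates in handle caching is an assumption.

```python
import io
from collections import OrderedDict


class BoundedHandleCache:
    """Keep at most max_open handles open, closing the least recently used."""

    def __init__(self, opener, max_open=2):
        self._opener = opener
        self._max_open = max_open
        self._handles = OrderedDict()

    def get(self, path):
        if path in self._handles:
            # Mark as most recently used and reuse the open handle.
            self._handles.move_to_end(path)
            return self._handles[path]
        handle = self._opener(path)
        self._handles[path] = handle
        while len(self._handles) > self._max_open:
            # Evict and close the least recently used handle.
            _, oldest = self._handles.popitem(last=False)
            oldest.close()
        return handle

    def __len__(self):
        return len(self._handles)


# io.StringIO stands in for opening a real FASTA or sqlite file.
cache = BoundedHandleCache(opener=io.StringIO, max_open=2)
first = cache.get("a.fa")
cache.get("b.fa")
cache.get("c.fa")  # exceeds the limit, so the handle for a.fa is closed
```

Callers must not hold on to a handle across later get calls, since eviction closes it.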
