
bioutils's People

Contributors

afrubin, andreasprlic, bmrobin, davmlaw, gomoto, lucaswiman, mihaitodor, pjcoenen, reece, theferrit32, timothyjlaurent, trentwatt, veenarajaraman

bioutils's Issues

(PYL-W0102) Dangerous default argument

Description

Do not use a mutable like list or dictionary as a default value to an argument. Python’s default arguments are evaluated once when the function is defined. Using a mutable default argument and mutating it will mutate that object for all future calls to the function as well.
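
The pitfall described above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the bioutils source; the second function shows the usual workaround of using None as a sentinel.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same object.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = append_bad("a")
second_bad = append_bad("b")    # same list object as first_bad

first_good = append_good("a")
second_good = append_good("b")  # independent list
```

With the mutable default, both calls return the same list, which ends up holding both items; with the sentinel, each call gets its own list.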

Occurrences

There are 2 occurrences of this issue in the repository.

See all occurrences on DeepSource → deepsource.io/gh/biocommons/bioutils/issue/PYL-W0102/occurrences/

Add GRCh38p13 and GRCh38p14

Hi, could you please add the last two GRCh38 patches? Thanks.

GRCh38.p13

make_ac_name_map("GRCh38.p13")
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/bioutils/_data/assemblies/GRCh38.p13.json.gz'

GRCh38.p14

make_ac_name_map("GRCh38.p14")
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/bioutils/_data/assemblies/GRCh38.p14.json.gz'

Support degenerate codons in translate table

biocommons/hgvs#595 identified that degenerate codons are not supported by the translate function.

For hgvs, this is a regression relative to prior versions that used biopython.

While this is being fixed, consider moving the translation tables to _data (i.e., here) to be accessed the same way that assemblies and cytobands are.

seqfetcher fails with protein accessions

NCBI apparently made a subtle change to eutilities that breaks sequence fetching by seqfetcher. Previously, fetching NP sequences from nucleotide worked fine; it stopped working around Feb 7, 2017.

This change will break hgvs validation for folks not using seqrepo, so the fix is urgent.

(PTC-W0019) Consider using literal syntax to create the data structure

Description

Using the literal syntax can give minor performance bumps compared to using function calls to create dict, list and tuple.

In [1]: timeit.timeit(stmt="dict()", number=100000000)
Out[1]: 9.560388602000103
In [2]: timeit.timeit(stmt="{}", number=100000000)
Out[2]: 1.685333584000091
In [3]: timeit.timeit(stmt="tuple()", number=100000000)
Out[3]: 4.509182139000131
In [4]: timeit.timeit(stmt="()", number=100000000)
Out[4]: 0.5455615430000762
In [5]: timeit.timeit(stmt="list()", number=100000000)
…

Occurrences

There is 1 occurrence of this issue in the repository.

See all occurrences on DeepSource → deepsource.io/gh/biocommons/bioutils/issue/PTC-W0019/occurrences/

build broken

Same problem as in several of our projects. To fix, we need to do the same as here (mark optional dependencies, move dependencies into pyproject.toml).

Tests do not run when package is freshly installed

This can be replicated with the following Dockerfile:

FROM python:3.7

COPY . /app
WORKDIR /app
RUN pip install '.[dev,test]'
RUN make test

Output:

[...]
 > [5/5] RUN make test:
#7 0.343 pytest
#7 1.021 ImportError while loading conftest '/app/tests/conftest.py'.
#7 1.021 tests/conftest.py:5: in <module>
#7 1.021     import vcr
#7 1.021 /usr/local/lib/python3.7/site-packages/vcr/__init__.py:2: in <module>
#7 1.021     from .config import VCR
#7 1.021 /usr/local/lib/python3.7/site-packages/vcr/config.py:11: in <module>
#7 1.021     from .cassette import Cassette
#7 1.021 /usr/local/lib/python3.7/site-packages/vcr/cassette.py:12: in <module>
#7 1.021     from .patch import CassettePatcherBuilder
#7 1.021 /usr/local/lib/python3.7/site-packages/vcr/patch.py:41: in <module>
#7 1.021     _VerifiedHTTPSConnection = cpool.VerifiedHTTPSConnection
#7 1.021 E   AttributeError: module 'urllib3.connectionpool' has no attribute 'VerifiedHTTPSConnection'
#7 1.063 make: *** [Makefile:71: test] Error 4
------
executor failed running [/bin/sh -c make test]: exit code: 2

vcrpy version is not specified in setup.cfg

Improve support for degenerate codons

PR #30 added basic support for degenerate codons, so that any codon with an ambiguity code is translated as X (the wildcard AA). However, it's often possible to translate codons with ambiguity codes where the ambiguity is irrelevant to the outcome. For example, in a standard translation table, CUN ⇒ Leu, GCN ⇒ Ala, GGN ⇒ Gly, AAY ⇒ Asn, etc.

This issue is to provide fuller support for ambiguity codes. Ideally, the solution will work for any translation table.
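
One possible approach, sketched below, is to expand each ambiguity code into the bases it stands for, translate every concrete codon, and report an amino acid only when all expansions agree. This is an illustrative sketch, not the bioutils implementation: the names STANDARD_TABLE, IUPAC and translate_codon are hypothetical, and the table is the standard genetic code built from its conventional TCAG-ordered string. Passing a different table dict would, under the same assumptions, work for other translation tables.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes and the concrete bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon, table=STANDARD_TABLE):
    """Translate one codon; return 'X' only if the ambiguity changes the result."""
    codon = codon.upper().replace("U", "T")  # accept RNA input
    if len(codon) != 3 or any(base not in IUPAC for base in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    # Translate every concrete codon the ambiguity codes could stand for.
    amino_acids = {
        table[a + b + c]
        for a, b, c in product(*(IUPAC[base] for base in codon))
    }
    return amino_acids.pop() if len(amino_acids) == 1 else "X"
```

With this sketch, CUN, GCN, GGN and AAY resolve to L, A, G and N respectively, while AAN (which covers both Asn and Lys) still yields X.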

Fix testing warnings

We've accumulated test warnings, mostly related to deprecations in pytest. Fix these.

Write/improve documentation

bioutils now has a proper docs directory. Nearly all of the docs are actually pulled from the docstrings in the source files. Results are automatically built at https://bioutils.readthedocs.io.

To build locally, type make -C docs html, then open docs/build/html/index.html.

To write/improve documentation, do the following:

  • Set up a linux VM
  • Fork and clone this repo
  • Set up environment. make devready ought to do it.
  • Demonstrate that you can build with make -C docs html
  • Add a comment to this issue saying that you're working on docs for a specific file
  • Make changes, using make -C docs html to rebuild as necessary.
  • Commit with a message like #22: Added docs for normalize.py (The #22 refers to this issue and GitHub will automatically create a link to it.)
  • git push
  • Submit a PR at GitHub

N.B. bioutils.readthedocs.io won't be rebuilt until your PR is merged, so don't expect that to update immediately.

Thanks!

release latest main branch

Hi Reece,

I haven't seen a release of bioutils since about a year ago. What is needed to get a new release out? Let me know if I can help with that.

Thanks,
Andreas

seqfetcher doesn't support Ensembl transcript versions

The Ensembl sequence API only supports unversioned transcript IDs, not transcript.version, and returns the sequence of the latest transcript version.

Example:

from bioutils.seqfetcher import fetch_seq
fetch_seq("ENST00000543872.6")

throws exception:

RuntimeError: Failed to fetch ENST00000543872.6 (400 Client Error: Bad Request for url: http://rest.ensembl.org/sequence/id/ENST00000543872.6)

I will link a pull request that fixes this by stripping the version before calling the API, then checking whether the version in the response matches.
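
The strip-then-verify idea can be sketched as follows. This is not the code from the linked pull request: split_versioned_accession and fetch_versioned are hypothetical names, and lookup is a hypothetical callable standing in for the Ensembl REST request. It is assumed to take an unversioned ID and return a (sequence, version) pair.

```python
def split_versioned_accession(ac):
    """Split 'ENST00000543872.6' into ('ENST00000543872', 6)."""
    base, sep, version = ac.partition(".")
    return base, (int(version) if sep else None)


def fetch_versioned(ac, lookup):
    """Fetch by unversioned ID, then confirm the returned version matches.

    `lookup` is a hypothetical stand-in for the REST call; it must return
    a (sequence, version) tuple for an unversioned transcript ID.
    """
    base, wanted = split_versioned_accession(ac)
    seq, found = lookup(base)
    if wanted is not None and wanted != found:
        raise RuntimeError(
            f"Requested {ac}, but the service returned version {found}"
        )
    return seq
```

Injecting the lookup keeps the version check testable without network access; a request for an outdated version fails loudly instead of silently returning the latest sequence.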

Implement flexible sequence normalization

Implement normalization with the following arguments:

  • alleles[]: array of sequence strings
  • interval: location of alleles
  • bounds: maximal extent of normalization left and right (for intron or other barriers)
  • sequence_fetcher: callback to fetch sequence context
  • mode: shuffle left (vcf), shuffle right (hgvs), extend (voca)
  • consider: anchor: 0 (# of bases left and right)

Returns:

  • new interval
  • normalized alleles

See ga4gh/vrs-python#16 and ga4gh/vrs-python#17.
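
As a rough illustration of one of the modes listed above, the sketch below implements only the "shuffle left" case for a single reference/alternate pair, using interbase coordinates and a left bound. It is a simplified assumption-laden sketch, not the proposed API: the function name and signature are hypothetical, the sequence is passed as an in-memory string instead of a sequence_fetcher callback, and the right-shuffle and extend modes are omitted.

```python
def shuffle_left(seq, start, end, alt, left_bound=0):
    """Left-shuffle one ref/alt pair; return ((start, end), ref, alt).

    Coordinates are interbase (0-based, end-exclusive). Only pure
    insertions or deletions are shuffled after common affixes are trimmed.
    """
    ref = seq[start:end]

    # Trim bases shared at the right end of both alleles.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt, end = ref[:-1], alt[:-1], end - 1
    # Trim bases shared at the left end of both alleles.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, start = ref[1:], alt[1:], start + 1

    # Shuffle only when exactly one allele is empty (insertion or deletion).
    if bool(ref) != bool(alt):
        allele = ref or alt
        while start > left_bound and seq[start - 1] == allele[-1]:
            # Rotate the allele one base to the left.
            allele = seq[start - 1] + allele[:-1]
            start, end = start - 1, end - 1
        if ref:
            ref = allele
        else:
            alt = allele
    return (start, end), ref, alt
```

For example, deleting CA at interbase positions (5, 7) of GCACACAT shuffles to (1, 3), the leftmost equivalent position; with left_bound=3 the shuffle stops at (3, 5).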

wheel build fails

A very small error in the setup.cfg file, introduced in version 0.5.0, is causing wheel builds to fail.

Here is the setup.cfg change to make:
-license-file = LICENSE
+license-file = LICENSE.txt

pip works around this issue for installations from the command-line, but it causes a problem for build chains relying on the wheel build (it breaks in AWS SAM using a docker container).

The problem can be reproduced with this command:
pip wheel --wheel-dir wheels/bioutils/ bioutils==0.5.0

While the wheel build for the previous version works fine:
pip wheel --wheel-dir wheels/bioutils/ bioutils==0.4.4

migrate from recordtype to attrs

recordtype is 6 years old and unmaintained.
It's started throwing false alarm errors like this:

Searching for recordtype
Reading https://pypi.python.org/simple/recordtype/
Downloading https://pypi.python.org/packages/cc/1c/7ff90f4379110d6ef92a7f44ce487f235dbb3243f17c5294a73e0156b6f4/recordtype-1.1.tar.gz#md5=8133256b9c62baa2019ec16db3b14115
Best match: recordtype 1.1
Processing recordtype-1.1.tar.gz
Writing /tmp/easy_install-_oii6rik/recordtype-1.1/setup.cfg
Running recordtype-1.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-_oii6rik/recordtype-1.1/egg-dist-tmp-f50gmf3h
  File "build/bdist.linux-x86_64/egg/recordtype.py", line 250
    exec template in namespace
                ^
SyntaxError: Missing parentheses in call to 'exec'

zip_safe flag not set; analyzing archive contents...
Moving recordtype-1.1-py3.5.egg to /home/reece/projects/biocommons/bioutils/.eggs

Installed /home/reece/projects/biocommons/bioutils/.eggs/recordtype-1.1-py3.5.egg

I have no idea why it fails and then succeeds, but I don't care to work it out. Migrate to attrs, which is better anyway.

seqfetcher: Retry after rate limit error

NCBI returns HTTP status 429 and {"error":"API rate limit exceeded","api-key":"157.131.198.215","count":"5","limit":"3"} when the rate limit is exceeded.

Implement retries when this error is received.

See also biocommons/eutils#131, which would use eutils timing and caching.
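
A retry loop for the 429 case might look like the sketch below. It is an illustration under stated assumptions, not the seqfetcher implementation: fetch is a hypothetical zero-argument callable returning a (status_code, body) pair, RateLimitError is a hypothetical exception, and the exponential backoff schedule is an arbitrary choice.

```python
import time


class RateLimitError(Exception):
    """Raised when the service keeps answering 429 after all retries."""


def fetch_with_retries(fetch, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `fetch` until it stops returning HTTP 429 or attempts run out.

    `fetch` is a hypothetical callable returning (status_code, body).
    The delay doubles after each rate-limited attempt.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 429:
            return body
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RateLimitError(f"still rate limited after {max_attempts} attempts")
```

Passing sleep as a parameter lets tests record the delays instead of actually waiting.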

Implement origin inference using identifiers.org information

Identifiers.org contains regexps associated with identifier syntax. For example, http://identifiers.org/insdc shows:

Field                Value
Recommended name     Nucleotide Sequence Database
Alternative name(s)  International Nucleotide Sequence Database Collaboration; INSDC; NCBI nucleotide; GenBank
Description          The International Nucleotide Sequence Database Collaboration (INSDC) consists of a joint effort to collect and disseminate databases containing DNA and RNA sequences.
Identifier pattern   ^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{4}\d{8}|[A-J][A-Z]{2}\d{5})(.\d+)?$
Registry identifier  MIR:00000029
Namespace            insdc
URI                  http://identifiers.org/insdc/

Goal: implement functions to infer namespace from a given accession based on regexp matches.

Records obtained from identifiers.org should be general enough to enable implementing CURIEs (using the namespace) and resolvers (using the uri).

The registry is available at https://identifiers.org/service/registryxml
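
A minimal sketch of regexp-based namespace inference is shown below. The record layout and the names REGISTRY, infer_namespaces and to_curie are hypothetical; only the insdc entry is included, with its pattern copied as it appears in this issue. In practice the records would presumably be loaded from the registry.

```python
import re

# One record, copied from the insdc entry quoted in this issue.
# The pattern is used exactly as shown there.
REGISTRY = [
    {
        "namespace": "insdc",
        "uri": "http://identifiers.org/insdc/",
        "pattern": r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{4}\d{8}|[A-J][A-Z]{2}\d{5})(.\d+)?$",
    },
]

# Compile each pattern once, at import time.
_COMPILED = [(re.compile(rec["pattern"]), rec) for rec in REGISTRY]


def infer_namespaces(accession):
    """Return every namespace whose identifier pattern matches the accession."""
    return [rec["namespace"] for regex, rec in _COMPILED if regex.match(accession)]


def to_curie(accession):
    """Build a CURIE, refusing to guess when the match is missing or ambiguous."""
    namespaces = infer_namespaces(accession)
    if len(namespaces) != 1:
        raise ValueError(f"cannot infer a unique namespace for {accession!r}")
    return f"{namespaces[0]}:{accession}"
```

Returning a list from infer_namespaces keeps ambiguity visible, since patterns from different namespaces may overlap once more registry entries are loaded.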

Add support for selenoproteins

bioutils currently does not support selenoproteins. It would be great to be able to add selenocysteine to the codon translation table (technically it is already there, but maps to ""). We probably want to have a different translation table that maps the stop codon to SEC / U. Also add support for the alternate translation table as an option to translate_cds.

Finally, I don't think that sequences.py has any test coverage. Add some bare-bones unit tests for the newly added features.
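
The alternate-table idea could be sketched as follows. This is illustrative only and does not reflect the actual translate_cds signature: translate_cds_sketch, STANDARD_TABLE and SEC_TABLE are hypothetical names, and the sketch assumes the stop codon in question is TGA (UGA).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: the alternate table differs only in reading TGA as Sec (U).
SEC_TABLE = {**STANDARD_TABLE, "TGA": "U"}


def translate_cds_sketch(seq, table=STANDARD_TABLE):
    """Translate complete codons; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")  # accept RNA input
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))
```

Under these assumptions, ATGTGATAA translates to M** with the standard table and to MU* with the selenocysteine table.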

seqrepo tests fail due to removal of unicode coercion

I removed coercion of sequences to unicode in digests.py. Unsurprisingly, code that doesn't pass unicode now fails. Unfortunately, that includes seqrepo tests.

So, bring coercion back until this can be done more thoughtfully. (It would be good to indicate when sequences are being coerced and warn callers.)
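
Coercion with a warning might look like the sketch below. It is not the digests.py code: the function name is hypothetical, SHA-512 is chosen only for illustration, and sequences are assumed to be ASCII.

```python
import hashlib
import warnings


def sha512_hexdigest_coercing(seq):
    """Hash a sequence given as text; coerce bytes to text with a warning."""
    if isinstance(seq, bytes):
        # Tell the caller that an implicit conversion took place.
        warnings.warn(
            "sequence passed as bytes; coercing to text",
            stacklevel=2,
        )
        seq = seq.decode("ascii")
    return hashlib.sha512(seq.encode("ascii")).hexdigest()
```

Callers passing bytes get the same digest as callers passing text, plus a warning that points at their own call site.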
