netsurfp-3.0's People

Contributors

eryk96, magnushhoie

netsurfp-3.0's Issues

HHblits or MMseqs, and the model

I found that the NetSurfP-3 example (example/netsurfp_3) only shows HHblits, unlike NetSurfP-2, which also has MMseqs. Is that intended?
Also, should I train the model before I predict? It seems that the code does not include a pretrained model for prediction.

CUDA out of memory for FASTA entries > 1022 residues

The error occurs on GPU and on the Biolib server (https://dtu.biolib.com/NetSurfP-3/) for FASTA files with sequences longer than the original ESM1b limit of ~1024 residues. It is reproducible on Biolib with as few as 60 sequences of 1900 residues each. At the same time, a massive input of 4900 sequences of 1000 residues works fine.

Current work-around:
NOTE: If you get an out-of-memory error, remove all sequences above 1022 residues. Sequences above 1022 residues can be submitted up to 40 at a time. The error is caused by an unresolved PyTorch GPU memory handling bug. Alternatively, use the DTU Healthtech server, which uses CPU only: https://services.healthtech.dtu.dk/service.php?NetSurfP-3.0
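
The work-around above can be sketched in Python as follows. This is only an illustration, not part of NetSurfP-3.0: the helper names (`parse_fasta`, `split_for_submission`) are hypothetical, and the only values taken from the note are the 1022-residue limit and the batch size of 40.

```python
from typing import Iterable, Iterator, List, Tuple

MAX_LEN = 1022    # residue limit quoted in the work-around
LONG_BATCH = 40   # max number of long sequences per submission, per the work-around

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_for_submission(
    records: Iterable[Record],
    max_len: int = MAX_LEN,
    long_batch: int = LONG_BATCH,
) -> Tuple[List[Record], List[List[Record]]]:
    """Separate short sequences from long ones and batch the long ones."""
    records = list(records)
    short = [r for r in records if len(r[1]) <= max_len]
    long_ = [r for r in records if len(r[1]) > max_len]
    batches = [long_[i:i + long_batch] for i in range(0, len(long_), long_batch)]
    return short, batches
```

The short sequences can then be submitted as one file, and each batch of long sequences as a separate submission.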

Input file tests (uploaded here: https://github.com/Eryk96/NetSurfP-3.0/tree/main/healthtech/input_tests)

  • 4900 seqs x 1000 residues no problem (4.9M residues total)
  • 40 seqs x 1900 residues no problem (76k residues total)
  • 60x1900 residues:
    RuntimeError: CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 14.56 GiB total capacity; 10.13 GiB already allocated; 1.83 GiB free; 11.79 GiB reserved in total by PyTorch)

I think the bug is related to this code:
https://github.com/Eryk96/NetSurfP-3.0/blob/main/nsp3/nsp3/embeddings/esm1b.py#L76

We overcome ESM-1b's limit of 1024 residues per sequence by splitting longer sequences into chunks, predicting each chunk, then concatenating the results back to the original sequence length. My guess is that some CUDA object remains on the GPU between batches, eventually leading to out-of-memory errors. However, I cannot find anything in the code that would remain on the GPU.
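
The chunk-and-concatenate idea described above can be illustrated with a minimal, framework-free sketch. It is not the code in esm1b.py: `chunk_sequence`, `embed_long_sequence`, and `toy_embed` are hypothetical names, the chunk length of 1022 is an assumption, and the toy embedder stands in for the real language model.

```python
from typing import Callable, List

CHUNK_LEN = 1022  # assumed per-chunk residue limit

Vector = List[float]


def chunk_sequence(seq: str, chunk_len: int = CHUNK_LEN) -> List[str]:
    """Split a sequence into consecutive chunks of at most chunk_len residues."""
    return [seq[i:i + chunk_len] for i in range(0, len(seq), chunk_len)]


def embed_long_sequence(
    seq: str,
    embed_fn: Callable[[str], List[Vector]],
    chunk_len: int = CHUNK_LEN,
) -> List[Vector]:
    """Embed each chunk separately and concatenate along the residue axis."""
    out: List[Vector] = []
    for chunk in chunk_sequence(seq, chunk_len):
        chunk_out = embed_fn(chunk)
        if len(chunk_out) != len(chunk):
            raise ValueError("embed_fn must return one vector per residue")
        out.extend(chunk_out)
    return out


def toy_embed(chunk: str) -> List[Vector]:
    """Stand-in embedder: one 1-dimensional vector per residue."""
    return [[float(ord(c))] for c in chunk]
```

With a real model, each chunk is embedded without context from its neighbours, so residues near chunk boundaries may get different embeddings than they would in an unchunked pass.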

[Question] Does NetSurfP-3.0 perform better?

Does replacing the HMM with the embeddings actually improve performance relative to the results reported in the NetSurfP-2.0 paper? (Sorry if I missed any comparison statistics you've already added.)

Thanks!

mm_msa not found

I'm using NetSurfP-2.0 from the official website with MMseqs2. Something goes wrong at the result2msa step: after the search, 'out.mm_msa' does not exist.
I installed it with conda, and the version is 13.45111.
Here is the output:
NetSurfP-2 : INFO Running mmseqs search example_out/mmseqs_files/in.mmdb db/swissprot example_out/mmseqs_files/out.mm_search /tmp/tmphggd3z8b --num-iterations 2 --max-seqs 2000
03:25 NetSurfP-2 : INFO Running mmseqs result2msa example_out/mmseqs_files/in.mmdb db/swissprot example_out/mmseqs_files/out.mm_search example_out/mmseqs_files/out.mm_msa
03:25 NetSurfP-2 : INFO Parsing MMseqs2 MSA database
Traceback (most recent call last):
  File "/opt/conda/envs/netsurfp/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/netsurfp/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/nfs/my/lsz/netsurfp2/netsurfp2/main.py", line 95, in <module>
    entry()
  File "/nfs/my/lsz/netsurfp2/netsurfp2/main.py", line 59, in entry
    profiles = searcher(protlist, args.out)
  File "/nfs/my/lsz/netsurfp2/netsurfp2/preprocess.py", line 150, in __call__
    with open(mmmsa) as fdat, open(mmmsa + '.index') as fidx:
FileNotFoundError: [Errno 2] No such file or directory: 'example_out/mmseqs_files/out.mm_msa'

TypeError: string indices must be integers

Hello, first of all, thank you very much for your contribution. However, I'm having a problem training with the pretrained ESM1b weights.
Whenever I try to train NetSurfP-3 with the language model from Facebook, it does not run as smoothly as NetSurfP-2.
The error is as follows:

WARNING:setup:"logging.yml" not found. Using basicConfig.
INFO:nsp3.nsp3.main:Building: nsp3.models.ESM1b
Traceback (most recent call last):
  File "/home/quang/.conda/envs/nsp3/bin/nsp3", line 33, in <module>
    sys.exit(load_entry_point('nsp3', 'console_scripts', 'nsp3')())
  File "/home/quang/.conda/envs/nsp3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/quang/.conda/envs/nsp3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/quang/.conda/envs/nsp3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/quang/.conda/envs/nsp3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/quang/.conda/envs/nsp3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/ssd1/quang/dti2d/NetSurfP-3.0/nsp3/nsp3/cli.py", line 32, in train
    main.train(config, resume)
  File "/ssd1/quang/dti2d/NetSurfP-3.0/nsp3/nsp3/main.py", line 42, in train
    model = get_instance(module_arch, 'arch', cfg)
  File "/ssd1/quang/dti2d/NetSurfP-3.0/nsp3/nsp3/main.py", line 285, in get_instance
    return getattr(module, ctor_name)(*args, **config[name]['args'])
  File "/ssd1/quang/dti2d/NetSurfP-3.0/nsp3/nsp3/models/ESM1b/model.py", line 23, in __init__
    self.embedding = ESM1bEmbedding(language_model, **kwargs)
  File "/ssd1/quang/dti2d/NetSurfP-3.0/nsp3/nsp3/embeddings/esm1b.py", line 42, in __init__
    alphabet = esm.Alphabet.from_architecture(embedding_args['arch'])
TypeError: string indices must be integers

Looking forward to hearing from you soon!
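
For readers hitting the same traceback: Python raises this exact TypeError when a string is indexed with a string key. One possible cause, which is an assumption and not confirmed by the traceback, is that `embedding_args` in esm1b.py is a plain string (for example a path or model name from the config) rather than a dict with an 'arch' key. The sketch below reproduces the error and shows a defensive check; `get_arch` and the value "some_arch" are hypothetical and not part of NetSurfP-3.0.

```python
from typing import Any, Mapping


def get_arch(embedding_args: Any) -> str:
    """Return embedding_args['arch'], failing with a clearer message if it is not a mapping."""
    if not isinstance(embedding_args, Mapping):
        raise TypeError(
            "expected a mapping with an 'arch' key, got "
            f"{type(embedding_args).__name__}: {embedding_args!r}"
        )
    return embedding_args["arch"]


# Reproduce the original error: indexing a string with a string key.
try:
    "path/to/model.pt"["arch"]  # hypothetical string value
except TypeError as exc:
    reproduced = str(exc)  # begins with "string indices must be integers"

# With a dict, the lookup works as the failing line expects.
arch = get_arch({"arch": "some_arch"})  # "some_arch" is a placeholder
```

Printing `type(embedding_args)` just before the failing line in esm1b.py would show whether this is what is happening.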
