

fbreitwieser commented on August 19, 2024

It seems there is currently no complete Zika genome in RefSeq - I found that very surprising, too.

Look at https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi and https://www.ncbi.nlm.nih.gov/assembly/?term=txid64320[Organism:noexp] . Since we only take the latest complete genome, the Zika assembly did not find its way into the database. I think it is a mistake that the assembly is flagged at the 'Scaffold' assembly level: it contains only one scaffold, and it replaced an assembly that was flagged as complete.

I will look into the latter issue of downloading the RefSeq data. That won't fix the missing Zika genome, though - RefSeq has to be updated for that. In the meantime, you could add the Zika virus reference genome yourself and add one entry (NC_012532<tab>64320) to the map file provided to centrifuge-build via the --conversion-table argument.
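A minimal sketch of that workaround, assuming a bash shell. The helper name add_map_entry and the file names seqid2taxid.map, zika.fna, input-sequences.fna and my_index are placeholders for illustration, not part of centrifuge. The sequence ID in the map has to match the ID in the FASTA header, so check whether your file uses NC_012532 or a versioned form of it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of centrifuge): append "<seqid><TAB><taxid>"
# to a conversion table unless that exact line is already present.
add_map_entry() {
    local map_file="$1" seqid="$2" taxid="$3"
    touch "$map_file"
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fxq "$(printf '%s\t%s' "$seqid" "$taxid")" "$map_file"; then
        printf '%s\t%s\n' "$seqid" "$taxid" >> "$map_file"
    fi
}

# Example usage (all file names are placeholders):
#   add_map_entry seqid2taxid.map NC_012532 64320
#   cat zika.fna >> input-sequences.fna
#   centrifuge-build -p 4 --conversion-table seqid2taxid.map \
#       --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
#       input-sequences.fna my_index
```

Calling the helper twice with the same arguments leaves a single line in the map, so it is safe to re-run before each build.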

Also I'll work on providing a Makefile target for a database that includes viral strains from the NCBI viral genome resource.

from centrifuge.

fbreitwieser commented on August 19, 2024

Fixed now. A couple of points:

  • Consider installing rsync for faster downloads. The downloads failed because the script falls back to curl/wget when rsync is not installed, and the address used by those fallbacks had not been updated from ftp to https.

  • I added several more database targets to the Makefile, including viruses only (v), prokaryotes only (p), and the combination of both (p+v). Try

    make THREADS=10 v

etc.
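The fallback described in the first point can be sketched roughly as below. This is an illustration under stated assumptions, not the actual centrifuge-download code: the function names to_https_url, pick_downloader and fetch are hypothetical, and it assumes the NCBI host serves the same paths over rsync and https.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real centrifuge-download logic.

# Rewrite an ftp:// address to https://; leave other URLs unchanged.
to_https_url() {
    case "$1" in
        ftp://*) printf 'https://%s\n' "${1#ftp://}" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Print the first available downloader, preferring rsync.
pick_downloader() {
    local tool
    for tool in rsync curl wget; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Fetch URL ($1) into file ($2) with whichever tool is available.
fetch() {
    local url="$1" out="$2" tool
    tool="$(pick_downloader)" || { echo "no downloader found" >&2; return 1; }
    case "$tool" in
        rsync) rsync --no-motd "rsync://${url#*://}" "$out" ;;
        curl)  curl -fsSL -o "$out" "$(to_https_url "$url")" ;;
        wget)  wget -q -O "$out" "$(to_https_url "$url")" ;;
    esac
}
```

Only the curl and wget branches go through to_https_url, which mirrors the point above: the fallback path is the one that needs the https address.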

I'll re-build the standard database next week with all viral genomes.


waywardsyintist commented on August 19, 2024

Hello,

I re-installed centrifuge and installed rsync.

When trying to make the p+v index, I got the following error:

jrussellmac:indices jrussell$ make THREADS=4 p+v DONT_DUSTMASK=1
Making: p+v: p+v
/Library/Developer/CommandLineTools/usr/bin/make -f Makefile IDX_NAME=p+v
[[ -d tmp_p+v ]] && rm -rf tmp_p+v; mkdir -p tmp_p+v
Downloading and dust-masking archaea
centrifuge-download -o tmp_p+v -d "archaea" -P 4 refseq > tmp_p+v/all-archaea.map
Downloading ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/archaea/assembly_summary.txt ...
rsync: failed to connect to ftp.ncbi.nlm.nih.gov: No route to host (65)
rsync error: error in socket IO (code 10) at clientserver.c(122) [Receiver=3.0.7]
rsync Download failed! Have a look at valid domains at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq .
make[1]: *** [reference-sequences/all-archaea.fna] Error 1
make: *** [p+v] Error 2

I also tried 'make THREADS=4 v'. The error is below:

jrussellmac:indices jrussell$ make THREADS=4 v DONT_DUSTMASK=1
Making: v: v
/Library/Developer/CommandLineTools/usr/bin/make -f Makefile IDX_NAME=v
[[ -d tmp_v ]] && rm -rf tmp_v; mkdir -p tmp_v
Downloading and dust-masking viral-any_level
centrifuge-download -o tmp_v -d "viral-any_level" -P 4 refseq > tmp_v/all-viral-any_level.map
viral-any_level is not a valid domain - use one of the following:
archaea
bacteria
fungi
invertebrate
plant
protozoa
unknown
vertebrate_mammalian
vertebrate_other
viral
make[1]: *** [reference-sequences/all-viral-any_level.fna] Error 1
make: *** [v] Error 2

It seems like NCBI doesn't like the way things are named in the Makefile? I tried changing the names a bit but got nowhere.

Any insight much appreciated.

Thanks.


