Comments (4)
It seems there is currently no complete Zika genome in RefSeq - I found that very surprising, too.
Look at https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi and https://www.ncbi.nlm.nih.gov/assembly/?term=txid64320[Organism:noexp] . Since we only take the latest complete genome, it didn't find its way into the database. I think it is a mistake that the assembly is flagged as 'Scaffold' level: there is only one scaffold, and it replaced an assembly that was flagged as complete.
I will look into the latter issue of downloading the RefSeq data. However, that won't fix the missing Zika genome; RefSeq has to be updated for that. In the meantime, you could add the Zika virus reference genome yourself and add one entry (NC_012532<tab>64320) to the map file provided to centrifuge-build via the --conversion-table argument.
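A minimal sketch of that workaround, assuming the map file is called seqid2taxid.map (a placeholder name, not taken from this thread). It appends the tab-separated entry only if it is not already present. The commented centrifuge-build call follows the usage shown in the Centrifuge manual; the paths and index name there are placeholders too, and the sequence ID in the map should match the ID in the header of the FASTA file you add.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder file name (an assumption): use the map file you already
# pass to centrifuge-build via --conversion-table.
MAP_FILE="${MAP_FILE:-seqid2taxid.map}"
touch "$MAP_FILE"

# Append one tab-separated line: sequence ID, then NCBI taxonomy ID
# (64320 is the Zika virus taxid mentioned above). Skip if already there.
if ! grep -q '^NC_012532' "$MAP_FILE"; then
    printf 'NC_012532\t64320\n' >> "$MAP_FILE"
fi

# Show the entry that will be used.
grep '^NC_012532' "$MAP_FILE"

# Then rebuild the index, e.g. (all paths and the index name are placeholders):
# centrifuge-build --conversion-table "$MAP_FILE" \
#     --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
#     input-sequences.fna my_index
```

The Zika reference sequence itself also has to be part of the FASTA input given to centrifuge-build, otherwise the new map entry has nothing to refer to.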
Also I'll work on providing a Makefile target for a database that includes viral strains from the NCBI viral genome resource.
from centrifuge.
Fixed now. A couple of points:
- Consider installing rsync for faster downloads. The downloads failed because the script falls back to curl/wget when rsync is not installed, and those code paths did not have the address updated from ftp to https.
- I added several more database targets to the Makefile, including one with only viruses (v), one with only prokaryotes (p), and the combination (p+v). Try
  make THREADS=10 v
  etc.
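On the rsync point, here is a small sketch for checking which download tool would be picked up on your machine before starting a long build. It is illustrative only and not the actual centrifuge-download code; the function name pick_downloader and the curl-before-wget order are assumptions.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first available download tool,
# preferring rsync, then curl, then wget (order is an assumption).
pick_downloader() {
    local tool
    for tool in rsync curl wget; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "none"
    return 1
}

DL_TOOL="$(pick_downloader || true)"
echo "Downloads would use: $DL_TOOL"
```

If this does not report rsync, installing it first should avoid the slower fallback path described above.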
I'll re-build the standard database next week with all viral genomes.
from centrifuge.
Hello,
Re-installed centrifuge, and installed rsync.
When trying to make the p+v index, I got the following error...
jrussellmac:indices jrussell$ make THREADS=4 p+v DONT_DUSTMASK=1
Making: p+v: p+v
/Library/Developer/CommandLineTools/usr/bin/make -f Makefile IDX_NAME=p+v
[[ -d tmp_p+v ]] && rm -rf tmp_p+v; mkdir -p tmp_p+v
Downloading and dust-masking archaea
centrifuge-download -o tmp_p+v -d "archaea" -P 4 refseq > tmp_p+v/all-archaea.map
Downloading ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/archaea/assembly_summary.txt ...
rsync: failed to connect to ftp.ncbi.nlm.nih.gov: No route to host (65)
rsync error: error in socket IO (code 10) at clientserver.c(122) [Receiver=3.0.7]
rsync Download failed! Have a look at valid domains at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq .
make[1]: *** [reference-sequences/all-archaea.fna] Error 1
make: *** [p+v] Error 2
Also tried 'make THREADS=4 v'. Error is below...
jrussellmac:indices jrussell$ make THREADS=4 v DONT_DUSTMASK=1
Making: v: v
/Library/Developer/CommandLineTools/usr/bin/make -f Makefile IDX_NAME=v
[[ -d tmp_v ]] && rm -rf tmp_v; mkdir -p tmp_v
Downloading and dust-masking viral-any_level
centrifuge-download -o tmp_v -d "viral-any_level" -P 4 refseq > tmp_v/all-viral-any_level.map
viral-any_level is not a valid domain - use one of the following:
archaea
bacteria
fungi
invertebrate
plant
protozoa
unknown
vertebrate_mammalian
vertebrate_other
viral
make[1]: *** [reference-sequences/all-viral-any_level.fna] Error 1
make: *** [v] Error 2
It seems like NCBI doesn't like the way things are named in the Makefile? I tried changing the names a bit but got nowhere.
Any insight much appreciated.
Thanks.
from centrifuge.