Comments (18)
Hmm, actually I'm not getting any taxonomy results from sourpurge; must be a change in sourmash...
from aaftf.
Hmm, I'll try to test it myself locally; not sure if we'll need to ask @ctb or @luizirber for input if I get stuck.
from aaftf.
Yeah, could just be me -- I haven't tried to use it in a few years. (I'm also on my Mac with sourmash installed from Bioconda.)
from aaftf.
But alternatively we could stick with an older version of sourmash. In case anybody is looking here, the commands we are trying to run are:
$ sourmash compute -k 31 --scaled=1000 --singleton assembly.fasta > assembly.fasta.sig
$ sourmash lca classify --db genbank-k31.lca.json.gz --query assembly.fasta.sig
Using the latest v4.2.2 on Mac OS, I essentially got this result:
ID | status | superkingdom | phylum | class | order | family | genus | species | strain |
---|---|---|---|---|---|---|---|---|---|
NODE_1_length_531696_cov_9.953852 | nomatch | | | | | | | | |
NODE_2_length_448760_cov_9.622479 | nomatch | | | | | | | | |
NODE_3_length_360422_cov_9.704374 | nomatch | | | | | | | | |
NODE_4_length_343545_cov_9.301333 | nomatch | | | | | | | | |
NODE_5_length_319398_cov_10.079307 | nomatch | | | | | | | | |
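For anyone following along, here is a minimal sketch of how the classify output could be tallied to confirm that nothing matched. It assumes the output was saved as CSV with the same column names as the table above; the `tally_status` helper is a hypothetical name for illustration, not part of sourmash or AAFTF.

```python
import csv
import io
from collections import Counter


def tally_status(csv_text):
    """Count how many query rows fall under each value of the 'status' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["status"] for row in reader)


# Two rows copied from the table above, rewritten as CSV (an assumed layout).
example = """ID,status,superkingdom,phylum,class,order,family,genus,species,strain
NODE_1_length_531696_cov_9.953852,nomatch,,,,,,,,
NODE_2_length_448760_cov_9.622479,nomatch,,,,,,,,
"""

counts = tally_status(example)
print(counts)
```

If every row lands under `nomatch`, the problem is more likely the database or the sketch parameters than any single contig.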
from aaftf.
Ah, I guess I should look at the command help menu!
$ sourmash compute
usage:
** WARNING: the sourmash compute command is DEPRECATED as of 4.0 and
** will be removed in 5.0. Please see the 'sourmash sketch' command instead.
sourmash compute -k 21,31,51 *.fa *.fq
Create MinHash sketches at k-mer sizes of 21, 31 and 51, for
all FASTA and FASTQ files in the current directory, and save them in
signature files ending in '.sig'. You can rapidly compare these files
with `compare` and query them with `search`, among other operations;
see the full documentation at http://sourmash.rtfd.io/.
from aaftf.
ahh okay. changes to apply.
from aaftf.
hi all, thanks for tagging me in!
I'll have to go digging to give you exact dates, but we updated LCA database formats many, many moons ago - back in 2.x somewhere.
The difference in results is unexpected. The underlying algorithms didn't change; the database format expanded to accommodate sketches that didn't have taxonomy associated.
Last but by no means least, sourmash compute still works as it did before, and the sketch/signature formats are the same. So no change needed there right now. It's just getting removed in 5.0 :).
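Since `compute` still works for now but goes away in 5.0, a wrapper could build either command line from one place. This is only a sketch: `sketch_command` and `use_legacy` are hypothetical names, and the `sourmash sketch dna -p k=31,scaled=1000` form is my understanding of the replacement syntax, so check it against `sourmash sketch dna --help` for your installed version.

```python
import shlex


def sketch_command(fasta, sig, ksize=31, scaled=1000, use_legacy=False):
    """Build the argument list for sketching one FASTA file, one sketch per contig."""
    if use_legacy:
        # Deprecated form, still accepted in 4.x according to this thread.
        return ["sourmash", "compute", "-k", str(ksize), f"--scaled={scaled}",
                "--singleton", fasta, "-o", sig]
    # Assumed replacement form for sourmash 4.x and later.
    return ["sourmash", "sketch", "dna", "-p", f"k={ksize},scaled={scaled}",
            "--singleton", fasta, "-o", sig]


new_cmd = sketch_command("assembly.fasta", "assembly.fasta.sig")
old_cmd = sketch_command("assembly.fasta", "assembly.fasta.sig", use_legacy=True)
print(shlex.join(new_cmd))
print(shlex.join(old_cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual file names.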
from aaftf.
Okay thanks @ctb -- must be related to something with my install. I'll try to figure out and open an issue on sourmash GitHub if I can't figure it out. So @hyphaltip no reason to change the way we are running this quite yet, but we will need to update the database/resource link I think.
from aaftf.
well, I doubt it's your install - it's probably some SNAFU on our part, since it should have been working the same as before :). Either that or the database is bad/wrong? Yay computerz. We'll figure it out together tho, promise.
I do think you might want to take advantage of the new sourmash gather/sourmash tax approach, which is much better than lca classify, but that would be a somewhat bigger change. See @bluegenes' blog post, https://bluegenes.github.io/sourmash-tax/.
from aaftf.
Okay, I'll look into that. Basically what we are trying to do here is just classify each contig from a de novo assembly and remove things that are obviously contamination, i.e. contigs with a bacterial taxonomic classification when we are working on a fungal genome.
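Roughly, the filtering step could look like the sketch below. It assumes the classify results are CSV with `ID` and `superkingdom` columns as in the table earlier in the thread; `contaminant_ids` is a hypothetical helper, and the example rows are made up for illustration rather than real results.

```python
import csv
import io


def contaminant_ids(csv_text, expected_superkingdom="Eukaryota"):
    """Return contig IDs classified to a superkingdom other than the expected one.

    Contigs with no classification (empty superkingdom) are kept, not flagged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        superkingdom = row["superkingdom"].strip()
        if superkingdom and superkingdom != expected_superkingdom:
            flagged.append(row["ID"])
    return flagged


# Made-up rows: one bacterial hit, one fungal hit, one unclassified contig.
example = """ID,status,superkingdom,phylum,class,order,family,genus,species,strain
contig_a,found,Bacteria,Proteobacteria,,,,,,
contig_b,found,Eukaryota,Ascomycota,,,,,,
contig_c,nomatch,,,,,,,,
"""

to_remove = contaminant_ids(example)
print(to_remove)
```

Leaving unclassified contigs in place is a deliberate choice here: with a database that lacks fungal genomes, most genuine contigs will come back as `nomatch`.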
from aaftf.
hi! reminded of this by https://twitter.com/jonpalmer2013/status/1521312530936725506 :)
we did just release new databases! it would be easy for me to build you a new Genbank LCA (or give you the commands to do it), or you could just use the GTDB ones.
from aaftf.
(as of sourmash v4.4, scheduled soon, we can also point you at a larger-on-disk but much faster and lower-memory SQLite-based LCA database.)
from aaftf.
Thanks. I was trying something before and it was way too slow for us to put in place, but I want to give this another go.
Noting that our current 'sourpurge' with sourmash did about as good a job as NCBI's now-available screening tool, in a fraction of the time and with a lot less data to download.
from aaftf.
k - let us know how we can help! would AAFTF be something we can just download and run on our own, if we feel so inclined to try it out?
from aaftf.
sure - it is a very simple Python package, and I'd certainly welcome someone else helping me package it up for conda properly...
from aaftf.
I have implemented it, but with gtdb or gtdb-reps it doesn't seem to really work as well as the old genbank-k31.
CMD: sourmash lca classify --db /srv/projects/db/AAFTF_DB/gtdb-rs207-genomic-reps.dna.k31.lca.json.gz --query assembly.fasta.sig
[May 29 11:18 AM] Found 0 taxonomic classifications for contigs:
With old scheme.
CMD: sourmash lca classify --db /srv/projects/db/AAFTF_DB/genbank-k31.lca.json.gz --query assembly.fasta.sig
[May 29 11:17 AM] Found 2 taxonomic classifications for contigs:
Eukaryota;Ascomycota;Dothideomycetes;Capnodiales;Cladosporiaceae;Rachicladosporium;Rachicladosporium antarcticum
Eukaryota;Ascomycota;Dothideomycetes;Capnodiales;Cladosporiaceae;Rachicladosporium
@ctb is this because the gtdb is really bacteria only? I think this is okay in a sense but I guess DBs are too large now to really do a single sourmash search on representative dbs?
from aaftf.