Comments (6)
yes
- filter operates on reads. By default it uses BBMap to screen them against a contamination and primer/adaptor database. It can also be given a set of accessions to screen against if you have known contamination you want to remove at the read level.
- vecscreen operates on assembled contigs. It recapitulates a vector screening tool that was available online at NCBI (https://www.ncbi.nlm.nih.gov/tools/vecscreen/) and will probably be replaced by the NCBI fcs-adaptor step. It removes parts of contigs, or whole contigs, that match a vector and adaptor database. If a vector match is found within a contig, the contig is split into pieces, since the linkage across the match can no longer be trusted.
- sourpurge looks for cross-species contamination and removes the entire contig if it is classified as contamination. This step may be substituted with NCBI fcs-gx.
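To make the vecscreen splitting behavior concrete, here is a minimal sketch of the idea, not the actual AAFTF code. The function name `split_contig`, the `min_len` parameter, and the 0-based half-open hit coordinates are all assumptions made for illustration.

```python
def split_contig(seq, hits, min_len=1):
    """Cut vector/adaptor hits out of a contig and return the remaining pieces.

    seq     -- contig sequence as a string
    hits    -- list of (start, end) matches, 0-based and half-open (assumed)
    min_len -- hypothetical cutoff; pieces shorter than this are dropped
    """
    pieces = []
    pos = 0  # start of the next clean stretch
    for start, end in sorted(hits):
        if start > pos:
            # clean sequence before this hit becomes its own piece
            pieces.append(seq[pos:start])
        # skip past the hit; max() handles overlapping hits
        pos = max(pos, end)
    if pos < len(seq):
        pieces.append(seq[pos:])
    return [p for p in pieces if len(p) >= min_len]


if __name__ == "__main__":
    contig = "AAAA" + "VVVV" + "CCCC"  # V marks a made-up vector match
    print(split_contig(contig, [(4, 8)]))
```

A hit at the end of a contig just trims it, a hit in the middle yields two pieces, and a hit covering the whole contig leaves nothing.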
from aaftf.
Vecscreen removes primers and obvious contaminants (phiX), and trims or splits contigs using the UniVec database.
Sourpurge screens scaffolds for contamination by looking for matches to groups outside the target phylum. It removes scaffolds that likely match a contamination source.
I have also just added support for the NCBI fcs-gx tool, which does a similar job to the sourmash/sourpurge step.
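The phylum-based purge described above could be sketched roughly like this. It is only an illustration under stated assumptions, not how sourpurge is implemented: `purge_scaffolds`, the `phylum_calls` mapping, and the choice to keep unclassified scaffolds are all hypothetical.

```python
def purge_scaffolds(scaffolds, phylum_calls, target_phylum):
    """Separate scaffolds into kept and removed sets by phylum assignment.

    scaffolds     -- dict of scaffold name -> sequence
    phylum_calls  -- dict of scaffold name -> phylum from some classifier
                     (hypothetical input; scaffolds may be missing)
    target_phylum -- phylum of the organism being assembled
    """
    kept, removed = {}, {}
    for name, seq in scaffolds.items():
        call = phylum_calls.get(name)
        # Assumption: a scaffold with no classification is kept,
        # only a confident call outside the target phylum removes it.
        if call is None or call == target_phylum:
            kept[name] = seq
        else:
            removed[name] = seq
    return kept, removed


if __name__ == "__main__":
    scaffolds = {"scaf1": "ACGT", "scaf2": "GGCC", "scaf3": "TTAA"}
    calls = {"scaf1": "Ascomycota", "scaf2": "Proteobacteria"}
    kept, removed = purge_scaffolds(scaffolds, calls, "Ascomycota")
    print(sorted(kept), sorted(removed))
```

Note that the whole scaffold is dropped rather than trimmed, which matches the description of sourpurge removing entire contigs.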
In addition to vecscreen and sourpurge, the filter function is also involved in removing gene fragments of non-target species. The filter step removes reads before the assemble step, and also removes reads of mitochondrial genes from the mito step. Therefore, in AAFTF, there are three steps involved in removing gene fragments of non-target species.
Did I summarize it correctly?
Someday soon I'll finish writing up a manuscript on the tool to give this more description.
The NCBI fcs-adaptor and fcs-gx steps are being tested, but gx is a little unwieldy as it needs a lot of memory and a large DB, while sourpurge has a pretty small footprint (though the version of the DB I had been using before was removed from OSF.io, so the default install may not work right now until we see whether the larger replacement DB can still work efficiently).
fcs-gx support is now in AAFTF - available in live code and will be part of the v0.5.0 release
v0.5.0 has these features. Closing this query since it is answered above.