stajichlab / aaftf
Automatic Assembly For The Fungi
License: MIT License
Add a memory option to the AAFTF pilon step that can specify the Java heap size; otherwise we get out-of-memory errors on large read-pair datasets.
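One way such an option could be wired in is sketched below, assuming Pilon is launched directly through `java -jar` so that the heap limit can be passed with the JVM's `-Xmx` flag. The function name `pilon_command`, its parameters, and the jar path are hypothetical and are not taken from the AAFTF code.

```python
import shlex


def pilon_command(jar, genome, bam, out_prefix, memory_gb=16, threads=1):
    """Build a Pilon command line with an explicit Java heap size.

    All argument names here are illustrative assumptions, not AAFTF's API.
    """
    if memory_gb < 1:
        raise ValueError("memory_gb must be at least 1")
    return [
        "java", f"-Xmx{int(memory_gb)}g",  # maximum JVM heap size
        "-jar", jar,
        "--genome", genome,       # assembly to polish
        "--frags", bam,           # paired-end reads aligned to the assembly
        "--output", out_prefix,
        "--threads", str(threads),
    ]


cmd = pilon_command("pilon.jar", "asm.fasta", "reads.bam", "asm.pilon", memory_gb=48)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` unchanged; keeping it as a list avoids shell quoting problems with file names.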
Do they all aim to remove contaminants from contigs (i.e., eliminate contigs that do not belong to the target species)?
polca.sh (part of @alekseyzimin/masurca) appears to need an older version of samtools. I have provided a fix for this in the patches folder of the current code, which can be used to update in place the polca.sh script installed with masurca.
gunzip -k is not valid on Linux.
Could we use native gzip in Python to open and uncompress the file rather than rely on this, since we aren't using a parallel tool like pigz anyway?
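A minimal sketch of the native approach, using only the standard-library `gzip` and `shutil` modules. The helper name `gunzip_keep` is hypothetical; it mimics `gunzip -k` by leaving the compressed file in place.

```python
import gzip
import shutil


def gunzip_keep(src, dest=None):
    """Decompress src (a .gz file) to dest, leaving src untouched."""
    if dest is None:
        # Drop the .gz suffix, or fall back to a distinct name.
        dest = src[:-3] if src.endswith(".gz") else src + ".out"
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, not all in memory
    return dest
```

For many downstream steps the decompressed copy may not be needed at all, since `gzip.open(path, "rt")` can be read line by line like an ordinary text file.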
Support single ended read data for cleanup and assembly.
Support interleaved read data - the reads may still have to be split into fwd and rev, but the program should do this rather than require the user to.
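The splitting step could look roughly like the sketch below. It assumes plain four-line FASTQ records (no wrapped sequence lines) with mates strictly alternating; the function names are hypothetical and not part of AAFTF.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path, mode="rt"):
    """Open a file as text, transparently handling a .gz suffix."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def split_interleaved(interleaved, left_out, right_out):
    """Write alternating 4-line FASTQ records to left_out and right_out.

    Returns the number of read pairs written.
    """
    pairs = 0
    with open_maybe_gzip(interleaved) as fin, \
            open_maybe_gzip(left_out, "wt") as fwd, \
            open_maybe_gzip(right_out, "wt") as rev:
        while True:
            record1 = list(islice(fin, 4))
            if not record1:
                break  # clean end of file
            record2 = list(islice(fin, 4))
            if len(record1) != 4 or len(record2) != 4:
                raise ValueError("truncated or unpaired record in " + interleaved)
            fwd.writelines(record1)
            rev.writelines(record2)
            pairs += 1
    return pairs
```

The sketch does not check that read names of the two mates match; a production version would probably want that check as well.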
It seems that newer versions of sourmash require an updated LCA database; I got a failure with v4.2.2 and the database we have pinned in resources.py. The new link is https://osf.io/9xdg2/download and the page is https://sourmash.readthedocs.io/en/latest/databases.html
Hi dear all,
I'm trying to use AAFTF, but I did not find instructions for setting it up after installing the requirements.
I cloned the repository and added it to the path, but it did not work.
Could you please send me some instructions?
Thank you.
Kind regards
Nickolas
I had no issues running each step of the pipeline independently, but when I tried to run AAFTF in pipeline mode it always failed at the vecscreen step. It turns out there is just a simple typo on line 137 of the pipeline.py script. Can you remove the space?
# please change
if not checkfile(basename + ' .vecscreen.fasta'):
# to
if not checkfile(basename + '.vecscreen.fasta'):
Run the tool with an all-in-one option so that it generates a Bash script or Makefile that runs all the sub-steps, so that the user does not have to write a script to do all the steps.
Support "smart" restart options in the pipeline so that previously run steps are not re-done.
This seems like a solution for snakemake or makefiles instead of shell scripts....
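Short of adopting snakemake or make, a basic form of restart can be sketched in Python by skipping any step whose output file already exists. The `run_steps` helper and the `(name, output, func)` step layout are assumptions made for illustration, not how AAFTF is organized.

```python
import os


def run_steps(steps, log=print):
    """Run (name, output_path, func) steps in order, skipping finished ones.

    A step counts as finished when its output file exists and is non-empty.
    Returns the names of the steps that were actually run.
    """
    ran = []
    for name, output, func in steps:
        if os.path.isfile(output) and os.path.getsize(output) > 0:
            log(f"skipping {name}: {output} already exists")
            continue
        func(output)  # the step is expected to write its result to output
        ran.append(name)
    return ran
```

A step that crashes midway can leave a partial output behind, so each step would ideally write to a temporary name and rename the file once it has completed. Unlike make or snakemake, this check also ignores whether inputs are newer than outputs.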
Hello,
Is it possible to run the AAFTF pipeline while skipping the mito step? I get an error from NOVOPlasty about an invalid seed, which is not true, so it might be that no mitochondrial reads are present. Is there a way to make the pipeline ignore the mito step and continue with the rest?
Thank you in advance.
The tool at https://github.com/ncbi/ngs-tools/tree/tax/tools/tax provides a way to screen for contamination in reads or contigs before processing.
add a row specifying mean or median GC% for genome
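Both values can be computed in one pass over the assembly, as in the sketch below. It assumes "mean" refers to the overall GC% across all bases and "median" to the median of per-contig GC%; ambiguous bases such as N are left out of the denominator. The function names are hypothetical.

```python
from statistics import median


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_stats(handle):
    """Return (overall GC%, median per-contig GC%), counting only A/C/G/T."""
    total_gc = total_acgt = 0
    per_contig = []
    for _, seq in read_fasta(handle):
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        acgt = gc + seq.count("A") + seq.count("T")
        if acgt:
            per_contig.append(100.0 * gc / acgt)
        total_gc += gc
        total_acgt += acgt
    if not total_acgt:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * total_gc / total_acgt, median(per_contig)
```

The overall value is length-weighted, so it is dominated by large contigs, while the per-contig median is more sensitive to many small contigs with unusual composition.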
Hi! We have sequenced some isolates of Candida auris and would like to perform de novo assembly. May we ask whether it is possible to add support for Nanopore reads?
Hello! I tried to run the script for AAFTF filter, but the terminal always gave me the same error:
Running AAFTF v0.5.0
error with url https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/819/615/GCF_000819615.1_ViralProj14015/GCF_000819615.1_ViralProj14015_genomic.fna.gz aaftf-filter_85caa12f/GCF_000819615.1_ViralProj14015_genomic.fna.gz
Traceback (most recent call last):
File "/mnt/home/dematth2/anaconda3/envs/aaftf/bin/AAFTF", line 8, in <module>
sys.exit(main())
File "/mnt/home/dematth2/anaconda3/envs/aaftf/lib/python3.7/site-packages/AAFTF/AAFTF_main.py", line 1113, in main
args.func(parser, args)
File "/mnt/home/dematth2/anaconda3/envs/aaftf/lib/python3.7/site-packages/AAFTF/AAFTF_main.py", line 53, in run_subtool
submodule.run(parser, args)
File "/mnt/home/dematth2/anaconda3/envs/aaftf/lib/python3.7/site-packages/AAFTF/filter.py", line 65, in run
earliest_file_age = os.path.getctime(acc_file)
File "/mnt/home/dematth2/anaconda3/envs/aaftf/lib/python3.7/genericpath.py", line 65, in getctime
return os.stat(filename).st_ctime
FileNotFoundError: [Errno 2] No such file or directory: 'aaftf-filter_85caa12f/GCF_000819615.1_ViralProj14015_genomic.fna.gz'
======= This is where my script ends! =========
I tried to modify the link because, if I delete the last part and paste it into my browser, I can download the Escherichia coli sequence, but nothing changed in the terminal; I got the same error.
I also tried to modify the script, first by adding the genome directory (after downloading it beforehand) with the -u option, but it keeps insisting on the URL. Then I tried to add the right URL instead, but it keeps giving an error on that initial URL.
Thank you 👍
Use k-mer based matching of reads with bbduk.sh, which can also run phiX and primer filtering.
Is it possible this could replace the Trimmomatic trimming step as well? @nextgenusfs
Report the version number of tools utilized by AAFTF to support methods capture and writing.
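A rough sketch of how versions could be collected, assuming each tool prints its version when called with a version flag. The helper names are hypothetical, and the flags in the usage note are assumptions that would need checking per tool.

```python
import shutil
import subprocess


def tool_version(command, timeout=30):
    """Return the first line a tool prints for its version flag, or None."""
    if shutil.which(command[0]) is None:
        return None  # tool is not installed or not on PATH
    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr rather than stdout.
    output = (proc.stdout or proc.stderr).strip()
    return output.splitlines()[0] if output else None


def report_versions(tools):
    """Map each tool name to its reported version string (None if unavailable)."""
    return {name: tool_version(cmd) for name, cmd in tools.items()}
```

It might be called as `report_versions({"samtools": ["samtools", "--version"], "spades": ["spades.py", "--version"]})`, and the resulting dictionary written to the log so it can be copied into a methods section.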
It should be trivial to add dipspades support; we should add this.
Hi, dear AAFTF team,
When I run the AAFTF filter and AAFTF vecscreen commands, I run into the following error:
Traceback (most recent call last):
File "/home/liangdong/opt/anaconda3/bin/AAFTF", line 8, in <module>
sys.exit(main())
File "/home/liangdong/opt/anaconda3/lib/python3.10/site-packages/AAFTF/AAFTF_main.py", line 936, in main
args.func(parser, args)
File "/home/liangdong/opt/anaconda3/lib/python3.10/site-packages/AAFTF/AAFTF_main.py", line 47, in run_subtool
submodule.run(parser, args)
File "/home/liangdong/opt/anaconda3/lib/python3.10/site-packages/AAFTF/vecscreen.py", line 285, in run
urllib.request.urlretrieve(url, file)
File "/home/liangdong/opt/anaconda3/lib/python3.10/urllib/request.py", line 241, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/home/liangdong/opt/anaconda3/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/home/liangdong/opt/anaconda3/lib/python3.10/urllib/request.py", line 525, in open
response = meth(req, response)
File "/home/liangdong/opt/anaconda3/lib/python3.10/urllib/request.py", line 634, in http_response
response = self.parent.error(
File "/home/liangdong/opt/anaconda3/lib/python3.10/urllib/request.py", line 563, in error
return self._call_chain(*args)
File "/home/liangdong/opt/anaconda3/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/home/liangdong/opt/anaconda3/lib/python3.10/urllib/request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
I'm not sure what happened during this process. Is it an issue with my internet connection? If so, is there any proxy or mirror site I can use in China?
Anyway, here are my commands:
(1) AAFTF filter -c 16 --memory 48 --aligner bbduk -o ${sp}_filter --left ${sp}_trim_1P.fastq.gz --right ${sp}_trim_2P.fastq.gz --pipe --AAFTF_DB ./ref_genome
(2) AAFTF vecscreen -c 16 -i $sp.spades.assembly.fa -o $sp.assembly.vecscreen.out -s high --pipe
The following are my Python version and installation path:
python version: 3.10.9 (main, Mar 1 2023, 18:23:06) [GCC 11.2.0] on linux
installation path: /home/liangdong/opt/anaconda3/bin/python
I installed AAFTF with pip install.
Thanks and best regards
Add racon as an alternative or additional polishing step.
Hello,
I am trying to run the AAFTF pipeline to assemble several Pleurotus genomes (testing it on one genome only), and I thought I would run it as a pipeline at first. I am getting the error below. SPAdes seems to fail, but I cannot find any SPAdes .log file anywhere. What do you think?
I am running it on an HPC cluster that uses SLURM; please see the SLURM output and the submitted sbatch file attached.
Additionally, (AAFTF pipeline -h option below):
--tmpdir TMPDIR Assembler temporary dir
and
-w WORKDIR, --workdir WORKDIR temp directory
--assembler_args ASSEMBLER_ARGS Additional SPAdes/Megahit arguments
if it is possible, for example, different kmer sizes etc.
[benucci@dev-amd20 code]$ conda activate aaftf
(aaftf) [benucci@dev-amd20 code]$ AAFTF pipeline -h
usage: AAFTF pipeline [-h] [-q] [--tmpdir TMPDIR] [--assembler_args ASSEMBLER_ARGS] [--method METHOD] -l LEFT [-r RIGHT] -o BASENAME [-c cpus]
                      [-m MEMORY] [-ml MINLEN] [-a [SCREEN_ACCESSIONS ...]] [-u [SCREEN_URLS ...]] [-it ITERATIONS] [-mc MINCONTIGLEN]
                      [--AAFTF_DB AAFTF_DB] [-w WORKDIR] [-v] -p PHYLUM [PHYLUM ...] [--sourdb SOURDB] [--mincovpct MINCOVPCT]

Run entire AAFTF pipeline automagically

options:
  -h, --help            show this help message and exit
  -q, --quiet           Do not output warnings to stderr
  --tmpdir TMPDIR       Assembler temporary dir
  --assembler_args ASSEMBLER_ARGS
                        Additional SPAdes/Megahit arguments
  --method METHOD       Assembly method: spades, dipspades, megahit
  -l LEFT, --left LEFT  left/forward reads of paired-end FASTQ or single-end FASTQ.
  -r RIGHT, --right RIGHT
                        right/reverse reads of paired-end FASTQ.
  -o BASENAME, --out BASENAME
                        Output basename, default to base name of --left reads
  -c cpus, --cpus cpus  Number of CPUs/threads to use.
  -m MEMORY, --memory MEMORY
                        Memory (in GB) setting for SPAdes. Default is Auto
  -ml MINLEN, --minlen MINLEN
                        Minimum read length after trimming, default: 75
  -a [SCREEN_ACCESSIONS ...], --screen_accessions [SCREEN_ACCESSIONS ...]
                        Genbank accession number(s) to screen out from initial reads.
  -u [SCREEN_URLS ...], --screen_urls [SCREEN_URLS ...]
                        URLs to download and screen out initial reads.
  -it ITERATIONS, --iterations ITERATIONS
                        Number of Pilon Polishing iterations to run
  -mc MINCONTIGLEN, --mincontiglen MINCONTIGLEN
                        Minimum length of contigs to keep
  --AAFTF_DB AAFTF_DB   Path to AAFTF resources, defaults to $AAFTF_DB
  -w WORKDIR, --workdir WORKDIR
                        temp directory
  -v, --debug           Provide debugging messages
  -p PHYLUM [PHYLUM ...], --phylum PHYLUM [PHYLUM ...]
                        Phylum or Phyla to keep matches, i.e. Ascomycota
  --sourdb SOURDB       SourMash LCA k-31 taxonomy database
  --mincovpct MINCOVPCT
                        Minimum percent of N50 coverage to remove