ablab / quast
Genome assembly evaluation tool
Home Page: http://quast.sf.net
License: Other
Hi,
I would like to know if metaQUAST calls only complete genes or includes partial genes in the counts of unique genes?
Looking forward to your reply.
Best,
Shaman
I found some inconsistencies regarding the required software for submodules. The README says:
For the optional submodules:
Time::HiRes perl module for GeneMark-ES (needed when using --gene-finding --eukaryote)
Java 1.8 or later for GRIDSS (needed for SV detection)
R for GRIDSS (needed for SV detection)
and the manual states:
In addition, QUAST submodules require:
Java JDK (tested with OpenJDK 6) for GAGE
Time::HiRes perl module for GeneMark-ES
Boost (tested with v1.56.0) for E-MEM
Which of them is correct and can you update them to be consistent?
Hi,
I am obtaining this error when trying to run QUAST
list index out of range
Traceback (most recent call last):
File "/mnt/nfs/projects/ecosystem_biology/local_tools/quast-master/metaquast.py", line 707, in <module>
return_code = main(sys.argv[1:])
File "/mnt/nfs/projects/ecosystem_biology/local_tools/quast-master/metaquast.py", line 559, in main
ref_fpaths = search_references_meta.do(assemblies, downloaded_dirpath)
File "/mnt/gaiagpfs/projects/ecosystem_biology/local_tools/quast-master/libs/search_references_meta.py", line 283, in do
idy = float(line[2])
IndexError: list index out of range
Is there something wrong with my data?
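The traceback stops at `idy = float(line[2])`, which is what happens when a parsed row has fewer than three fields. A minimal sketch of a defensive parse follows. It assumes tab-separated, BLAST-style tabular rows with percent identity in the third column; `parse_identity` is a hypothetical helper, not QUAST's code.

```python
from typing import Optional


def parse_identity(raw_line: str) -> Optional[float]:
    """Return the third tab-separated field as a float, or None if it is absent.

    Assumption: rows follow a BLAST-style tabular layout in which the third
    column holds percent identity. Short or malformed rows are skipped
    instead of raising IndexError.
    """
    fields = raw_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return None
    try:
        return float(fields[2])
    except ValueError:
        return None


# Hypothetical rows: one well-formed, one truncated, one empty.
rows = [
    "contig_1\tref_A\t98.7\t1500",
    "contig_2\tref_B",
    "",
]
identities = [parse_identity(row) for row in rows]
print(identities)
```

Looking for truncated or empty lines in the intermediate alignment output may show whether the input data or an interrupted earlier step is responsible.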
Hi all,
I am trying to use QUAST with a reference (360 Mbp) and one new assembly (450 Mbp). Can the size be a problem? The reference comes from Illumina reads and the new assembly from PacBio. It has been running on a cluster (--threads 24) for three days without results yet. Judging by your supplementary information, it should not take that long, right?
Thank you,
Hi,
Thanks for the great program!
When I ran metaQUAST with a list of references, the output contained only the summary report, with no per-reference misassembly information or report. How can I get the misassembly (relative to each reference) information?
Any ideas?
Hi,
metaQUAST is a great piece of software, but I receive an error in the new version 3.0:
Command:
python /usr/local/quast/metaquast.py $LABELS $REFERENCES --scaffolds --output-dir $WORK_DIR $CONTIGS
metaquast.log:
.
.
.
Summarizing results...
'adoring_jones broken' is not in list
Traceback (most recent call last):
File "/usr/local/quast/metaquast.py", line 708, in <module>
return_code = main(sys.argv[1:])
File "/usr/local/quast/metaquast.py", line 695, in main
create_meta_summary.do(output_dirpath, summary_dirpath, labels, metrics_for_plots, misassembl_metrics, ref_names)
File "/usr/local/quast/libs/create_meta_summary.py", line 59, in do
results, all_rows, cur_ref_names = get_results_for_metric(ref_names, metric, contigs_num, labels, output_dirpath, qconfig.transposed_report_prefix + '.tsv')
File "/usr/local/quast/libs/create_meta_summary.py", line 41, in get_results_for_metric
index_contig = labels.index(values[0])
ValueError: 'adoring_jones broken' is not in list
Augustus crashed with this error:
* glibc detected * /home/akomissarov/libs/quast/libs/Busco/augustus-3.0.3/bin/augustus: double free or corruption (fasttop): 0x00000000019a9820 *
When I ran it with --debug I did not get the error, or it was silenced.
The position of the contig on the y-axis does not seem to convey any information. The percent identity of the contig to the reference could be mapped to the y-axis to aid in identifying collapsed repeats.
Hi,
I am using metaquast to evaluate my metagenomics data. As I understand, metaquast will search for the top most matching 16S rRNA from SILVA and then try to download the complete genomes from NCBI. However, I noticed many entries in SILVA cannot be found on NCBI, thus limiting the analysis. Have you guys thought of ways to overcome this problem? E.g. One naive way is to look for the next best SILVA 16S rRNA if the current one does not have a matched NCBI entry.
What do you think?
Regards,
K
I have a problem running metaQUAST with a set of reference genomes. In the metaQUAST paper it is reported that:
The acquired list of species names can be fed to MetaQUAST in a plain text format, making it download the specified sequences from the NCBI database and use them for the reference based evaluation
I have a list of species obtained with Metaphlan2 and I would like to use them as references for contig evaluation. I have tried several ways to do that using the -R option, but unfortunately I always received the same error:
Reference(s):
WARNING: Skipping species.txt because it contains non-ACGTN characters.
All references combined in combined_reference.fasta
So, basically, metaQUAST is searching for a set of sequence files and obviously it crashes since it can't find them. My species file is formatted as follows:
genus species 1
genus species 2
genus species 3
...
Is there a way to make metaQUAST download all references reported in this file?
Thanks in advance,
cheers,
Giovanni
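The warning above suggests that a file passed with -R is validated as sequence data, so a plain list of species names is rejected. The sketch below only illustrates that distinction with a simple heuristic. It is an assumption about the kind of check involved, not metaQUAST's actual validation, and the species names are placeholders.

```python
def looks_like_fasta(text: str) -> bool:
    """Heuristic: first non-empty line starts with '>' and sequence lines use only ACGTN."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    allowed = set("ACGTNacgtn")
    return all(set(line) <= allowed for line in lines if not line.startswith(">"))


# Placeholder inputs: a species list like species.txt and a tiny FASTA record.
species_list = "Escherichia coli\nBacteroides fragilis\n"
fasta_record = ">seq1\nACGTNNACGT\n"

print(looks_like_fasta(species_list), looks_like_fasta(fasta_record))
```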
In the # misassemblies count in misassemblies_report.txt, I'd like to know how many of the # relocations occurred in scaffold gaps. In the mis_contigs.info report, it would be helpful to report which Extensive misassembly (relocation) events were found in scaffold gaps, and the size of the scaffold gap size error. I'm curious which misassemblies might have been classified as a scaffold gap size mis. rather than an extensive misassembly had the --scaffold-gap-max-size threshold been higher.
Hi!
I evaluated my assembly with QUAST, but it failed at the nucmer step (the process was killed). According to the log, it seems to have been killed before the step that breaks the scaffolds, although the file "nucmer_output/sf" says "Analysis is finished". Also, the "all_alignment_" file is empty. I would like to run the pipeline step by step to find the cause, but I cannot find information about how some files, such as headless and all_alignment_, are produced. After searching the scripts on GitHub, I am asking for help here.
Any suggestions would be appreciated!
Best wishes!
FYI there are quite a few python3 incompatible statements in quast_libs/site_packages/joblib
quast.py -t64 -s -e -o abyss/k144/hsapiens-scaffolds.quast -R /projects/btl/reference_genomes/H_sapiens/GRCh38/GCA_000001405.15_GRCh38_genomic.chr-only.fa -G /projects/btl/reference_genomes/H_sapiens/GRCh38/Homo_sapiens.GRCh38.86.chr.gff3 -l abyss/k144/hsapiens-scaffolds.fa abyss/k144/hsapiens-scaffolds.fa
/gsc/btl/linuxbrew/Cellar/quast/4.2/quast.py -t64 -s -e -o abyss/k144/hsapiens-scaffolds.quast -R /projects/btl/reference_genomes/H_sapiens/GRCh38/GCA_000001405.15_GRCh38_genomic.chr-only.fa -G /projects/btl/reference_genomes/H_sapiens/GRCh38/Homo_sapiens.GRCh38.86.chr.gff3 -l abyss/k144/hsapiens-scaffolds.fa abyss/k144/hsapiens-scaffolds.fa
…
Contigs:
breaking scaffolds into contigs:
abyss/k144/hsapiens-scaffolds.fa ==> abyss/k144/hsapiens-scaffolds.fa
964783 scaffolds (abyss/k144/hsapiens-scaffolds.fa) were broken into 1069902 contigs (abyss/k144/hsapiens-scaffolds.fa_broken)
[Errno 2] No such file or directory: '/projects/btl/datasets/hsapiens/giab/abyss/k144/hsapiens-scaffolds.quast/quast_corrected_input/abyss/k144/hsapiens-scaffolds.fa.fa'
Github repository of "quast" is very large for an initial clone (more than 500MB).
I am not sure why it is so massive, especially since I doubt it contains ~500MB of code.
If most of this size is due to large binaries or test data, I suggest moving those files out of the repository and downloading them when needed on the first run.
An SCM repository is not a good place for storing large binaries.
"# N's per 100 kbp" is tedious to convert into the percentage of gaps in an assembly, which in some eukaryote assemblies can exceed 10%. As far as I know, biologists often do not understand what "# N's per 100 kbp" means and drop this metric from their results. However, this metric is as important as N50/L50 for interpreting results, including genome annotation results: for example, many gaps lead to an overestimated gene count.
I'm running QUAST 5 4d0761d on four human genome assemblies with 48 threads. It's exceeding 350 GB of memory usage and is being killed off by the cluster scheduler.
slurmstepd: error: Job 448183 exceeded memory limit (353221580 > 344064000), being killed
slurmstepd: error: Exceeded job memory limit
Is this memory usage expected, and do you have any suggestions to reduce the memory usage of QUAST?
It looks like it was nearly done. It completed Contig analyzer and Running NA-NGA calculation. What is the Genome analyzer step doing, and can I disable it? I'm primarily interested in the NGA50 and # misassemblies. The last few lines of the log file are…
Running Genome analyzer...
NOTICE: No file with genomic features were provided. Use the --features option if you want to specify it.
NOTICE: No file with operons were provided. Use the -O option if you want to specify it.
1 abyss2
2 abyss2_broken
3 abyss2.tigmint
4 abyss2.tigmint_broken
5 abyss2.arcs
6 abyss2.arcs_broken
7 abyss2.tigmint.arcs
8 abyss2.tigmint.arcs_broken
The command line is…
/home/sjackman/.linuxbrew/Cellar/quast-lg/5.0-g4d0761d/quast.py -t48 -se --fast --large --scaffold-gap-max-size 10000 -R GRCh38.fa -o abyss2.quast-g10000 abyss2.fa abyss2.tigmint.fa abyss2.arcs.fa abyss2.tigmint.arcs.fa
I'll try reducing the number of threads.
Hello,
when I run python quast.py --test, the following warning occurs.
But metaquast.py --test PASSED without any notes.
Could you help me with that?
Quast.log
Running GAGE...
1 contigs_1...
2 contigs_2...
1 Logging to files gage_contigs_1.stdout and gage_contigs_1.stderr...
2 Logging to files gage_contigs_2.stdout and gage_contigs_2.stderr...
1 sh libs/gage/getCorrectnessStats.sh quast_test_output/quast_corrected_input/reference.fasta
quast_test_output/quast_corrected_input/contigs_1.fasta quast_test_output/gage/tmp
500 > quast_test_output/gage/gage_contigs_1.stdout 2> quast_test_output/gage/gage_contigs_1.stderr
2 sh libs/gage/getCorrectnessStats.sh quast_test_output/quast_corrected_input/reference.fasta
quast_test_output/quast_corrected_input/contigs_2.fasta quast_test_output/gage/tmp
500 > quast_test_output/gage/gage_contigs_2.stdout 2> quast_test_output/gage/gage_contigs_2.stderr
The tool returned non-zero. See quast_test_output/gage/gage_contigs_1.stderr for stderr.
1 Failed.
The tool returned non-zero. See quast_test_output/gage/gage_contigs_2.stderr for stderr.
2 Failed.
WARNING: Error occurred while GAGE was processing assemblies. See GAGE error logs for details: /quast-master/quast_test_output/gage/gage_*.stderr
gage_*.stderr
/quast-master/libs/gage/getCorrectnessStats.sh: 35: /quast-master/libs/gage/getCorrectnessStats.sh: javac: not found
/quast-master/libs/gage/getCorrectnessStats.sh: 36: /quast-master/libs/gage/getCorrectnessStats.sh: javac: not found
/quast-master/libs/gage/getCorrectnessStats.sh: 37: /quast-master/libs/gage/getCorrectnessStats.sh: javac: not found
Error occurred during compilation of java classes (/quast-master/libs/gage/*.java)! Try to compile them manually!
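The stderr above reports `javac: not found`, so the Java compiler is not on PATH for the shell that runs the GAGE script. A small sketch for checking this before rerunning the test; `missing_tools` is a hypothetical helper that relies only on the standard library.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# javac is what the GAGE script failed to find; java is listed for comparison.
not_found = missing_tools(["javac", "java"])
if not_found:
    print("Missing on PATH:", ", ".join(not_found))
else:
    print("javac and java are both on PATH")
```

If `javac` is listed as missing while `java` is present, only a runtime (JRE) is likely installed, and a JDK would be needed for the compilation step.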
Hi,
I was wondering if it was possible to supply multiple gff files when calling metaquast? There is an option for supplying multiple fasta files (with the "-R" flag).
So what about "-G" with one gff corresponding to each genome provided in "-R"?
/usr/lib/python3.5/site-packages/quast-4.5-py3.5.egg/EGG-INFO/scripts/quast.py assembly_metrics/sample_data/BCM-After-Atlas/Contigs/Clec_Bbug02212013.contigs.fa.gz
Version: 4.5
System information:
OS: Linux-3.10.0-327.28.2.el7.x86_64-x86_64-with-centos-7.2.1511-Core (linux_64)
Python version: 3.5.3
CPUs number: 4
Started: 2017-08-15 12:59:09
Logging to /home/huei820504/quast_results/results_2017_08_15_12_59_09/quast.log
NOTICE: Maximum number of threads is set to 1 (use --threads option to set it manually)
CWD: /home/huei820504
Main parameters:
Threads: 1, minimum contig length: 500, ambiguity: one, threshold for extensive misassembly size: 1000
Contigs:
Pre-processing...
assembly_metrics/sample_data/BCM-After-Atlas/Contigs/Clec_Bbug02212013.contigs.fa.gz ==> Clec_Bbug02212013.contigs
2017-08-15 13:00:25
Running Basic statistics processor...
Contig files:
Clec_Bbug02212013.contigs
Calculating N50 and L50...
Clec_Bbug02212013.contigs, N50 = 23541, L50 = 5952, Total length = 513170376, GC % = 34.82, # N's per 100 kbp = 0.00
Drawing Nx plot...
saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/basic_stats/Nx_plot.pdf
Drawing cumulative plot...
saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/basic_stats/cumulative_plot.pdf
Drawing GC content plot...
saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/basic_stats/GC_content_plot.pdf
Drawing Clec_Bbug02212013.contigs GC content plot...
saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/basic_stats/Clec_Bbug02212013.contigs_GC_content_plot.pdf
Done.
NOTICE: Genes are not predicted by default. Use --gene-finding option to enable it.
2017-08-15 13:01:19
Creating large visual summaries...
This may take a while: press Ctrl-C to skip this step..
1 of 2: Creating Icarus viewers...
2 of 2: Creating PDF with all tables and plots...
Done
2017-08-15 13:01:34
RESULTS:
Text versions of total report are saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/report.txt, report.tsv, and report.tex
Text versions of transposed total report are saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/transposed_report.txt, transposed_report.tsv, and transposed_report.tex
HTML version (interactive tables and plots) saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/report.html
PDF version (tables and plots) is saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/report.pdf
Icarus (contig browser) is saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/icarus.html
Log saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/quast.log
Finished: 2017-08-15 13:01:34
Elapsed time: 0:02:25.053717
NOTICEs: 2; WARNINGs: 0; non-fatal ERRORs: 0
Thank you for using QUAST!
Look at the running time of pre-processing:
2017-08-15 12:59:09 to 2017-08-15 13:00:25
That is, it takes 00:01:16 (over half of the total running time, 0:02:25).
That's weird.
I see there is a write operation at
Line 129 in e0e6212
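The timing arithmetic in this report can be reproduced from the log timestamps. The sketch uses only values quoted in the log above.

```python
from datetime import datetime, timedelta

FORMAT = "%Y-%m-%d %H:%M:%S"

started = datetime.strptime("2017-08-15 12:59:09", FORMAT)
preprocessing_done = datetime.strptime("2017-08-15 13:00:25", FORMAT)
total = timedelta(minutes=2, seconds=25.053717)  # "Elapsed time: 0:02:25.053717"

# Time between the start of the run and the first timestamp after pre-processing.
preprocessing = preprocessing_done - started
share = preprocessing.total_seconds() / total.total_seconds()

print(preprocessing, f"{share:.1%}")
```

This gives 76 seconds of pre-processing, roughly 52% of the run, which matches the "over half" estimate.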
Dear authors,
I am obtaining the following error:
ERROR! Failed downloading BLAST! The search for reference genomes cannot be performed. Try to download it manually in /home/imp/lib/quast-release_3.1/libs/blast and restart MetaQUAST.
I therefore downloaded blast from the NCBI website: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.31+-x64-linux.tar.gz
I installed it and copied the binary into the folder given by the error. However, I still obtain the same error after that.
Is there something wrong with the version I am using? Can I know which exact blastn version is required?
-Shaman-
Hello,
I have recently downloaded the new quast version (4.6.1), and I ran it on one of my assemblies; it is indeed much faster. I found some unexpected results in the general stats (report.txt), especially regarding genome fraction, so I also ran the former version (4.5), below is the comparative table.
1/ How can you explain the difference?
2/ How did you make QUAST 4.6.1 run so much faster? Did something change in the algorithm, especially regarding definitions (e.g., of misassemblies) and alignment thresholds?
Thank you very much for your answer.
Best regards,
Coline Jaworski
Assembly | Quast4.5 | Quast4.6.1 |
---|---|---|
contigs (>= 0 bp) | 261 | 261 |
contigs (>= 1000 bp) | 261 | 261 |
contigs (>= 5000 bp) | 261 | 261 |
contigs (>= 10000 bp) | 260 | 260 |
contigs (>= 25000 bp) | 235 | 235 |
contigs (>= 50000 bp) | 150 | 150 |
Total length (>= 0 bp) | 159,478,338 | 159,478,338 |
Total length (>= 1000 bp) | 159,478,338 | 159,478,338 |
Total length (>= 5000 bp) | 159,478,338 | 159,478,338 |
Total length (>= 10000 bp) | 159,473,207 | 159,473,207 |
Total length (>= 25000 bp) | 158,969,480 | 158,969,480 |
Total length (>= 50000 bp) | 155,802,267 | 155,802,267 |
contigs | 261 | 261 |
Largest contig | 9,888,897 | 9,888,897 |
Total length | 159,478,338 | 159,478,338 |
Reference length | 153,681,346 | 153,681,346 |
GC (%) | 39.63 | 39.63 |
Reference GC (%) | 39.85 | 39.85 |
N50 | 3,799,886 | 3,799,886 |
NG50 | 3,799,886 | 3,799,886 |
N75 | 1,355,015 | 1,355,015 |
NG75 | 1,528,338 | 1,528,338 |
L50 | 14 | 14 |
LG50 | 14 | 14 |
L75 | 32 | 32 |
LG75 | 29 | 29 |
misassemblies | 5,500 | 3,698 |
misassembled contigs | 231 | 214 |
Misassembled contigs length | 158,499,367 | 104,931,326 |
local misassemblies | 10,392 | 6,454 |
unaligned mis. contigs | 25 | 42 |
unaligned contigs | 1 + 249 part | 1 + 249 part |
Unaligned length | 10,118,222 | 53,286,055 |
Genome fraction (%) | 93.03 | 65.89 |
Duplication ratio | 1.05 | 1.05 |
N's per 100 kbp | 0.00 | 0.00 |
mismatches per 100 kbp | 329.83 | 342.27 |
indels per 100 kbp | 248.43 | 249.18 |
Largest alignment | 518,864 | 518,864 |
Total aligned length | 149,357,462 | 106,208,216 |
NA50 | 82,893 | 29,721 |
NGA50 | 86,330 | 34,756 |
NA75 | 33,455 | 885 |
NGA75 | 39,352 | 794 |
LA50 | 530 | |
LGA50 | 496 | |
LA75 | 1277 | |
LGA75 | 1157 | |
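Only the alignment-based rows differ between the two versions. A quick check on the table values shows that the aligned length lost in 4.6.1 is almost exactly the unaligned length gained. This may mean the same bases moved from "aligned" to "unaligned" rather than being counted differently elsewhere. The sketch uses only numbers from the table; that interpretation is an assumption.

```python
# Values copied from the comparison table (QUAST 4.5 vs 4.6.1).
total_aligned = {"4.5": 149_357_462, "4.6.1": 106_208_216}
unaligned = {"4.5": 10_118_222, "4.6.1": 53_286_055}
genome_fraction = {"4.5": 93.03, "4.6.1": 65.89}

aligned_loss = total_aligned["4.5"] - total_aligned["4.6.1"]
unaligned_gain = unaligned["4.6.1"] - unaligned["4.5"]
fraction_drop = genome_fraction["4.5"] - genome_fraction["4.6.1"]

print(f"aligned length lost: {aligned_loss:,} bp")
print(f"unaligned length gained: {unaligned_gain:,} bp")
print(f"genome fraction drop: {fraction_drop:.2f} percentage points")
```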
Hi there,
How can I use nucmer from MUMmer4 to align contigs instead of the E-MEM embedded in QUAST?
Since QUAST can use existing alignment files for the evaluation, could I run nucmer standalone and copy the output into "/contigs_reports/nucmer_output/"? Which files are needed to skip the alignment step in QUAST?
Thanks!
Graphical Fragment Assembly (GFA) format describes sequence overlap graphs (assembly graphs). Specification and examples are here: https://github.com/pmelsted/GFA-spec
GFA is supported by ABySS 1.9.0, Bandage and many other tools. So, implementing its support in QUAST (for Icarus' contig alignment viewer) sounds reasonable and useful for the community.
This issue is created for a further discussion and any suggestions about GFA data representation in Icarus.
This enhancement idea was suggested by @sjackman (Shaun Jackman)
Dear developers,
When making a Circos diagram, the genes.txt file that is created contains data for just under half of the reference chromosomes supplied. Predictably, the gene track in the diagram is incomplete. I checked the raw annotation file (gff3), and that one is complete.
Kind regards,
Anne
Dear developers
Quast throws an error and exits after genemark when trying to make graphs (paths anonymised):
`2017-08-22 17:37:23
Creating large visual summaries...
This may take a while: press Ctrl-C to skip this step..
1 of 2: Creating Circos plots...
[Errno 2] No such file or directory: '/.../my_QUAST_outputdir/contigs_reports/all_alignments_myassembly-unitigs.tsv'
Traceback (most recent call last):
File "/data/software/quast-4.5/quast.py", line 302, in
return_code = main(sys.argv[1:])
File "/data/software/quast-4.5/quast.py", line 258, in main
features_containers, cov_fpath, os.path.join(output_dirpath, 'circos'), logger)
File "/data/software/quast-4.5/quast_libs/circos.py", line 565, in do
conf_fpath, circos_legend_fpath = create_conf(ref_fpath, contigs_fpaths, contig_report_fpath_pattern, output_dir, gc_fpath, features_containers, cov_fpath, logger)
File "/data/software/quast-4.5/quast_libs/circos.py", line 442, in create_conf
assemblies, contig_points = parse_alignments(contigs_fpaths, contig_report_fpath_pattern)
File "/data/software/quast-4.5/quast_libs/circos.py", line 186, in parse_alignments
aligned_blocks, misassembled_id_to_structure = parse_nucmer_contig_report(report_fpath)
File "/data/software/quast-4.5/quast_libs/circos.py", line 145, in parse_nucmer_contig_report
with open(report_fpath) as report_file:
IOError: [Errno 2] No such file or directory: '/.../my_QUAST_outputdir/contigs_reports/all_alignments_myassembly-unitigs.tsv'
ERROR! exception caught!
In case you have troubles running QUAST, you can write to [email protected]
Please provide us with quast.log file from the output directory.
`
I had a look for the missing tsv file and in the contig_reports directory there are only these tsv files:
unaligned_report.tsv
transposed_report_misassemblies.tsv
misassemblies_report.tsv
How can the missing file be made?
Kind regards,
Anne
Request from Dmitry Antipov:
At least for MetaSPAdes, we can extract average read support from contig names (i.e. NODE_1_length_1833779_cov_52.589). Add metric in reference-based reports "Ave contig read support". Calculate it based on coverages of contigs that have a large enough alignment to this reference (>90%?)
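A rough sketch of the parsing step for this request. It assumes SPAdes-style names of the form shown above and uses a length-weighted mean. The weighting and the helper name `average_read_support` are assumptions, and the filtering by alignment fraction mentioned in the request is left out.

```python
import re
from typing import Iterable, Optional

# Matches SPAdes-style names such as NODE_1_length_1833779_cov_52.589
NAME_PATTERN = re.compile(r"NODE_\d+_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def average_read_support(contig_names: Iterable[str]) -> Optional[float]:
    """Length-weighted mean of the cov_ values; None if no name matches."""
    total_length = 0
    weighted_sum = 0.0
    for name in contig_names:
        match = NAME_PATTERN.search(name)
        if match is None:
            continue  # not a SPAdes-style name, so no coverage to extract
        length, coverage = int(match.group(1)), float(match.group(2))
        total_length += length
        weighted_sum += length * coverage
    return weighted_sum / total_length if total_length else None


# The first name is from the request; the second is a made-up companion contig.
names = ["NODE_1_length_1833779_cov_52.589", "NODE_2_length_1000_cov_10.0"]
print(average_read_support(names))
```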
QUAST previously was not recommended for large (mammalian) genomes. Is that still the case for QUAST 4.3?
Hi,
I ran QUAST on an E. coli assembly (1 contig). The contig size viewer shows a misassembled contig, and in the contig alignment viewer almost the whole contig is marked as misassembled.
The following is the mummerplot:
Is the contig reported as misassembled because the genome is circular? Is there a way to change the misassembly criteria so that the contig is reported as correct?
Thanks
Dear authors,
I would like to know if there is an upper limit on the number of reference genomes when validating a metagenome. I have 73 genome sequences stored in separate files (multi-FASTA). I passed the list of genomes to the command (comma-separated), but quast warns that there are no similarities between the query and the references. This is not actually the case, because when I reduce the number of references to two, it seems to work.
I issued the command:
SIM_REF=`\ls /mnt/nfs/projects/ecosystem_biology/test_datasets/CelajEtAl/73_species/*.fa | paste -s -d,`
metaquast.py -o /scratch/users/snarayanasamy/IMP_MS_data/quast_simDat -R ${SIM_REF} -t 12 -l IMP,metAmos /scratch/users/snarayanasamy/IMP_MS_data/IMP/simulated_data_output/Assembly/MGMT.assembly.merged.fa /scratch/users/snarayanasamy/IMP_MS_data/metAmosAnalysis/simDat_metAmos/Assemble/out/soapdenovo.31.asm.contig
And obtained the following stderr/stdout:
Partitioning contigs into bins aligned to each reference..
processing IMP
processing metAmos
No contigs were aligned to the reference Bacteroides_finegoldii_DSM_17565, skipping..
No contigs were aligned to the reference Eubacterium_siraeum_DSM_15702, skipping..
No contigs were aligned to the reference Bacteroides_ovatus_ATCC_8483, skipping..
No contigs were aligned to the reference Bacteroides_stercoris_ATCC_43183, skipping..
No contigs were aligned to the reference Alistipes_putredinis_DSM_17216, skipping..
No contigs were aligned to the reference Bacteroides_spDOT_4_3_47FAA, skipping..
No contigs were aligned to the reference Collinsella_aerofaciens_ATCC_25986, skipping..
No contigs were aligned to the reference Bacteroides_fragilis_3_1_12, skipping..
No contigs were aligned to the reference Bacteroides_dorei_DSM_17855, skipping..
Starting quast.py for the contigs aligned to Eubacterium_dolichum_DSM_3991
(logging to /scratch/users/snarayanasamy/IMP_MS_data/quast_simDat/Eubacterium_dolichum_DSM_3991_quast_output/quast.log)
No contigs were aligned to the reference Blautia_hydrogenotrophica_DSM_10507, skipping..
Notice that quast runs for the genome "Eubacterium_dolichum_DSM_3991". This occurs because I ran the analysis previously into the same output directory, so the nucmer files for that particular genome were retained; hence quast is able to access them and perform the analysis.
Is there a workaround or a better way to do this? I am guessing that the list of genomes (absolute paths) is too long for the command. Please let me know if you need more information. I look forward to your response.
Update: I tried up to 27 references, and it works. I am slowly increasing it to see where this problem occurs. Still not sure what the issue is...
Update 2: I iteratively ran quast and it seems that it fails when I provide 40 reference files... It works up to 39.
-Shaman-
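One way to test the guess that the comma-separated list is too long for the command line is to build the same string and compare its size with the operating system limit. The sketch uses placeholder paths. `SC_ARG_MAX` bounds the combined size of arguments and environment on POSIX systems, so this is only a rough upper-bound check.

```python
import os
from typing import List


def references_argument(paths: List[str]) -> str:
    """Join reference paths with commas, the form passed to -R in the command above."""
    return ",".join(paths)


def fits_in_arg_max(argument: str) -> bool:
    """Compare the argument's byte length with the OS limit, when it is exposed."""
    if not hasattr(os, "sysconf") or "SC_ARG_MAX" not in os.sysconf_names:
        return True  # limit unknown on this platform; assume it fits
    limit = os.sysconf("SC_ARG_MAX")
    return limit <= 0 or len(argument.encode()) < limit


# Placeholder paths standing in for the 73 reference files.
paths = [f"/data/references/genome_{i:02d}.fa" for i in range(73)]
argument = references_argument(paths)
print(len(argument), fits_in_arg_max(argument))
```

A string of 73 paths of ordinary length is only a few kilobytes, so if this check passes, the cutoff at 40 references probably has another cause.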
This contig is misassembled.
Warning! This contig is more unaligned than misassembled. Contig length is 18494 and total length of all aligns is 8494
Alignment: 43285440 43292966 | 1 7535 | 7527 7535 | 98.93 | CM000681.2_Homo_sapiens_chromosome_19__GRCh38_reference_primary_assembly 10725403_8495_0_5261205__..._10482123
Alignment: 43232488 43233446 | 17536 18494 | 959 959 | 99.9 | CM000681.2_Homo_sapiens_chromosome_19__GRCh38_reference_primary_assembly 10725403_8495_0_5261205__..._10482123
Unaligned bases: 10000
This contig is composed of 8,494 nucleotides and 10,000 N. The unaligned portion is entirely Ns. I believe the number of Ns should not count toward either the unaligned length or the contig size. In this case, 100% of the non-N portion of the contig is aligned.
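The two ways of counting can be compared directly with the numbers from the report above.

```python
contig_length = 18_494
n_bases = 10_000
aligned_lengths = [7_535, 959]  # contig-side lengths of the two alignments above

aligned = sum(aligned_lengths)
fraction_with_ns = aligned / contig_length
fraction_without_ns = aligned / (contig_length - n_bases)

print(f"{fraction_with_ns:.1%} of all bases aligned")
print(f"{fraction_without_ns:.1%} of non-N bases aligned")
```

With Ns included the contig is under half aligned, which triggers the warning. With Ns excluded the aligned fraction is 100%.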
Have you considered setting up CI (CircleCI or TravisCI) to run quast --test for each PR and commit?
Dear authors,
I am attempting to run quast on some metagenomic assemblies. It seems that quast is able to download some of the references, however it terminates after a certain point, even if I repeat the command.
Logging to /output/Analysis/results/quast/metaquast.log
Contigs:
No references are provided, starting to search for reference genomes in SILVA rRNA database and to download them from NCBI...
2015-09-01 13:28:36
Using existing BLAST alignments for MGMT.assembly.merged...
Trying to use previously downloaded references...
2015-09-01 13:28:36
Trying to download found references from NCBI. Totally 46 organisms to try.
Candidatus_Microthrix_parvicella_RN1 | was downloaded previously (total 1, 45 more to go)
Dechloromonas_sp._SIUL | not found in the NCBI database
Trachelomonas_volvocinopsis_var._spiralis | not found in the NCBI database
ERROR! Cannot established internet connection to download reference genomes! Check internet connection or run MetaQUAST with option "--max-ref-number 0".
I am unable to find any other logs files to further troubleshoot this problem... I would be happy to send over my contig fasta file so that you may be able to test it yourself.
Best,
Shaman
Hi,
After getting several "report.pdf" files with the assembly results of different assembly tools, I want to combine them in one figure for comparison, like the figure shown at "http://bioinf.spbau.ru/quast". What should I do? Any ideas?
Thanks!
Dear,
Kindly assist me in determining what is going wrong. I just unzipped QUAST and ran
./install_full.sh
which returned
Starting QUAST test... (stdout redirected to ./install_log.stdout)
No handlers could be found for logger "quast"
ERROR! QUAST TEST FAILED!
It seems compilation of Mummer fails with (make: *** No targets specified and no makefile found. Stop.)
Install log is attached.
I have loaded the following modules
jdk64
gcc/4.8.2
perl5/5.18.2
python64/3.5.2
Thanks
Anthony
Hello,
I'm getting the following error when trying to run quast w/ even the most simple parameters (quast.log):
`/root/anaconda3/bin/quast.py ./test.fa -o /root/DeepBiome/SRR924736_out_dir/mega_out/combined_reference
Version: 4.6.3
System information:
OS: Linux-4.9.87-linuxkit-aufs-x86_64-with-debian-8.10 (linux_64)
Python version: 3.6.5
CPUs number: 3
Started: 2018-04-14 13:48:14
Logging to /root/DeepBiome/SRR924736_out_dir/mega_out/combined_reference/quast.log
NOTICE: Output directory already exists. Existing Nucmer alignments can be used
NOTICE: Maximum number of threads is set to 1 (use --threads option to set it manually)
CWD: /root/DeepBiome/SRR924736_out_dir/mega_out
Main parameters:
Threads: 1, minimum contig length: 500, ambiguity: one, threshold for extensive misassembly size: 1000
Contigs:
Pre-processing...
./test.fa ==> test
2018-04-14 13:48:15
Running Basic statistics processor...
Contig files:
test
'ascii' codec can't decode byte 0xef in position 14588: ordinal not in range(128)
Traceback (most recent call last):
File "/root/anaconda3/bin/quast.py", line 281, in
return_code = main(sys.argv[1:])
File "/root/anaconda3/bin/quast.py", line 140, in main
output_dirpath)
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/basic_stats.py", line 228, in do
html_saver.save_contigs_lengths(results_dir, contigs_fpaths, corr_lists_of_lengths)
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/html_saver/html_saver.py", line 475, in save_contigs_lengths
append(results_dirpath, json_fpath, 'contigsLengths')
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/html_saver/html_saver.py", line 220, in append
init(html_fpath)
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/html_saver/html_saver.py", line 115, in init
script_texts.append(js_html(aux_f_rel_path))
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/html_saver/html_saver.py", line 95, in js_html
return '<script type="text/javascript">\n' + open(get_real_path(script_rel_path)).read() + '\n</script>\n'
File "/root/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 14588: ordinal not in range(128)
`
Any help would be greatly appreciated.
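The traceback ends in `open(...).read()` without an explicit encoding, so Python falls back to the locale's default, which appears to be ASCII in this environment. The sketch below reproduces the failure on a temporary file and shows that passing `encoding="utf-8"` avoids it. The file content is made up, and this illustrates the mechanism rather than a patch to QUAST.

```python
import os
import tempfile


def read_text(path: str, encoding: str = "utf-8") -> str:
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Write a small JavaScript-like file containing non-ASCII characters as UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".js", delete=False) as tmp:
    tmp.write("var label = 'caf\u00e9 \u2192';\n".encode("utf-8"))
    path = tmp.name

try:
    read_text(path, encoding="ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # same class of error as in the traceback above

utf8_text = read_text(path)
os.remove(path)
print(ascii_ok, utf8_text.strip())
```

Running in a UTF-8 locale (for example, with `LANG` or `LC_ALL` set to a UTF-8 value) is a common workaround when the calling code cannot be changed.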
I got:
"The tool returned non-zero. See ../../../../../../../storage1/akomissarov/pacbio/quast_results/results_2015_07_01_14_10_36/busco_output/busco.err for stderr."
and it should be:
"The tool returned non-zero. See /storage1/akomissarov/pacbio/quast_results/results_2015_07_01_14_10_36/busco_output/busco.err for stderr."
Probably you should check if it is an absolute path or not with
if not os.path.isabs(path):
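A slightly fuller sketch of that check; `absolute_for_message` is a hypothetical helper name, and the paths are placeholders.

```python
import os


def absolute_for_message(path: str, base_dir: str) -> str:
    """Return `path` unchanged if absolute, else resolve it against `base_dir`."""
    if not os.path.isabs(path):
        path = os.path.join(base_dir, path)
    return os.path.normpath(path)  # collapses any "../" segments


base_dir = os.path.abspath(os.path.join("quast_results", "run"))
resolved = absolute_for_message(os.path.join("..", "results", "busco.err"), base_dir)
print(resolved)
```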
Hi,
I have assembled nanopore reads and I expect the identity between my assembly and the reference to be around 92%. The latest dev version of QUAST uses the -asm5 option for minimap2, and I get a genome fraction of around 45% between my assembly and the reference bacterial genome.
Since I expect my assembly to be less than 95% identical to the reference, would it be possible to have the -asm10 option implemented?
Just a heads up that this issue lh3/minimap2#104 caused QUAST to incorrectly report a misassembly, fixed now in minimap2 r677. You'll want to update to minimap2 2.9 when it's released.
Hi, I've successfully run metaquast a few times without a reference, but one data set is giving me this error:
Failed aligning the contigs for all the references. Try to use option --max-ref-number to change maximum number of references (per each assembly) to download.
It looks like 121 reference genomes are retrieved, and I set --max-ref-number to 200. Do you know what could be the issue? Attached is the metaquast log in case that is helpful.
Thank you.
-Jennifer
Hi,
I was wondering if there is a flag to pass a combined reference (or if I can just pass it with the -R flag, the format used by metaQUAST), instead of passing separate references that get combined. As of now, I split apart a combined reference file into separate references, only to get recombined.
Thanks!
Read nucmer output and build a synteny plot:
Consider known tools first, there are plenty of those.
Create both PDF and JS versions.
Thanks @snurk
Physical coverage is the coverage of the reference by the paired-end fragments, counting the reads and the gap between the paired-end reads as covered. This additional track may be useful in identifying reference misassemblies and read coverage gaps that cannot be scaffolded over due to a lack of physical coverage.
While the definition above is rather obvious, it is not clear how to calculate per-base physical coverage in many practical cases. For example, we should decide what to do with orphan reads, reads mapped with an insert size much larger than the average, and so on.
Some discussions on this are here: https://www.biostars.org/p/131268/
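As a starting point for that discussion, here is a minimal sketch of per-base physical coverage using a difference array. It takes one simple stance on the open questions: orphan reads are assumed to be dropped by the caller, and fragments longer than `max_insert_size` are ignored. Both choices, and the function name, are assumptions.

```python
from typing import Iterable, List, Tuple


def physical_coverage(
    fragments: Iterable[Tuple[int, int]],
    reference_length: int,
    max_insert_size: int,
) -> List[int]:
    """Per-base physical coverage from paired-end fragments.

    Each fragment is (start, end), 0-based and end-exclusive, spanning from
    the leftmost base of one mate to the rightmost base of the other, so the
    gap between the mates counts as covered.
    """
    delta = [0] * (reference_length + 1)
    for start, end in fragments:
        if end - start > max_insert_size:
            continue  # implausibly long insert: skip (assumption of this sketch)
        start, end = max(start, 0), min(end, reference_length)
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    coverage, running = [], 0
    for change in delta[:reference_length]:
        running += change
        coverage.append(running)
    return coverage


# Toy input: two ordinary fragments and one with an oversized insert.
example = physical_coverage([(0, 5), (3, 8), (2, 40)], reference_length=10, max_insert_size=10)
print(example)
```

The difference array keeps the cost linear in the number of fragments plus the reference length, which matters for large references.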
This enhancement idea was suggested by @sjackman (Shaun Jackman)
After an initial discussion with @smoe by email, I'd like to start a discussion
here of possibly creating a Debian package for QUAST. As QUAST is becoming more
and more widely used in the genomics community, a Debian package could further
increase adoption by making it simpler to install.
I think it would be possible to create a Debian package from the source
distribution. One point to consider though is that many of the bundled
dependencies in the QUAST package (e.g. MUMmer) are already available in
Debian Med, so it could be the case that only the QUAST scripts would be
required, along with updates to $PATH.
I've opened this issue to see if I, @smoe, @pbelmann and the QUAST team might
find this to be a worthwhile exercise.
Support gff.gz, gtf, gtf.gz, bed, bed.gz.
When the output directory already exists, but is empty, I see this message:
NOTICE: Output directory already exists. Existing Minimap2 alignments can be used
This run of QUAST then later exited with an exit status of 1. The last message it displayed was:
Running Basic statistics processor...
I'm not sure if the two are related, but the NOTICE is misleading in any case, since the Minimap2 alignments don't exist in the empty directory.
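One way to avoid the misleading message would be to print it only when the directory actually contains something. The sketch below is a guess at the kind of check that could be used, not QUAST's actual code: `dir_is_nonempty` and `reuse_notice` are hypothetical names, and checking for any entry at all (rather than for alignment files specifically) is a simplifying assumption.

```python
import os


def dir_is_nonempty(path):
    """Return True only if path is an existing directory with at least one entry."""
    if not os.path.isdir(path):
        return False
    with os.scandir(path) as entries:
        return any(True for _ in entries)


def reuse_notice(output_dir):
    """Return the NOTICE text only when there is something that could be reused.

    Hypothetical helper: an empty or missing directory yields None, so no
    notice would be printed in that case.
    """
    if dir_is_nonempty(output_dir):
        return ("NOTICE: Output directory already exists. "
                "Existing Minimap2 alignments can be used")
    return None
```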
QUAST takes quite a long time to download from SourceForge, approximately 7 minutes. Would you consider moving the releases to GitHub using their release feature? I think this might provide faster access. I ask because creating a Docker container of QUAST requires downloading this file each time, which makes the whole process slow.
Also, would you consider releasing .xz versions too, to further decrease the file size?
Thank you.
Hi,
When I run metaquast without references and with the option --max-ref-number 200 to find references in the SILVA database, it only listed 48 organisms (which it tried to download) after running blastn. Does this mean my contigs only match 48 organisms in the SILVA database? I am trying to find more references for the analysis. Any ideas?
Here is the metaquast log file:
metaquast.log
Thanks!
Hi,
MetaQUAST seems to be terminating on one of the downloaded references. Below is the error I am getting:
'Escherichia_coli_O104_H4_str_2009EL-2071'
Traceback (most recent call last):
File "/usr/bin/metaquast", line 730, in <module>
return_code = main(sys.argv[1:])
File "/usr/bin/metaquast", line 635, in main
num_notifications_tuple=total_num_notifications)
File "/usr/bin/metaquast", line 158, in _start_quast_main
return_code = quast.main(args)
File "/home/imp/lib/quast-release_3.1/quast.py", line 692, in main
ref_fpath, contigs_fpaths, qconfig.prokaryote, os.path.join(output_dirpath, 'contigs_reports'), old_contigs_fpaths)
File "/home/imp/lib/quast-release_3.1/libs/contigs_analyzer.py", line 1525, in do
for i, fname in enumerate(zip(contigs_fpaths, old_contigs_fpaths)))
File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 517, in __call__
self.dispatch(function, args, kwargs)
File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 312, in dispatch
job = ImmediateApply(func, args, kwargs)
File "/usr/lib/pymodules/python2.7/joblib/
Let me know if you need more information. Looking forward to your response.
-Shaman-