ablab / quast

Genome assembly evaluation tool

License: Other

bioinformatics genome-assembly-evaluation visualization contigs

quast's Issues

metaQUAST gene calling

Hi,

I would like to know if metaQUAST calls only complete genes or includes partial genes in the counts of unique genes?

Looking forward to your reply.

Best,
Shaman

Inconsistent requirements for submodules

I found some inconsistencies regarding the required software for submodules. The README says:

For the optional submodules:

Time::HiRes perl module for GeneMark-ES (needed when using --gene-finding --eukaryote)
Java 1.8 or later for GRIDSS (needed for SV detection)
R for GRIDSS (needed for SV detection)

and the manual states:

In addition, QUAST submodules require:

Java JDK (tested with OpenJDK 6) for GAGE
Time::HiRes perl module for GeneMark-ES
Boost (tested with v1.56.0) for E-MEM

Which of them is correct and can you update them to be consistent?

Hi all,
I am trying to use QUAST with a reference (360 Mbp) and one new assembly (450 Mbp). Can the size be a problem? The reference comes from Illumina reads and the new assembly from PacBio. It has been running on a cluster (--threads 24) for three days, without results yet. According to your supplementary information, it should not take that long, right?

Thank you,

Misassemblies report and unaligned report

Hi,

Thanks for the great program!

When I ran metaQUAST with a list of references, the output contained only the summary report, with no misassembly-to-reference information or report. How can I get the misassembly (to reference) information?

Any ideas?

MetaQuast Summary Error

Hi,

metaquast is a great piece of software, but I receive an error in the new version 3.0:

Command:

python /usr/local/quast/metaquast.py $LABELS $REFERENCES --scaffolds --output-dir $WORK_DIR $CONTIGS

metaquast.log:

.
.
.
Summarizing results...

'adoring_jones broken' is not in list
Traceback (most recent call last):
  File "/usr/local/quast/metaquast.py", line 708, in <module>
    return_code = main(sys.argv[1:])
  File "/usr/local/quast/metaquast.py", line 695, in main
    create_meta_summary.do(output_dirpath, summary_dirpath, labels, metrics_for_plots, misassembl_metrics, ref_names)
  File "/usr/local/quast/libs/create_meta_summary.py", line 59, in do
    results, all_rows, cur_ref_names = get_results_for_metric(ref_names, metric, contigs_num, labels, output_dirpath, qconfig.transposed_report_prefix + '.tsv')
  File "/usr/local/quast/libs/create_meta_summary.py", line 41, in get_results_for_metric
    index_contig = labels.index(values[0])
ValueError: 'adoring_jones broken' is not in list
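
The last frame of the traceback calls `labels.index(values[0])`, and Python's `list.index` raises `ValueError` when the value is not in the list. The following is a minimal sketch of that failure mode and of one defensive lookup. The helper `find_label_index` and the sample labels are hypothetical and are not taken from the QUAST source; only the missing name comes from the log above.

```python
def find_label_index(labels, name):
    """Return the position of name in labels, or None if it is absent.

    Hypothetical helper for illustration; it is not part of QUAST.
    """
    try:
        return labels.index(name)
    except ValueError:
        return None


# The sample labels are made up; only the missing name is taken from the log.
labels = ["assembly_1", "assembly_2"]
print(find_label_index(labels, "assembly_2"))            # present in the list
print(find_label_index(labels, "adoring_jones broken"))  # absent, no exception
```

Whether skipping an unknown label or reporting it is the right behaviour depends on how the summary rows are built, which the traceback alone does not show.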

BUSCO's Augustus crashes

Augustus crashed with this error:

* glibc detected * /home/akomissarov/libs/quast/libs/Busco/augustus-3.0.3/bin/augustus: double free or corruption (fasttop): 0x00000000019a9820 *

When I ran it with --debug, the error did not appear, or it was silenced.

Display percent identity of the contig to the reference

The position of the contig on the y-axis does not seem to convey any information. The percent identity of the contig to the reference could be mapped to the y-axis to aid in identifying collapsed repeats.

Metaquast silva and ncbi database matching

Hi,

I am using metaquast to evaluate my metagenomics data. As I understand, metaquast will search for the top most matching 16S rRNA from SILVA and then try to download the complete genomes from NCBI. However, I noticed many entries in SILVA cannot be found on NCBI, thus limiting the analysis. Have you guys thought of ways to overcome this problem? E.g. One naive way is to look for the next best SILVA 16S rRNA if the current one does not have a matched NCBI entry.

What do you think?

Regards,
K

metaQUAST: download reference genomes from a list

I have a problem running metaQUAST with a set of reference genomes. In the metaQUAST paper it is reported that:

The acquired list of species names can be fed to MetaQUAST in a plain text format, making it download the specified sequences from the NCBI database and use them for the reference based evaluation

I have a list of species obtained with MetaPhlAn2 and I would like to use them as references for contig evaluation. I have tried several ways to do that using the -R option, but I always received the same error:

Reference(s):
WARNING: Skipping species.txt because it contains non-ACGTN characters.
All references combined in combined_reference.fasta

So, basically, metaQUAST is searching for a set of sequence files and obviously it crashes since it can't find them. My species file is formatted as follows:

genus species 1
genus species 2
genus species 3
...

Is there a way to make metaQUAST download all references reported in this file?

Thanks in advance,

cheers,

Giovanni

Differentiate gap and non-gap misassemblies

In the # misassemblies count in misassemblies_report.txt, I'd like to know how many of the # relocations occurred in scaffold gaps. In the mis_contigs.info report, it'd be helpful to report which Extensive misassembly (relocation) were found in scaffold gaps, and the size of the scaffold gap size error. I'm curious which misassemblies may have been classified as a scaffold gap size mis. rather than an extensive misassembly had the --scaffold-gap-max-size threshold been higher.

Which script controls the nucmer step?

Hi!

I evaluated my assembly with QUAST, but it failed at the nucmer step (the process was killed). According to the log, it seems to have been killed before the step that breaks the scaffolds, yet the file "nucmer_output/sf" says "Analysis is finished". The "all_alignment_" file is also empty. I want to run the pipeline step by step to find the cause, but I can't find out how some files, such as headless and all_alignment_, are produced. After searching the scripts on GitHub without success, I am asking for help here.

Any suggestions would be appreciated!

Best wishes!

python 3 compat in

FYI there are quite a few python3 incompatible statements in quast_libs/site_packages/joblib

Using slash `/` characters in -l label causes error

quast.py -t64 -s -e -o abyss/k144/hsapiens-scaffolds.quast -R /projects/btl/reference_genomes/H_sapiens/GRCh38/GCA_000001405.15_GRCh38_genomic.chr-only.fa -G /projects/btl/reference_genomes/H_sapiens/GRCh38/Homo_sapiens.GRCh38.86.chr.gff3 -l abyss/k144/hsapiens-scaffolds.fa abyss/k144/hsapiens-scaffolds.fa
/gsc/btl/linuxbrew/Cellar/quast/4.2/quast.py -t64 -s -e -o abyss/k144/hsapiens-scaffolds.quast -R /projects/btl/reference_genomes/H_sapiens/GRCh38/GCA_000001405.15_GRCh38_genomic.chr-only.fa -G /projects/btl/reference_genomes/H_sapiens/GRCh38/Homo_sapiens.GRCh38.86.chr.gff3 -l abyss/k144/hsapiens-scaffolds.fa abyss/k144/hsapiens-scaffolds.fa
…
Contigs:
    breaking scaffolds into contigs:
  abyss/k144/hsapiens-scaffolds.fa ==> abyss/k144/hsapiens-scaffolds.fa
      964783 scaffolds (abyss/k144/hsapiens-scaffolds.fa) were broken into 1069902 contigs (abyss/k144/hsapiens-scaffolds.fa_broken)

[Errno 2] No such file or directory: '/projects/btl/datasets/hsapiens/giab/abyss/k144/hsapiens-scaffolds.quast/quast_corrected_input/abyss/k144/hsapiens-scaffolds.fa.fa'
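
The missing path in the error ends with `quast_corrected_input/abyss/k144/hsapiens-scaffolds.fa.fa`, which is the kind of path produced when a label containing slashes is joined onto a directory as if it were a file name. The sketch below illustrates this under that assumption about the internal behaviour; `sanitize_label` is a hypothetical helper and not part of QUAST.

```python
import os
import re


def sanitize_label(label):
    """Replace path separators in a label so it is safe as a single file name.

    Hypothetical helper for illustration; it is not part of QUAST.
    """
    return re.sub(r"[\\/]+", "_", label)


label = "abyss/k144/hsapiens-scaffolds.fa"
corrected_dir = "quast_corrected_input"

# Joining the raw label points into nested directories that may not exist.
print(os.path.join(corrected_dir, label + ".fa"))
# Joining the sanitized label stays directly inside corrected_dir.
print(os.path.join(corrected_dir, sanitize_label(label) + ".fa"))
```

Until labels are sanitized by the tool itself, passing a label without slashes to -l avoids the nested path.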

Reduce repo size for faster initial clones

Github repository of "quast" is very large for an initial clone (more than 500MB).
I am not sure why it is so massive, especially since I doubt it contains ~500MB of code.
If most of this size is because of large binaries or test data, I suggest to move those binary files out of repository and download them if necessary upon the first run.
A SCM repository is not a good place for storing large binaries.

Show "%of gaps in assembly" in extended table

Converting "# N's per 100 kbp" into the percentage of gaps in the assembly by hand is tedious, and in some eukaryote assemblies gaps can exceed 10%. As far as I know, biologists often do not understand what "# N's per 100 kbp" means and remove this metric from their results. However, it is as important as N50/L50 for interpreting results, including genome annotation results: for example, many gaps lead to an overestimated number of genes.
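
The conversion requested here is a single division: a value of "# N's per 100 kbp" is a count per 100,000 bases, so dividing it by 1,000 gives a percentage. A minimal sketch follows; the function name is made up for illustration.

```python
def ns_per_100kbp_to_percent(ns_per_100kbp):
    """Convert "# N's per 100 kbp" into the percentage of the assembly made of N's.

    N's per 100,000 bases divided by 100,000 is a fraction; times 100 is a
    percentage, so the net effect is a division by 1,000.
    """
    return ns_per_100kbp / 1000


# 10,000 N's per 100 kbp corresponds to an assembly that is 10% gaps.
print(ns_per_100kbp_to_percent(10_000))
```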

Memory usage exceeding 350 GB

I'm running QUAST 5 4d0761d on four human genome assemblies with 48 threads. It's exceeding 350 GB of memory usage and is being killed off by the cluster scheduler.

slurmstepd: error: Job 448183 exceeded memory limit (353221580 > 344064000), being killed
slurmstepd: error: Exceeded job memory limit

Is this memory usage expected, and do you have any suggestions to reduce the memory usage of QUAST?

It looks like it was nearly done. It completed Contig analyzer and Running NA-NGA calculation. What is the Genome analyzer step doing, and can I disable it? I'm primarily interested in the NGA50 and # misassemblies. The last few lines of the log file are…

Running Genome analyzer...
  NOTICE: No file with genomic features were provided. Use the --features option if you want to specify it.

  NOTICE: No file with operons were provided. Use the -O option if you want to specify it.
  1  abyss2
  2  abyss2_broken
  3  abyss2.tigmint
  4  abyss2.tigmint_broken
  5  abyss2.arcs
  6  abyss2.arcs_broken
  7  abyss2.tigmint.arcs
  8  abyss2.tigmint.arcs_broken

The command line is…

/home/sjackman/.linuxbrew/Cellar/quast-lg/5.0-g4d0761d/quast.py -t48 -se --fast --large --scaffold-gap-max-size 10000 -R GRCh38.fa -o abyss2.quast-g10000 abyss2.fa abyss2.tigmint.fa abyss2.arcs.fa abyss2.tigmint.arcs.fa

I'll try reducing the number of threads.

quast.py --test -> Warning occurred

Hello,

When I run python quast.py --test, the following warning occurs.

But metaquast.py --test PASSED without any notes.

Could you help me with that?

Quast.log

Running GAGE...
1 contigs_1...
2 contigs_2...
1 Logging to files gage_contigs_1.stdout and gage_contigs_1.stderr...
2 Logging to files gage_contigs_2.stdout and gage_contigs_2.stderr...
1 sh libs/gage/getCorrectnessStats.sh quast_test_output/quast_corrected_input/reference.fasta quast_test_output/quast_corrected_input/contigs_1.fasta quast_test_output/gage/tmp 500 > quast_test_output/gage/gage_contigs_1.stdout 2> quast_test_output/gage/gage_contigs_1.stderr
2 sh libs/gage/getCorrectnessStats.sh quast_test_output/quast_corrected_input/reference.fasta quast_test_output/quast_corrected_input/contigs_2.fasta quast_test_output/gage/tmp 500 > quast_test_output/gage/gage_contigs_2.stdout 2> quast_test_output/gage/gage_contigs_2.stderr
The tool returned non-zero. See quast_test_output/gage/gage_contigs_1.stderr for stderr.
1 Failed.
The tool returned non-zero. See quast_test_output/gage/gage_contigs_2.stderr for stderr.
2 Failed.
WARNING: Error occurred while GAGE was processing assemblies. See GAGE error logs for details: /quast-master/quast_test_output/gage/gage_*.stderr

gage_*.stderr

/quast-master/libs/gage/getCorrectnessStats.sh: 35: /quast-master/libs/gage/getCorrectnessStats.sh: javac: not found
/quast-master/libs/gage/getCorrectnessStats.sh: 36: /quast-master/libs/gage/getCorrectnessStats.sh: javac: not found
/quast-master/libs/gage/getCorrectnessStats.sh: 37: /quast-master/libs/gage/getCorrectnessStats.sh: javac: not found
Error occurred during compilation of java classes (/quast-master/libs/gage/*.java)! Try to compile them manually!

Metaquast provide multiple gff files

Hi,
I was wondering if it was possible to supply multiple gff files when calling metaquast? There is an option for supplying multiple fasta files (with the "-R" flag).
So what about "-G" with one gff corresponding to each genome provided in "-R"?

Bottleneck at pre-processing

/usr/lib/python3.5/site-packages/quast-4.5-py3.5.egg/EGG-INFO/scripts/quast.py assembly_metrics/sample_data/BCM-After-Atlas/Contigs/Clec_Bbug02212013.contigs.fa.gz

Version: 4.5

System information:
  OS: Linux-3.10.0-327.28.2.el7.x86_64-x86_64-with-centos-7.2.1511-Core (linux_64)
  Python version: 3.5.3
  CPUs number: 4

Started: 2017-08-15 12:59:09

Logging to /home/huei820504/quast_results/results_2017_08_15_12_59_09/quast.log
NOTICE: Maximum number of threads is set to 1 (use --threads option to set it manually)

CWD: /home/huei820504
Main parameters:
  Threads: 1, minimum contig length: 500, ambiguity: one, threshold for extensive misassembly size: 1000

Contigs:
  Pre-processing...
  assembly_metrics/sample_data/BCM-After-Atlas/Contigs/Clec_Bbug02212013.contigs.fa.gz ==> Clec_Bbug02212013.contigs

2017-08-15 13:00:25
Running Basic statistics processor...
  Contig files:
    Clec_Bbug02212013.contigs
  Calculating N50 and L50...
    Clec_Bbug02212013.contigs, N50 = 23541, L50 = 5952, Total length = 513170376, GC % = 34.82, # N's per 100 kbp =  0.00
  Drawing Nx plot...
    saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/basic_stats/Nx_plot.pdf
  Drawing cumulative plot...
    saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/basic_stats/cumulative_plot.pdf
  Drawing GC content plot...
    saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/basic_stats/GC_content_plot.pdf
  Drawing Clec_Bbug02212013.contigs GC content plot...
    saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/basic_stats/Clec_Bbug02212013.contigs_GC_content_plot.pdf
Done.

NOTICE: Genes are not predicted by default. Use --gene-finding option to enable it.

2017-08-15 13:01:19
Creating large visual summaries...
This may take a while: press Ctrl-C to skip this step..
  1 of 2: Creating Icarus viewers...
  2 of 2: Creating PDF with all tables and plots...
Done

2017-08-15 13:01:34
RESULTS:
  Text versions of total report are saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/report.txt, report.tsv, and report.tex
  Text versions of transposed total report are saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/transposed_report.txt, transposed_report.tsv, and transposed_report.tex
  HTML version (interactive tables and plots) saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/report.html
  PDF version (tables and plots) is saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/report.pdf
  Icarus (contig browser) is saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/icarus.html
  Log saved to /home/huei820504/quast_results/results_2017_08_15_12_59_09/quast.log

Finished: 2017-08-15 13:01:34
Elapsed time: 0:02:25.053717
NOTICEs: 2; WARNINGs: 0; non-fatal ERRORs: 0

Thank you for using QUAST!

Look at the running time of pre-processing:
2017-08-15 12:59:09 to 2017-08-15 13:00:25
That is 00:01:16, over half of the total running time (0:02:25).
That seems odd.
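
The share of pre-processing in the total run can be checked directly from the timestamps quoted above. This is only a sketch of the arithmetic; it treats the "Started" timestamp as the beginning of pre-processing, as the text above does.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the log above.
start = datetime.strptime("2017-08-15 12:59:09", FMT)
preprocessing_end = datetime.strptime("2017-08-15 13:00:25", FMT)
total = timedelta(minutes=2, seconds=25.053717)  # "Elapsed time" from the log

preprocessing = preprocessing_end - start
share = preprocessing / total  # dividing two timedeltas gives a float

print(preprocessing)
print(f"{share:.0%}")
```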

I see there is a write operation at line 129 of quast/quast_libs/qutils.py (commit e0e6212):

fastaparser.write_fasta(corrected_fpath, modified_fasta_entries)

Maybe it can be avoided.

Which BLASTN version

Dear authors,

I am obtaining the following error:

ERROR! Failed downloading BLAST! The search for reference genomes cannot be performed. Try to download it manually in /home/imp/lib/quast-release_3.1/libs/blast and restart MetaQUAST.

I therefore downloaded blast from the NCBI website: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.31+-x64-linux.tar.gz

I installed it and copied the binary into the folder given by the error. However, I still obtain the same error after that.

Is there something wrong with the version I am using? Can I know which exact blastn version is required?

-Shaman-

Summary stats different with quast4.5 and quast4.6

Hello,
I have recently downloaded the new quast version (4.6.1), and I ran it on one of my assemblies; it is indeed much faster. I found some unexpected results in the general stats (report.txt), especially regarding genome fraction, so I also ran the former version (4.5), below is the comparative table.

1/ How can you explain the difference?
2/ How did you make QUAST 4.6.1 so much faster? Has something changed in the algorithm, especially regarding definitions (e.g., of misassemblies) and alignment thresholds?

Thank you very much for your answer.
Best regards,
Coline Jaworski

Assembly	Quast4.5	Quast4.6.1
contigs (>= 0 bp)	261	261
contigs (>= 1000 bp)	261	261
contigs (>= 5000 bp)	261	261
contigs (>= 10000 bp)	260	260
contigs (>= 25000 bp)	235	235
contigs (>= 50000 bp)	150	150
Total length (>= 0 bp)	159,478,338	159,478,338
Total length (>= 1000 bp)	159,478,338	159,478,338
Total length (>= 5000 bp)	159,478,338	159,478,338
Total length (>= 10000 bp)	159,473,207	159,473,207
Total length (>= 25000 bp)	158,969,480	158,969,480
Total length (>= 50000 bp)	155,802,267	155,802,267
contigs	261	261
Largest contig	9,888,897	9,888,897
Total length	159,478,338	159,478,338
Reference length	153,681,346	153,681,346
GC (%)	39.63	39.63
Reference GC (%)	39.85	39.85
N50	3,799,886	3,799,886
NG50	3,799,886	3,799,886
N75	1,355,015	1,355,015
NG75	1,528,338	1,528,338
L50	14	14
LG50	14	14
L75	32	32
LG75	29	29
misassemblies	5,500	3,698
misassembled contigs	231	214
Misassembled contigs length	158,499,367	104,931,326
local misassemblies	10,392	6,454
unaligned mis. contigs	25	42
unaligned contigs	1 249 part	1 249 part
Unaligned length	10,118,222	53,286,055
Genome fraction (%)	93.03	65.89
Duplication ratio	1.05	1.05
N's per 100 kbp	0.00	0.00
mismatches per 100 kbp	329.83	342.27
indels per 100 kbp	248.43	249.18
Largest alignment	518,864	518,864
Total aligned length	149,357,462	106,208,216
NA50	82,893	29,721
NGA50	86,330	34,756
NA75	33,455	885
NGA75	39,352	794
LA50	530
LGA50	496
LA75	1277
LGA75	1157

Using mummer4 instead of E-MEM

Hi there,

How can I use nucmer from MUMmer4 to align contigs, rather than the E-MEM embedded in QUAST?
Since QUAST can use existing alignment files for the evaluation, could I run nucmer standalone and copy the output into "/contigs_reports/nucmer_output/"? If so, which files are needed to skip the alignment step in QUAST?

Thanks!

Discuss support of GFA format in Icarus output

Graphical Fragment Assembly (GFA) format describes sequence overlap graphs (assembly graphs). Specification and examples are here: https://github.com/pmelsted/GFA-spec

GFA is supported by ABySS 1.9.0, Bandage and many other tools. So, implementing its support in QUAST (for Icarus' contig alignment viewer) sounds reasonable and useful for the community.

This issue is created for a further discussion and any suggestions about GFA data representation in Icarus.

This enhancement idea was suggested by @sjackman (Shaun Jackman)

circos: incomplete genes.txt file

Dear developers,
When making a Circos diagram, the genes.txt file that is created contains data for just under half of the reference chromosomes supplied. Predictably, the gene track in the diagram is incomplete. I checked the raw annotation file (gff3), and that one is complete.
Kind regards,

Anne

circos plot error

Dear developers

Quast throws an error and exits after genemark when trying to make graphs (paths anonymised):

`2017-08-22 17:37:23
Creating large visual summaries...
This may take a while: press Ctrl-C to skip this step..
1 of 2: Creating Circos plots...

[Errno 2] No such file or directory: '/.../my_QUAST_outputdir/contigs_reports/all_alignments_myassembly-unitigs.tsv'
Traceback (most recent call last):
File "/data/software/quast-4.5/quast.py", line 302, in
return_code = main(sys.argv[1:])
File "/data/software/quast-4.5/quast.py", line 258, in main
features_containers, cov_fpath, os.path.join(output_dirpath, 'circos'), logger)
File "/data/software/quast-4.5/quast_libs/circos.py", line 565, in do
conf_fpath, circos_legend_fpath = create_conf(ref_fpath, contigs_fpaths, contig_report_fpath_pattern, output_dir, gc_fpath, features_containers, cov_fpath, logger)
File "/data/software/quast-4.5/quast_libs/circos.py", line 442, in create_conf
assemblies, contig_points = parse_alignments(contigs_fpaths, contig_report_fpath_pattern)
File "/data/software/quast-4.5/quast_libs/circos.py", line 186, in parse_alignments
aligned_blocks, misassembled_id_to_structure = parse_nucmer_contig_report(report_fpath)
File "/data/software/quast-4.5/quast_libs/circos.py", line 145, in parse_nucmer_contig_report
with open(report_fpath) as report_file:
IOError: [Errno 2] No such file or directory: '/.../my_QUAST_outputdir/contigs_reports/all_alignments_myassembly-unitigs.tsv'
ERROR! exception caught!
In case you have troubles running QUAST, you can write to [email protected]
Please provide us with quast.log file from the output directory.
`
I had a look for the missing tsv file and in the contig_reports directory there are only these tsv files:
unaligned_report.tsv
transposed_report_misassemblies.tsv
misassemblies_report.tsv
How can the missing file be made?

Kind regards,

Anne

MetaQUAST: add average read support of reference (based on contigs)

Request from Dmitry Antipov:

At least for MetaSPAdes, we can extract average read support from contig names (i.e. NODE_1_length_1833779_cov_52.589). Add metric in reference-based reports "Ave contig read support". Calculate it based on coverages of contigs that have a large enough alignment to this reference (>90%?)
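
A minimal sketch of how the coverage could be pulled out of contig names of this form and averaged. Weighting by contig length is an assumption of this sketch (the request above does not specify a weighting), and both function names are hypothetical. Selecting only contigs with a large enough alignment to the reference is left to the caller.

```python
import re

# Matches names such as NODE_1_length_1833779_cov_52.589
NAME_RE = re.compile(r"_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_length_and_coverage(contig_name):
    """Return (length, coverage) parsed from a contig name, or None if it does not match."""
    match = NAME_RE.search(contig_name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def average_read_support(contig_names):
    """Length-weighted mean coverage over the contigs whose names could be parsed.

    The length weighting is an assumption made for this sketch.
    """
    parsed = [p for p in map(parse_length_and_coverage, contig_names) if p is not None]
    total_length = sum(length for length, _ in parsed)
    if total_length == 0:
        return None
    return sum(length * cov for length, cov in parsed) / total_length


print(parse_length_and_coverage("NODE_1_length_1833779_cov_52.589"))
```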

Large genomes

QUAST previously was not recommended for large (mammalian) genomes. Is that still the case for QUAST 4.3?

misassembled contig

Hi,

I ran QUAST on an E. coli assembly (1 contig). The contig size viewer shows the contig as misassembled, and in the contig alignment viewer almost the whole contig is marked as misassembled as well.
The following is the mummerplot:

Is the contig misassembled because the genome is circular? Is there a way to change the criteria for misassembly to get a correct contig?

Thanks

Upper limit to reference sequences

Dear authors,

I would like to know if there is an upper limit to the number of reference genomes when trying to validate a metagenome. I have 73 genome sequences stored in separate files (multi-FASTA). I passed the list of genomes to the command (comma-separated), but QUAST warns that there are no similarities between the query and the references. This is not actually the case, because when I reduce the number of references to two, it seems to work.

I issued the command:

SIM_REF=`\ls /mnt/nfs/projects/ecosystem_biology/test_datasets/CelajEtAl/73_species/*.fa | paste -s -d,`

metaquast.py -o /scratch/users/snarayanasamy/IMP_MS_data/quast_simDat -R ${SIM_REF} -t 12 -l IMP,metAmos /scratch/users/snarayanasamy/IMP_MS_data/IMP/simulated_data_output/Assembly/MGMT.assembly.merged.fa /scratch/users/snarayanasamy/IMP_MS_data/metAmosAnalysis/simDat_metAmos/Assemble/out/soapdenovo.31.asm.contig

And obtained the following stderr/stdout:

Partitioning contigs into bins aligned to each reference..
  processing IMP
  processing metAmos

No contigs were aligned to the reference Bacteroides_finegoldii_DSM_17565, skipping..

No contigs were aligned to the reference Eubacterium_siraeum_DSM_15702, skipping..

No contigs were aligned to the reference Bacteroides_ovatus_ATCC_8483, skipping..

No contigs were aligned to the reference Bacteroides_stercoris_ATCC_43183, skipping..

No contigs were aligned to the reference Alistipes_putredinis_DSM_17216, skipping..

No contigs were aligned to the reference Bacteroides_spDOT_4_3_47FAA, skipping..

No contigs were aligned to the reference Collinsella_aerofaciens_ATCC_25986, skipping..

No contigs were aligned to the reference Bacteroides_fragilis_3_1_12, skipping..

No contigs were aligned to the reference Bacteroides_dorei_DSM_17855, skipping..

Starting quast.py for the contigs aligned to Eubacterium_dolichum_DSM_3991
(logging to /scratch/users/snarayanasamy/IMP_MS_data/quast_simDat/Eubacterium_dolichum_DSM_3991_quast_output/quast.log)

No contigs were aligned to the reference Blautia_hydrogenotrophica_DSM_10507, skipping..

Notice that QUAST runs for the genome "Eubacterium_dolichum_DSM_3991". This occurs because I ran the analysis previously and reused the output directory, so the nucmer files corresponding to that particular genome are retained; hence QUAST is able to access them and perform the analysis.

Is there a workaround or a better way to do this? I am guessing that the list of genomes (absolute paths) is too long for the command. Please let me know if you need more information. I look forward to your response.

Update: I tried up to 27 references, and it works. I am slowly increasing it to see where this problem occurs. Still not sure what the issue is...

Update 2: I iteratively ran quast and it seems that it fails when I provide 40 reference files... It works up to 39.

-Shaman-

"This contig is more unaligned than misassembled" counts Ns

This contig is misassembled.
Warning! This contig is more unaligned than misassembled. Contig length is 18494 and total length of all aligns is 8494
	Alignment: 43285440 43292966 | 1 7535 | 7527 7535 | 98.93 | CM000681.2_Homo_sapiens_chromosome_19__GRCh38_reference_primary_assembly 10725403_8495_0_5261205__..._10482123
	Alignment: 43232488 43233446 | 17536 18494 | 959 959 | 99.9 | CM000681.2_Homo_sapiens_chromosome_19__GRCh38_reference_primary_assembly 10725403_8495_0_5261205__..._10482123
Unaligned bases: 10000

This contig is composed of 8,494 nucleotides and 10,000 N. The unaligned portion is entirely Ns. I believe the number of Ns should not count toward either the unaligned length or the contig size. In this case, 100% of the non-N portion of the contig is aligned.
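
The proposal can be checked against the numbers in this report. The sketch below computes the aligned fraction with the N's removed from the denominator; the function is hypothetical and only illustrates the suggested calculation, not how QUAST computes it.

```python
def aligned_fraction_excluding_ns(contig_length, n_count, aligned_lengths):
    """Fraction of the non-N part of a contig that is covered by alignments.

    Assumes the alignments do not overlap each other and contain no N's.
    """
    non_n_length = contig_length - n_count
    if non_n_length <= 0:
        return None
    return sum(aligned_lengths) / non_n_length


# Values from the report above: 18494 bp contig, 10000 N's,
# two alignments covering 7535 and 959 contig bases.
print(aligned_fraction_excluding_ns(18494, 10000, [7535, 959]))
# For comparison, with the N's counted in the contig length:
print(aligned_fraction_excluding_ns(18494, 0, [7535, 959]))
```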

Continuous integration

Have you considered setting up CI (CircleCI or TravisCI) to run quast --test for each PR and commit?

Contig size viewer enhancements

Highlight misassembled contigs in red/orange.
Show the positions of the breakpoints in the misassembled contigs.
Display the "Contig info" sidebar.
Include a link to the "Alignment view" of the selected contig.
Optionally display the sizes of the aligned blocks rather than the unaligned contig sizes to visualize the NA50.

Quast error about internet connection

Dear authors,

I am attempting to run quast on some metagenomic assemblies. It seems that quast is able to download some of the references, however it terminates after a certain point, even if I repeat the command.

Logging to /output/Analysis/results/quast/metaquast.log

Contigs:

No references are provided, starting to search for reference genomes in SILVA rRNA database and to download them from NCBI...

2015-09-01 13:28:36
  Using existing BLAST alignments for MGMT.assembly.merged... 

Trying to use previously downloaded references...

2015-09-01 13:28:36
Trying to download found references from NCBI. Totally 46 organisms to try.
  Candidatus_Microthrix_parvicella_RN1               | was downloaded previously (total 1, 45 more to go)
  Dechloromonas_sp._SIUL                             | not found in the NCBI database
  Trachelomonas_volvocinopsis_var._spiralis          | not found in the NCBI database

ERROR! Cannot established internet connection to download reference genomes! Check internet connection or run MetaQUAST with option "--max-ref-number 0".

I am unable to find any other logs files to further troubleshoot this problem... I would be happy to send over my contig fasta file so that you may be able to test it yourself.

Best,
Shaman

How to combine the results of different assemblies using QUAST

Hi,
I have several "report.pdf" files with the results for different assembly tools, and I want to combine them in one figure for comparison, like the figure shown at "http://bioinf.spbau.ru/quast". What should I do? Any ideas?

Thanks!

No handlers could be found for logger "quast"

Dear,

Please help me determine what is going wrong. I just unzipped QUAST and ran
./install_full.sh
which returned

Starting QUAST test... (stdout redirected to ./install_log.stdout)
No handlers could be found for logger "quast"
ERROR! QUAST TEST FAILED!

It seems compilation of Mummer fails with (make: *** No targets specified and no makefile found. Stop.)

Install log is attached.

I have loaded the following modules

jdk64
gcc/4.8.2
perl5/5.18.2
python64/3.5.2

Thanks
Anthony

Error running quast

Hello,
I'm getting the following error when trying to run quast w/ even the most simple parameters (quast.log):

`/root/anaconda3/bin/quast.py ./test.fa -o /root/DeepBiome/SRR924736_out_dir/mega_out/combined_reference

Version: 4.6.3

System information:
OS: Linux-4.9.87-linuxkit-aufs-x86_64-with-debian-8.10 (linux_64)
Python version: 3.6.5
CPUs number: 3

Started: 2018-04-14 13:48:14

Logging to /root/DeepBiome/SRR924736_out_dir/mega_out/combined_reference/quast.log
NOTICE: Output directory already exists. Existing Nucmer alignments can be used
NOTICE: Maximum number of threads is set to 1 (use --threads option to set it manually)

CWD: /root/DeepBiome/SRR924736_out_dir/mega_out
Main parameters:
Threads: 1, minimum contig length: 500, ambiguity: one, threshold for extensive misassembly size: 1000

Contigs:
Pre-processing...
./test.fa ==> test

2018-04-14 13:48:15
Running Basic statistics processor...
Contig files:
test

'ascii' codec can't decode byte 0xef in position 14588: ordinal not in range(128)
Traceback (most recent call last):
File "/root/anaconda3/bin/quast.py", line 281, in
return_code = main(sys.argv[1:])
File "/root/anaconda3/bin/quast.py", line 140, in main
output_dirpath)
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/basic_stats.py", line 228, in do
html_saver.save_contigs_lengths(results_dir, contigs_fpaths, corr_lists_of_lengths)
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/html_saver/html_saver.py", line 475, in save_contigs_lengths
append(results_dirpath, json_fpath, 'contigsLengths')
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/html_saver/html_saver.py", line 220, in append
init(html_fpath)
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/html_saver/html_saver.py", line 115, in init
script_texts.append(js_html(aux_f_rel_path))
File "/root/anaconda3/lib/python3.6/site-packages/quast_libs/html_saver/html_saver.py", line 95, in js_html
return '<script type="text/javascript">\n' + open(get_real_path(script_rel_path)).read() + '\n</script>\n'
File "/root/anaconda3/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 14588: ordinal not in range(128)
`
Any help would be greatly appreciated.
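
The traceback shows a bare `open(...).read()` on a script file, so Python decodes it with the locale's default encoding, which is ASCII in this environment. The following is a sketch of reading such a file with an explicit encoding instead. It assumes the bundled file is UTF-8, and `read_text_utf8` is a hypothetical helper, not a QUAST function.

```python
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the locale's default encoding."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Demonstration with a temporary file holding a character whose UTF-8
# encoding starts with byte 0xef (U+FF0C is encoded as EF BC 8C).
with tempfile.NamedTemporaryFile("wb", suffix=".js", delete=False) as tmp:
    tmp.write("// note\uff0c\n".encode("utf-8"))

try:
    text = read_text_utf8(tmp.name)
    print(ascii(text))  # ascii() keeps the output printable on any terminal
finally:
    os.remove(tmp.name)
```

Running under a UTF-8 locale may also avoid the error without code changes, since the default encoding of `open` follows the locale.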

Incorrect path

I got:

"The tool returned non-zero. See ../../../../../../../storage1/akomissarov/pacbio/quast_results/results_2015_07_01_14_10_36/busco_output/busco.err for stderr."

and it should be:

"The tool returned non-zero. See /storage1/akomissarov/pacbio/quast_results/results_2015_07_01_14_10_36/busco_output/busco.err for stderr."

Probably you should check if it is an absolute path or not with

 if not os.path.isabs(path):
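
A short sketch of the check suggested above, using only `os.path` from the standard library. The helper name `to_absolute` and its `base_dir` parameter are hypothetical.

```python
import os


def to_absolute(path, base_dir=None):
    """Return path unchanged if it is absolute, otherwise resolve it against base_dir.

    base_dir defaults to the current working directory.
    """
    if os.path.isabs(path):
        return path
    base_dir = base_dir or os.getcwd()
    return os.path.normpath(os.path.join(base_dir, path))


print(to_absolute("busco_output/busco.err"))
```

`os.path.normpath` collapses the `../` components, so the message would show a short absolute path instead of a long relative one.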

minimap2 "-asm10" option

Hi,
I have assembled nanopore reads and I expect the identity between my assembly and the reference to be around 92%. The latest dev version of QUAST uses the -asm5 option for minimap2, and I get a genome fraction of around 45% between my assembly and the reference bacterial genome.

Since I expect my assembly to be less than 95% identical to the reference, would it be possible to have the -asm10 option implemented?

Misaligned scaffold

Just a heads up that this issue lh3/minimap2#104 caused QUAST to incorrectly report a misassembly, fixed now in minimap2 r677. You'll want to update minimap2 2.9 when it's released.

Metaquast Failed aligning contigs

Hi, I've successfully run metaquast a few times without a reference, but one data set is giving me this error:
Failed aligning the contigs for all the references. Try to use option --max-ref-number to change maximum number of references (per each assembly) to download.

It looks like there are 121 reference genomes that are retrieved, and I set the --max-ref-number as 200. Do you know what could be the issue? Attached in the metaquast log in case that is helpful.

Thank you.

-Jennifer

metaquast.log.txt

Skip creating combined reference for metaQUAST

Hi,

I was wondering if there is a flag to pass a combined reference in the format used by metaQUAST (or whether I can just pass it with the -R flag), instead of passing separate references that get combined. As of now, I split a combined reference file into separate references, only for them to be recombined.

Thanks!

Synteny plot

Read nucmer output and build a synteny plot:

Consider known tools first, there are plenty of those.

Create both PDF and JS versions.

Thanks @snurk

Display physical coverage in Icarus in addition to reads coverage

Physical coverage is the coverage of the reference by the paired-end fragments, counting the reads and the gap between the paired-end reads as covered. This additional track may be useful in identifying reference misassemblies and read coverage gaps that cannot be scaffolded over due to a lack of physical coverage.

While the definition above is rather obvious, it is not clear how to calculate per-base physical coverage in many practical cases. For example, we should decide what to do with orphan reads, reads mapped with an insert size much larger than the average, and so on.
Some discussions on this are here: https://www.biostars.org/p/131268/
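
To make the open questions concrete, here is a minimal sketch of one possible convention. It does not describe how Icarus does or would compute the track. It assumes each properly paired fragment is given as a 0-based, half-open (start, end) interval on a single reference sequence, and that orphan reads and fragments longer than a chosen `max_insert` cutoff are simply skipped. Both choices are assumptions.

```python
def physical_coverage(fragments, ref_length, max_insert=None):
    """Per-base physical coverage from (start, end) fragment intervals.

    Intervals are 0-based and half-open: start is the leftmost base of the
    pair and end is one past the rightmost base. Fragments longer than
    max_insert (if given) are ignored.
    """
    # Difference array: +1 where a fragment starts, -1 where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in fragments:
        if max_insert is not None and end - start > max_insert:
            continue
        # Clip to the reference and drop empty or out-of-range intervals.
        start, end = max(0, start), min(ref_length, end)
        if start >= end:
            continue
        diff[start] += 1
        diff[end] -= 1
    # The running sum of the difference array gives the depth at each base.
    coverage, depth = [], 0
    for delta in diff[:ref_length]:
        depth += delta
        coverage.append(depth)
    return coverage
```

The difference-array form keeps the cost linear in the number of fragments plus the reference length, which matters when the track is computed for whole genomes.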

This enhancement idea was suggested by @sjackman (Shaun Jackman)

Discuss the possibility of creating a QUAST package for DebianMed

After an initial discussion with @smoe by email, I'd like to start a discussion
here of possibly creating a Debian package for QUAST. As QUAST is becoming more
and more widely used in the genomics community, a Debian package could further
increase adoption by making it simpler to install.

I think it would be possible to create a Debian package from the source
distribution. One point to consider though is that many of the bundled
dependencies in the QUAST package (e.g. MUMmer) are already available in
DebianMed, so it could be the case that only the QUAST scripts would be required,
along with updates to $PATH.

I've opened this issue to see if I, @smoe, @pbelmann and the QUAST team might
find this to be a worthwhile exercise.

Gene formats

Support gff.gz, gtf, gtf.gz, bed, bed.gz.
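
As a rough illustration of what accepting these inputs might involve, the sketch below opens an annotation file whether or not it is gzipped and guesses the format from the file name. The helper names are hypothetical and the extension-based detection is an assumption, not QUAST's actual parsing logic.

```python
import gzip
import io
import os

KNOWN_FORMATS = ('gff', 'gtf', 'bed')


def guess_annotation_format(path):
    """Return 'gff', 'gtf' or 'bed' based on the file name, or None."""
    name = os.path.basename(path).lower()
    if name.endswith('.gz'):
        name = name[:-3]  # look at the extension underneath the compression
    ext = os.path.splitext(name)[1].lstrip('.')
    if ext == 'gff3':
        ext = 'gff'
    return ext if ext in KNOWN_FORMATS else None


def open_annotation(path):
    """Open a plain or gzip-compressed annotation file in text mode."""
    if path.lower().endswith('.gz'):
        return gzip.open(path, 'rt', encoding='utf-8')
    return io.open(path, encoding='utf-8')
```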

Output directory already exists

When the output directory already exists, but is empty, I see this message:

NOTICE: Output directory already exists. Existing Minimap2 alignments can be used

This run of QUAST then later exited with an exit status of 1. The last message it displayed was:

Running Basic statistics processor...

I'm not sure if the two are related, but the NOTICE is misleading in any case, since the Minimap2 alignments don't exist in the empty directory.
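
One way to avoid the misleading notice would be to look for alignment files before mentioning them. The sketch below is illustrative only: both helper names are hypothetical, and the '.coords' suffix is a placeholder assumption about how alignment files are named, not something confirmed in this report.

```python
import os


def has_existing_alignments(output_dir, suffix='.coords'):
    """Return True if output_dir contains at least one file ending in suffix.

    The '.coords' default is a placeholder assumption for alignment files.
    """
    if not os.path.isdir(output_dir):
        return False
    for _root, _dirs, files in os.walk(output_dir):
        if any(name.endswith(suffix) for name in files):
            return True
    return False


def existing_dir_notice(output_dir):
    """Build the notice text, mentioning alignments only when some exist."""
    message = 'NOTICE: Output directory already exists.'
    if has_existing_alignments(output_dir):
        message += ' Existing Minimap2 alignments can be used'
    return message
```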

Move downloads from sourceforge to github

QUAST takes quite a long time to download from SourceForge, approximately 7 minutes. Would you consider moving the releases to GitHub using their release feature? I think this might provide faster access. I ask because creating a Docker container of QUAST requires downloading this file each time, which makes the whole process take a long time.

Also would you consider releasing .xz versions too, to further decrease the file size?

Thank you.

metaquast with option --max-ref-number

Hi,

When I run metaquast without references and with the option --max-ref-number 200 to find references in the SILVA database, it only gives 48 organisms (that it tried to download) after running blastn. Does this mean my contigs only match 48 organisms in the SILVA database? I was trying to find more references for the analysis. Any ideas?

Here is the metaquast log file:
metaquast.log

Thanks!

metaQUAST error

Hi,

MetaQUAST seems to be terminating on one of the downloaded references. Below is the error I am getting:

'Escherichia_coli_O104_H4_str_2009EL-2071'
Traceback (most recent call last):
  File "/usr/bin/metaquast", line 730, in <module>
    return_code = main(sys.argv[1:])
  File "/usr/bin/metaquast", line 635, in main
    num_notifications_tuple=total_num_notifications)
  File "/usr/bin/metaquast", line 158, in _start_quast_main
    return_code = quast.main(args)
  File "/home/imp/lib/quast-release_3.1/quast.py", line 692, in main
    ref_fpath, contigs_fpaths, qconfig.prokaryote, os.path.join(output_dirpath, 'contigs_reports'), old_contigs_fpaths)
  File "/home/imp/lib/quast-release_3.1/libs/contigs_analyzer.py", line 1525, in do
    for i, fname in enumerate(zip(contigs_fpaths, old_contigs_fpaths)))
  File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 517, in __call__
    self.dispatch(function, args, kwargs)
  File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 312, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/usr/lib/pymodules/python2.7/joblib/

Let me know if you need more information. Looking forward to your response.

-Shaman-
