Hello, I am trying to assemble a plant genome using different genome assembly software

Does it run on computer cluster and how to continue running unfinished tasks? about wengan HOT 5 CLOSED

adigenova commented on September 12, 2024

Does it run on computer cluster and how to continue running unfinished tasks?

from wengan.

Comments (5)

adigenova commented on September 12, 2024

Hi Hans,

Does Wengan support computer clusters (e.g. SGE) and continuing unfinished tasks?
Wengan is designed to run on a single machine. It can resume unfinished tasks because it generates a makefile (*.mk) to control its execution. You can use your cluster scheduler (e.g. SGE) to submit Wengan jobs, but each job will be executed on a single machine.
Does Wengan support any particular assemblers or alignment pipelines?
The current version of Wengan supports 3 different short-read assemblers (Minia3, ABySS, and DiscovarDenovo). The other components of the pipeline were designed specifically for Wengan: intervalmiss error-corrects the short-read contigs, fastmin-sg aligns the short and long reads, and liger, the final module, implements the SSG graph.
We recommend 50X coverage for short reads and 30X for long reads. Increasing the short-read coverage beyond 50X is not very useful and may even produce worse short-read assemblies. Additionally, more short-read coverage increases the computational resources needed to complete the assembly. For long reads, we have done assemblies with 90X coverage, and the results are similar to or better than those obtained with only 30X. Thus, you can increase the long-read coverage if you have the reads.

Best,

Alex
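The coverage figures in this reply are simple arithmetic: depth is the total number of sequenced bases divided by the genome size. A minimal sketch of that check follows, assuming you already know the total bases in each read set. The function names and the example numbers are hypothetical and are not part of Wengan; only the 50X and 30X targets come from the reply.

```python
# Recommended depths quoted in the reply (short reads 50X, long reads 30X).
RECOMMENDED_DEPTH = {"short": 50.0, "long": 30.0}


def coverage(total_bases: int, genome_size: int) -> float:
    """Return sequencing depth: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def check_depth(kind: str, total_bases: int, genome_size: int):
    """Return (depth, meets_recommendation) for a 'short' or 'long' read set."""
    depth = coverage(total_bases, genome_size)
    return depth, depth >= RECOMMENDED_DEPTH[kind]


if __name__ == "__main__":
    # Hypothetical example: 3 Gb genome, 90 Gb of short reads, 120 Gb of long reads.
    genome = 3_000_000_000
    for kind, bases in (("short", 90_000_000_000), ("long", 120_000_000_000)):
        depth, ok = check_depth(kind, bases, genome)
        print(f"{kind} reads: {depth:.1f}X, meets recommendation: {ok}")
```

Note that for short reads the reply advises against going much beyond the target, so a depth well above 50X is a reason to consider subsampling, not a bonus.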

zihhuafang commented on September 12, 2024

Hi Alex (@adigenova),

On the topic of coverage, we have ~30X of short reads and ~40X of ONT reads (N50 ~30 kb) for a genome of similar size to the human genome.
Is it better to run on M mode or D mode?

I tried D mode but got the error message below. I am not sure what the problem is.

export MALLOC_PER_THREAD=1
/wengan/wengan-v0.2-bin-Linux/bin/DiscovarExp READS="Illumina/203_tursiops_unclass_Clean_R_1.fastq.gz,Illumina/203_tursiops_unclass_Clean_R_2.fastq.gz" OUT_DIR=/tmp/asm_wenganDD NUM_THREADS=32 2> asm_wenganD.Disco_denovo.err > asm_wenganD.Disco_denovo.log
asm_wenganD.mk:4: recipe for target 'asm_wenganD.contigs-disco.fa' failed
make: *** [asm_wenganD.contigs-disco.fa] Error 1

In asm_wenganD.Disco_denovo.log

1: 60 bases , 31 quals
2: 60 bases , 31 quals
See inconsistent base/quality lengths in Illumina/203_tursiops_unclass_Clean_R_1.fastq.gz or Illumina/203_tursiops_unclass_Clean_R_2.fastq.gz

Not sure what this means. We did the standard QC for our short reads.

Would appreciate your advice!
Thanks
Zih-Hua

adigenova commented on September 12, 2024

Hi Zih-Hua,

Is it better to run on M mode or D mode?
Wengan achieves better results with the D mode, but the D mode requires more memory than the other modes. For a 3 Gb genome at 60X short-read coverage, the D mode needs about 600 GB of memory; at lower coverage (~30X), it would require about 300 GB.

Regarding the error message, DiscovarDenovo (Disco for short) is complaining that some short reads in your dataset have sequence and quality strings of different lengths (probably a corrupt FASTQ file). My recommendation is to give the raw short reads as input to Wengan, because Disco error-corrects the short-read data using sophisticated algorithms that are preferable to simply trimming reads based on per-read qualities. Additionally, reads shorter than 60 bp are not supported by Disco and also stop its execution. You can check that your reads are longer than 60 bp using fastp, for instance.
Best
Alex
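Both failure modes described in this reply (mismatched sequence/quality lengths and reads shorter than 60 bp) can be located before launching an assembly. The sketch below is not part of Wengan or Disco. It assumes standard four-line FASTQ records, and `open_fastq` and `scan_fastq` are hypothetical helper names. The 60 bp default mirrors the limit stated in the reply.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def scan_fastq(handle: TextIO, min_len: int = 60) -> Iterator[Tuple[int, str, str]]:
    """Yield (record_number, header, problem) for each problematic record.

    Assumes four lines per record: header, sequence, '+', qualities.
    """
    record = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, content not needed
        qual = handle.readline().rstrip("\r\n")
        record += 1
        if len(seq) != len(qual):
            yield record, header.strip(), f"{len(seq)} bases, {len(qual)} quals"
        elif len(seq) < min_len:
            yield record, header.strip(), f"read shorter than {min_len} bp ({len(seq)})"


if __name__ == "__main__":
    # Tiny in-memory example: the second record has a truncated quality string.
    demo = io.StringIO(
        "@r1\n" + "A" * 60 + "\n+\n" + "I" * 60 + "\n"
        "@r2\n" + "C" * 60 + "\n+\n" + "I" * 31 + "\n"
    )
    for number, name, problem in scan_fastq(demo):
        print(number, name, problem)
```

To scan a real file, pass `open_fastq(path)` as the handle. A record reported with unequal base and quality counts matches the "60 bases , 31 quals" pattern in the log above and usually points to a truncated or corrupted file rather than a QC setting.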

zihhuafang commented on September 12, 2024

Dear Alex,

Thanks for the reply. Just one quick question about trimming the reads. Our reads were generated on a NovaSeq, so the reads have poly-G tails. I guess I would still need to trim those before passing the reads to Wengan?

Thanks.
Zih-Hua

adigenova commented on September 12, 2024

Yes, you can trim that tail, but be sure that all the reads are longer than 60bp.
Best,

Alex
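The advice above combines two steps: remove the poly-G tail, then make sure the trimmed read still meets the 60 bp limit. A minimal sketch of that logic follows, for illustration only. The function names and the `min_run` threshold of 10 are assumptions, and dedicated trimmers use more tolerant matching than this exact-match version.

```python
from typing import Tuple


def trim_poly_g(seq: str, qual: str, min_run: int = 10) -> Tuple[str, str]:
    """Remove a trailing run of G's if it is at least min_run bases long.

    The quality string is cut to the same length so the record stays consistent.
    """
    stripped = seq.rstrip("G")
    run = len(seq) - len(stripped)
    if run >= min_run:
        return stripped, qual[: len(stripped)]
    return seq, qual


def keep_read(seq: str, min_len: int = 60) -> bool:
    """Return True if the read is still long enough after trimming."""
    return len(seq) >= min_len


if __name__ == "__main__":
    # Hypothetical read: 80 informative bases followed by a 15-base poly-G tail.
    seq = "ACGT" * 20 + "G" * 15
    qual = "I" * len(seq)
    trimmed_seq, trimmed_qual = trim_poly_g(seq, qual)
    print(len(trimmed_seq), len(trimmed_qual), keep_read(trimmed_seq))
```

Trimming sequence and quality together matters here: cutting only one of them would reproduce the inconsistent base/quality lengths that stopped Disco earlier in the thread.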
