Comments (5)
Hi Hans,
- Does Wengan support computer clusters (e.g. SGE), and can it continue running unfinished tasks?
Wengan is designed to run on a single machine. It can continue unfinished tasks because it generates a makefile (*.mk) to control its execution. You can use your cluster scheduler (e.g. SGE) to submit Wengan jobs, but they will be executed on a single machine.
- Do you support any specific assemblers or alignment pipelines in Wengan?
The current version of Wengan supports three short-read assemblers (Minia3, Abyss, and DiscoVarDenovo). The other components of the pipeline were designed specifically for Wengan: intervalmiss error-corrects the short-read contigs, fastmin-sg aligns short and long reads, and liger is the final module, which implements the SSG graph.
- We recommend 50X and 30X coverage for short and long reads, respectively. Increasing the short-read coverage beyond 50X is not very useful and can even produce worse short-read assemblies. Additionally, more short-read coverage increases the computational resources needed to complete the assembly. For long reads, we have done assemblies with 90X coverage, and the results are similar to or better than those obtained with only 30X. Thus, you can increase the long-read coverage if you have the reads.
Best,
Alex
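As a rough illustration of the coverage figures above, depth of coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes a hypothetical 3 Gb genome; the function names are made up for this example and are not part of Wengan.

```python
def estimate_coverage(total_bases: int, genome_size: int) -> float:
    """Return mean depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_for_coverage(target_coverage: float, genome_size: int) -> int:
    """Return the number of sequenced bases needed to reach a target coverage."""
    return int(round(target_coverage * genome_size))


# Illustrative numbers only: a hypothetical genome of 3 Gb.
GENOME_SIZE = 3_000_000_000
short_read_bases = bases_for_coverage(50, GENOME_SIZE)  # 50X short reads
long_read_bases = bases_for_coverage(30, GENOME_SIZE)   # 30X long reads
print(short_read_bases, long_read_bases)
```

Under these assumptions, 50X of short reads corresponds to about 150 Gb of sequence and 30X of long reads to about 90 Gb.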
from wengan.
Hi Alex (@adigenova),
On the topic of coverage, we have ~30X of short reads and ~40X of ONT reads (N50 ~30 kb) for a genome that is similar in size to the human genome.
Is it better to run in M mode or D mode?
I tried the D mode but got this error message (see below). I am not sure what the problem is.
export MALLOC_PER_THREAD=1
/wengan/wengan-v0.2-bin-Linux/bin/DiscovarExp READS="Illumina/203_tursiops_unclass_Clean_R_1.fastq.gz,Illumina/203_tursiops_unclass_Clean_R_2.fastq.gz" OUT_DIR=/tmp/asm_wenganDD NUM_THREADS=32 2> asm_wenganD.Disco_denovo.err > asm_wenganD.Disco_denovo.log
asm_wenganD.mk:4: recipe for target 'asm_wenganD.contigs-disco.fa' failed
make: *** [asm_wenganD.contigs-disco.fa] Error 1
In asm_wenganD.Disco_denovo.log
1: 60 bases , 31 quals
2: 60 bases , 31 quals
See inconsistent base/quality lengths in Illumina/203_tursiops_unclass_Clean_R_1.fastq.gz or Illumina/203_tursiops_unclass_Clean_R_2.fastq.gz
Not sure what this means. We did the standard QC for our short reads.
Would appreciate your advice!
Thanks
Zih-Hua
Hi Zih-Hua,
Is it better to run on M mode or D mode?
Wengan achieves better results with the D mode, but the D mode requires more memory than the other modes. For a 3 Gb genome at 60X short-read coverage, the D mode needs about 600 GB of memory; at lower coverage (~30X), it would require about 300 GB.
Regarding the error message, DiscovarDenovo (Disco for short) is complaining that some short reads in your dataset have base and quality strings of different lengths (probably a corrupt FASTQ file). My recommendation is to give the raw short reads as input to Wengan, because Disco error-corrects the short-read data using sophisticated algorithms that are preferable to simply trimming reads based on single-read qualities. Additionally, reads shorter than 60 bp are not supported by Disco and will also stop its execution. You can check that no read is shorter than 60 bp using fastp, for instance.
Best
Alex
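The two conditions described above (base and quality strings of different lengths, and reads shorter than 60 bp) can be checked before launching the assembly. The following is a minimal sketch, assuming standard four-line FASTQ records; `check_fastq` and the other names are hypothetical helpers written for this example, not part of Wengan or Disco.

```python
import gzip
import sys
from typing import Dict, Iterator, Tuple

MIN_READ_LEN = 60  # threshold taken from the advice above; adjust if needed


def open_maybe_gzip(path: str):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def iter_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, bases, quals) for each four-line FASTQ record."""
    with open_maybe_gzip(path) as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:  # end of file (or a trailing blank line)
                return
            bases = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line, ignored
            quals = handle.readline().rstrip("\r\n")
            yield header, bases, quals


def check_fastq(path: str, min_len: int = MIN_READ_LEN) -> Dict[str, int]:
    """Count records with mismatched base/quality lengths or too-short reads."""
    stats = {"records": 0, "length_mismatch": 0, "too_short": 0}
    for _header, bases, quals in iter_fastq(path):
        stats["records"] += 1
        if len(bases) != len(quals):
            stats["length_mismatch"] += 1
        if len(bases) < min_len:
            stats["too_short"] += 1
    return stats


if __name__ == "__main__":
    # Usage: python check_fastq.py reads_R1.fastq.gz reads_R2.fastq.gz
    for fastq_path in sys.argv[1:]:
        print(fastq_path, check_fastq(fastq_path))
```

A non-zero `length_mismatch` count would be consistent with the "inconsistent base/quality lengths" message in the log, and a non-zero `too_short` count would point to reads below the 60 bp limit.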
Dear Alex,
Thanks for the reply. Just one quick question about trimming the reads. Our reads were generated on a NovaSeq, so there is a poly-G tail on each read. I guess I would still need to trim it before passing the reads to Wengan?
Thanks.
Zih-Hua
Yes, you can trim that tail, but be sure that all the reads are longer than 60bp.
Best,
Alex
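To make the advice above concrete, here is a simplified sketch of trimming a poly-G tail and then dropping reads that fall below 60 bp. It only removes an exact trailing run of G's; dedicated trimmers such as fastp are more tolerant of mismatches within the tail. The function names and the `min_run` default of 10 are assumptions made for this illustration.

```python
from typing import Tuple


def trim_poly_g(bases: str, quals: str, min_run: int = 10) -> Tuple[str, str]:
    """Remove a trailing run of G's (and the matching qualities).

    The read is left unchanged if the trailing run is shorter than min_run.
    """
    stripped = bases.rstrip("G")
    run_len = len(bases) - len(stripped)
    if run_len < min_run:
        return bases, quals
    return stripped, quals[: len(stripped)]


def keep_read(bases: str, min_len: int = 60) -> bool:
    """Return True if the read is long enough to pass on after trimming."""
    return len(bases) >= min_len


# Illustrative read: 80 informative bases followed by a 15-base poly-G tail.
read = "ACGT" * 20 + "G" * 15
qual = "I" * len(read)
trimmed_bases, trimmed_quals = trim_poly_g(read, qual)
print(len(trimmed_bases), len(trimmed_quals), keep_read(trimmed_bases))
```

Trimming bases and qualities together keeps the two strings the same length, which avoids reintroducing the base/quality mismatch reported earlier in the thread.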