

adigenova commented on September 12, 2024

Hi Hans,

  1. Does Wengan support computer clusters (e.g. SGE), and can it resume unfinished tasks?
    Wengan is designed to run on a single machine. It can resume unfinished tasks because Wengan generates a makefile (*.mk) to control its execution. You can use your cluster scheduler (e.g. SGE) to submit Wengan jobs, but each job will be executed on a single machine.
  2. Does Wengan support other assemblers or alignment pipelines?
    The current version of Wengan supports three short-read assemblers (Minia3, ABySS, and DiscoVarDenovo). The other components of the pipeline were designed specifically for Wengan: intervalmiss, which error-corrects short-read contigs; fastmin-sg, which aligns short and long reads; and liger, the final module, which implements the SSG graph.
  3. We recommend 50X and 30X coverage for short and long reads, respectively. Increasing short-read coverage beyond 50X is not very useful and can even produce worse short-read assemblies. Additionally, more short-read coverage increases the computational resources needed to complete the assembly. For long reads, we have done assemblies with 90X coverage, and the results are similar to or better than those using only 30X. Thus, you can increase the long-read coverage if you have the reads.
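To turn these coverage targets into sequencing yields, recall that fold coverage is just total sequenced bases divided by genome size. The following is a minimal Python sketch of that arithmetic; the 3 Gb genome size and 2x150 bp paired-end read length used in the example are illustrative assumptions, not values prescribed by Wengan.

```python
import math

def coverage(total_bases: int, genome_size: int) -> float:
    """Fold coverage (X) = total sequenced bases / genome size."""
    return total_bases / genome_size

def bases_needed(target_x: float, genome_size: int) -> int:
    """Number of bases required to reach target_x coverage."""
    return round(target_x * genome_size)

def read_pairs_needed(target_x: float, genome_size: int, read_len: int) -> int:
    """Paired-end reads: each pair contributes 2 * read_len bases."""
    return math.ceil(bases_needed(target_x, genome_size) / (2 * read_len))

if __name__ == "__main__":
    genome = 3_000_000_000  # assumption: ~3 Gb, human-sized genome
    print("short reads, 50X:", bases_needed(50, genome), "bases,",
          read_pairs_needed(50, genome, 150), "pairs of 2x150 bp")  # assumed read length
    print("long reads, 30X:", bases_needed(30, genome), "bases")
```

For a ~3 Gb genome this gives about 150 Gb of short-read sequence (500 million 2x150 bp pairs) for 50X and about 90 Gb of long-read sequence for 30X.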

Best,

Alex

from wengan.

zihhuafang commented on September 12, 2024

Hi Alex (@adigenova),

On the topic of coverage, we have ~30X of short reads and ~40X of ONT reads (N50 ~30 kb) for a genome similar in size to the human genome.
Is it better to run on M mode or D mode?

I was trying D mode but got the error message below. I'm not sure what the problem is.

export MALLOC_PER_THREAD=1
/wengan/wengan-v0.2-bin-Linux/bin/DiscovarExp READS="Illumina/203_tursiops_unclass_Clean_R_1.fastq.gz,Illumina/203_tursiops_unclass_Clean_R_2.fastq.gz" OUT_DIR=/tmp/asm_wenganDD NUM_THREADS=32 2> asm_wenganD.Disco_denovo.err > asm_wenganD.Disco_denovo.log
asm_wenganD.mk:4: recipe for target 'asm_wenganD.contigs-disco.fa' failed
make: *** [asm_wenganD.contigs-disco.fa] Error 1

In asm_wenganD.Disco_denovo.log

1: 60 bases , 31 quals
2: 60 bases , 31 quals
See inconsistent base/quality lengths in Illumina/203_tursiops_unclass_Clean_R_1.fastq.gz or Illumina/203_tursiops_unclass_Clean_R_2.fastq.gz

I'm not sure what this means. We did the standard QC for our short reads.

Would appreciate your advice!
Thanks
Zih-Hua


adigenova commented on September 12, 2024

Hi Zih-Hua,

Is it better to run on M mode or D mode?
Wengan achieves better results with D mode, but D mode requires more memory than the other modes. For a 3 Gb genome at 60X short-read coverage, D mode needs about 600 GB of memory; at a lower coverage of ~30X, it would require about 300 GB.
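The two figures above suggest roughly 10 GB of memory per X of short-read coverage for a ~3 Gb genome. A rough Python sketch that interpolates between those two quoted points is shown below; it assumes memory scales linearly with short-read coverage, which the thread does not state and which may not hold outside the 30X to 60X range or for other genome sizes.

```python
# The two reference points quoted above for D mode on a ~3 Gb genome:
# short-read coverage (X) -> approximate memory (GB).
REF_POINTS = {30: 300, 60: 600}

def estimate_d_mode_ram_gb(short_read_x: float) -> float:
    """Linearly interpolate/extrapolate memory from the two quoted points.

    ASSUMPTION: memory grows linearly with short-read coverage; this is
    only a back-of-the-envelope estimate, not a Wengan specification.
    """
    (x0, y0), (x1, y1) = sorted(REF_POINTS.items())
    slope = (y1 - y0) / (x1 - x0)  # GB per X of coverage
    return y0 + slope * (short_read_x - x0)

if __name__ == "__main__":
    for x in (30, 45, 60):
        print(f"{x}X -> ~{estimate_d_mode_ram_gb(x):.0f} GB")
```

For the ~30X short-read dataset described earlier in this thread, this gives the ~300 GB figure quoted above.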

Regarding the error message, DiscovarDenovo (Disco for short) is complaining that some short reads in your dataset have base and quality strings of different lengths (probably a corrupt FASTQ file). My recommendation is to give the raw short reads as input to Wengan, because Disco error-corrects the short-read data using sophisticated algorithms that are preferable to simply trimming reads based on per-read qualities. Additionally, reads shorter than 60 bp are not supported by Disco and will also stop its execution. You can check that your reads are longer than 60 bp using fastp, for instance.
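Both problems Disco reports can also be spotted before launching Wengan. Below is a minimal Python sketch (standard library only) that scans a gzipped four-line FASTQ file and reports records whose base and quality lengths differ, or whose reads are shorter than 60 bp. The function names are hypothetical; it assumes plain four-line FASTQ records and is not part of Wengan or Disco.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple

MIN_LEN = 60  # Disco rejects reads shorter than this (per the thread)

class Problem(NamedTuple):
    record: int   # 1-based record index
    read_id: str
    reason: str

def check_fastq(lines: Iterable[str], min_len: int = MIN_LEN) -> Iterator[Problem]:
    """Yield problems found in a 4-line-per-record FASTQ stream."""
    it = iter(lines)
    record = 0
    while True:
        header = next(it, None)
        if header is None:
            return                      # end of input
        if not header.strip():
            continue                    # tolerate stray blank lines
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        record += 1
        read_id = header.rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            yield Problem(record, read_id, "malformed record")
            continue
        if len(seq) != len(qual):       # the "60 bases , 31 quals" case
            yield Problem(record, read_id, f"{len(seq)} bases, {len(qual)} quals")
        if len(seq) < min_len:
            yield Problem(record, read_id, f"read shorter than {min_len} bp")

def check_fastq_gz(path: str, min_len: int = MIN_LEN) -> Iterator[Problem]:
    """Open a .fastq.gz file in text mode and check every record."""
    with gzip.open(path, "rt") as handle:
        yield from check_fastq(handle, min_len)

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        for p in check_fastq_gz(path):
            print(path, p.record, p.read_id, p.reason, sep="\t")
```

Run it as `python check_fastq.py R_1.fastq.gz R_2.fastq.gz` (the script name is arbitrary); no output means no problems were found.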
Best
Alex


zihhuafang commented on September 12, 2024

Dear Alex,

Thanks for the reply. Just one quick question about trimming the reads. Our reads were generated on a NovaSeq, so each read has a poly-G tail. I guess I would still need to trim it before giving the reads to Wengan?

Thanks.
Zih-Hua


adigenova commented on September 12, 2024

Yes, you can trim that tail, but make sure that all reads are still longer than 60 bp afterwards.
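In practice, fastp offers `--trim_poly_g` and `--length_required` options for exactly this trim-then-filter step. To show the logic itself, here is a simplified Python sketch: it strips a trailing run of G's (exact matches only, unlike real trimmers that tolerate mismatches) and then keeps the read only if at least 60 bp remain. The 10-base minimum run length is a hypothetical threshold chosen for illustration.

```python
from typing import Tuple

MIN_LEN = 60     # reads shorter than this stop Disco (per the thread)
POLY_G_MIN = 10  # hypothetical: only trim G-runs at least this long

def trim_poly_g(seq: str, qual: str, min_run: int = POLY_G_MIN) -> Tuple[str, str]:
    """Remove a trailing run of G's if it is at least min_run bases long.

    The quality string is cut to the same length so bases and quals stay
    consistent (the mismatch Disco complained about earlier).
    """
    stripped = seq.rstrip("G")
    run = len(seq) - len(stripped)
    if run < min_run:
        return seq, qual            # short G-runs are left untouched
    return stripped, qual[: len(stripped)]

def keep_read(seq: str, min_len: int = MIN_LEN) -> bool:
    """Keep only reads that are still long enough after trimming."""
    return len(seq) >= min_len

if __name__ == "__main__":
    seq, qual = "ACGT" * 20 + "G" * 20, "I" * 100
    tseq, tqual = trim_poly_g(seq, qual)
    print(len(tseq), len(tqual), keep_read(tseq))
```

Note that for paired-end data a real filter must drop or keep both mates together so the R1 and R2 files stay in sync.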
Best,

Alex
