Comments (3)
Hi Minky,
Yes, you are right: liger cannot handle ambiguous nucleotides. Are you running liger independently, that is, without the Wengan pipeline? I ask because Wengan removes ambiguous bases from the short-read contigs at an earlier step, so the only source of ambiguous letters should be the long reads. If that is the case, you can replace the N bases in your long reads using my fork of the seqtk program. Run the following commands:
#compilation
git clone https://github.com/adigenova/seqtk.git
cd seqtk; make
#this command replaces every N character with A; the input can be in FASTA or FASTQ format.
./seqtk iupac2basesA test.fa
After running the above, you will get long-read sequences free of N. Alternatively, you can split the long-read sequences using the "cutN" program of seqtk, but I think the first option is better for you unless you have a lot of N letters.
The program reports the number of bases changed at the end (on stderr). After fixing the long-read file, you can rerun Wengan.
Best,
Alex
from wengan.
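To check whether a long-read file still contains ambiguous bases before rerunning the pipeline, something like the following could be used. This is only a minimal sketch with standard Unix tools, assuming FASTA input; `count_n` is a hypothetical helper name and is not part of seqtk or Wengan.

```shell
# count_n FILE: count N/n characters in the sequence lines of a FASTA file.
# Hypothetical helper for illustration; not part of seqtk or Wengan.
count_n() {
    # Drop header lines (starting with '>'), keep only N/n characters,
    # count them, and print the bare number.
    grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | awk '{print $1}'
}

# Example on a tiny temporary FASTA file containing three ambiguous bases.
example_fa=$(mktemp)
printf '>read1\nACGTNNAC\n>read2\nGGNTTA\n' > "$example_fa"
count_n "$example_fa"
```

A count of 0 after the replacement step would suggest the file is free of N. For FASTQ input the sequence lines would have to be selected differently (every fourth line, starting at line 2), since quality strings can also contain the letter N.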
Hi Alex,
Thanks a lot. The problem was solved after replacing N with A. I didn't run liger independently; I ran "wengan.pl -x pacraw -a A" with an input contig assembly from the Platanus software.
Indeed, the N bases come from the long reads: there are ~4M Ns in ~5 Gb of PacBio data (that seems not too many?). I see that Wengan was designed for the human genome. I'm working with a plant genome with complex repeat regions, like Arabidopsis. The genome size of my plant is ~250 Mb, and the PacBio long reads cover the genome at ~20X. The Wengan pipeline increased the contig N50 from 53 kb to 92 kb, which I think is already a great improvement.
Best,
Minky
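The figures quoted in the comment above can be put in proportion with a quick calculation. This is just a sketch: the inputs are the approximate values from the comment, and `read_stats` is a hypothetical helper name.

```shell
# read_stats N_BASES TOTAL_BASES GENOME_SIZE
# Prints the fraction of N bases in the reads and the long-read coverage.
# Hypothetical helper for illustration only.
read_stats() {
    awk -v n="$1" -v t="$2" -v g="$3" 'BEGIN {
        printf "N fraction: %.2f%%\n", 100 * n / t
        printf "coverage: %.0fX\n", t / g
    }'
}

# Approximate values from the comment: ~4M Ns, ~5 Gb of reads, ~250 Mb genome.
read_stats 4000000 5000000000 250000000
```

With these inputs the N bases are about 0.08% of the long-read data, and 5 Gb over a 250 Mb genome matches the ~20X coverage mentioned.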
Hi Minky,
It seems that you could improve the result a bit more by using WenganD with the raw data (that is, without starting from the Platanus assembly). Since you have 20X long-read coverage, you might set the -N parameter to 3 (-N 3; the default is 5). Additional optimizations are described in #38 and #35.
Best,
Alex