Comments (24)
Are the two copies exactly the same? If not, hifiasm will separate them. PS: it could also be a bug, but without looking at the data, it is hard to know what is happening.
from hifiasm.
May I ask which version you are using? Thanks.
from hifiasm.
From the unitig name, I guess you are using v0.7. I strongly recommend reassembling with v0.9. For v0.9, you can adjust the following options:
--purge-cov
--pri-range
And what's the size of p_ctg.gfa and a_ctg.gfa? We have also tested a repetitive maize genome and the result looks fine.
from hifiasm.
Yes, indeed this was done with version 0.7. I will now run the new version with the suggested parameters. Any suggestions for values to use with 40X coverage?
p_ctg.gfa is 1.66 Gb and a_ctg.gfa is 102 Mb.
Thanks!
from hifiasm.
Are these -l0 or -l2 numbers? Probably you can also try the 'hifiasm_high_het' branch (v0.9-r296). This branch shares the same bin files with v0.9, so you can just run the whole hifiasm once and run these two versions on top of the same bin files. But please note that v0.9 and v0.7 use different bin files.
--pri-range: keeps all contigs within the given coverage range in p_ctg.gfa, so this number should be set to around 40, such as 35. You can also check the homo peak reported by hifiasm.
If you want to apply '-l2':
--purge-cov: coverage upper bound of Purge-dups. If the coverage of a contig is higher than this bound, Purge-dups is not applied to it. So it should also be set to the homo peak, i.e. around 40.
-s/-l: increasing these two makes Purge-dups more stringent.
For '-l0' segdup merging, I guess some of them are not real misassemblies. This is because their corresponding regions are mis-partitioned to a_ctg.gfa; '--pri-range' recovers them to p_ctg.gfa. You can also manually check 'rd:i:' on the S-line, which represents the coverage information. If it is around 40, the contig should be in p_ctg.gfa.
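The manual 'rd:i:' check described above can be scripted. The following is a minimal sketch, assuming a tab-separated GFA whose S-lines carry an 'rd:i:' tag as mentioned in this comment. The function names, the 25% tolerance, and the contig names in the example are hypothetical and only serve as illustration.

```python
def contig_coverage(gfa_lines):
    """Yield (contig_name, coverage) for every S-line that has an rd:i: tag."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[1]
        # Optional tags start at the fourth column of an S-line.
        for tag in fields[3:]:
            if tag.startswith("rd:i:"):
                yield name, int(tag[len("rd:i:"):])
                break


def near_homozygous_peak(coverage, peak=40, tolerance=0.25):
    """Return True if coverage lies within tolerance * peak of the peak.

    Both the peak of 40 and the 25% tolerance are assumptions for this sketch.
    """
    return abs(coverage - peak) <= tolerance * peak


# Made-up S-lines and one L-line, for illustration only.
example = [
    "S\tctg_example_1\t*\tLN:i:52000\trd:i:38\n",
    "S\tctg_example_2\t*\tLN:i:31000\trd:i:19\n",
    "L\tctg_example_1\t+\tctg_example_2\t-\t0M\n",
]

candidates = [name for name, cov in contig_coverage(example)
              if near_homozygous_peak(cov)]
print(candidates)
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `contig_coverage(open("a_ctg.gfa"))`.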
from hifiasm.
These are -l0 numbers. Do you think -l1 would help?
I will try the different approaches... Thanks for the support!
from hifiasm.
Maybe not... There is still room for improvement on segdups, we are constantly working on that.
from hifiasm.
Hifiasm is not perfect and may produce misassemblies. Some of the peaks you saw are probably misassemblies. Nonetheless, probably not all peaks correspond to misassemblies. A unitig may represent multiple exact copies of a repeat/segdup. Most such unitigs can't be resolved unless you add longer nanopore reads. These are remaining ambiguities in the assembly but are not really counted as misassemblies. For a homozygous genome, a typical misassembly has doubled coverage AND is enriched with heterozygous "SNPs".
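The rule of thumb in the last sentence (doubled coverage AND enrichment of heterozygous SNPs) could be expressed as a simple filter. This is only a sketch: the function name and all thresholds (coverage tolerance, minimum SNP density) are assumptions chosen for illustration, and the per-contig coverage and SNP counts would have to come from your own read mapping and variant calling.

```python
def looks_like_collapsed_misassembly(coverage, het_snps, length_bp,
                                     homo_peak=40.0, cov_tolerance=0.25,
                                     min_het_per_kb=1.0):
    """Flag a contig that has roughly doubled coverage AND many het SNPs.

    All thresholds are assumptions for this sketch, not hifiasm defaults.
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    expected = 2 * homo_peak
    # Condition 1: coverage is close to twice the homozygous peak.
    doubled = abs(coverage - expected) <= cov_tolerance * expected
    # Condition 2: heterozygous SNP density reaches the chosen minimum.
    het_per_kb = het_snps / (length_bp / 1000)
    return doubled and het_per_kb >= min_het_per_kb


# Doubled coverage with many het SNPs: flagged.
print(looks_like_collapsed_misassembly(82, 150, 100_000))
# Doubled coverage with almost no het SNPs: more likely an exact repeat copy.
print(looks_like_collapsed_misassembly(82, 3, 100_000))
```

Requiring both conditions reflects the point made above that a coverage peak alone may just be a unitig representing several exact repeat copies.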
from hifiasm.
Thank you guys for the support. Now I have merged the p_ctg and a_ctg FASTA files and I got a pretty nice mapping uniformity... Only one contig keeps showing half of the coverage. What could it be?
from hifiasm.
A good try on merging p_ctg and a_ctg. Now I wonder if these a_ctg should really be p_ctg...
Only one contig keeps showing half of the coverage. What could it be?
You said your sample is homozygous, which implies it is not haploid. In that case, there could be a heterozygous deletion on one haplotype. Another possibility is that hifiasm assembles one homozygous region into two, but I think this is unlikely. I don't know.
from hifiasm.
I agree with Heng. Another way is to check the corresponding regions at r_utg.gfa. We can have a look together if you upload the subgraph of r_utg.gfa here.
from hifiasm.
Hi, here are the graphs for the p_ctg.gfa file; the contig with half coverage is highlighted. How should I send you the r_utg.gfa? It is a 1.8 GB file!
Looking at the circular shape of this contig graph, would it be possible that hifiasm assembled the same sequence twice and did not collapse the copies?
from hifiasm.
Indeed! But why?
from hifiasm.
Is it possible to set the assembly to require even higher read accuracy?
from hifiasm.
They are 99.99% similar. I will send you the graphs later.
from hifiasm.
You can send us the r_utg.noseq.gfa first, without sequences. That should be much smaller. You can also give us a few read names on that questionable contig so that we can pinpoint the unitig in the r_utg graph.
Based on the information so far, my current guess is that this is a real heterozygous region. However, hifiasm somehow misassembles the two haplotypes into one contig.
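If only a GFA with sequences is at hand, a sequence-free copy similar in spirit to r_utg.noseq.gfa can be produced by replacing each segment sequence with '*', which the GFA format allows. The sketch below assumes a well-formed tab-separated GFA; the function name is hypothetical, and adding an 'LN:i:' length tag when none is present is a choice made here so the segment length is not lost.

```python
def strip_sequences(gfa_lines):
    """Yield GFA lines with every S-line sequence replaced by '*'."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            # Links, A-lines and everything else pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        tags = fields[3:]
        # Preserve the segment length in an LN:i: tag if it is not there yet.
        if seq != "*" and not any(t.startswith("LN:i:") for t in tags):
            tags.insert(0, f"LN:i:{len(seq)}")
        yield "\t".join(fields[:2] + ["*"] + tags) + "\n"


# Made-up input, for illustration only.
example = [
    "S\tutg_example_1\tACGT\trd:i:40\n",
    "L\tutg_example_1\t+\tutg_example_2\t-\t0M\n",
]
for out_line in strip_sequences(example):
    print(out_line, end="")
```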
from hifiasm.
Here are the links to download the r_utg.noseq.gfa and a list with read names
https://www.dropbox.com/s/aif2ldq3nqfhc27/TestRpuberaCCS.r_utg.noseq.gfa?dl=0
https://www.dropbox.com/s/ztxntvn75c9hwkv/utg003791l-read-names.csv?dl=0
I can also send the bam file only for this particular contig...
from hifiasm.
Any news on this issue?
from hifiasm.
I'm so sorry for the late reply. I just checked your r_utg.gfa, which is a little bit weird. For example, m64078_200405_044943/109644221/ccs and m64078_200427_080658/85131714/ccs are in utg003791l, but they are not in r_utg.gfa. Could you please check where utg003791l comes from? It shouldn't come from p_ctg.gfa, since the prefix of contig names in p_ctg.gfa is 'ptg'.
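A check like the one described here (whether given reads appear in a graph, and in which unitig) can be sketched as follows. It assumes that the GFA contains A-lines with the unitig name in the second column and the read name in the fifth column, which is my understanding of hifiasm's output layout and should be verified against your file. The function name and the example names are hypothetical.

```python
def unitigs_for_reads(gfa_lines, read_names):
    """Map each requested read name to the unitigs whose A-lines list it.

    Assumed A-line layout (tab-separated):
    A, unitig name, unitig position, strand, read name, read start, read end, ...
    """
    wanted = set(read_names)
    found = {name: [] for name in wanted}
    for line in gfa_lines:
        if not line.startswith("A\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip lines that do not match the assumed layout
        unitig, read = fields[1], fields[4]
        if read in wanted:
            found[read].append(unitig)
    return found


# Made-up A-lines, for illustration only.
example = [
    "A\tutg_example_1\t0\t+\tread_a/ccs\t0\t15000\tid:i:1\n",
    "A\tutg_example_1\t9000\t-\tread_b/ccs\t0\t14000\tid:i:2\n",
]
hits = unitigs_for_reads(example, ["read_a/ccs", "read_missing/ccs"])
print(hits)
```

Reads that map to an empty list are absent from the graph, which is the situation reported above for the two reads on utg003791l.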
from hifiasm.
I wonder how you generated these graphs; they look great and are quite helpful. Thanks!
from hifiasm.
https://rrwick.github.io/Bandage/
from hifiasm.
Cool, thanks! @dabitz
from hifiasm.
Hi all,
With hifiasm (p_ctg.gfa) and Bandage, I get this graph.
Is this a correct output? Unlike @dabitz's graph, the lines (the major, largest ones) are all one color. Any help interpreting it would be great.
Best
from hifiasm.
Looks like the graph is not bad? The color of each contig is determined by Bandage.
from hifiasm.