Comments (24)

lh3 commented on May 27, 2024

Are the two copies exactly the same? If not, hifiasm will separate them. PS: it could also be a bug, but without looking at the data, it is hard to know what is happening.

from hifiasm.

chhylp123 commented on May 27, 2024

May I ask which version are you using? Thanks.

chhylp123 commented on May 27, 2024

From the unitig name, I guess you are using v0.7. I strongly recommend reassembling with v0.9. For v0.9, you can adjust the following options:

--purge-cov
--pri-range

And what's the size of p_ctg.gfa and a_ctg.gfa? We have also tested a repetitive maize genome and the result looks fine.

dabitz commented on May 27, 2024

Yes, indeed this was done with version 0.7. I will now run the new version with the suggested parameters. Any suggested values to use with 40X coverage?

p_ctg.gfa is 1.66 Gb and a_ctg.gfa is 102 Mb.

Thanks!

chhylp123 commented on May 27, 2024

Are these -l0 or -l2 numbers? Probably you can also try the 'hifiasm_high_het' branch (v0.9-r296). This branch shares the same bin files with v0.9, so you can just run the whole hifiasm once and run these two versions on top of the same bin files. But please note that v0.9 and v0.7 use different bin files.

--pri-range: keeps all contigs with a specific coverage in p_ctg.gfa, so this number should be set to around 40, such as 35. You can also check the homo peak reported by hifiasm.

If you want to apply '-l2':

--purge-cov: coverage upper bound of Purge-dups. If the coverage of a contig is higher than this bound, Purge-dups is not applied to it. So it should also be set to the homo peak (~40).
-s/-l: increasing these two makes Purge-dups more stringent.

For '-l0' segdups merging, I guess some of them are not real misassemblies. This is because their corresponding regions are mis-partitioned to a_ctg.gfa. '--pri-range' recovers them to p_ctg.gfa. You can also manually check 'rd:i:' on the S-lines, which represents the coverage information. If it is around 40, the contig should be in p_ctg.gfa.
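The manual 'rd:i:' check can be scripted. Below is a minimal Python sketch, not part of hifiasm: it assumes tab-separated GFA S-lines with the optional tags starting at the fourth column, and the function names, the example segment names and the 25% tolerance around the peak are all made up for illustration.

```python
import io

def segment_coverage(gfa_lines):
    """Yield (segment_name, coverage) for each S-line that carries an rd:i: tag."""
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[1]
        # Optional tags follow the name (column 2) and the sequence (column 3).
        for tag in fields[3:]:
            if tag.startswith("rd:i:"):
                yield name, int(tag[len("rd:i:"):])
                break

def near_peak(gfa_lines, peak, tolerance=0.25):
    """Return segments whose coverage lies within +/- tolerance of the peak.

    The tolerance is a hypothetical choice, not a hifiasm default.
    """
    low, high = peak * (1 - tolerance), peak * (1 + tolerance)
    return [name for name, cov in segment_coverage(gfa_lines) if low <= cov <= high]

# Made-up S-lines standing in for an a_ctg.gfa file.
example = io.StringIO(
    "S\tatg000001l\t*\tLN:i:52000\trd:i:38\n"
    "S\tatg000002l\t*\tLN:i:31000\trd:i:19\n"
)
print(near_peak(example, peak=40))  # ['atg000001l']
```

For a real file, pass an open file handle instead of the StringIO object, e.g. `near_peak(open("a_ctg.gfa"), peak=40)`.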

dabitz commented on May 27, 2024

These are -l0 numbers. Do you think -l1 would help?

I will try the different approaches... Thanks for the support!

chhylp123 commented on May 27, 2024

Maybe not... There is still room for improvement on segdups; we are constantly working on that.

lh3 commented on May 27, 2024

Hifiasm is not perfect and may produce misassemblies. Some of the peaks you saw should be misassemblies. Nonetheless, probably not all peaks correspond to misassemblies. A unitig may represent multiple exact copies of a repeat/segdup. Most such unitigs can't be resolved unless you add longer nanopore reads. These are remaining ambiguities in the assembly but are not really counted as misassemblies. For a homozygous genome, a typical misassembly has doubled coverage AND is enriched with heterozygous "SNPs".
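As a rough illustration of that rule of thumb, here is a small Python sketch. It is not something hifiasm does: the function name, the 1.5x coverage cutoff and the SNP-density threshold are assumptions picked for the example, and real data would need thresholds tuned by hand.

```python
def classify_contig(coverage, homo_peak, het_snps_per_kb, snp_threshold=0.5):
    """Label a contig from a homozygous genome using coverage and SNP density.

    coverage        -- mean read depth over the contig
    homo_peak       -- homozygous coverage peak (about 40 in this thread)
    het_snps_per_kb -- heterozygous "SNP" calls per kb from mapping reads back
    snp_threshold   -- hypothetical cutoff for "enriched with SNPs"
    """
    ratio = coverage / homo_peak
    if ratio >= 1.5:
        # Doubled coverage AND many heterozygous SNPs: copies that differ
        # were probably collapsed into one sequence.
        if het_snps_per_kb >= snp_threshold:
            return "likely misassembly"
        # Doubled coverage without SNPs: may be exact repeat copies,
        # which is ambiguity rather than a misassembly.
        return "possible exact repeat copies"
    return "no doubled coverage"

print(classify_contig(82, 40, 2.1))   # likely misassembly
print(classify_contig(79, 40, 0.0))   # possible exact repeat copies
print(classify_contig(41, 40, 0.0))   # no doubled coverage
```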

dabitz commented on May 27, 2024

Thank you guys for the support. Now I have merged the p_ctg and a_ctg FASTA files and I got a pretty nice mapping uniformity... Only one contig keeps showing half of the coverage. What could it be?

[image: best-assembly-igv_snapshot]

lh3 commented on May 27, 2024

A good try on merging p_ctg and a_ctg. Now I wonder if these a_ctg should really be p_ctg...

Only one contig keeps showing half of the coverage. What could it be?

You said your sample is homozygous, which implies it is not haploid. In that case, there could be a heterozygous deletion on one haplotype. Another possibility is that hifiasm assembles one homozygous region into two, but I think this is unlikely. I don't know.

chhylp123 commented on May 27, 2024

I agree with Heng. Another way is to check the corresponding regions at r_utg.gfa. We can have a look together if you upload the subgraph of r_utg.gfa here.

dabitz commented on May 27, 2024

Hi, here are the graphs for the p_ctg.gfa file; the contig with half coverage is highlighted. How should I send you the r_utg.gfa? It is a 1.8 GB file!

Looking at the circular shape of this contig graph, would it be possible that hifiasm assembled the same sequence twice and did not collapse the copies?

[image: TestrPuberaCCS p_ctg]

dabitz commented on May 27, 2024

Indeed! But why?

[image: utg003791l]

dabitz commented on May 27, 2024

Is it possible to set the assembly to require even higher read accuracy?

dabitz commented on May 27, 2024

They are 99.99% similar. I will send you the graphs later.

lh3 commented on May 27, 2024

You can send us the r_utg.noseq.gfa first, which has no sequences. That should be much smaller. You can also give us a few read names on that questionable contig so that we can pinpoint the unitig in the r_utg graph.

Based on the information so far, my current guess is that this is a real heterozygous region. However, hifiasm somehow misassembles the two haplotypes into one contig.
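For readers who want to prepare the same two things, here is a minimal Python sketch. It assumes the hifiasm GFA convention that each A-line lists the unitig name in column 2 and the read name in column 5, and that dropping a sequence means writing `*` in column 3; the function names and the demo lines are invented for the example. hifiasm already writes the noseq file itself, so the second function is only useful for a GFA that lacks one.

```python
def reads_on_unitig(gfa_lines, unitig):
    """Collect read names from the A-lines that belong to one unitig."""
    names = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed A-line layout: A, unitig, start, strand, read name, ...
        if fields[0] == "A" and len(fields) > 4 and fields[1] == unitig:
            names.append(fields[4])
    return names

def strip_sequences(gfa_lines):
    """Yield GFA lines with every S-line sequence replaced by '*'."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) > 2:
            has_length_tag = any(f.startswith("LN:i:") for f in fields[3:])
            if fields[2] != "*" and not has_length_tag:
                # Keep the segment length, since the sequence is dropped.
                fields.append(f"LN:i:{len(fields[2])}")
            fields[2] = "*"
        yield "\t".join(fields)

# Made-up lines for illustration only.
demo = [
    "S\tutg000001l\tACGTACGT\n",
    "A\tutg000001l\t0\t+\tread_a\t0\t8\tid:i:1\n",
    "A\tutg000002l\t0\t+\tread_b\t0\t8\tid:i:2\n",
]
print(reads_on_unitig(demo, "utg000001l"))  # ['read_a']
print(next(strip_sequences(demo)))
```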

dabitz commented on May 27, 2024

Here are the links to download the r_utg.noseq.gfa and a list with read names

https://www.dropbox.com/s/aif2ldq3nqfhc27/TestRpuberaCCS.r_utg.noseq.gfa?dl=0

https://www.dropbox.com/s/ztxntvn75c9hwkv/utg003791l-read-names.csv?dl=0

I can also send the bam file only for this particular contig...

dabitz commented on May 27, 2024

any news on this issue?

chhylp123 commented on May 27, 2024

I'm so sorry for the late reply. I just checked your r_utg.gfa, which is a little bit weird. For example, m64078_200405_044943/109644221/ccs and m64078_200427_080658/85131714/ccs are on utg003791l, but they are not in r_utg.gfa. Could you please check where utg003791l comes from? It shouldn't come from p_ctg.gfa, since contig names in p_ctg.gfa have the prefix 'ptg'.

zhaotao1987 commented on May 27, 2024

Hi, here are the graphs for the p_ctg.gfa file; the contig with half coverage is highlighted. How should I send you the r_utg.gfa? It is a 1.8 GB file!

Looking at the circular shape of this contig graph, would it be possible that hifiasm assembled the same sequence twice and did not collapse the copies?

[image: TestrPuberaCCS p_ctg]

I wonder how you generated such graphs. They look great and are quite helpful. Thanks!

dabitz commented on May 27, 2024

https://rrwick.github.io/Bandage/

zhaotao1987 commented on May 27, 2024

https://rrwick.github.io/Bandage/

Cool, thanks! @dabitz

B10inform commented on May 27, 2024

Hi all,

With hifiasm (p_ctg.gfa) and Bandage, I get this graph.
[image]

Is this a correct output? Unlike @dabitz's graph, the lines are one color (mainly the largest ones). Any help interpreting it would be great.

Best

chhylp123 commented on May 27, 2024

Hi all,

With hifiasm (p_ctg.gfa) and Bandage, I get this graph.
[image]

Is this a correct output? Unlike @dabitz's graph, the lines are one color (mainly the largest ones). Any help interpreting it would be great.

Best

Looks like the graph is not bad? The color of each contig is determined by Bandage.
