Giter Club home page Giter Club logo

Comments (12)

chhylp123 avatar chhylp123 commented on May 27, 2024 1

@plattsad @conchoecia The new version of hifiasm (v0.14) incorporates a new option '--min-hist-cnt' to ignore noisy counts when analyzing the k-mer spectrum. Hope it will be helpful for this problem.

from hifiasm.

chhylp123 avatar chhylp123 commented on May 27, 2024

Since the het rate is quite high, hifiasm regards the het peak as homozygous peak incorrectly. It doesn't matter too much for assembly, but affects purge_dups a lot. One solution is to set --purge-cov to be homozygous peak (e.g. 45). A decreasing number of -s might be also helpful. However, both of these options probably cannot do enough purging for samples with het rate. If that happens, we recommend you to try Purge_dups (https://github.com/dfguan/purge_dups). In future, I guess we will have a better solution to avoid purge_dups, which has a lot of problems.

from hifiasm.

conchoecia avatar conchoecia commented on May 27, 2024

Thanks for your message, @chhylp123. I gave it another run, this time specifying -k 19. This time hifiasm correctly guessed the het and hom peaks, but again it erroneously output both haplotypes into the primary assembly, and the alternate assembly is much too small.

[M::ha_analyze_count] lowest: count[9] = 1754283
[M::ha_analyze_count] highest: count[19] = 2980425
[M::ha_hist_line]     2: ****************************************************************************************************> 14170747
[M::ha_hist_line]     3: ****************************************************************************************************> 6095522
[M::ha_hist_line]     4: ****************************************************************************************************> 3517697
[M::ha_hist_line]     5: ********************************************************************************** 2441350
[M::ha_hist_line]     6: ********************************************************************* 2062511
[M::ha_hist_line]     7: *************************************************************** 1874500
[M::ha_hist_line]     8: *********************************************************** 1771091
[M::ha_hist_line]     9: *********************************************************** 1754283
[M::ha_hist_line]    10: ************************************************************ 1795541
[M::ha_hist_line]    11: *************************************************************** 1869441
[M::ha_hist_line]    12: ******************************************************************** 2028087
[M::ha_hist_line]    13: ************************************************************************* 2188209
[M::ha_hist_line]    14: ******************************************************************************* 2354761
[M::ha_hist_line]    15: ************************************************************************************* 2523250
[M::ha_hist_line]    16: ****************************************************************************************** 2692454
[M::ha_hist_line]    17: ************************************************************************************************ 2851163
[M::ha_hist_line]    18: ************************************************************************************************** 2933855
[M::ha_hist_line]    19: **************************************************************************************************** 2980425
[M::ha_hist_line]    20: **************************************************************************************************** 2973333
[M::ha_hist_line]    21: *************************************************************************************************** 2952906
[M::ha_hist_line]    22: ************************************************************************************************ 2865676
[M::ha_hist_line]    23: ******************************************************************************************* 2710854
[M::ha_hist_line]    24: ************************************************************************************* 2518491
[M::ha_hist_line]    25: ****************************************************************************** 2313737
[M::ha_hist_line]    26: ************************************************************************ 2138149
[M::ha_hist_line]    27: ***************************************************************** 1938409
[M::ha_hist_line]    28: ************************************************************ 1787849
[M::ha_hist_line]    29: ******************************************************* 1628415
[M::ha_hist_line]    30: *************************************************** 1529645
[M::ha_hist_line]    31: ************************************************* 1452905
[M::ha_hist_line]    32: *********************************************** 1406267
[M::ha_hist_line]    33: ********************************************** 1375248
[M::ha_hist_line]    34: *********************************************** 1386959
[M::ha_hist_line]    35: *********************************************** 1392977
[M::ha_hist_line]    36: ************************************************ 1418896
[M::ha_hist_line]    37: ************************************************ 1443780
[M::ha_hist_line]    38: ************************************************** 1491147
[M::ha_hist_line]    39: *************************************************** 1517179
[M::ha_hist_line]    40: *************************************************** 1524668
[M::ha_hist_line]    41: ************************************************** 1502562
[M::ha_hist_line]    42: ************************************************** 1480236
[M::ha_hist_line]    43: ************************************************ 1429719
[M::ha_hist_line]    44: ********************************************** 1381645
[M::ha_hist_line]    45: ********************************************* 1329876
[M::ha_hist_line]    46: ****************************************** 1261889
[M::ha_hist_line]    47: **************************************** 1186449
[M::ha_hist_line]    48: ************************************* 1110108
[M::ha_hist_line]    49: *********************************** 1037530
[M::ha_hist_line]    50: ******************************** 966132
[M::ha_hist_line]    51: ****************************** 890500
[M::ha_hist_line]    52: **************************** 826020
[M::ha_hist_line]    53: ************************** 762370
[M::ha_hist_line]    54: ************************ 711653
[M::ha_hist_line]    55: ********************** 657454
[M::ha_hist_line]    56: ********************* 614353
[M::ha_hist_line]    57: ******************* 574305
[M::ha_hist_line]    58: ****************** 548358
[M::ha_hist_line]    59: ****************** 522686
[M::ha_hist_line]    60: ***************** 502057
[M::ha_hist_line]    61: **************** 485945
[M::ha_hist_line]    62: **************** 469769
[M::ha_hist_line]    63: *************** 452361
[M::ha_hist_line]    64: *************** 435805
[M::ha_hist_line]    65: ************** 420092
[M::ha_hist_line]    66: ************** 409057
[M::ha_hist_line]    67: ************* 391407
[M::ha_hist_line]    68: ************* 379454
[M::ha_hist_line]    69: ************ 367346
[M::ha_hist_line]    70: ************ 354757
[M::ha_hist_line]    71: *********** 339978
[M::ha_hist_line]    72: *********** 327008
-
-
-
[M::ha_hist_line]   200: * 15256
[M::ha_hist_line]   201: * 15186
[M::ha_hist_line]   202: * 15098
[M::ha_hist_line]  rest: **************************************************************************************** 2613571
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 1524668
[M::ha_ft_gen] peak_hom: 40; peak_het: 19
[M::ha_ft_gen::327.776*[email protected]] ==> filtered out 2659111 k-mers occurring 200 or more times
[M::ha_opt_update_cov] updated max_n_chain to 200
[M::ha_pt_gen::470.491*6.63] ==> counted 11783100 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[7] = 103020
[M::ha_analyze_count] highest: count[19] = 243814
[M::ha_hist_line]     1: ****************************************************************************************************> 3881309
[M::ha_hist_line]     2: ****************************************************************************************************> 731724
[M::ha_hist_line]     3: ****************************************************************************************************> 285736
[M::ha_hist_line]     4: ******************************************************************* 164048
[M::ha_hist_line]     5: ************************************************* 118919
[M::ha_hist_line]     6: ******************************************** 106851
[M::ha_hist_line]     7: ****************************************** 103020
[M::ha_hist_line]     8: ******************************************* 104750
[M::ha_hist_line]     9: ********************************************** 112275
[M::ha_hist_line]    10: ************************************************** 121525
[M::ha_hist_line]    11: ******************************************************* 135217
[M::ha_hist_line]    12: *************************************************************** 152592
[M::ha_hist_line]    13: ********************************************************************** 170299
[M::ha_hist_line]    14: ***************************************************************************** 187520
[M::ha_hist_line]    15: ************************************************************************************* 206166
[M::ha_hist_line]    16: ****************************************************************************************** 220437
[M::ha_hist_line]    17: ************************************************************************************************* 235482
[M::ha_hist_line]    18: *************************************************************************************************** 240448
[M::ha_hist_line]    19: **************************************************************************************************** 243814
[M::ha_hist_line]    20: **************************************************************************************************** 243184
[M::ha_hist_line]    21: ************************************************************************************************* 236363
[M::ha_hist_line]    22: ********************************************************************************************* 226290
[M::ha_hist_line]    23: ************************************************************************************** 208942
[M::ha_hist_line]    24: ****************************************************************************** 188999
[M::ha_hist_line]    25: ********************************************************************** 170183
[M::ha_hist_line]    26: ************************************************************** 151626
[M::ha_hist_line]    27: ******************************************************* 133053
[M::ha_hist_line]    28: ************************************************* 118693
[M::ha_hist_line]    29: ****************************************** 103494
[M::ha_hist_line]    30: *************************************** 93931
[M::ha_hist_line]    31: *********************************** 85921
[M::ha_hist_line]    32: ********************************* 80779
[M::ha_hist_line]    33: ******************************* 76282
[M::ha_hist_line]    34: ******************************* 76613
[M::ha_hist_line]    35: ******************************* 74620
[M::ha_hist_line]    36: ******************************* 75318
[M::ha_hist_line]    37: ******************************* 75739
[M::ha_hist_line]    38: ******************************** 77902
[M::ha_hist_line]    39: ******************************** 77883
[M::ha_hist_line]    40: ******************************** 78776
[M::ha_hist_line]    41: ******************************* 76017
[M::ha_hist_line]    42: ******************************* 75003
[M::ha_hist_line]    43: ***************************** 71694
[M::ha_hist_line]    44: **************************** 68535
[M::ha_hist_line]    45: *************************** 66454
[M::ha_hist_line]    46: ************************** 62425
[M::ha_hist_line]    47: ************************ 58036
[M::ha_hist_line]    48: ********************** 54488
[M::ha_hist_line]    49: ********************* 50170
[M::ha_hist_line]    50: ******************* 46680
[M::ha_hist_line]    51: ****************** 42821
[M::ha_hist_line]    52: **************** 39110
[M::ha_hist_line]    53: *************** 36031
[M::ha_hist_line]    54: ************** 33860
[M::ha_hist_line]    55: ************* 30924
[M::ha_hist_line]    56: ************ 29141
[M::ha_hist_line]    57: *********** 27028
[M::ha_hist_line]    58: *********** 25620
[M::ha_hist_line]    59: ********** 24737
[M::ha_hist_line]    60: ********** 23265
[M::ha_hist_line]    61: ********* 22523
[M::ha_hist_line]    62: ********* 21687
[M::ha_hist_line]    63: ******** 20549
[M::ha_hist_line]    64: ******** 19945
[M::ha_hist_line]    65: ******** 19109
[M::ha_hist_line]    66: ******** 18588
[M::ha_hist_line]    67: ******* 17721
[M::ha_hist_line]    68: ******* 17247
[M::ha_hist_line]    69: ******* 16661
[M::ha_hist_line]    70: ****** 15704
-
-
-
[M::ha_hist_line]   164: * 1335
[M::ha_hist_line]   165: * 1259
[M::ha_hist_line]  rest: ************* 31330
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 78776
[M::ha_pt_gen] peak_hom: 40; peak_het: 19
[M::ha_pt_gen::521.345*7.64] ==> indexed 216061193 positions
[M::ha_assemble::2779.950*[email protected]] ==> corrected reads for round 1
[M::ha_assemble] # bases: 8054326319; # corrected bases: 14082381; # recorrected bases: 16537
[M::ha_assemble] size of buffer: 15.063GB
[M::ha_pt_gen::2825.302*21.59] ==> counted 9874090 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[7] = 100493
[M::ha_analyze_count] highest: count[19] = 242703
[M::ha_hist_line]     1: ****************************************************************************************************> 2237612
[M::ha_hist_line]     2: ****************************************************************************************************> 543579
[M::ha_hist_line]     3: *************************************************************************************************** 239651
[M::ha_hist_line]     4: ************************************************************* 148520
[M::ha_hist_line]     5: ********************************************** 111364
[M::ha_hist_line]     6: ******************************************* 103166
[M::ha_hist_line]     7: ***************************************** 100493
[M::ha_hist_line]     8: ****************************************** 102670
[M::ha_hist_line]     9: ********************************************* 110027
[M::ha_hist_line]    10: ************************************************* 118852
[M::ha_hist_line]    11: ******************************************************* 132281
[M::ha_hist_line]    12: ************************************************************** 149774
[M::ha_hist_line]    13: ********************************************************************* 167554
[M::ha_hist_line]    14: **************************************************************************** 184708
[M::ha_hist_line]    15: ************************************************************************************ 203351
[M::ha_hist_line]    16: ****************************************************************************************** 217242
[M::ha_hist_line]    17: ************************************************************************************************ 233245
[M::ha_hist_line]    18: *************************************************************************************************** 240099
[M::ha_hist_line]    19: **************************************************************************************************** 242703
[M::ha_hist_line]    20: **************************************************************************************************** 241927
[M::ha_hist_line]    21: ************************************************************************************************** 236969
[M::ha_hist_line]    22: ********************************************************************************************** 227311
[M::ha_hist_line]    23: *************************************************************************************** 211026
[M::ha_hist_line]    24: ******************************************************************************* 190978
[M::ha_hist_line]    25: *********************************************************************** 171183
[M::ha_hist_line]    26: *************************************************************** 153630
[M::ha_hist_line]    27: ******************************************************** 135095
[M::ha_hist_line]    28: ************************************************** 120339
[M::ha_hist_line]    29: ******************************************* 103924
[M::ha_hist_line]    30: *************************************** 94874
[M::ha_hist_line]    31: *********************************** 86055
[M::ha_hist_line]    32: ********************************* 80923
[M::ha_hist_line]    33: ******************************* 75959
[M::ha_hist_line]    34: ******************************* 76099
[M::ha_hist_line]    35: ******************************* 74110
[M::ha_hist_line]    36: ******************************* 75386
[M::ha_hist_line]    37: ******************************* 74682
[M::ha_hist_line]    38: ******************************** 77166
[M::ha_hist_line]    39: ******************************** 77475
[M::ha_hist_line]    40: ********************************* 79001
[M::ha_hist_line]    41: ******************************* 76020
[M::ha_hist_line]    42: ******************************* 75217
[M::ha_hist_line]    43: ***************************** 71586
[M::ha_hist_line]    44: ***************************** 69413
[M::ha_hist_line]    45: **************************** 66756
[M::ha_hist_line]    46: ************************** 63191
[M::ha_hist_line]    47: ************************ 58743
[M::ha_hist_line]    48: *********************** 55832
[M::ha_hist_line]    49: ********************* 50727
[M::ha_hist_line]    50: ******************** 47525
[M::ha_hist_line]    51: ****************** 43052
[M::ha_hist_line]    52: **************** 39946
[M::ha_hist_line]    53: *************** 36391
[M::ha_hist_line]    54: ************** 34694
[M::ha_hist_line]    55: ************* 31186
[M::ha_hist_line]    56: ************ 29236
[M::ha_hist_line]    57: *********** 27356
[M::ha_hist_line]    58: *********** 26237
[M::ha_hist_line]    59: ********** 24596
[M::ha_hist_line]    60: ********** 23546
[M::ha_hist_line]    61: ********* 22721
.
.
.
[M::ha_hist_line]   165: * 1307
[M::ha_hist_line]   166: * 1307
[M::ha_hist_line]  rest: ************* 31683
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 78965
[M::ha_pt_gen] peak_hom: 40; peak_het: 19
[M::ha_pt_gen::7418.119*23.39] ==> indexed 216581152 positions
[M::ha_assemble::7927.976*[email protected]] ==> found overlaps for the final round
[M::ha_print_ovlp_stat] # overlaps: 21150601
[M::ha_print_ovlp_stat] # strong overlaps: 3922010
[M::ha_print_ovlp_stat] # weak overlaps: 17228591
[M::ha_print_ovlp_stat] # exact overlaps: 20329706
[M::ha_print_ovlp_stat] # inexact overlaps: 820895
[M::ha_print_ovlp_stat] # overlaps without large indels: 21111083
[M::ha_print_ovlp_stat] # reverse overlaps: 2530916
Writing reads to disk...
Reads has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
bin files have been written.
Writing raw unitig GFA to disk...
Writing processed unitig GFA to disk...
[M::purge_dups] purge duplication coverage threshold: 50
[M::purge_dups] purge duplication coverage threshold: 50
[M::adjust_utg_by_primary] primary contig coverage range: [32, infinity]
Writing primary contig GFA to disk...
Writing alternate contig GFA to disk...
[M::main] Version: 0.12-r304

primary contigs:

Main genome scaffold total: 773
Main genome contig total:   773
Main genome scaffold sequence total: 420.5 MB
Main genome contig sequence total:   420.5 MB (->  0.0% gap)
Main genome scaffold N/L50: 66/1.9 MB
Main genome contig N/L50:   66/1.9 MB
Number of scaffolds > 50 KB: 452
% main genome in scaffolds > 50 KB: 97.5%

 Minimum    Number    Number     Total        Total     Scaffold
Scaffold      of        of      Scaffold      Contig     Contig
 Length   Scaffolds  Contigs     Length       Length    Coverage
--------  ---------  -------  -----------  -----------  --------
    All       773        773  420,473,782  420,473,782   100.00%
   1 kb       773        773  420,473,782  420,473,782   100.00%
 2.5 kb       773        773  420,473,782  420,473,782   100.00%
   5 kb       773        773  420,473,782  420,473,782   100.00%
  10 kb       773        773  420,473,782  420,473,782   100.00%
  25 kb       727        727  419,474,641  419,474,641   100.00%
  50 kb       452        452  409,886,666  409,886,666   100.00%
 100 kb       354        354  403,275,622  403,275,622   100.00%
 250 kb       279        279  390,826,593  390,826,593   100.00%
 500 kb       222        222  370,606,466  370,606,466   100.00%
   1 mb       133        133  304,500,802  304,500,802   100.00%
 2.5 mb        41         41  154,917,623  154,917,623   100.00%
   5 mb         5          5   30,732,695   30,732,695   100.00%

alternate contigs:

Main genome scaffold total: 603
Main genome contig total:   603
Main genome scaffold sequence total: 28.5 MB
Main genome contig sequence total:   28.5 MB (->  0.0% gap)
Main genome scaffold N/L50: 70/66.0 KB
Main genome contig N/L50:   70/66.0 KB
Number of scaffolds > 50 KB: 95
% main genome in scaffolds > 50 KB: 55.0%

 Minimum    Number    Number     Total        Total     Scaffold
Scaffold      of        of      Scaffold      Contig     Contig
 Length   Scaffolds  Contigs     Length       Length    Coverage
--------  ---------  -------  -----------  -----------  --------
    All       603        603   28,541,231   28,541,231   100.00%
   1 kb       603        603   28,541,231   28,541,231   100.00%
 2.5 kb       603        603   28,541,231   28,541,231   100.00%
   5 kb       603        603   28,541,231   28,541,231   100.00%
  10 kb       602        602   28,531,752   28,531,752   100.00%
  25 kb       336        336   23,702,698   23,702,698   100.00%
  50 kb        95         95   15,683,649   15,683,649   100.00%
 100 kb        43         43   12,259,161   12,259,161   100.00%
 250 kb        11         11    7,373,679    7,373,679   100.00%
 500 kb         7          7    5,998,160    5,998,160   100.00%
   1 mb         2          2    2,266,314    2,266,314   100.00%
 2.5 mb         0          0            0            0     0.00%

Not sure what to try at this point, but I'm interested to keep the conversation going if you all are interested in developing the assembler to handle such cases.

from hifiasm.

lh3 avatar lh3 commented on May 27, 2024

The location of the hom/het k-mer peak has little to do with the doubled assembly size. As Haoyu said, try --purge-cov and/or -s. I guess probably the standalone purge_dups will work better in your case.

from hifiasm.

plattsad avatar plattsad commented on May 27, 2024

I'm not sure if we may be seeing a somewhat similar issue. I have a 240MBase plant genome that isn't assembling well - we're getting an N50 of about 110KBase.

The plant should have a moderate to low heterozygosity and we have 11GBase of sequence, so the peak below at x46 looks like the homozygous peak with no evident heterozygous peak. But it looks like hifiasm is picking up on something in the noise as a spurious homozygous peak (maybe a bit of residual contamination?) and is calling "[M::ha_ft_gen] peak_hom: 6; peak_het: -1" which is possibly messing with the downstream assembly? I can't immediately see any way of overriding this?

[M::ha_analyze_count] highest: count[6] = 7615771
[M::ha_hist_line]     2: ****************************************************************************************************> 22408535
[M::ha_hist_line]     3: ****************************************************************************************************> 8657975
[M::ha_hist_line]     4: ************************************************************************************************ 7336391
[M::ha_hist_line]     5: *************************************************************************************************** 7513444
[M::ha_hist_line]     6: **************************************************************************************************** 7615771
[M::ha_hist_line]     7: ******************************************************************************************* 6953884
[M::ha_hist_line]     8: **************************************************************************** 5818652
[M::ha_hist_line]     9: ************************************************************* 4675623
[M::ha_hist_line]    10: ********************************************** 3483947
[M::ha_hist_line]    11: ********************************* 2486463
[M::ha_hist_line]    12: *********************** 1734740
[M::ha_hist_line]    13: ***************** 1273650
[M::ha_hist_line]    14: ************* 988026
[M::ha_hist_line]    15: *********** 816541
[M::ha_hist_line]    16: ********** 744607
[M::ha_hist_line]    17: ********** 744153
[M::ha_hist_line]    18: ********** 778618
[M::ha_hist_line]    19: *********** 831791
[M::ha_hist_line]    20: ************ 885995
[M::ha_hist_line]    21: ************* 967146
[M::ha_hist_line]    22: ************** 1082043
[M::ha_hist_line]    23: *************** 1175861
[M::ha_hist_line]    24: ***************** 1291702
[M::ha_hist_line]    25: ******************* 1418231
[M::ha_hist_line]    26: ********************* 1574217
[M::ha_hist_line]    27: *********************** 1717038
[M::ha_hist_line]    28: ************************* 1867704
[M::ha_hist_line]    29: *************************** 2053383
[M::ha_hist_line]    30: ****************************** 2252547
[M::ha_hist_line]    31: ******************************** 2456543
[M::ha_hist_line]    32: *********************************** 2654839
[M::ha_hist_line]    33: ************************************* 2853282
[M::ha_hist_line]    34: **************************************** 3071010
[M::ha_hist_line]    35: ******************************************* 3268239
[M::ha_hist_line]    36: ********************************************** 3478024
[M::ha_hist_line]    37: ************************************************* 3723711
[M::ha_hist_line]    38: *************************************************** 3914659
[M::ha_hist_line]    39: ****************************************************** 4076444
[M::ha_hist_line]    40: ******************************************************** 4243403
[M::ha_hist_line]    41: ********************************************************* 4350864
[M::ha_hist_line]    42: ********************************************************** 4393110
[M::ha_hist_line]    43: ********************************************************** 4425975
[M::ha_hist_line]    44: ********************************************************** 4441411
[M::ha_hist_line]    45: ********************************************************** 4424218
[M::ha_hist_line]    46: ********************************************************* 4351193
[M::ha_hist_line]    47: ******************************************************** 4275290
[M::ha_hist_line]    48: ****************************************************** 4132816
[M::ha_hist_line]    49: ***************************************************** 4004287
[M::ha_hist_line]    50: ************************************************** 3827829
[M::ha_hist_line]    51: *********************************************** 3613549
[M::ha_hist_line]    52: ********************************************* 3408068
[M::ha_hist_line]    53: ****************************************** 3196954
[M::ha_hist_line]    54: *************************************** 2951828
[M::ha_hist_line]    55: ************************************ 2730826
[M::ha_hist_line]    56: ********************************* 2492887
[M::ha_hist_line]    57: ****************************** 2261824
[M::ha_hist_line]    58: *************************** 2060978
[M::ha_hist_line]    59: ************************ 1844006
[M::ha_hist_line]    60: ********************** 1666759
[M::ha_hist_line]    61: ******************** 1488025
[M::ha_hist_line]    62: ***************** 1298218
[M::ha_hist_line]    63: *************** 1128978
[M::ha_hist_line]    64: ************* 1006326
[M::ha_hist_line]    65: ************ 884390
[M::ha_hist_line]    66: ********** 758372
[M::ha_hist_line]    67: ********* 654228
[M::ha_hist_line]    68: ******* 559453
[M::ha_hist_line]    69: ****** 472609
[M::ha_hist_line]    70: ***** 401628
[M::ha_hist_line]    71: ***** 343492
[M::ha_hist_line]    72: **** 293676
[M::ha_hist_line]    73: *** 256838
[M::ha_hist_line]    74: *** 226872
[M::ha_hist_line]    75: *** 204987
[M::ha_hist_line]    76: ** 180611
[M::ha_hist_line]    77: ** 158380
[M::ha_hist_line]    78: ** 142687
[M::ha_hist_line]    79: ** 131546
[M::ha_hist_line]    80: ** 118504
[M::ha_hist_line]    81: * 108998
[M::ha_hist_line]    82: * 102957
[M::ha_hist_line]    83: * 95738
[M::ha_hist_line]    84: * 94677
[M::ha_hist_line]    85: * 90178
[M::ha_hist_line]    86: * 84027
[M::ha_hist_line]    87: * 82578
[M::ha_hist_line]    88: * 80335
[M::ha_hist_line]    89: * 77907
[M::ha_hist_line]    90: * 77706
[M::ha_hist_line]    91: * 75025
[M::ha_hist_line]    92: * 73079
[M::ha_hist_line]    93: * 69838
[M::ha_hist_line]    94: * 69007
[M::ha_hist_line]    95: * 67334
[M::ha_hist_line]    96: * 67287
[M::ha_hist_line]    97: * 64676
[M::ha_hist_line]    98: * 62391
[M::ha_hist_line]    99: * 60120
[M::ha_hist_line]   100: * 58544
[M::ha_hist_line]   101: * 56452
[M::ha_hist_line]   102: * 53633
[M::ha_hist_line]   103: * 52964
[M::ha_hist_line]   104: * 51004
[M::ha_hist_line]   105: * 50528
[M::ha_hist_line]   106: * 47821
[M::ha_hist_line]   107: * 45735
[M::ha_hist_line]   108: * 45054
[M::ha_hist_line]   109: * 42925
[M::ha_hist_line]   110: * 39438
[M::ha_hist_line]   111: * 38983
[M::ha_hist_line]  rest: ********************************* 2482275
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_ft_gen] peak_hom: 6; peak_het: -1

from hifiasm.

conchoecia avatar conchoecia commented on May 27, 2024

What are your assembly stats for the primary and alternative assemblies, @plattsad ? https://github.com/conchoecia/fasta_stats

from hifiasm.

plattsad avatar plattsad commented on May 27, 2024

With a vanilla configuration (only threads specified)...

---------------- Information for assembly 'fot.asm.p_ctg.gfa.fa' ----------------


                                         Number of scaffolds       5480
                                     Total size of scaffolds  464904639
                                            Longest scaffold    2070765
                                           Shortest scaffold      18620
                                 Number of scaffolds > 1K nt       5480 100.0%
                                Number of scaffolds > 10K nt       5480 100.0%
                               Number of scaffolds > 100K nt       1025  18.7%
                                 Number of scaffolds > 1M nt         11   0.2%
                                Number of scaffolds > 10M nt          0   0.0%
                                          Mean scaffold size      84837
                                        Median scaffold size      47112
                                         N50 scaffold length     117304
                                          L50 scaffold count        821
                                                 scaffold %A      30.62
                                                 scaffold %C      19.38
                                                 scaffold %G      19.38
                                                 scaffold %T      30.62
                                                 scaffold %N       0.00
                                         scaffold %non-ACGTN       0.00
                             Number of scaffold non-ACGTN nt          0

                Percentage of assembly in scaffolded contigs       0.0%
              Percentage of assembly in unscaffolded contigs     100.0%
                      Average number of contigs per scaffold        1.0
Average length of break (>25 Ns) between contigs in scaffold          0

                                           Number of contigs       5480
                              Number of contigs in scaffolds          0
                          Number of contigs not in scaffolds       5480
                                       Total size of contigs  464904639
                                              Longest contig    2070765
                                             Shortest contig      18620
                                   Number of contigs > 1K nt       5480 100.0%
                                  Number of contigs > 10K nt       5480 100.0%
                                 Number of contigs > 100K nt       1025  18.7%
                                   Number of contigs > 1M nt         11   0.2%
                                  Number of contigs > 10M nt          0   0.0%
                                            Mean contig size      84837
                                          Median contig size      47112
                                           N50 contig length     117304
                                            L50 contig count        821
                                                   contig %A      30.62
                                                   contig %C      19.38
                                                   contig %G      19.38
                                                   contig %T      30.62
                                                   contig %N       0.00
                                           contig %non-ACGTN       0.00
                               Number of contig non-ACGTN nt          0
---------------- Information for assembly 'fot.asm.a_ctg.gfa.fa' ----------------


                                         Number of scaffolds       1019
                                     Total size of scaffolds   53586801
                                            Longest scaffold     562572
                                           Shortest scaffold       9051
                                 Number of scaffolds > 1K nt       1019 100.0%
                                Number of scaffolds > 10K nt       1018  99.9%
                               Number of scaffolds > 100K nt         76   7.5%
                                 Number of scaffolds > 1M nt          0   0.0%
                                Number of scaffolds > 10M nt          0   0.0%
                                          Mean scaffold size      52588
                                        Median scaffold size      37402
                                         N50 scaffold length      58138
                                          L50 scaffold count        226
                                                 scaffold %A      30.73
                                                 scaffold %C      19.29
                                                 scaffold %G      19.25
                                                 scaffold %T      30.73
                                                 scaffold %N       0.00
                                         scaffold %non-ACGTN       0.00
                             Number of scaffold non-ACGTN nt          0

                Percentage of assembly in scaffolded contigs       0.0%
              Percentage of assembly in unscaffolded contigs     100.0%
                      Average number of contigs per scaffold        1.0
Average length of break (>25 Ns) between contigs in scaffold          0

                                           Number of contigs       1019
                              Number of contigs in scaffolds          0
                          Number of contigs not in scaffolds       1019
                                       Total size of contigs   53586801
                                              Longest contig     562572
                                             Shortest contig       9051
                                   Number of contigs > 1K nt       1019 100.0%
                                  Number of contigs > 10K nt       1018  99.9%
                                 Number of contigs > 100K nt         76   7.5%
                                   Number of contigs > 1M nt          0   0.0%
                                  Number of contigs > 10M nt          0   0.0%
                                            Mean contig size      52588
                                          Median contig size      37402
                                           N50 contig length      58138
                                            L50 contig count        226
                                                   contig %A      30.73
                                                   contig %C      19.29
                                                   contig %G      19.25
                                                   contig %T      30.73
                                                   contig %N       0.00
                                           contig %non-ACGTN       0.00
                               Number of contig non-ACGTN nt          0

from hifiasm.

chhylp123 avatar chhylp123 commented on May 27, 2024

Looks like we should be able to let users to manually set homozygous peak. Do you think it is helpful @plattsad @conchoecia? I will release v0.14 today if you think it is an acceptable solution.

from hifiasm.

plattsad avatar plattsad commented on May 27, 2024

I think that might be useful. As a test I manually set peak_hom and peak_het in htab.cpp to 46 and 23 respectively. The assembly with only threads set appeared considerably better:

---------------- Information for assembly 'fot.APmod.asm.p_ctg.gfa.fa' ----------------


                                         Number of scaffolds        494
                                     Total size of scaffolds  243557028
                                            Longest scaffold   29379636
                                           Shortest scaffold      21153
                                 Number of scaffolds > 1K nt        494 100.0%
                                Number of scaffolds > 10K nt        494 100.0%
                               Number of scaffolds > 100K nt         60  12.1%
                                 Number of scaffolds > 1M nt         19   3.8%
                                Number of scaffolds > 10M nt         10   2.0%
                                          Mean scaffold size     493030
                                        Median scaffold size      40029
                                         N50 scaffold length   16540259
                                          L50 scaffold count          6
                                                 scaffold %A      30.58
                                                 scaffold %C      19.42
                                                 scaffold %G      19.44
                                                 scaffold %T      30.56
                                                 scaffold %N       0.00
                                         scaffold %non-ACGTN       0.00
                             Number of scaffold non-ACGTN nt          0

                Percentage of assembly in scaffolded contigs       0.0%
              Percentage of assembly in unscaffolded contigs     100.0%
                      Average number of contigs per scaffold        1.0
Average length of break (>25 Ns) between contigs in scaffold          0

                                           Number of contigs        494
                              Number of contigs in scaffolds          0
                          Number of contigs not in scaffolds        494
                                       Total size of contigs  243557028
                                              Longest contig   29379636
                                             Shortest contig      21153
                                   Number of contigs > 1K nt        494 100.0%
                                  Number of contigs > 10K nt        494 100.0%
                                 Number of contigs > 100K nt         60  12.1%
                                   Number of contigs > 1M nt         19   3.8%
                                  Number of contigs > 10M nt         10   2.0%
                                            Mean contig size     493030
                                          Median contig size      40029
                                           N50 contig length   16540259
                                            L50 contig count          6
                                                   contig %A      30.58
                                                   contig %C      19.42
                                                   contig %G      19.44
                                                   contig %T      30.56
                                                   contig %N       0.00
                                           contig %non-ACGTN       0.00
                               Number of contig non-ACGTN nt          0

from hifiasm.

conchoecia avatar conchoecia commented on May 27, 2024

@plattsad that is a great improvement after manually setting the het and hom peak. Good idea trying that as a direct proof-of-concept. If you have Hi-C data then that should nicely scaffold into chromosomes.

@chhylp123 If you think it will increase the assembly quality, then yes it would be nice to have the option (even if it only affects purgedups). Our examples have shown that the current hifiasm algorithm to detect the het and hom peaks doesn't work in all cases.

By the way, I used Hi-C to scaffold the output of my genome assembly above that was doubled in size, and ended up with a nice chromosome-scale diploid assembly. The heterozygosity seems like it is well over 5% in the animal that I am working with, including major indels and inversions between the haplotypes.

from hifiasm.

conchoecia avatar conchoecia commented on May 27, 2024

Hi @chhylp123 - I think that this is a good compromise to help the software determine where a good cutoff for the noise is. I take it to mean that --min-hist-cnt is to be in the trough between the heterozygous peak and error peak in the k-mer spectrum? I'll follow up here once I try it out.

from hifiasm.

chhylp123 avatar chhylp123 commented on May 27, 2024

Yes, I think so. Skip the problematic peaks should work.

from hifiasm.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.