Comments (12)
@plattsad @conchoecia The new version of hifiasm (v0.14) incorporates a new option '--min-hist-cnt' to ignore noisy counts when analyzing the k-mer spectrum. Hope it will be helpful for this problem.
from hifiasm.
Since the het rate is quite high, hifiasm regards the het peak as homozygous peak incorrectly. It doesn't matter too much for assembly, but affects purge_dups a lot. One solution is to set --purge-cov
to be homozygous peak (e.g. 45). A decreasing number of -s
might be also helpful. However, both of these options probably cannot do enough purging for samples with het rate. If that happens, we recommend you to try Purge_dups (https://github.com/dfguan/purge_dups). In future, I guess we will have a better solution to avoid purge_dups, which has a lot of problems.
from hifiasm.
Thanks for your message, @chhylp123. I gave it another run, this time specifying -k 19
. This time hifiasm
correctly guessed the het and hom peaks, but again it erroneously output both haplotypes into the primary assembly, and the alternate assembly is much too small.
[M::ha_analyze_count] lowest: count[9] = 1754283
[M::ha_analyze_count] highest: count[19] = 2980425
[M::ha_hist_line] 2: ****************************************************************************************************> 14170747
[M::ha_hist_line] 3: ****************************************************************************************************> 6095522
[M::ha_hist_line] 4: ****************************************************************************************************> 3517697
[M::ha_hist_line] 5: ********************************************************************************** 2441350
[M::ha_hist_line] 6: ********************************************************************* 2062511
[M::ha_hist_line] 7: *************************************************************** 1874500
[M::ha_hist_line] 8: *********************************************************** 1771091
[M::ha_hist_line] 9: *********************************************************** 1754283
[M::ha_hist_line] 10: ************************************************************ 1795541
[M::ha_hist_line] 11: *************************************************************** 1869441
[M::ha_hist_line] 12: ******************************************************************** 2028087
[M::ha_hist_line] 13: ************************************************************************* 2188209
[M::ha_hist_line] 14: ******************************************************************************* 2354761
[M::ha_hist_line] 15: ************************************************************************************* 2523250
[M::ha_hist_line] 16: ****************************************************************************************** 2692454
[M::ha_hist_line] 17: ************************************************************************************************ 2851163
[M::ha_hist_line] 18: ************************************************************************************************** 2933855
[M::ha_hist_line] 19: **************************************************************************************************** 2980425
[M::ha_hist_line] 20: **************************************************************************************************** 2973333
[M::ha_hist_line] 21: *************************************************************************************************** 2952906
[M::ha_hist_line] 22: ************************************************************************************************ 2865676
[M::ha_hist_line] 23: ******************************************************************************************* 2710854
[M::ha_hist_line] 24: ************************************************************************************* 2518491
[M::ha_hist_line] 25: ****************************************************************************** 2313737
[M::ha_hist_line] 26: ************************************************************************ 2138149
[M::ha_hist_line] 27: ***************************************************************** 1938409
[M::ha_hist_line] 28: ************************************************************ 1787849
[M::ha_hist_line] 29: ******************************************************* 1628415
[M::ha_hist_line] 30: *************************************************** 1529645
[M::ha_hist_line] 31: ************************************************* 1452905
[M::ha_hist_line] 32: *********************************************** 1406267
[M::ha_hist_line] 33: ********************************************** 1375248
[M::ha_hist_line] 34: *********************************************** 1386959
[M::ha_hist_line] 35: *********************************************** 1392977
[M::ha_hist_line] 36: ************************************************ 1418896
[M::ha_hist_line] 37: ************************************************ 1443780
[M::ha_hist_line] 38: ************************************************** 1491147
[M::ha_hist_line] 39: *************************************************** 1517179
[M::ha_hist_line] 40: *************************************************** 1524668
[M::ha_hist_line] 41: ************************************************** 1502562
[M::ha_hist_line] 42: ************************************************** 1480236
[M::ha_hist_line] 43: ************************************************ 1429719
[M::ha_hist_line] 44: ********************************************** 1381645
[M::ha_hist_line] 45: ********************************************* 1329876
[M::ha_hist_line] 46: ****************************************** 1261889
[M::ha_hist_line] 47: **************************************** 1186449
[M::ha_hist_line] 48: ************************************* 1110108
[M::ha_hist_line] 49: *********************************** 1037530
[M::ha_hist_line] 50: ******************************** 966132
[M::ha_hist_line] 51: ****************************** 890500
[M::ha_hist_line] 52: **************************** 826020
[M::ha_hist_line] 53: ************************** 762370
[M::ha_hist_line] 54: ************************ 711653
[M::ha_hist_line] 55: ********************** 657454
[M::ha_hist_line] 56: ********************* 614353
[M::ha_hist_line] 57: ******************* 574305
[M::ha_hist_line] 58: ****************** 548358
[M::ha_hist_line] 59: ****************** 522686
[M::ha_hist_line] 60: ***************** 502057
[M::ha_hist_line] 61: **************** 485945
[M::ha_hist_line] 62: **************** 469769
[M::ha_hist_line] 63: *************** 452361
[M::ha_hist_line] 64: *************** 435805
[M::ha_hist_line] 65: ************** 420092
[M::ha_hist_line] 66: ************** 409057
[M::ha_hist_line] 67: ************* 391407
[M::ha_hist_line] 68: ************* 379454
[M::ha_hist_line] 69: ************ 367346
[M::ha_hist_line] 70: ************ 354757
[M::ha_hist_line] 71: *********** 339978
[M::ha_hist_line] 72: *********** 327008
-
-
-
[M::ha_hist_line] 200: * 15256
[M::ha_hist_line] 201: * 15186
[M::ha_hist_line] 202: * 15098
[M::ha_hist_line] rest: **************************************************************************************** 2613571
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 1524668
[M::ha_ft_gen] peak_hom: 40; peak_het: 19
[M::ha_ft_gen::327.776*[email protected]] ==> filtered out 2659111 k-mers occurring 200 or more times
[M::ha_opt_update_cov] updated max_n_chain to 200
[M::ha_pt_gen::470.491*6.63] ==> counted 11783100 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[7] = 103020
[M::ha_analyze_count] highest: count[19] = 243814
[M::ha_hist_line] 1: ****************************************************************************************************> 3881309
[M::ha_hist_line] 2: ****************************************************************************************************> 731724
[M::ha_hist_line] 3: ****************************************************************************************************> 285736
[M::ha_hist_line] 4: ******************************************************************* 164048
[M::ha_hist_line] 5: ************************************************* 118919
[M::ha_hist_line] 6: ******************************************** 106851
[M::ha_hist_line] 7: ****************************************** 103020
[M::ha_hist_line] 8: ******************************************* 104750
[M::ha_hist_line] 9: ********************************************** 112275
[M::ha_hist_line] 10: ************************************************** 121525
[M::ha_hist_line] 11: ******************************************************* 135217
[M::ha_hist_line] 12: *************************************************************** 152592
[M::ha_hist_line] 13: ********************************************************************** 170299
[M::ha_hist_line] 14: ***************************************************************************** 187520
[M::ha_hist_line] 15: ************************************************************************************* 206166
[M::ha_hist_line] 16: ****************************************************************************************** 220437
[M::ha_hist_line] 17: ************************************************************************************************* 235482
[M::ha_hist_line] 18: *************************************************************************************************** 240448
[M::ha_hist_line] 19: **************************************************************************************************** 243814
[M::ha_hist_line] 20: **************************************************************************************************** 243184
[M::ha_hist_line] 21: ************************************************************************************************* 236363
[M::ha_hist_line] 22: ********************************************************************************************* 226290
[M::ha_hist_line] 23: ************************************************************************************** 208942
[M::ha_hist_line] 24: ****************************************************************************** 188999
[M::ha_hist_line] 25: ********************************************************************** 170183
[M::ha_hist_line] 26: ************************************************************** 151626
[M::ha_hist_line] 27: ******************************************************* 133053
[M::ha_hist_line] 28: ************************************************* 118693
[M::ha_hist_line] 29: ****************************************** 103494
[M::ha_hist_line] 30: *************************************** 93931
[M::ha_hist_line] 31: *********************************** 85921
[M::ha_hist_line] 32: ********************************* 80779
[M::ha_hist_line] 33: ******************************* 76282
[M::ha_hist_line] 34: ******************************* 76613
[M::ha_hist_line] 35: ******************************* 74620
[M::ha_hist_line] 36: ******************************* 75318
[M::ha_hist_line] 37: ******************************* 75739
[M::ha_hist_line] 38: ******************************** 77902
[M::ha_hist_line] 39: ******************************** 77883
[M::ha_hist_line] 40: ******************************** 78776
[M::ha_hist_line] 41: ******************************* 76017
[M::ha_hist_line] 42: ******************************* 75003
[M::ha_hist_line] 43: ***************************** 71694
[M::ha_hist_line] 44: **************************** 68535
[M::ha_hist_line] 45: *************************** 66454
[M::ha_hist_line] 46: ************************** 62425
[M::ha_hist_line] 47: ************************ 58036
[M::ha_hist_line] 48: ********************** 54488
[M::ha_hist_line] 49: ********************* 50170
[M::ha_hist_line] 50: ******************* 46680
[M::ha_hist_line] 51: ****************** 42821
[M::ha_hist_line] 52: **************** 39110
[M::ha_hist_line] 53: *************** 36031
[M::ha_hist_line] 54: ************** 33860
[M::ha_hist_line] 55: ************* 30924
[M::ha_hist_line] 56: ************ 29141
[M::ha_hist_line] 57: *********** 27028
[M::ha_hist_line] 58: *********** 25620
[M::ha_hist_line] 59: ********** 24737
[M::ha_hist_line] 60: ********** 23265
[M::ha_hist_line] 61: ********* 22523
[M::ha_hist_line] 62: ********* 21687
[M::ha_hist_line] 63: ******** 20549
[M::ha_hist_line] 64: ******** 19945
[M::ha_hist_line] 65: ******** 19109
[M::ha_hist_line] 66: ******** 18588
[M::ha_hist_line] 67: ******* 17721
[M::ha_hist_line] 68: ******* 17247
[M::ha_hist_line] 69: ******* 16661
[M::ha_hist_line] 70: ****** 15704
-
-
-
[M::ha_hist_line] 164: * 1335
[M::ha_hist_line] 165: * 1259
[M::ha_hist_line] rest: ************* 31330
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 78776
[M::ha_pt_gen] peak_hom: 40; peak_het: 19
[M::ha_pt_gen::521.345*7.64] ==> indexed 216061193 positions
[M::ha_assemble::2779.950*[email protected]] ==> corrected reads for round 1
[M::ha_assemble] # bases: 8054326319; # corrected bases: 14082381; # recorrected bases: 16537
[M::ha_assemble] size of buffer: 15.063GB
[M::ha_pt_gen::2825.302*21.59] ==> counted 9874090 distinct minimizer k-mers
[M::ha_pt_gen] count[4095] = 0 (for sanity check)
[M::ha_analyze_count] lowest: count[7] = 100493
[M::ha_analyze_count] highest: count[19] = 242703
[M::ha_hist_line] 1: ****************************************************************************************************> 2237612
[M::ha_hist_line] 2: ****************************************************************************************************> 543579
[M::ha_hist_line] 3: *************************************************************************************************** 239651
[M::ha_hist_line] 4: ************************************************************* 148520
[M::ha_hist_line] 5: ********************************************** 111364
[M::ha_hist_line] 6: ******************************************* 103166
[M::ha_hist_line] 7: ***************************************** 100493
[M::ha_hist_line] 8: ****************************************** 102670
[M::ha_hist_line] 9: ********************************************* 110027
[M::ha_hist_line] 10: ************************************************* 118852
[M::ha_hist_line] 11: ******************************************************* 132281
[M::ha_hist_line] 12: ************************************************************** 149774
[M::ha_hist_line] 13: ********************************************************************* 167554
[M::ha_hist_line] 14: **************************************************************************** 184708
[M::ha_hist_line] 15: ************************************************************************************ 203351
[M::ha_hist_line] 16: ****************************************************************************************** 217242
[M::ha_hist_line] 17: ************************************************************************************************ 233245
[M::ha_hist_line] 18: *************************************************************************************************** 240099
[M::ha_hist_line] 19: **************************************************************************************************** 242703
[M::ha_hist_line] 20: **************************************************************************************************** 241927
[M::ha_hist_line] 21: ************************************************************************************************** 236969
[M::ha_hist_line] 22: ********************************************************************************************** 227311
[M::ha_hist_line] 23: *************************************************************************************** 211026
[M::ha_hist_line] 24: ******************************************************************************* 190978
[M::ha_hist_line] 25: *********************************************************************** 171183
[M::ha_hist_line] 26: *************************************************************** 153630
[M::ha_hist_line] 27: ******************************************************** 135095
[M::ha_hist_line] 28: ************************************************** 120339
[M::ha_hist_line] 29: ******************************************* 103924
[M::ha_hist_line] 30: *************************************** 94874
[M::ha_hist_line] 31: *********************************** 86055
[M::ha_hist_line] 32: ********************************* 80923
[M::ha_hist_line] 33: ******************************* 75959
[M::ha_hist_line] 34: ******************************* 76099
[M::ha_hist_line] 35: ******************************* 74110
[M::ha_hist_line] 36: ******************************* 75386
[M::ha_hist_line] 37: ******************************* 74682
[M::ha_hist_line] 38: ******************************** 77166
[M::ha_hist_line] 39: ******************************** 77475
[M::ha_hist_line] 40: ********************************* 79001
[M::ha_hist_line] 41: ******************************* 76020
[M::ha_hist_line] 42: ******************************* 75217
[M::ha_hist_line] 43: ***************************** 71586
[M::ha_hist_line] 44: ***************************** 69413
[M::ha_hist_line] 45: **************************** 66756
[M::ha_hist_line] 46: ************************** 63191
[M::ha_hist_line] 47: ************************ 58743
[M::ha_hist_line] 48: *********************** 55832
[M::ha_hist_line] 49: ********************* 50727
[M::ha_hist_line] 50: ******************** 47525
[M::ha_hist_line] 51: ****************** 43052
[M::ha_hist_line] 52: **************** 39946
[M::ha_hist_line] 53: *************** 36391
[M::ha_hist_line] 54: ************** 34694
[M::ha_hist_line] 55: ************* 31186
[M::ha_hist_line] 56: ************ 29236
[M::ha_hist_line] 57: *********** 27356
[M::ha_hist_line] 58: *********** 26237
[M::ha_hist_line] 59: ********** 24596
[M::ha_hist_line] 60: ********** 23546
[M::ha_hist_line] 61: ********* 22721
.
.
.
[M::ha_hist_line] 165: * 1307
[M::ha_hist_line] 166: * 1307
[M::ha_hist_line] rest: ************* 31683
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: count[40] = 78965
[M::ha_pt_gen] peak_hom: 40; peak_het: 19
[M::ha_pt_gen::7418.119*23.39] ==> indexed 216581152 positions
[M::ha_assemble::7927.976*[email protected]] ==> found overlaps for the final round
[M::ha_print_ovlp_stat] # overlaps: 21150601
[M::ha_print_ovlp_stat] # strong overlaps: 3922010
[M::ha_print_ovlp_stat] # weak overlaps: 17228591
[M::ha_print_ovlp_stat] # exact overlaps: 20329706
[M::ha_print_ovlp_stat] # inexact overlaps: 820895
[M::ha_print_ovlp_stat] # overlaps without large indels: 21111083
[M::ha_print_ovlp_stat] # reverse overlaps: 2530916
Writing reads to disk...
Reads has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
bin files have been written.
Writing raw unitig GFA to disk...
Writing processed unitig GFA to disk...
[M::purge_dups] purge duplication coverage threshold: 50
[M::purge_dups] purge duplication coverage threshold: 50
[M::adjust_utg_by_primary] primary contig coverage range: [32, infinity]
Writing primary contig GFA to disk...
Writing alternate contig GFA to disk...
[M::main] Version: 0.12-r304
primary contigs:
Main genome scaffold total: 773
Main genome contig total: 773
Main genome scaffold sequence total: 420.5 MB
Main genome contig sequence total: 420.5 MB (-> 0.0% gap)
Main genome scaffold N/L50: 66/1.9 MB
Main genome contig N/L50: 66/1.9 MB
Number of scaffolds > 50 KB: 452
% main genome in scaffolds > 50 KB: 97.5%
Minimum Number Number Total Total Scaffold
Scaffold of of Scaffold Contig Contig
Length Scaffolds Contigs Length Length Coverage
-------- --------- ------- ----------- ----------- --------
All 773 773 420,473,782 420,473,782 100.00%
1 kb 773 773 420,473,782 420,473,782 100.00%
2.5 kb 773 773 420,473,782 420,473,782 100.00%
5 kb 773 773 420,473,782 420,473,782 100.00%
10 kb 773 773 420,473,782 420,473,782 100.00%
25 kb 727 727 419,474,641 419,474,641 100.00%
50 kb 452 452 409,886,666 409,886,666 100.00%
100 kb 354 354 403,275,622 403,275,622 100.00%
250 kb 279 279 390,826,593 390,826,593 100.00%
500 kb 222 222 370,606,466 370,606,466 100.00%
1 mb 133 133 304,500,802 304,500,802 100.00%
2.5 mb 41 41 154,917,623 154,917,623 100.00%
5 mb 5 5 30,732,695 30,732,695 100.00%
alternate contigs:
Main genome scaffold total: 603
Main genome contig total: 603
Main genome scaffold sequence total: 28.5 MB
Main genome contig sequence total: 28.5 MB (-> 0.0% gap)
Main genome scaffold N/L50: 70/66.0 KB
Main genome contig N/L50: 70/66.0 KB
Number of scaffolds > 50 KB: 95
% main genome in scaffolds > 50 KB: 55.0%
Minimum Number Number Total Total Scaffold
Scaffold of of Scaffold Contig Contig
Length Scaffolds Contigs Length Length Coverage
-------- --------- ------- ----------- ----------- --------
All 603 603 28,541,231 28,541,231 100.00%
1 kb 603 603 28,541,231 28,541,231 100.00%
2.5 kb 603 603 28,541,231 28,541,231 100.00%
5 kb 603 603 28,541,231 28,541,231 100.00%
10 kb 602 602 28,531,752 28,531,752 100.00%
25 kb 336 336 23,702,698 23,702,698 100.00%
50 kb 95 95 15,683,649 15,683,649 100.00%
100 kb 43 43 12,259,161 12,259,161 100.00%
250 kb 11 11 7,373,679 7,373,679 100.00%
500 kb 7 7 5,998,160 5,998,160 100.00%
1 mb 2 2 2,266,314 2,266,314 100.00%
2.5 mb 0 0 0 0 0.00%
Not sure what to try at this point, but I'm interested to keep the conversation going if you all are interested in developing the assembler to handle such cases.
from hifiasm.
The location of the hom/het k-mer peak has little to do with the doubled assembly size. As Haoyu said, try --purge-cov
and/or -s
. I guess probably the standalone purge_dups will work better in your case.
from hifiasm.
I'm not sure if we may be seeing a somewhat similar issue. I have a 240MBase plant genome that isn't assembling well - we're getting an N50 of about 110KBase.
The plant should have a moderate to low heterozygosity and we have 11GBase of sequence, so the peak below at x46 looks like the homozygous peak with no evident heterozygous peak. But it looks like hifiasm is picking up on something in the noise as a spurious homozygous peak (maybe a bit of residual contamination?) and is calling "[M::ha_ft_gen] peak_hom: 6; peak_het: -1" which is possibly messing with the downstream assembly? I can't immediately see any way of overriding this?
[M::ha_analyze_count] highest: count[6] = 7615771
[M::ha_hist_line] 2: ****************************************************************************************************> 22408535
[M::ha_hist_line] 3: ****************************************************************************************************> 8657975
[M::ha_hist_line] 4: ************************************************************************************************ 7336391
[M::ha_hist_line] 5: *************************************************************************************************** 7513444
[M::ha_hist_line] 6: **************************************************************************************************** 7615771
[M::ha_hist_line] 7: ******************************************************************************************* 6953884
[M::ha_hist_line] 8: **************************************************************************** 5818652
[M::ha_hist_line] 9: ************************************************************* 4675623
[M::ha_hist_line] 10: ********************************************** 3483947
[M::ha_hist_line] 11: ********************************* 2486463
[M::ha_hist_line] 12: *********************** 1734740
[M::ha_hist_line] 13: ***************** 1273650
[M::ha_hist_line] 14: ************* 988026
[M::ha_hist_line] 15: *********** 816541
[M::ha_hist_line] 16: ********** 744607
[M::ha_hist_line] 17: ********** 744153
[M::ha_hist_line] 18: ********** 778618
[M::ha_hist_line] 19: *********** 831791
[M::ha_hist_line] 20: ************ 885995
[M::ha_hist_line] 21: ************* 967146
[M::ha_hist_line] 22: ************** 1082043
[M::ha_hist_line] 23: *************** 1175861
[M::ha_hist_line] 24: ***************** 1291702
[M::ha_hist_line] 25: ******************* 1418231
[M::ha_hist_line] 26: ********************* 1574217
[M::ha_hist_line] 27: *********************** 1717038
[M::ha_hist_line] 28: ************************* 1867704
[M::ha_hist_line] 29: *************************** 2053383
[M::ha_hist_line] 30: ****************************** 2252547
[M::ha_hist_line] 31: ******************************** 2456543
[M::ha_hist_line] 32: *********************************** 2654839
[M::ha_hist_line] 33: ************************************* 2853282
[M::ha_hist_line] 34: **************************************** 3071010
[M::ha_hist_line] 35: ******************************************* 3268239
[M::ha_hist_line] 36: ********************************************** 3478024
[M::ha_hist_line] 37: ************************************************* 3723711
[M::ha_hist_line] 38: *************************************************** 3914659
[M::ha_hist_line] 39: ****************************************************** 4076444
[M::ha_hist_line] 40: ******************************************************** 4243403
[M::ha_hist_line] 41: ********************************************************* 4350864
[M::ha_hist_line] 42: ********************************************************** 4393110
[M::ha_hist_line] 43: ********************************************************** 4425975
[M::ha_hist_line] 44: ********************************************************** 4441411
[M::ha_hist_line] 45: ********************************************************** 4424218
[M::ha_hist_line] 46: ********************************************************* 4351193
[M::ha_hist_line] 47: ******************************************************** 4275290
[M::ha_hist_line] 48: ****************************************************** 4132816
[M::ha_hist_line] 49: ***************************************************** 4004287
[M::ha_hist_line] 50: ************************************************** 3827829
[M::ha_hist_line] 51: *********************************************** 3613549
[M::ha_hist_line] 52: ********************************************* 3408068
[M::ha_hist_line] 53: ****************************************** 3196954
[M::ha_hist_line] 54: *************************************** 2951828
[M::ha_hist_line] 55: ************************************ 2730826
[M::ha_hist_line] 56: ********************************* 2492887
[M::ha_hist_line] 57: ****************************** 2261824
[M::ha_hist_line] 58: *************************** 2060978
[M::ha_hist_line] 59: ************************ 1844006
[M::ha_hist_line] 60: ********************** 1666759
[M::ha_hist_line] 61: ******************** 1488025
[M::ha_hist_line] 62: ***************** 1298218
[M::ha_hist_line] 63: *************** 1128978
[M::ha_hist_line] 64: ************* 1006326
[M::ha_hist_line] 65: ************ 884390
[M::ha_hist_line] 66: ********** 758372
[M::ha_hist_line] 67: ********* 654228
[M::ha_hist_line] 68: ******* 559453
[M::ha_hist_line] 69: ****** 472609
[M::ha_hist_line] 70: ***** 401628
[M::ha_hist_line] 71: ***** 343492
[M::ha_hist_line] 72: **** 293676
[M::ha_hist_line] 73: *** 256838
[M::ha_hist_line] 74: *** 226872
[M::ha_hist_line] 75: *** 204987
[M::ha_hist_line] 76: ** 180611
[M::ha_hist_line] 77: ** 158380
[M::ha_hist_line] 78: ** 142687
[M::ha_hist_line] 79: ** 131546
[M::ha_hist_line] 80: ** 118504
[M::ha_hist_line] 81: * 108998
[M::ha_hist_line] 82: * 102957
[M::ha_hist_line] 83: * 95738
[M::ha_hist_line] 84: * 94677
[M::ha_hist_line] 85: * 90178
[M::ha_hist_line] 86: * 84027
[M::ha_hist_line] 87: * 82578
[M::ha_hist_line] 88: * 80335
[M::ha_hist_line] 89: * 77907
[M::ha_hist_line] 90: * 77706
[M::ha_hist_line] 91: * 75025
[M::ha_hist_line] 92: * 73079
[M::ha_hist_line] 93: * 69838
[M::ha_hist_line] 94: * 69007
[M::ha_hist_line] 95: * 67334
[M::ha_hist_line] 96: * 67287
[M::ha_hist_line] 97: * 64676
[M::ha_hist_line] 98: * 62391
[M::ha_hist_line] 99: * 60120
[M::ha_hist_line] 100: * 58544
[M::ha_hist_line] 101: * 56452
[M::ha_hist_line] 102: * 53633
[M::ha_hist_line] 103: * 52964
[M::ha_hist_line] 104: * 51004
[M::ha_hist_line] 105: * 50528
[M::ha_hist_line] 106: * 47821
[M::ha_hist_line] 107: * 45735
[M::ha_hist_line] 108: * 45054
[M::ha_hist_line] 109: * 42925
[M::ha_hist_line] 110: * 39438
[M::ha_hist_line] 111: * 38983
[M::ha_hist_line] rest: ********************************* 2482275
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_ft_gen] peak_hom: 6; peak_het: -1
from hifiasm.
What are your assembly stats for the primary and alternative assemblies, @plattsad ? https://github.com/conchoecia/fasta_stats
from hifiasm.
With a vanilla configuration (only threads specified)...
---------------- Information for assembly 'fot.asm.p_ctg.gfa.fa' ----------------
Number of scaffolds 5480
Total size of scaffolds 464904639
Longest scaffold 2070765
Shortest scaffold 18620
Number of scaffolds > 1K nt 5480 100.0%
Number of scaffolds > 10K nt 5480 100.0%
Number of scaffolds > 100K nt 1025 18.7%
Number of scaffolds > 1M nt 11 0.2%
Number of scaffolds > 10M nt 0 0.0%
Mean scaffold size 84837
Median scaffold size 47112
N50 scaffold length 117304
L50 scaffold count 821
scaffold %A 30.62
scaffold %C 19.38
scaffold %G 19.38
scaffold %T 30.62
scaffold %N 0.00
scaffold %non-ACGTN 0.00
Number of scaffold non-ACGTN nt 0
Percentage of assembly in scaffolded contigs 0.0%
Percentage of assembly in unscaffolded contigs 100.0%
Average number of contigs per scaffold 1.0
Average length of break (>25 Ns) between contigs in scaffold 0
Number of contigs 5480
Number of contigs in scaffolds 0
Number of contigs not in scaffolds 5480
Total size of contigs 464904639
Longest contig 2070765
Shortest contig 18620
Number of contigs > 1K nt 5480 100.0%
Number of contigs > 10K nt 5480 100.0%
Number of contigs > 100K nt 1025 18.7%
Number of contigs > 1M nt 11 0.2%
Number of contigs > 10M nt 0 0.0%
Mean contig size 84837
Median contig size 47112
N50 contig length 117304
L50 contig count 821
contig %A 30.62
contig %C 19.38
contig %G 19.38
contig %T 30.62
contig %N 0.00
contig %non-ACGTN 0.00
Number of contig non-ACGTN nt 0
---------------- Information for assembly 'fot.asm.a_ctg.gfa.fa' ----------------
Number of scaffolds 1019
Total size of scaffolds 53586801
Longest scaffold 562572
Shortest scaffold 9051
Number of scaffolds > 1K nt 1019 100.0%
Number of scaffolds > 10K nt 1018 99.9%
Number of scaffolds > 100K nt 76 7.5%
Number of scaffolds > 1M nt 0 0.0%
Number of scaffolds > 10M nt 0 0.0%
Mean scaffold size 52588
Median scaffold size 37402
N50 scaffold length 58138
L50 scaffold count 226
scaffold %A 30.73
scaffold %C 19.29
scaffold %G 19.25
scaffold %T 30.73
scaffold %N 0.00
scaffold %non-ACGTN 0.00
Number of scaffold non-ACGTN nt 0
Percentage of assembly in scaffolded contigs 0.0%
Percentage of assembly in unscaffolded contigs 100.0%
Average number of contigs per scaffold 1.0
Average length of break (>25 Ns) between contigs in scaffold 0
Number of contigs 1019
Number of contigs in scaffolds 0
Number of contigs not in scaffolds 1019
Total size of contigs 53586801
Longest contig 562572
Shortest contig 9051
Number of contigs > 1K nt 1019 100.0%
Number of contigs > 10K nt 1018 99.9%
Number of contigs > 100K nt 76 7.5%
Number of contigs > 1M nt 0 0.0%
Number of contigs > 10M nt 0 0.0%
Mean contig size 52588
Median contig size 37402
N50 contig length 58138
L50 contig count 226
contig %A 30.73
contig %C 19.29
contig %G 19.25
contig %T 30.73
contig %N 0.00
contig %non-ACGTN 0.00
Number of contig non-ACGTN nt 0
from hifiasm.
Looks like we should be able to let users to manually set homozygous peak. Do you think it is helpful @plattsad @conchoecia? I will release v0.14 today if you think it is an acceptable solution.
from hifiasm.
I think that might be useful. As a test I manually set peak_hom and peak_het in htab.cpp to 46 and 23 respectively. The assembly with only threads set appeared considerably better:
---------------- Information for assembly 'fot.APmod.asm.p_ctg.gfa.fa' ----------------
Number of scaffolds 494
Total size of scaffolds 243557028
Longest scaffold 29379636
Shortest scaffold 21153
Number of scaffolds > 1K nt 494 100.0%
Number of scaffolds > 10K nt 494 100.0%
Number of scaffolds > 100K nt 60 12.1%
Number of scaffolds > 1M nt 19 3.8%
Number of scaffolds > 10M nt 10 2.0%
Mean scaffold size 493030
Median scaffold size 40029
N50 scaffold length 16540259
L50 scaffold count 6
scaffold %A 30.58
scaffold %C 19.42
scaffold %G 19.44
scaffold %T 30.56
scaffold %N 0.00
scaffold %non-ACGTN 0.00
Number of scaffold non-ACGTN nt 0
Percentage of assembly in scaffolded contigs 0.0%
Percentage of assembly in unscaffolded contigs 100.0%
Average number of contigs per scaffold 1.0
Average length of break (>25 Ns) between contigs in scaffold 0
Number of contigs 494
Number of contigs in scaffolds 0
Number of contigs not in scaffolds 494
Total size of contigs 243557028
Longest contig 29379636
Shortest contig 21153
Number of contigs > 1K nt 494 100.0%
Number of contigs > 10K nt 494 100.0%
Number of contigs > 100K nt 60 12.1%
Number of contigs > 1M nt 19 3.8%
Number of contigs > 10M nt 10 2.0%
Mean contig size 493030
Median contig size 40029
N50 contig length 16540259
L50 contig count 6
contig %A 30.58
contig %C 19.42
contig %G 19.44
contig %T 30.56
contig %N 0.00
contig %non-ACGTN 0.00
Number of contig non-ACGTN nt 0
from hifiasm.
@plattsad that is a great improvement after manually setting the het and hom peak. Good idea trying that as a direct proof-of-concept. If you have Hi-C data then that should nicely scaffold into chromosomes.
@chhylp123 If you think it will increase the assembly quality, then yes it would be nice to have the option (even if it only affects purgedups
). Our examples have shown that the current hifiasm
algorithm to detect the het and hom peaks doesn't work in all cases.
By the way, I used Hi-C to scaffold the output of my genome assembly above that was doubled in size, and ended up with a nice chromosome-scale diploid assembly. The heterozygosity seems like it is well over 5% in the animal that I am working with, including major indels and inversions between the haplotypes.
from hifiasm.
Hi @chhylp123 - I think that this is a good compromise to help the software determine where a good cutoff for the noise is. I take it to mean that --min-hist-cnt
is to be in the trough between the heterozygous peak and error peak in the k-mer spectrum? I'll follow up here once I try it out.
from hifiasm.
Yes, I think so. Skip the problematic peaks should work.
from hifiasm.
Related Issues (20)
- Mitigate Overlapping Sequence Assignments in Haplotypes HOT 3
- Help!!! Segmentation fault (core dumped) HOT 1
- Question about the depth of ONT ultra-long reads HOT 1
- Homotetraploid, super-large genome, with different parameters, the size of p_utg varies greatly? HOT 1
- setting K parameter in yak HOT 2
- how to make the correct genome size estimation for allotetraploid species? HOT 2
- Possible missing one haplotype in human assemblies HOT 2
- No haploid.gfa files output in trio-binning mode HOT 3
- Hifi + Hi-c + ONT assembly fails
- In Trio-binning, always more on hap1 despite (almost) same sequences for paternal and maternal
- discontinuous assembly with shorter pacbio hifi reads but high coverage HOT 2
- Is x20 of Hifi data enough to construct draft assembly of 6.5Gb genome? HOT 1
- line 8: 110334 Aborted(core dumped) HOT 1
- Ultra Long intergration failed: no output for UL kmer counting HOT 3
- missing 8Mb sequences in the assembly HOT 5
- Empty haplotype 2 gfa files by ONT integration HOT 1
- Basic Question About HiFi Input HOT 3
- Spend too long times to run hifiasm HOT 1
- Switch error on X and Y chromosome HOT 2
- *.ovlp.bin file HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from hifiasm.