Comments (17)
I noticed that the coordinates in the automatically generated file "0_199.csv" are not sorted in ascending order. I'm not sure what the impact would be if I remove the file, but currently, I am able to execute successfully after removing it.
less -S work/chr1/0_199.csv
,snp_phased_block_1,snp_phased_block_2,extended_phased_block_1,extended_phased_block_2,myth_phasing_relationship,same_hap_num,diff_hap_num,olp_hp1_total_read_len,olp_hp2_total_read_len,olp_hp1_CpG_num,olp_hp2_CpG_num,olp_same_read_len,olp_not_same_read_len,olp_same_CpG_num,olp_not_same_CpG_num
0,"(248528664, 248946160)","(43042, 66277)","(248516738, 248946160)","(43042, 89283)",cannot decide,,,0,0,0,0,0,0,0,0
1,"(43042, 66277)","(89283, 160571)","(43042, 89283)","(66277, 165210)",cannot decide,,,0,0,0,0,0,0,0,0
2,"(89283, 160571)","(165210, 180261)","(66277, 165210)","(160571, 257733)",cannot decide,,,0,0,0,0,0,0,0,0
3,"(165210, 180261)","(257733, 296746)","(160571, 257733)","(180261, 368589)",cannot decide,,,0,0,0,0,0,0,0,0
4,"(257733, 296746)","(368589, 370458)","(180261, 368589)","(296746, 399996)",cannot decide,,,0,0,0,0,0,0,0,0
5,"(368589, 370458)","(399996, 402079)","(296746, 399996)","(370458, 406767)",cannot decide,,,0,0,0,0,0,0,0,0
6,"(399996, 402079)","(406767, 446106)","(370458, 406767)","(402079, 464054)",cannot decide,,,0,0,0,0,0,0,0,0
7,"(406767, 446106)","(464054, 464109)","(402079, 464054)","(446106, 470234)",cannot decide,,,0,0,0,0,0,0,0,0
8,"(464054, 464109)","(470234, 531222)","(446106, 470234)","(464109, 597817)",cannot decide,,,0,0,0,0,0,0,0,0
9,"(470234, 531222)","(597817, 746318)","(464109, 597817)","(531222, 814327)",cannot decide,,,0,0,0,0,0,0,0,0
10,"(597817, 746318)","(814327, 1324179)","(531222, 814327)","(746318, 1357359)",same,15,13,270362,186279,203,243,272188,184453,207,239
11,"(814327, 1324179)","(1357359, 2702747)","(746318, 1357359)","(1324179, 2746360)",same,36,31,460356,469320,1000,1510,550209,379467,1539,971
12,"(1357359, 2702747)","(2746360, 8835484)","(1324179, 2746360)","(2702747, 8911215)",cannot decide,,,0,0,0,0,0,0,0,0
13,"(2746360, 8835484)","(8911215, 10512026)","(2702747, 8911215)","(8835484, 10590287)",same,174,14,1157239,1237948,1228,1488,2259831,135356,2574,142
14,"(8911215, 10512026)","(10590287, 12218875)","(8835484, 10590287)","(10512026, 12269238)",same,117,48,753145,1604186,384,1555,1698795,658536,1111,828
15,"(10590287, 12218875)","(12269238, 12950011)","(10512026, 12269238)","(12218875, 13005933)",same,70,15,421830,631703,317,460,928510,125023,692,85
16,"(12269238, 12950011)","(13005933, 13884604)","(12218875, 13005933)","(12950011, 13942278)",cannot decide,,,0,0,0,0,0,0,0,0
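The out-of-order row can be found programmatically. The following is a minimal sketch, assuming the CSV layout shown above (an unnamed index column followed by block columns holding `(start, end)` tuples as text); `find_wraparound_rows` and `SAMPLE` are hypothetical names used only for illustration and are not part of MethPhaser.

```python
import ast
import csv
import io

# Two rows copied from the listing above, truncated to the first six columns.
SAMPLE = '''\
,snp_phased_block_1,snp_phased_block_2,extended_phased_block_1,extended_phased_block_2,myth_phasing_relationship
0,"(248528664, 248946160)","(43042, 66277)","(248516738, 248946160)","(43042, 89283)",cannot decide
1,"(43042, 66277)","(89283, 160571)","(43042, 89283)","(66277, 165210)",cannot decide
'''


def find_wraparound_rows(csv_text):
    """Return (row_index, column_prefix) pairs where block 1 starts after block 2."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for prefix in ("snp_phased_block", "extended_phased_block"):
            # Each cell is a tuple written as text, e.g. "(43042, 66277)".
            first = ast.literal_eval(row[prefix + "_1"])
            second = ast.literal_eval(row[prefix + "_2"])
            if first[0] > second[0]:
                # The unnamed first column holds the row index.
                bad.append((row[""], prefix))
    return bad


print(find_wraparound_rows(SAMPLE))
```

In the listing above only row 0 is flagged: its first block is the last block of the chromosome and its second block is the first one.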
from methphaser.
Please try keeping only the primary alignments in the BAM file.
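What "primary only" means in terms of SAM flags can be sketched as follows. In the SAM specification, bit 0x100 marks a secondary alignment and bit 0x800 a supplementary one, so a primary record has neither bit set. `is_primary` and `filter_primary` are hypothetical helper names that work on plain SAM text lines; this is an illustration, not the filtering MethPhaser performs.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def is_primary(flag: int) -> bool:
    """Return True when neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


def filter_primary(sam_lines):
    """Keep header lines (starting with '@') and primary alignment records."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        # The FLAG value is the second tab-separated column of a SAM record.
        flag = int(line.split("\t")[1])
        if is_primary(flag):
            kept.append(line)
    return kept
```

On a real BAM file the same selection is usually made with `samtools view -b -F 0x900`, which excludes every record that has either bit set.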
from methphaser.
Dear @Fu-Yilei
The first program, meth_phaser_parallel, ran successfully. However, when I executed the second program, meth_phaser_post_processing, I encountered some issues. Could you please advise on how to resolve them?
$~/methphaser/meth_phaser_parallel \
-b hg002.sup.60x.whatshap.haplotag.bam \
-r GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
-g whatshap_hg02_60x.gtf \
-vc whatshap_hg02_60x.vcf.gz \
-o work
$ls work/
chr1 chr14_read_assignment chr19_read_assignment chr3 chr8
chr10 chr15 chr1_read_assignment chr3_read_assignment chr8_read_assignment
chr10_read_assignment chr15_read_assignment chr2 chr4 chr9
chr11 chr16 chr20 chr4_read_assignment chr9_read_assignment
chr11_read_assignment chr16_read_assignment chr20_read_assignment chr5 chrX
chr12 chr17 chr21 chr5_read_assignment chrX_read_assignment
chr12_read_assignment chr17_read_assignment chr21_read_assignment chr6 chrY
chr13 chr18 chr22 chr6_read_assignment chrY_read_assignment
chr13_read_assignment chr18_read_assignment chr22_read_assignment chr7
chr14 chr19 chr2_read_assignment chr7_read_assignment
The program meth_phaser_post_processing failed with TypeError: "replace() argument 2 must be str, not int". Could you please advise which file I need to check?
~/methphaser/meth_phaser_post_processing \
-ib hg002.sup.60x.whatshap.haplotag.primary.bam \
-if work \
-ov output.vcf \
-ob output_ob \
-vc /whatshap_hg02_60x.vcf.gz \
-t 8
0%| | 0/24 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/jyunhong104/methphaser/meth_phaser_post_processing", line 697, in <module>
main(sys.argv[1:])
File "/home/jyunhong104/methphaser/meth_phaser_post_processing", line 651, in main
final_block_dict, remaining_dict, flipping_dict = get_altered_vcf(
File "/home/jyunhong104/methphaser/meth_phaser_post_processing", line 306, in get_altered_vcf
split_rec[-1] = split_rec[-1].replace(start_loc, get_altered_block_start_loc(current_chrom_final_block, int(start_loc))) # type: ignore
TypeError: replace() argument 2 must be str, not int
Thanks
JH
from methphaser.
Which version of MethPhaser are you using?
from methphaser.
Please try the newest MethPhaser (V0.0.3, also available on conda). This issue is probably related to PS tags in VCF files, and it was fixed in this version. If you are already using this version and still see the bug, could you please also share the VCF file with me?
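The error message itself says that `str.replace()` received an integer as its second argument, so the new block start has to be converted to text before it is written into the record. The sketch below illustrates that on a VCF sample column; `set_phase_set` is a hypothetical helper, and this is not the fix shipped in V0.0.3, which is not shown in this thread. It rewrites only the PS sub-field instead of replacing a substring anywhere in the column.

```python
def set_phase_set(format_field: str, sample_field: str, new_ps: int) -> str:
    """Return the sample column with its PS value replaced by new_ps.

    format_field is the VCF FORMAT column (e.g. "GT:GQ:DP:AF:PS") and
    sample_field is the matching sample column.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "PS" not in keys:
        return sample_field  # unphased record, nothing to rewrite
    idx = keys.index("PS")
    if idx >= len(values):
        return sample_field  # trailing fields were dropped in this record
    values[idx] = str(new_ps)  # strings only: an int here is what raises TypeError
    return ":".join(values)


print(set_phase_set("GT:GQ:DP:AF:PS", "0|1:6:78:0.7692:43042", 597817))
```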
from methphaser.
The issue "TypeError: replace() argument 2 must be str, not int" is specific to the GitLab version I am using. When I re-cloned the GitHub version, a different error occurred.
github version
git clone https://github.com/treangenlab/methphaser.git
Cloning into 'methphaser'...
remote: Enumerating objects: 70, done.
remote: Counting objects: 100% (70/70), done.
remote: Compressing objects: 100% (63/63), done.
remote: Total 70 (delta 34), reused 23 (delta 7), pack-reused 0
Unpacking objects: 100% (70/70), 48.88 KiB | 1.11 MiB/s, done.
$ time ~/github_v/methphaser/meth_phaser_post_processing -ib hg002.sup.60x.whatshap.haplotag.primary.bam -if work/ -ov output.vcf -ob output_ob -vc ~/test_r1041_dirty/whatshap/whatshap_hg02_60x.vcf.gz -t 8
33%|███████████████████████████████ | 8/24 [00:05<00:10, 1.58it/s]
Traceback (most recent call last):
File "/home/jyunhong104/github_v/methphaser/meth_phaser_post_processing", line 701, in <module>
main(sys.argv[1:])
File "/home/jyunhong104/github_v/methphaser/meth_phaser_post_processing", line 655, in main
final_block_dict, remaining_dict, flipping_dict = get_altered_vcf(
File "/home/jyunhong104/github_v/methphaser/meth_phaser_post_processing", line 287, in get_altered_vcf
called_vcf = called_vcf_file.fetch(chrom, i[0], i[1])
File "pysam/libcbcf.pyx", line 4468, in pysam.libcbcf.VariantFile.fetch
File "pysam/libchtslib.pyx", line 688, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (248516738) > stop (43042)
gitlab version
git clone https://gitlab.com/treangenlab/methphaser.git
Cloning into 'methphaser'...
remote: Enumerating objects: 68, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 68 (delta 3), reused 0 (delta 0), pack-reused 59
Unpacking objects: 100% (68/68), 212.44 MiB | 11.67 MiB/s, done.
$ time ~/gitlab_v/methphaser/meth_phaser_post_processing -ib hg002.sup.60x.whatshap.haplotag.primary.bam -if work/ -ov output.vcf -ob output_ob -vc ~/test_r1041_dirty/whatshap/whatshap_hg02_60x.vcf.gz -t 8
0%| | 0/24 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/jyunhong104/gitlab_v/methphaser/meth_phaser_post_processing", line 697, in <module>
main(sys.argv[1:])
File "/home/jyunhong104/gitlab_v/methphaser/meth_phaser_post_processing", line 651, in main
final_block_dict, remaining_dict, flipping_dict = get_altered_vcf(
File "/home/jyunhong104/gitlab_v/methphaser/meth_phaser_post_processing", line 306, in get_altered_vcf
split_rec[-1] = split_rec[-1].replace(start_loc, get_altered_block_start_loc(current_chrom_final_block, int(start_loc))) # type: ignore
TypeError: replace() argument 2 must be str, not int
less ~/test_r1041_dirty/whatshap/whatshap_hg02_60x.vcf.gz
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled genotype likelihoods rounded to the closest integer">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Estimated allele frequency in the range of [0,1]">
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
##contig=<ID=chrM,length=16569>
##contig=<ID=chr1_KI270706v1_random,length=175055>
##contig=<ID=chr1_KI270707v1_random,length=32032>
##contig=<ID=chr1_KI270708v1_random,length=127682>
##contig=<ID=chr1_KI270709v1_random,length=66860>
##contig=<ID=chr1_KI270710v1_random,length=40176>
##contig=<ID=chr1_KI270711v1_random,length=42210>
##contig=<ID=chr1_KI270712v1_random,length=176043>
##contig=<ID=chr1_KI270713v1_random,length=40745>
##contig=<ID=chr1_KI270714v1_random,length=41717>
##contig=<ID=chr2_KI270715v1_random,length=161471>
##contig=<ID=chr2_KI270716v1_random,length=153799>
##contig=<ID=chr3_GL000221v1_random,length=155397>
##contig=<ID=chr4_GL000008v2_random,length=209709>
##contig=<ID=chr5_GL000208v1_random,length=92689>
##contig=<ID=chr9_KI270717v1_random,length=40062>
##contig=<ID=chr9_KI270718v1_random,length=38054>
##contig=<ID=chr9_KI270719v1_random,length=176845>
##contig=<ID=chr9_KI270720v1_random,length=39050>
##contig=<ID=chr11_KI270721v1_random,length=100316>
##contig=<ID=chr14_GL000009v2_random,length=201709>
##contig=<ID=chr14_GL000225v1_random,length=211173>
##contig=<ID=chr14_KI270722v1_random,length=194050>
##contig=<ID=chr14_GL000194v1_random,length=191469>
##contig=<ID=chr14_KI270723v1_random,length=38115>
##contig=<ID=chr14_KI270724v1_random,length=39555>
##contig=<ID=chr14_KI270725v1_random,length=172810>
##contig=<ID=chr14_KI270726v1_random,length=43739>
##contig=<ID=chr15_KI270727v1_random,length=448248>
##contig=<ID=chr16_KI270728v1_random,length=1872759>
##contig=<ID=chr17_GL000205v2_random,length=185591>
##contig=<ID=chr17_KI270729v1_random,length=280839>
##contig=<ID=chr17_KI270730v1_random,length=112551>
##contig=<ID=chr22_KI270731v1_random,length=150754>
##contig=<ID=chr22_KI270732v1_random,length=41543>
##contig=<ID=chr22_KI270733v1_random,length=179772>
##contig=<ID=chr22_KI270734v1_random,length=165050>
##contig=<ID=chr22_KI270735v1_random,length=42811>
##contig=<ID=chr22_KI270736v1_random,length=181920>
##contig=<ID=chr22_KI270737v1_random,length=103838>
##contig=<ID=chr22_KI270738v1_random,length=99375>
##contig=<ID=chr22_KI270739v1_random,length=73985>
##contig=<ID=chrY_KI270740v1_random,length=37240>
##contig=<ID=chrUn_KI270302v1,length=2274>
##contig=<ID=chrUn_KI270304v1,length=2165>
##contig=<ID=chrUn_KI270303v1,length=1942>
##contig=<ID=chrUn_KI270305v1,length=1472>
##contig=<ID=chrUn_KI270322v1,length=21476>
##contig=<ID=chrUn_KI270320v1,length=4416>
##contig=<ID=chrUn_KI270310v1,length=1201>
##contig=<ID=chrUn_KI270316v1,length=1444>
##contig=<ID=chrUn_KI270315v1,length=2276>
##contig=<ID=chrUn_KI270312v1,length=998>
##contig=<ID=chrUn_KI270311v1,length=12399>
##contig=<ID=chrUn_KI270317v1,length=37690>
##contig=<ID=chrUn_KI270412v1,length=1179>
##contig=<ID=chrUn_KI270411v1,length=2646>
##contig=<ID=chrUn_KI270414v1,length=2489>
##contig=<ID=chrUn_KI270419v1,length=1029>
##contig=<ID=chrUn_KI270418v1,length=2145>
##contig=<ID=chrUn_KI270420v1,length=2321>
##contig=<ID=chrUn_KI270424v1,length=2140>
##contig=<ID=chrUn_KI270417v1,length=2043>
##contig=<ID=chrUn_KI270422v1,length=1445>
##contig=<ID=chrUn_KI270423v1,length=981>
##contig=<ID=chrUn_KI270425v1,length=1884>
##contig=<ID=chrUn_KI270429v1,length=1361>
##contig=<ID=chrUn_KI270442v1,length=392061>
##contig=<ID=chrUn_KI270466v1,length=1233>
##contig=<ID=chrUn_KI270465v1,length=1774>
##contig=<ID=chrUn_KI270467v1,length=3920>
##contig=<ID=chrUn_KI270435v1,length=92983>
##contig=<ID=chrUn_KI270438v1,length=112505>
##contig=<ID=chrUn_KI270468v1,length=4055>
##contig=<ID=chrUn_KI270510v1,length=2415>
##contig=<ID=chrUn_KI270509v1,length=2318>
##contig=<ID=chrUn_KI270518v1,length=2186>
##contig=<ID=chrUn_KI270508v1,length=1951>
##contig=<ID=chrUn_KI270516v1,length=1300>
##contig=<ID=chrUn_KI270512v1,length=22689>
##contig=<ID=chrUn_KI270519v1,length=138126>
##contig=<ID=chrUn_KI270522v1,length=5674>
##contig=<ID=chrUn_KI270511v1,length=8127>
##contig=<ID=chrUn_KI270515v1,length=6361>
##contig=<ID=chrUn_KI270507v1,length=5353>
##contig=<ID=chrUn_KI270517v1,length=3253>
##contig=<ID=chrUn_KI270529v1,length=1899>
##contig=<ID=chrUn_KI270528v1,length=2983>
##contig=<ID=chrUn_KI270530v1,length=2168>
##contig=<ID=chrUn_KI270539v1,length=993>
##contig=<ID=chrUn_KI270538v1,length=91309>
##contig=<ID=chrUn_KI270544v1,length=1202>
##contig=<ID=chrUn_KI270548v1,length=1599>
##contig=<ID=chrUn_KI270583v1,length=1400>
##contig=<ID=chrUn_KI270587v1,length=2969>
##contig=<ID=chrUn_KI270580v1,length=1553>
##contig=<ID=chrUn_KI270581v1,length=7046>
##contig=<ID=chrUn_KI270579v1,length=31033>
##contig=<ID=chrUn_KI270589v1,length=44474>
##contig=<ID=chrUn_KI270590v1,length=4685>
##contig=<ID=chrUn_KI270584v1,length=4513>
##contig=<ID=chrUn_KI270582v1,length=6504>
##contig=<ID=chrUn_KI270588v1,length=6158>
##contig=<ID=chrUn_KI270593v1,length=3041>
##contig=<ID=chrUn_KI270591v1,length=5796>
##contig=<ID=chrUn_KI270330v1,length=1652>
##contig=<ID=chrUn_KI270329v1,length=1040>
##contig=<ID=chrUn_KI270334v1,length=1368>
##contig=<ID=chrUn_KI270333v1,length=2699>
##contig=<ID=chrUn_KI270335v1,length=1048>
##contig=<ID=chrUn_KI270338v1,length=1428>
##contig=<ID=chrUn_KI270340v1,length=1428>
##contig=<ID=chrUn_KI270336v1,length=1026>
##contig=<ID=chrUn_KI270337v1,length=1121>
##contig=<ID=chrUn_KI270363v1,length=1803>
##contig=<ID=chrUn_KI270364v1,length=2855>
##contig=<ID=chrUn_KI270362v1,length=3530>
##contig=<ID=chrUn_KI270366v1,length=8320>
##contig=<ID=chrUn_KI270378v1,length=1048>
##contig=<ID=chrUn_KI270379v1,length=1045>
##contig=<ID=chrUn_KI270389v1,length=1298>
##contig=<ID=chrUn_KI270390v1,length=2387>
##contig=<ID=chrUn_KI270387v1,length=1537>
##contig=<ID=chrUn_KI270395v1,length=1143>
##contig=<ID=chrUn_KI270396v1,length=1880>
##contig=<ID=chrUn_KI270388v1,length=1216>
##contig=<ID=chrUn_KI270394v1,length=970>
##contig=<ID=chrUn_KI270386v1,length=1788>
##contig=<ID=chrUn_KI270391v1,length=1484>
##contig=<ID=chrUn_KI270383v1,length=1750>
##contig=<ID=chrUn_KI270393v1,length=1308>
##contig=<ID=chrUn_KI270384v1,length=1658>
##contig=<ID=chrUn_KI270392v1,length=971>
##contig=<ID=chrUn_KI270381v1,length=1930>
##contig=<ID=chrUn_KI270385v1,length=990>
##contig=<ID=chrUn_KI270382v1,length=4215>
##contig=<ID=chrUn_KI270376v1,length=1136>
##contig=<ID=chrUn_KI270374v1,length=2656>
##contig=<ID=chrUn_KI270372v1,length=1650>
##contig=<ID=chrUn_KI270373v1,length=1451>
##contig=<ID=chrUn_KI270375v1,length=2378>
##contig=<ID=chrUn_KI270371v1,length=2805>
##contig=<ID=chrUn_KI270448v1,length=7992>
##contig=<ID=chrUn_KI270521v1,length=7642>
##contig=<ID=chrUn_GL000195v1,length=182896>
##contig=<ID=chrUn_GL000219v1,length=179198>
##contig=<ID=chrUn_GL000220v1,length=161802>
##contig=<ID=chrUn_GL000224v1,length=179693>
##contig=<ID=chrUn_KI270741v1,length=157432>
##contig=<ID=chrUn_GL000226v1,length=15008>
##contig=<ID=chrUn_GL000213v1,length=164239>
##contig=<ID=chrUn_KI270743v1,length=210658>
##contig=<ID=chrUn_KI270744v1,length=168472>
##contig=<ID=chrUn_KI270745v1,length=41891>
##contig=<ID=chrUn_KI270746v1,length=66486>
##contig=<ID=chrUn_KI270747v1,length=198735>
##contig=<ID=chrUn_KI270748v1,length=93321>
##contig=<ID=chrUn_KI270749v1,length=158759>
##contig=<ID=chrUn_KI270750v1,length=148850>
##contig=<ID=chrUn_KI270751v1,length=150742>
##contig=<ID=chrUn_KI270752v1,length=27745>
##contig=<ID=chrUn_KI270753v1,length=62944>
##contig=<ID=chrUn_KI270754v1,length=40191>
##contig=<ID=chrUn_KI270755v1,length=36723>
##contig=<ID=chrUn_KI270756v1,length=79590>
##contig=<ID=chrUn_KI270757v1,length=71251>
##contig=<ID=chrUn_GL000214v1,length=137718>
##contig=<ID=chrUn_KI270742v1,length=186739>
##contig=<ID=chrUn_GL000216v2,length=176608>
##contig=<ID=chrUn_GL000218v1,length=161147>
##contig=<ID=chrEBV,length=171823>
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phase set identifier">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Observed allele depths">
##commandline="(whatshap 1.7) phase --ignore-read-groups -r GRCh38_no_alt.fa Sample.wf_snp.only.vcf ../Sample.sup.60x.bam -o whatshap_hg02_60x.vcf"
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
chr1 43042 . G A 6.98 PASS F GT:GQ:DP:AF:PS 0|1:6:78:0.7692:43042
chr1 43586 . C A 8.71 PASS F GT:GQ:DP:AF:PS 0|1:8:76:0.7763:43042
chr1 43666 . C A 8.69 PASS F GT:GQ:DP:AF:PS 0|1:8:76:0.7368:43042
chr1 44690 . C T 3.88 PASS F GT:GQ:DP:AF:PS 0|1:3:69:0.7246:43042
chr1 45627 . G A 8.34 PASS F GT:GQ:DP:AF:PS 0|1:8:63:0.746:43042
chr1 45891 . T A 6.94 PASS F GT:GQ:DP:AF:PS 0|1:6:62:0.7581:43042
chr1 46095 . C G 4.19 PASS F GT:GQ:DP:AF:PS 0/1:4:61:0.7213:43042
chr1 47661 . T C 13.31 PASS F GT:GQ:DP:AF:PS 0|1:13:51:0.6863:43042
chr1 47696 . T C 11.35 PASS F GT:GQ:DP:AF:PS 0|1:11:51:0.6863:43042
chr1 47699 . A G 10.23 PASS F GT:GQ:DP:AF:PS 0|1:10:51:0.6863:43042
chr1 47706 . A T 9.42 PASS F GT:GQ:DP:AF:PS 0|1:9:51:0.6863:43042
chr1 47761 . T C 10.27 PASS F GT:GQ:DP:AF:PS 0|1:10:49:0.6735:43042
chr1 47930 . T C 13.94 PASS F GT:GQ:DP:AF:PS 0|1:13:49:0.6735:43042
chr1 48171 . A G 11.7 PASS F GT:GQ:DP:AF:PS 0|1:11:49:0.6531:43042
chr1 48183 . C A 10.83 PASS F GT:GQ:DP:AF:PS 1|0:10:49:0.2653:43042
chr1 48937 . T C 15.84 PASS F GT:GQ:DP:AF:PS 1|0:15:44:0.2955:43042
chr1 48976 . A G 11.93 PASS F GT:GQ:DP:AF:PS 1|0:11:44:0.2955:43042
chr1 49148 . A T 13.15 PASS F GT:GQ:DP:AF:PS 1|0:13:43:0.3023:43042
chr1 49243 . G A 24.57 PASS P GT:GQ:DP:AF:PS 0|1:24:43:0.6512:43042
chr1 49291 . C T 12.57 PASS F GT:GQ:DP:AF:PS 0|1:12:43:0.6744:43042
chr1 49314 . G A 12.23 PASS F GT:GQ:DP:AF:PS 0|1:12:43:0.6512:43042
chr1 49315 . T C 26.9 PASS P GT:GQ:DP:AF:PS 0|1:26:43:0.6512:43042
chr1 49342 . G T 15.13 PASS F GT:GQ:DP:AF:PS 0|1:15:43:0.6279:43042
chr1 49363 . C T 24.43 PASS P GT:GQ:DP:AF:PS 0|1:24:43:0.6512:43042
chr1 49404 . C T 26.11 PASS P GT:GQ:DP:AF:PS 0|1:26:43:0.6279:43042
chr1 49427 . C T 24.18 PASS P GT:GQ:DP:AF:PS 0|1:24:43:0.6512:43042
chr1 49482 . G A 5.66 PASS F GT:GQ:DP:AF:PS 0|1:5:43:0.4884:43042
chr1 49515 . G A 5.31 PASS F GT:GQ:DP:AF:PS 0|1:5:42:0.7381:43042
chr1 51403 . A T 14.27 PASS F GT:GQ:DP:AF:PS 1|0:14:33:0.303:43042
chr1 51459 . G A 24.24 PASS P GT:GQ:DP:AF:PS 0|1:24:33:0.6667:43042
chr1 51499 . G A 10.56 PASS F GT:GQ:DP:AF:PS 1|0:10:33:0.303:43042
chr1 51620 . A G 5.21 PASS F GT:GQ:DP:AF:PS 1|0:5:33:0.2424:43042
chr1 51802 . C T 8.03 PASS F GT:GQ:DP:AF:PS 0|1:8:33:0.6364:43042
chr1 51806 . A G 9.17 PASS F GT:GQ:DP:AF:PS 0|1:9:33:0.6364:43042
chr1 51902 . C T 7.17 PASS F GT:GQ:DP:AF:PS 1|0:7:32:0.2812:43042
chr1 51936 . G T 9.73 PASS F GT:GQ:DP:AF:PS 1|0:9:32:0.2812:43042
chr1 51941 . A G 10.77 PASS F GT:GQ:DP:AF:PS 1|0:10:32:0.2812:43042
chr1 51951 . T C 9.02 PASS F GT:GQ:DP:AF:PS 0|1:9:32:0.6562:43042
chr1 52105 . A T 11.99 PASS F GT:GQ:DP:AF:PS 0|1:11:31:0.6452:43042
chr1 52144 . T A 13.86 PASS F GT:GQ:DP:AF:PS 1|0:13:31:0.2903:43042
chr1 52525 . T C 13.11 PASS F GT:GQ:DP:AF:PS 1|0:13:31:0.2903:43042
chr1 52863 . C A 11.99 PASS F GT:GQ:DP:AF:PS 1|0:11:29:0.2759:43042
chr1 53009 . C T 11.05 PASS F GT:GQ:DP:AF:PS 0|1:11:29:0.6552:43042
chr1 53428 . T C 8.17 PASS F GT:GQ:DP:AF:PS 1|0:8:27:0.1852:43042
chr1 53789 . A C 27.99 PASS P GT:GQ:DP:AF:PS 0|1:27:23:0.6087:43042
chr1 53817 . A G 12.7 PASS F GT:GQ:DP:AF:PS 0|1:12:23:0.6522:43042
chr1 53958 . A G 13.11 PASS F GT:GQ:DP:AF:PS 0|1:13:22:0.6364:43042
chr1 54178 . T C 24.35 PASS P GT:GQ:DP:AF:PS 0|1:24:22:0.6364:43042
chr1 54222 . A C 14.25 PASS F GT:GQ:DP:AF:PS 1|0:14:22:0.2727:43042
chr1 54290 . C G 7.36 PASS F GT:GQ:DP:AF:PS 1|0:7:22:0.1818:43042
chr1 54395 . C T 14.67 PASS F GT:GQ:DP:AF:PS 0|1:14:22:0.6364:43042
chr1 55038 . C G 7.15 PASS F GT:GQ:DP:AF:PS 1|0:7:18:0.1667:43042
chr1 55388 . C A 10.79 PASS F GT:GQ:DP:AF:PS 0|1:10:18:0.7222:43042
from methphaser.
I might need more information to solve this bug. Could you please also share the /work folder and your subsampled BAM file (or the command that generates it, if the file is too large) by email to [email protected]? From what I can see in this VCF file, only one phase block (43042) seems to exist, which might be what causes this bug. I am also unsure how MethPhaser behaves when only one phase block exists, so I might dig into that a little further.
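One way to check how many phase sets a VCF really contains is to count the distinct PS values per chromosome over the whole file rather than over an excerpt. This is a minimal sketch, assuming a single-sample VCF whose columns are separated by whitespace and contain no embedded spaces; `count_phase_sets` is a hypothetical name.

```python
def count_phase_sets(vcf_lines):
    """Map (chrom, PS) to the number of records carrying that phase set."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split()
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "PS" not in keys or keys.index("PS") >= len(values):
            continue  # record carries no phase set
        ps = values[keys.index("PS")]
        if ps != ".":
            key = (fields[0], ps)
            counts[key] = counts.get(key, 0) + 1
    return counts


records = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample",
    "chr1 43042 . G A 6.98 PASS F GT:GQ:DP:AF:PS 0|1:6:78:0.7692:43042",
    "chr1 43586 . C A 8.71 PASS F GT:GQ:DP:AF:PS 0|1:8:76:0.7763:43042",
]
print(count_phase_sets(records))
```

For a compressed file, the lines can be supplied with `gzip.open(path, "rt")`, since a bgzip-compressed VCF is readable as gzip.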
from methphaser.
ls work/
chr1 chr13 chr16_read_assignment chr1_read_assignment chr22_read_assignment chr5_read_assignment chr9
chr10 chr13_read_assignment chr17 chr2 chr2_read_assignment chr6 chr9_read_assignment
chr10_read_assignment chr14 chr17_read_assignment chr20 chr3 chr6_read_assignment chrX
chr11 chr14_read_assignment chr18 chr20_read_assignment chr3_read_assignment chr7 chrX_read_assignment
chr11_read_assignment chr15 chr18_read_assignment chr21 chr4 chr7_read_assignment chrY
chr12 chr15_read_assignment chr19 chr21_read_assignment chr4_read_assignment chr8 chrY_read_assignment
chr12_read_assignment chr16 chr19_read_assignment chr22 chr5 chr8_read_assignment
$ ls work/chr1
0_199.csv 0_26.csv 100_126.csv 125_151.csv 150_176.csv 175_199.csv 25_51.csv 50_76.csv 75_101.csv
$ ls work/chr1_read_assignment/
0_43042_89283.csv 130_124757385_124786140.csv 161_188238880_188542279.csv 191_233068669_243367177.csv 4_296746_399996.csv 74_104255071_104557469.csv
100_123193985_123299246.csv 131_124782456_125132469.csv 16_12950011_13942278.csv 19_18039368_20279975.csv 43_51480627_53566222.csv 7_446106_470234.csv
101_123278074_123394796.csv 132_125130164_125184526.csv 162_188503811_190260557.csv 192_243301759_243470371.csv 44_53514369_53643837.csv 75_104511025_105545352.csv
102_123339421_123404555.csv 133_143185165_143384675.csv 163_190198144_190482285.csv 193_243395483_243621683.csv 45_53572984_60000160.csv 76_105504443_107169460.csv
103_123395199_123498012.csv 134_143342014_144335200.csv 164_190416420_192156066.csv 194_243571178_243702213.csv 46_59925583_65189420.csv 77_107085075_109669912.csv
104_123476366_123548544.csv 135_144183106_144772613.csv 165_192112629_193865185.csv 195_243644148_246823403.csv 47_65120314_67057639.csv 78_109624961_110003350.csv
105_123531735_123601209.csv 136_144651265_146316156.csv 166_193809530_195029697.csv 196_246817660_248253504.csv 48_66987352_67183813.csv 79_109952758_111644441.csv
106_123583280_123792587.csv 137_146234833_147228133.csv 1_66277_165210.csv 197_248179535_248528664.csv 49_67131127_68514106.csv 80_111587266_111901264.csv
107_123787930_123820206.csv 138_147161089_149495322.csv 167_194945545_195119852.csv 198_248516738_248946160.csv 50_68443430_68703258.csv 81_111720041_113761186.csv
10_746318_1357359.csv 13_8835484_10590287.csv 168_195051805_196833767.csv 20_20181920_24762365.csv 51_68555167_72346221.csv 82_113698624_116880820.csv
108_123807462_123834808.csv 139_149354637_149897254.csv 169_196761270_198499033.csv 21_24673789_27877387.csv 52_72300402_73832178.csv 83_116797059_119977355.csv
109_123820702_123842020.csv 140_149824518_151244853.csv 170_198430131_202989384.csv 2_160571_257733.csv 5_370458_406767.csv 84_119905042_121779871.csv
110_123839080_123886578.csv 14_10512026_12269238.csv 171_202875529_208636682.csv 22_27743697_29555950.csv 53_73769029_74392171.csv 8_464109_597817.csv
111_123876136_123914672.csv 141_151140967_151333747.csv 17_13884604_16849384.csv 23_29549390_31889339.csv 54_74336707_75824232.csv 85_121773858_122028748.csv
11_1324179_2746360.csv 142_151260631_151556923.csv 172_208530725_208919111.csv 24_31817122_32450544.csv 55_75764840_76011522.csv 86_121910542_122042222.csv
112_123886950_123926748.csv 143_151501721_153025493.csv 173_208858643_210083595.csv 25_32409679_33978177.csv 56_75874529_77082146.csv 87_122039145_122092834.csv
113_123914910_123930157.csv 144_152968833_156494753.csv 174_210029508_210319398.csv 26_33928305_34072879.csv 57_77032002_78409421.csv 88_122044927_122240382.csv
114_123927995_124086759.csv 145_156424208_156838145.csv 175_210244406_212588636.csv 27_33987928_35170626.csv 58_78355654_79218443.csv 89_122096238_122404351.csv
115_124086721_124107676.csv 146_156789853_159985145.csv 176_211630419_212806752.csv 28_35120226_35810601.csv 59_79116922_79507349.csv 90_122389435_122413193.csv
116_124103903_124145909.csv 147_159912287_160608434.csv 177_212588647_213740703.csv 29_35719173_35941550.csv 60_79319643_79578361.csv 91_122404837_122437836.csv
117_124139448_124204759.csv 148_160546472_161425952.csv 178_212806783_214238055.csv 30_35910532_41249565.csv 61_79514507_79636120.csv 92_122413354_122509171.csv
118_124171436_124371073.csv 149_161304521_161579393.csv 179_213740985_214490934.csv 31_41182743_44865590.csv 62_79585873_80220929.csv 93_122439936_122666360.csv
119_124344922_124434468.csv 150_161523828_162844759.csv 180_214240846_214679740.csv 3_180261_368589.csv 63_79892346_83485778.csv 94_122657964_122671949.csv
120_124432735_124450328.csv 151_162651606_164276523.csv 181_214490947_214927380.csv 32_44816619_47386206.csv 6_402079_464054.csv 95_122668848_122798097.csv
121_124435600_124457911.csv 15_12218875_13005933.csv 18_16798943_18096829.csv 33_47326102_48055478.csv 64_83373627_86930791.csv 9_531222_814327.csv
122_124451497_124670206.csv 152_164226936_164477975.csv 182_214711597_216590930.csv 34_47958326_48750699.csv 65_86870226_88885395.csv 96_122797822_122818198.csv
12_2702747_8911215.csv 153_164360718_165571654.csv 183_216525502_218356111.csv 35_48697255_48964070.csv 66_88838102_89621621.csv 97_122799373_122844384.csv
123_124643525_124685650.csv 154_165504948_169736216.csv 184_218301288_222689485.csv 36_48893675_49063809.csv 67_89561597_93135530.csv 98_122818575_123022962.csv
124_124672342_124690411.csv 155_169682691_170163441.csv 185_222595897_223333419.csv 37_48977976_49359434.csv 68_93039833_93266848.csv 99_123004093_123198659.csv
125_124686763_124714343.csv 156_170084946_177943283.csv 186_223291874_223609492.csv 38_49302608_49528668.csv 69_93203819_93344870.csv
126_124690983_124716775.csv 157_177898675_181691253.csv 187_223557837_226866167.csv 39_49425939_49696255.csv 70_93298414_93647846.csv
127_124715380_124722020.csv 158_181617029_185379171.csv 188_226795561_228608871.csv 40_49623147_50126929.csv 71_93434356_97946313.csv
128_124717199_124727438.csv 159_185320789_185661979.csv 189_228558244_231412905.csv 41_50061567_51329032.csv 72_97764038_102772534.csv
129_124723292_124761588.csv 160_185598521_188281070.csv 190_231327455_233144051.csv 42_51219386_51531037.csv 73_102713485_104324697.csv
The original file can be downloaded from the Amazon Web Services S3 bucket.
s3://ont-open-data/giab_lsk114_2022.12/analysis/wf-human-var-output/hg002_sup_v4
Here is a small portion of the file I am using.
subsample.zip
whatshap_hg02_60x.chr1.vcf.gz
from methphaser.
Hi,
I am using ONT R10.4.1 provided by EPI2ME ( https://labs.epi2me.io/askenazi-kit14-2022-12/ ) and testing with whatshap. However, I encountered an error during the process. How can I resolve it? Here are the commands I used.
$ whatshap --version
1.7
whatshap phase --ignore-read-groups --indels \
-r GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
hg002.wf_snp.vcf.gz \
hg002.sup.60x.bam \
-o whatshap_hg02_60x.vcf
bgzip -c whatshap_hg02_60x.vcf > whatshap_hg02_60x.vcf.gz
tabix -p vcf whatshap_hg02_60x.vcf.gz
whatshap haplotag --ignore-read-groups \
whatshap_hg02_60x.vcf.gz hg002.sup.60x.bam \
-r GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
-o hg002.sup.60x.whatshap.haplotag.bam
samtools index -@ 24 hg002.sup.60x.whatshap.haplotag.bam
whatshap stats --gtf whatshap_hg02_60x.gtf whatshap_hg02_60x.vcf.gz
~/methphaser/meth_phaser_parallel \
-b hg002.sup.60x.whatshap.haplotag.bam \
-r GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
-g whatshap_hg02_60x.gtf \
-vc whatshap_hg02_60x.vcf.gz \
-o work
[E::bam_parse_basemod] MM tag refers to bases beyond sequence length
Traceback (most recent call last):
File "/home/jyunhong104/methphaser/methphasing", line 1471, in <module>
main(sys.argv[1:])
File "/home/jyunhong104/methphaser/methphasing", line 1437, in main
) = get_assignment_max(
File "/home/jyunhong104/methphaser/methphasing", line 898, in get_assignment_max
base_modification_list = get_base_modification_dictionary( # build the dictionary with snp phased reads
File "/home/jyunhong104/methphaser/methphasing", line 239, in get_base_modification_dictionary
if methylation_identifier_0 in list(mm.keys()):
AttributeError: 'NoneType' object has no attribute 'keys'
[W::bam_next_basemod] MM tag refers to bases beyond sequence length
Traceback (most recent call last):
File "/home/jyunhong104/methphaser/methphasing", line 1471, in <module>
main(sys.argv[1:])
File "/home/jyunhong104/methphaser/methphasing", line 1437, in main
) = get_assignment_max(
File "/home/jyunhong104/methphaser/methphasing", line 898, in get_assignment_max
base_modification_list = get_base_modification_dictionary( # build the dictionary with snp phased reads
File "/home/jyunhong104/methphaser/methphasing", line 239, in get_base_modification_dictionary
if methylation_identifier_0 in list(mm.keys()):
AttributeError: 'NoneType' object has no attribute 'keys'
[W::bam_next_basemod] MM tag refers to bases beyond sequence length
Traceback (most recent call last):
File "/home/jyunhong104/methphaser/methphasing", line 1471, in <module>
main(sys.argv[1:])
File "/home/jyunhong104/methphaser/methphasing", line 1437, in main
) = get_assignment_max(
File "/home/jyunhong104/methphaser/methphasing", line 898, in get_assignment_max
base_modification_list = get_base_modification_dictionary( # build the dictionary with snp phased reads
File "/home/jyunhong104/methphaser/methphasing", line 239, in get_base_modification_dictionary
if methylation_identifier_0 in list(mm.keys()):
AttributeError: 'NoneType' object has no attribute 'keys'
...
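The AttributeError means that `mm` was `None` when line 239 called `mm.keys()`, which coincides with the htslib message about an MM tag that could not be parsed. A defensive check could look like the sketch below. It assumes `mm` is either `None` or a dict-like mapping keyed by a modification identifier; the tuple identifier in the example is an assumption, and `has_modification` is a hypothetical helper, not MethPhaser's code.

```python
def has_modification(mm, identifier) -> bool:
    """Return True only when mm is a mapping that contains identifier."""
    # A read whose MM tag could not be parsed is treated as carrying no calls.
    return mm is not None and identifier in mm


cpg = ("C", 0, "m")  # assumed identifier form, for illustration only
print(has_modification(None, cpg))                # False
print(has_modification({cpg: [(10, 255)]}, cpg))  # True
```

A guard like this only skips the affected reads; it does not repair the MM tags themselves.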
Thanks
Hello, where did you obtain the hg002.sup.60x.bam you used in the first step?
I couldn't find it on EPI2ME ( https://labs.epi2me.io/askenazi-kit14-2022-12/ ).
Thanks.
from methphaser.
Hi @QianZixi,
EPI2ME only provides the "hg002.pass.cram" file; "hg002.sup.60x.bam" is a test file that I generated myself by downsampling.
The original file can be downloaded from the Amazon Web Services S3 bucket.
s3://ont-open-data/giab_lsk114_2022.12/analysis/wf-human-var-output/hg002_sup_v4/hg002.pass.cram
from methphaser.
Could you please also try removing just the first line?
from methphaser.
Hi @twolinin,
Thanks for your help.
Now I have processed the data in the same way as you and encountered the same error.
May I ask how you resolved this error?
AttributeError: 'NoneType' object has no attribute 'keys'
Thanks!
from methphaser.
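To make the error message concrete: the traceback shows `mm.keys()` being called while `mm` is `None`. The sketch below reproduces that failure in isolation and shows one defensive pattern. How `mm` is produced inside methphasing is not shown in this thread, so the helper `has_modification` and the example key are purely hypothetical. This illustrates the failure mode and is not the project's fix.

```python
from typing import Hashable, Mapping, Optional


def has_modification(mm: Optional[Mapping], identifier: Hashable) -> bool:
    """Return True only if `mm` is a mapping that contains `identifier`."""
    if mm is None:
        # No usable modification data for this read: report "not present"
        # instead of raising AttributeError on None.keys().
        return False
    return identifier in mm


# Reproduce the failure from the traceback in isolation.
mm = None
try:
    list(mm.keys())
except AttributeError as exc:
    message = str(exc)

print(message)

# Hypothetical key, only for illustration.
example_key = ("C", 0, "m")
print(has_modification(None, example_key))
print(has_modification({example_key: [(10, 255)]}, example_key))
```

A guard like this only hides the symptom; if reads are missing modification data for a reason (such as the MM tag warning above), filtering those reads out beforehand is probably the cleaner route.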
Please use only primary alignments in the BAM. I just updated the README; it now includes a recommended command for that.
from methphaser.
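The README command is not reproduced here. As a rough illustration of what "primary alignments only" means at the SAM flag level, the following sketch checks the secondary (0x100) and supplementary (0x800) bits defined in the SAM specification. The function name `is_primary` is an assumption for this example. With samtools, the same exclusion is commonly written as `-F 0x900`, which may or may not match what the README recommends.

```python
SECONDARY = 0x100       # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag bit: supplementary alignment


def is_primary(flag: int) -> bool:
    """True if neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


# A few example flag values: plain forward, reverse strand,
# secondary, supplementary, and supplementary on the reverse strand.
for flag in (0, 16, 256, 2048, 2064):
    print(flag, is_primary(flag))
```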
I could not reproduce your issue on the chromosome 1 data. What is your pysam version?
from methphaser.
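One quick way to answer this from the environment that runs methphaser is to ask the standard library for the installed version. This is a small sketch; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("pysam:", installed_version("pysam"))
```

It prints `pysam: None` if pysam is not installed in the active environment, which is itself useful to know when several conda or virtualenv environments are in play.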
Hi @Fu-Yilei, I also encountered the same problem while running the second step.
~/methphaser/meth_phaser_post_processing -ib hg002.sup.10.whatshap.haplotag.bam -if work/ -ov output.vcf -ob output_ob -vc whatshap_hg02_10.vcf.gz -t 8
Traceback (most recent call last):
  File "/home/lixin/methphaser/meth_phaser_post_processing", line 697, in <module>
    main(sys.argv[1:])
  File "/home/lixin/methphaser/meth_phaser_post_processing", line 651, in main
    final_block_dict, remaining_dict, flipping_dict = get_altered_vcf(
  File "/home/lixin/methphaser/meth_phaser_post_processing", line 314, in get_altered_vcf
    split_rec[-1] = split_rec[-1].replace(start_loc, get_altered_block_start_loc(current_chrom_final_block, int(start_loc)))  # type: ignore
TypeError: replace() argument 2 must be str, not int
I ran the first four chromosomes separately, and this error occurred only when processing chr2.
I checked the VCFs of the four chromosomes, and the first line of the VCF for chr2 is indeed different from the others.
The attached archive is the work folder containing the first four chromosomes.
The version of methphaser I used is the latest version on GitHub.
work.zip
Please help solve this problem. Thanks!
from methphaser.
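The TypeError itself is easy to reproduce outside methphaser: `str.replace` accepts only strings, so passing an integer as the replacement fails. The sketch below uses a hypothetical stand-in, `start_loc_stub`, for `get_altered_block_start_loc`, and a made-up field value. That the real helper returns an `int` on some code path (which would explain why only chr2 fails) is a guess from the traceback, not something confirmed in this thread.

```python
def start_loc_stub(block_start: int) -> int:
    """Hypothetical stand-in for get_altered_block_start_loc; returns an int."""
    return block_start - 100


field = "0|1:597817"   # made-up last field of a VCF record
start_loc = "597817"

# Passing the int directly reproduces the error class from the traceback.
try:
    field.replace(start_loc, start_loc_stub(int(start_loc)))
except TypeError as exc:
    error_message = str(exc)

# Converting the replacement to str avoids the TypeError.
fixed = field.replace(start_loc, str(start_loc_stub(int(start_loc))))

print(error_message)
print(fixed)
```

Wrapping the return value in `str()` makes the call type-safe, but whether the resulting value is the right block start for chr2 is a separate question that only the input VCF can answer.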
Could you please also attach whatshap_hg02_10.vcf.gz?
from methphaser.
No response, closing this issue
from methphaser.