richarddurbin / pbwt Goto Github PK
View Code? Open in Web Editor NEWImplementation of Positional Burrows-Wheeler Transform for genetic data
Implementation of Positional Burrows-Wheeler Transform for genetic data
Hi,
The code from 110 to 118 of pbwtHtslib.c in function pbwtReadVcfGT may bring a bug.
if (gt_arr[i] == bcf_int32_vector_end)
x[i] = bcf_gt_allele(gt_arr[i-1]); // treat haploid genotypes as diploid homozygous A/A
if (gt_arr[i] == bcf_gt_missing)
{
x[i] = 0 ; /* use ref for now */
xMissing[i] = 1 ;
++nMissing ;
} else
x[i] = bcf_gt_allele(gt_arr[i]) ;
if gt_arr[i] equals bcf_int32_vector_end, and I guess it won't equal bcf_gt_missing, finally the statement:
x[i] = bcf_gt_allele(gt_arr[i])
will be executed. that equals x[i] = bcf_gt_allele(bcf_int32_vector_end). I guess there should be an else before if (gt_arr[i] == bcf_gt_missing).
It would be nice if a LICENSE file could be added. This would also make packaging it downstream easier.
I'm seeing output that looks like this:
3 112841 rs78923776 C G . PASS RefPanelAF=0.000153988;AN=2;AC=0;INFO=1 GT:ADS:DS:GP 0|0:0,0:0:1,0,0
3 112841 rs78923776 C T . PASS RefPanelAF=0.078534;AN=2;AC=1;INFO=1 GT:ADS:DS:GP 0|1:0,1:1:0,1,0
The first line says that my genome is CC at this position, but the second line (for the second alt allele) says that my genome is CT at this position. OK, the first line couldn't say this, so it calls me as REF/REF, but this call has to be interpreted in the context of the second call, and can't be taken at face value. That's why I wonder if this is valid VCF?
I think the line should be 'squashed' down to the following (I think):
3 112841 rs78923776 C G,T . PASS RefPanelAF=0.000153988,0.078534;AN=3;AC=0,1;INFO=1 GT:ADS:DS:GP 0|2:0,0:0:1,0,0
From a clone of the repository on github, running make gives:
$ make
cc -g -O3 -I../htslib -c -o pbwtMain.o pbwtMain.c
pbwtMain.c: In function ‘pbwtCommitHash’:
pbwtMain.c:176:12: error: ‘PBWT_COMMIT_HASH’ undeclared (first use in this function)
return PBWT_COMMIT_HASH ;
I needed to explicitly generate version.h
before the build would proceed:
$ make version.h
# ...
$ make
# builds ok
I am trying to follow the example usage that is based upon the use of macs output as input to pbwt:
macs 11000 1e6 -t 0.001 -r 0.001 > 11k.macs
pbwt -checkpoint 10000 -macs 11k.macs -write macs11k.pbwt -writeSites macs.sites
However, -macs
yields:
FATAL ERROR: unrecognised command -macs
Could you provide an update to the example usage demonstrating how to use macs output?
Hey there, I am trying to install pbwt, and have compiled htslib as instructed on the readme.
I haven't been able to compile pbwt though, I get the following errors:
`cc -g -O3 -I../htslib pbwtMain.o pbwtCore.o pbwtSample.o pbwtIO.o pbwtMatch.o pbwtImpute.o pbwtPaint.o pbwtLikelihood.o pbwtMerge.o pbwtGeneticMap.o pbwtHtslib.o hash.o dict.o array.o utils.o -lpthread ../htslib/libhts.a -lz -lm -lbz2 -llzma -o pbwt
../htslib/libhts.a(hfile_libcurl.o): In function `easy_errno':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:175: undefined reference to `curl_easy_getinfo'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:189: undefined reference to `curl_easy_getinfo'
../htslib/libhts.a(hfile_libcurl.o): In function `hfile_plugin_init_libcurl':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1509: undefined reference to `curl_global_init'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1512: undefined reference to `curl_share_init'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1514: undefined reference to `curl_share_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1515: undefined reference to `curl_share_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1516: undefined reference to `curl_share_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1542: undefined reference to `curl_version_info'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1518: undefined reference to `curl_share_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1519: undefined reference to `curl_global_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1531: undefined reference to `curl_share_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1532: undefined reference to `curl_global_cleanup'
../htslib/libhts.a(hfile_libcurl.o): In function `wait_perform':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:697: undefined reference to `curl_multi_fdset'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:729: undefined reference to `curl_multi_perform'
../htslib/libhts.a(hfile_libcurl.o): In function `process_messages':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:673: undefined reference to `curl_multi_info_read'
../htslib/libhts.a(hfile_libcurl.o): In function `wait_perform':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:700: undefined reference to `curl_multi_timeout'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_close':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1130: undefined reference to `curl_easy_pause'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1139: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1143: undefined reference to `curl_easy_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1144: undefined reference to `curl_multi_cleanup'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_write':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:880: undefined reference to `curl_easy_pause'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_exit':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:278: undefined reference to `curl_share_cleanup'
../htslib/libhts.a(hfile_libcurl.o): In function `restart_from_position':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1007: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1026: undefined reference to `curl_easy_duphandle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1030: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1031: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1032: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1042: undefined reference to `curl_multi_add_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1049: undefined reference to `curl_easy_pause'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1101: undefined reference to `curl_easy_reset'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1102: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1109: undefined reference to `curl_easy_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1069: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1082: undefined reference to `curl_easy_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1084: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1085: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1072: undefined reference to `curl_easy_reset'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1073: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1088: undefined reference to `curl_easy_reset'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1075: undefined reference to `curl_easy_cleanup'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_open':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1209: undefined reference to `curl_multi_init'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1212: undefined reference to `curl_easy_init'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1216: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1220: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1228: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1229: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1230: undefined reference to `curl_easy_setopt'
../htslib/libhts.a(hfile_libcurl.o):/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1237: more undefined references to `curl_easy_setopt' follow
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_open':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1347: undefined reference to `curl_easy_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1348: undefined reference to `curl_multi_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1252: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1257: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1260: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1261: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1268: undefined reference to `curl_multi_add_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1340: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1223: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1224: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1319: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1320: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1276: undefined reference to `curl_easy_getinfo'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1296: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1297: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1298: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1309: undefined reference to `curl_easy_getinfo'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1263: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1255: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1330: undefined reference to `curl_easy_getinfo'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_read':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:821: undefined reference to `curl_easy_pause'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_exit':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:300: undefined reference to `curl_global_cleanup'
collect2: ld returned 1 exit status
make: *** [pbwt] Error 1`
Would you be able to help with this? Thanks!
Looking only at chromosome 3, some 2k rsIDs (from 23andMe data) are not found in the 2.8G SNPs in imputed chromosome 3... Why should SNPs be dropped from the input in the output?
Many thanks.
Hi,
I'm trying to compile pbwt and I'm getting this error
cc -g -O3 -I../htslib -c -o pbwtMain.o pbwtMain.c
cc -g -O3 -I../htslib -c -o pbwtCore.o pbwtCore.c
pbwtCore.c: In function ‘pbwtSelectSites’:
pbwtCore.c:684:97: error: ‘site’ undeclared (first use in this function)
*pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKee
^
pbwtCore.c:684:97: note: each undeclared identifier is reported only once for each function it appears in
pbwtCore.c:684:74: error: too many arguments to function ‘pbwtSelectSites’
*pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectS
pbwtCore.c:686:85: error: too many arguments to function ‘pbwtSelectSites’
SitesFillMissing (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectS
^
pbwtCore.c:684:7: note: declared here
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSe
^
: recipe for target 'pbwtCore.o' failed
make: *** [pbwtCore.o] Error 1
^
Thanks,
Pauline
The algorithm 3 (pbwt -longWithin 5
) does not report some long matches. For example, in the following example:
1:1010001 001100
4:1010001 001100
0:0110000 101010
2:0011000 110010
5:0011001 000010
3:1011001 100100
I would think that a match between the haplotypes 5 and 3 should be reported at k=6 or is my interpretation of a "long match" incorrect?
EDIT: or perhaps 5 and 2 for k=6 and 5 and 3 for k=7, depending if we report at the last matching position or immediately after.
Hi there,
I am currently trying to use PBWT to impute a reference panel A onto another reference panel B, and vice versa. I am able to impute reference panel A with reference panel B. However, when imputing reference panel B with reference panel A, the process gets Killed on my Linux server.
Here are some lines from the output with file paths taken out:
read genotypes from panel_B/ch1_A.vcf.gz with 2504 sample names and 3738240 sites on chromosome 1: M, N are 5008, 3738240
user 500.356235 system 4.765889 max_RSS 448492 Memory 583036203
impute against reference panel_A_pbwt/chr1
read pbwt PBW3 file with 133049391 bytes: M, N are 9620, 7687647
read 7687647 sites on chromosome 1 from file
read 4810 sample names
1874940 sites selected from 7687647, pbwt size for 9620 haplotypes is 61442035
built reverse PBWT - size 61408995
1874940 sites selected from 3738240, pbwt size for 5008 haplotypes is 47999564
Imputation preliminaries: user 213.029846 system 1.175402 max_RSS 692372 Memory 2156881417
Reference impute using maximal matches: Killed
What could be causing this? I've tried it with two different servers with the same results.
Hi Richard,
Thanks very much for an awesome tool.
I have an issue in compiling pbwt
from source, using both master
and develop
branches of htslib
from @samtools on github. I assume this is because HTSlib & its API is a bit of a moving target still.
Two questions:
pbwt
will compile? If so, would you like me to make a pull request adding it as a git submodule to ease installation?gcc
gives, or better yet a pull request with fixes? Or is this something being actively developed in you lab?Thanks again,
Kevin
At some point between commit 9a1e0fe and c9764e2, there seems to have been a bug introduced to the writePhase code. The phase file output is now writing raw binary 0/1 bytes to the output file, instead of ASCII 0/1 characters.
I.e., the phase lines output have something like (using carat notation)
^@^A^@^@^@^@^A^@.....
^A^@^@^@^@^A^@^@....
....
Instead of
01000010...
10000100...
...
Dear Richard Durbin,
I have managed to compile the pbwt software on my laptop running mac osx but could not do it on our linux based machines (cluster nodes running Scientific Linux 6.5)
This is what I am getting:
make
gcc -g -o pbwt pbwtMain.o pbwtCore.o pbwtSample.o pbwtIO.o pbwtMatch.o pbwtImpute.o pbwtPaint.o pbwtLikelihood.o pbwtMerge.o pbwtGeneticMap.o pbwtHtslib.o hash.o dict.o array.o utils.o ../htslib/libhts.a -lpthread -lz -lm
pbwtImpute.o: In function referenceImpute3': /gpfs/hpchome/bayazit1/udustorage/macs/macs-0.5d/TEST_MACS_PBWT/pbwt-master/pbwtImpute.c:1148: undefined reference to
mergesort'
collect2: ld returned 1 exit status
make: *** [pbwt] Error 1
By the way, our IT people also failed to compile it and this is why I decided to report this.
Hello,
I am trying to install in ubuntu 14.04. While I was able to compile htslib, I failed pbwt with the following errors:
echo '#define PBWT_COMMIT_HASH "0811999"' > version.h
cc -g -O3 -I../htslib -c -o pbwtMain.o pbwtMain.c
cc -g -O3 -I../htslib -c -o pbwtCore.o pbwtCore.c
pbwtCore.c: In function ‘pbwtSelectSites’:
pbwtCore.c:684:97: error: ‘site’ undeclared (first use in this function)
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, FALSE) ; }
^
pbwtCore.c:684:97: note: each undeclared identifier is reported only once for each function it appears in
pbwtCore.c:684:1: error: too many arguments to function ‘pbwtSelectSites’
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, FALSE) ; }
^
pbwtCore.c:684:7: note: declared here
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, FALSE) ; }
^
pbwtCore.c: In function ‘pbwtSelectSitesFillMissing’:
pbwtCore.c:686:108: error: ‘site’ undeclared (first use in this function)
PBWT *pbwtSelectSitesFillMissing (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, TRUE) ; }
^
pbwtCore.c:686:1: error: too many arguments to function ‘pbwtSelectSites’
PBWT *pbwtSelectSitesFillMissing (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, TRUE) ; }
^
pbwtCore.c:684:7: note: declared here
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, FALSE) ; }
^
make: *** [pbwtCore.o] Error 1
Could you help me resolve this?
Thanks!
When I tried to run the example
[xxxlll@localhost pbwt]$ ./pbwt -checkpoint 10000 -readMacs 11k.macs -write macs11k.pbwt -writeSites macs.sites
user 0.000010 system 0.000010 max_RSS 0 Memory 131296
FATAL ERROR: MaCS COMMAND line not found
user 0.000071 system 0.000071 max_RSS 0 Memory 131360
And other commands also don't work.
What could be the problem?
referenceImpute3 in pbwtImpute.c uses BSD mergesort which is not available in glibc. This makes it difficult to compile under linux. A direct replacement with qsort seems to work fine.
I don't think the performance consequences of this should be too bad. On glibc at least, qsort actually uses mergesort under the hood unless the array is very large. I'm not sure what the performance consequences would be on a mac though.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.