Giter Club home page Giter Club logo

pbwt's People

Contributors

danjlawson avatar dsbristol avatar johnlees avatar mcshane avatar pd3 avatar richarddurbin avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pbwt's Issues

Bug in reading genotypes from VCF/BCF?

Hi,

The code from 110 to 118 of pbwtHtslib.c in function pbwtReadVcfGT may bring a bug.

if (gt_arr[i] == bcf_int32_vector_end) 
	x[i] = bcf_gt_allele(gt_arr[i-1]); // treat haploid genotypes as diploid homozygous A/A
if (gt_arr[i] == bcf_gt_missing)
{ 
	x[i] = 0 ; /* use ref for now */
   	xMissing[i] = 1 ;
    ++nMissing ;
} else 
	x[i] = bcf_gt_allele(gt_arr[i]) ;

if gt_arr[i] equals bcf_int32_vector_end, and I guess it won't equal bcf_gt_missing, finally the statement:

	x[i] = bcf_gt_allele(gt_arr[i]) 

will be executed. that equals x[i] = bcf_gt_allele(bcf_int32_vector_end). I guess there should be an else before if (gt_arr[i] == bcf_gt_missing).

adding LICENSE file

It would be nice if a LICENSE file could be added. This would also make packaging it downstream easier.

Is it valid VCF not to 'squash' positions with more than one ALT allele?

I'm seeing output that looks like this:

3	112841	rs78923776	C	G	.	PASS	RefPanelAF=0.000153988;AN=2;AC=0;INFO=1	GT:ADS:DS:GP	0|0:0,0:0:1,0,0
3	112841	rs78923776	C	T	.	PASS	RefPanelAF=0.078534;AN=2;AC=1;INFO=1	GT:ADS:DS:GP	0|1:0,1:1:0,1,0

The first line says that my genome is CC at this position, but the second line (for the second alt allele) says that my genome is CT at this position. OK, the first line couldn't say this, so it calls me as REF/REF, but this call has to be interpreted in the context of the second call, and can't be taken at face value. That's why I wonder if this is valid VCF?

I think the line should be 'squashed' down to the following (I think):

3	112841	rs78923776	C	G,T	.	PASS	RefPanelAF=0.000153988,0.078534;AN=3;AC=0,1;INFO=1	GT:ADS:DS:GP	0|2:0,0:0:1,0,0

Build error from git clone

From a clone of the repository on github, running make gives:

$ make
cc -g -O3 -I../htslib  -c -o pbwtMain.o pbwtMain.c
pbwtMain.c: In function ‘pbwtCommitHash’:
pbwtMain.c:176:12: error: ‘PBWT_COMMIT_HASH’ undeclared (first use in this function)
     return PBWT_COMMIT_HASH ;

I needed to explicitly generate version.h before the build would proceed:

$ make version.h
# ...
$ make
# builds ok

Is the -macs command deprecated?

I am trying to follow the example usage that is based upon the use of macs output as input to pbwt:

macs 11000 1e6 -t 0.001 -r 0.001 > 11k.macs
pbwt -checkpoint 10000 -macs 11k.macs -write macs11k.pbwt -writeSites macs.sites

However, -macs yields:

FATAL ERROR: unrecognised command -macs

Could you provide an update to the example usage demonstrating how to use macs output?

Running 'make' on pbwt - "undefined reference to 'curl_..." errors

Hey there, I am trying to install pbwt, and have compiled htslib as instructed on the readme.

I haven't been able to compile pbwt though, I get the following errors:

`cc -g -O3 -I../htslib   pbwtMain.o pbwtCore.o pbwtSample.o pbwtIO.o pbwtMatch.o pbwtImpute.o pbwtPaint.o pbwtLikelihood.o pbwtMerge.o pbwtGeneticMap.o pbwtHtslib.o hash.o dict.o array.o utils.o -lpthread ../htslib/libhts.a -lz -lm -lbz2 -llzma -o pbwt
../htslib/libhts.a(hfile_libcurl.o): In function `easy_errno':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:175: undefined reference to `curl_easy_getinfo'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:189: undefined reference to `curl_easy_getinfo'
../htslib/libhts.a(hfile_libcurl.o): In function `hfile_plugin_init_libcurl':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1509: undefined reference to `curl_global_init'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1512: undefined reference to `curl_share_init'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1514: undefined reference to `curl_share_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1515: undefined reference to `curl_share_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1516: undefined reference to `curl_share_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1542: undefined reference to `curl_version_info'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1518: undefined reference to `curl_share_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1519: undefined reference to `curl_global_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1531: undefined reference to `curl_share_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1532: undefined reference to `curl_global_cleanup'
../htslib/libhts.a(hfile_libcurl.o): In function `wait_perform':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:697: undefined reference to `curl_multi_fdset'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:729: undefined reference to `curl_multi_perform'
../htslib/libhts.a(hfile_libcurl.o): In function `process_messages':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:673: undefined reference to `curl_multi_info_read'
../htslib/libhts.a(hfile_libcurl.o): In function `wait_perform':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:700: undefined reference to `curl_multi_timeout'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_close':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1130: undefined reference to `curl_easy_pause'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1139: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1143: undefined reference to `curl_easy_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1144: undefined reference to `curl_multi_cleanup'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_write':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:880: undefined reference to `curl_easy_pause'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_exit':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:278: undefined reference to `curl_share_cleanup'
../htslib/libhts.a(hfile_libcurl.o): In function `restart_from_position':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1007: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1026: undefined reference to `curl_easy_duphandle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1030: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1031: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1032: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1042: undefined reference to `curl_multi_add_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1049: undefined reference to `curl_easy_pause'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1101: undefined reference to `curl_easy_reset'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1102: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1109: undefined reference to `curl_easy_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1069: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1082: undefined reference to `curl_easy_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1084: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1085: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1072: undefined reference to `curl_easy_reset'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1073: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1088: undefined reference to `curl_easy_reset'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1075: undefined reference to `curl_easy_cleanup'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_open':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1209: undefined reference to `curl_multi_init'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1212: undefined reference to `curl_easy_init'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1216: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1220: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1228: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1229: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1230: undefined reference to `curl_easy_setopt'
../htslib/libhts.a(hfile_libcurl.o):/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1237: more undefined references to `curl_easy_setopt' follow
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_open':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1347: undefined reference to `curl_easy_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1348: undefined reference to `curl_multi_cleanup'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1252: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1257: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1260: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1261: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1268: undefined reference to `curl_multi_add_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1340: undefined reference to `curl_multi_remove_handle'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1223: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1224: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1319: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1320: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1276: undefined reference to `curl_easy_getinfo'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1296: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1297: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1298: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1309: undefined reference to `curl_easy_getinfo'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1263: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1255: undefined reference to `curl_easy_setopt'
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:1330: undefined reference to `curl_easy_getinfo'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_read':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:821: undefined reference to `curl_easy_pause'
../htslib/libhts.a(hfile_libcurl.o): In function `libcurl_exit':
/hpc/home/reduser/programs/htslib/hfile_libcurl.c:300: undefined reference to `curl_global_cleanup'
collect2: ld returned 1 exit status
make: *** [pbwt] Error 1`

Would you be able to help with this? Thanks!

Some rsIDs in original VCF not in imputed VCF

Looking only at chromosome 3, some 2k rsIDs (from 23andMe data) are not found in the 2.8G SNPs in imputed chromosome 3... Why should SNPs be dropped from the input in the output?

Many thanks.

unable to build pbwt

Hi,

I'm trying to compile pbwt and I'm getting this error

cc -g -O3 -I../htslib -c -o pbwtMain.o pbwtMain.c
cc -g -O3 -I../htslib -c -o pbwtCore.o pbwtCore.c
pbwtCore.c: In function ‘pbwtSelectSites’:
pbwtCore.c:684:97: error: ‘site’ undeclared (first use in this function)
*pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKee
^
pbwtCore.c:684:97: note: each undeclared identifier is reported only once for each function it appears in
pbwtCore.c:684:74: error: too many arguments to function ‘pbwtSelectSites’
*pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectS

pbwtCore.c:686:85: error: too many arguments to function ‘pbwtSelectSites’
SitesFillMissing (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectS
^
pbwtCore.c:684:7: note: declared here
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSe
^
: recipe for target 'pbwtCore.o' failed
make: *** [pbwtCore.o] Error 1
^
Thanks,
Pauline

Bug in algorithm ReportLongMatches?

The algorithm 3 (pbwt -longWithin 5) does not report some long matches. For example, in the following example:

1:1010001 001100
4:1010001 001100
0:0110000 101010
2:0011000 110010
5:0011001 000010
3:1011001 100100

I would think that a match between the haplotypes 5 and 3 should be reported at k=6 or is my interpretation of a "long match" incorrect?

EDIT: or perhaps 5 and 2 for k=6 and 5 and 3 for k=7, depending if we report at the last matching position or immediately after.

Reference impute using maximal matches: Killed

Hi there,

I am currently trying to use PBWT to impute a reference panel A onto another reference panel B, and vice versa. I am able to impute reference panel A with reference panel B. However, when imputing reference panel B with reference panel A, the process gets Killed on my Linux server.

Here are some lines from the output with file paths taken out:

read genotypes from panel_B/ch1_A.vcf.gz with 2504 sample names and 3738240 sites on chromosome 1: M, N are 5008, 3738240
user    500.356235      system  4.765889        max_RSS 448492  Memory  583036203
impute against reference panel_A_pbwt/chr1
read pbwt PBW3 file with 133049391 bytes: M, N are 9620, 7687647
read 7687647 sites on chromosome 1 from file
read 4810 sample names
1874940 sites selected from 7687647, pbwt size for 9620 haplotypes is 61442035
built reverse PBWT - size 61408995
1874940 sites selected from 3738240, pbwt size for 5008 haplotypes is 47999564
Imputation preliminaries: user  213.029846      system  1.175402        max_RSS 692372  Memory  2156881417
Reference impute using maximal matches: Killed 

What could be causing this? I've tried it with two different servers with the same results.

Build difficulty with HTSlib

Hi Richard,

Thanks very much for an awesome tool.

I have an issue in compiling pbwt from source, using both master and develop branches of htslib from @samtools on github. I assume this is because HTSlib & its API is a bit of a moving target still.

Two questions:

  • Do you have a known revision of HTSlib with which pbwt will compile? If so, would you like me to make a pull request adding it as a git submodule to ease installation?
  • If you intend to target the master/develop brances of HTSlib, do you want me to submit a proper issue detailing exact errors gcc gives, or better yet a pull request with fixes? Or is this something being actively developed in you lab?

Thanks again,
Kevin

writePhase writing binary 0/1 instead of ASCII 0/1

At some point between commit 9a1e0fe and c9764e2, there seems to have been a bug introduced to the writePhase code. The phase file output is now writing raw binary 0/1 bytes to the output file, instead of ASCII 0/1 characters.
I.e., the phase lines output have something like (using carat notation)

^@^A^@^@^@^@^A^@.....
^A^@^@^@^@^A^@^@....
....

Instead of

01000010...
10000100...
...

"Undefined reference" link-time error

Dear Richard Durbin,

I have managed to compile the pbwt software on my laptop running mac osx but could not do it on our linux based machines (cluster nodes running Scientific Linux 6.5)

This is what I am getting:

make
gcc -g -o pbwt pbwtMain.o pbwtCore.o pbwtSample.o pbwtIO.o pbwtMatch.o pbwtImpute.o pbwtPaint.o pbwtLikelihood.o pbwtMerge.o pbwtGeneticMap.o pbwtHtslib.o hash.o dict.o array.o utils.o ../htslib/libhts.a -lpthread -lz -lm
pbwtImpute.o: In function referenceImpute3': /gpfs/hpchome/bayazit1/udustorage/macs/macs-0.5d/TEST_MACS_PBWT/pbwt-master/pbwtImpute.c:1148: undefined reference tomergesort'
collect2: ld returned 1 exit status
make: *** [pbwt] Error 1

By the way, our IT people also failed to compile it and this is why I decided to report this.

can't compile

Hello,

I am trying to install in ubuntu 14.04. While I was able to compile htslib, I failed pbwt with the following errors:

echo '#define PBWT_COMMIT_HASH "0811999"' > version.h
cc -g -O3 -I../htslib -c -o pbwtMain.o pbwtMain.c
cc -g -O3 -I../htslib -c -o pbwtCore.o pbwtCore.c
pbwtCore.c: In function ‘pbwtSelectSites’:
pbwtCore.c:684:97: error: ‘site’ undeclared (first use in this function)
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, FALSE) ; }
^
pbwtCore.c:684:97: note: each undeclared identifier is reported only once for each function it appears in
pbwtCore.c:684:1: error: too many arguments to function ‘pbwtSelectSites’
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, FALSE) ; }
^
pbwtCore.c:684:7: note: declared here
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, FALSE) ; }
^
pbwtCore.c: In function ‘pbwtSelectSitesFillMissing’:
pbwtCore.c:686:108: error: ‘site’ undeclared (first use in this function)
PBWT *pbwtSelectSitesFillMissing (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, TRUE) ; }
^
pbwtCore.c:686:1: error: too many arguments to function ‘pbwtSelectSites’
PBWT *pbwtSelectSitesFillMissing (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, TRUE) ; }
^
pbwtCore.c:684:7: note: declared here
PBWT *pbwtSelectSites (PBWT *pOld, Array sites, BOOL isKeepOld) { return pbwtSelectSites (pOld, site, isKeepOld, FALSE) ; }
^
make: *** [pbwtCore.o] Error 1

Could you help me resolve this?

MaCS command not found

When I tried to run the example

[xxxlll@localhost pbwt]$ ./pbwt -checkpoint 10000 -readMacs 11k.macs -write macs11k.pbwt -writeSites macs.sites
user 0.000010 system 0.000010 max_RSS 0 Memory 131296
FATAL ERROR: MaCS COMMAND line not found
user 0.000071 system 0.000071 max_RSS 0 Memory 131360

And other commands also don't work.
What could be the problem?

BSD mergesort is not portable

referenceImpute3 in pbwtImpute.c uses BSD mergesort which is not available in glibc. This makes it difficult to compile under linux. A direct replacement with qsort seems to work fine.

I don't think the performance consequences of this should be too bad. On glibc at least, qsort actually uses mergesort under the hood unless the array is very large. I'm not sure what the performance consequences would be on a mac though.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.