Comments (9)
Hi, I think it would be a great honor for me to be in the Acknowledgments section! My name is Han Shi, and my email is [email protected]. Thanks again for all your work and help~
from ancibd.
Thank you for reaching out, and congrats on apparently running ancIBD successfully.
Actually, your IBD calls look better. I found that IID I12896 in the current tutorial erroneously has too much missing data (a PMD-filtered VCF was used by accident), which produces many short false-positive (FP) IBD segments that you do not have. I will update the Vignette very soon to fix that error.
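One way to see whether a call set suffers from this problem is to count how many segments fall below a length cutoff, since spurious segments from sparse data tend to be short. The sketch below is a minimal, hypothetical illustration: the `Segment` record, the `split_by_length` helper and the 8 cM default cutoff are assumptions for demonstration, not ancIBD's own filtering logic.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """One IBD segment between two individuals (hypothetical record layout)."""
    iid1: str
    iid2: str
    ch: int
    length_cm: float


def split_by_length(
    segments: List[Segment], min_cm: float = 8.0
) -> Tuple[List[Segment], List[Segment]]:
    """Split segments into (long, short) lists using a cM cutoff.

    Many short segments relative to long ones can hint at false
    positives, e.g. from samples with too much missing data.
    """
    long_segs = [s for s in segments if s.length_cm >= min_cm]
    short_segs = [s for s in segments if s.length_cm < min_cm]
    return long_segs, short_segs


if __name__ == "__main__":
    # Toy values for hypothetical individuals; not real ancIBD output.
    calls = [
        Segment("iid_A", "iid_B", 8, 25.3),
        Segment("iid_A", "iid_B", 8, 3.1),
        Segment("iid_A", "iid_B", 2, 4.7),
    ]
    long_segs, short_segs = split_by_length(calls)
    print(f"{len(long_segs)} long, {len(short_segs)} short segments")
```

The cutoff is a tuning choice; a pair with a handful of long segments and dozens of very short ones is worth a closer look at data quality before interpreting the short ones.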
Could you let me know whether you started from the current example VCF dataset? I just downloaded the data from the Dropbox link, installed ancIBD v0.5 on a fresh machine, and reproduced the original results (exactly!).
A quick sanity check would be to run ancIBD manually on a pair of chromosomes. The log line `Filtering to 0.99 GP variants: 0.125x` tells you that only a small fraction of variants have high imputation quality. Is that the same for you?
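The printed ratio can be read as the fraction of sites that pass the genotype-probability (GP) filter. A minimal sketch, assuming each site carries a GP triple (hom-ref, het, hom-alt) and that a site passes when its most likely genotype reaches 0.99; the data layout and the `fraction_high_gp` name are illustrative assumptions, not ancIBD internals.

```python
from typing import Sequence, Tuple

GP = Tuple[float, float, float]  # (P(0/0), P(0/1), P(1/1)) for one site


def fraction_high_gp(gps: Sequence[GP], min_gp: float = 0.99) -> float:
    """Return the fraction of sites whose most likely genotype has GP >= min_gp."""
    if not gps:
        return 0.0
    passing = sum(1 for gp in gps if max(gp) >= min_gp)
    return passing / len(gps)


if __name__ == "__main__":
    # Toy data: 1 of 8 sites is confidently imputed.
    sites = [(0.995, 0.004, 0.001)] + [(0.6, 0.3, 0.1)] * 7
    print(f"Filtering to 0.99 GP variants: {fraction_high_gp(sites):.3f}x")
```

With these toy values the script prints `Filtering to 0.99 GP variants: 0.125x`, i.e. one site in eight survives the filter.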
Thanks so much for your quick response!
I ran ancIBD on chromosome 8 for I12440 and I12896 and got the same results as you.
However, I still could not reproduce the results in the tutorial, so I want to share more detailed information with you. Sorry that this is very long; I would really appreciate it if you could help me check this. Thanks!!
Figure and scripts for the quick check on chromosome 8 (consistent):
![chrom8](https://private-user-images.githubusercontent.com/94039011/264272315-ff9351b3-9e44-4c52-878a-9e1acf51f120.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MDQ1NTgsIm5iZiI6MTcxOTQwNDI1OCwicGF0aCI6Ii85NDAzOTAxMS8yNjQyNzIzMTUtZmY5MzUxYjMtOWU0NC00YzUyLTg3OGEtOWUxYWNmNTFmMTIwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDEyMTczOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ5N2IxNDAyZDU5NzY3ZjRiNzU3NTZmZDAzMzZjODFmNDIwZDRiNGJjNmQ1OWQzMjliYjNhY2M2OTdkNDU5ZDkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.4dryyAb-pxnOyRzwRBeFtEAqbEdswcUTd5btki61TS0)
![chrom8_script](https://private-user-images.githubusercontent.com/94039011/264272316-70fbbf76-3c51-464c-be6a-3835d156e65f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk0MDQ1NTgsIm5iZiI6MTcxOTQwNDI1OCwicGF0aCI6Ii85NDAzOTAxMS8yNjQyNzIzMTYtNzBmYmJmNzYtM2M1MS00NjRjLWJlNmEtMzgzNWQxNTZlNjVmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjI2VDEyMTczOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWMyMzY3YmI0ZjNiMGE5YzZlYTY3NDg3MzA2M2YyYjg2ZTIwOGQxNDBmYmM1ZjU1MDA3NWI1YjMzY2VkNDAxY2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.YME-t_Rzl8luB5-aTg6ubsEPh5c_7iDdnJEMqipdkfA)
Detailed steps for running the example data
1) Preparation
Guidelines: https://ancibd.readthedocs.io/en/latest/Intro.html
Data: downloaded from https://www.dropbox.com/sh/q18yyrffbdj1yv1/AAC1apifYB_oKB8SNrmQQ-26a?dl=0
Detailed information:
2) VCF->HDF5
Correction: maybe the path to v51.1_1240k.snp in the tutorial should be changed to `map_path = "./data/maps/v51.1_1240k.snp"`
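A wrong map path like this is easiest to catch before the conversion starts. A minimal sketch of such a pre-flight check, using only `pathlib`; the `check_map_path` helper is hypothetical and not part of ancIBD, and the relative path assumes the script runs from the tutorial's working directory.

```python
from pathlib import Path


def check_map_path(map_path: str) -> Path:
    """Fail early if the genetic map file is missing (hypothetical helper)."""
    path = Path(map_path)
    if not path.is_file():
        raise FileNotFoundError(f"Genetic map not found: {path.resolve()}")
    return path


if __name__ == "__main__":
    # Path as suggested above, relative to the current working directory.
    map_path = "./data/maps/v51.1_1240k.snp"
    try:
        check_map_path(map_path)
        print("Map file found:", map_path)
    except FileNotFoundError as err:
        print(err)
```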
Output information:
results_vcf_hdf5.txt
Notice:
The output contains an error message:
Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)
However, the conversion still completes, and the rest of the output is the same as in the tutorial.
I searched GitHub and found that other people see the same message while their software still runs correctly, so I wonder whether this is really the main cause of the different results?
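The message appears to come from numexpr (which pandas can use under the hood), which reads its thread limits from environment variables when it is first imported. A minimal sketch, assuming that is the source: set the variables at the very top of the script, before pandas or ancIBD are imported. Both numbers are assumptions; the maximum simply mirrors the limit quoted in the message, and the thread count is kept below it.

```python
import os

# Hypothetical values: the max mirrors the limit in the error message,
# and the requested thread count stays below it.
NUMEXPR_MAX_THREADS = 64
NUMEXPR_NUM_THREADS = 8


def configure_numexpr_threads(max_threads: int, num_threads: int) -> None:
    """Set numexpr thread limits; must run before numexpr/pandas are imported."""
    if num_threads > max_threads:
        raise ValueError("num_threads must not exceed max_threads")
    os.environ["NUMEXPR_MAX_THREADS"] = str(max_threads)
    os.environ["NUMEXPR_NUM_THREADS"] = str(num_threads)


configure_numexpr_threads(NUMEXPR_MAX_THREADS, NUMEXPR_NUM_THREADS)
# Only now import the tutorial's dependencies (pandas, ancIBD, ...).
```

Since the conversion finishes and its output matches the tutorial either way, this mainly silences the message rather than changing any results.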
LinkageIO/Camoco#98
3) Calling IBD
Output files:
ch_all.txt
ibd_ind.d220.txt
Output information:
The same as I mentioned previously, which differs from the tutorial.
I tried to upload compressed archives of all the files and of just the HDF5 files, but both are too large (>25 MB). If there is anything I forgot to mention, please let me know. Thanks so much for your kind help!
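When two runs disagree, a quick way to localize the difference is to compare the segment tables directly. A minimal sketch, assuming tab-separated output with a header containing `iid1`, `iid2`, `ch`, `StartM` and `EndM` (check the header of your own `ch_all` file; these column names are an assumption here). It matches segments on the individual pair, chromosome and rounded map positions, and reports those found in only one run.

```python
import csv
import io
from typing import Set, TextIO, Tuple

Key = Tuple[str, str, str, float, float]


def segment_keys(handle: TextIO, ndigits: int = 4) -> Set[Key]:
    """Build comparable keys: sorted pair, chromosome, rounded start/end."""
    keys = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        a, b = sorted((row["iid1"], row["iid2"]))
        keys.add((a, b, row["ch"],
                  round(float(row["StartM"]), ndigits),
                  round(float(row["EndM"]), ndigits)))
    return keys


def diff_runs(handle_a: TextIO, handle_b: TextIO) -> Tuple[Set[Key], Set[Key]]:
    """Return (segments only in run A, segments only in run B)."""
    keys_a = segment_keys(handle_a)
    keys_b = segment_keys(handle_b)
    return keys_a - keys_b, keys_b - keys_a


if __name__ == "__main__":
    # Toy tables; in practice, open your own output and the reference output.
    mine = io.StringIO("iid1\tiid2\tch\tStartM\tEndM\nA\tB\t8\t0.10\t0.35\n")
    ref = io.StringIO("iid1\tiid2\tch\tStartM\tEndM\nB\tA\t8\t0.10\t0.35\n")
    only_mine, only_ref = diff_runs(mine, ref)
    print(len(only_mine), len(only_ref))
```

Rounding the map positions avoids reporting tiny floating-point differences as mismatches; loosen `ndigits` if the two runs differ slightly at segment boundaries.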
Thank you for your detailed report!
Your chr. 8 example shows that your preprocessing works: you got exactly the same HDF5 file as I did.
I have now figured out what caused your different output; it was an oversight on our side:
We updated the code in the Vignette notebook, but not the output cells! Using the same input as you, I can fully replicate your output. The difference was that you used
l_model='h5', e_model='haploid_gl2'
which is the most up-to-date module.
I have now further updated the Vignette, adding the latest recommended parameter p_col='variants/RAF'.
You can find this latest notebook (and matching output) here.
You should now be able to run it and exactly replicate its output!
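The root cause here was code and stored output drifting apart. One way to guard against that is to save the exact parameters next to each output, so a mismatch is visible at a glance. This sketch assumes nothing about ancIBD's API: it only serializes the three parameter values quoted above, and the helper names and JSON file name are hypothetical.

```python
import json
from pathlib import Path

# Parameter values quoted in this thread; other arguments omitted.
RUN_PARAMS = {
    "l_model": "h5",
    "e_model": "haploid_gl2",
    "p_col": "variants/RAF",
}


def save_run_params(params: dict, out_dir: str, name: str = "run_params.json") -> Path:
    """Write the parameters used for a run next to its output files."""
    path = Path(out_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(params, indent=2, sort_keys=True))
    return path


def params_match(params: dict, saved: Path) -> bool:
    """Check whether stored output was produced with the given parameters."""
    return json.loads(saved.read_text()) == params
```

Re-running `params_match` before trusting an old output (or an old notebook cell) flags exactly the kind of mismatch discussed above.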
Thanks for your careful check; I can replicate the output this time!!
Also, many thanks for your excellent work on ancIBD and hapROH, which really enables us to explore family pedigrees and social structure in more depth. Thanks!
Thanks a lot, you identified a bugged vignette file :)
We would like to add you to the Acknowledgments section. Let us know if that is okay and, if so, your preferred (user) name (via PM or email).
Happy IBD and ROH hunting!
Hi, can I use this space to ask one more question about the reference panel files you used for imputation and phasing?
I noticed that in the SI of your ancIBD paper, you said "We imputed all autosomal bi-allelic SNPs 1000 Genomes Phase 3 release using GLIMPSE using its default parameters". So I am wondering: are you using the files from the following link? Taking chr1 as an example: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz
Since the reference panel files in the GLIMPSE tutorials are in the GRCh38/hg38 assembly, I am not sure which files should be used to impute and phase data aligned to hg19. Thanks so much!
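One way to tell which assembly a VCF refers to is the length of chromosome 1 recorded in its header: 249,250,621 bp in GRCh37/hg19 versus 248,956,422 bp in GRCh38/hg38. A minimal sketch, assuming the file's header contains `##contig` lines with a `length` field (not every VCF does); the helper names are hypothetical.

```python
import gzip
import re
from typing import Iterable, Iterator, Optional

# Length of chromosome 1 in each assembly (bp).
CHR1_LENGTH = {249250621: "GRCh37/hg19", 248956422: "GRCh38/hg38"}
# Matches the contig line for "1" or "chr1" (not "10", "chr11", ...).
CONTIG_RE = re.compile(r"^##contig=<ID=(?:chr)?1,.*?length=(\d+)")


def guess_build(header_lines: Iterable[str]) -> Optional[str]:
    """Guess the assembly from the ##contig line for chromosome 1, if present."""
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if match:
            return CHR1_LENGTH.get(int(match.group(1)))
    return None


def read_header(path: str) -> Iterator[str]:
    """Yield the '##' meta-information lines of a (possibly gzipped) VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break
            yield line
```

For a local copy of a panel file, `guess_build(read_header(path))` returns the assembly name, or `None` if the header does not settle the question.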
Yes, we used the files you refer to as the reference panel for phasing and imputation, that is:
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
Most human aDNA is aligned to hg19/GRCh37, so please also use that matching reference panel!
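To fetch the panel for every autosome, the per-chromosome URLs can be generated from the chr1 example above. A minimal sketch, assuming every autosome follows the same file-name pattern as that chr1 file; verify this against the directory listing before downloading, as the pattern is an assumption beyond chr1.

```python
from typing import Iterable, List

BASE_URL = "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/"
# Pattern taken from the chr1 example above; assumed to hold for chr1-22.
NAME_PATTERN = (
    "ALL.chr{ch}.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz"
)


def autosome_panel_urls(chromosomes: Iterable[int] = range(1, 23)) -> List[str]:
    """Return reference-panel VCF URLs for the given autosomes."""
    return [BASE_URL + NAME_PATTERN.format(ch=ch) for ch in chromosomes]


if __name__ == "__main__":
    for url in autosome_panel_urls():
        print(url)
```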
Yeah, I got it! Thanks so much for your guidance :)