gt-dimred's Introduction

🧮🧬 Alex Diaz-Papkovich, PhD 🧬🧮

I'm a statistician and data scientist. I'm currently at Brown University working as a postdoctoral research associate at the Data Science Institute with Sohini Ramachandran. My PhD work was at McGill University in Quantitative Life Sciences with Simon Gravel, where I studied topological data analysis methods for genetic data. You can find my published research on Google Scholar.

I also enjoy collecting data on a variety of topics. Some of my side-projects include tracking the length of the Rideau Canal skating season and collecting news stories of traffic violence.

Some of my academic research:

Non-linear dimensionality reduction for visualizing population genetic data

UMAP is an efficient method for visualizing biobank data. It can reveal structure in your data (i.e. population structure) related to factors like demographic history or biobank sampling methodology. When you colour the visualizations by other variables, like geography or phenotypic measures, patterns emerge that you can study further. You can also work in 3D and get creative, for example by converting UMAP's $(x,y,z)$ coordinates to RGB values to create colour maps.
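As a rough illustration of the colour-map idea, here is a minimal sketch that rescales each axis of a 3D embedding to the range [0, 1] and reads the result as an (R, G, B) triple. The function name `coords_to_rgb` and the choice of per-axis min-max scaling are assumptions made for this example, not necessarily the approach used in the paper.

```python
def coords_to_rgb(coords):
    """Map 3D embedding coordinates to RGB triples with values in [0, 1].

    `coords` is a sequence of (x, y, z) tuples. Each axis is min-max scaled
    independently, so x drives red, y drives green, and z drives blue.
    This scaling is an illustrative assumption, not the authors' method.
    """
    columns = list(zip(*coords))
    if len(columns) != 3:
        raise ValueError("expected a non-empty sequence of 3D coordinates")

    lows = [min(column) for column in columns]
    spans = [max(column) - low for column, low in zip(columns, lows)]

    # A constant axis has zero span; use a mid-range value to avoid dividing by zero.
    return [
        tuple(
            (value - low) / span if span > 0 else 0.5
            for value, low, span in zip(point, lows, spans)
        )
        for point in coords
    ]
```

Plotting libraries such as matplotlib accept RGB tuples in [0, 1] as point colours, so the returned list can be passed as the colour argument of a scatter plot of the same samples, for instance on a map of sampling locations.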

Paper: UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts, Diaz-Papkovich et al, PLoS Genetics, 2019.

Related GitHub repositories:

Stratification of biobank data

Though UMAP tends to generate clusters, it is not a clustering algorithm. To extract clusters from UMAP data, we use a density-based method called HDBSCAN. We can use this for stratification to get a better grasp of the population structure in our data, study how methods like polygenic scores transfer between populations, and do QC on biobank data.
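The two-step idea described above (embed with UMAP, then extract clusters with HDBSCAN) can be sketched as follows. This assumes the third-party packages `umap-learn` and `hdbscan` are installed. The function names and all parameter values are illustrative assumptions, not the settings used in the preprint.

```python
from collections import Counter


def embed_and_cluster(pcs, n_components=3, n_neighbors=10, min_dist=0.001,
                      min_cluster_size=25):
    """Run UMAP on a (samples x PCs) array, then HDBSCAN on the embedding.

    Requires umap-learn and hdbscan. The default values are placeholders
    chosen for illustration and should be tuned for real data.
    """
    import umap      # provided by the umap-learn package
    import hdbscan

    embedding = umap.UMAP(
        n_components=n_components,
        n_neighbors=n_neighbors,
        min_dist=min_dist,
    ).fit_transform(pcs)

    # HDBSCAN returns one integer label per sample; -1 marks noise points.
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)
    return embedding, labels


def cluster_sizes(labels):
    """Return ({cluster_label: count}, noise_count) for HDBSCAN-style labels."""
    counts = Counter(int(label) for label in labels)
    noise = counts.pop(-1, 0)
    return dict(counts), noise
```

Checking `cluster_sizes(labels)` after a run is a quick way to see how many samples were left unassigned as noise, which matters when the clusters are later used as strata.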

Preprint: Topological stratification of continuous genetic variation in large biobanks, Diaz-Papkovich et al, bioRxiv, 2023.

Related GitHub repositories:

gt-dimred's Issues

How to use this and create plot?

Dear Alex,
Thanks for this repository. I am interested in constructing a UMAP plot of genetic ancestry in an admixed population, as in:

https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008432

I performed UMAP as:

python gt-dimred/scripts/general_umap_script.py -dset input_pc_umap.txt -pc 15 -nn 10 -md 0.001 -outdir umap_out -head T -log log_dir
The output has two columns, and the number of rows equals the number of individuals in the input PC data.

What does the output represent, and how do I create graphics (UMAP/t-SNE?) similar to those in the published manuscript using Python or R?
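A minimal sketch of one way to plot such a file, assuming the two columns are the 2D UMAP coordinates of each individual in input row order and that the file is whitespace-delimited. Neither assumption is confirmed by the script's documentation here, and the function names and output file name are hypothetical. Plotting requires matplotlib.

```python
def read_embedding(path, has_header=False):
    """Read a two-column, whitespace-delimited text file into x and y lists.

    The file layout is an assumption; adjust the parsing if the script
    writes a different delimiter or extra columns.
    """
    xs, ys = [], []
    with open(path) as handle:
        if has_header:
            next(handle, None)  # skip a single header line
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # ignore blank or malformed lines
            xs.append(float(fields[0]))
            ys.append(float(fields[1]))
    return xs, ys


def plot_embedding(xs, ys, out_png="umap_plot.png", colors=None):
    """Save a scatter plot of the embedding; `colors` is optional, one entry per point."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(xs, ys, s=2, alpha=0.5, c=colors)
    ax.set_xlabel("UMAP 1")
    ax.set_ylabel("UMAP 2")
    fig.savefig(out_png, dpi=200)
    plt.close(fig)
```

To colour points by population or another label, pass a list with one colour per row, in the same order as the individuals in the input PC file.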

run UMAP on genotype matrix

Hello! I tried to compute UMAP on my genotype matrix, which also contains samples from 1000G. Is there code that creates Fig. 1C from the original paper? Thank you!

Best,
Dmitrii

umap for UKB first 10 PC, I got a different plot

Hi, there:

Below is the UMAP based on 10 PCs of UKB data, copied from your paper:

image

I used your script general_umap_script.py -head T -dset ukb.pc.txt -pc 10 -nn 10 -md 0.001 -nc 2 -outdir umap -log umap and got the following plot (let's ignore the group color for now).

image

There is some similarity, but the two plots are certainly not the same. Can you please let me know how to replicate your plot?

Thanks!

Jie
