team_bloodies's Introduction

This is the repository for the group project of Team Bloodies.

Project: Data-driven analysis of the potential candidate transcription factors in hematopoietic stem cell differentiation into multiple progenitor compartments.

Links to:

Proposal
Progress Report
Poster

Members and division of labor

Name | Initial work assignment | Affiliation | Expertise
Annie Cavalla | TF motif enrichment analysis | Bioinformatics | Cancer genomics
Rawnak Hoque | RNA-seq analysis and TF motif enrichment analysis | Genome Science and Technology | Genome scale data analysis
Fangwu Wang | DNA methylation analysis, TF clustering | Medical Genetics | Stem cell biology
Somdeb Paul | DNA methylation analysis | Genome Science and Technology | Transcriptomics

Rationale: Human hematopoietic stem cells (HSCs) hold great clinical promise for curative HSC transplantation therapies for numerous hematologic malignancies and diseases. Understanding the mechanisms that regulate the self-renewal and lineage restriction of HSCs is crucial for improving transplantation regimens. HSCs are thought to undergo stepwise lineage restriction as they pass through multiple progenitor populations. During this process the binary myeloid vs. lymphoid decision is made, and subsequent progeny are restricted to one fate or the other. In this project, we are interested in the epigenomic status of HSCs and other progenitor populations, and in how it interacts with transcription factor binding to regulate the lineage differentiation program.

Data source:

Our dataset includes matched DNA methylation (bisulfite-seq) and RNA-seq data from HSCs and five progenitor cell types, obtained from a recent publication (Farlik M. et al., Cell, 2016), which characterized the differentiation path of HSCs based on DNA methylation profiles.

Difference from the published paper: To identify TFs with a potential function in cell differentiation more rigorously, we annotated DNA methylation at both promoters and custom-defined enhancers. The enhancer regions were defined in two hematopoietic cell lines (K562, GM12878) using the Genome Segments ChromHMM tracks (UCSC Table Browser).
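The enhancer annotation step could be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes the ChromHMM segments were exported as BED lines whose fourth column is the state label, that enhancer states are labelled `E` and `WE`, and that the enhancer sets from the two cell lines are combined as a union. The label set, the function names, and the toy coordinates are all assumptions.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)

# Assumed enhancer state labels; adjust to match the downloaded track.
ENHANCER_STATES = {"E", "WE"}


def enhancer_regions(bed_lines: Iterable[str], states=ENHANCER_STATES) -> List[Region]:
    """Keep only the segments whose state label (4th BED column) is an enhancer state."""
    regions = []
    for line in bed_lines:
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end, label = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if label in states:
            regions.append((chrom, start, end))
    return regions


def merge_regions(regions: Iterable[Region]) -> List[Region]:
    """Merge overlapping or touching regions on the same chromosome."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy input (hypothetical coordinates, not taken from the real tracks).
k562 = ["chr1\t100\t300\tE", "chr1\t300\t500\tT", "chr2\t50\t80\tWE"]
gm12878 = ["chr1\t250\t400\tWE", "chr2\t500\t700\tTSS"]

enhancers = merge_regions(enhancer_regions(k562) + enhancer_regions(gm12878))
print(enhancers)
```

Taking the union keeps any region marked as an enhancer in either cell line. An intersection would be the stricter alternative.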

Data replicate summary:

Cell Type | Replicates for Methylation | Replicates for RNA
HSC | 3 | 1
MPP | 3 | 2
MLP | 3 | 2
CMP | 3 | 1
GMP | 3 | 2
CLP | 3 | 1

Workflow: We first used RnBeads to analyze differential DNA methylation in the annotated promoter and enhancer regions for five pairwise comparisons. The biological meaning of each comparison:

Comparison | Biological Meaning
HSC-MPP | loss of long-term regeneration potential
MPP-CMP | multipotent to myeloid commitment
MPP-MLP | multipotent to lymphoid commitment
CMP-MLP | difference between myeloid and lymphoid at the CMP-MLP level
GMP-CLP | difference between myeloid and lymphoid at the GMP-CLP level

We then took the low-methylated regions of each cell type in each comparison (defined by a > 40% methylation difference in the pairwise comparison) and searched them for enriched transcription factor binding motifs using the HOMER motif-finding tools. This generated a list of data-driven candidate TFs for each population in each comparison.
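The region selection described above might look something like this sketch. It assumes each region in the differential methylation table carries a mean methylation level (0 to 1) for both cell types, and it reads the 40% cutoff as an absolute difference of 0.40 between those means. The `DiffRegion` type, the function names, and the toy values are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class DiffRegion(NamedTuple):
    chrom: str
    start: int
    end: int
    mean_a: float  # mean methylation (0-1) in the first cell type
    mean_b: float  # mean methylation (0-1) in the second cell type


def split_low_methylated(
    regions: Iterable[DiffRegion], name_a: str, name_b: str, min_diff: float = 0.40
) -> Dict[str, List[DiffRegion]]:
    """Assign each region passing the cutoff to the cell type where it is LESS methylated."""
    low: Dict[str, List[DiffRegion]] = {name_a: [], name_b: []}
    for r in regions:
        diff = r.mean_a - r.mean_b
        if diff > min_diff:       # higher in A, so low-methylated in B
            low[name_b].append(r)
        elif -diff > min_diff:    # higher in B, so low-methylated in A
            low[name_a].append(r)
    return low


def to_bed(regions: Iterable[DiffRegion]) -> List[str]:
    """Format regions as minimal three-column BED lines for a motif-finding tool."""
    return [f"{r.chrom}\t{r.start}\t{r.end}" for r in regions]


# Toy table for a CMP vs. MLP comparison (made-up values).
table = [
    DiffRegion("chr1", 100, 200, 0.90, 0.30),  # low in MLP
    DiffRegion("chr1", 500, 600, 0.20, 0.75),  # low in CMP
    DiffRegion("chr2", 10, 60, 0.55, 0.40),    # below the cutoff, dropped
]
low = split_low_methylated(table, "CMP", "MLP")
print(to_bed(low["CMP"]))
print(to_bed(low["MLP"]))
```

Each resulting list would then be written to its own file and passed to the motif enrichment step, one run per cell type per comparison.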

We intersected the gene lists from the DNA methylation and RNA expression analyses to see whether low methylation correlates with high gene expression. We also inspected the expression of the TFs identified by motif enrichment to see whether they are highly expressed in the corresponding population. Finally, we used the expression of the TFs identified in the CMP/MLP comparison (representing the myeloid and lymphoid lineages) to cluster leukemia samples and test whether samples of the same leukemia type group together.
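As a rough illustration of the first two checks, the sketch below intersects a low-methylation gene list with a high-expression gene list and then asks which candidate TFs are more highly expressed in one population than in the other. The function names, the placeholder gene and TF identifiers, and the expression values are all made up for the example. The project's own intersection code may differ.

```python
from typing import Dict, List, Set


def overlap_summary(low_meth_genes: Set[str], high_expr_genes: Set[str]) -> Dict[str, object]:
    """Genes that are both low-methylated and highly expressed, plus their share of the low-methylated set."""
    shared = low_meth_genes & high_expr_genes
    fraction = len(shared) / len(low_meth_genes) if low_meth_genes else 0.0
    return {"shared": sorted(shared), "fraction_of_low_meth": fraction}


def tfs_higher_in(
    expr: Dict[str, Dict[str, float]], tfs: List[str], population: str, other: str
) -> List[str]:
    """Candidate TFs with higher expression in `population` than in `other`.

    TFs missing from the expression table are skipped.
    """
    return [tf for tf in tfs if tf in expr and expr[tf][population] > expr[tf][other]]


# Toy inputs with placeholder identifiers.
low_meth = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
high_expr = {"GENE_B", "GENE_D", "GENE_E"}
summary = overlap_summary(low_meth, high_expr)
print(summary)

expression = {
    "TF_1": {"CMP": 8.2, "MLP": 3.1},
    "TF_2": {"CMP": 2.0, "MLP": 6.5},
}
cmp_tfs = tfs_higher_in(expression, ["TF_1", "TF_2", "TF_3"], "CMP", "MLP")
print(cmp_tfs)
```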

Analysis and Major Findings:

RnBeads analysis of pairwise comparison:
a. Beta-value distribution and variation
b. PCA
c. Clustering
d. Differential methylated regions
e. Correlation with RNA expression
Methods:
f. Data preparation: replicate merging
g. Enhancer annotation-code
h. RnBeads: all samples and pairwise comparison (CLP-GMP as an example)
i. intersection between DNA/RNA gene lists-code

a. Sanity check: sample-sample correlation, heatmap clustering
b. Differential expression gene lists
Methods:
c. Data processing and gene id conversion

a. Results
Differential gene table
Methods:
b. limma

a. TFs found at Enhancer
b. TFs found at Promoter
Methods
c. Input files
d. HOMER Findingmotif tool

a. Normal samples CMP/MLP
b. Leukemia samples AML/CLL
Methods
c. TF list feeding into expression
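
The idea behind clustering samples by candidate TF expression can be illustrated with a much simpler stand-in for hierarchical clustering: compute the Pearson correlation between the TF expression profiles of each pair of samples, then check whether each sample's most similar neighbour comes from the same group. This is a sketch under assumptions, not the analysis actually run. The sample names and expression values are invented, and each profile is assumed to list the same TFs in the same order with non-constant values.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def nearest_neighbours(profiles: Dict[str, List[float]]) -> Dict[str, str]:
    """For each sample, return the other sample with the most correlated profile."""
    nearest = {}
    for sample, vec in profiles.items():
        scored = [(pearson(vec, other_vec), other)
                  for other, other_vec in profiles.items() if other != sample]
        nearest[sample] = max(scored)[1]
    return nearest


# Toy expression of four hypothetical candidate TFs per sample.
profiles = {
    "AML_1": [9.0, 8.0, 2.0, 1.0],
    "AML_2": [8.5, 7.0, 1.5, 2.0],
    "CLL_1": [1.0, 2.0, 9.0, 8.0],
    "CLL_2": [2.0, 1.5, 8.0, 9.5],
}
nearest = nearest_neighbours(profiles)
print(nearest)
```

If the candidate TFs separate the lineages, each sample's nearest neighbour should share its leukemia type, as it does in this toy data.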
