IKAP – Identifying K mAjor cell Population groups in single-cell RNA-seq analysis


Article:
IKAP - Identifying K mAjor cell Population groups in single-cell RNA-seq analysis
Yun-Ching Chen, Abhilash Suresh, Chingiz Underbayev, Clare Sun, Komudi Singh, Fayaz Seifuddin, Adrian Wiestner, Mehdi Pirooznia. https://academic.oup.com/gigascience/article/8/10/giz121/5579995

* Note: for Seurat 3, please see the Seurat3_code folder.

Installation

Please install the following R libraries before installing IKAP:
Seurat, dplyr, reshape2, PRROC, WriteXLS, rpart, stringr, and rpart.plot
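A minimal sketch for checking which of these dependencies are still missing before installing IKAP. The helper name `missing_packages` is illustrative and is not part of IKAP.

```r
# Packages listed above as IKAP dependencies.
ikap_deps <- c("Seurat", "dplyr", "reshape2", "PRROC",
               "WriteXLS", "rpart", "stringr", "rpart.plot")

# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(ikap_deps)
if (length(to_install) > 0) {
  message("Missing: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```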


IKAP installation:

  1. First, you need to install the devtools package. You can do this from CRAN. Invoke R and then type

    install.packages("devtools")
    
  2. Load the devtools package.

    library(devtools)
    
  3. Install IKAP

    devtools::install_github("NHLBI-BCB/IKAP")
    

The main function, IKAP, takes a Seurat object containing the normalized expression matrix; all other parameters fall back to their default values when not specified. IKAP explores sets of cell groups (clusterings) by varying the resolution (r) and the number of top principal components (nPC) used for Seurat SNN clustering. From all explored sets it picks a few candidate sets and marks one of them as the best, that is, the set most likely to produce distinguishing marker genes.

Note: By default, IKAP regresses out the percentage of mitochondrial gene counts and the total UMI counts, and scales the expression matrix using the Seurat ScaleData function. These two values should be saved in the Seurat metadata under the column names 'percent.mito' and 'nUMI', respectively. If you want to regress out different confounding variables or use different column names, save these variables in the Seurat metadata and set 'confounders' (an IKAP parameter) to their column names in the Seurat metadata data frame.
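As an illustration, the two default confounders could be derived from a raw count matrix roughly as follows. This is a sketch on a made-up toy matrix, assuming human gene symbols in which mitochondrial genes start with "MT-"; the commented Seurat and IKAP calls at the end are not executed here and should be checked against the installed versions.

```r
# Toy UMI count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(10,  0, 5,
     2,  8, 1,
     3,  2, 4,
     5, 10, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "CD3E", "MS4A1"),
                  c("cell1", "cell2", "cell3"))
)

# Assumption: mitochondrial genes are prefixed with "MT-".
mito_genes <- grep("^MT-", rownames(counts), value = TRUE)

# One row per cell, using the column names IKAP expects by default.
# percent.mito is stored here as a fraction of total counts.
meta <- data.frame(
  nUMI = colSums(counts),
  percent.mito = colSums(counts[mito_genes, , drop = FALSE]) / colSums(counts)
)
print(meta)

# With a real Seurat object, the data frame would then be attached and the
# column names passed on, for example:
# Seurat_obj <- AddMetaData(Seurat_obj, metadata = meta)
# Seurat_obj <- IKAP(Seurat_obj, out.dir = "./IKAP",
#                    confounders = c("percent.mito", "nUMI"))
```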



IKAP Workflow

Usage:

Seurat_obj <- IKAP(Seurat_obj, out.dir = "./IKAP")

Returned data and output files (saved in the output directory, default = ./IKAP/):

Seurat object: IKAP returns a Seurat object with all explored sets in the metadata data frame.

  • PC_K.pdf:

The heatmap shows the statistics for every combination of r and nPC explored. Candidate sets are marked as 'X' with the best marked as 'B'. The corresponding cell membership can be found in the metadata of the returned Seurat object with column name 'PC?K?'. For example, if 'B' (the best set) is marked at nPC = 20 and k = 8, the corresponding cell membership is stored in column 'PC20K8' in the metadata.
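The column lookup described above can be sketched as follows. The helper `ikap_column` is hypothetical, and the toy data frame stands in for the metadata of the returned Seurat object (in Seurat 2 this would typically be the `meta.data` slot of the object).

```r
# Build the metadata column name for a given nPC and k, e.g. "PC20K8".
ikap_column <- function(n_pc, k) paste0("PC", n_pc, "K", k)

# Toy stand-in for the metadata data frame of the returned Seurat object.
meta <- data.frame(PC20K8 = c(1, 1, 2, 3),
                   row.names = paste0("cell", 1:4))

# Cell membership for the set at nPC = 20 and k = 8.
membership <- meta[[ikap_column(20, 8)]]
table(membership)
```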

  • data.xls and markers.all.rds:

The first sheet of data.xls stores the statistics (plotted in PC_K.pdf) used to determine the candidate sets. The remaining sheets list the (upregulated) marker genes for the candidate sets. The R object markers.all.rds contains a data frame of marker genes for every candidate set.

  • *.png:

Heatmaps show expression of top 10 (ranked by expression fold change) marker genes from each cell group for candidate sets. They are plotted using Seurat DoHeatmap function.

  • DT_plot.pdf, DT_summary.rds, and DT.rds:

Decision tree output files. A decision tree is built using marker genes for every cell group in every candidate set using R package rpart. All decision trees are plotted in DT_plot.pdf. Classification errors are summarized in the R object DT_summary.rds. DT.rds is the output object from rpart.
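To make the classification error concrete, here is a small sketch of fitting a classification tree with rpart on toy data and computing its error rate. The genes, values, and group labels are invented, and this is not IKAP's own DecisionTree code, only an illustration of the underlying rpart calls.

```r
library(rpart)

# Toy expression of two hypothetical marker genes in 40 cells from two groups.
cells <- data.frame(
  group = factor(rep(c("A", "B"), each = 20)),
  geneA = c(seq(5, 6, length.out = 20), seq(0, 1, length.out = 20)),
  geneB = rep(c(1, 2), times = 20)
)

# Fit a classification tree that predicts the cell group from expression.
fit <- rpart(group ~ geneA + geneB, data = cells, method = "class")

# Classification error: fraction of cells assigned to the wrong group.
predicted <- predict(fit, cells, type = "class")
error_rate <- mean(predicted != cells$group)
print(error_rate)
```

A low error rate means the groups can be told apart with a few expression thresholds on the marker genes, which is the property used to mark the best set.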

  • *tSNE.pdf:

tSNE plots for candidate sets.



Functions in the R script:

  • IKAP: The main function runs the following steps:
    • (1) regress out confounding variables and scale data using Seurat::ScaleData;
    • (2) find variable genes for principal component analysis (PCA) using Seurat::FindVariableGenes;
    • (3) perform PCA using Seurat::RunPCA;
    • (4) estimate k.max;
    • (5) explore ranges of k and nPC and compute gap statistics;
      • GapStatistic, ObservedLogW, and ExpectedLogW:
        Compute gap statistics given a data matrix (used for computing data point Euclidean distances) and K sets of clusters with k = 1 … K. GapStatistic calls ObservedLogW and ExpectedLogW to compute sum of within-group distances for observed data and random data respectively.
      • BottomUpMerge and NearestCluster: Generate sets of cell groups by exploring ranges of k and nPC. BottomUpMerge finds k.max groups using Seurat::FindClusters and then gradually merges the two nearest clusters, as measured by NearestCluster.
    • (6) select candidate sets;
      • SelectCandidate:
        Select candidate sets based on gap statistics.
    • (7) compute marker genes using Seurat::FindAllMarkers;
      • ComputeMarkers:
        Compute marker genes for all cell groups in all candidate sets using Seurat::FindAllMarkers. In addition, compute the area under the ROC curve (AUROC) for each marker gene using the R package PRROC. Plot marker gene heatmap(s) using Seurat::DoHeatmap.
    • (8) build decision trees;
      • DecisionTree:
        Build decision trees for all cell groups in all candidate sets using the R package rpart and compute the classification error for each candidate set.
    • (9) plot tSNE plots and PC_K.pdf
      • PlotSummary:
        Mark the best set based on classification error and plot PC_K.pdf
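The gap statistic used in step (5) can be sketched in base R as below. This follows the general textbook formulation (expected minus observed log within-group distance, with the reference drawn uniformly from the bounding box of the data) and swaps in average-linkage hierarchical clustering for Seurat clustering and merging. The function names `log_w` and `gap_statistic` are illustrative; IKAP's own GapStatistic, ObservedLogW, and ExpectedLogW may differ in detail.

```r
# Log of the summed within-group pairwise Euclidean distances,
# each group's sum divided by its size.
log_w <- function(x, groups) {
  w <- sum(vapply(split(seq_len(nrow(x)), groups), function(idx) {
    if (length(idx) < 2) return(0)
    sum(dist(x[idx, , drop = FALSE])) / length(idx)
  }, numeric(1)))
  log(w)
}

# Gap statistic for k = 1 ... k_max (stand-in clustering: hclust + cutree).
gap_statistic <- function(x, k_max = 4, n_ref = 10) {
  cluster_at <- function(m, k) {
    cutree(hclust(dist(m), method = "average"), k = k)
  }
  ranges <- apply(x, 2, range)  # per-dimension min and max
  vapply(seq_len(k_max), function(k) {
    observed <- log_w(x, cluster_at(x, k))
    expected <- mean(replicate(n_ref, {
      # Random reference data, uniform within the bounding box of x.
      ref <- apply(ranges, 2, function(r) runif(nrow(x), r[1], r[2]))
      log_w(ref, cluster_at(ref, k))
    }))
    expected - observed
  }, numeric(1))
}

# Toy data: two well-separated groups of 30 points in two dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(60, mean = 0,  sd = 0.5), ncol = 2),
           matrix(rnorm(60, mean = 10, sd = 0.5), ncol = 2))
gaps <- gap_statistic(x)
print(round(gaps, 2))
```

On data with clear group structure, the gap should rise sharply once k reaches the true number of groups, which is the kind of signal a candidate selection step can use.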


License

MIT license: https://opensource.org/licenses/MIT



Contact

If you have any questions, please contact: [email protected]
