
scorect's Introduction

scoreCT

Automated Cell Type Annotation.

A script to automate cell type annotation of scRNA-seq data analyzed with the Scanpy package (https://github.com/theislab/scanpy). It takes a user-provided reference CSV of markers associated with curated cell types and uses it to score and assign a cell type to each louvain cluster in the AnnData object (which holds the previously analyzed scRNA-seq data). Before running it, louvain clustering and per-group gene ranking must have been performed, and their results must be present in the AnnData object.

Getting Started

scRNA-seq analysis packages make it possible to cluster cells and find biomarkers for each inferred cluster, in order to explore the cell types in a population of cells. However, manual curation of cell types can be long and tedious. The goal of this repo is to provide biologists with a script that automates cell type annotation of clusters in data analyzed with Scanpy, using their own list of markers and curated cell types. The formatting of this table is left to the user, but it is ideally a CSV in which each row is a cell type followed by its associated markers.
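As an illustration of the table layout described above, here is a minimal sketch that parses a CSV with one cell type per row followed by its markers. It is not scoreCT's own reader: the helper name read_marker_table and the three example rows are assumptions chosen for illustration (the genes are commonly used markers for those cell types).

```python
import csv
import io

# Illustrative reference table: cell type in the first column, markers after it.
reference_csv = """\
B cells,CD79A,MS4A1,CD19
NK cells,NKG7,GNLY
Monocytes,CD14,LYZ,FCGR3A
"""


def read_marker_table(handle):
    """Return {cell type: [marker genes]} from a CSV file-like object."""
    markers = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        cell_type, *genes = (field.strip() for field in row)
        # Rows may have different lengths; drop empty trailing fields.
        markers[cell_type] = [gene for gene in genes if gene]
    return markers


marker_table = read_marker_table(io.StringIO(reference_csv))
print(marker_table)
```

Rows of different lengths are fine with this layout, which is convenient when some cell types have many more known markers than others.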

Prerequisites

Scanpy (Wolf et al., 2018) must be installed to run the prerequisite analysis. See the Scanpy repo for tutorials on how to run Scanpy on your data. Scanpy can be installed by running:

pip install scanpy
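Since the script expects louvain clustering and gene ranking results to already be in the AnnData object, a quick check before annotating can save a failed run. The sketch below assumes the default keys Scanpy uses when sc.tl.louvain and sc.tl.rank_genes_groups are called without key_added ('louvain' in .obs and 'rank_genes_groups' in .uns). The helper missing_prerequisites is hypothetical and not part of scoreCT, and a stand-in object is used so the example runs without Scanpy installed.

```python
from types import SimpleNamespace


def missing_prerequisites(adata, cluster_key="louvain", rank_key="rank_genes_groups"):
    """List the analysis results that are absent from an AnnData-like object."""
    missing = []
    if cluster_key not in adata.obs:  # for a DataFrame, `in` checks column names
        missing.append(f"obs['{cluster_key}']")
    if rank_key not in adata.uns:
        missing.append(f"uns['{rank_key}']")
    return missing


# Stand-in with the same .obs/.uns attributes as an AnnData object:
# clustering is present, gene ranking has not been run yet.
fake_adata = SimpleNamespace(obs={"louvain": ["0", "1", "0"]}, uns={})
print(missing_prerequisites(fake_adata))
```

With a real AnnData object, an empty list means both prerequisites are present under the assumed keys. If you stored results under custom keys, pass them as cluster_key and rank_key.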

Installing

By cloning this repository

Clone this repo into your home folder by running:

git clone https://github.com/LucasESBS/scoreCT

Then run:

python setup.py install

With conda

Use the following command to install with conda:

conda install -c lucasesbs scorect

Tutorial

See the Jupyter notebook in the example folder for a worked example. Example data are in scoreCT/data/.

cd scoreCT/example
jupyter notebook

scorect's People

Contributors

lucasesbs, mrjeppard

scorect's Issues

AttributeError: 'DataFrame' object has no attribute 'append'

When running tuto_scoreCT.ipynb in Google Colab, I get an error at the scorect() call: "AttributeError: 'DataFrame' object has no attribute 'append'"

Here is my copy of the notebook: https://colab.research.google.com/drive/1BXeWh4fmSJLZwgGrm7DOX7NL4NR1V4po?usp=sharing

Note I had to add 3 lines to clone and install scoreCT prior to importing the package:

!git clone https://github.com/LucasESBS/scoreCT
%cd scoreCT
!python setup.py install
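For context, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so any code that still calls it raises this AttributeError on a recent pandas such as the one preinstalled in Colab. Two options are pinning an older pandas (for example pip install "pandas<2") or replacing the call with pd.concat. The sketch below shows the second option on a toy frame; the helper append_row and the column names are illustrative assumptions and are not taken from the scoreCT source.

```python
import pandas as pd


def append_row(df, row):
    """pandas >= 2.0 replacement for the removed df.append(row, ignore_index=True)."""
    return pd.concat([df, pd.DataFrame([row])], ignore_index=True)


scores = pd.DataFrame({"cluster": ["0", "1"], "score": [5.0, 4.0]})
scores = append_row(scores, {"cluster": "2", "score": 3.0})
print(scores)
```

When many rows are added in a loop, collecting them in a list and building the DataFrame once at the end is usually faster than concatenating on every iteration.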

Improvement

Write a summary function score_summary(my_dict) that summarizes the scores for each cluster after running score_clusters(), either as a table for the user to inspect or as printed output during score_clusters().
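A possible starting point for such a function is sketched below. The structure of my_dict is an assumption: a mapping from cluster label to a {cell type: score} dictionary, which may differ from what score_clusters() actually returns. The top_n parameter is also hypothetical. The example scores are taken from the ct_score table reported in the "cell type assignment" issue.

```python
def score_summary(my_dict, top_n=3):
    """Return one line per cluster listing its top_n highest-scoring cell types."""
    lines = []
    for cluster in sorted(my_dict, key=str):
        ranked = sorted(my_dict[cluster].items(), key=lambda item: item[1], reverse=True)
        best = ", ".join(f"{cell_type}={score:g}" for cell_type, score in ranked[:top_n])
        lines.append(f"cluster {cluster}: {best}")
    return "\n".join(lines)


# Assumed input layout: {cluster: {cell type: score}}
example_scores = {
    "0": {"BCells": 5.0, "plasma": 19.0, "tuft": 1.0},
    "6": {"BCells": 35.0, "tuft": 4.0},
}
summary = score_summary(example_scores, top_n=2)
print(summary)
```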

Rank bins to deciles?

As in the title. Ranking is useful, but for low-information cell types in particular the markers may need to stand out more.
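One reading of this suggestion is to group the ranked genes of a cluster into ten equally sized bins rather than using the raw rank. The sketch below shows that binning under this assumption. The function decile_bins is hypothetical, and how scoreCT currently bins ranks is not shown here.

```python
def decile_bins(ranked_genes, n_bins=10):
    """Map each gene to a bin index (0 = top-ranked bin) based on its rank."""
    n_genes = len(ranked_genes)
    bins = {}
    for rank, gene in enumerate(ranked_genes):
        # Integer arithmetic keeps bins equally sized when n_genes divides evenly.
        bins[gene] = min(rank * n_bins // n_genes, n_bins - 1)
    return bins


ranked = [f"gene{i}" for i in range(20)]  # placeholder gene names, best first
bins = decile_bins(ranked)
print(bins["gene0"], bins["gene2"], bins["gene19"])
```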

Add several score scales

Instead of only a linear scale, add the option of choosing other score scales that are more stringent between bins.
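To make the idea concrete, the sketch below compares a linear scale with two steeper alternatives for the score given to a marker found in each bin. The function bin_scores, the scale names and the exact formulas are assumptions for illustration and are not part of scoreCT.

```python
def bin_scores(n_bins, scale="linear"):
    """Score awarded per bin, from the top-ranked bin (index 0) to the last one."""
    if scale == "linear":
        return [float(n_bins - i) for i in range(n_bins)]
    if scale == "quadratic":
        return [float((n_bins - i) ** 2) for i in range(n_bins)]
    if scale == "exponential":
        return [float(2 ** (n_bins - 1 - i)) for i in range(n_bins)]
    raise ValueError(f"unknown scale: {scale!r}")


for name in ("linear", "quadratic", "exponential"):
    print(name, bin_scores(4, scale=name))
```

With four bins, the top bin is worth 4 times the last one on the linear scale, 16 times on the quadratic scale and 8 times on the exponential scale, so the steeper scales reward markers in the top bins more strongly.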

cell type assignment

Dear LucasESBS, thank you very much for creating this.
I have been trying to assign cell types but it seems to produce NaN. Do you have any ideas why? Thanks.

celltype_assign = ct.assign_celltypes(cluster_assignment=cluster_assign, ct_pval_df=ct_pval, ct_score_df=ct_score, cutoff=0.1)

# Add to anndata object

adata2.obs['scorect'] = celltype_assign
celltype_assign
AAACCTGCAAAGGCGT-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 NaN
AAACCTGCATTTGCCC-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 NaN
AAACCTGGTAGCAAAT-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 NaN
AAACCTGTCATTGCCC-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 NaN
AAACCTGTCTGCAGTA-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 NaN
..
TTTGCGCTCATAACCG-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 NaN
TTTGGTTCAAGCGATG-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 NaN
TTTGGTTCAAGTCATC-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 NaN
TTTGGTTTCGGCGGTT-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 NaN
TTTGTCATCGTCACGG-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 NaN
Name: leiden_res_0.7, Length: 4818, dtype: float64
cluster_assign
AAACCTGCAAAGGCGT-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 11
AAACCTGCATTTGCCC-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 7
AAACCTGGTAGCAAAT-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 11
AAACCTGTCATTGCCC-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 11
AAACCTGTCTGCAGTA-1-02_CARTP00G1LI_AMP_ch2_GEX-5pr-v2-0 3
..
TTTGCGCTCATAACCG-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 0
TTTGGTTCAAGCGATG-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 7
TTTGGTTCAAGTCATC-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 2
TTTGGTTTCGGCGGTT-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 12
TTTGTCATCGTCACGG-1-09_CARTP00G1LI_BDR_ch2_GEX-5pr-v2-0 13
Name: leiden_res_0.7, Length: 4818, dtype: category
Categories (17, int64): [0, 1, 2, 3, ..., 13, 14, 15, 16]
ct_pval
BCells plasma myeloid fibroendo myofibro lamina_propria ... tuft endocrine erythrocytes NK_cells monocytes CustomSet
0 1.204387e-02 4.070538e-10 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 6.347508e-02 1.0 1.0 1.0 1.00000 1.0
1 2.306778e-02 1.000000e+00 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
10 2.306778e-02 1.000000e+00 1.515579e-12 1.000000e+00 1.000000 1.000000e+00 ... 1.385330e-02 1.0 1.0 1.0 0.00491 1.0
11 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
12 1.000000e+00 1.000000e+00 1.000000e+00 3.435101e-12 1.000000 3.402924e-02 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
13 1.000000e+00 1.000000e+00 1.000000e+00 1.536033e-28 1.000000 8.539887e-06 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
14 3.402924e-02 1.000000e+00 7.647075e-09 1.000000e+00 1.000000 1.000000e+00 ... 2.666548e-04 1.0 1.0 1.0 1.00000 1.0
15 1.000000e+00 1.000000e+00 1.000000e+00 4.871892e-15 0.019492 5.576572e-02 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
16 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 1.031787e-17 1.0 1.0 1.0 1.00000 1.0
2 2.306778e-02 1.000000e+00 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
3 2.306778e-02 1.000000e+00 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
4 1.000000e+00 1.000000e+00 1.000000e+00 3.901415e-06 1.000000 1.000000e+00 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
5 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
6 1.339561e-17 1.000000e+00 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 2.637989e-02 1.0 1.0 1.0 1.00000 1.0
7 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000 1.000000e+00 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
8 1.000000e+00 1.000000e+00 1.000000e+00 1.692935e-35 1.000000 2.657604e-13 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0
9 1.000000e+00 1.000000e+00 1.000000e+00 2.309834e-20 0.024291 1.000000e+00 ... 1.000000e+00 1.0 1.0 1.0 1.00000 1.0

[17 rows x 20 columns]

ct_score
BCells plasma myeloid fibroendo myofibro lamina_propria sox6_fibroblasts ... paneth tuft endocrine erythrocytes NK_cells monocytes CustomSet
0 5.0 19.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 0.0 0.0 0.0 0.0 0.0
1 4.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10 4.0 0.0 28.0 0.0 0.0 0.0 0.0 ... 5.0 5.0 0.0 0.0 0.0 3.0 0.0
11 0.0 0.0 0.0 0.0 0.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
12 0.0 0.0 0.0 29.0 0.0 3.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
13 0.0 0.0 0.0 62.0 0.0 11.0 2.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
14 3.0 0.0 20.0 0.0 0.0 0.0 0.0 ... 0.0 9.0 0.0 0.0 0.0 0.0 0.0
15 0.0 0.0 0.0 35.0 2.0 1.0 5.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 36.0 0.0 0.0 0.0 0.0 0.0
2 4.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 4.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 15.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 3.0 0.0 0.0 0.0 0.0 0.0 0.0
6 35.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 4.0 0.0 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 3.0 0.0 0.0 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 75.0 0.0 27.0 4.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
9 0.0 0.0 0.0 46.0 1.0 0.0 21.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
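One possible explanation, judging only from the output above: the rows of ct_pval and ct_score are printed in the order 0, 1, 10, 11, ..., 2, 3, which is how string labels sort, while cluster_assign has int64 categories. If the assignment looks clusters up by label, integer labels will not match a string index and every cell ends up NaN. The internals of assign_celltypes are not shown here, so this is an assumption. The sketch below reproduces the mismatch with plain pandas on a toy table whose values are taken from the ct_score output above.

```python
import pandas as pd

# Cluster labels per cell, stored as integers (as in cluster_assign above).
cluster_assign = pd.Series([11, 7, 3], index=["cell_a", "cell_b", "cell_c"])

# Toy score table whose row labels are strings, as the printed ordering suggests.
ct_score = pd.DataFrame(
    {
        "BCells": [4.0, 0.0, 0.0],
        "paneth": [0.0, 3.0, 0.0],
        "sox6_fibroblasts": [0.0, 0.0, 1.0],
    },
    index=["3", "7", "11"],
)
best_type = ct_score.idxmax(axis=1)  # best-scoring cell type per cluster

mismatched = cluster_assign.map(best_type)           # int labels vs str index: all NaN
aligned = cluster_assign.astype(str).map(best_type)  # same label type: cell types
print(mismatched.isna().all(), aligned.tolist())
```

If this is the cause, converting the cluster labels to strings before calling assign_celltypes (for example cluster_assign.astype(str)) may be worth trying.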
