Cohorts

Cohorts is a library for analyzing and plotting clinical data, mutations and neoepitopes in patient cohorts.

It calls out to external libraries like topiary and caches the results for easy manipulation.
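The compute-once, reuse-later idea can be sketched with a small on-disk cache. This is a minimal illustration only, not how Cohorts implements its cache: the helper `load_or_compute`, the per-patient directory layout, and the use of pickle files are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_dir, patient_id, name, compute_fn):
    """Return a cached result for (patient_id, name), computing it on a miss.

    Hypothetical helper: the directory layout is an assumption for this sketch.
    """
    patient_dir = os.path.join(cache_dir, str(patient_id))
    os.makedirs(patient_dir, exist_ok=True)
    path = os.path.join(patient_dir, name + ".pkl")
    if os.path.exists(path):
        # Cache hit: skip the expensive call entirely.
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute_fn()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive_prediction():
    # Stand-in for a slow external call, such as an epitope predictor.
    calls.append(1)
    return {"neoantigen_count": 3}


cache_dir = tempfile.mkdtemp()
first = load_or_compute(cache_dir, "patient_1", "neoantigens", expensive_prediction)
second = load_or_compute(cache_dir, "patient_1", "neoantigens", expensive_prediction)
```

The second call reads the pickle from disk, so `expensive_prediction` runs only once.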

Cohorts requires Python 3 (3.3+). We are no longer maintaining compatibility with Python 2. For context, see this Python 3 statement.

Installation

You can install Cohorts using pip:

pip install cohorts

Features

  • Data management: construct a Cohort consisting of Patients with Samples.
  • Use varcode and topiary to generate and cache variant effects and predicted neoantigens.
  • Provenance: track the state of the world (package and data versions) for a given analysis.
  • Aggregation functions: built-in functions such as missense_snv_count, neoantigen_count, expressed_neoantigen_count; or create your own functions.
  • Plotting: survival curves via lifelines, response/no response plots (with Mann-Whitney and Fisher's Exact results), ROC curves. Example: cohort.plot_survival(on=missense_snv_count, how="pfs").
  • Filtering: filter collections of variants/effects/neoantigens by, for example, variant statistics.
  • Pre-define data sets to work with. Example: cohort.as_dataframe(join_with=["tcr", "pdl1"]).
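As a rough illustration of the provenance idea in the list above, the sketch below records installed package versions and reports what changed between two snapshots. It is not the Cohorts implementation: `collect_provenance` and `provenance_diff` are hypothetical names, and `importlib.metadata` assumes Python 3.8 or newer.

```python
import sys
from importlib import metadata


def collect_provenance(package_names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    # Record the interpreter version alongside the packages.
    versions["python"] = "%d.%d.%d" % sys.version_info[:3]
    return versions


def provenance_diff(old, new):
    """Return {key: (old_value, new_value)} for every entry that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Package names here are placeholders; substitute the ones your analysis uses.
current = collect_provenance(["pip", "a-package-that-is-not-installed"])
changes = provenance_diff({"varcode": "0.5.0"}, {"varcode": "0.5.1"})
```

Storing a snapshot like `current` next to cached results makes it possible to detect when a cache was produced under different package versions.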

In addition, several other libraries make use of cohorts:

Quick Start

One way to get started using Cohorts is to use it to analyze TCGA data.

As an example, we can create a cohort using query_tcga:

from query_tcga import cohort, config

# provide authentication token
config.load_config('config.ini')

# load patient data
blca_patients = cohort.prep_patients(project_name='TCGA-BLCA',
                                     project_data_dir='data')

# create cohort
blca_cohort = cohort.prep_cohort(patients=blca_patients,
                                 cache_dir='data-cache')

Then, use plot_survival() to summarize a potential biomarker (e.g. snv_count) by survival:

from cohorts.functions import snv_count
blca_cohort.plot_survival(snv_count, how='os', threshold='median')

This should produce a summary of results, including this plot:

Survival plot example
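To make the threshold='median' argument concrete, here is a minimal sketch of splitting patients into two groups around the median of a biomarker. The function `split_by_median` and the sample counts are made up for illustration, and whether Cohorts places values equal to the median in the upper or lower group is an assumption here.

```python
from statistics import median


def split_by_median(values_by_patient):
    """Split patient ids into (above, at_or_below) groups around the median."""
    threshold = median(values_by_patient.values())
    above = sorted(p for p, v in values_by_patient.items() if v > threshold)
    at_or_below = sorted(p for p, v in values_by_patient.items() if v <= threshold)
    return threshold, above, at_or_below


# Made-up SNV counts keyed by patient id.
snv_counts = {"p1": 120, "p2": 45, "p3": 300, "p4": 80}
threshold, high, low = split_by_median(snv_counts)
print(threshold, high, low)  # 100.0 ['p1', 'p3'] ['p2', 'p4']
```

A survival plot of this kind then compares one survival curve per group.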

We could alternatively use plot_benefit() to summarize OS>12mo instead of survival:

blca_cohort.plot_benefit(snv_count)

Benefit plot example
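The comparison behind a benefit plot can be sketched in plain Python: label each patient by whether OS exceeds 12 months, then compare the biomarker between the two groups with a Mann-Whitney U statistic. This is an illustrative sketch rather than the Cohorts code path; the data are invented, the statistic is computed by brute force, and no p-value is derived.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: count of (x, y) pairs with x > y, ties count 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Invented (os_months, snv_count) pairs, one per patient.
data = [(20, 300), (15, 120), (30, 200), (6, 45), (10, 80), (8, 120)]

# Benefit is defined here as overall survival longer than 12 months.
benefit = [count for months, count in data if months > 12]
no_benefit = [count for months, count in data if months <= 12]

u_benefit = mann_whitney_u(benefit, no_benefit)
u_no_benefit = mann_whitney_u(no_benefit, benefit)
```

The two statistics always sum to `len(benefit) * len(no_benefit)`; a value of `u_benefit` near that maximum indicates that the biomarker tends to be higher in the benefit group.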

See the full example in the quick-start notebook.

Building from Scratch

from cohorts import Cohort, Patient, Sample

patient_1 = Patient(
    id="patient_1",
    os=70,
    pfs=24,
    deceased=True,
    progressed=True,
    benefit=False
)
    
patient_2 = Patient(
    id="patient_2",
    os=100,
    pfs=50,
    deceased=False,
    progressed=True,
    benefit=False
)

cohort = Cohort(
    patients=[patient_1, patient_2],
    cache_dir="/where/cohorts/results/get/saved"
)

cohort.plot_survival(on="os")

# A patient can also carry a tumor sample and paths to variant calls
sample_1_tumor = Sample(
    is_tumor=True,
    bam_path_dna="/path/to/dna/bam",
    bam_path_rna="/path/to/rna/bam"
)

patient_1 = Patient(
    id="patient_1",
    ...
    snv_vcf_paths=["/where/my/mutect/vcfs/live",
                   "/where/my/strelka/vcfs/live"],
    indel_vcfs_paths=[...],
    tumor_sample=sample_1_tumor,
    ...
)

cohort = Cohort(
    ...
    patients=[patient_1]
)
