CiberAMP | An R package to identify differential mRNA expression linked to somatic copy number variations in cancer datasets

CiberAMP is an R package that uses differential expression analyses to establish accurate correlations between specific SCNVs (somatic copy number variations) and changes in the expression of the genes they affect. The algorithm has been designed to give easy access to The Cancer Genome Atlas (TCGA), the largest database in the world with genomic and transcriptomic data for more than 10,000 samples from 33 different human cancer types.

Unlike other methods, CiberAMP can yield information on:

  • SCNV-DEGs (somatic copy number variation-associated differentially expressed genes) in a cohort of TCGA tumor samples.
  • The type of copy number variation associated with each SCNV-DEG, in terms of expression pattern and genomic context.
  • The potential functional relevance of each identified SCNV-DEG.

Installation from GitHub

# Install devtools from CRAN
install.packages("devtools")

# Alternatively, install the development version of devtools from GitHub:
# devtools::install_github("r-lib/devtools")

# Install ciberAMP and its dependencies from GitHub with devtools
devtools::install_github("vqf/ciberAMP", dependencies = TRUE)

Usage

# Load the library
library(ciberAMP)

# Run CiberAMP (the genes and cohort below are only an example; replace them with your own)
x <- ciberAMP(genes = c("DGKG", "CDKN2A"), cohorts = c("HNSC"),
              pat.percentage = 0, writePath = "PATH_TO_FOLDER")

Where:

  • genes The list of genes of interest, given as a vector of official HGNC gene symbols.
  • cohorts The list of TCGA cohorts to be analyzed. By default, CiberAMP will be run on all TCGA cohorts. The official TCGA cohort IDs are listed here: https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations and in CiberAMP's manuscript (Table S1).
  • writePath The path to the folder where results will be saved. TIP: if you re-run CiberAMP using the same folder where the data were previously stored, you will not need to download all the TCGA data again. This can save a lot of disk space, but be careful: previous results will be overwritten.
  • pat.percentage The minimum percentage of copy-number-altered samples a gene must have in order to be analyzed.
  • pp.cor.cut Threshold used to filter samples by array-array intensity correlation (AAIC) analysis (0.6 by default). Passed to TCGAanalyze_Preprocessing.
  • norm.method Normalization method, such as gcContent or geneLength (default). See the TCGAbiolinks R package documentation for help.
  • filt.method Filtering method, such as quantile (default), varFilter, filter1 or filter2. See the TCGAbiolinks R package documentation for help.
  • filt.qnt.cut Threshold, selected as a quantile, used for filtering. Defaults to 0.25 (first quartile).
  • filt.var.func Filtering function. Defaults to IQR. See the genefilter documentation for available methods.
  • filt.var.cutoff Threshold for filt.var.func. See the TCGAbiolinks R package documentation for help.
  • filt.eta Parameter for filter1. Defaults to 0.05. See the TCGAbiolinks R package documentation for help.
  • filt.FDR.DEA Threshold used to filter differentially expressed genes according to their corrected p-value.
  • filt.FC Minimum log2(FC) value required to consider a gene as differentially expressed. Defaults to 0.58, which corresponds to a fold change of about 1.5 (a change in expression of at least 50%).
  • cna.thr Threshold level for the copy-number variation analysis. Can be Deep, Shallow or Both.
  • exp.mat Custom matrix of normalized RNA-seq expression counts, containing tumor samples only. Defaults to NULL.
  • cna.mat Custom copy-number analysis matrix, containing tumor samples only. Defaults to NULL.
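To make the filt.FC and filt.FDR.DEA thresholds concrete, here is a minimal base-R sketch of how such cut-offs act on a differential expression table. It is not CiberAMP's internal code: the toy table, its column names (logFC, FDR), the 0.05 FDR cut-off and the use of the absolute log2 fold change (so that both up- and down-regulated genes pass) are all illustrative assumptions.

```r
# Hypothetical differential expression table; column names logFC and FDR are
# assumptions for illustration, not necessarily the columns CiberAMP returns.
dea <- data.frame(
  gene  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  logFC = c(1.20, -0.70, 0.30, 2.10),
  FDR   = c(0.001, 0.010, 0.001, 0.200),
  stringsAsFactors = FALSE
)

fc_cut  <- 0.58  # documented filt.FC default; 2^0.58 is about 1.49, i.e. ~50% change
fdr_cut <- 0.05  # hypothetical value for filt.FDR.DEA

# Keep genes passing both thresholds, whatever the direction of the change
is_deg <- abs(dea$logFC) >= fc_cut & dea$FDR < fdr_cut
degs <- dea[is_deg, ]
print(degs$gene)
```

GENE_C fails the fold-change cut-off and GENE_D fails the FDR cut-off, so only GENE_A and GENE_B are kept.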

Looking into CiberAMP results

CiberAMP returns a list of 3 data frames:

The first data frame contains all SCNV-DEGs, as well as the genes that are differentially expressed only between tumor and normal samples. The second data frame contains the known cancer drivers that are SCNV-DEGs. These two data frames share the same format, with the following columns:

  • Column 1 -> Gene approved symbols
  • Columns 2:4 -> Results from the differential expression analysis between tumor and healthy samples.
  • Column 5 -> ID of the queried TCGA cohort.
  • Columns 6:9 -> Results from the differential expression analysis between copy number altered and diploid tumor samples.
  • Column 10 -> ID of the queried TCGA cohort.
  • Column 11 -> The type of comparison made: amplified vs. diploid or deleted vs. diploid.
  • Column 12 -> Recurrence of gene amplifications or deletions in the cohort.
  • Column 14 -> Barcodes of the samples harboring such SCNVs.

The third data frame contains information about the significantly co-occurring amplifications or deletions between the SCNV-DEGs and known cancer drivers:

  • Column 1 -> Queried gene approved symbol
  • Column 2 -> Cancer driver gene approved symbol
  • Column 3 -> P-value of the comparison
  • Column 4 -> Odds ratio of the comparison
  • Columns 8:11 -> Number of samples with (1) or without (0) SCNVs in each gene of the compared pair
  • Column 12 -> Type of interaction (Co-occurrence or Mutual exclusivity)
  • Column 13 -> Gene symbols of the compared gene pair
  • Column 14 -> TCGA cohort ID
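The p-value, odds ratio and interaction type above can be illustrated with a standard 2x2 test on the "with (1) / without (0) SCNV" counts. The sketch below assumes a Fisher's exact test in base R and labels a pair as co-occurring when the odds ratio exceeds 1; this README does not state which test CiberAMP uses, and the counts are toy numbers, not real data.

```r
# Hypothetical 2x2 counts for one (queried gene, cancer driver) pair in one cohort.
# Rows: queried gene without (0) / with (1) SCNV; columns: same for the driver.
counts <- matrix(c(40, 5,    # queried gene 0: driver 0, driver 1
                    5, 10),  # queried gene 1: driver 0, driver 1
                 nrow = 2, byrow = TRUE,
                 dimnames = list(query = c("0", "1"), driver = c("0", "1")))

ft <- fisher.test(counts)
# fisher.test() reports the conditional MLE of the odds ratio, which differs
# slightly from the raw cross-product ratio (40 * 10) / (5 * 5) = 16.
odds_ratio  <- unname(ft$estimate)
interaction <- if (odds_ratio > 1) "Co-occurrence" else "Mutual exclusivity"

result <- data.frame(p.value = ft$p.value, oddsRatio = odds_ratio,
                     type = interaction, stringsAsFactors = FALSE)
print(result)
```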

Looking into CiberAMP's logic classifier results

The logic classification algorithm integrated in the CiberAMP package allows the user to rank candidate genes by subdividing them into four subgroups.

First, the SCN-associated DEGs reported in the previous step are divided according to whether they show significant genomic interactions with any COSMIC Cancer Gene Census (CGC) oncogene in each cohort. Second, these genes are further subdivided according to whether they are located inside or outside enriched genomic regions. Finally, within each of the four resulting subgroups, genes are ranked first by their recurrence and then by their SCN-associated FDR-adjusted p-value.
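The three steps above can be sketched in base R as a two-way split followed by a sort. This is only an illustration of the described logic, not CiberAMP.classifier itself: the toy table and its column names (driver_coalt, in_cluster, recurrence, fdr) are hypothetical, and the group order follows the four output data frames described further below.

```r
# Hypothetical SCNV-DEG table; all column names are illustrative assumptions.
degs <- data.frame(
  gene         = c("G1", "G2", "G3", "G4", "G5", "G6"),
  driver_coalt = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),  # co-altered with a CGC driver?
  in_cluster   = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),  # inside an enriched region?
  recurrence   = c(12, 30, 8, 25, 20, 25),
  fdr          = c(0.010, 0.001, 0.020, 0.003, 0.005, 0.001),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: label each gene by driver co-alteration and cluster membership
group <- paste(ifelse(degs$driver_coalt, "with_driver", "no_driver"),
               ifelse(degs$in_cluster, "in_cluster", "outside_cluster"), sep = ".")
group <- factor(group, levels = c("no_driver.outside_cluster", "no_driver.in_cluster",
                                  "with_driver.outside_cluster", "with_driver.in_cluster"))
groups <- split(degs, group, drop = FALSE)  # keeps empty subgroups as zero-row data frames

# Step 3: within each subgroup, sort by decreasing recurrence, then increasing FDR
ranked <- lapply(groups, function(d) d[order(-d$recurrence, d$fdr), , drop = FALSE])
print(ranked)
```

In the toy data, G4 and G6 tie on recurrence, so the lower FDR places G6 first.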

# Load the library
library(ciberAMP)

# Run the classifier on the first and third data frames returned by ciberAMP()
classified <- CiberAMP.classifier(res1 = x[[1]], res3 = x[[3]], width.window = 10000000)

Where:

  • res1 The first data frame returned by the ciberAMP() function.
  • res3 The third data frame returned by the ciberAMP() function.
  • width.window The window length, in base pairs, used to identify enriched genomic clusters.
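As a rough illustration of what a window length means here, the sketch below counts, for each gene, how many other SCNV-DEGs on the same chromosome lie within half a window on either side, and flags a gene as clustered when it has at least one such neighbor. The centered window, the neighbor threshold (min_neighbors) and the toy positions are hypothetical choices; CiberAMP's actual enrichment criterion is not described in this README.

```r
# Hypothetical SCNV-DEG positions (bp); D sits on another chromosome.
genes <- data.frame(
  gene  = c("A", "B", "C", "D"),
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(1e6, 4e6, 8.5e6, 5e6),
  stringsAsFactors = FALSE
)
width.window <- 10000000  # 10 Mb, the value used in the classifier call above
half <- width.window / 2

# For each gene, count other genes on the same chromosome within +/- half a window
n_neighbors <- vapply(seq_len(nrow(genes)), function(i) {
  same_chr <- genes$chr == genes$chr[i]
  close_by <- abs(genes$start - genes$start[i]) <= half
  sum(same_chr & close_by) - 1L  # exclude the gene itself
}, integer(1))

min_neighbors <- 1L  # hypothetical enrichment criterion
genes$in_cluster <- n_neighbors >= min_neighbors
print(genes)
```

Gene D is numerically close to B but lies on a different chromosome, so it is not counted as a neighbor.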

The output of this function is a list of 4 data frames:

  • The first data frame contains all SCNV-DEGs that are not co-amplified or co-deleted with any known cancer driver gene and lie outside any enriched cluster.
  • The second data frame contains all SCNV-DEGs that are not co-amplified or co-deleted with any known cancer driver gene and are located within an enriched cluster.
  • The third data frame contains all SCNV-DEGs that are co-amplified or co-deleted with a known cancer driver gene and lie outside any enriched cluster.
  • The fourth data frame contains all SCNV-DEGs that are co-amplified or co-deleted with a known cancer driver gene and are located within an enriched cluster.

Contributors

caloto, vqf

ciberamp's Issues

tumors_with_normal cannot be set to .tumors_N()

For some reason, there is no way to set this to the .tumors_N() function. This results in an error because tumors_with_normal is not found during execution.

Error in tumor %in% tumors.with.normal : object 'tumors.with.normal' not found

3. tumor %in% tumors.with.normal at functions.r#22
2. .downloadExpression(tumor) at cnaintexp.r#97
1. CNAintEXP(genes = c("DGKG", "CDKN2A"), cohorts = c("HNSC"))

TCGAbiolinks Primary Tumor or Primary Solid Tumor

This is a serious problem... We need an internal function that checks the installed version of TCGAbiolinks and chooses one label or the other.

I always change it in my script because otherwise I get an error, which I think is the same thing happening to you.
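The issue proposes checking the installed TCGAbiolinks version. A possible alternative, sketched below, is to inspect which primary-tumor label actually appears in the downloaded sample types and use that one. pick_tumor_label() is a hypothetical helper, not part of ciberAMP or TCGAbiolinks, and the matching is case-insensitive because the capitalization of the label may vary.

```r
# Hypothetical helper: return whichever primary-tumor label is present in the data,
# spelled exactly as it appears there.
pick_tumor_label <- function(sample_types,
                             candidates = c("Primary Tumor", "Primary Solid Tumor")) {
  present <- unique(sample_types)
  # Case-insensitive match of the candidate labels against the observed ones
  hit <- candidates[tolower(candidates) %in% tolower(present)]
  if (length(hit) == 0) stop("No primary tumor label found among the sample types")
  present[tolower(present) == tolower(hit[1])][1]
}

print(pick_tumor_label(c("Solid Tissue Normal", "Primary solid Tumor")))
```

This avoids hard-coding a mapping between TCGAbiolinks versions and labels, at the cost of relying on the sample-type column being available before filtering.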
