footprints's People

Contributors

mschubert, rramirezf

footprints's Issues

Check list

  • make sure speed score scaling is correct and does not suffer from different magnitudes because different arrays have different numbers of genes on them: we scale per (pathway x method), no issue
  • bootstrap of signature creation + scores
  • use NES instead of kernel density estimator for GSEA scores: using NES + KD
  • scale scores tissue-wise? no
  • ...
  • add here

empirical p-values for glmnet multivariate results

prerequisites

  • map all scores to specific set of pathways
  • use same pathway set input for glmnet

kinds of models

  • pathways
  • mutations and pathways as one input (to compare mut to pathways)
  • mutations + pathways (as in, formula syntax - to compare which pathways add most on top)

generation of null models

  • null models: 100 repetitions, shuffled labels
  • null models: 1000 repetitions, shuffled labels
  • real model: one repetition

calc

  • empirical p-values for models
  • bar plots for best models (as defined with p-val)
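The shuffled-label procedure above can be sketched as follows. This is a minimal illustration, not the project's code: `abs_covariance` is a hypothetical stand-in for whatever score the glmnet fit produces, and the toy data are invented. The +1 correction is one common convention; it means 100 repetitions cannot give p below 1/101, which is one reason to also run 1000.

```python
import random
from statistics import mean


def empirical_p_value(real_score, null_scores):
    """Share of null scores at least as high as the real score.

    The +1 in numerator and denominator keeps the p-value above zero.
    """
    at_least_as_high = sum(1 for s in null_scores if s >= real_score)
    return (at_least_as_high + 1) / (len(null_scores) + 1)


def shuffled_label_scores(score_fn, features, labels, repetitions=100, seed=0):
    """Re-score the model on shuffled labels to build a null distribution."""
    rng = random.Random(seed)
    null_scores = []
    for _ in range(repetitions):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null_scores.append(score_fn(features, shuffled))
    return null_scores


def abs_covariance(x, y):
    """Toy model score (hypothetical; stands in for the real model fit)."""
    mx, my = mean(x), mean(y)
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y))) / len(x)


# Invented example data: one pathway score and a perfectly related response.
pathway_score = [float(i) for i in range(10)]
drug_response = [2.0 * v for v in pathway_score]

real = abs_covariance(pathway_score, drug_response)        # real model: one repetition
nulls = shuffled_label_scores(abs_covariance, pathway_score, drug_response)
p_value = empirical_p_value(real, nulls)
```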

Array processing

  • Fix ArrayExpress package to work for enough arrays
    • handle duplicate rows in .sdrf better
    • A-AFFY-141
    • A-AFFY-33
    • A-AFFY-44
    • A-AFFY-37
    • A-GEOD-9419
    • A-AFFY-143
    • A-GEOD-16239
    • A-GEOD-10520
    • A-GEOD-10200
    • A-GEOD-8300
    • A-AFFY-1
    • A-GEOD-13667
    • ...any more...?
    • Agilent 1-color arrays
    • Illumina arrays
  • Map expression to genes
  • Assemble Z-score matrix + index for all pathways

Modeling of pathway cross-talk

Fit using pathway factors:

  • That's how we do it now for speed_linear (w/o intercept)

Fit using perturbed vs basal for each pathway:

  • That's how we do it now for speed_matrix; VEGF surv increase?!
  • MAPK += EGFR - MAPK drug assocs > EGFR for MEKi

Fit using one matrix with 1/-1 coefficients:

  • MAPK += EGFR - EGFR MEKi resistance (because non-MEK EGFR targets, esp. PI3K)
  • MAPK, PI3K += EGFR - mut assocs good (TP53, BRAF); EGFR survival increase (why?)
  • intercept but no xtalk - drug assocs good (MAPK>EGFR; good pvals: Trametinib<1e-15)
  • no intercept, no xtalk - ??
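A rough sketch of what "one matrix with 1/-1 coefficients" could look like, assuming +1 for activating and -1 for inhibiting experiments and reading "MAPK += EGFR" as "EGFR perturbations also count towards MAPK". The function name, the `crosstalk` argument and the example experiments are hypothetical.

```python
def design_matrix(experiments, pathways, crosstalk=None):
    """Build one row per experiment with +1/-1/0 entries per pathway.

    crosstalk maps a perturbed pathway to the pathways that should
    receive the same coefficient, e.g. {"EGFR": ["MAPK", "PI3K"]}.
    """
    crosstalk = crosstalk or {}
    sign = {"activating": 1, "inhibiting": -1}
    rows = []
    for exp in experiments:
        coef = sign[exp["effect"]]
        row = dict.fromkeys(pathways, 0)
        row[exp["pathway"]] = coef
        for linked in crosstalk.get(exp["pathway"], ()):
            row[linked] = coef
        rows.append([row[p] for p in pathways])
    return rows


pathways = ["EGFR", "MAPK", "PI3K"]
experiments = [
    {"pathway": "EGFR", "effect": "activating"},
    {"pathway": "MAPK", "effect": "inhibiting"},
]

no_xtalk = design_matrix(experiments, pathways)
with_xtalk = design_matrix(experiments, pathways, {"EGFR": ["MAPK", "PI3K"]})
```

Whether an intercept column is added on top of this matrix is a separate choice, as the variants listed above suggest.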

Drug association figure: condition both ways?

Right now, GO and Reactome associations are conditioned on the best SPEED association to show whether or not they can be explained by the response-genes alone.

We could go one step further: make the statement that SPEED explains most of the GO/Reactome associations, but not the other way around. This would require a conditioning of SPEED scores on either/both of GO/Reactome association scores.

Also, it might make sense to condition on all pathways/all significant pathways - not sure, this might just get too many random correlations with 10 conditioning vars.

Overall, this would make the statement that SPEED outperforms the others stronger.
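To make "condition both ways" concrete, here is a small sketch using partial correlation as a stand-in for the conditioned association test (an assumption; the actual figure may use a regression with covariates instead). All data are invented so that the response depends on the SPEED score only, and the GO score is SPEED plus unrelated noise.

```python
from statistics import mean


def residuals(y, x):
    """Residuals of a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def conditioned_correlation(response, score, conditioned_on):
    """Correlation of response and score after removing conditioned_on from both."""
    return pearson(residuals(response, conditioned_on),
                   residuals(score, conditioned_on))


# Invented data: the noise terms are constructed to be uncorrelated with speed.
speed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
go_noise = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
resp_noise = [0.1, 0.1, -0.1, -0.1, -0.1, -0.1, 0.1, 0.1]
go = [s + n for s, n in zip(speed, go_noise)]
response = [2.0 * s + e for s, e in zip(speed, resp_noise)]

go_given_speed = conditioned_correlation(response, go, speed)
speed_given_go = conditioned_correlation(response, speed, go)
```

In this toy setup the GO association vanishes once SPEED is conditioned on, while the SPEED association survives conditioning on GO, which is the asymmetry the stronger statement would need.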

Univariate associations

Cell line scores for different signatures

  • GDSC baseline expression
    • new SPEED
    • old SPEED
    • other genesets
  • TCGA
    • new SPEED
    • old SPEED
    • other genesets

Associations with

  • Tissue
  • Drugs
  • Mutations
  • Survival

TCGA batch effects

When accessing the TCGA data directly, each patient comes with a reference to the batch they belong to; in addition, barcodes encode the collection and analysis centres.

This info could be used to remove batch effects, if the TCGA didn't already do so. Should talk to someone who has worked with the data for longer, and see if the adjustment changes the survival associations.
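A small sketch of how the barcode fields could be pulled out, assuming the standard full aliquot barcode layout (project, tissue source site, participant, sample + vial, portion + analyte, plate, centre). The function name is made up, and which fields are useful as batch covariates is left open.

```python
def parse_tcga_barcode(barcode):
    """Split a full TCGA aliquot barcode into its named fields."""
    parts = barcode.split("-")
    if len(parts) != 7 or parts[0] != "TCGA":
        raise ValueError(f"not a full TCGA aliquot barcode: {barcode!r}")
    project, tss, participant, sample_vial, portion_analyte, plate, center = parts
    return {
        "project": project,
        "tissue_source_site": tss,       # collection centre
        "participant": participant,
        "sample_type": sample_vial[:2],  # e.g. 01 = primary solid tumour
        "vial": sample_vial[2:],
        "portion": portion_analyte[:2],
        "analyte": portion_analyte[2:],
        "plate": plate,
        "center": center,                # analysis centre
    }


fields = parse_tcga_barcode("TCGA-02-0001-01C-01D-0182-01")
```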

Benchmark

  • RPPA data
  • Gene sets
    • BioCarta
    • Reactome
    • Gatza
    • GO
    • old SPEED
    • Gatza signatures

CNA-altered genes in speed sigs

need to impute them using sig-only neighbourhood

does that change the assocs?

also, speed in general should work better with mutation-heavy tumors and CNA-heavy - can we find something interesting there?

Improve paper discussion

For now it's almost only a recap of the results; add a bit more about when pathway expression and individual signatures are useful

Discuss a bit about the limitations

Should probably still add:

  • Curated exps are a resource for further study
  • Pathway enrichment is still useful & when it is
  • Limitation that when a specific signature is available, it may still be better

Pathway curation

Z-scores: min 2 arrays in basal condition, average for perturbed
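One possible reading of this rule, sketched per gene: the basal arrays give mean and standard deviation (hence the minimum of 2), and the perturbed arrays are averaged before comparison. The exact formula is an assumption, not taken from the pipeline, and the expression values are invented.

```python
from statistics import mean, stdev


def gene_z_score(control, perturbed):
    """Z-score of the averaged perturbed expression against basal arrays."""
    if len(control) < 2:
        raise ValueError("need at least 2 basal arrays to estimate the spread")
    if not perturbed:
        raise ValueError("need at least 1 perturbed array")
    spread = stdev(control)  # sample standard deviation of the basal condition
    if spread == 0:
        raise ValueError("basal arrays are identical, z-score undefined")
    return (mean(perturbed) - mean(control)) / spread


z = gene_z_score(control=[1.0, 3.0], perturbed=[4.0, 6.0])
```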

Format: yaml

---
id        : <pathway.accession.#>
accession : <arrayexpress accession>
platform  : <arrayexpress platform id>
pathway   : <out of set below>
cells     : <human-readable cell description>
treatment : <human-readable treatment description>
effect    : activating|inhibiting
hours     : <int> number of hours treatment

control   :
    - <list of control arrays>
perturbed :
    - <list of perturbed arrays>
...

Pathways included:

  • EGFR
  • MAPK
  • PI3K
  • H2O2
  • Hypoxia
  • Trail
  • p53
  • TNFa
  • NFkB
  • VEGF
  • TGFb
  • Insulin
  • IL1
  • RAR
  • notch
  • PPAR
  • Estrogen
  • JAK-STAT
  • Wnt
  • acidosis ?
  • senescence ?
  • starvation ?

Modelling

Model creation

  • Linear model
  • SEM (lavaan): not feasible for this number of genes

QC

  • scores of model on original perturbation files
  • GO enrichment of "gold standard" categories

20k gene assoc not better than 5k

na.omit() for zscores leaves 5k genes, top assoc is p53 ~ 1e-19, some EGFR ~ 1e-14.

for 20k genes it is the same, that's a bit surprising
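For reference, the effect of `na.omit()` on a gene x experiment matrix is to drop every gene with a missing value in any experiment, which is how 20k genes can shrink to 5k. A sketch of the same filter, assuming genes are rows; gene names and values are invented.

```python
import math


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_genes(zscores):
    """Keep only genes that have a z-score in every experiment."""
    return {
        gene: values
        for gene, values in zscores.items()
        if not any(is_missing(v) for v in values)
    }


zscores = {
    "TP53": [1.2, -0.4, 2.1],
    "EGFR": [0.3, None, 1.0],          # missing in one experiment: dropped
    "KRAS": [float("nan"), 0.2, 0.5],  # NaN counts as missing: dropped
    "MYC": [0.0, 0.1, -0.2],
}
kept = complete_genes(zscores)
```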

Revert scaling in input experiments?

Right now, I scale the resampled scores per experiment.

  • don't scale in scores, recompute LOOCV model scores
  • column-scale in heatmap
  • scale for pathway ROC curves
  • don't scale for t-SNE plots (?)

Clustering

run clustering

  • batch correct tcga+gdsc
  • run NMF clustering

clustering analysis

  • for different NMF clusters, do some sort of silhouette + discard weak points
  • use this to also select k (in case cophenetic implies more than one)
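A sketch of the "silhouette + discard weak points" idea, assuming Euclidean distance on whatever coordinates the samples are clustered in and an arbitrary cutoff of 0.25 (both are assumptions, as is the toy data). The mean silhouette width per k could then serve as the tie-breaker between candidate values of k.

```python
from math import dist
from statistics import mean


def silhouette_widths(points, labels):
    """Silhouette width per point: (b - a) / max(a, b)."""
    widths = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            widths.append(0.0)  # singleton cluster or only one cluster
            continue
        a = mean(own)                                   # cohesion
        b = min(mean(d) for d in by_cluster.values())   # nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


# Invented data: two tight groups and one point halfway between them.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0.5)]
labels = [0, 0, 1, 1, 0]

widths = silhouette_widths(points, labels)
kept = [i for i, w in enumerate(widths) if w >= 0.25]  # discard weak points
```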

clustering plots

  • tSNE for vis, but color NMF clusters + shape tumor/cell lines

Analysis ideas

  • relationship between signalling footprints and pathway expression
    • overlap
    • which signal activates transcription of which pathway?
    • which pathways are more posttranslationally controlled, which more by expression? (would need phospho for this, but could probably do drug/survival response)
    • is the effect the same for cell lines and primary tumors?
    • for each cancer type, where do pathway expression/footprints correlate more? less?
      • should do a pancan and tissue-specific correlation plot here (resp<->resp and resp<->expr)
  • TCGA pathway activation + enrichment of variants that activate those
    • e.g., we find PI3K_variantTP53_R282W and enrichment of this variant in some cancers (BRCA)
      • does the same cancer need to have PI3K activated?
      • does p53 bind more PI3K-responsive genes with modified motif?
  • tcga_pathway_per_mutation: use GISTIC scores as regression coeffs? (= include +/-1)
