The footprints from saezlab

Add RPPA in addition to pathway methods for comparisons?

preliminary results suggested that the expected RPPA phosphorylation levels do not correlate with mutations

could be interesting to look at, but it is doubtful that this will work with the TCGA data

Rewrite perturbation recall paragraph

the current one is outdated

Check list

~~make sure speed score scaling is correct and does not suffer from different magnitudes because of different arrays having different numbers of genes on them~~ we scale per (pathway x method), no issue
bootstrap of signature creation + scores
~~use NES instead of kernel density estimator for GSEA scores~~ using NES + KD
~~scale scores tissue-wise~~? no
...
add here

empirical p-values for glmnet multivariate results

prerequisites

map all scores to specific set of pathways
use same pathway set input for glmnet

kinds of models

pathways
mutations and pathways as one input (to compare mut to pathways)
mutations + pathways (as in, formula syntax - to compare which pathways add most on top)

generation of null models

null models: 100 repetitions, shuffled labels
null models: 1000 repetitions, shuffled labels
real model: one repetition

calc

empirical p-values for models
bar plots for best models (as defined with p-val)
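
A minimal sketch of the shuffled-label null and the empirical p-value, assuming each model is summarised by a single score where higher is better. The names `empirical_p_value`, `shuffled_label_null` and `abs_correlation` are hypothetical, and the toy correlation score only stands in for whatever performance measure the glmnet fits report.

```python
import random
from math import sqrt

def abs_correlation(features, labels):
    """Toy score (assumption): absolute Pearson correlation of one feature with the labels."""
    n = len(features)
    mean_f, mean_l = sum(features) / n, sum(labels) / n
    cov = sum((f - mean_f) * (l - mean_l) for f, l in zip(features, labels))
    var_f = sum((f - mean_f) ** 2 for f in features)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    return abs(cov / sqrt(var_f * var_l))

def shuffled_label_null(score_fn, features, labels, n_repetitions=1000, seed=0):
    """Score the model n_repetitions times, each time on freshly shuffled labels."""
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_repetitions):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null_scores.append(score_fn(features, shuffled))
    return null_scores

def empirical_p_value(real_score, null_scores):
    """Share of null scores at least as good as the real one, with a +1 correction
    so that the p-value is never exactly zero."""
    n_as_good = sum(1 for s in null_scores if s >= real_score)
    return (n_as_good + 1) / (len(null_scores) + 1)

# made-up toy data: one pathway score and a binary response per sample
features = [0.1, 0.3, 0.2, 0.9, 1.1, 0.8, 0.15, 1.0]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

real_score = abs_correlation(features, labels)      # real model: one repetition
null_scores = shuffled_label_null(abs_correlation, features, labels, n_repetitions=1000)
p_value = empirical_p_value(real_score, null_scores)
```

With 1000 repetitions the smallest reachable p-value is 1/1001, so the number of null repetitions sets the resolution of the ranking used for the bar plots.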

need perturbation recall on external data set

just take the SPEED2 experiments that have too few controls and rank them with a precision-recall curve
LINCS data
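
A rough sketch of the ranking step, assuming each held-out experiment gets one score for a pathway and a flag saying whether that pathway was actually the perturbed one. The function name and the toy numbers are hypothetical.

```python
def precision_recall_curve(scores, is_true):
    """Rank experiments by descending score and return (recall, precision)
    after each rank position."""
    n_true = sum(1 for t in is_true if t)
    if n_true == 0:
        raise ValueError("need at least one true perturbation")
    ranked = sorted(zip(scores, is_true), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            true_positives += 1
        points.append((true_positives / n_true, true_positives / rank))
    return points

# made-up example: four experiments, two of which perturbed the pathway in question
curve = precision_recall_curve([0.9, 0.8, 0.3, 0.1], [True, False, True, False])
```

The same function would apply unchanged to LINCS-derived scores, as long as the ground-truth perturbation is known per experiment.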

tcga-mutations: pancan plots w/o tissue covariate

should do the volcano plot with the tissue covariate; maybe additionally highlight VHL, which is only mutated in KIRC, so we have another angle as well

Modeling of pathway cross-talk

Fit using pathway factors:

That's how we do it now for speed_linear (w/o intercept)

Fit using perturbed vs basal for each pathway:

That's how we do it now for speed_matrix; VEGF surv increase?!
MAPK += EGFR - MAPK drug assocs > EGFR for MEKi

Fit using one matrix with 1/-1 coefficients:

MAPK += EGFR EGFR MEKi resistance (because non-MEK EGFR targets, esp. PI3K)
MAPK, PI3K += EGFR - mut assocs good (TP53, BRAF); EGFR survival increase (why?)
intercept but no xtalk - drug assocs good (MAPK>EGFR; good pvals: Trametinib<1e-15)
no intercept, no xtalk - ??

Drug association figure: condition both ways?

Right now, GO and Reactome associations are conditioned on the best SPEED association to show whether or not they can be explained by the response-genes alone.

We could go one step further: make the statement that SPEED explains most of the GO/Reactome associations, but not the other way around. This would require a conditioning of SPEED scores on either/both of GO/Reactome association scores.

Also, it might make sense to condition on all pathways/all significant pathways - not sure, this might just get too many random correlations with 10 conditioning vars.

Overall, this would make the statement that SPEED outperforms the others stronger.
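
To make the "condition both ways" idea concrete, here is a small sketch using partial correlation with a single conditioning variable. This is a swapped-in simplification, not necessarily how the associations are conditioned in the actual analysis (a linear model with the conditioning score as covariate would be the closer analogue); all names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def condition_both_ways(drug_response, speed_score, other_score):
    """Return (other given SPEED, SPEED given other) partial correlations."""
    return (
        partial_correlation(other_score, drug_response, speed_score),
        partial_correlation(speed_score, drug_response, other_score),
    )
```

If SPEED explains the GO/Reactome association but not the other way around, the first value should shrink towards zero while the second stays clearly away from it.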

z-scores are not distributed around 0

but around 5, 9, etc.

blocking #3

Univariate associations

Cell line scores for different signatures

Associations with

Tissue
Drugs
Mutations
Survival

TCGA batch effects

when accessing the TCGA data directly, they provide references to which batch patients belong to; in addition, barcodes provide information about the collection and analysis centres

this info could be used to remove batch effects, if the TCGA didn't already do so - should talk to someone who worked with the data for longer + see if adjustment changes the survival associations

Benchmark

CNA-altered genes in SPEED signatures

need to impute them using sig-only neighbourhood

does that change the assocs?

also, SPEED in general should work better with mutation-heavy and CNA-heavy tumors - can we find something interesting there?

Improve paper discussion

For now it's almost only a recap of the results; add a bit more about when pathway expression and individual signatures are useful

Discuss a bit about the limitations

Should probably still add:

Curated exps are a resource for further study
Pathway enrichment is still useful & when it is
Limitation that when a specific signature is available, it may still be better

Pathway curation

Z-scores: min 2 arrays in basal condition, average for perturbed
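
A minimal per-gene sketch of this rule. The exact formula is not spelled out here, so this assumes the usual standardisation: the averaged perturbed expression minus the basal mean, divided by the basal standard deviation (which is why at least 2 basal arrays are needed). `gene_z_score` is a hypothetical name.

```python
from statistics import mean, stdev

def gene_z_score(control, perturbed):
    """z-score of one gene: averaged perturbed arrays against the basal distribution."""
    if len(control) < 2:
        raise ValueError("need at least 2 arrays in the basal condition")
    if len(perturbed) < 1:
        raise ValueError("need at least 1 perturbed array")
    basal_sd = stdev(control)  # sample standard deviation, undefined for < 2 arrays
    if basal_sd == 0:
        raise ValueError("basal arrays are identical, z-score is undefined")
    return (mean(perturbed) - mean(control)) / basal_sd

# made-up expression values for one gene
z = gene_z_score(control=[1.0, 3.0], perturbed=[4.0])
```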

Format: yaml

---
id        : <pathway.accession.#>
accession : <arrayexpress accession>
platform  : <arrayexpress platform id>
pathway   : <out of set below>
cells     : <human-readable cell description>
treatment : <human-readable treatment description>
effect    : activating|inhibiting
hours     : <int, number of hours of treatment>

control   :
    - <list of control arrays>
perturbed :
    - <list of perturbed arrays>
...
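
A small sketch of how one such record could be checked once it has been loaded into a dict (for example with PyYAML's `yaml.safe_load`, not used here to keep the sketch stdlib-only). The function name and every value in the example record are made up; the pathway is not checked because the allowed set is not listed in this note.

```python
REQUIRED_FIELDS = (
    "id", "accession", "platform", "pathway", "cells",
    "treatment", "effect", "hours", "control", "perturbed",
)

def validate_record(record):
    """Return a list of problems found in one curated experiment record."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in record]
    if "effect" in record and record["effect"] not in ("activating", "inhibiting"):
        problems.append("effect must be 'activating' or 'inhibiting'")
    if "hours" in record and not isinstance(record["hours"], int):
        problems.append("hours must be an integer")
    # z-scores need at least 2 arrays in the basal condition
    if "control" in record and len(record["control"] or []) < 2:
        problems.append("need at least 2 control arrays")
    if "perturbed" in record and len(record["perturbed"] or []) < 1:
        problems.append("need at least 1 perturbed array")
    return problems

# placeholder record, all values hypothetical
example = {
    "id": "PATHWAY.E-XXXX-0000.1",
    "accession": "E-XXXX-0000",
    "platform": "A-XXXX-00",
    "pathway": "PATHWAY",
    "cells": "example cell line",
    "treatment": "example ligand",
    "effect": "activating",
    "hours": 4,
    "control": ["array_c1", "array_c2"],
    "perturbed": ["array_p1"],
}
problems = validate_record(example)
```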