saezlab / progeny

R package for Pathway RespOnsive GENe activity inference

Home Page: https://saezlab.github.io/progeny/

License: Apache License 2.0

R 85.00% CSS 15.00%

progeny's Introduction

PROGENy: Pathway RespOnsive GENes for activity inference

Overview

PROGENy is a resource that leverages a large compendium of publicly available signaling perturbation experiments to yield a common core of pathway responsive genes for human and mouse. These, coupled with any statistical method, can be used to infer pathway activities from bulk or single-cell transcriptomics.

This is an R package for storing the pathway signatures. To infer pathway activities, please check out decoupleR, available in R or Python.

Installation

PROGENy is available on Bioconductor. In addition, one can install the development version from the GitHub repository:

## To install the package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("progeny")

## To install the development version from the Github repo:
devtools::install_github("saezlab/progeny")

Updates

Since the original release, we have implemented some extensions in PROGENy:

Extension to mouse: PROGENy was originally developed for human data. In a benchmark study we showed that PROGENy is also applicable to mouse data, as described in Holland et al., 2019. Accordingly, we added new parameters to run the mouse version of PROGENy by mapping the human genes to their mouse orthologs.
Expanded pathway collection: We expanded human and mouse PROGENy with the pathways Androgen, Estrogen and WNT.
Extension to single-cell RNA-seq data: We showed that PROGENy can be applied to scRNA-seq data, as described in Holland et al., 2020.

Citation

Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, Garnett MJ, Blüthgen N, Saez-Rodriguez J. 2018. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nature Communications: 10.1038/s41467-017-02391-6

progeny's Issues

Exporting scatter plot to see full list of genes associated with pathways

I need help exporting the scatter plot produced by the following code:

library(progeny)
library(magrittr)  # provides %>%

prog_matrix <- getModel("Human", top=100) %>%
    as.data.frame() %>%
    tibble::rownames_to_column("GeneID")

ttop_ETS1vsEV_df <- ttop_ETS1vsEV_matrix %>%
    as.data.frame() %>%
    tibble::rownames_to_column("GeneID")

scat_plots <- progeny::progenyScatter(df = ttop_ETS1vsEV_df,
    weight_matrix = prog_matrix,
    statName = "t_values", verbose = FALSE)

plot(scat_plots[[1]]$`MAPK`)

When I plot the scatter plot, I see the graph for a given pathway such as MAPK or TNFa. Is there a way to export the contents of scat_plots as a CSV so that I can see the full list of genes associated with each pathway?
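
One way to get at the gene lists without touching the plot objects is to join the weight table and the statistics table directly. The following is a minimal sketch in base R, assuming the weight table has a GeneID column plus one column per pathway (as prog_matrix does above) and the statistics table has GeneID and t_values columns. The toy values, the helper pathway_genes() and the output file name are all hypothetical.

```r
# Toy stand-ins for prog_matrix and ttop_ETS1vsEV_df (values are made up)
prog_matrix <- data.frame(
  GeneID = c("DUSP6", "SPRY4", "MX1", "ISG15"),
  MAPK   = c(12.1, 9.8, 0, 0),
  TNFa   = c(0, 0, 3.2, 4.5)
)
stats_df <- data.frame(
  GeneID   = c("DUSP6", "MX1", "ISG15", "ACTB"),
  t_values = c(2.4, -1.1, 3.0, 0.2)
)

# Keep only genes present in both tables
merged <- merge(prog_matrix, stats_df, by = "GeneID")

# Hypothetical helper: genes with a non-zero weight for one pathway,
# together with their statistic and the product of the two
pathway_genes <- function(merged, pathway) {
  keep <- merged[[pathway]] != 0
  out <- merged[keep, c("GeneID", pathway, "t_values")]
  names(out)[2] <- "weight"
  out$contribution <- out$weight * out$t_values
  out[order(-abs(out$contribution)), ]
}

mapk_genes <- pathway_genes(merged, "MAPK")
out_file <- file.path(tempdir(), "MAPK_genes.csv")  # hypothetical file name
write.csv(mapk_genes, out_file, row.names = FALSE)
```

Looping pathway_genes() over the pathway column names would give one CSV per pathway.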

Web page deployment faulty?

Hi,

I just checked the web page at https://saezlab.github.io/progeny/ and it looked like this:

Was there maybe something going wrong with the pkgdown deployment? Maybe also just missing a merge of #49?

Provide pre-calculated GDSC/TCGA scaling

Can use those scaling factors without actually having all the data

Might be useful to normalize other panels

progenyPerm - example is for progeny, not progenyPerm

In progeny v1.10.0, in R 4.0.3/RStudio, the help docs example for progenyPerm has this -
gene_expression <- ...
progeny(gene_expression, scale=TRUE, organism="Human", top=100, perm=10000)

Would you please update this to show a progenyPerm example?

Input Data Requirements Not Specified by Vignette

The vignette immediately launches into an analysis, but I would like to see some explanation of RNA-seq abundance summaries. Should counts, FPKM, TPM or something else be provided? What should definitely not be input by the user? What if the user already has an edgeR pipeline? I see the example uses DESeq2, but it's unclear what other units of measurement are valid to use. Also, Section 1 has a link to bioRxiv, but it could be updated to refer to Nature Communications instead. Log-scale microarray values and log-scale RNA-seq counts are on quite different scales. How can the same model work for both?

Leaving out androgen pathway in tutorial?

In your PROGENy tutorial using R, I've noticed that you have left out the androgen pathway when handling data with the script below.

progeny_hmap = pheatmap(t(summarized_progeny_scores_df[,-1]),fontsize=14, 
                        fontsize_row = 10, 
                        color=myColor, breaks = progenyBreaks, 
                        main = "PROGENy (500)", angle_col = 45,
                        treeheight_col = 0,  border_color = NA)

The androgen pathway, probably because it's the first pathway when sorted alphabetically, is removed due to the [, -1] selection in the first line of the script above.

Is this a simple mistake, or is there a specific reason for this?
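
Whether the first column of summarized_progeny_scores_df is an identifier or the Androgen scores depends on how that data frame was built, which is not shown here. The sketch below only illustrates the general pitfall with positional indexing, using a toy data frame whose column names are assumptions, and shows dropping by name as a safer alternative. The helper drop_cols() is hypothetical.

```r
# Toy score table (values made up); column names are assumptions
with_id <- data.frame(CellType = c("A", "B"), Androgen = c(0.1, -0.2),
                      EGFR = c(1.0, 0.5), WNT = c(-0.3, 0.4))

without_id    <- with_id[, -1]     # here [, -1] drops the CellType column
silently_lost <- without_id[, -1]  # applied again, it drops Androgen instead

# Dropping by name does nothing when the column is absent
drop_cols <- function(df, cols) df[, setdiff(names(df), cols), drop = FALSE]
safe <- drop_cols(without_id, "CellType")
```

Checking names() on the data frame before plotting is a quick way to see which case applies.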

How to use ConservedFootprints model with PROGENy wrapper

I've found the updated model with Androgen, Estrogen and WNT pathways but how can I use it with the progeny call ("Calculate PROGENy pathway scores from gene expression")?

progeny_matrix_human_v1.csv

progeny pathway score result differences with number of samples!

Hi,
I ran progeny with three different expression matrices. First one with one sample, second with two samples and third with 7 samples.
Please see results below:

Could you please help me understand the differences in pathway scores?

Sample(geneexpn) has different pathway scores when run separately and with one other sample (geneexpn.1) and with 6 other samples (geneexpn.1......geneexpn.6)?

Why is this different and what statistical tests are used in each case?

Thank you!
PS
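
A minimal sketch of why this can happen, assuming the scores are weighted sums of expression values that are then centered and scaled per pathway across the samples in the input (one reading of scale = TRUE, not the package's actual code). The weights, expression values and the helper score() are hypothetical.

```r
set.seed(42)
genes <- paste0("G", 1:20)

# Hypothetical weights for two pathways and expression for seven samples
weights <- matrix(rnorm(40), nrow = 20, dimnames = list(genes, c("P1", "P2")))
expr <- matrix(rnorm(140), nrow = 20,
               dimnames = list(genes, paste0("S", 1:7)))

score <- function(expr, weights, scale = TRUE) {
  raw <- t(expr) %*% weights           # samples x pathways weighted sums
  if (scale && nrow(raw) > 1) {
    raw <- scale(raw)                  # center and scale each pathway across samples
    attr(raw, "scaled:center") <- NULL
    attr(raw, "scaled:scale") <- NULL
  }
  raw
}

one   <- score(expr[, 1, drop = FALSE], weights)  # one sample: nothing to scale against
two   <- score(expr[, 1:2], weights)
seven <- score(expr, weights)
raw7  <- score(expr, weights, scale = FALSE)
```

Under this assumption the unscaled score of S1 is the same in every run, while the scaled score depends on which other samples are in the matrix. With only two samples, scaling leaves nothing but the sign of the difference between them.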

Potentially include utility functions for other pathway methods

pros

would make it easier to use those on expression matrices and return scores

cons

is this the scope of the package?

Error in t(expr[common_genes, , drop = FALSE]) %*% model[common_genes, : requires numeric/complex matrix/vector arguments

I am getting the same error as @yesitsjess

a <- progeny(input_human)
Error in t(expr[common_genes, , drop = FALSE]) %*% model[common_genes, :
requires numeric/complex matrix/vector arguments

My input file is the same as the sample input provided (input_human):
input_human.RData.zip

Please help me out.

input data problem

Hi, I have a DEGs table (by DESeq2) and I am trying to fit it in progeny. The class of my table is:

> class(degs)
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

I think the class is not correct, because I get this error:

Error in progeny.default(degs, scale = FALSE, organism = "Mouse") : 
  Do not know how to access the data matrix from class data.frame

I would need help to understand which is the correct object format for progeny. Thank you a lot for helping!
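
The error message suggests that a plain numeric matrix is easier for progeny to handle than a data frame or tibble. Below is a minimal base R sketch of that conversion, with gene symbols as row names and one column per contrast. The toy table, its column names and the helper to_gene_matrix() are assumptions, not part of progeny.

```r
# Toy DESeq2-like results table (values made up); column names are assumptions
degs <- data.frame(
  gene   = c("Trp53", "Mdm2", "Dusp6"),
  log2FC = c(1.2, 0.8, -0.5),
  stat   = c(4.1, 2.9, -1.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: numeric matrix with gene symbols as row names
to_gene_matrix <- function(df, gene_col, value_cols) {
  m <- as.matrix(df[, value_cols, drop = FALSE])
  storage.mode(m) <- "numeric"
  rownames(m) <- df[[gene_col]]
  m
}

stat_matrix <- to_gene_matrix(degs, "gene", "stat")
```

The resulting stat_matrix could then be passed in place of degs in the call above. Whether a column of test statistics is a sensible input with scale = FALSE is a separate question for the maintainers.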

select new pathways

Hi,
I'm really enjoying your wonderful package for inflammation research.
Thank you

Which seurat assay to use for progeny

Hi PROGENy authors,

Thanks for developing this great tool.

I found in this tutorial that it uses NormalizeData() and ScaleData() before running PROGENy.

Is it ok to use assay SCT, or would you suggest using the way in the tutorial?

Thanks!

RNA-seq: normalization, gene lengths

Hi,

as the successor to the speed database, we have read the publication with great interest. One question we could not quite answer was regarding how to normalize / transform input data from RNA-seq. The manual states briefly

log-transformed (and possibly variance-stabilized) counts from an RNA-seq experiment.

It does not mention normalizing counts by gene length. Now, each gene contributes differently to the respective pathway score as can be seen when looking at progeny::model. Thus, longer genes would have more influence on the final score if counts/abundance is not normalized by length.

I would appreciate some guidance regarding this question.

Thanks for your time,
Clemens

SPIA-modified z-scores to add information about position of gene in pathway

Hey, I came across PROGENy after reading about another pathway mapping technique SPIA that is also referenced in your paper, and I was wondering if there would be value to combining elements from both techniques. Specifically, I'm thinking about the following:

Take the current z-score matrix, and for each pathway, modify the scores of all on-pathway genes using a relationship like Eq (1) from the original SPIA paper by Tarca et al, where you treat the current z-scores as the expression differences:

https://www.ncbi.nlm.nih.gov/pubmed/18990722

This will create a modified matrix of z-scores that will take into account the location of the genes in the pathway, and allow upstream genes to "contribute their z-score" to their downstream neighbors. The off-pathway genes can be left alone.

Would this be something that you think would be worth adding? I'm fairly new to this field so please excuse me if this is a dumb suggestion!

Missing Androgen, Estrogen and WNT?

Androgen, Estrogen and WNT are missing from my results?

I only have EGFR Hypoxia JAK.STAT MAPK NFkB PI3K TGFb TNFa Trail VEGF p53

Version 1.8.0 which I installed using bioconductor yesterday

only 11 pathways

Using Progeny from BIOCONDUCTOR it seems I can run the experiment only on 11 pathways?

Incapability of handling large matrices.

Dear progeny developers.

I am facing a size issue when working with large Seurat objects (in my case, 27584 rows and 76680 columns). As the call to progeny::progeny.Seurat() makes use of as.matrix(), it returns the following error:

"Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105"

This is due to the size of the object, as rows*columns returns a number of integer values above .Machine$integer.max (2147483647).

Is there a possibility to use Seurat::as.sparse() or something similar instead of as.matrix()? If not, how can I solve this issue?

Best,
Enrique
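
A possible workaround, sketched under the assumption that the pathway scores are a product of the expression matrix and a genes x pathways weight matrix: restrict the sparse matrix to the footprint genes and multiply without ever converting it to dense form. This is not what progeny.Seurat() does. The data are simulated with Matrix::rsparsematrix() and the weights are made up.

```r
library(Matrix)

set.seed(1)
# Simulated sparse expression matrix: 50 genes x 200 cells
expr <- rsparsematrix(nrow = 50, ncol = 200, density = 0.1)
rownames(expr) <- paste0("G", 1:50)
colnames(expr) <- paste0("cell", 1:200)

# Hypothetical dense weights: 20 footprint genes x 3 pathways
weights <- matrix(rnorm(60), nrow = 20,
                  dimnames = list(paste0("G", 1:20), c("P1", "P2", "P3")))

common <- intersect(rownames(expr), rownames(weights))

# crossprod(x, y) computes t(x) %*% y; only the small result becomes dense
scores <- as.matrix(crossprod(expr[common, , drop = FALSE],
                              weights[common, , drop = FALSE]))
```

The result is a cells x pathways matrix, which stays small even for tens of thousands of cells. With a Seurat object, the sparse input would presumably come from Seurat::GetAssayData().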

p53 pathway target MDM2 weighting

Dear PROGENy team,

I am interested in the p53 pathway and have noted that the target MDM2 has a weight of +8.19. However, MDM2 is a well-established negative regulator of the p53 pathway. Could you please explain why MDM2 has a positive weight if it is a negative regulator? Could it be because p53 activation increases MDM2 in a negative feedback loop?

Many thanks,
Oliver

Shift coordinates on the right plot in progenyScatter()

Currently progeny::progenyScatter() creates a combined plot for every pathway, with a scatter plot (left) and a density plot (right). I guess that some axis elements (axis text, axis ticks, etc.) have been removed from the right plot for simplicity; see

progeny/R/progenySuppFunc.r

Lines 71 to 74 in 08883c7

 xlim(minstat, maxstat) + theme_minimal() + 
 theme(legend.position = "none", axis.text.x = element_blank(), 
 axis.ticks.x = element_blank(), axis.title.y = element_blank(), 
 axis.text.y = element_blank(), axis.ticks.y = element_blank(),

The theme is also set to theme_minimal(), which does not draw lines around the plot margins. However, once these elements are removed, the Y-axis coordinates of the paired plots (left and right) no longer correspond to each other; one is shifted relative to the other. You can test this by drawing a horizontal line (geom_hline(yintercept = 0)) in both. The reason is that ggplot resizes the plot margins to make the most of the plotting area, and with the current settings the second plot has more of this area available.
One quick solution is to make those elements transparent (or the same colour as the background) instead of using element_blank(). Another is to tune plot.margin.

Can proteomics data also be used with this tool to predict pathway activity?

Dear Authors,

Thanks a lot for the great tool!

Could you advise whether progeny can be used to process proteomics data?

Say the protein/peptide accessions in the proteomics data have been converted to gene symbols. If it is suitable, how should the original abundance values be normalized or transformed (something like TPM)? Proteomics data also cover far fewer items than the genes in RNA-seq.

Thank you very much! Best!

WNT pathway

I am trying to calculate the pathway scores for WNT signaling, but all I can get are these pathways:
[1] "EGFR" "Hypoxia" "JAK.STAT" "MAPK" "NFkB" "PI3K" "TGFb" "TNFa" "Trail" "VEGF" "p53" .

On the website it says : "We expanded human and mouse PROGENy with the pathways Androgen, Estrogen and WNT."

How do I get these new pathways?

Thanks, Katharina

Error in t(expr[common_genes, , drop = FALSE]) %*% model[common_genes, : requires numeric/complex matrix/vector arguments

Following your most simple example and getting an error:

# load the downloaded files
gene_table <- readr::read_tsv("Cell_line_RMA_proc_basalExp.txt")

# we need genes in rows and samples in columns
gene_expr <- data.matrix(gene_table[,3:ncol(gene_table)])
colnames(gene_expr) <- sub("DATA.", "", colnames(gene_expr), fixed=TRUE)
rownames(gene_expr) <- gene_table$GENE_SYMBOLS

library(progeny)
pathways <- progeny(gene_expr)

Error in t(expr[common_genes, , drop = FALSE]) %*% model[common_genes,  : 
  requires numeric/complex matrix/vector arguments

Modify main progeny function to also search for human and mouse models in the local environment.

Dear progeny developers,

I am experiencing some bugs while using progeny due to the implemented behaviour in progeny::progeny(). While running the code, adapted from the vignette for scRNAseq datasets, it prompts me the error that the human model (model_human_full) is not available in the global environment.

And this is where my issue stems from. I am writing an R package as a means of workflow manager, so applying progeny is one part of it. For this, I imported the package in the local environment of my function, to avoid unnecessary global declarations in my code. Although the variable "model_human_full" is defined, accessible, and ready to use in the local environment, the error log will keep arising due to the progeny::progeny() function first searching for it in the global environment.

I am pretty sure this was coded this way because the user is expected to load the package with library(progeny). However, this is considered bad practice when writing R packages, where the :: notation should be used as much as possible.

Therefore, I think it would be great to enhance this function by expanding the search of the model not only to the global, but also to the local environment.

Best,
Enrique
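
For illustration, base R can already load a packaged dataset into a function's own environment rather than the global one, via the envir argument of utils::data(). The sketch below uses mtcars from the datasets package as a stand-in. Whether model_human_full can be loaded the same way depends on how progeny ships its data, which is an assumption here. The helper load_dataset_locally() is hypothetical.

```r
# Load a packaged dataset into the calling function's own environment.
# 'mtcars' from the 'datasets' package stands in for a model object here.
load_dataset_locally <- function(name, package) {
  utils::data(list = name, package = package, envir = environment())
  get(name, envir = environment(), inherits = FALSE)
}

model <- load_dataset_locally("mtcars", "datasets")
```

Nothing is attached with library() and nothing is written to the global environment by the helper itself.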

how to apply progeny on differential expression analysis result?

According to the paper "Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data", progeny can be used for pathway analysis instead of GSEA. However, the progeny example shows how to use a single-cell gene expression matrix rather than a differential expression analysis result. Could you please provide an example of how to use progeny with a differential expression analysis result (gene symbols with p-values and fold changes)? Thank you so much.
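
One generic way to score pathways from a single vector of gene-level statistics is a weighted sum per pathway, compared against a null distribution obtained by shuffling the gene labels. The sketch below is a base R illustration of that idea under made-up weights and t-statistics; it is not the implementation of progenyPerm(), and pathway_z() is a hypothetical helper.

```r
set.seed(1)
genes <- paste0("G", 1:200)

# Hypothetical footprint weights (mostly zero) and per-gene t-statistics
weights <- matrix(0, nrow = 200, ncol = 2,
                  dimnames = list(genes, c("P1", "P2")))
weights[1:20, "P1"]  <- runif(20, 1, 5)
weights[21:40, "P2"] <- runif(20, 1, 5)
t_stats <- setNames(rnorm(200), genes)
t_stats[1:20] <- t_stats[1:20] + 3   # simulate up-regulation of the P1 footprint genes

pathway_z <- function(stats, weights, n_perm = 1000) {
  common <- intersect(names(stats), rownames(weights))
  w <- weights[common, , drop = FALSE]
  s <- stats[common]
  observed <- drop(s %*% w)          # weighted sum per pathway
  # Null distribution: shuffle which statistic belongs to which gene
  null <- matrix(replicate(n_perm, drop(sample(s) %*% w)), nrow = ncol(w))
  (observed - rowMeans(null)) / apply(null, 1, sd)
}

z <- pathway_z(t_stats, weights)
```

A large positive z for a pathway means its footprint genes carry higher statistics than expected for a random gene set with the same weights. The same approach applies to any single list of genes with scores, for example from a meta-analysis.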

Issue when Importing progeny for another package

Hi,
we're working on another package where we import the functionality of progeny, and we're handling the dependencies with the usual Imports line and statements.
Still, the function call fails, as the required model_human_full object is not exported by progeny itself, so it would require a call to library(progeny), which is to be avoided if one does not want the note from R CMD check.

Can this model_human_full (and while we're at it) data object be directly exported/called explicitly in the getModel function?

progeny/R/progenySuppFunc.r

Line 185 in 535f548

getModel <- function(organism = "Human", top= 100) {

I think that might be enough to solve the issue on our side.
Thanks in advance for looking into it!
Federico

Cholmod error 'problem too large' at file ../Core/cholmod_dense.c,

I tried to use progeny to analyze about 70 thousand cells and I got this error:

Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102
Calls: progeny ... progeny -> as.matrix -> as.matrix.Matrix -> as -> asMethod
Execution halted

How can I fix it?

object "model_human_full" is not exported by 'namespace:progeny'

Dear progeny developers,

Thank you for your package, it is wonderful :-)
But right now I am having an import problem, could you help? Thx!
Reproducible example:

remotes::install_github('bhagwataditya/autonomics@devel', upgrade = FALSE)
require(magrittr)
file <- autonomics::download_data('billing16.rnacounts.txt')
object <- autonomics::read_rnaseq_counts(file, plot=FALSE)
object %<>% autonomics::fit_progeny(symbolvar = 'gene_name')

          Error in getModel(organism, top = top) : object 'model_human_full' not found

Add function to test for differences between conditions

Basically just a wrapper for difference tests of multiple pathway scores for given conditions

Alternatively, just suggest using limma::lmFit

Buggy row names after permutation approach

After running progeny with perm = 1000 there are buggy row names in the output (see below). Probably an artefact from @adugourd's initial permutation code.

Return Gene List from progenyScatter

I'm really enjoying your wonderful package, progeny.

After pathway enrichment analysis, I can use progenyScatter to plot genes by pathway.
I show the result of my analysis below:

I'm wondering whether I can get the list of gene names shown in the figure.
In this analysis: ISG15, MX1, IFI44L...
I checked the arguments of progenyScatter, but I can't find any gene names.
How can I return the gene names for each pathway from progenyScatter?

Warn/error if unable to match HGNC symbols

The score calculation still needs an error/warning if not all symbols could be matched.

gene weights and directions

Good day.
When I was browsing the model in the model_*_full.rda files, I found the following:
The 2 topmost p-value genes in the MAPK pathway are Dusp6 and Spry4, both are inhibitors/negative regulators of the MAPK pathway, and they have high, positive weights. On the other hand in the p53 pathway most top hits are direct p53 target genes, also with positive weights.
I wonder whether this could distort the 'activity' of the MAPK pathway, given that negative regulators have such high positive weights.

scatter plot adjustment

Dear Authors,

Thanks a lot for your great software of progeny!

May I ask for your guidance on the scatter plot?

Is there a way to modify the plots, say, to add arrows from the dots to the gene labels?
The density plot does not always seem to reflect the differences between pathways.
For the dots of a pathway, is it necessary to label all of them, or is there a threshold to label only the top 10 genes?

Thank you very much for your guidance!

Whether progeny could be used on gene-chip data

Hi
I have been using PROGENy recently.
It is a really powerful tool for deciphering pathway activity from bulk and single-cell data.

I am unsure whether this package can be used on gene-chip data for inferring pathway activity.

Thanks

p-value for progeny on scRNA-seq clusters

Hello!
I used progeny on a cluster of a scRNA-seq dataset to infer pathway activity in two different conditions. I followed this tutorial:
https://saezlab.github.io/progeny/articles/ProgenySingleCell.html. I would like to have a p-value for each pathway, but I can only see the progeny score. Is it available somewhere?
Alternatively, I see that in this case an average and a standard deviation of the pathway scores are calculated for each cluster. Should I take these values and run a statistical test myself? If so, what type of test would you suggest?

Thank you!
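
One option among several, sketched in base R: a Wilcoxon rank-sum test per pathway on the per-cell scores of the two conditions, followed by Benjamini-Hochberg adjustment. The scores below are simulated and the column names are assumptions. Note that cells from the same sample are not independent replicates, so averaging scores per sample before testing may be the more defensible design.

```r
set.seed(7)
# Toy per-cell pathway scores for one cluster (values made up)
scores <- data.frame(
  condition = rep(c("ctrl", "treated"), each = 50),
  MAPK = c(rnorm(50, 0), rnorm(50, 1.5)),
  TNFa = c(rnorm(50, 0), rnorm(50, 0))
)

pathways <- setdiff(names(scores), "condition")
is_ctrl <- scores$condition == "ctrl"

# One two-sided rank-sum test per pathway
p_raw <- vapply(pathways, function(p) {
  wilcox.test(x = scores[[p]][is_ctrl], y = scores[[p]][!is_ctrl])$p.value
}, numeric(1))

# Adjust for testing several pathways at once
p_adj <- p.adjust(p_raw, method = "BH")
```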

Possible to make corpus available?

Hi,

this is such a nice project and well put together. I was wondering if it would be possible to make publicly available the corpus on which progeny is trained, as it would greatly benefit others when training new models or comparing performance with different training sets per pathway.

Thank you,
Clemens

Progeny function does not control for duplicated gene names

The permutation function does not run with this example from the vignette, because there are duplicated gene names. We check for that in the permutation function, but the main progeny function does not. We have to look into this further, since duplicated genes may contribute twice to the pathway scores...
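
Until this is handled inside the package, duplicates can be collapsed before scoring. The sketch below averages rows that share a gene symbol using base R; taking the mean is only one possible choice (keeping the row with the highest variance is another), and collapse_duplicates() is a hypothetical helper.

```r
# Toy matrix with a duplicated gene symbol (values made up)
expr <- matrix(c(1, 3, 5, 2, 4, 6), nrow = 3,
               dimnames = list(c("TP53", "TP53", "MDM2"), c("S1", "S2")))

collapse_duplicates <- function(m) {
  counts <- table(rownames(m))
  sums <- rowsum(m, group = rownames(m))     # per-gene column sums
  sums / as.vector(counts[rownames(sums)])   # turn the sums into means
}

dedup <- collapse_duplicates(expr)
```

After this step each gene symbol appears once, so it can contribute to a pathway score only once.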

Provide means of scaling by control

For perturbation experiments, only the control should be scaled.

Can provide a ref= argument to list for sample names that are the reference condition.
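
A sketch of one reading of this proposal: center and scale every sample using the mean and standard deviation of the reference samples only, so that scores are expressed relative to the controls. The ref argument, the helper scale_by_reference() and the toy scores are hypothetical; nothing here exists in the package.

```r
set.seed(3)
# Toy samples x pathways score matrix (values made up)
scores <- matrix(rnorm(12, mean = 5), nrow = 6,
                 dimnames = list(paste0("S", 1:6), c("MAPK", "TNFa")))

# 'ref' is a hypothetical argument naming the reference (control) samples
scale_by_reference <- function(scores, ref) {
  ref_scores <- scores[ref, , drop = FALSE]
  centered <- sweep(scores, 2, colMeans(ref_scores), "-")
  sweep(centered, 2, apply(ref_scores, 2, sd), "/")
}

scaled <- scale_by_reference(scores, ref = c("S1", "S2", "S3"))
```

With this scheme the reference samples have mean 0 and standard deviation 1 per pathway, and adding or removing perturbed samples does not change anyone's score.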

Expand progeny function to class data.frame -> progeny.data.frame()

Bias in 14 pathway collection

Hi,

The 14 pathways are mostly cancer related, as the 2018 paper also focuses on cancer gene expression (Schubert et al. 2018). Could this introduce a bias when we look at non-cancer gene expression sets?

I am looking at an immunological disease, and the pathway results from Progeny and the transcription factor results from Dorothea look biased towards cancer-related pathways and TFs. It shouldn't be a problem for Dorothea, as I am using Omnipath resources. But how about Progeny?

Is it possible to link the progeny-dorothea-carnival pipeline with a larger pathway database?

All the best,
Asuman

Question of the Estrogen Pathway

Hi,
I'm really enjoying your wonderful package for tumor research.

I am a little confused about the Estrogen pathway calculated by the package. Does the Estrogen pathway refer to Estrogen Receptor pathway activity or to the pathway for Estrogen synthesis?

Best,
Xin

NES for each pathway

Hi
I have been using PROGENy recently.
It is a really powerful tool for deciphering the RNA world.

I recently read the vignette at https://github.com/saezlab/transcriptutorial/blob/master/scripts/03_Pathway_activity_with_Progeny.md
and I am wondering whether I can get NES from a Seurat object rather than from a matrix.
In that vignette, NES is calculated from a matrix produced by limma.
But I want to calculate NES from a Seurat object.

I followed this tutorial: https://saezlab.github.io/progeny/articles/ProgenySingleCell.html
Is there any way to calculate NES from a Seurat object when following this tutorial?

Thanks

Use of data slot instead of scale.data slot in Progeny.

Dear progeny developers.

I have observed that the documentation for progeny::progeny() asks for the normalized data to be used as input. However, in the code for progeny::progeny.Seurat(), the slot being used in Seurat::GetAssayData() is "data" instead of "scale.data". I might be wrong here, but does the "data" slot also contain the normalized values from Seurat? Would it make more sense to go for the "scale.data" slot instead?

Best,
Enrique

Add new pathways

Hi,

if possible, even in an approximate way, could you please indicate how to add new pathways? I am interested in a pathway not currently present, namely Mitotic Spindle Checkpoint.

Prepare for upcoming Seurat v5 release

I am opening this issue as a notification because progeny is listed here as a package that relies (depends/imports/suggests) on Seurat. As you may know, we recently released Seurat v5 as a beta in March of this year, with new updates for spatial, multimodal, and massively scalable analysis. For more information on updates and improvements, check out our website https://satijalab.org/seurat/.

We are now preparing to release Seurat v5 to CRAN, and plan to submit it on October 23rd. While we have tried our best to keep things backward-compatible, it is possible that updates to Seurat and SeuratObject might break your existing functionality or tests. We wanted to reach out before the new version is on CRAN, so that there's time to report issues/incompatibilities and prepare you for any changes in your code base that might be necessary.

We apologize for any disruption or inconvenience, but hope that the improvements to Seurat v5 will benefit your users going forward.
To test the upcoming release, you can install Seurat from the seurat5 branch using the instructions available on this page: https://satijalab.org/seurat/articles/install.

Thank you!
Seurat v5 team

use with single gene list and scores

Hi
I have a single list of genes and scores created by a meta analysis between control and test samples. Would it be possible to use progeny with this data? Or must it require a control and test separately?
Thanks
Bryan
