Hi Ruth, I just tested this with some fake data and it seems to be working as expected. Can you please send me the original non-sorted ranks file you tried this with? Also, where do you see the "weird results"? Is it just wrong ranks in the heatmap, or something else?
Here is the sorted ranks file (I changed the file extension to .txt so I could upload it here):
TCGA-61-2088fakeranks_sorted.txt
Here is the unsorted ranks file:
TCGA-61-2088fakeranks_notsorted.txt
GSEA enrichment results file from the sorted analysis:
TCGA-61-2088_fgsea_enr_results_sorted_seed42.txt
GSEA enrichment results file from the not-sorted analysis:
TCGA-61-2088_fgsea_enr_resultsnotsorted_seed42.txt
Expression file:
TCGA-61-2088fakeexpression.txt
The GMT file can be found here: https://download.baderlab.org/EM_Genesets/August_01_2019/Human/symbol/Human_GOBP_AllPathways_with_GO_iea_August_01_2019_symbol.gmt
The EM results are the same for both analyses.
If you click on any gene set and try to sort the heatmap by the ranks from the sorted file or from the unsorted file, you will see the issue.
Not-sorted ranks file:
Looks like you're using FGSEA. The presence of the ES and NES columns in the enrichment file is tricking the data set resolver into thinking it's from GSEA. I'll have to add a check for the padj column so EM knows it's from FGSEA and not GSEA.
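A header check of the kind described here could look something like the R sketch below. It is hypothetical: the function name `guess_enrichment_source` and the exact column names it tests are assumptions for illustration, not EnrichmentMap's actual resolver.

```r
# Hypothetical resolver sketch: classify an enrichment file by its header.
# The column names checked are assumptions taken from the discussion above.
guess_enrichment_source <- function(header) {
  if ("padj" %in% header) {
    "FGSEA"      # an adjusted p-value column suggests fgsea output
  } else if (all(c("ES", "NES") %in% header)) {
    "GSEA"       # ES and NES without padj: treat as GSEA
  } else {
    "generic"
  }
}

# Example headers (column subsets, for illustration only)
fgsea_header <- c("pathway", "pval", "padj", "ES", "NES", "size", "leadingEdge")
gsea_header  <- c("NAME", "SIZE", "ES", "NES", "NOM p-val", "FDR q-val", "RANK AT MAX")

guess_enrichment_source(fgsea_header)  # returns "FGSEA"
guess_enrichment_source(gsea_header)   # returns "GSEA"
```

The order of the checks matters: because padj is tested first, a file that has padj as well as ES and NES is classified as FGSEA.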
But I want EM to think that it is GSEA. I modified the FGSEA files and computed the "rank at max" so I could tap into the GSEA features in EM.
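For context, "rank at max" is the position in the ranked list where GSEA's running enrichment score reaches its maximum deviation from zero. The R sketch below computes it with the standard weighted running sum (weight p = 1). It is an illustration under stated assumptions (scores already sorted in decreasing order, gene set a proper non-empty subset of the list, 1-based positions), not necessarily how the FGSEA files were modified here; `running_es` is a hypothetical name.

```r
# Sketch of a GSEA-style weighted running sum (weight p = 1).
# Assumptions: `scores` is a named numeric vector already sorted in
# decreasing order, and `gene_set` is a proper, non-empty subset of its names.
running_es <- function(scores, gene_set) {
  hits <- names(scores) %in% gene_set
  n_miss <- sum(!hits)

  # Hits step up by |score| / (sum of |score| over hits); misses step down evenly
  hit_weights <- ifelse(hits, abs(scores), 0)
  p_hit  <- cumsum(hit_weights) / sum(hit_weights)
  p_miss <- cumsum(!hits) / n_miss
  running <- p_hit - p_miss

  # Rank at max: 1-based position of the largest deviation from zero
  peak <- which.max(abs(running))
  list(es = running[peak], rank_at_max = peak)
}

scores <- c(A = 3, B = 2, C = 1, D = -1, E = -2, F = -3)
res <- running_es(scores, c("A", "B"))
res$rank_at_max  # 2
res$es           # 1
```

GSEA's own report may use a different index origin for RANK AT MAX, so it is worth comparing against a real GSEA output before relying on the exact offset.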
I only realized after submitting that the requirement for the rank file to be sorted might be GSEA-specific. If that is the case, then we just need to document it somewhere. That is why I marked this as a question and not as a bug.
Ok, but I assume you had to enter the files manually using the "..." buttons in the dialog?
No, I did it through the RCy3 build command:
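```r
library(RCy3)

# Build the EnrichmentMap command string; paste() with sep = " " joins each
# argument name and its value with a space.
em_command = paste('enrichmentmap build analysisType="GSEA" ',
                   "gmtFile=", file.path(output_filepath, data_directory, basename(gmt_file)),
                   'pvalue=', pvalue_threshold, 'qvalue=', qvalue_threshold,
                   'similaritycutoff=', 0.375,
                   'coefficients=', "COMBINED",
                   'enrichmentsDataset1=', fakeenr_filename_host,
                   'expressionDataset1=', fakeexp_name_host,
                   'ranksDataset1=', fakernk_name_host,
                   'filterByExpressions=false',
                   sep = " ")

# Assumed: the command is then sent to Cytoscape with RCy3's commandsGET()
response <- commandsGET(em_command)
```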
This looks like a bug. When the ranks file is parsed, each gene is assigned a "score", which is the actual value from the rank file, and a "rank", which is basically the position (line number) of the gene in the rank file. I'm guessing this is done because sometimes an EM network is created without a rank file, so the scores are not always available? The heat map is sorting the ranks column based on the "rank", but it's showing the "score". That's why it looks broken.
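The mismatch can be reproduced with a small, hypothetical R model of the parser behaviour described above (`parse_ranks` is an illustrative name, not EnrichmentMap's actual code): the score comes from the file and the rank is just the line position, so sorting by rank while displaying the score looks scrambled whenever the file is unsorted.

```r
# Hypothetical model of the parser behaviour described above (not
# EnrichmentMap's actual code): score = value from the file,
# rank = position (line number) in the file.
parse_ranks <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(
    gene  = vapply(fields, `[`, character(1), 1),
    score = as.numeric(vapply(fields, `[`, character(1), 2)),
    rank  = seq_along(lines),  # positional, not derived from the score
    stringsAsFactors = FALSE
  )
}

# An unsorted ranks file (gene <TAB> score)
unsorted_lines <- c("GENE1\t0.5", "GENE2\t2.1", "GENE3\t-1.3", "GENE4\t1.7")
ranks <- parse_ranks(unsorted_lines)

# Sorting by the positional rank but showing the score gives 0.5, 2.1, -1.3, 1.7,
# which is not in decreasing score order, so the column looks broken.
shown <- ranks[order(ranks$rank), c("gene", "score")]
shown
```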
But shouldn't I be able to compute the "rank" by just sorting the genes by "score" and then assigning an index to it?
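In principle the rank can be derived from the score with base R's `rank()`, as in this minimal sketch (`rank_from_score` is a hypothetical helper). The catch is ties: with `ties.method = "first"`, tied genes keep their file order, and any other tie-breaking rule gives a different but equally valid ranking.

```r
# Sketch: rank = position after sorting by score in decreasing order.
# ties.method = "first" breaks ties by file order; other rules are possible.
rank_from_score <- function(score) {
  rank(-score, ties.method = "first")
}

rank_from_score(c(0.5, 2.1, -1.3, 1.7))  # 3 1 4 2
rank_from_score(c(1.0, 1.0, 0.2))        # 1 2 3 (tie broken by file order)
```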
I can't just fix the heat map because this mismatch of rank and score could affect other things. I think this needs to be fixed in the parser.
OK. Now I remember all the intricacies of GSEA ranks files. The expectation is that the rank file is sorted in the order GSEA used to calculate the enrichments.
The reason for having both the score and the rank is GSEA's leading edge. The "Rank at max" column gives us the rank of the gene where the ES is at its maximum, and any gene-set genes with a lower rank (closer to the top of the list) are part of the leading edge. We take the rank file from GSEA as it is and don't re-rank it because, if there are ties in the data, changing the order could change the composition of the leading edge (even if the ranks we calculated differed only slightly). I think there were bugs where one or two genes were missing from the leading edge, and it came down to slightly different rank files.
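To make the tie problem concrete, here is a hypothetical R sketch: the gene names, the tie, and the rank-at-max value are made up, positions are assumed 1-based, and `leading_edge` is an illustrative helper. The leading edge for a positive ES is taken as the gene-set members at or before the rank at max, and two orderings that differ only in how a tie is broken give different leading edges.

```r
# Sketch: leading edge (positive ES) = gene-set members at or before rank at max.
# Assumes 1-based positions; all values below are made up for illustration.
leading_edge <- function(ranked_genes, gene_set, rank_at_max) {
  top <- ranked_genes[seq_len(rank_at_max)]
  top[top %in% gene_set]
}

gene_set <- c("A", "C")
# B and C have tied scores; the two orders differ only in how that tie is broken
order_1 <- c("A", "B", "C", "D")
order_2 <- c("A", "C", "B", "D")
rank_at_max <- 2  # hypothetical value read from the enrichment file

leading_edge(order_1, gene_set, rank_at_max)  # "A"
leading_edge(order_2, gene_set, rank_at_max)  # "A" "C"
```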
Instead of sorting the unsorted rank file, maybe it is better to show an alert: "Your rank file is not sorted". We can give the user the option to have EM sort it or to keep it as is.
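The warn-then-optionally-sort behaviour proposed here might look roughly like this R sketch. The function name `prepare_ranks` and its `sort_if_needed` argument are hypothetical, and in EnrichmentMap itself this would be an alert in the UI rather than an R warning.

```r
# Sketch: warn if a ranks table is not in decreasing score order and, only if
# the user asks for it, sort it. `ranks` has columns gene and score, in file order.
prepare_ranks <- function(ranks, sort_if_needed = FALSE) {
  is_sorted <- !is.unsorted(-ranks$score)  # non-increasing scores count as sorted
  if (!is_sorted) {
    warning("Your rank file is not sorted")
    if (sort_if_needed) {
      # order() leaves ties in their original (file) order
      ranks <- ranks[order(-ranks$score), , drop = FALSE]
      rownames(ranks) <- NULL
    }
  }
  ranks
}

r <- data.frame(gene = c("G1", "G2", "G3"), score = c(0.5, 2.1, -1.3),
                stringsAsFactors = FALSE)
kept   <- suppressWarnings(prepare_ranks(r))                        # G1 G2 G3
sorted <- suppressWarnings(prepare_ranks(r, sort_if_needed = TRUE)) # G2 G1 G3
```

Keeping the file order by default matches the concern in the thread: re-sorting silently could change tie order and therefore the leading edge.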
I like the idea of just warning the user. My worry about changing the way we compute ranks/scores is that it could have other effects that we aren't aware of. Basically I'm worried it could cause other bugs.
Agreed. GSEA ranks and leading-edge calculations are messy. Best not to tamper.