
Comments (12)

mikekucera avatar mikekucera commented on September 13, 2024

Hi Ruth, I just tested this with some fake data and it seems to be working as expected. Can you please send me the original non-sorted ranks file you tried this with? Also, where do you see the "weird results"? Is it just wrong ranks in the heat map, or something else?

from enrichmentmapapp.

risserlin avatar risserlin commented on September 13, 2024

here is the sorted ranks file (I changed the file ending to txt so I could upload it here) -
TCGA-61-2088fakeranks_sorted.txt

here is the unsorted rank file -
TCGA-61-2088fakeranks_notsorted.txt

GSEA enrichment results file from sorted analysis -
TCGA-61-2088_fgsea_enr_results_sorted_seed42.txt

GSEA enrichment results file from not sorted analysis -

TCGA-61-2088_fgsea_enr_resultsnotsorted_seed42.txt

expression file -
TCGA-61-2088fakeexpression.txt

GMT file can be found here - https://download.baderlab.org/EM_Genesets/August_01_2019/Human/symbol/Human_GOBP_AllPathways_with_GO_iea_August_01_2019_symbol.gmt


mikekucera avatar mikekucera commented on September 13, 2024


risserlin avatar risserlin commented on September 13, 2024

The EM results are the same for both analyses.
If you click on any gene set and try to sort the heat map by ranks, from either the sorted file or the unsorted file, you will see the issue.
Not sorted ranks file -
Screen Shot 2022-11-29 at 12 09 55 PM

Sorted ranks file
Screen Shot 2022-11-29 at 12 10 41 PM


mikekucera avatar mikekucera commented on September 13, 2024

Looks like you're using FGSEA. The presence of the ES and NES columns in the enrichment file is tricking the data set resolver into thinking it's from GSEA. I'll have to add a check for the padj column so EM knows it's from FGSEA and not GSEA.


risserlin avatar risserlin commented on September 13, 2024

But I want EM to think that it is GSEA. I modified the fgsea files and computed the rank at max so I could tap into the GSEA features in EM.
I only realized after submitting that the requirement for the rank file to be sorted might be GSEA specific. If that is the case then we just need to specify it somewhere. That is why I marked it as a question and not as a bug.


mikekucera avatar mikekucera commented on September 13, 2024

Ok, but I assume you had to enter the files manually using the "..." buttons in the dialog?


risserlin avatar risserlin commented on September 13, 2024

No. I did it through RCy3 build command.

em_command = paste('enrichmentmap build analysisType="GSEA" ',
"gmtFile=",file.path(output_filepath,data_directory,basename(gmt_file)),
'pvalue=',pvalue_threshold, 'qvalue=',qvalue_threshold,
'similaritycutoff=',0.375,
'coefficients=',"COMBINED",
'enrichmentsDataset1=',fakeenr_filename_host,
'expressionDataset1=',fakeexp_name_host,
'ranksDataset1=',fakernk_name_host,
'filterByExpressions=false',
sep=" ")


mikekucera avatar mikekucera commented on September 13, 2024

This looks like a bug. When the ranks file is parsed, each gene is assigned a "score", which is the actual value from the rank file, and a "rank", which is basically the position (line number) of the gene in the rank file. I'm guessing this is done because sometimes an EM network is created without a rank file, so sometimes the scores are not available? The heat map is sorting the ranks column based on the "rank", but it's showing the "score"; that's why it looks broken.

But shouldn't I be able to compute the "rank" by just sorting the genes by "score" and then assigning an index to it?

I can't just fix the heat map because this mismatch of rank and score could affect other things. I think this needs to be fixed in the parser.
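
To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. It is not EnrichmentMap's actual parser (which is written in Java); the names `RankedGene` and `parse_ranks`, the tab-separated two-column layout, and the gene values are all assumptions made for illustration.

```python
from typing import NamedTuple


class RankedGene(NamedTuple):
    gene: str
    score: float  # value read from the rank file
    rank: int     # 1-based position of the gene in the file


def parse_ranks(lines):
    """Hypothetical parser: rank is the file position, not derived from score."""
    genes = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, value = line.split("\t")[:2]
        genes.append(RankedGene(name, float(value), len(genes) + 1))
    return genes


# Made-up, unsorted rank file contents.
unsorted_file = ["TP53\t0.5", "BRCA1\t2.1", "MYC\t-1.3", "EGFR\t1.0"]
genes = parse_ranks(unsorted_file)

# The heat map orders rows by "rank" but displays "score".
displayed = [g.score for g in sorted(genes, key=lambda g: g.rank)]
print(displayed)  # [0.5, 2.1, -1.3, 1.0]
```

With a sorted input file the displayed scores would be monotonic; with an unsorted file they simply follow file order, which is why the column looks unsorted.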


risserlin avatar risserlin commented on September 13, 2024

Ok. Now I remember all the intricacies with GSEA ranks files. The expectation is that the rank file is sorted in the order GSEA used to calculate the enrichments.

The reason for the score and the rank is linked to GSEA's leading edge. The column "Rank at max" gives us the rank of the gene where the ES is at its maximum, and any genes with lower rank are part of the leading edge. The reason we take the rank file from GSEA as it is and don't re-rank it is that, if there are ties in the data, changing the order would potentially change the composition of the leading edge (even if the ranks we calculated differed only slightly). I think there were bugs where one or two genes were missing from the leading edge, and it came down to slightly different rank files.
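
The effect of ties can be sketched in a few lines of Python. This is a simplified illustration, not GSEA's or EnrichmentMap's code: `leading_edge` is a hypothetical helper, the gene names are invented, only a positive ES is considered, and treating the gene at "Rank at max" as part of the leading edge is an assumption.

```python
def leading_edge(ranked_genes, gene_set, rank_at_max):
    """Gene-set members whose 1-based position is at or before rank_at_max."""
    return [
        gene
        for position, gene in enumerate(ranked_genes, start=1)
        if position <= rank_at_max and gene in gene_set
    ]


# GENE_B and GENE_C are assumed to have tied scores, so both orders
# below are valid sorts of the same scores.
order_used_by_gsea = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
order_after_resort = ["GENE_A", "GENE_C", "GENE_B", "GENE_D"]

gene_set = {"GENE_A", "GENE_B"}
rank_at_max = 2  # as reported by the enrichment run on the original order

edge_original = leading_edge(order_used_by_gsea, gene_set, rank_at_max)
edge_resorted = leading_edge(order_after_resort, gene_set, rank_at_max)
print(edge_original)  # ['GENE_A', 'GENE_B']
print(edge_resorted)  # ['GENE_A']
```

Swapping the two tied genes pushes GENE_B past the reported rank at max, so it drops out of the leading edge even though no score changed.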

Instead of sorting the unsorted rank file, maybe it is better to put in an alert: "Your rank file is not sorted". We can give the user the option to have EM sort it for them or keep it as is.
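
A check of this kind could look roughly like the following Python sketch. The function names are hypothetical, and the assumption that "sorted" means non-increasing scores (highest first, ties allowed) is an illustration rather than something confirmed in this thread. The check only reports; it never reorders the file.

```python
def is_sorted_descending(scores):
    """True if every score is >= the one after it (ties are allowed)."""
    return all(a >= b for a, b in zip(scores, scores[1:]))


def check_rank_file(scores):
    """Return a warning message for an unsorted rank file, else None."""
    if is_sorted_descending(scores):
        return None
    return "Your rank file is not sorted."


print(check_rank_file([2.1, 1.0, 1.0, -1.3]))  # None
print(check_rank_file([0.5, 2.1, -1.3, 1.0]))  # Your rank file is not sorted.
```

Leaving the order untouched and only warning keeps tied genes in whatever order the enrichment run used.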


mikekucera avatar mikekucera commented on September 13, 2024

I like the idea of just warning the user. My worry about changing the way we compute ranks/scores is that it could have other effects that we aren't aware of. Basically I'm worried it could cause other bugs.


risserlin avatar risserlin commented on September 13, 2024

Agreed. GSEA ranks and leading edge calculations are messy. Best not to tamper.

