
Comments (15)

cgreene avatar cgreene commented on August 21, 2024

Recent preprint evals compared to DeepSEA:
http://dx.doi.org/10.1101/069682

Worth noting that in their TF binding site eval (Supplementary Figure 2), DeepSEA is still the top-performing method. Also, it's nice to see this from an independent study.

from deep-review.

agitter avatar agitter commented on August 21, 2024

Cross referencing that with #83. That issue is currently closed, but could be reopened if we want to use it.

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

@agitter : Sorry for the failed cross-ref. Didn't even realize we had that paper already. Seems like we may want to discuss these two together since it might get to whether or not deep is transformational...

from deep-review.

akundaje avatar akundaje commented on August 21, 2024

@cgreene What's the negative set they used for the TFBS prediction? It's entirely unclear from reading the methods. Also, was evaluation of the methods done on held-out chromosomes not used in training? E.g. DeepSEA holds out chr8 and 9 and trains on all other chromosomes for all data types. So if they are evaluating performance on sites in the training chromosomes, it's going to be super-inflated. These benchmark comparisons are generally very poorly done and very poorly described. And of course, once again, auROC is reported. I would not consider this a reasonable comparative evaluation by any measure.
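For readers unfamiliar with the held-out-chromosome scheme mentioned above, here is a minimal sketch (not DeepSEA's actual code) of a chromosome-level split: reserve chr8 and chr9 for testing and train on everything else, so no evaluation window shares a chromosome with the training data.

```python
# Hypothetical chromosome-level hold-out split, following the
# chr8/chr9 scheme described for DeepSEA above.
TEST_CHROMS = {"chr8", "chr9"}

def split_by_chromosome(examples):
    """examples: iterable of (chrom, start, end, label) tuples."""
    train, test = [], []
    for ex in examples:
        (test if ex[0] in TEST_CHROMS else train).append(ex)
    return train, test

examples = [
    ("chr1", 100, 1100, 1),
    ("chr8", 200, 1200, 0),
    ("chr9", 300, 1300, 1),
    ("chr22", 400, 1400, 0),
]
train, test = split_by_chromosome(examples)
print(len(train), len(test))  # 2 2
```

Evaluating on sites from chromosomes seen during training skips this separation, which is the inflation concern raised here.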

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

@akundaje : The description isn't sufficient to determine how this evaluation was done. A quick e-mail to the authors might clarify.

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

@akundaje : worth noting that the auROC that they report is in line with the DeepSEA pub: "We found that DeepSEA predicted chromatin features with high accuracy, including TF binding sites, for which the median area under the curve (AUC) was 0.958." This suggests to me that they retained the same eval (chr8 & 9) or that there wasn't much overfitting.

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

[caveats with auROC desirability still apply, but we have to eval what we actually have]

from deep-review.

gokceneraslan avatar gokceneraslan commented on August 21, 2024

In the multilabel/multitask setting, the negative set for one TF is the binding sites of all the others. So I think it's quite clear. You can look at the Torch tensor that they provide for more stats on that.
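To make the multitask point concrete, here is a toy sketch (illustrative only, not the actual DeepSEA label tensor): rows are genomic windows, columns are TF tasks, and a window bound by one TF but not another is simultaneously a positive for the first task and a negative for the second.

```python
import numpy as np

# Hypothetical multitask label matrix: rows = genomic windows,
# columns = TF tasks. A 1 means the TF binds that window.
labels = np.array([
    [1, 0, 0],   # bound by TF0 only
    [0, 1, 1],   # bound by TF1 and TF2
    [0, 0, 0],   # bound by none (negative for every task)
], dtype=int)

for tf in range(labels.shape[1]):
    pos = np.flatnonzero(labels[:, tf] == 1)
    neg = np.flatnonzero(labels[:, tf] == 0)
    print(f"TF{tf}: positives={pos.tolist()} negatives={neg.tolist()}")
```

Note that under this convention the negative set is implicit in the label matrix rather than sampled separately, which is the point being made above.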

from deep-review.

gokceneraslan avatar gokceneraslan commented on August 21, 2024

Ah ok, I thought this was regarding DeepSEA. Apparently it's about LINSIGHT.

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

Yifei Huang replied to my e-mail with a helpful summary of the DeepSEA evaluation in the LINSIGHT paper:

We used all autosomes in our comparisons. I personally think DeepSEA is unlikely to overfit in our comparisons, since we used the DeepSEA functional significance score which was not trained using known TFs or disease variants. The DeepSEA functional significance score aggregated tissue-specific DeepSEA scores using polymorphism data and can be viewed as an indirect measurement of natural selection. Note that in the original DeepSEA paper, sometimes they trained meta-scores using known disease/eQTL variants and these meta-scores might overfit.

from deep-review.

akundaje avatar akundaje commented on August 21, 2024

DeepSEA models are trained on TF ChIP-seq data, so I'm not sure what this means. Also, I was specifically referring to the TF prediction task that they evaluate, not the variant scoring task. Anyway, I also posted comments on bioRxiv.


from deep-review.

cgreene avatar cgreene commented on August 21, 2024

@akundaje : Agree that potential for overfitting exists for the TF eval. However, the TF eval that they do gives similar performance to the DeepSEA paper's TF eval IIRC (~0.96). To me that suggests little overfitting, since they didn't hold out but DeepSEA did. Did your evals show DeepSEA overfitting if evaluated on all chromosomes? Sorry for brevity - posting b/w meetings.

from deep-review.

akundaje avatar akundaje commented on August 21, 2024

We haven't explicitly replicated the DeepSEA model, but for instance the Basset model has much stronger prediction (in terms of auPRCs) on the training set than on the validation or test set. Validation and test set performances are similar, but training performance is often much higher. auROCs always look much closer across training, validation, and test, as they are all inflated and in the 0.9 range; the auPRCs can diverge a lot. I don't know what the training set performance was for DeepSEA, but I expect it will be much better (in terms of auPRC) than the validation and test sets.


from deep-review.

cgreene avatar cgreene commented on August 21, 2024

Totally agree that auPRC would be more likely to diverge than auROC. It would be great to have those figures for all of these methods.
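The auROC/auPRC divergence under class imbalance can be seen with a quick NumPy sketch (synthetic data, not taken from any of the papers discussed): with ~1% prevalence, as in genome-wide TFBS calls, a classifier can post an excellent auROC while its auPRC is far lower.

```python
import numpy as np

rng = np.random.default_rng(0)

def auroc(y, s):
    # auROC equals the probability that a random positive scores
    # above a random negative (Mann-Whitney U statistic).
    pos, neg = s[y == 1], s[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

def average_precision(y, s):
    # Non-interpolated average precision: mean of precision@k over
    # the ranks at which a positive is retrieved.
    order = np.argsort(-s)
    hits = y[order]
    prec_at_k = np.cumsum(hits) / (np.arange(len(y)) + 1)
    return prec_at_k[hits == 1].mean()

# 100 positives vs 10,000 negatives; positive scores are shifted
# upward, but negatives vastly outnumber them near the threshold.
y = np.concatenate([np.ones(100), np.zeros(10_000)]).astype(int)
s = np.concatenate([rng.normal(2.0, 1.0, 100),
                    rng.normal(0.0, 1.0, 10_000)])

print(f"auROC: {auroc(y, s):.3f}")             # looks excellent
print(f"auPRC: {average_precision(y, s):.3f}")  # much lower
```

The gap arises because auROC is insensitive to the absolute number of false positives, while precision is dominated by them at 1% prevalence.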

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

This one gets lots of discussion. We should probably talk about it - tagged for 'study'. The conversation around this one makes it clear to me that we also need to have at least a short section on evaluation. If we can get some people away from AUC in cases where it's not well suited, that'd be a huge win. Not sure if that should go in 'study' or a more general area. Opened #109 to make sure that this discussion makes it into our paper.

from deep-review.
