
Comments (15)

cgreene avatar cgreene commented on August 21, 2024

Recent preprint evals compared to DeepSEA:
http://dx.doi.org/10.1101/069682

Worth noting that in their TF binding site eval (Supplementary Figure 2), DeepSEA is still the top-performing method. Also, it's nice to see this from an independent study.

from deep-review.

agitter avatar agitter commented on August 21, 2024

Cross referencing that with #83. That issue is currently closed, but could be reopened if we want to use it.

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

@agitter : Sorry for the failed cross-ref. Didn't even realize we had that paper already. Seems like we may want to discuss these two together since it might get to whether or not deep is transformational...

from deep-review.

akundaje avatar akundaje commented on August 21, 2024

@cgreene What's the negative set they used for the TFBS prediction? It's entirely unclear from reading the methods. Also, was evaluation of the methods done on held-out chromosomes not used in training? E.g. DeepSEA holds out chr8 and 9 and trains on all other chromosomes for all data types. So if they are evaluating performance on sites in the training chromosomes, it's going to be super-inflated. These benchmark comparisons are generally very poorly done and very poorly described. And of course, once again, auROC is reported. I would not consider this a reasonable comparative evaluation by any measure.
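For readers unfamiliar with the held-out-chromosome scheme mentioned above, here is a minimal sketch (not DeepSEA's actual code) of a chromosome-level split: reserve chr8 and chr9 for testing and train on everything else, so no evaluation window shares a chromosome with the training data.

```python
# Hypothetical chromosome-level hold-out split, following the
# chr8/chr9 scheme described for DeepSEA above.
TEST_CHROMS = {"chr8", "chr9"}

def split_by_chromosome(examples):
    """examples: iterable of (chrom, start, end, label) tuples."""
    train, test = [], []
    for ex in examples:
        (test if ex[0] in TEST_CHROMS else train).append(ex)
    return train, test

examples = [
    ("chr1", 100, 1100, 1),
    ("chr8", 200, 1200, 0),
    ("chr9", 300, 1300, 1),
    ("chr22", 400, 1400, 0),
]
train, test = split_by_chromosome(examples)
print(len(train), len(test))  # 2 2
```

Evaluating on sites from chromosomes seen during training skips this separation, which is the inflation concern raised here.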

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

@akundaje : The description isn't sufficient to determine how this evaluation was done. A quick e-mail to the authors might clarify.

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

@akundaje : worth noting that the auROC that they report is in line with the DeepSEA pub: "We found that DeepSEA predicted chromatin features with high accuracy, including TF binding sites, for which the median area under the curve (AUC) was 0.958." This suggests to me that they retained the same eval (chr8 & 9) or that there wasn't much overfitting.

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

[caveats with auROC desirability still apply, but we have to eval what we actually have]

from deep-review.

gokceneraslan avatar gokceneraslan commented on August 21, 2024

In the multilabel/multitask setting, the negative set for one TF is the binding sites of all the others. So I think it's quite clear. You can look at the Torch tensor that they provide for more stats on that.
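To make the multitask point concrete, here is a toy sketch (illustrative only, not the actual DeepSEA label tensor): rows are genomic windows, columns are TF tasks, and a window bound by one TF but not another is simultaneously a positive for the first task and a negative for the second.

```python
import numpy as np

# Hypothetical multitask label matrix: rows = genomic windows,
# columns = TF tasks. A 1 means the TF binds that window.
labels = np.array([
    [1, 0, 0],   # bound by TF0 only
    [0, 1, 1],   # bound by TF1 and TF2
    [0, 0, 0],   # bound by none (negative for every task)
], dtype=int)

for tf in range(labels.shape[1]):
    pos = np.flatnonzero(labels[:, tf] == 1)
    neg = np.flatnonzero(labels[:, tf] == 0)
    print(f"TF{tf}: positives={pos.tolist()} negatives={neg.tolist()}")
```

Note that under this convention the negative set is implicit in the label matrix rather than sampled separately, which is the point being made above.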

from deep-review.

gokceneraslan avatar gokceneraslan commented on August 21, 2024

Ah ok, I thought this was regarding DeepSEA. Apparently it's about LINSIGHT.

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

Yifei Huang replied to my e-mail with a helpful summary of the DeepSEA evaluation in the LINSIGHT paper:

We used all autosomes in our comparisons. I personally think DeepSEA is unlikely to overfit in our comparisons, since we used the DeepSEA functional significance score which was not trained using known TFs or disease variants. The DeepSEA functional significance score aggregated tissue-specific DeepSEA scores using polymorphism data and can be viewed as an indirect measurement of natural selection. Note that in the original DeepSEA paper, sometimes they trained meta-scores using known disease/eQTL variants and these meta-scores might overfit.

from deep-review.

akundaje avatar akundaje commented on August 21, 2024

DeepSEA models are trained on TF ChIP-seq data, so I'm not sure what this means. Also, I was specifically referring to the TF prediction task that they evaluate, not the variant scoring task. Anyway, I also posted comments on bioRxiv.


from deep-review.

cgreene avatar cgreene commented on August 21, 2024

@akundaje : Agree that potential for overfitting exists for the TF eval. However, the TF eval that they do gives similar performance to the DeepSEA paper's TF eval IIRC (~0.96). To me that suggests little overfitting, since they didn't hold out but DeepSEA did. Did your evals show DeepSEA overfitting if evaluated on all chromosomes? Sorry for brevity - posting b/w meetings.

from deep-review.

akundaje avatar akundaje commented on August 21, 2024

We haven't explicitly replicated the DeepSEA model, but for instance the Basset model has much stronger prediction (in terms of auPRCs) on the training set than on the validation or test set. Validation and test set performances are similar, but training performance is often much higher. auROCs always look much closer across training, validation, and test, as they are all inflated and in the 0.9 range; the auPRCs can diverge a lot. I don't know what the training set performance was for DeepSEA, but I expect it will be much better (in terms of auPRC) than the validation and test sets.


from deep-review.

cgreene avatar cgreene commented on August 21, 2024

Totally agree that auPRC would be more likely to diverge than auROC. It would be great to have those figures for all of these methods.
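The auROC/auPRC divergence under class imbalance can be seen with a quick NumPy sketch (synthetic data, not taken from any of the papers discussed): with ~1% prevalence, as in genome-wide TFBS calls, a classifier can post an excellent auROC while its auPRC is far lower.

```python
import numpy as np

rng = np.random.default_rng(0)

def auroc(y, s):
    # auROC equals the probability that a random positive scores
    # above a random negative (Mann-Whitney U statistic).
    pos, neg = s[y == 1], s[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

def average_precision(y, s):
    # Non-interpolated average precision: mean of precision@k over
    # the ranks at which a positive is retrieved.
    order = np.argsort(-s)
    hits = y[order]
    prec_at_k = np.cumsum(hits) / (np.arange(len(y)) + 1)
    return prec_at_k[hits == 1].mean()

# 100 positives vs 10,000 negatives; positive scores are shifted
# upward, but negatives vastly outnumber them near the threshold.
y = np.concatenate([np.ones(100), np.zeros(10_000)]).astype(int)
s = np.concatenate([rng.normal(2.0, 1.0, 100),
                    rng.normal(0.0, 1.0, 10_000)])

print(f"auROC: {auroc(y, s):.3f}")             # looks excellent
print(f"auPRC: {average_precision(y, s):.3f}")  # much lower
```

The gap arises because auROC is insensitive to the absolute number of false positives, while precision is dominated by them at 1% prevalence.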

from deep-review.

cgreene avatar cgreene commented on August 21, 2024

This one gets lots of discussion. We should probably talk about it - tagged for 'study'. The conversation around this one makes it clear to me that we also need to have at least a short section on evaluation. If we can get some people away from AUC in cases where it's not well suited, that'd be a huge win. Not sure if that should go in 'study' or a more general area. Opened #109 to make sure that this discussion makes it into our paper.

from deep-review.
