I noticed that the parameters in <a href="https://github.com/isayev/

Parameter sets about ase_ani HOT 4 CLOSED

isayev commented on May 25, 2024

Parameter sets

from ase_ani.

Comments (4)

proteneer commented on May 25, 2024

PS: The main issue is that we're having some difficulty reproducing the GDB-10 results (we have reproduced the val/test results).

Jussmith01 commented on May 25, 2024

Okay, a couple of things. 1) The network in this repo is not the same one as in the paper. This one was trained on the ANI-1 data set plus some amino acid and peptide data. Also, through hyperparameter searching we determined that the AEV parameters used here work just as well as the 768-element AEV on the ANI-1 + peptide data set. 2) In the paper, for the GDB-10 test we trim conformers with energies more than 300 kcal/mol above the minimum of each set of conformers. This may not have been explicitly mentioned in the paper, but it is clear from the range in figure 4 that this is what we are comparing. The high-energy GDB-10 stuff is VERY hard to fit if you are using the trimmed (at 275 kcal/mol) version of the ANI-1 data set (which is what we used in the paper and recently published as the "low" energy part of the ANI-1 data set).
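The trimming described in point 2 can be sketched as follows. This is a minimal illustration, not the code used for the paper: the function name `trim_high_energy`, the toy energies, and the assumption that energies are already in kcal/mol are all hypothetical.

```python
import numpy as np

def trim_high_energy(energies_kcal, cutoff_kcal=300.0):
    """Return a boolean mask that keeps conformers whose energy lies
    within `cutoff_kcal` of the lowest energy in the same conformer set."""
    energies = np.asarray(energies_kcal, dtype=float)
    relative = energies - energies.min()  # energy above the set minimum
    return relative <= cutoff_kcal

# Hypothetical energies (kcal/mol) for the conformers of one molecule.
energies = np.array([-100.0, -40.0, 150.0, 250.0])
mask = trim_high_energy(energies)
kept = energies[mask]  # the conformer 350 kcal/mol above the minimum is dropped
```

The mask is computed per conformer set, so each molecule is trimmed relative to its own minimum rather than a global one.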

As it turns out, I recently built an ensemble of original ANI-1 networks (5 copies of our model, each trained on one split of a 5-fold cross-validation style partition of the ANI-1 "low" energy data set) to compare on a new benchmark I have been developing. The new networks were developed with the same parameter file used in this repository. For the ensemble we get a prediction error of 1.7 kcal/mol RMSE. You can view these results here (this notebook will also show you how we do the comparison):

https://github.com/Jussmith01/ANI-Tools/blob/master/notebooks/eval_testset.ipynb

If you'd like me to make the ANI-1 ensemble available on this repo for comparison I can do that.
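For readers who want a feel for the ensemble number before opening the notebook, here is a rough sketch of an ensemble RMSE. It assumes the ensemble prediction is the plain average of the member predictions and that all energies are in kcal/mol; the linked notebook is the authoritative reference for the actual comparison, and `ensemble_rmse` and the toy arrays are hypothetical.

```python
import numpy as np

def ensemble_rmse(member_predictions, reference):
    """RMSE of the averaged ensemble prediction against reference energies.

    member_predictions: array of shape (n_models, n_conformers)
    reference:          array of shape (n_conformers,)
    """
    preds = np.asarray(member_predictions, dtype=float)
    ref = np.asarray(reference, dtype=float)
    mean_pred = preds.mean(axis=0)  # assumed: ensemble output = member average
    return float(np.sqrt(np.mean((mean_pred - ref) ** 2)))

# Toy data: two hypothetical models, two conformers.
toy_preds = [[1.0, 2.0], [3.0, 4.0]]
toy_ref = [2.0, 1.0]
toy_rmse = ensemble_rmse(toy_preds, toy_ref)
```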

proteneer commented on May 25, 2024

@Jussmith01 Thank you for the very detailed explanation and the notebook. We've confirmed this internally: our test scores become significantly better after pruning the high-energy conformations. For many of the applications we care about, we typically only consider conformations in the <100 kcal/mol range (you report using 300 kcal/mol).

We did some analysis on the training set as well: of the 22 million conformations you provide, about 6 million have energies more than 100 kcal/mol above the minimum. It looks like this dataset has a fairly large number of outliers, some with rather interesting geometries (shorter C=O bonds, as an example).
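A count like the one above could be obtained along these lines. This is only a sketch: it assumes the energies are grouped per molecule in a plain dictionary and are in kcal/mol, and `count_above_threshold` and the toy data are hypothetical, not part of any ANI tooling.

```python
import numpy as np

def count_above_threshold(energies_by_molecule, threshold_kcal=100.0):
    """Count conformations lying more than `threshold_kcal` above the
    minimum energy of their own molecule. Returns (n_high, n_total)."""
    n_high = 0
    n_total = 0
    for energies in energies_by_molecule.values():
        e = np.asarray(energies, dtype=float)
        n_high += int(np.sum(e - e.min() > threshold_kcal))
        n_total += e.size
    return n_high, n_total

# Toy data: two hypothetical molecules with three conformers each.
toy = {"mol_a": [0.0, 50.0, 120.0], "mol_b": [-10.0, 95.0, 300.0]}
n_high, n_total = count_above_threshold(toy)
```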

Jussmith01 commented on May 25, 2024

6M above 100 kcal/mol out of the 22M sounds about right. Regular normal mode sampling biases conformations towards energy minima by default. We have since refined our methods and have a soon-to-be-submitted paper that covers this topic a little. As for weird geometries, they can happen when a harmonic approximation is used to determine the structural perturbations. However, it is a very cheap way to generate non-equilibrium conformations, and from what we have seen it works well when you filter out high-energy conformations (which tend to be the weird structures).
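To illustrate how a harmonic approximation produces perturbed structures, here is a generic sketch that draws Gaussian displacements along normal modes with variance k_B T / k per mode. This is a textbook-style classical harmonic sampler swapped in for illustration, not the exact normal mode sampling scheme used to build ANI-1. The function name, the Cartesian unit-normalized modes, and the units (Å for coordinates, kcal/(mol·Å²) for force constants) are all assumptions.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def sample_harmonic_displacement(coords, modes, force_constants, temperature, rng):
    """Displace `coords` along each normal mode by a Gaussian amplitude.

    coords:          (n_atoms, 3) equilibrium geometry (assumed in Angstrom)
    modes:           (n_modes, n_atoms, 3) unit-normalized Cartesian modes
    force_constants: (n_modes,) assumed in kcal/(mol*Angstrom^2)
    """
    coords = np.asarray(coords, dtype=float)
    modes = np.asarray(modes, dtype=float)
    k = np.asarray(force_constants, dtype=float)
    sigma = np.sqrt(KB_KCAL_PER_MOL_K * temperature / k)  # per-mode std dev
    amplitudes = rng.normal(0.0, sigma)                    # shape (n_modes,)
    displacement = np.tensordot(amplitudes, modes, axes=1) # shape (n_atoms, 3)
    return coords + displacement

# Toy diatomic with a single stretch mode along x (hypothetical numbers).
eq = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
stretch = np.array([[[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]) / np.sqrt(2.0)
rng = np.random.default_rng(0)
perturbed = sample_harmonic_displacement(eq, stretch, [500.0], 300.0, rng)
```

Because the true potential is not harmonic, large sampled amplitudes can land on strained geometries with high energies, which is why filtering by energy afterwards removes most of the odd structures.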
