
Comments (4)

proteneer avatar proteneer commented on May 25, 2024

PS: the main issue is that we're having some difficulty reproducing the GDB-10 results (we've reproduced the val/test results).

from ase_ani.

Jussmith01 avatar Jussmith01 commented on May 25, 2024

Okay, a couple of things.

1) The network in this repo is not the same one as in the paper. This one was trained on the ANI-1 data set plus some amino acid and peptide data. Also, through hyperparameter searching we determined that the AEV parameters used here work just as well as the 768-sized AEV on the ANI-1 + peptide data set.

2) In the paper, for the GDB-10 test, we trim conformers whose energies are more than 300 kcal/mol above the minimum of each set of conformers. This may not have been explicitly mentioned in the paper, but it is clear from the range in figure 4 that this is what we are comparing. The high energy GDB-10 stuff is VERY hard to fit if you are using the trimmed (@275kcal/mol) version of the ANI-1 data set (which is what we used in the paper and recently published as the "low" energy part of the ANI-1 data set).
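To make the trimming in point 2 concrete, here is a minimal sketch in plain Python. It assumes energies are given in kcal/mol, one list per molecule, and that the cutoff is applied to the reference energies; the function names and the sample numbers are hypothetical and are not taken from the repo or the notebook.

```python
import math

def trim_high_energy(ref, pred, cutoff=300.0):
    """Keep conformers whose reference energy lies within `cutoff`
    (kcal/mol, assumed) of the lowest reference energy in the set."""
    e_min = min(ref)
    kept = [(r, p) for r, p in zip(ref, pred) if r - e_min <= cutoff]
    return [r for r, _ in kept], [p for _, p in kept]

def rmse(ref, pred):
    """Root-mean-square error between two equally long lists."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))

# Made-up energies for the conformers of a single molecule.
ref = [0.0, 10.0, 250.0, 400.0]
pred = [1.0, 9.0, 252.0, 450.0]

ref_t, pred_t = trim_high_energy(ref, pred)
print("RMSE, all conformers:", rmse(ref, pred))
print("RMSE, trimmed at 300:", rmse(ref_t, pred_t))
```

In this toy case the single conformer at 400 kcal/mol dominates the untrimmed error, which is the kind of effect that makes the trimmed and untrimmed numbers hard to compare.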

As it turns out, I recently built an ensemble of original ANI-1 networks (5 copies of our model trained on a 5-fold cross-validation style split of the ANI-1 "low" energy data set) to compare on a new benchmark I have been developing. The new networks were developed with the same parameter file used in this repository. For the ensemble we get a prediction RMSE of 1.7 kcal/mol. You can view these results here (this notebook will also show you how we do the comparison):

https://github.com/Jussmith01/ANI-Tools/blob/master/notebooks/eval_testset.ipynb

If you'd like me to make the ANI-1 ensemble available on this repo for comparison I can do that.
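For readers unfamiliar with ensembles, a rough sketch of one common way to combine the members is below. It assumes the ensemble prediction is the plain average of the five networks' energies, which is an assumption and not something stated above; the function name and the numbers are made up for illustration.

```python
import math

def ensemble_mean(per_model_preds):
    """Average predictions across models, conformer by conformer.
    `per_model_preds` holds one list of energies per model."""
    n_models = len(per_model_preds)
    return [sum(vals) / n_models for vals in zip(*per_model_preds)]

def rmse(ref, pred):
    """Root-mean-square error between two equally long lists."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))

# Made-up predictions (kcal/mol) from 5 models for 2 conformers.
preds = [
    [1.0, 11.0],
    [-1.0, 9.0],
    [0.5, 10.5],
    [-0.5, 9.5],
    [0.0, 12.0],
]
ref = [0.0, 10.0]

mean_pred = ensemble_mean(preds)
print("ensemble RMSE:", rmse(ref, mean_pred))
```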


proteneer avatar proteneer commented on May 25, 2024

@Jussmith01 Thank you for the very detailed explanation and the notebook. We've confirmed this internally: our test scores become significantly better after pruning the high energy conformations. For many of the applications we care about, we typically only consider conformations in the <100 kcal/mol range (you report using 300 kcal/mol).

We did some analysis on the training set as well: of the 22 million conformations you provide, about 6 million have energies more than 100 kcal/mol above the minimum. It looks like this dataset has a fairly large number of outliers, some with rather interesting geometries (shorter C=O bonds, as an example).
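A count like this can be sketched as follows. This is only an illustration, assuming the energies are grouped per molecule and expressed in kcal/mol; the dictionary layout, the function name, and the values are hypothetical.

```python
def count_above(conformer_sets, threshold=100.0):
    """Return (high, total): the number of conformers more than `threshold`
    above their own molecule's minimum, and the overall conformer count."""
    high = 0
    total = 0
    for energies in conformer_sets.values():
        e_min = min(energies)
        total += len(energies)
        high += sum(1 for e in energies if e - e_min > threshold)
    return high, total

# Made-up data: molecule label -> conformer energies (kcal/mol).
data = {
    "mol_a": [0.0, 50.0, 150.0],
    "mol_b": [-10.0, 95.0, 20.0, 300.0],
}

high, total = count_above(data)
print(f"{high} of {total} conformers are >100 kcal/mol above the minimum")
```

For scale, 6 million out of 22 million is roughly 27% of the set.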


Jussmith01 avatar Jussmith01 commented on May 25, 2024

6M above 100 kcal/mol out of the 22M sounds about right. Regular normal mode sampling by default biases conformations towards energy minima. We have since refined our methods and have a soon-to-be-submitted paper that covers this topic a little. As for weird geometries, they can happen when using a harmonic approximation to determine the structural perturbations. However, it is a very cheap way to generate non-equilibrium conformations, and from what we have seen it works well when you filter out high energy conformations (which tend to be the weird structures).
