Comments (4)
PS the main issue is that we're having some difficultites trying to reproduce the gdb-10 results, (we've repro'd the val/test results).
from ase_ani.
Okay, a couple of things. 1) The network in this repo is not the same one as that from the paper. This one was trained to the ANI-1 data set + some amino acid and peptide data. Also, through hyper parameter searching we determine the AEV parameters used here work just as well as for the 768 sized AEV on the ANI-1 + peptide data set. 2) In the paper we trim energies > 300kcal from each set of conformers minimum for the GBD-10 test. This may not have been explicitly mentioned in the paper, but is clear from the range in figure 4 that this is what we are comparing. The high energy GDB-10 stuff is VERY hard to fit to if you are using the trimmed (@275kcal/mol) version of the ANI-1 data set (which is what we used in the paper and recently published as the "low" energy part of the ANI-1 data set).
As it turns out I recently built an ensemble of original ANI-1 networks (5 of our model trained to a 5 fold cross-validation style split of the ANI-1 "low" energy data set) to compare on a new benchmark I have been developing. The new networks were developed with the same parameter file used in this repository. For the ensemble we get a prediction of 1.7kcal/mol RMSE. You can view these results here (this notebook will also show you how we do the comparison):
https://github.com/Jussmith01/ANI-Tools/blob/master/notebooks/eval_testset.ipynb
If you'd like me to make the ANI-1 ensemble available on this repo for comparison I can do that.
from ase_ani.
@Jussmith01 Thank you for the very detailed explanation and the notebook. We've confirmed internally and our test scores become significantly better after pruning the high energy conformations. For many of the applications we care about, we typically only consider the conformations in <100kcal/mol range (you report using 300kcal/mol).
We did some analysis on the training set as well, of the 22 million conformations you provide, about 6 million of them have >100 kcal/mol energy differences from the minimum. It looks like this dataset has a fairly large number of outliers, some with rather interesting geometries (smaller C=O bonds, as an example).
from ase_ani.
6M > 100kcal/mol of the 22M sounds about right. With regular normal mode sampling it will by default bias conformations towards energy minima. We have since refined our methods and have a soon to be submitted paper that covers this topic a little. As for weird geometries, it can happen when using a harmonic approximation to determine the structural perturbations. However, it is a very cheap way to generate non-equilibrium conformations and from what we have seen it works well when you filter out high energy conformations (which tend to be the weird structures).
from ase_ani.
Related Issues (20)
- wB97 single vs double precision? HOT 1
- Clarification on training the model HOT 2
- Cuda 9.1/9.2 support HOT 8
- Portable version? HOT 15
- Would angluar shifts from 0 to +pi be sufficient? HOT 2
- geometry optimization doesn't work HOT 1
- problem installing ASED3 HOT 4
- libAEV.so can not open HOT 2
- What is 'default_neighborlist' method in aev.py doing? HOT 3
- computing energy of a benzene crystal results in a RuntimeError HOT 7
- Example of generating the feature vector from a geometry HOT 1
- `ani_quicktest.py` example requires changes to script HOT 4
- Libraries for OpenMM-ANI HOT 2
- libAtomicNNP.so: undefined symbol HOT 1
- Periodic Boundary Conditions HOT 4
- ANI-2X
- Import Error: cannot open shared object file: No such file or directory HOT 5
- Problem excecuting ani_quicktest.py with Geforce RTX 3080 HOT 6
- ANI-2x training data set
- Inclusion of Point Charge HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ase_ani.