Comments (11)
I'm not claiming that my code is ready for prime time yet, but I do have a somewhat automated pipeline for building the AMBER files using OpenEye and Antechamber.
https://github.com/choderalab/gbff/blob/master/code/rebuild_freesolv.py
from freesolv.
I think we decided to not use gaff2xml
for this purpose until we have a stable public API, so my plan was to ingest the dependencies into a standalone script for FreeSolv
until gaff2xml
(or some other tool) stabilized and we could reduce complexity by using these external functions.
from freesolv.
Agreed. But we may still want to check that the standalone script does the same thing as the latest gaff2xml functions. A lot of the code we have floating around differs in the various steps used to prepare ligands, which leads to different charges. In particular, the setSampleHydrogens
.
Another possibilty is for me to roll out a release of gaff2xml. Even if the API is not stable, we can still just specific a hard-coded version number for installation via conda. I'm happy with whatever here.
from freesolv.
What's the issue with setSampleHydrogens? Is it an only an issue with
ambiguity about protonation state/tautomer state? The compounds here in
general are quite straightforward at this point; there are only a handful
where there would be any concern at all about tautomer state.
John, what's your preference on the approaches Kyle suggested?
Also, I should point out that Chris Bayly just pointed me to the
documentation on how to generate canonical charges using his multiconformer
procedure -
https://docs.eyesopen.com/toolkits/cookbook/python/modeling/am1-bcc.html.
This is what we should be using for charging.
On Sat, Apr 18, 2015 at 10:17 AM, Kyle Beauchamp [email protected]
wrote:
Agreed. But we may still want to check that the standalone script does the
same thing as the latest gaff2xml functions. A lot of the code we have
floating around differs in the various steps used to prepare ligands, which
leads to different charges. In particular, the setSampleHydrogens.Another possibilty is for me to roll out a release of gaff2xml. Even if
the API is not stable, we can still just specific a hard-coded version
number for installation via conda. I'm happy with whatever here.—
Reply to this email directly or view it on GitHub
#20 (comment).
David Mobley
[email protected]
949-385-2436
from freesolv.
Christopher Bayly's canonical AM1-BCC recipe, which we have now adopted, uses
omega.SetSampleHydrogens(True)
which @kyleabeauchamp has found leads to modest charge differences for ethanol due to placement of the proton.
@kyleabeauchamp has also found that
omega.SetRMSThreshold(rms)
seems to behave independently of the specified rms
value, and omega.SetRMSThreshold(0)
behaves completely differently from omitting a call to this method altogether. We're not yet sure if this is a bug. Using or omitting this line leads to charge differences for ethyl acrylate.
I think we can either use a specific gaff2xml
version to regenerate these or ingest the methods that we are using into scripts here---either method will work from a technical standpoint.
@kyleabeauchamp makes a good point about testing, however---gaff2xml
currently has an automated testing framework to ensure correct behavior of its methods, while this project does not yet have automated testing framework.
The biggest question might therefore be "What kind of testing do we need to ensure correctness of FreeSolv?". It seems that not much in the way of testing process existed previously (which is why all of these errors have come to light). Perhaps we want to figure out how we want things to be tested first, and that will make the path forward clearer?
from freesolv.
What is being done about the omega.SetRMSThreshold issue you mentioned?
Have you talked with support & Bayly?
Can you elaborate on how gaff2xml is tested and how that might be extended
to FreeSolv?
There is no obvious way that I can tell to check that the
topology/coordinate files generated are in fact correct, though we can
certainly check a number of side issues:
- Do the number of water molecules/water atoms in the topology file and
coordinate file match, and match what's expected? - Do the charges in the topology file match the expected charges, within
rounding? - Do the atom types in the topology file match those in the
AMBER-atom-typed .mol2 file from antechamber? - others?
- Do the number of atoms in the topology file match what's expected?
You could also do more complicated testing, such as:
- If you turn the resulting topology file back into an OE molecule (which
would require writing a parser) do you get the same isomeric SMILES as the
original molecule?
On Mon, Apr 20, 2015 at 10:02 AM, John Chodera [email protected]
wrote:
Christopher Bayly's canonical AM1-BCC recipe
http://docs.eyesopen.com/toolkits/cookbook/python/modeling/am1-bcc.html,
which we have now adopted, usesomega.SetSampleHydrogens(True)
which @kyleabeauchamp https://github.com/kyleabeauchamp has found leads
to modest charge differences for ethanol due to placement of the proton.@kyleabeauchamp https://github.com/kyleabeauchamp has also found that
omega.SetRMSThreshold(rms)
seems to behave independently of the specified rms value, and
omega.SetRMSThreshold(0) behaves completely differently from omitting a
call to this method altogether. We're not yet sure if this is a bug. Using
or omitting this line leads to charge differences for ethyl acrylate.I think we can either use a specific gaff2xml version to regenerate these
or ingest the methods that we are using into scripts here---either method
will work from a technical standpoint.@kyleabeauchamp https://github.com/kyleabeauchamp makes a good point
about testing, however---gaff2xml currently has an automated testing
framework to ensure correct behavior of its methods, while this project
does not yet have automated testing framework.The biggest question might therefore be "What kind of testing do we need
to ensure correctness of FreeSolv?". It seems that not much in the way of
testing process existed previously (which is why all of these errors have
come to light). Perhaps we want to figure out how we want things to be
tested first, and that will make the path forward clearer?—
Reply to this email directly or view it on GitHub
#20 (comment).
David Mobley
[email protected]
949-385-2436
from freesolv.
What is being done about the omega.SetRMSThreshold issue you mentioned?
Have you talked with support & Bayly?
@kyleabeauchamp emailed Christopher Bayly on 11 Apr 2015, and I've just pinged him again today. We can email OpenEye support for clarification if that doesn't work.
Can you elaborate on how gaff2xml is tested and how that might be extended
to FreeSolv?
There is a suite of nosetests used to test individual functions and run some datasets through them:
https://github.com/choderalab/gaff2xml/tree/master/gaff2xml/tests
There is no obvious way that I can tell to check that the topology/coordinate files generated are in fact correct,
Supposing technical constraints were not a problem, what would the true test of "correctness" be? Would it be identical free energies of hydration in each program (gromacs, AMBER, etc)? Identical potential energies for various conformations?
we can certainly check a number of side issues:
All of these things (many of which caused trouble in the past) would be good to test.
- If you turn the resulting topology file back into an OE molecule (which would require writing a parser) do you get the same isomeric SMILES as the original molecule?
@swails has a neat tool called ParmEd that is very much getting close to being able to do a lot of these tests, so it's possible we can utilize it in a testing framework.
from freesolv.
There is no obvious way that I can tell to check that the
topology/coordinate files generated are in fact correct,Supposing technical constraints were not a problem, what would the true
test of "correctness" be? Would it be identical free energies of hydration
in each program (gromacs, AMBER, etc)? Identical potential energies for
various conformations?When I said "correct", what I had in mind was actually a different issue -
that the topology in question contains the intended set of force field
parameters for the intended molecule with the intended protonation/tautomer
state. For the sake of terminology, let's call the issue you're raising
"consistent" rather than correct. That is, we would like FreeSolv to
contain topology/coordinate files which are both correct (containing the
intended parameters for the intended molecule as just described) AND
consistent (across file formats). I think "correctness" is probably harder
(in theory) to test than "consistent", and I don't know how to check it in
general except the specific aspects I already mentioned.
Consistency in principle ought to be able to be verified by identical free
energies of hydration and identical potential energies. However, this is
really still a research issue (i.e. we have a separate project trying to
get reproducibility of hydration free energies across several different
codes, and it is not straightforward. This also looks some at potential
energies, and a similar thing is true there). Potential energies are
certainly the easier of the two and probably ought to be our benchmark
here, at least for reference configurations between AMBER and GROMACS
files.
David Mobley
[email protected]
949-385-2436
from freesolv.
I think "correctness" is probably harder (in theory) to test than "consistent", and I don't know how to check it in general except the specific aspects I already mentioned.
Excellent points.
I think it might be practical to check consistency via some potential energy comparisons (especially with tools the OpenMM app
and its file readers or ParmEd
to help), and correctness via Topology -> OpenEye OEMol -> canonical SMILES, at least for now.
from freesolv.
Yes, I think that ought to be the goal.
On Mon, Apr 20, 2015 at 11:14 AM, John Chodera [email protected]
wrote:
I think "correctness" is probably harder (in theory) to test than
"consistent", and I don't know how to check it in general except the
specific aspects I already mentioned.Excellent points.
I think it might be practical to check consistency via some potential
energy comparisons (especially with tools the OpenMM app and its file
readers or ParmEd to help), and correctness via Topology -> OpenEye
OEMol -> canonical SMILES, at least for now.—
Reply to this email directly or view it on GitHub
#20 (comment).
David Mobley
[email protected]
949-385-2436
from freesolv.
This is basically resolved by #28 but can be re-opened later if there is anything we need to reawaken here. A lot of the energetic consistency stuff has been handled elsewhere via ParmEd and in tests for openmoltools
.
I intend to work with Michael Shirts separately to get in input files in other formats such as DESMOND.
from freesolv.
Related Issues (20)
- Decide any other supporting files/data which ought to be captured when database is re-constructed from primary data HOT 2
- Strip water molecules from all topology/coordinate files in current database HOT 8
- Protocol for generating solvated input files for various codes (AMBER, gromacs) HOT 1
- Have GBSA models been benchmarked on FreeSolv? HOT 4
- Migrate issues and close this repo? HOT 10
- In next update, include a json format version of database HOT 1
- Flag molecules with possibly problematic tautomers (and investigate tautomers for them)?
- Update column names to be more informative HOT 2
- Make CHARMM input files via ParmEd
- Potential duplicate molecules in FreeSolv Set HOT 12
- Set up Travis-CI testing
- GAFF version HOT 5
- Cannot generate GAFF mol2 from Tripos mol2 file HOT 2
- rebuild_freesolv.py script HOT 2
- Problems with processing some SMILES - omega returned error code 0
- Sanitize SDF files
- gromacs-mdp:with verlet lists rcoulomb!= rvdw is not supported HOT 3
- mobley_3323117 (sulfolane) has non-standard SMILES HOT 1
- Is gromacs_energies available to download anywhere? HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from freesolv.