choderalab / perses Goto Github PK

Experiments with expanded ensembles to explore chemical space

License: MIT License

Python 85.74% Shell 0.04% Jupyter Notebook 8.09% TeX 6.08% Dockerfile 0.05%

molecular-design python molecular-dynamics free-energy-perturbation simulation expanded-ensembles openmm relative-free-energy

perses's Introduction

Perses

Experiments with expanded ensemble simulation to explore chemical and mutational space.

License

This software is licensed under the MIT license, a permissive open source license.

Notice

Please be aware that this code is made available in the spirit of open science, but is currently pre-alpha--that is, it is not guaranteed to be completely tested or provide the correct results, and the API can change at any time without warning. If you do use this code, do so at your own risk. We appreciate your input, including raising issues about potential problems with the code, but may not be able to address your issue until other development activities have concluded.

Install

See our installation instructions here.

Quick Start

In a fresh conda environment:

$ conda config --add channels conda-forge openeye
$ conda install perses openeye-toolkits

Manifest

perses/ - Package containing code for performing expanded ensemble simulations
examples/ - Contains examples for various systems and methods of simulation
attic/ - some old code that may be useful as part of the new setup
devtools/ - Continuous integration and packaging utilities
notes/ - LaTeX notes deriving acceptance criteria and stochastic approximation methods

Contributors

A complete list of contributors can be found at GitHub Insights.

Major contributors include:

Julie M. Behr
Hannah E. Bruce Macdonald
John D. Chodera
Patrick B. Grinaway
Mike M. Henry
Iván J. Pulido
Jaime Rodríguez-Guerra
Dominic A. Rufa
Ivy Zhang

perses's People

Contributors

Stargazers

Watchers

perses's Issues

Parameterizing systems with ForceField

I've now put a few methods into simtk.openmm.app.ForceField that give us a lot of flexibility for creating new residue definitions, either via ffxml files written beforehand (as in gaff2xml) or via a plugin that generates templates and parameters on the fly.

For creating parameters on the fly, the problem right now is that ForceField.createSystem(topology) only takes in a topology object. If we need to parameterize a new residue (small molecule or biopolymer), it is not clear if we have sufficient information about bond order to generate conformations for parameterizing the missing residue.

Note that this doesn't affect cases where we already have or can pre-generate parameters, such as standard biopolymers (proteins and nucleic acids) and small molecule libraries (where parameters can be pregenerated).

If we had an input structure, we could in principle perceive bond order from this (e.g. using the OpenEye OEPerceiveBondOrders method), but in Perses, if we are proposing changes to a new molecular species, we need the System in order to generate the geometry. This is a chicken-and-egg problem.

I see three possible solutions:

A. Find a way to to go from the Residue (complete with fully explicit hydrogens) and determine bond orders from connectivity alone and either specification of the charge state or inference as to the likely charge state.
B. Add optional information about bond order to Topology that can be used in creating the system.
C. Externally generate plausible structural data for the new residue

It turns out that antechamber includes a program bondtype that infers bond order automatically, so provided this works robustly enough, I think we can use this scheme. I don't know if we will run into trouble with it not being able to guess the appropriate ionization state, and there may be issues with the workflow

residueTemplateGenerator -> bondtype -> OEMol -> OEOmega (coordinates) -> antechamber

but I think it will work in principle.

Sampler driver class

It would be useful for us to store the components of the RJMC acceptance probability to help us figure out how to optimize these moves later.

If we have the statistics for all moves stored in a log, we can later analyze the data to compute things like the log acceptance probability of every component,

log <A> = log <min{1, exp[-w]}>

If we also store the identities or indices of the proposed switch (e.g. from species 3 -> 7), we can later use BAR to reconstruct pair free energies, or an approximate Bayesian pairwise version of BAR to get a better estimate of relative free energies.

Plan going forward

Now that we have some base API, my imagination of the priorities are:

1a. Get the meat of instantaneous compound switching put together
1b. Get tests for this part in the repo and running on travis

Make conda-installable
Add in NCMC switching

Unless there are any objections, I'll start working on them in this order.

Impropers

Relating to both #53 and #44 , in some situations, inclusion of impropers is desirable. In order to implement this, we may want to just directly use OpenMM to calculate the potential. This would also allow the inclusion of other terms as well.

Topology proposals and system construction

General question--how should topology proposals be organized, and how should they be matched with System construction?

Currently, under the small molecule scheme, there is:

Transformation
- SmallMoleculeTransformation
  - SingleSmallMolecule
  - SmallMoleculeProteinComplex

So, it's grouped by the task of constructing systems. However, @jchodera points out that it's wise to group by the type of proposal (again, using small molecules as an example):

Transformation
- SmallMoleculeTransformation
  - LibrarySmallMoleculeTransformation
  - SyntheticAccessbilityTransformation
    ...etc.

Thoughts on this?

test_sampler implements the whole ExpandedEnsembleSampler

test_sampler implements a whole ExpandedEnsembleSampler. We should put that logic into a class, either here or in openmmmcmc.

Add contribution where we perturb from unmodified System objects to alchemical versions

For large systems, it will become critical to include the contribution where we perturb between the physical System objects and alchemically-modified versions of the System object.

This is just a reminder to myself to include this into the NCMC scheme.

Thoughts on a first paper

We should think about what might go into a first paper demonstrating the theory behind perses.

One possibility would be to focus on the absolute simplest problem of computing relative hydration free energies between molecules in the FreeSolv dataset. That may be interesting, since there are a few hundred of them, and we could potentially compute all of them in a single simulation by using standard self-adjusted mixture sampling (SAMS) to sample all of the molecular species equally.

Another possibility is to explore the computation of binding affinities for ligands to the T4 lysozyme cavity mutants, such as the L99A, L99A/M102Q, or more recent mutants. This could either be using SAMS to sample all of the species equally to compute all binding affinities, or to use our design scheme to bias sampling toward better binders. We could even do both, since there aren't that many ligands.

Other things for JDC to do?

I've finished the forcefield generators for small molecules, so I am just trying to figure out what else you guys need me to do.

I only have one item remaining on my list from the design thread:

A dual-topology AlchemicalFactory for alchemy to allow us to try different RJMC schemes that may eliminate discontinuous jumps in charges, types, and bond parameters

Anything else I was slated to do?

Thoughts on How To Geometry

I think the geometry generation issue might be more complicated than we've been assuming.

Consider a linear molecule ABC where we are appending atoms D and E. We specify the position of D with a bond (CD), angle (BCD), and torsion (ABCD). Similarly, we then can specify the position of E with a bond (DE), angle (CDE), and torsion (BCDE).

When computing the Jacobian, we had previously been assuming that we just need individual Jacobian determinants for each (bond,angle,torsion), and then these can be multiplied. But I'm not certain this is the case, since the position of atom E will depend not only on (r_{DE}, \theta_{CDE}, \phi_{BCDE}) but also on the (r_{CD}, \theta_{BCD}, and \phi_{ABCD}) used to place atom D. Doesn't that mean we would have to form the whole Jacobian matrix for each branched chain of atoms we are sprouting and then take the determinant of the whole thing? Or is it guaranteed we can decompose it into individual (bond,angle,torsion) Jacobians?

There is someone who can certainly help us with this if we get stuck: Vageli Coutsias, now at Stony Brook. We should first review his paper on kinematic loop closure, since I think there are some of the same challenges there. He's one of my oldest collaborators and I think would be thrilled to help us out here.

Remaining issues in the GeometryEngine stuff

Currently, there are a couple issues with the GeometryEngine, in particular relating to the implementation that uses all angle and torsion potentials when choosing a torsion.

Sometimes, the proposal potential results in some nans somehow making their way to the rejection sampler. When a uniform random number is drawn with nan as the upper limit, numpy throws an exception. Looking into why this is still happening despite replacing nans with inf.
The speed is a bit of an issue. I've discovered through profiling that this is a result of the conversion between internal and cartesian coordinates. This can probably be accelerated a lot via vectorization or something of the sort.

Manuscript / notes on theory

Perhaps this is a good point to pull the overleaf document with theory notes into this repo?

NCMC bug

This line:
https://github.com/pgrinaway/perses/blob/run_example/perses/annihilation/ncmc_switching.py#L279

        self.addComputeGlobal('initial_total_energy', 'ke + potential')

should be

        self.addComputeGlobal('final_total_energy', 'ke + potential')

Alchemical transformation should transform parameters for existing atoms

I just realized there is a potentially large flaw in our current molecule transformation scheme.

Currently, our molecular transformation scheme looks like this:

Step 1: Propose new topology/system
Step 2: Use AbsoluteAlchemicalFactory and NCMC to alchemically eliminate atoms not present in new molecule
Step 3: Use RJMC to propose new atomic positions for missing atoms in new molecule
Step 4: Use AbsoluteAlchemicalFactory and NCMC to alchemically introduce atoms in the new molecule

The problem is Step 3 creates an instantaneous jump between the old parameters/charges and the new parameters/charges for the atoms that are present (not alchemically deleted/inserted) in both old and new molecules. That will fundamentally limit our acceptance probability, especially if we end up changing the polarity of the molecule a great deal. There may also be problems with creating/destroying charges, either integral or partial.

We presumably want to either introduce additional steps before/after the RJMC step in which the parameters/charges for retained atoms are changed, or we want to integrate the parameter/charge change into the existing NCMC steps (Step 2 and Step 4).

We can still use the framework we have now for testing, but we should be aware that this issue will limit our acceptance probabilities.

We also need to think more deeply about the charge creation/deletion issue, as we will likely want to couple charge elimination/introduction to the introduction/elimination of a counterion (or transformation of a water into a counterion) for explicit solvent simulations.

Extending PointMutationEngine API to let allowed mutations be specified

I think it should be easy to extend the PointMutationEngine to let the set of mutatable sites and allowed residues at each site be restricted.

Right now, the allowed set of amino acids is hard-coded here:

        aminos = ['ALA','ARG','ASN','ASP','CYS','GLN','GLU','GLY','HIS','ILE','LEU','LYS','MET','PHE','PRO','SER','THR','TRP','TYR','VAL']

Instead, we might extend the API from PointMutationEngine from

engine = PointMutationEngine(max_point_mutants, proposal_metadata)

to add an optional argument:

allowed_mutations = { 
   34 : ['ALA', 'LYS', 'MET'],
  142 : ['VAL', 'MET', 'LEU']
}
engine = PointMutationEngine(max_point_mutants, proposal_metadata, allowed_mutations=allowed_mutations)

By default, if this isn't specified, all residues are allowed to be mutated to any residue. With the optional argument, only the residue indices in allowed_mutations.keys() are allowed to mutate to the residues listed in the corresponding value. We could even get sophisticated and let the keyword ALL denote all amino acids:

allowed_mutations = { 
   34 : ['ALA', 'LYS', 'MET'],
  142 : 'ALL'
}

Fix imatinib NCMC null transformation test failures

The imatinib NCMC null transformation test still occasionally fails:

======================================================================
ERROR: Testing alchemical null elimination for 'imatinib' with 50 NCMC steps
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/_test/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/miniconda/envs/_test/lib/python2.7/site-packages/perses-0.1-py2.7.egg/perses/tests/test_elimination.py", line 267, in check_alchemical_null_elimination
    raise Exception(msg)
Exception: Delta F (50 steps switching) = 3.173915 +- 0.367807 kT; should be within 6.000000 sigma of 0
delete logP:
[ 52.83907179  56.96881809  58.72221244  53.66332188  54.39244732
  55.20878563  56.42911385  62.5737624   47.22239953  52.64230924
  52.68643293  51.52396585  48.99477714  52.35681286  50.4809076
  53.9227835   56.61910085  52.69879979  52.22066052  52.8527985
  53.48836845  51.30178305  56.78153917  56.04516468  51.6749105
  55.60416723  51.679132    49.14696474  51.27811396  52.11830492]
insert logP:
[-56.94178584 -57.93855619 -75.74928178 -57.65467855 -64.17625056
 -63.93724275 -61.50533476 -69.62354963 -57.2744736  -79.88086955
 -56.74460312 -60.32094396 -55.35969296 -56.36628618 -53.71085795
 -66.62961333 -60.6884257  -65.73408047 -53.86550795 -59.83222306
 -64.63476613 -56.69728159 -60.02577208 -73.42986568 -59.48264145
 -58.47741466 -62.61625228 -51.43052224 -52.73681281 -54.31872933]
logP:
[ -4.10271405  -0.9697381  -17.02706934  -3.99135666  -9.78380324
  -8.72845712  -5.07622091  -7.04978723 -10.05207407 -27.23856031
  -4.05817019  -8.79697811  -6.36491582  -4.00947332  -3.22995035
 -12.70682983  -4.06932486 -13.03528068  -1.64484743  -6.97942456
 -11.14639768  -5.39549854  -3.24423292 -17.384701    -7.80773094
  -2.87324743 -10.93712028  -2.2835575   -1.45869886  -2.20042441]
-------------------- >> begin captured logging << --------------------

I thought this was due to the lack of initial equilibration, but it could also be due to the fact that the samples are not really uncorrelated and so the statistical error is being underestimated. Or there could be something actually wrong here. I will investigate.

Randomly select alignment for small molecule proposal

As @jchodera pointed out, many small molecule pairs have multiple possible alignments. In order to preserve reversibility, we should randomly select an alignment from the list of MCSS results.

Package name

How happy are we with the name "perses"? I notice that the domain name perses.io is available, but so is examol.org...

Have NCMCEngine correctly handle case where protocol is given with nsteps=0

NCMCEngine should reduce to standard instantaneous MC when nsteps = 0.

Stable simulations after geometry proposal

Just wanted to start a thread to discuss the best way to ensure we get stable simulations after the geometry proposal step.

In order for NCMC to work, we'll need to have a stable simulation for the alchemically-modified lambda=0 state after the geometry proposal step for some timestep. If we can't get stable dynamics with lambda=0, NCMC is not going to help us improve acceptance probabilities. In principle, this should be relatively straightforward since we should only be dealing with intramolecular valence terms at this point.

Obviously, the best solution would be to see if we can improve the geometry proposal, but there are always possibilities where we can't control what is going on with more complex intramolecular interactions or steric clashes that will occur when we start switching lambda from 0 to 1.

Ideas @pgrinaway has suggested are:

Metropolis Monte Carlo. Rejects NaNs (good!), but does not respect constraints and makes unbiased Gaussian proposals in each atomic coordinate (bad).
GHMC. Rejects NaNs (good!) and uses gradient information to achieve high acceptance probabilities (also good). Timestep could potentially be autotuned, or even more sophisticated schemes could be used (e.g. extra-chance GHMC). We have to change the NCMC engine to accumulate the appropriate form of work for this, however, since it will be the potential energy change accumulated from each step since Metropolization already handles the shadow work.

Do we want to have the flexibility to try multiple integrators inside of NCMCEngine? If so, what should the API for that look like?

I'm happy to take a stab at this once we decide what the appropriate functionality and API would be.

Extend TopologyProposal with convenience methods to retrieve atoms being deleted and atoms being created

Perhaps we should make TopologyProposal a real class, since we could then define convenience methods that use new_to_old_atom_map along with old_topology and new_topology to return the lists of atoms being deleted or created. This would be helpful for creating alchemically modified systems without having to recompute these lists.

Geometry Engine refactor

I thought I would start a discussion here first. I'm not suggesting that all of this needs to be implemented in the next PR (it's a lot), but I wanted to get some ideas.

Unit Tests

This has been pointed out, but the entire code needs a lot more testing. In particular:

it would be good to have each method tested separately, in addition to the test of the entire GeometryEngine.
A full suite of tests would also provide a target for test-driven development of new components of the GeometryEngine (and actually perses in general--especially since we'll need a different components for different sampling strategies).
Perhaps adding a service such as coveralls or landscape.io can help with making sure that we are writing code that is sufficiently tested.

Python 3 support

We want to make sure that all components support Python 3
Adding type hints can be helpful, especially if we use a static type checker to verify that our functions are dealing with the types we expect.

The Architecture of the Geometry Engine

From our discussions and experience so far, it seems there are three (conceptual). parts of the Geometry Engine:

The calculation of potentials and making proposals.

This component is responsible for calculating potentials associated with bonds, angles, and torsions, and also proposes them (there might be a clever way of separating potential calculation and proposal to make things cleaner/more modular, but I'm not sure). We definitely want to include impropers, as well as potentially other terms (if we use OpenMM to calculate potentials).

Is it cleaner to separate proposal and potential calculation? (long term question)
- They are intimately related, which might make this a bad idea
- We may want to use different backends for the potential, such as OpenMM or our own code.
Should we implement the impropers in the Python code, or should we outsource to OpenMM?

Retrieving parameters and topology information

This component is responsible for:

Deciding the order of atom proposal
Deciding which atoms can be used for proposal
Extracting parameters relevant to the proposal

By not having this part separate and modular, we risk:

Multiple sites of computation of forcefield parameters (potential for errors)
Duplication of code (both logp_reverse and propose need it)
More difficult-to-read code

The conversion of coordinates between internal and cartesian

I think this code should be separate (most of it is right now) for several reasons:

We want to test it separately
It might be useful to others
We might consider writing higher-performance versions of it, since it seems to be the bottleneck.

These are some initial thoughts. Eventually, unless there is disagreement, I think we want to implement most if not all of that. However, we should prioritize. To me, it seems:

Unit tests
Python 3 support
Refactor/cleanup (this is large, and some of it can go in the first PR, but I'm not exactly sure how to prioritize it at the moment)

Thoughts? Sorry for the massive issue, but I think discussing the next steps could be helpful.

Thanks everyone!

How will we visualize trajectories hopping over chemical space?

Any idea how we will be able to view trajectories where the number and identities of atoms are changing?

autograd

If you're going to use autograd, @pgrinaway, can you create a recipe in the omnia conda recipes?

Looks like there may already be a conda recipe you can use.

Alternatively, there are conda packages built if we want to use the richardotis conda channel.

Add test that uses both geometry and alchemical engines

Seems like only examples/rjmc/run_example.py does this, but is not run by the nosetests.

Refactor to have system creation separated from Topology creation

There are a hojillion places where tleap and antechamber are called in the current code. It would be great to refactor these to all use a single factory to generate a System from a Topology using the new ForceField with GAFF plugin generator. If the ForceField object is persistent in this factory, the antechamber calls will be cached automatically, though you could also cache Topology : System pairs via memoization if you wanted.

Add a ring opening/closing test case

The null transformation of naphthalene to naphthalene (where a ring is destroyed at one end and created at the other end) would be a great test case since we know the free energy difference should be zero. (Jed Pitera had a paper on this test case with Wilfred van Gunsteren.). We can do this in vacuum, implicit solvent, and explicit solvent.

This would be a very powerful test because of the novelty of our ability to handle ring opening and closing.

Update doctests and example

Looks like the doctests are failing because of the change in the TopologyProposal pattern. I will update this, as well, as update the example (creating a sampler class as suggested).

Change dependency to openmoltools

This is a reminder to myself to change the dependency from openmoltools-dev to openmoltools and ensure that Python 3 testing is working on travis. Will do this in the next PR.

Long-range anisotropic dispersion correction

I just realized we could include the anisotropic long-range dispersion correction in a very simple and inexpensive way via HMC: When we run the dynamics part of MCMC for the MCMCSampler, we always evaluate the reduced potential we want to sample from using a very large cutoff (e.g. just slightly smaller than half the smallest box dimension) when using explicit solvent, but always use a much shorter cutoff (e.g. 9*angstroms) when running dynamics. We can then accept/reject with HMC.

Bond constraints

What are we doing about constrained bonds at the moment?

We obviously have to make sure not to try to convert constrained bonds to unconstrained bonds.

If we only deal with hydrogen constraints, does our current MCSS atom matching algorithm ensure we will never map a bond that changes constraint status in old/new topologies as atoms that should be retained?

Also, we will screw up if we draw a bond length from a harmonic X-H bond length distribution and then apply constraints that change the bond length. We need to make sure to fix those bond lengths to the constrained bond length in the geometry engine.

MCSS overlap

Currently, we just take the first MCSS match, but there might be a more clever choice.

Near term goals and planning

This is what I am currently thinking--please let me know if you disagree.

perses has these parts which need development:

*TopologyProposalEngine - this already has an implemented base class and subclasses for small molecules. The small molecule bits need to be refactored for two reasons:

The alignment can have multiple results, so the choice of alignment should be random and factored in to the acceptance probability. This is more urgent and I will do it soon.
The innards should be refactored to use the OpenMM APIs rather than tleap. This is lower priority, since I think the priority is getting something that actually works. This also doesn't change the outward-facing API.
- GeometryEngine - This is implemented, but needs a couple things, some of which are high priority:
  - GeometryEngine needs a suite of tests. This is very high priority and was the next thing I wanted to do.
GeometryEngine should use OpenMM to calculate potentials. The code for this is already started, but is in the unmerged closed PR. Now that we have a set goal, I can finish this. This is also very high priority.
It could use some optimizations. This is low priority.
AlchemicalEliminationEngine and NCMCEngine - complete.
Sampler classes (even if only using simple algorithms) need to be written in order to actually do anything with perses. This is high priority.

So, my near-term plan of action was, in order:

Finish relevant GeometryEngine stuff.
Fix the TopologyProposalEngine to correctly deal with multiple possible alignments.
Write a SAMSSampler and ExpandedEnsembleSampler.
Write an adaptive bias SAMSSampler.

After (3), we should be able to do relative free energy calculations with SAMS, as well as design with a fixed bias. I think that should be the initial goal here.

Does this sound ok?

Missing a contribution to sampler acceptance probability?

I think we're missing a critical contribution in the acceptance criteria for test_sampler. While we include the ratio of proposal probabilities in the geometry proposal, I don't believe we include the difference in potential energies between the old and new conformations.

This results in some pretty weird geometries when the trajectory is visualized.

run_example.py dies with recursion error

[LSKI1497:perses.choderalab/examples/rjmc] choderaj% python run_example.py 
LOG: 
Warning: run_antechamber has been moved to openmoltools.amber.
LOG: antechamber -i in.mol2 -fi mol2 -o out.mol2 -fo mol2 -s 2 -c bcc
...
  File "/Users/choderaj/anaconda/lib/python2.7/site-packages/perses-0.1-py2.7.egg/perses/rjmc/topology_proposal.py", line 131, in metadata
    return self.metadata
RuntimeError: maximum recursion depth exceeded while calling a Python object

Thoughts on validation

Just collecting some initial brainstorming ideas on validation in advance of Monday's meeting:

Validating the overall `MultiTargetDesign` sampler

Comparing alchemical hydration free energies with relative hydration free energies

We can test the overall sampler by having the sampler compute relative solvation free energies between linear alkanes (methane, ethane, propane, n-butane, ...) with something like

target_samplers = { solvent_sampler : 1.0, vacuum_sampler : -1.0 }
designer = MultiTargetDesign(target_samplers)
designer.run()

and comparing the relative free energies to differences in absolute solvation free energies. The reference values could either be computed with YANK or could be computed inside perses using the NCMC module to switch bidirectionally between solvent and vacuum.

Validating alchemical NCMC

There are already two tests in there:

Alchemical deletion / reinsertion

Deleting and reinserting the same sets of atoms should give a free energy difference (computed via EXP) within a few statistical errors of zero. I have this threshold set to 6*sigma for right now.

NCMC switching for a harmonic oscillator

The NCMC integrator currently drags a harmonic well around in a bidirectional switching experiment and checks that the BAR estimate of free energy differences is zero.

Validating `AbsoluteAlchemicalFactory`

There are already a battery of tests in alchemy, but we could add a test to perses which makes sure that when atoms are alchemically eliminated, the only contributions to the energy when these atoms are moved comes from valence energy terms.

Validating the geometry engine

Null transformations

Null transformations---such as napthalene-to-napthalene with manually-encoded atom mappings---will be useful for making sure that there is no unexpected asymmetry. We should get equal samplings of both napthalenes. While a cool demonstration (and one we should include int he paper), this is unfortunately not the most demanding test since there could still be bugs and yet the result will still be symmetric because some contribution cancel out for both states.

Marginal distributions of internal coordinates

If we just run a SAMSSampler or properly-weighted ExpandedEnsembleSampler without using dynamics to propagate in between chemical switches and using instantaneous Monte Carlo (rather than NCMC) in the chemical switches, provided we randomly choose from the list of potential atomic alignments, we should get ergodic sampling of a series of small alkanes in vacuum. The marginal distributions of internal coordinates (bonds, angles, torsions) for a given molecule should match up with marginal distributions sampled for that molecule using GHMC. I'm unclear on whether or not this should also result in equal visits all chemical species---I think it should if we get all of the contributions correct, but I'm not convinced that I can prove it.

Analytical forward and reverse probabilities

For a linear chain of beads W-X-Y-Z with a forcefield that includes only bonds, angles, and torsions with a single Fourier component, placing or deleting Z should have an analytically computable Cartesian probability density function (which we already have code to compute) that can be compared to the numerical probability density function we actually use.

`ExpandedEnsembleSampler`

Harmonic oscillators

If we have a series of states that differ in Hamiltonian by the spring constant of a harmonic oscillator, we can impose analytically-computed weights and make sure we sample each state uniformly. We can use openmmtools.testsystems.HarmonicOscillator for this purpose; I can add the analytical free energy and normalizing constant to the list of analytically-available properties.

`SAMSSampler`

Harmonic oscillators

We can initialize the SAMS sampler with all zeros and make sure that multiple simulations converge to the analytically-computed free energies of harmonic oscillators.

Protonation and tautomeric states

As noted in #64 (comment), we have to think about how to handle protonation and tautomeric states within perses.

We have a few options for this. The most straightforward is to simply let the various protonation/tautomeric states of both polymers (e.g. protein residues) and small molecules be selected as targets for transformation proposals. This would involve changing our proposals to be two-stage:

select molecule ~ P(molecule | current state)
select protomer ~ P(protomer | molecule)

For example, for a small molecule, we might draw the molecule identity and then select from among the protonation states accessible by the molecule. For a protein residue mutation, we might select the new residue identity and then select among the protonation states accessible to that residue. Probabilities could be uniform, proportional to the population of the protomeric species in solution, or possibly even tuned to be proportional to the population in the complex (though this seems much harder).

We will need a few things to make this work:

A way to recognize that the various protonation states all represent equivalent chemical species that should have equivalent target probabilities in the SAMS recursion. This is most simply handled by having the "key" we use to keep track of which species has which target probabilities and log weights be insensitive to the choice of protonation state.
A way to introduce an additional log bias weight to account for the aqueous population of each titration state. This should be such that, for the small molecule or isolated (blocked) residue in solution, the target populations \hat{\pi}_k of each titration state match the aqueous populations given by Epik or the population computed from the pKa at the given pH. This log weight would be introduced into the acceptance probability.
These log weights would need to be calibrated by SAMS to give the aqueous populations of each titration state from Epik or the population computed from the pKa. We could pre-compute them for the given forcefield and nonbonded treatment. Alternatively, we could figure out a more clever way to also include these calibration simulations in the recursion scheme so that this is calibrated on the fly, but it might be complicated to figure out a nice, general framework to make this simple.

Split geometry proposal into two components for more flexibility

Currently, the geometry proposal x_new ~ p(new | old) (the "geometry" step below) and computation of the Metropolis-Hastings ratio p(old | new) / p(new | old) (the "ratio" step below) is done in a single step. It would be useful to split this up into two steps that can use different old/new coordinates to give us more flexibility in exploring alternative NCMC splittings.

For example, the current scheme requires we do this:

old molecule --NCMC--> old core --geometry--ratio--> new core --NCMC--> new molecule

For NCMC efficiency reasons, we may want to instead use NCMC in a dual topology alchemical scheme to transform one molecule into the other directly:

old molecule --geometry--> dual-topology old --NCMC--> dual-topology new --ratio--> new molecule

or add a separate NCMC stage to convert the old core into the new core parameters:

old molecule --NCMC--> old core --geometry--> dual-topology old core --NCMC--> dual-topology new core --ratio--> new core --NCMC--> new molecule

These could all be explored if we simply split the geometry proposal and ratio computation parts into separate steps that could use different coordinates.

Documentation, etc.

Reminder that we'll need documentation for the examples, the user-facing API, and the internal API.

Also, reminder that we are using numpy-style docstrings in methods.

Discussion of weight adaptation and sampling schemes

Creating a thread to discuss weight adaptation and sampling schemes we should explore and the order in which we should explore them.

Very large (creating entire rings) proposals

So, I just tried a very large proposal of transforming erlotinib into gefitinib. A couple things:

While it didn't die in dynamics (1000 steps of verlet at 300K, no NCMC), the potential did get much larger (~1e12 kJ/mol)
The aromatic ring is really messed up, presumably due to the need for impropers.
The non-aromatic ring is somewhat less messed up, though could definitely use a bit of help.

For viewing pleasure, the results are available here. The configuration after dynamics looks pretty messed up, and may be on its way to NaNdyland.

Another NCMC bug

This section uses the wrong context parameter names:

default_functions = {
    'alchemical_sterics' : 'lambda',
    'alchemical_electrostatics' : 'lambda',
    'alchemical_bonds' : 'lambda',
    'alchemical_angles' : 'lambda',
    'alchemical_torsions' : 'lambda'
    }

It should be lambda_[name], rather than alchemical_[name], though we should probably rename this in alchemy release 2.0.

NCMCEngine API

In #44, we talked about various choices of integrator for the NCMCEngine, including GHMC and VV. I think it would be convenient to have an API for deciding which integrator to use. I'm starting this thread to brainstorm that API.

Perhaps the user can specify a set of CustomIntegrator-syntax strings for the propagation bits
Or can we actually combine integrators as mixins? I seem to remember Kyle doing something like this, though I'm not sure what actually happened.

test_geometry_engine nosetest failure

Does this fail on your system?

======================================================================
ERROR: Run the geometry engine a few times to make sure that it actually runs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/choderaj/miniconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/choderaj/github/choderalab/perses/perses.jchodera/perses/tests/test_geometry_engine.py", line 74, in test_run_geometry_engine
    new_to_old_atom_mapping = align_molecules(molecule1, molecule2)
  File "/Users/choderaj/github/choderalab/perses/perses.jchodera/perses/tests/test_geometry_engine.py", line 56, in align_molecules
    match = matches[0]
TypeError: 'OEMatchBaseIter' object does not support indexing

Don't use value_with_units/value_with_units.unit

Let's not use expressions like

value/value.unit

in the code. There's just too much potential for these being in different units and getting converted to arbitrary units. Let's just define kT somewhere convenient and use that throughout:

value/kT

        # Create new deep copy reference system to modify.
        system = openmm.System()

        # Set periodic box vectors.
        [a,b,c] = reference_system.getDefaultPeriodicBoxVectors()
        system.setDefaultPeriodicBoxVectors(a,b,c)

        # Add atoms.
        for atom_index in range(reference_system.getNumParticles()):
            mass = reference_system.getParticleMass(atom_index)
            system.addParticle(mass)

        # Add constraints
        for constraint_index in range(reference_system.getNumConstraints()):
            [iatom, jatom, r0] = reference_system.getConstraintParameters(constraint_index)
            system.addConstraint(iatom, jatom, r0)

        # Copy selected appropriate.
        nforces = reference_system.getNumForces()
        for force_index in range(nforces):
            reference_force = reference_system.getForce(force_index)
            force_name = reference_force.__class__.__name__
            if force_name == 'HarmonicBondForce':
                new_force = openmm.HarmonicBondForce()
                system.addForce(new_force)
                for bond_index in range(reference_force.getNumBonds()):
                    [particle1, particle2, r, K] = reference_force.getBondParameters(bond_index)
                    if set([particle1, particle2]) in atom_set:
                        new_force.addBond(particle1, particle2, r, K)

Working on NonbondedForce

This is actually pretty complicated, so won't finish tonight. Will review my progress with you tomorrow.

choderalab / perses Goto Github PK

perses's Introduction

Perses

License

Notice

Install

Quick Start

Manifest

Contributors

perses's People

Contributors

Stargazers

Watchers

Forkers

perses's Issues

Unit Tests

Python 3 support

The Architecture of the Geometry Engine

The calculation of potentials and making proposals.

Retrieving parameters and topology information

The conversion of coordinates between internal and cartesian

Validating the overall MultiTargetDesign sampler

Comparing alchemical hydration free energies with relative hydration free energies

Validating alchemical NCMC

Alchemical deletion / reinsertion

NCMC switching for a harmonic oscillator

Validating AbsoluteAlchemicalFactory

Validating the geometry engine

Null transformations

Marginal distributions of internal coordinates

Analytical forward and reverse probabilities

ExpandedEnsembleSampler

Harmonic oscillators

SAMSSampler

Harmonic oscillators

Recommend Projects

Recommend Topics

Recommend Org

Validating the overall `MultiTargetDesign` sampler

Validating `AbsoluteAlchemicalFactory`

`ExpandedEnsembleSampler`

`SAMSSampler`