choderalab / yank
An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.
Home Page: http://getyank.org
License: MIT License
Currently, it seems like it stalls on cloning mdtraj via github:
https://travis-ci.org/choderalab/yank/builds/20381725#L3206-L3210
Can I just use the pypi package instead, or a conda package?
Since Python 2.4, set has been a built-in type.
I'm pretty sure we can deprecate support for python 2.4--it's been ~2 years since I last used 2.6...
We should try to move as much as possible to "standard" modules.
For example,
kyleb@kb-intel:~/src/kyleabeauchamp/yank$ cat yank/*.py|grep analyze_accept
def analyze_acceptance_probabilities(ncfile, cutoff = 0.4):
def analyze_acceptance_probabilities(ncfile, cutoff = 0.4):
I've seen other duplicated functions as well. IMHO, one way to deal with this is to create utils.py and dump any "utility" functions there--until we find a better place to put them.
We have to make some decisions about how we tell YANK what we want it to do.
To be specific, we need to tell it:
We have a few options for how to specify this:
Yank module. All parameters are coded in Python.
yank setup to set up a calculation
yank run to run/resume a calculation
yank info to get some quick info on progress
yank analyze to analyze a calculation
Thoughts?
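One way the subcommand option could look is sketched below with argparse from the standard library. This is only an illustration under stated assumptions: the --store and --iterations options are hypothetical names invented for the example, and nothing here reflects an agreed YANK interface.

```python
import argparse

# Subcommand names and descriptions taken from the proposal above.
COMMANDS = [
    ("setup", "set up a calculation"),
    ("run", "run or resume a calculation"),
    ("info", "get some quick info on progress"),
    ("analyze", "analyze a calculation"),
]


def build_parser():
    """Build a parser with one subcommand per stage of a calculation."""
    parser = argparse.ArgumentParser(prog="yank")
    subparsers = parser.add_subparsers(dest="command")
    by_name = {}
    for name, help_text in COMMANDS:
        sub = subparsers.add_parser(name, help=help_text)
        # Hypothetical option: directory that holds the calculation.
        sub.add_argument("--store", default=".")
        by_name[name] = sub
    # Hypothetical option that only makes sense when running.
    by_name["run"].add_argument("--iterations", type=int, default=1000)
    return parser


# Parse an explicit argument list instead of sys.argv for the demonstration.
args = build_parser().parse_args(["run", "--store", "p-xylene", "--iterations", "10"])
print(args.command, args.store, args.iterations)
```

Keeping each stage as its own subcommand means a calculation can be resumed or inspected without repeating the setup parameters.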
Would be useful to add a yank/tests/test_systembuilder.py set of nosetests that just make sure SystemBuilder constructs valid systems from various combinations of inputs.
In several places in Yank, we have code that formats various tables (e.g. TProb).
If we don't mind a Pandas dependency, we could just do this:
T = pd.DataFrame(T)
T.to_string(float_format=formatter_lambda_function)
I already took the liberty of trying this in my Repex refactor.
If we don't do this, at the very least we should write one function that formats tables and try to re-use it as much as possible.
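If the Pandas dependency is not wanted, the single reusable formatter could be as small as the sketch below. The function name format_table, its parameters, and the example matrix are all made up for illustration; this is not code from YANK.

```python
def format_table(matrix, cell_format="{:8.3f}", row_labels=None):
    """Return a matrix (a list of rows) as an aligned multi-line string."""
    lines = []
    for index, row in enumerate(matrix):
        # Fall back to the row index when no labels are supplied.
        label = row_labels[index] if row_labels is not None else str(index)
        cells = "".join(cell_format.format(value) for value in row)
        lines.append("{:>6}".format(label) + cells)
    return "\n".join(lines)


# Made-up transition probabilities, purely for illustration.
example = [[0.9, 0.1], [0.25, 0.75]]
print(format_table(example))
```

Every table in the code base could then call this one function and differ only in the cell format it passes.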
So, when the SystemBuilder-made complex system is used to construct a yank object, the alchemical intermediate systems explode. Here is the code that I'm currently using to construct the exploding systems:
import os
import simtk.unit as unit
import simtk.openmm as openmm
import numpy as np
import alchemy
import simtk.openmm.app as app
# Mol2SystemBuilder, BiomoleculePDBSystemBuilder, and ComplexSystemBuilder are assumed to be
# imported from YANK's system builder module; that import was not part of the posted snippet.
#os.environ['AMBERHOME']='/Users/grinawap/anaconda/pkgs/ambermini-14-py27_0'
os.chdir('../examples/p-xylene')
ligand = Mol2SystemBuilder('ligand.tripos.mol2', 'ligand')
receptor = BiomoleculePDBSystemBuilder('receptor.pdb','protein')
complex_system = ComplexSystemBuilder(ligand, receptor, "complex")
complex_positions = complex_system.positions
receptor_positions = receptor.positions
print type(complex_system.coordinates_as_quantity)
timestep = 1.0 * unit.femtoseconds # timestep
temperature = 300.0 * unit.kelvin # simulation temperature
collision_rate = 20.0 / unit.picoseconds # Langevin collision rate
minimization_tolerance = 10.0 * unit.kilojoules_per_mole / unit.nanometer
minimization_steps = 20
plat = "CUDA"
i=2 # index into systembuilders: 2 selects the complex
platform = openmm.Platform.getPlatformByName(plat)
forcefield = app.ForceField
systembuilders = [ligand, receptor, complex_system]
receptor_atoms = range(0,receptor.traj.top.n_atoms)
ligand_atoms = range(receptor.traj.top.n_atoms,complex_system.traj.top.n_atoms)
factory = alchemy.AbsoluteAlchemicalFactory(systembuilders[i].system, ligand_atoms=ligand_atoms)
protocol = factory.defaultComplexProtocolImplicit()
systems = factory.createPerturbedSystems(protocol)
#test an alchemical intermediate and
for p in range(1,len(systems)):
    print "now simulating " + str(p)
    if p==5:
        continue #for some reason 5 is poorly behaved
    integrator_partialinteracting = openmm.LangevinIntegrator(temperature, collision_rate, timestep)
    context = openmm.Context(systems[p], integrator_partialinteracting, platform)
    context.setPositions(systembuilders[i].openmm_positions)
    openmm.LocalEnergyMinimizer.minimize(context, minimization_tolerance, minimization_steps)
    outfile = open('out_test'+str(p)+'.pdb','w')
    app.PDBFile.writeHeader(systembuilders[i].traj.top.to_openmm(), outfile)
    for k in range(10):
        integrator_partialinteracting.step(100)
        state = context.getState(getEnergy=True, getPositions=True)
        app.PDBFile.writeModel(systembuilders[i].traj.top.to_openmm(), state.getPositions(), outfile,0)
    app.PDBFile.writeFooter(systembuilders[i].traj.top.to_openmm(), outfile)
    outfile.close()
Might be nice if we could do fast automated hydration free energies via OpenMM / Yank
I'm finding a lot of hidden syntax errors via pyflakes...
So should ModifiedHamiltonianExchange be replaced by a "regular" HamiltonianExchange object with a particular choice of MCMC moveset? I'm trying to wrap my head around where each component belongs.
So we have a lot of doctests that are pretty complex, with ~20 lines of code.
It might be nice for us to reserve doctests for tests that are primarily illustrative--and put more complex tests in a separate set of nosetests.
Is this speed considered normal?
[reference_system, coordinates] = testsystems.LysozymeImplicit()
[...]
In [55]: %time reference_state = reference_context.getState(getEnergy=True)
CPU times: user 139.48 s, sys: 0.00 s, total: 139.48 s
Wall time: 139.62 s
[...]
In [69]: %time alchemical_state = alchemical_context.getState(getEnergy=True)
CPU times: user 142.45 s, sys: 0.02 s, total: 142.46 s
Wall time: 142.54 s
If so, we should file some [WIP] pull requests, so that we can each see what everyone is working on--this will help us avoid dealing with conflict resolution.
We should prefer either
import X
X.method()
or
from X import Y
Y()
When running yank.py example "p-xylene," an IndexError is thrown when yank.py attempts to access self.complex_coordinates[0].
It seems to be related to the following comment at the top of the code:
I'll play around with it.
I think we should be able to use logging to control the verbosity level without having to pass around verbose arguments everywhere.
It might be possible to make this work with MPI as well: https://github.com/jrs65/python-mpi-logger
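A minimal sketch of the logging idea, using only the standard library, is shown below. The logger name and the two functions are hypothetical and exist only to show that a single startup call can replace verbose arguments; the MPI side is not covered.

```python
import logging

logger = logging.getLogger("yank_sketch")  # hypothetical logger name


def configure_verbosity(verbose):
    """Translate one verbose flag into a logging level, once, at startup."""
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    level = logging.DEBUG if verbose else logging.WARNING
    logger.setLevel(level)
    return level


def run_iteration(iteration):
    """Takes no verbose argument; the logger level decides what is shown."""
    logger.debug("starting iteration %d", iteration)
    return iteration + 1


configure_verbosity(verbose=True)
run_iteration(0)
```

With this pattern, functions deep in the call stack just call logger.debug() and never need to know whether the user asked for verbose output.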
What license do we want to use for YANK?
Currently, everything is GPL, but we may want to use a more permissive license, like LGPL.
It may be easiest to use the same license as OpenMM, unless there are issues with the libraries we use.
What does this read and can it be deprecated?
It looks like SystemBuilder does not have a standard interface for retrieving OpenMM-style Quantity-wrapped positions.
We should just have the thing.positions @property return standard Quantity-wrapped positions.
The oldrepex._show_mixing_statistics() function (also in new repex) is useful, but slows down as the number of iterations increases. We should accelerate this with something like weave, cython, or the like.
build_forcefield needs to pass keyword arguments to antechamber.
For instance, I can't parametrize a molecule with a net charge until antechamber receives the charge as input.
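One possible shape for this is sketched below. Both function names and their parameters are hypothetical stand-ins, not the existing build_forcefield signature; the sketch only assembles the antechamber argument list (using its -i, -fi, -o, -fo, -c, and -nc options) and does not run anything.

```python
def build_antechamber_command(input_mol2, output_mol2, charge_method="bcc", net_charge=None):
    """Assemble an antechamber argument list; nothing is executed here."""
    command = [
        "antechamber",
        "-i", input_mol2, "-fi", "mol2",
        "-o", output_mol2, "-fo", "mol2",
        "-c", charge_method,
    ]
    if net_charge is not None:
        # -nc tells antechamber the net molecular charge.
        command += ["-nc", str(net_charge)]
    return command


def build_forcefield_sketch(input_mol2, output_mol2, **antechamber_kwargs):
    """Hypothetical stand-in that forwards keyword arguments unchanged."""
    return build_antechamber_command(input_mol2, output_mol2, **antechamber_kwargs)


command = build_forcefield_sketch("ligand.mol2", "ligand.gaff.mol2", net_charge=-1)
print(" ".join(command))
```

Forwarding **antechamber_kwargs keeps the outer function's signature stable while still letting callers set the charge.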
IMHO, this is the "standard" way of using numpy imports.
Also, it might be good to try to use all numpy, whenever possible, rather than switching between math and numpy.
Is there a modern replacement for weave that is fast but less clunky?
Sometimes our doctests contain code that only works during a doctest, e.g.:
>>> # Create a reference system.
>>> import testsystems
>>> [reference_system, coordinates] = testsystems.AlanineDipeptideImplicit()
>>> # Create a factory.
>>> factory = AbsoluteAlchemicalFactory(reference_system, ligand_atoms=[0, 1, 2])
>>> factory._is_restraint([0,1,2])
False
>>> factory._is_restraint([1,2,3])
True
>>> factory._is_restraint([3,4])
False
>>> factory._is_restraint([2,3,4,5])
True
Because we're calling AbsoluteAlchemicalFactory in a local namespace, this code will not run without first running from alchemy import *.
We should therefore explicitly import alchemy and use the module name in the docstring.
The key advantage of this is that it allows users to copy and paste docstrings into their Python session for learning purposes.
I personally think it's worth the day of work that it will take.
IMHO the easiest thing is to use MDTraj as a guide (e.g. copy any necessary template files).
We should stick to the numpy docstring convention:
https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
@kyleabeauchamp : You want the following files:
yank/testsystems.py
yank/data/ - everything in this directory
IMHO we should avoid typing out 50 alchemical intermediates, e.g.:
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.95, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.925, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.90, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.85, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.80, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.75, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.70, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.675, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.65, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.60, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.55, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.50, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.40, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.30, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.20, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.10, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.05, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.025, 1.)) #
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
+ alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
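A sketch of the alternative: keep the schedule as one list of values and build the states in a loop. The namedtuple below is only a stand-in for the real AlchemicalState class, and its field names are placeholders, since the diff shows four positional arguments without naming them. The values are the third-argument values from the diff above, with the repeated final state listed once.

```python
from collections import namedtuple

# Stand-in for the real class; the field names are placeholders (assumption).
AlchemicalState = namedtuple(
    "AlchemicalState", ["lambda_1", "lambda_2", "lambda_3", "lambda_4"]
)

# Third-argument values copied from the diff above.
THIRD_ARGUMENT_SCHEDULE = [
    0.95, 0.925, 0.90, 0.85, 0.80, 0.75, 0.70, 0.675, 0.65, 0.60,
    0.55, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05, 0.025, 0.00,
]


def build_states(schedule):
    """Create one state per schedule value, holding the other arguments fixed."""
    return [AlchemicalState(0.00, 0.00, value, 1.0) for value in schedule]


alchemical_states = build_states(THIRD_ARGUMENT_SCHEDULE)
print(len(alchemical_states))
```

Changing the protocol then means editing one list rather than dozens of near-identical lines.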
Also, this means that we shouldn't invest too much time cleaning up those files in the current Yank repo.
PS: I think TestSystems is pretty much ready to integrate, so that should happen first.
We can easily add an alternative alchemical intermediate generator based on this scheme:
http://dx.doi.org/10.1002/jcc.21829
Is there a reason that we manually parse PDB files instead of letting app.PDBFile.getPositions(asNumpy=True) do all the work for us?
I've just removed the old pyopenmm pure-Python implementation of System and Force classes that I previously used to determine periodicity or remove Force objects from a System object. Now we have no way to create vacuum versions of molecules in case we want to compute hydration free energies in parallel with binding free energies.
It may be OK to simply leave this out and focus on binding free energies, since hydration free energies are extremely specialized anyway.
File "yank.py", line 917, in analyze
[nequil, g_t, Neff_max] = timeseries.detectEquilibration(u_n)
ValueError: too many values to unpack
Michael Shirts mentioned that his datasets for T4 lysozyme using old YANK were 44GB total for about 20 ligands. For his work on larger sets of proteins, he is generating ~4TB of data.
We should explore ideas for cutting down file sizes. Some obvious ones:
Lee-Ping would like to be able to compute arbitrary solvation free energies for a molecule in another type of molecule.
Right now, many parameters in yank are set via the following scheme:
yank = Yank()
yank.n_iterations = 10
yank.timestep = 1.0
yank.other_thing = other_thing
To prevent insane combinations of these objects, it might be nice for us to use getter and setter decorators for all possible properties.
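As a rough illustration of the getter/setter idea, the sketch below validates two of the attributes named above. The class name YankSettings and the specific validation rules are assumptions made for the example, not YANK's actual behavior.

```python
class YankSettings(object):
    """Hypothetical settings container with validated properties."""

    def __init__(self):
        self._n_iterations = 1
        self._timestep = 1.0

    @property
    def n_iterations(self):
        return self._n_iterations

    @n_iterations.setter
    def n_iterations(self, value):
        # Assumed rule: a whole number of iterations, at least one.
        if int(value) != value or value < 1:
            raise ValueError("n_iterations must be a positive integer")
        self._n_iterations = int(value)

    @property
    def timestep(self):
        return self._timestep

    @timestep.setter
    def timestep(self, value):
        # Assumed rule: the timestep must be strictly positive.
        if value <= 0:
            raise ValueError("timestep must be positive")
        self._timestep = float(value)


settings = YankSettings()
settings.n_iterations = 10
settings.timestep = 1.0
```

Callers keep the same attribute-assignment syntax, but a bad value fails at the point of assignment instead of deep inside a run.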
Robert's MixTape has a pretty clean object oriented tool for building up command line apps. It's worth considering here.
I'd like to change the license of yank to be LGPL as well. Any objections?
All the mdtraj-based stuff should be private (internal) only. We may support mdtraj-based stuff in the future when we convert to repex, but not yet.
So Robert and I have done some work integrating RMSD and alignment features into MDTraj.geometry
We could possibly pull in some of the rotation stuff that's currently in Yank. One advantage is that it could be easier to maintain there...
So analyze.py has lots of trajectory analysis code that duplicates things I've already written in MDTraj. We can replace a lot of this redundant code with only a few lines of MDTraj code.
There is one remaining to-do item on the Repex side: a member function that slices the netCDF database and outputs MDTraj trajectories. (See, e.g. https://github.com/choderalab/repex/issues/49).
I think cleaning up this issue will lead to massive readability improvements and could make the Yank-Repex-MDTraj pipeline the go-to tool for analysis on repex datasets.
I've put assignments for general code cleanup here:
https://github.com/choderalab/yank/wiki/YANK-Roadmap
Note that I've just made some updates to eliminate unused files, so update your repositories.
After installing yank with setup.py, I get this error:
>>> import yank
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "yank/__init__.py", line 29, in <module>
    import version
ImportError: No module named version
We want the following test cases to work initially:
prmtop/inpcrd route)
app.ForceField with specified forcefield file(s) and implicit/explicit solvent choice
gaff2xml
app.ForceField with specified forcefield file(s) and implicit/explicit solvent choice
gaff2xml
All modules should have unit tests for classes and methods.