choderalab / yank

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.

License: MIT License

Python 99.74% Shell 0.21% Dockerfile 0.05%

molecular-dynamics molecular-dynamics-simulation alchemical-free-energy-calculations drug-discovery free-energy openmm mskcc python alchemical free-energy-perturbation

yank's Introduction

YANK

Documentation

Documentation, tutorials, and best practices can be found at getyank.org.

Getting Started

See the Quickstart instructions.

Examples

Examples are available in the yank-examples repository:

git clone https://github.com/choderalab/yank-examples.git

Maintainers

Andrea Rizzi (WCMC)
Hannah Bruce Macdonald (MSKCC)
John D. Chodera (MSKCC)
Levi N. Naden (MSKCC)

Contributors

Kim Branson (Stanford)
Kyle A. Beauchamp (MSKCC)
Peter M. Eastman (Stanford)
Mark Friedrichs (Stanford)
Imran Haque (Stanford)
Patrick B. Grinaway (MSKCC)
Christoph Klein (University of Virginia)
Rosa Luirink (VU Amsterdam)
Daniel L. Parton (MSKCC)
Randy Radmer (Stanford)
Arien Sebastian Rustenburg (MSKCC)
Michael Shirts (University of Colorado Boulder)
Kai Wang (University of Virginia)

yank's Issues

Deprecate `from sets import Set`

Since Python 2.4, set has been a built-in type.

I'm pretty sure we can deprecate support for Python 2.4--it's been ~2 years since I last used 2.6...
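
As a small illustration of the migration, the built-in type covers the usual operations of the old sets.Set class. The variable names below are made up for the example and are not taken from the YANK code.

```python
# Before (pre-2.4 style; the sets module no longer exists in Python 3):
#     from sets import Set
#     ligand_atoms = Set([0, 1, 2])

# After: the built-in set type needs no import.
ligand_atoms = set([0, 1, 2])
restrained_atoms = set([2, 3])

shared = ligand_atoms & restrained_atoms      # intersection
combined = ligand_atoms | restrained_atoms    # union
```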

Slow tests in alchemy.py

Is this speed considered normal?

[reference_system, coordinates] = testsystems.LysozymeImplicit()

[...]

In [55]: %time reference_state = reference_context.getState(getEnergy=True)
CPU times: user 139.48 s, sys: 0.00 s, total: 139.48 s
Wall time: 139.62 s

[...]

In [69]: %time alchemical_state = alchemical_context.getState(getEnergy=True)
CPU times: user 142.45 s, sys: 0.02 s, total: 142.46 s
Wall time: 142.54 s

Add getter and setter decorators for `yank.something` simulation parameters

Right now, many parameters in yank are set via the following scheme:

yank = Yank()
yank.n_iterations = 10
yank.timestep = 1.0
yank.other_thing = other_thing

To prevent insane combinations of these objects, it might be nice for us to use getter and setter decorators for all possible properties.
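
A minimal sketch of what this could look like, assuming validation happens at assignment time. The class below is a hypothetical stand-in, not the real Yank class, and the specific checks and the femtosecond unit are assumptions.

```python
class Yank(object):
    """Hypothetical stand-in showing validated properties (not the real class)."""

    def __init__(self):
        self._n_iterations = 1
        self._timestep = 1.0  # assumed to be in femtoseconds

    @property
    def n_iterations(self):
        return self._n_iterations

    @n_iterations.setter
    def n_iterations(self, value):
        # Reject non-integers and non-positive counts at assignment time.
        if int(value) != value or value < 1:
            raise ValueError("n_iterations must be a positive integer")
        self._n_iterations = int(value)

    @property
    def timestep(self):
        return self._timestep

    @timestep.setter
    def timestep(self, value):
        if value <= 0:
            raise ValueError("timestep must be positive")
        self._timestep = float(value)


yank = Yank()
yank.n_iterations = 10   # same call style as before, now validated
yank.timestep = 1.0
```

Callers keep the plain attribute syntax, so existing scripts would not need to change; only invalid values start raising ValueError.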

Switch to external repositories for Repex and TestSystems (when they are ready)

Also, this means that we shouldn't invest too much time cleaning up those files in the current Yank repo.

PS: I think TestSystems is pretty much ready to integrate, so that should happen first.

How should we feed input to YANK?

We have to make some decisions about how we tell YANK what we want it to do.

To be specific, we need to tell it:

What input to use for the ligand, which might be a mol2 file, SDF file, IUPAC or common name, AMBER prmtop/inpcrd pair, etc.
What input to use for the receptor, which might eventually be a PDB file, a PDB ID, an AMBER prmtop/inpcrd pair, or even another small molecule (as in the host-guest case).
If anything isn't parameterized, we need to tell YANK how to assign parameters.
There are some other things we may need to tell YANK about how to set up systems in explicit solvent or build in missing atoms/residues.
There are some run parameters too, like how many iterations to use, what kind of restraints, etc. Most of this should eventually be fully automated, but there are a few parameters right now.

We have a few options for how to specify this:

Python scripts that use the Yank module. All parameters are coded in Python.
Command-line scheme, perhaps using Robert's commandline tool, so we can say something like
- yank setup to set up a calculation
- yank run to run/resume a calculation
- yank info to get some quick info on progress
- yank analyze to analyze a calculation
Some sort of input parameter file format, like XML or JSON

Thoughts?
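
To make the command-line option concrete, here is a rough sketch of the four subcommands using the standard-library argparse module rather than Robert's tool. The --store option is a hypothetical placeholder for wherever a calculation keeps its files.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="yank")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # fail cleanly if no subcommand is given

    for name, help_text in [
        ("setup", "set up a calculation"),
        ("run", "run or resume a calculation"),
        ("info", "print quick progress information"),
        ("analyze", "analyze a calculation"),
    ]:
        sub = subparsers.add_parser(name, help=help_text)
        # Hypothetical option: directory holding the calculation's files.
        sub.add_argument("--store", default=".", help="calculation directory")
    return parser


# Equivalent to typing: yank run --store output
args = build_parser().parse_args(["run", "--store", "output"])
```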

Fix travis-ci for yank

Currently, it seems like it stalls on cloning mdtraj via github:

https://travis-ci.org/choderalab/yank/builds/20381725#L3206-L3210

Can I just use the pypi package instead, or a conda package?

Avoid use of "from X import *" idiom

We should prefer either

import X
X.method()

or

from X import Y
Y()

Use MDTraj for trajectory alignment / quaternions

So Robert and I have done some work integrating RMSD and alignment features into MDTraj.geometry

We could possibly pull in some of the rotation stuff that's currently in Yank. One advantage is that it could be easier to maintain there...

Set up examples / test cases for SystemBuilder

We want the following test cases to work initially:

T4 lysozyme L99A + p-xylene (to compare to prmtop/inpcrd route)
- T4 lysozyme: PDB file, to be parameterized with app.ForceField with specified forcefield file(s) and implicit/explicit solvent choice
- p-xylene: mol2 file, to be parameterized with gaff2xml
CB[7] host-guest system
- CB[7] host: mol2 file
- guests: list of IUPAC names
ligand design
- RCSB ID with chain identifiers, to be parameterized with app.ForceField with specified forcefield file(s) and implicit/explicit solvent choice
- Chemdraw file containing one or more molecules, to be parameterized with gaff2xml

Use np.linspace / etc. for spacings

IMHO we should avoid typing out 50 alchemical intermediates, e.g.:

+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.95, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.925, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.90, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.85, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.80, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.75, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.70, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.675, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.65, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.60, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.55, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.50, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.40, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.30, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.20, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.10, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.05, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.025, 1.)) #
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
+        alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
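
A sketch of the np.linspace alternative follows. AlchemicalState here is a hypothetical namedtuple stand-in with made-up field names, and the function name is also invented. Note that np.linspace gives uniform spacing, whereas the hand-typed list above is denser in some regions, so a real protocol might concatenate a few linspace segments.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for yank's AlchemicalState; the field names are assumed.
AlchemicalState = namedtuple(
    "AlchemicalState", ["field_a", "field_b", "lambda_lj", "field_d"]
)


def lj_decoupling_states(n_states):
    """Return states whose third parameter runs uniformly from 1.0 down to 0.0."""
    lambdas = np.linspace(1.0, 0.0, n_states)
    return [AlchemicalState(0.0, 0.0, float(lam), 1.0) for lam in lambdas]


alchemical_states = lj_decoupling_states(21)  # spacing of 0.05
```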

Add unit tests to SystemBuilder

All modules should have unit tests for classes and methods.

Use logging for logging

I think we should be able to use logging to control the verbosity level without having to pass around verbose arguments everywhere.

It might be possible to make this work with MPI as well: https://github.com/jrs65/python-mpi-logger
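
For the single-process case, a minimal sketch with the standard logging module might look like this. The logger name "yank.example" and both function names are hypothetical; the MPI-aware handler linked above is not covered here.

```python
import logging

# Each module asks for its own logger instead of receiving a verbose flag.
logger = logging.getLogger("yank.example")  # hypothetical logger name


def run_iteration(iteration):
    logger.debug("starting iteration %d", iteration)   # shown only when verbose
    logger.info("iteration %d complete", iteration)


def configure_logging(verbose=False):
    """Set verbosity once, at program start, for every 'yank.*' logger."""
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")
    logging.getLogger("yank").setLevel(level)


configure_logging(verbose=False)
run_iteration(0)
```

Because child loggers inherit the level of the "yank" logger, one call at startup replaces every verbose argument passed down the call chain.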

Accelerate _show_mixing_statistics()

The oldrepex._show_mixing_statistics() function (also in new repex) is useful, but slows down as the number of iterations increases. We should accelerate this with something like weave, cython, or the like.
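
Before reaching for weave or cython, a vectorized NumPy version may already help, assuming the cost comes from Python loops over iterations and replicas. The sketch below is not the repex implementation: the function name and the (n_iterations, n_replicas) layout of the state-index array are assumptions.

```python
import numpy as np


def mixing_matrix(states, n_states):
    """Estimate the state-to-state transition matrix without Python loops.

    states[k, r] is assumed to hold the state index of replica r at iteration k.
    """
    states = np.asarray(states)
    before = states[:-1].ravel()   # state of each replica at iteration k
    after = states[1:].ravel()     # state of the same replica at iteration k + 1

    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (before, after), 1.0)   # accumulate repeated (i, j) pairs

    counts = 0.5 * (counts + counts.T)        # symmetrize forward/backward counts
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0             # leave unvisited states as zero rows
    return counts / row_sums


T = mixing_matrix([[0, 1], [1, 0], [1, 0]], n_states=2)
```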

Vacuum calculation: How should we handle this?

I've just removed the old pyopenmm pure-Python implementation of System and Force classes that I previously used to determine periodicity or remove Force objects from a System object. Now we have no way to create vacuum versions of molecules in case we want to compute hydration free energies in parallel with binding free energies.

It may be OK to simply leave this out and focus on binding free energies, since hydration free energies are extremely specialized anyway.

Replace weave code in oldrepex.py with something more modern

Is there a modern replacement for weave that is fast but less clunky?

Do people have unmerged commits?

If so, we should file some [WIP] pull requests, so that we can each see what everyone is working on--this will help us avoid dealing with conflict resolution.

Find ways to reduce file sizes

Michael Shirts mentioned that his datasets for T4 lysozyme using old YANK were 44GB total for about 20 ligands. For his work on larger sets of proteins, he is generating ~4TB of data.

We should explore ideas for cutting down file sizes. Some obvious ones:

Enabling NetCDF compression by default
Saving checkpoint data less frequently, but energy data more frequently

YANK license?

What license do we want to use for YANK?

Currently, everything is GPL, but we may want to use a more permissive license, like LGPL.

It may be easiest to use the same license as OpenMM, unless there are issues with the libraries we use.

ImportError after install using setup.py

After installing yank with setup.py, I get this error:
>>> import yank
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "yank/__init__.py", line 29, in <module>
    import version
ImportError: No module named version

read_openeye_crd

What does this read and can it be deprecated?

Create Yank class for solvation free energies

Lee-Ping would like to be able to compute arbitrary solvation free energies for a molecule in another type of molecule.

Eliminate duplicated functions

We should try to move as much as possible to a "standard" module.

For example,

kyleb@kb-intel:~/src/kyleabeauchamp/yank$ cat yank/*.py|grep analyze_accept
def analyze_acceptance_probabilities(ncfile, cutoff = 0.4):
def analyze_acceptance_probabilities(ncfile, cutoff = 0.4):

I've seen other duplicated functions as well. IMHO, one way to deal with this is to create utils.py and dump any "utility" functions there--until we find a better place to put them.

Run pyflakes and pep8 (syntax checking and style checking)

I'm finding a lot of hidden syntax errors via pyflakes...

ModifiedHamiltonianExchange

So should ModifiedHamiltonianExchange be replaced by a "regular" HamiltonianExchange object with a particular choice of MCMC moveset? I'm trying to wrap my head around where each component belongs.

Team assignments for general code cleanup

I've put assignments for general code cleanup here:
https://github.com/choderalab/yank/wiki/YANK-Roadmap

Note that I've just made some updates to eliminate unused files, so update your repositories.

Update analyze.py to be able to extract just hydration free energies

IndexError complex_coordinates

When running yank.py example "p-xylene," an IndexError is thrown when yank.py attempts to access self.complex_coordinates[0].

It seems to be related to the following comment at the top of the code:

Handle complex_coordinates argument in Yank more intelligently if different kinds of input are provided.
Currently crashes if a Quantity is provided rather than a list of coordinate sets.

I'll play around with it.

Use a library to print tables?

In several places in Yank, we have code that formats various tables (e.g. TProb).

If we don't mind a Pandas dependency, we could just do this:

T = pd.DataFrame(T)
T.to_string(float_format=formatter_lambda_function)

I already took the liberty of trying this in my Repex refactor.

If we don't do this, at the very least we should write one function that formats tables and try to re-use it as much as possible.
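
If the Pandas dependency is not wanted, the single shared function could be as small as the sketch below. The name format_table and its parameters are hypothetical, not existing Yank code.

```python
def format_table(matrix, fmt="%6.3f", row_labels=None):
    """Format a 2D sequence of numbers as aligned text, one row per line."""
    lines = []
    for i, row in enumerate(matrix):
        label = "" if row_labels is None else "%-8s" % row_labels[i]
        lines.append(label + " ".join(fmt % value for value in row))
    return "\n".join(lines)


T = [[0.9, 0.1], [0.25, 0.75]]
table = format_table(T)
labeled = format_table(T, row_labels=["state 0", "state 1"])
```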

Add support for atom-by-atom alchemical intermediate definition

We can easily add an alternative alchemical intermediate generator based on this scheme:
http://dx.doi.org/10.1002/jcc.21829

Implement automated hydration free energy calculations / test with FreeSolv dataset

Might be nice if we could do fast automated hydration free energies via OpenMM / Yank

Standardize SystemBuilder interface for getting positions

It looks like SystemBuilder does not have a standard interface for retrieving OpenMM-style Quantity-wrapped positions.

We should just have the thing.positions @property return standard Quantity-wrapped positions.

Change license to LGPL

I'd like to change the license of yank to be LGPL as well. Any objections?

Split out testsystems into new project

@kyleabeauchamp : You want the following files:

yank/testsystems.py
yank/data/ - everything in this directory

Allow analysis function to use multiple NetCDF files (with their systems) for MBAR reweighting

Standardize docstrings (Numpy / sphinx / readthedocs) and tests (nose / travis)

I personally think it's worth the day of work that it will take.

IMHO the easiest thing is to use MDTraj as a guide (e.g. copy any necessary template files).

Eliminate mdtraj-specific public API for SystemBuilder

All the mdtraj-based stuff should be private (internal) only. We may support mdtraj-based stuff in the future when we convert to repex, but not yet.

Migrate complicated doctests to nosetests

So we have a lot of doctests that are pretty complex, with ~20 lines of code.

It might be nice for us to reserve doctests for tests that are primarily illustrative, and to put more complex tests in a separate set of nosetests.

Mol2SystemBuilder does not pass kwargs to antechamber

build_forcefield needs to pass keyword arguments to antechamber.

For instance, I can't parametrize a molecule with a net-charge until antechamber receives charge as input.
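
One possible shape for the fix is sketched below: the builder forwards its keyword arguments to a helper that assembles the antechamber command line. Both function names and their signatures are hypothetical; only the antechamber flags (-i, -fi, -o, -fo, -c, -nc) are taken from the tool itself. The sketch builds the argument list without running anything.

```python
def antechamber_command(input_mol2, output_mol2, charge_method="bcc", net_charge=None):
    """Assemble an antechamber argument list (hypothetical helper)."""
    cmd = [
        "antechamber",
        "-i", input_mol2, "-fi", "mol2",
        "-o", output_mol2, "-fo", "mol2",
        "-c", charge_method,
    ]
    if net_charge is not None:
        cmd += ["-nc", str(net_charge)]   # net molecular charge
    return cmd


def build_forcefield_sketch(input_mol2, output_mol2, **antechamber_kwargs):
    """Stand-in for build_forcefield that forwards keyword arguments unchanged."""
    return antechamber_command(input_mol2, output_mol2, **antechamber_kwargs)


cmd = build_forcefield_sketch("ligand.mol2", "ligand.gaff.mol2", net_charge=-1)
```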

Check out bokeh for autogenerated reports

http://bokeh.pydata.org/

Reading PDB files

Is there a reason that we manually parse PDB files instead of letting app.PDBFile.getPositions(asNumpy=True) do all the work for us?

Have analyze module cache MBAR solution

SystemBuilder systems explode in alchemically-modified states

So, when the SystemBuilder-made complex system is used to construct a yank object, the alchemical intermediate systems explode. Here is the code that I'm currently using to construct the exploding systems:

    import os
    import simtk.unit as unit
    import simtk.openmm as openmm
    import numpy as np
    import alchemy
    import simtk.openmm.app as app
    #os.environ['AMBERHOME']='/Users/grinawap/anaconda/pkgs/ambermini-14-py27_0'
    os.chdir('../examples/p-xylene')
    ligand = Mol2SystemBuilder('ligand.tripos.mol2', 'ligand')
    receptor = BiomoleculePDBSystemBuilder('receptor.pdb','protein')
    complex_system = ComplexSystemBuilder(ligand, receptor, "complex")
    complex_positions = complex_system.positions
    receptor_positions = receptor.positions
    print type(complex_system.coordinates_as_quantity)
    timestep = 1.0 * unit.femtoseconds # timestep
    temperature = 300.0 * unit.kelvin # simulation temperature
    collision_rate = 20.0 / unit.picoseconds # Langevin collision rate
    minimization_tolerance = 10.0 * unit.kilojoules_per_mole / unit.nanometer
    minimization_steps = 20
    plat = "CUDA"
    i=2
    platform = openmm.Platform.getPlatformByName(plat)
    forcefield = app.ForceField
    systembuilders = [ligand, receptor, complex_system]
    receptor_atoms = range(0,receptor.traj.top.n_atoms)
    ligand_atoms = range(receptor.traj.top.n_atoms,complex_system.traj.top.n_atoms)
    factory = alchemy.AbsoluteAlchemicalFactory(systembuilders[i].system, ligand_atoms=ligand_atoms)
    protocol = factory.defaultComplexProtocolImplicit()
    systems = factory.createPerturbedSystems(protocol)

    #test an alchemical intermediate and

    for p in range(1,len(systems)):
        print "now simulating " + str(p)
        if p==5:
            continue #for some reason 5 is poorly behaved
        integrator_partialinteracting = openmm.LangevinIntegrator(temperature, collision_rate, timestep)
        context = openmm.Context(systems[p], integrator_partialinteracting, platform)
        context.setPositions(systembuilders[i].openmm_positions)
        openmm.LocalEnergyMinimizer.minimize(context, minimization_tolerance, minimization_steps)
        outfile = open('out_test'+str(p)+'.pdb','w')
        app.PDBFile.writeHeader(systembuilders[i].traj.top.to_openmm(), outfile)
        for k in range(10):
            integrator_partialinteracting.step(100)
            state = context.getState(getEnergy=True, getPositions=True)
            app.PDBFile.writeModel(systembuilders[i].traj.top.to_openmm(), state.getPositions(), outfile,0)
        app.PDBFile.writeModel(systembuilders[i].traj.top.to_openmm(), state.getPositions(), outfile,0)
        app.PDBFile.writeFooter(systembuilders[i].traj.top.to_openmm(), outfile)
        outfile.close()

Eliminate Trajectory Analysis code

So analyze.py has lots of trajectory analysis code that duplicates things I've already written in MDTraj. We can replace a lot of this redundant code with only a few lines of MDTraj code.

There is one remaining to-do item on the Repex side: a member function that slices the netCDF database and outputs MDTraj trajectories. (See, e.g. https://github.com/choderalab/repex/issues/49).

I think cleaning up this issue will lead to massive readability improvements and could make the Yank-Repex-MDTraj pipeline the go-to tool for analysis on repex datasets.

Ensure that doctests run in interactive mode.

Sometimes our doctests contain code that only works during a doctest, e.g.:

        >>> # Create a reference system.
        >>> import testsystems
        >>> [reference_system, coordinates] = testsystems.AlanineDipeptideImplicit()
        >>> # Create a factory.
        >>> factory = AbsoluteAlchemicalFactory(reference_system, ligand_atoms=[0, 1, 2])
        >>> factory._is_restraint([0,1,2])
        False
        >>> factory._is_restraint([1,2,3])
        True
        >>> factory._is_restraint([3,4])
        False
        >>> factory._is_restraint([2,3,4,5])
        True

Because we're calling AbsoluteAlchemicalFactory in a local namespace, this code will not run without first running from alchemy import *.

We should therefore explicitly import alchemy and use the module name in the docstring.

The key advantage of this is that it allows users to copy-paste docstrings into their Python session for learning purposes.
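
The difference can be checked mechanically by running a docstring in an empty namespace, which is roughly what a fresh interactive session provides. The sketch below uses the standard-library fractions module as a stand-in for alchemy so that it runs anywhere; the helper name count_failures is made up.

```python
import doctest

# Imports the module explicitly, so it works when pasted into a new session.
EXPLICIT = """
>>> import fractions
>>> fractions.Fraction(1, 2) + fractions.Fraction(1, 3)
Fraction(5, 6)
"""

# Relies on a name that only exists inside the defining module's namespace.
IMPLICIT = """
>>> Fraction(1, 2)
Fraction(1, 2)
"""


def count_failures(docstring):
    """Run a docstring's examples with empty globals and count the failures."""
    test = doctest.DocTestParser().get_doctest(
        docstring, globs={}, name="sketch", filename="<sketch>", lineno=0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test, out=lambda text: None)  # discard failure reports
    return results.failed


explicit_failures = count_failures(EXPLICIT)
implicit_failures = count_failures(IMPLICIT)
```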
