
yank's Introduction


YANK

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations

Documentation

Documentation, tutorials, and best practices can be found at getyank.org.

Getting Started

See the Quickstart instructions.

Examples

Examples are available in the yank-examples repository:

git clone https://github.com/choderalab/yank-examples.git

Maintainers

Contributors

  • Kim Branson (Stanford)
  • Kyle A. Beauchamp (MSKCC)
  • Peter M. Eastman (Stanford)
  • Mark Friedrichs (Stanford)
  • Imran Haque (Stanford)
  • Patrick B. Grinaway (MSKCC)
  • Christoph Klein (University of Virginia)
  • Rosa Luirink (VU Amsterdam)
  • Daniel L. Parton (MSKCC)
  • Randy Radmer (Stanford)
  • Arien Sebastian Rustenburg (MSKCC)
  • Michael Shirts (University of Colorado Boulder)
  • Kai Wang (University of Virginia)

yank's People

Contributors

andrrizzi, bas-rustenburg, dprada, hannahbrucemacdonald, jaimergp, jchodera, jeffcomer, jugmac00, kyleabeauchamp, lnaden, mikemhenry, pgrinaway, smsaladi, steven-albanese


yank's Issues

Deprecate `from sets import Set`

Since Python 2.4, set has been a built-in type.

I'm pretty sure we can deprecate support for Python 2.4--it's been ~2 years since I last used 2.6...
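For reference, the change is mechanical. A minimal sketch, with illustrative variable names that are not taken from the YANK source:

```python
# Before (Python 2 only; the sets module no longer exists in Python 3):
#     from sets import Set
#     ligand_atoms = Set([0, 1, 2])

# After: use the built-in type, available since Python 2.4.
ligand_atoms = set([0, 1, 2])
receptor_atoms = set([2, 3, 4])

# The built-in type supports the usual set algebra.
shared = ligand_atoms & receptor_atoms
combined = ligand_atoms | receptor_atoms
print(sorted(shared), sorted(combined))
```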

Slow tests in alchemy.py

Is this speed considered normal?

[reference_system, coordinates] = testsystems.LysozymeImplicit()

[...]

In [55]: %time reference_state = reference_context.getState(getEnergy=True)
CPU times: user 139.48 s, sys: 0.00 s, total: 139.48 s
Wall time: 139.62 s

[...]

In [69]: %time alchemical_state = alchemical_context.getState(getEnergy=True)
CPU times: user 142.45 s, sys: 0.02 s, total: 142.46 s
Wall time: 142.54 s

How should we feed input to YANK?

We have to make some decisions about how we tell YANK what we want it to do.

To be specific, we need to tell it:

  • What input to use for the ligand, which might be a mol2 file, SDF file, IUPAC or common name, AMBER prmtop/inpcrd pair, etc.
  • What input to use for the receptor, which might eventually be a PDB file, a PDB ID, an AMBER prmtop/inpcrd pair, or even another small molecule (as in the host-guest case).
  • If anything isn't parameterized, we need to tell YANK how to assign parameters.
  • There are some other things we may need to tell YANK about how to set up systems in explicit solvent or build in missing atoms/residues.
  • There are some run parameters too, like how many iterations to use, what kind of restraints, etc. Most of this should eventually be fully automated, but there are a few parameters right now.

We have a few options for how to specify this:

  • Python scripts that use the Yank module. All parameters are coded in Python.
  • Command-line scheme, perhaps using Robert's commandline tool, so we can say something like
    • yank setup to set up a calculation
    • yank run to run/resume a calculation
    • yank info to get some quick info on progress
    • yank analyze to analyze a calculation
  • Some sort of input parameter file format, like XML or JSON

Thoughts?
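To make the command-line option concrete, here is a rough sketch of the four subcommands using the standard-library argparse module (not Robert's tool, which is not shown here). The --store option and its meaning are assumptions made up for illustration.

```python
import argparse


def build_parser():
    """Build a parser exposing the subcommands proposed above."""
    parser = argparse.ArgumentParser(prog="yank")
    subparsers = parser.add_subparsers(dest="command")
    subcommands = [
        ("setup", "set up a calculation"),
        ("run", "run/resume a calculation"),
        ("info", "get some quick info on progress"),
        ("analyze", "analyze a calculation"),
    ]
    for name, help_text in subcommands:
        sub = subparsers.add_parser(name, help=help_text)
        # Hypothetical option: where the calculation data lives.
        sub.add_argument("--store", default=".", help="data directory (hypothetical)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["run", "--store", "output"])
print(args.command, args.store)
```

A parameter file (XML or JSON) could still be layered on top of this, e.g. by having yank setup read it.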

Fix travis-ci for yank

Currently, it seems to stall while cloning mdtraj from GitHub:

https://travis-ci.org/choderalab/yank/builds/20381725#L3206-L3210

Can I just use the pypi package instead, or a conda package?

Use MDTraj for trajectory alignment / quaternions

So Robert and I have done some work integrating RMSD and alignment features into MDTraj.geometry

We could possibly pull in some of the rotation stuff that's currently in Yank. One advantage is that it could be easier to maintain there...

Set up examples / test cases for SystemBuilder

We want the following test cases to work initially:

  • T4 lysozyme L99A + p-xylene (to compare to prmtop/inpcrd route)
    • T4 lysozyme: PDB file, to be parameterized with app.ForceField with specified forcefield file(s) and implicit/explicit solvent choice
    • p-xylene: mol2 file, to be parameterized with gaff2xml
  • CB[7] host-guest system
    • CB[7] host: mol2 file
    • guests: list of IUPAC names
  • ligand design
    • RCSB ID with chain identifiers, to be parameterized with app.ForceField with specified forcefield file(s) and implicit/explicit solvent choice
    • Chemdraw file containing one or more molecules, to be parameterized with gaff2xml

Use np.linspace / etc. for spacings

IMHO we should avoid typing out 50 alchemical intermediates, e.g.:

    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.95, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.925, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.90, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.85, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.80, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.75, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.70, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.675, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.65, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.60, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.55, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.50, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.40, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.30, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.20, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.10, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.05, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.025, 1.))
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
    alchemical_states.append(AlchemicalState(0.00, 0.00, 0.00, 1.)) # discharged, LJ annihilated
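Something along these lines would generate the schedule instead. This is only a sketch: the namedtuple below is a hypothetical stand-in for the real AlchemicalState class, its field names are made up, and the third slot is assumed to be the Lennard-Jones scaling based on the "LJ annihilated" comment. It also uses a uniform grid, whereas the hand-typed schedule is non-uniform.

```python
import collections

import numpy as np

# Hypothetical stand-in for AlchemicalState; field names are assumptions.
AlchemicalState = collections.namedtuple(
    "AlchemicalState", ["first", "second", "lennard_jones", "fourth"]
)

# 21 evenly spaced values from 1.0 down to 0.0 (spacing 0.05).
lj_values = np.linspace(1.0, 0.0, 21)

# Only the third slot varies, as in the hand-written list.
alchemical_states = [
    AlchemicalState(0.0, 0.0, float(lj), 1.0) for lj in lj_values
]
print(len(alchemical_states), alchemical_states[0], alchemical_states[-1])
```

A non-uniform schedule could be built the same way by concatenating several np.linspace segments with different spacings.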

Accelerate _show_mixing_statistics()

The oldrepex._show_mixing_statistics() function (also in new repex) is useful, but slows down as the number of iterations increases. We should accelerate this with something like weave, cython, or the like.
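Before reaching for weave or cython, plain numpy vectorization may already be enough. The sketch below assumes the expensive part is counting state-to-state transitions over all iterations; the function name and the input layout (one row of state indices per iteration) are assumptions, not the existing repex data structures.

```python
import numpy as np


def transition_matrix(states, n_states):
    """Estimate transition probabilities between states.

    states: integer array of shape (n_iterations, n_replicas) holding the
    state index occupied by each replica at each iteration.
    """
    states = np.asarray(states)
    origin = states[:-1].ravel()       # state at iteration t
    destination = states[1:].ravel()   # state at iteration t + 1
    counts = np.zeros((n_states, n_states))
    # Accumulate all transitions at once; add.at handles repeated index pairs.
    np.add.at(counts, (origin, destination), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row, leaving rows with no observed transitions at zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


example_states = [[0, 1], [1, 0], [1, 0]]
T = transition_matrix(example_states, n_states=2)
print(T)
```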

Vacuum calculation: How should we handle this?

I've just removed the old pyopenmm pure-Python implementation of System and Force classes that I previously used to determine periodicity or remove Force objects from a System object. Now we have no way to create vacuum versions of molecules in case we want to compute hydration free energies in parallel with binding free energies.

It may be OK to simply leave this out and focus on binding free energies, since hydration free energies are extremely specialized anyway.

Do people have unmerged commits?

If so, we should file some [WIP] pull requests, so that we can each see what everyone is working on--this will help us avoid dealing with conflict resolution.

Find ways to reduce file sizes

Michael Shirts mentioned that his datasets for T4 lysozyme using old YANK were 44GB total for about 20 ligands. For his work on larger sets of proteins, he is generating ~4TB of data.

We should explore ideas for cutting down file sizes. Some obvious ones:

  • Enabling NetCDF compression by default
  • Saving checkpoint data less frequently, but energy data more frequently

YANK license?

What license do we want to use for YANK?

Currently, everything is GPL, but we may want to use a more permissive license, like LGPL.

It may be easiest to use the same license as OpenMM, unless there are issues with the libraries we use.

ImportError after install using setup.py

After installing yank with setup.py, I get this error:
>>> import yank
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "yank/__init__.py", line 29, in <module>
    import version
ImportError: No module named version

Eliminate duplicated functions

We should try to move as much as possible into "standard" modules.

For example,

kyleb@kb-intel:~/src/kyleabeauchamp/yank$ cat yank/*.py|grep analyze_accept
def analyze_acceptance_probabilities(ncfile, cutoff = 0.4):
def analyze_acceptance_probabilities(ncfile, cutoff = 0.4):

I've seen other duplicated functions as well. IMHO, one way to deal with this is to create utils.py and dump any "utility" functions there--until we find a better place to put them.
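A slightly more systematic version of the grep above, as a sketch using the standard-library ast module. The file names and contents below are hypothetical placeholders; in practice the sources would be read from yank/*.py.

```python
import ast
import collections


def find_duplicate_functions(sources):
    """Map each function name defined more than once to the files defining it.

    sources: dict mapping a file name to its source text.
    """
    locations = collections.defaultdict(list)
    for filename, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.FunctionDef):
                locations[node.name].append(filename)
    return {name: files for name, files in locations.items() if len(files) > 1}


# Hypothetical file contents, for illustration only.
sources = {
    "analyze.py": "def analyze_acceptance_probabilities(ncfile, cutoff=0.4):\n    pass\n",
    "oldrepex.py": (
        "def analyze_acceptance_probabilities(ncfile, cutoff=0.4):\n    pass\n\n"
        "def helper():\n    pass\n"
    ),
}
duplicates = find_duplicate_functions(sources)
print(duplicates)
```

Note that this also reports methods that share a name across classes (e.g. __init__), so the output needs a quick manual pass.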

ModifiedHamiltonianExchange

So should ModifiedHamiltonianExchange be replaced by a "regular" HamiltonianExchange object with a particular choice of MCMC moveset? I'm trying to wrap my head around where each component belongs.

IndexError complex_coordinates

When running yank.py example "p-xylene," an IndexError is thrown when yank.py attempts to access self.complex_coordinates[0].

It seems to be related to the following comment at the top of the code:

  • Handle complex_coordinates argument in Yank more intelligently if different kinds of input are provided.
    Currently crashes if a Quantity is provided rather than a list of coordinate sets.

I'll play around with it.
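One possible direction, sketched with plain numpy arrays standing in for unit-bearing Quantity objects (unit handling is left out entirely, and the helper name is hypothetical): normalize whatever is passed in to a list of coordinate sets before indexing it.

```python
import numpy as np


def as_coordinate_sets(coordinates):
    """Return a list of (n_atoms, 3) arrays.

    Accepts either a single (n_atoms, 3) coordinate set or a sequence of
    such sets. Hypothetical helper; units are ignored in this sketch.
    """
    array = np.asarray(coordinates, dtype=float)
    if array.ndim == 2:
        # A single coordinate set was provided; wrap it in a list.
        return [array]
    if array.ndim == 3:
        # Several coordinate sets were provided; split along the first axis.
        return [frame for frame in array]
    raise ValueError("expected shape (n_atoms, 3) or (n_sets, n_atoms, 3)")


single = as_coordinate_sets(np.zeros((5, 3)))
several = as_coordinate_sets([np.zeros((5, 3)), np.ones((5, 3))])
print(len(single), len(several))
```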

Use a library to print tables?

In several places in Yank, we have code that formats various tables (e.g. TProb).

If we don't mind a Pandas dependency, we could just do this:

import pandas as pd
T = pd.DataFrame(T)
T.to_string(float_format=formatter_lambda_function)  # float_format takes a one-argument callable

I already took the liberty of trying this in my Repex refactor.

If we don't do this, at the very least we should write one function that formats tables and try to re-use it as much as possible.
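For the no-Pandas route, a single shared helper could look roughly like this. The function name, signature, and default format are assumptions, not existing Yank code.

```python
def format_table(matrix, float_format="%6.3f", title=None):
    """Format a 2D sequence of floats as a plain-text table."""
    lines = []
    if title is not None:
        lines.append(title)
    for row in matrix:
        # Format each entry with the same fixed-width specifier.
        lines.append(" ".join(float_format % value for value in row))
    return "\n".join(lines)


table = format_table([[0.75, 0.25], [0.25, 0.75]], title="Tij")
print(table)
```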

Migrate complicated doctests to nosetests

So we have a lot of doctests that are pretty complex, with ~20 lines of code.

It might be nice for us to reserve doctests for tests that are primarily illustrative--and put more complex tests in a separate set of nosetests.

Reading PDB files

Is there a reason that we manually parse PDB files instead of letting app.PDBFile.getPositions(asNumpy=True) do all the work for us?

SystemBuilder systems explode in alchemically-modified states

So, when the SystemBuilder-made complex system is used to construct a yank object, the alchemical intermediate systems explode. Here is the code that I'm currently using to construct the exploding systems:

    import os
    import simtk.unit as unit
    import simtk.openmm as openmm
    import numpy as np
    import alchemy
    import simtk.openmm.app as app
    #os.environ['AMBERHOME']='/Users/grinawap/anaconda/pkgs/ambermini-14-py27_0'
    os.chdir('../examples/p-xylene')
    ligand = Mol2SystemBuilder('ligand.tripos.mol2', 'ligand')
    receptor = BiomoleculePDBSystemBuilder('receptor.pdb','protein')
    complex_system = ComplexSystemBuilder(ligand, receptor, "complex")
    complex_positions = complex_system.positions
    receptor_positions = receptor.positions
    print type(complex_system.coordinates_as_quantity)
    timestep = 1.0 * unit.femtoseconds # timestep
    temperature = 300.0 * unit.kelvin # simulation temperature
    collision_rate = 20.0 / unit.picoseconds # Langevin collision rate
    minimization_tolerance = 10.0 * unit.kilojoules_per_mole / unit.nanometer
    minimization_steps = 20
    plat = "CUDA"
    i=2
    platform = openmm.Platform.getPlatformByName(plat)
    forcefield = app.ForceField
    systembuilders = [ligand, receptor, complex_system]
    receptor_atoms = range(0,receptor.traj.top.n_atoms)
    ligand_atoms = range(receptor.traj.top.n_atoms,complex_system.traj.top.n_atoms)
    factory = alchemy.AbsoluteAlchemicalFactory(systembuilders[i].system, ligand_atoms=ligand_atoms)
    protocol = factory.defaultComplexProtocolImplicit()
    systems = factory.createPerturbedSystems(protocol)

    #test an alchemical intermediate and

    for p in range(1,len(systems)):
        print "now simulating " + str(p)
        if p==5:
            continue #for some reason 5 is poorly behaved
        integrator_partialinteracting = openmm.LangevinIntegrator(temperature, collision_rate, timestep)
        context = openmm.Context(systems[p], integrator_partialinteracting, platform)
        context.setPositions(systembuilders[i].openmm_positions)
        openmm.LocalEnergyMinimizer.minimize(context, minimization_tolerance, minimization_steps)
        outfile = open('out_test'+str(p)+'.pdb','w')
        app.PDBFile.writeHeader(systembuilders[i].traj.top.to_openmm(), outfile)
        for k in range(10):
            integrator_partialinteracting.step(100)
            state = context.getState(getEnergy=True, getPositions=True)
            app.PDBFile.writeModel(systembuilders[i].traj.top.to_openmm(), state.getPositions(), outfile,0)
        app.PDBFile.writeModel(systembuilders[i].traj.top.to_openmm(), state.getPositions(), outfile,0)
        app.PDBFile.writeFooter(systembuilders[i].traj.top.to_openmm(), outfile)
        outfile.close()

Eliminate Trajectory Analysis code

So analyze.py has lots of trajectory analysis code that duplicates things I've already written in MDTraj. We can replace a lot of this redundant code with only a few lines of MDTraj code.

There is one remaining to-do item on the Repex side: a member function that slices the netCDF database and outputs MDTraj trajectories. (See, e.g. https://github.com/choderalab/repex/issues/49).

I think cleaning up this issue will lead to massive readability improvements and could make the Yank-Repex-MDTraj pipeline the go-to tool for analysis on repex datasets.

Ensure that doctests run in interactive mode.

Sometimes our doctests contain code that only works during a doctest, e.g.:

        >>> # Create a reference system.
        >>> import testsystems
        >>> [reference_system, coordinates] = testsystems.AlanineDipeptideImplicit()
        >>> # Create a factory.
        >>> factory = AbsoluteAlchemicalFactory(reference_system, ligand_atoms=[0, 1, 2])
        >>> factory._is_restraint([0,1,2])
        False
        >>> factory._is_restraint([1,2,3])
        True
        >>> factory._is_restraint([3,4])
        False
        >>> factory._is_restraint([2,3,4,5])
        True

Because we're calling AbsoluteAlchemicalFactory in a local namespace, this code will not run without first running from alchemy import *.

We should therefore explicitly import alchemy and use the module name in the docstring.

The key advantage of this is that it allows users to copy and paste docstrings into their Python session for learning purposes.
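One way to check for this is to run each docstring in an empty namespace, which mimics a fresh interactive session. The sketch below uses the standard-library doctest module, with the fractions module standing in for alchemy so that it runs anywhere; the helper name is made up.

```python
import doctest

# Works in a fresh session: the module is imported and named explicitly.
SELF_CONTAINED = """
>>> import fractions
>>> fractions.Fraction(1, 2) + fractions.Fraction(1, 3)
Fraction(5, 6)
"""

# Only works if the surrounding module already imported Fraction.
NEEDS_LOCAL_NAMESPACE = """
>>> Fraction(1, 2)
Fraction(1, 2)
"""


def count_failures(docstring):
    """Run a docstring's examples with empty globals and count failures."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report text; only the counts are needed here.
    results = runner.run(test, out=lambda text: None)
    return results.failed


print(count_failures(SELF_CONTAINED), count_failures(NEEDS_LOCAL_NAMESPACE))
```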

import numpy as np

IMHO, this is the "standard" way of using numpy imports.

Also, it might be good to use numpy throughout, whenever possible, rather than switching between math and numpy.

Implement SystemBuilder tests

Would be useful to add a yank/tests/test_systembuilder.py set of nosetests that just make sure SystemBuilder constructs valid systems from various combinations of inputs.
