
alchemlyb's Introduction

alchemlyb: the simple alchemistry library


alchemlyb makes alchemical free energy calculations easier to do by leveraging the full power and flexibility of the PyData stack. It includes:

  1. Parsers for extracting raw data from output files of common molecular dynamics engines such as GROMACS, AMBER, NAMD, and other simulation codes.
  2. Subsamplers for obtaining uncorrelated samples from timeseries data (including extracting independent, equilibrated samples [Chodera2016] as implemented in the pymbar package).
  3. Estimators for obtaining free energies directly from this data, using best-practices approaches for the multistate Bennett acceptance ratio (MBAR) [Shirts2008] and BAR (both from pymbar), and for thermodynamic integration (TI).
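These pieces are designed to be chained together. A minimal sketch of the workflow (file names and the temperature are hypothetical):

import pandas as pd

from alchemlyb.parsing.gmx import extract_dHdl
from alchemlyb.estimators import TI

# parse dH/dl from each lambda window, stack, and fit a TI estimator
dHdl = pd.concat([extract_dHdl(xvg, T=300)
                  for xvg in ("lambda_00.xvg", "lambda_01.xvg")])
ti = TI().fit(dHdl)
print(ti.delta_f_)  # free energy differences in units of kT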

Installation

Install via pip from PyPI (alchemlyb):

pip install alchemlyb

or as a conda package from the conda-forge (alchemlyb) channel:

conda install -c conda-forge alchemlyb 

Update with pip:

pip install --upgrade alchemlyb

or with conda run:

conda update -c conda-forge alchemlyb

to get the latest released version.

Getting involved

Contributions of all kinds are very welcome.

If you have questions or want to discuss alchemlyb please post in the alchemlyb Discussions.

If you have bug reports or feature requests then please get in touch with us through the Issue Tracker.

We also welcome code contributions: have a look at our Developer Guide. Open an issue with the proposed fix or change in the Issue Tracker and submit a pull request against the alchemistry/alchemlyb GitHub repository.

References

Chodera2016

Chodera, J.D. (2016). A Simple Method for Automated Equilibration Detection in Molecular Simulations. Journal of Chemical Theory and Computation 12, 1799–1805.

Shirts2008

Shirts, M.R., and Chodera, J.D. (2008). Statistically optimal analysis of samples from multiple equilibrium states. The Journal of Chemical Physics 129, 124105.

alchemlyb's People

Contributors

dotsdl, drdomenicomarson, harlor, helmutcarter, hl2500, ialibay, ianmkenney, jhenin, lee212, msoroush, orbeckst, ptmerz, schlaicha, shuail, trje3733, ttjoseph, vtlim, wehs7661, xiki-tempula


alchemlyb's Issues

Add `openany` context manager for decompression of bzip2, gzip, zip, etc.

We want parsers to be able to support parsing files that are compressed using common compression utilities, such as bzip2, gzip, zip, rar, etc. This is easy enough to support by building a context manager such as openany like that found in MDAnalysis. We cannot directly lift the code from MDAnalysis due to incompatible licensing, but we can implement our own simplified version of this mechanism for use by all parsers.
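A minimal sketch of such a helper, assuming only bzip2 and gzip support for now (zip and rar would need additional handling):

import bz2
import gzip
import os

def openany(filename, mode='r'):
    """Open plain, bzip2-, or gzip-compressed files transparently."""
    openers = {'.bz2': bz2.open, '.gz': gzip.open}
    ext = os.path.splitext(filename)[1]
    return openers.get(ext, open)(filename, mode)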

AMBER parsers

In line with the overall API proposal, we want to have parsers for each of the major MD engines, and eventually have coverage for all of those in use. Since there are essentially two types of estimators (TI and FEP), each package needs a parser for:

  1. Extracting reduced potentials u_nk from output files (for FEP). (EDIT: This objective is postponed, emphasis right now on TI. Open issue for u_nk when needed. — @orbeckst 2017-11-03)
  2. Extracting derivatives DHdl from output files (for TI).

This issue is the nexus for discussion for such parsers for the AMBER package. If you have existing parsing code for this package, comment below and we can begin adapting it into the parsers outlined above in a PR.

Gromacs XVG parser - Calculation of the reduced potential goes badly wrong

Assuming that the values of U and pV are found in the second and last columns can go badly wrong if they are not given:

# not entirely sure if we need to get potentials relative to
# the state actually sampled, but perhaps needed to stack
# samples from all states?
U = df[df.columns[1]]

# gromacs also gives us pV directly; need this for reduced potential
pV = df[df.columns[-1]]

u_k = dict()
cols = list()
for col in dH:
    u_col = eval(col.split('to')[1])
    u_k[u_col] = beta * (dH[col].values + U.values + pV.values)

What do you think: should we print a warning when these columns are not given and continue with what is available, or directly throw an exception?

By the way: is it really necessary to add pV if the simulations are in NVT? And why do we add U?
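As a sketch of the warn-and-continue option (the column check is an assumption about how the parsed dataframe is labeled):

import warnings

# df is the parsed XVG dataframe from above
if df.shape[1] < 2:
    raise ValueError("XVG file does not contain a total energy (U) column")
if 'pV' not in str(df.columns[-1]):
    warnings.warn("no pV column found; assuming NVT and setting pV = 0")
    pV = 0.0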

Amber FEP parser should return reduced potentials u_nk

The current version of the Amber FEP parser returns raw energies (as used to be the case in alchemical-analysis). alchemlyb prescribes that the u_nk standard form of the data be in reduced units, i.e. all energies are given in units of kT. Specifically, this means that:

  • amber.extract_u_nk() needs additional temperature T argument
  • make raw data into reduced "u_nk" potentials (a sketch follows this list)
  • adjust tests
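A sketch of the intended conversion, using the proposed T argument (energies stands for the parser's raw output, in kcal/mol to match Amber's energy units):

k_b = 1.9872041e-3          # Boltzmann constant in kcal/(mol K)
beta = 1.0 / (k_b * T)      # T: the proposed temperature argument
u_nk = beta * energies      # reduced potentials in units of kT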

IEXP estimator

Create an IEXP free energy estimator that:

  1. takes u_nk as an argument
  2. computes the free energy differences between each state along with their uncertainties

use logging

For debug and info output we do not want to use print; instead, we should use the standard logging facility.
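For example (a sketch; the logger name and the message variables are placeholders):

import logging

logger = logging.getLogger("alchemlyb")
logger.info("parsing %s", filename)                 # instead of print(...)
logger.debug("found %d lambda windows", n_windows)  # filename, n_windows: placeholders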

GDEL estimator

Create a GDEL estimator that:

  1. takes u_nk as an argument
  2. computes the free energy differences between each state along with their uncertainties

Nonequilibrium calculations/BAR estimator?

Has overlap with #28, but we're working on some nonequilibrium free energy calculations (in GROMACS), which will require a BAR estimator. OK with us adding one? I would have to think some about whether a single BAR estimator can easily handle both equilibrium and nonequilibrium calculations, but it does within pymbar, so it seems like it should be straightforward enough. @mrshirts may have input.

This is with @hannahbaumann. It would probably also be good to have input from @orbeckst, @harlor, and @dotsdl.

AMBER FEP parser

In line with the overall API proposal, we want to have parsers for each of the major MD engines, and eventually have coverage for all of those in use. For Amber we already have a TI parser (#10); we still need an "MBAR/FEP" parser:

  • Extracting reduced potentials u_nk from output files (for FEP).

This issue is the nexus for discussion for such parsers for the AMBER package. If you have existing parsing code for this package, comment below and we can begin adapting it into the parsers outlined above in a PR.

Dask graphs for data processing

In my use of MBAR, MBAR itself isn't terribly expensive compared to preprocessing the reduced potentials to maximize the number of independent frames N_eff using @jchodera's automated equilibrium detection. When pulling reduced potential data from many simulations, this preprocessing can be done independently and in parallel prior to feeding the results to MBAR.

I used dask to make this happen using a modified version of alchemlyb.mbar.get_DG:

def process_df(sim, name, lower, upper, states):
    import numpy as np
    from pymbar.timeseries import detectEquilibration

    # get data for every `step`
    df = sim.data.retrieve(name)

    df = df.loc[lower:upper]

    # drop any rows that have missing values
    df = df.dropna()

    # subsample according to statistical inefficiency after equilibration detection
    # we do this after slicing by lower/upper to simulate
    # what we'd get with only this data available
    out  = detectEquilibration(df[df.columns[sim.categories['state']]])
    t, statinef, Neff_max = [out[i] for i in range(3)]

    # round the statistical inefficiency to the nearest integer
    statinef = int(np.rint(statinef))

    # drop any remaining rows with missing values
    df = df.dropna()

    #df = df.to_delayed()[0]

    # extract only columns that have the corresponding sim present        
    df = df[df.columns[states]]

    # subsample according to statistical inefficiency and equilibrium detection
    df = df.iloc[t::statinef]

    return df

def get_DG(sims, name, lower, upper):
    """Get DG and DDG from set of simulations. Does automatic subsampling
    for each simulation on the basis of automated equilibrium detection on its
    own reduced potential timeseries.

    Parameters
    ----------
    sims : Bundle
        Bundle of sims to grab data from.
    name : str
        Name of dataset to use.
    lower : float, dict
        Time (ps) to start block from. Could also be a dict 
        giving state number as keys and float as value.
    upper : float
        Time (ps) to end block at. Could also be a dict 
        giving state number as keys and float as value.

    Returns
    -------
    DG : array
        Delta G between each state as calculated by MBAR.
    DDG : array
        Standard deviation of Delta G between each state as calculated by MBAR.

    """
    import numpy as np
    from dask import delayed
    from pymbar import MBAR

    states = sorted(list(set(sims.categories['state'])))

    if isinstance(lower, (float, int)) or lower is None:
        lower = {state: lower for state in states}

    if isinstance(upper, (float, int)) or upper is None:
        upper = {state: upper for state in states}

    dfs = []
    N_k = []

    groups = sims.categories.groupby('state')
    for state in groups:
        dfs_g = []
        for sim in groups[state]:
            print "\r{}".format(sim.name),

            df = delayed(process_df)(sim, 
                                     name, 
                                     lower[state], 
                                     upper[state], 
                                     states)

            dfs_g.append(df)

        df = delayed(np.vstack)(dfs_g)
        dfs.append(df)

        N_k.append(delayed(len)(df))

    u_kn = delayed(np.vstack)(dfs)
    u_kn = u_kn.T

    mbar = delayed(MBAR)(u_kn, N_k)

    outs = mbar.getFreeEnergyDifferences()

    DG, DDG = [outs[i] for i in range(2)]

    return DG, DDG

For 82 simulations, this gives a graph like:

[figure: dask task graph for the 82-simulation MBAR pipeline]

This graph can then be fed to any dask scheduler, and in particular the distributed scheduler has been excellent so far in making quick work of them.
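For example, executing the graph on a distributed cluster could look like this (a sketch; the dataset name is hypothetical):

import dask
from dask.distributed import Client

client = Client()  # local cluster by default; pass a scheduler address otherwise
DG, DDG = get_DG(sims, name="u_nk", lower=0, upper=None)
DG, DDG = dask.compute(DG, DDG)  # run the delayed graph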

The question here is what to do in general: should parts of the library return dask.Delayed objects (graphs) instead of the results themselves, leaving it to the user to feed these to a scheduler of their choice? I think there's much to be gained from a good decision on this front.

OpenMM parsers

In line with the overall API proposal, we want to have parsers for each of the major MD engines, and eventually have coverage for all of those in use. Since there are essentially two types of estimators (TI and FEP), each package needs a parser for:

  1. Extracting reduced potentials u_nk from output files (for FEP).
  2. Extracting derivatives DHdl from output files (for TI).

This issue is the nexus for discussion for such parsers for the OpenMM package. If you have existing parsing code for this package, comment below and we can begin adapting it into the parsers outlined above in a PR.

Amber TI parser should provide dimensionless dH/dl

The current implementation of the Amber TI parser (amber.extract_dHdl) returns a dataframe with native Amber units (as used to be the case in alchemical-analysis) instead of the prescribed reduced dHdl standard form: the potential energy derivatives should be multiplied by beta = 1/(kT).

See for example

dHdl = beta * dHdl

Todo:

  • amber.extract_dHdl() needs additional temperature T argument
  • make data dimensionless (something like dHdl = beta * dHdl with beta = 1/(kB*T))
  • adjust tests

(See also #56 )

remove gitter

We have a gitter channel associated with alchemlyb (see the badge in the README) but we are not using it.

I would remove the badge and close the channel, and for right now:

  • use the issue tracker for user input,
  • the wiki for notes, and
  • email to coordinate between developers/stake holders.

We can revisit if the situation changes, but I don't like opening too many channels, because that just gets exhausting and will invariably disappoint someone who thinks that, in this case, gitter is a standard way to communicate with the project.

Opinions @davidlmobley @mrshirts ?

Port in plotting/analysis functionality from alchemical-analysis and yank

In recent discussions with OpenEye about free energy calculations on their Orion cloud computing platform, it became clear they will benefit from an open analysis library (alchemlyb in particular) because users are running calculations with Yank, GROMACS, and probably soon AMBER on their platform. Thus they are likely prepared to invest some developer time in ensuring alchemlyb has the functionality needed. I was recently discussing this with Christopher Bayly ( @cbayly13 ) and Gaetano Calabro ( @nividic ) there and I said I'd summarize what I thought needed to be done on the issue tracker so that @nividic could begin work in the coming weeks if everyone here is on board with it.

The main things I think we would want added to analysis in the short term are:

  1. Overlap matrix analysis from alchemical-analysis, e.g. as in DOI 10.1007/s10822-015-9840-9
  2. Mixing diagrams as seen in Yank's simulation health reports
  3. Graphs of equilibration/number of effective samples as in Yank's simulation health reports
  4. Numerical analysis of statistical inefficiency/number of effective samples in input datasets
  5. Any other key features of Yank simulation health reports I'm missing? (I don't have a report in front of me at the moment.)

There may also be a need to ensure that at least a threshold minimal number of samples is retained for analysis after decorrelation.

For Orion purposes I would also suggest running multiple analysis methods (BAR, MBAR, TI) whenever the data allows for it (often) and cross-checking results for consistency; inconsistencies are usually a warning of problems.

An additional list of possible changes is on the alchemical analysis features list where we were brainstorming about this.

Tagging also @harlor for thoughts as well as @hannahbaumann and @andrrizzi . And of course @dotsdl and @orbeckst .

@nividic the way to get started would be to basically pull the relevant code from alchemical-analysis and yank (allowable by the licenses) and adapt to (improve/generalize for) this.

need CI

  • set up travis to run tests #2
  • coverage or coveralls

"Advanced" GROMACS parsers needed?

I'm guessing, @dotsdl, that your current GROMACS parsers only handle what I like to think of as "normal" alchemical free energy calculations where one runs simulations at a variety of lambda values (with or without Hamiltonian exchange). Is that correct? Presumably we would want this library to also be able to parse data for more "advanced" types of free energy calculations such as those @mrshirts has worked on, like Wang-Landau sampling and other expanded ensemble techniques which are enabled in GROMACS (e.g. one simulation spanning multiple lambda values).

@mrshirts - is there someone in your group or elsewhere currently working on these techniques in GROMACS who would want to contribute parsers? We made some attempt to parse these in alchemical-analysis, but since my group doesn't use these methods at all, maintaining and testing the code was problematic: things would change without us knowing about it.

Alternatively, if there's no one currently working with these methods in GROMACS, we should put this on hold until someone who is wants to contribute, and just proceed without expanded ensemble for now.

list features to be moved from alchemical-analysis to alchemlyb

We would like to produce a list of alchemical-analysis features that are highly desirable for alchemlyb.

As expressed in the Roadmap, the idea is to take little pieces from alchemical-analysis and re-implement them in alchemlyb together with tests and docs.

@alchemistry/alchemlyb : Please add what you want to see (or what you want to contribute) to the wiki page alchemical analysis features.

Once we have a list of features that we agree to be important then it becomes a bit easier to spend resources on working on them.

UBAR estimator

Create a UBAR estimator that:

  1. takes u_nk as an argument
  2. computes the free energy differences between each state along with their uncertainties

GOMC parser

Hi,

I implemented a GOMC parser for alchemlyb; you can find my implementation on my GitHub.

I have a few questions about alchemlyb and would appreciate your help and suggestions.

  1. In alchemical-analysis, the pV value, if available, was added to u_nk. However, in alchemlyb, the total energy is also added to u_nk. Is there any reason you chose a different approach?

  2. The MBAR documentation mentions that the MBAR method is only applicable to uncorrelated samples from probability distributions. Is there an appropriate series from Monte Carlo results that can be used for equilibrium_detection, statistical_inefficiency, and similar functions?

Remove GromacsWrapper dependency from alchemlyb.parsers.gmx

We currently use the gromacs.fileformats.XVG class for parsing out data from XVG files in the alchemlyb.parsers.gmx module. This works well, but it makes GromacsWrapper a dependency, and GromacsWrapper does not yet work under Python 3. This holds alchemlyb back from full Python 3 support.

I recommend implementing a helper function in alchemlyb.parsers.gmx that parses out the raw tabular information of an XVG file similarly to how the GromacsWrapper XVG class does this in its XVG.to_df() method. We cannot outright lift the code from GromacsWrapper due to its licensing, but we can reimplement the (minimal) functionality we need here.
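A sketch of such a helper (a minimal reimplementation of our own, not GromacsWrapper code):

import pandas as pd

def read_xvg(filename):
    """Parse the numeric table of an XVG file, skipping '#' comments
    and '@' xmgrace directives."""
    with open(filename) as f:
        rows = [line.split() for line in f
                if line.strip() and not line.startswith(('#', '@'))]
    return pd.DataFrame(rows, dtype=float)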

RBAR estimator

Create a RBAR estimator that:

  1. takes u_nk as an argument
  2. computes the free energy differences between each state along with their uncertainties

Switching Travis CI to Xenial (Ubuntu 16.04) gives different statistical inefficiency test results

I recently tried changing our Travis CI configuration to use dist: xenial so that we could add testing against Python 3.7. I've made this into a PR (#65), since it gave errors that will require some discussion to resolve.

From what I can see, we get the following failures:

  1. It looks like the change in underlying Ubuntu version gives different results for statistical inefficiency slicing.
  2. For Python 3.7, the Amber parsers throw exceptions related to their use of generators.

We'll need to resolve these in the PR before merging to master.

parsing .gz using anyopen

The gmx parser (as well as the amber parser) uses anyopen with mode 'r'. This causes .gz files to be opened in binary mode, which leads to a TypeError.
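The likely fix is to request text mode explicitly, since the compression modules default to binary for 'r':

import gzip

# 'r' yields bytes from gzip files; 'rt' decodes to str and avoids the TypeError
with gzip.open("dhdl.xvg.gz", "rt") as f:
    first_line = f.readline()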

JOSS paper

Write a short software paper on alchemlyb/alchemtest for JOSS.

  • Author instructions – create a repo and then submit, ~1000 words, see examples
  • authors?
    • any contributor (name on a commit)
    • must edit and ok paper
    • willing to help with revisions (paper/code/docs)

A JOSS paper does not really pre-empt a full scientific paper (such as a "Best practices in free energy calculations" for LiveCOMS, unless this is already in the works) but it would put a citable "stake in the ground".

Opinions @davidlmobley @mrshirts @brycestx @shuail @harlor @dotsdl @alchemistry/alchemlyb ?

agree on license

When @dotsdl started the project he used the permissive BSD 3-clause license. However, it is still early in the project and we can easily change the license at this stage. I would like to have a discussion and eventually a consensus on the license.

I tend to license any code coming out of academia under the GPL, reasoning that this will in the long run create the largest benefit to the (academic) scientific ecosystem, because it forces anyone using the code in their projects to make their own code available in turn, under conditions that allow me (and others) to use it without restriction.

I am fully aware that not everyone agrees with this view and I had a very good general discussion with @davidlmobley on the matter already. I am inviting anyone with a stake and opinion on the matter to make their point and I hope that this will be a fruitful discussion.

Support Python 2?

The scientific Python development community is very much in favor of dropping support for Python 2 in the next year, since advances in Python 3 supposedly make it harder to justify the effort of maintaining Python 2 compatibility. Being aware that much of the scientific Python user community is a good deal slower in adopting Python 3, there's certainly an argument for doing our best to make sure everything we do in alchemlyb works for Python 2. However, this does cost limited development time and effort.

What is the current feeling here? Do we proceed trying to support both, or go for broke with Python 3 without looking back?

Implement advanced (e.g. expanded ensemble) parsers for GROMACS/provide sample data

As per #11 , we need some more advanced parsers for expanded ensemble and other more sophisticated free energy calculations for the GROMACS side of things. @mrshirts and @trje3733 will help with this, though the likely timescale is a couple of months.

Progress could perhaps be made sooner if sample XVG files could be provided. @mrshirts , are you currently doing these calculations or do you know anyone who is, so we could get some samples?

Implement minimal TI estimator

We would like to ship a minimal TI estimator with the first pre-release, since getting this to work will solidify the structure of the dHdl data structure that all parsers will need to produce. At a minimum, it requires a fit method and attributes yielding the delta_f_ dimensionless free energy matrix and d_delta_f_ uncertainties.
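A sketch of that interface (assuming dHdl is indexed by a "lambdas" level as produced by the parsers; for brevity it returns a cumulative free energy profile rather than the full pairwise matrix, and omits uncertainty propagation):

import numpy as np
import pandas as pd

class TI:
    """Minimal TI estimator sketch, scikit-learn style."""

    def fit(self, dHdl):
        means = dHdl.groupby(level="lambdas").mean()  # <dH/dl> per window
        lam = np.asarray(means.index, dtype=float)
        y = np.ravel(means.values)
        # trapezoid rule between adjacent lambda windows
        segments = 0.5 * np.diff(lam) * (y[1:] + y[:-1])
        self.delta_f_ = pd.Series(
            np.concatenate(([0.0], np.cumsum(segments))), index=lam)
        self.d_delta_f_ = None  # uncertainty propagation omitted in this sketch
        return self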

create roadmap

On the Skype call on 2017-10-16 we agreed to come up with a preliminary roadmap (for the next call in two weeks).

Please add to the Roadmap wiki page.

General discussion in this issue. If you want to be kept up to date, just subscribe or comment on this issue.

cc @mrshirts @davidlmobley

GINS estimator

Create a GINS estimator that:

  1. takes u_nk as an argument
  2. computes the free energy differences between each state along with their uncertainties

API proposal

The following is an API proposal for the library. This proposal has been prototyped, with some of the components described already implemented at a basic level. This functionality is demoed in this gist.

alchemlyb

alchemlyb is a library that seeks to make doing alchemical free energy calculations easier and less error prone. It will include functions for parsing data from formats common to existing MD engines, subsampling these data, and fitting these data with an estimator to obtain free energies. These functions will be simple in usage and pure in scope, and can be chained together to build customized analyses.

alchemlyb seeks to be as boring and simple as possible to enable more complex work. Its components allow work at all scales, from small systems analyzed on a single workstation to larger datasets that require distributed computing with libraries such as dask.

Core philosophy

  1. Use functions when possible, classes only when necessary (or for estimators, see (2)).
  2. For estimators, mimic the scikit-learn API as much as possible.
  3. Aim for a consistent interface throughout, e.g. all parsers take similar inputs and yield a common set of outputs.

API components

The library is structured as follows, in a similar style to scikit-learn:

alchemlyb
+-- parsing
|   +-- gmx
|   +-- amber
|   +-- openmm
|   +-- namd
+-- preprocessing
|   +-- subsampling
+-- estimators
    +-- mbar_
    +-- ti_

The parsing submodule contains parsers for individual MD engines, since the output files needed to perform alchemical free energy calculations vary widely and are not standardized. Each module at the very least provides an extract_u_nk function for extracting reduced potentials (needed for MBAR), as well as an extract_dHdl function for extracting derivatives required for thermodynamic integration. Other helper functions may be exposed for additional processing, such as generating an XVG file from an EDR file in the case of GROMACS. All extract_* functions take similar arguments (a file path, parameters such as temperature) and produce standard outputs (pandas.DataFrames for reduced potentials, pandas.Series for derivatives).

The preprocessing submodule features functions for subsampling timeseries, as may be desired before feeding them to an estimator. So far, these are limited to slicing, statistical_inefficiency, and equilibrium_detection functions, many of which make use of subsampling schemes available from pymbar. These functions are written in such a way that they can be easily composed as parts of complex processing pipelines.

The estimators module features classes a la scikit-learn that can be initialized with parameters that determine their behavior and then "trained" via a fit method. So far, MBAR has been partially implemented, and because the numerical heavy lifting is already well-implemented in pymbar.MBAR, this class serves to give an interface that will be familiar and consistent with the others. Thermodynamic integration is not yet implemented.

The convergence submodule will feature convenience functions/classes for doing convergence analysis using a given dataset and a chosen estimator, though the form of this is not yet fully thought out. However, the gist shows an example of how this can already be done in practice.

All of these components lend themselves well to writing clear and flexible pipelines for processing data needed for alchemical free energy calculations, and furthermore allow for scaling up via libraries like dask or joblib.

Development model

This is an open-source project, the hope of which is to produce a library with which the community is happy. To enable this, the library will be a community effort. Development is done in the open on GitHub, with a Gitter channel for discussion among developers for fast turnaround on ideas. Software engineering best-practices will be used throughout, including continuous integration testing via Travis CI, up-to-date documentation, and regular releases.

David Dotson (@dotsdl) is employed as a software engineer by Oliver Beckstein (@orbeckst), and this project is a primary point of focus for him in this position. Ian Kenney (@ianmkenney) and Hannes Loeffler (@halx) have also expressed interest in direct development.

Following discussion, refinement, and consensus on this proposal, issues for each need will be posted and work will begin on filling out the rest of the library. In particular, parsers will be crowdsourced from the existing community and refined into the consistent form described above. Expertise in ensuring theoretical correctness of each component, in particular estimators, will be needed from David Mobley (@davidmobley), John Chodera (@jchodera), and Michael Shirts (@mrshirts).

DEXP estimator

Create a DEXP free energy estimator that:

  1. takes u_nk as an argument
  2. computes the free energy differences between each state along with their uncertainties

change the subsampling method to use pymbar.timeseries.subsampleCorrelatedData()

It is proposed to change the subsampling method to avoid using pandas slicing for subsampling; this issue corresponds to pull request #38.

The pandas slicing method requires the step to be an integer, rounded from the statistical inefficiency of a time series. This can sometimes make different time series indistinguishable from each other in terms of their subsampling positions. It is proposed to switch to the pymbar subsampleCorrelatedData function.
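A sketch of the proposed scheme (series is the decorrelation timeseries and df the full dataframe, both assumed given):

from pymbar.timeseries import statisticalInefficiency, subsampleCorrelatedData

g = statisticalInefficiency(series)             # non-integer correlation estimate
indices = subsampleCorrelatedData(series, g=g)  # uncorrelated frame indices
uncorrelated = df.iloc[indices]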

TI_CUBIC estimator

Create a TI_CUBIC estimator that:

  1. takes dHdl as an argument
  2. computes the free energy differences between each state along with their uncertainties

TI estimator uncertainty

The TI estimator uses the trapezoid rule to integrate the <dH/dl> values. I believe that the current implementation doesn't calculate the correct uncertainty for the integral, because it calculates just the sum of the squares of the uncertainties for each bin:

dout.append(d_deltas[i] + d_deltas[i+1:i+j+1].sum())

I suggest calculating the uncertainty equivalently to alchemical-analysis, where the uncertainty is derived directly from the variances of the given dH/dl values. See #61.
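A sketch of that propagation for the trapezoid rule (lam are the lambda values and dhdl_var the variances of the window means, both assumed to be numpy arrays):

import numpy as np

dlam = np.diff(lam)
w = np.zeros_like(lam)                  # trapezoid weight of each window
w[0], w[-1] = dlam[0] / 2, dlam[-1] / 2
w[1:-1] = (dlam[:-1] + dlam[1:]) / 2
d_delta_f = np.sqrt(np.sum(w**2 * dhdl_var))  # propagated uncertainty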

Expand TI estimator doc page

The TI estimator doc page is currently a stub. The top matter should give a brief summary (already present) followed by a more detailed explanation of how TI works, with references and figures. This will function as an example of how other estimator pages should also look, and is a (relatively) easy case to start with.

Gromacs EDR reader

At the moment, the native Gromacs energy (EDR) files have to be converted to XVG files.

Instead, we should also offer an EDR parser.

@jbarnoud wrote https://github.com/jbarnoud/panedr which reads an EDR file into a dataframe. This would interact nicely with alchemlyb.
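Usage would then be as simple as (a sketch; the file name is hypothetical):

import panedr

# read all energy terms of an EDR file into a pandas DataFrame
df = panedr.edr_to_df("ener.edr")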

panedr is published under LGPL so we would be able to link (import) it in alchemlyb (BSD!) but we cannot integrate the code itself. For that to work, panedr needs to become available as a pip-installable (and eventually conda-installable) package.

If we want to go in this direction then we can fork panedr and contribute to its development.

BAR estimator

Create a BAR estimator that:

  1. takes u_nk as an argument
  2. computes the free energy differences between each state along with their uncertainties

NAMD parsers

In line with the overall API proposal, we want to have parsers for each of the major MD engines, and eventually have coverage for all of those in use. Since there are essentially two types of estimators (TI and FEP), each package needs a parser for:

  1. Extracting reduced potentials u_nk from output files (for FEP).
  2. Extracting derivatives DHdl from output files (for TI).

This issue is the nexus for discussion for such parsers for the NAMD package. If you have existing parsing code for this package, comment below and we can begin adapting it into the parsers outlined above in a PR.

Improve docs by giving multidimensional example for GROMACS

For those less familiar with pandas it would probably be helpful to give an example of a GROMACS analysis where two lambda values are changed simultaneously; right now the docs only give an example with a single lambda variable, where the syntax is less complex. For two dimensions I ended up with something like mbar.delta_f_.loc[[(0.0, 0.0)], [(1.0, 1.0)]], but figuring out the right placement of brackets and parens was tricky for me (I'm still a pandas newb).

CHARMM parsers

In line with the overall API proposal, we want to have parsers for each of the major MD engines, and eventually have coverage for all of those in use. Since there are essentially two types of estimators (TI and FEP), each package needs a parser for:

  1. Extracting reduced potentials u_nk from output files (for FEP).
  2. Extracting derivatives DHdl from output files (for TI).

This issue is the nexus for discussion for such parsers for the CHARMM package. If you have existing parsing code for this package, comment below and we can begin adapting it into the parsers outlined above in a PR.

new release 0.2.0

We reached the "Immediate/Pressing" milestones in our roadmap and have a whole bunch of new features as well as really good testing coverage.

I'd like to release a 0.2.0 once the latest Amber parser changes are in. (If we can get a NAMD parser, too, then that would be a bonus, but I don't know what the timeline is for that.)

Is there anything else that we need to take care of, @alchemistry/alchemlyb ?
