
mbar_lj_two_states's People

Contributors: mrshirts, ramess101

Forkers: mrshirts

mbar_lj_two_states's Issues

Pressures in condensed states

Michael,

I have worked a lot on trying to predict pressures with the PCF-PSO approach. Unfortunately, we need these pressures for liquid and/or supercritical states, where it becomes very difficult to obtain reliable estimates of P. Even with a very accurate PCF, the PCF-PSO approach can give very large errors in P.

Because of this shortcoming, we (Richard Elliott and I) are trying to use MBAR to predict pressures for different values of epsilon from a single reference-state epsilon (with the same sigma). The phase-space overlap is fairly good (i.e., we have Neff > 10, and sometimes ~100, out of a total of N = 1000). We get good estimates of U (when compared with the direct U value for a given epsilon). However, the pressures we obtain do not appear to be very good estimates. Although the uncertainties in P from MBAR are larger than they are for U, they still do not account for the discrepancy between the direct simulation P and the MBAR P. A sketch of how we set up the reduced potentials is below, followed by three questions that might shed some light on the issue.
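This is only a minimal illustration; it assumes a pure LJ potential with fixed sigma, so that the configurational energy (including any tail correction) is strictly linear in epsilon, and the file name and numerical values are placeholders:

```python
import numpy as np

# Reference-state configurational energies U_0[n], one per snapshot, sampled at eps_0.
# For a pure LJ potential with fixed sigma, U is strictly linear in epsilon,
# so the energy of configuration n under eps_k is simply (eps_k / eps_0) * U_0[n].
U_0 = np.loadtxt('U_ref.txt')            # illustrative file name
eps_0 = 0.23                             # reference epsilon (illustrative value, kJ/mol)
eps_k = np.array([0.23, 0.22, 0.24])     # target epsilons; reference state listed first

kB = 8.314462618e-3                      # kJ/(mol K)
T = 135.0                                # NVT set-point temperature (illustrative)
beta = 1.0 / (kB * T)

N = len(U_0)
K = len(eps_k)

# NVT reduced potential u_kn = beta * U_k(x_n); no pV term because V is fixed.
u_kn = np.zeros((K, N))
for k in range(K):
    u_kn[k, :] = beta * (eps_k[k] / eps_0) * U_0

N_k = np.zeros(K, dtype=int)
N_k[0] = N                               # all N samples come from the reference state
# (The configurational part of the virial/pressure also scales linearly in epsilon;
#  the kinetic/ideal-gas part does not.)
```

Listing the reference epsilon first just makes it easy to mark it as the only sampled state in N_k.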

  1. When scaling the internal energy U by beta, should we be using the instantaneous T, or the NVT ensemble temperature that is set in the thermostat (the fixed value used in the sketch above)? Previously I used the fixed value of T, but in this example we actually used the snapshot values of T. The fluctuations in T were fairly large, so I thought this might lead to poor weighting.

  2. Do the MBAR uncertainties adequately represent the very large scatter in the virial that you obtain from liquid simulations? The MBAR uncertainties just seem so much smaller than the scatter in the virial (pressure) would suggest.

  3. Is it important to make sure the snapshots are not highly correlated? Currently I am using snapshots every 1 ps, just so that we have 1001 snapshots from a 1 ns run. However, I noticed that in your original lj_shirts MBAR code you only kept every 20th snapshot (i.e., every 20 ps). I assume this is to account for correlated data. Would correlated snapshots (i.e., sampled every 1 ps) lead MBAR to underestimate the error? (A sketch of how the snapshots could be decorrelated follows these questions.)
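Here is a minimal sketch of the decorrelation step from question 3, applied to the U_0 time series of the sketch above before building u_kn; it assumes pymbar 3's timeseries module (the function names were renamed in later pymbar versions):

```python
from pymbar import timeseries

# U_0: sampled-state energy time series from the sketch above (snapshots 1 ps apart).
g = timeseries.statisticalInefficiency(U_0)              # statistical inefficiency, in snapshots
indices = timeseries.subsampleCorrelatedData(U_0, g=g)   # effectively independent snapshot indices
print('g = %.1f snapshots, keeping %d of %d' % (g, len(indices), len(U_0)))

# Subsample everything consistently (energies, virials/pressures, any other
# observables) before building u_kn, so MBAR only sees independent samples.
U_0 = U_0[indices]
```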

Thanks

Rich

Ethane questions

Michael,

I am working on comparing PCF-PSO and MBAR some more. I have added a new folder titled "Ethane" to the "MBAR_LJ_two_states" repository on GitHub. I believe I already invited you to join that repository, but I have also attached the necessary files just in case.

The simulation results were obtained with GROMACS by performing NVT simulations of ethane using different models. The energyi_j.txt files contain the simulation energies and pressures, where "i" refers to the model used to generate the configurations and "j" refers to the model used in the "rerun". For example, energy0_2.txt was simulated with model 0 (TraPPE) and rerun with model 2 (Mess-UP).

The purpose is to see how well MBAR predicts U for two different models (Potoff and Mess-UP) given configurations obtained from a single simulation of a "reference" model (TraPPE). Potoff is a Mie potential, and its overlap with the TraPPE model is very poor (only 1 sample out of 1000 actually contributes). Mess-UP is an LJ potential that is very similar to the TraPPE potential, so it has much better overlap (about 500 effective samples out of 1000).
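To make the data layout concrete, here is a minimal sketch of how the MBAR inputs can be assembled from the rerun files; the column index and the temperature value are assumptions and depend on how the .txt files were written:

```python
import numpy as np

kB = 8.314462618e-3                  # kJ/(mol K)
T = 135.0                            # NVT set-point temperature (illustrative)
beta = 1.0 / (kB * T)

# energy0_j.txt: configurations generated with model 0 (TraPPE), energies/pressures
# re-evaluated ("rerun") with model j. Column 0 is assumed to hold the potential energy.
U_00 = np.loadtxt('Ethane/energy0_0.txt')[:, 0]   # TraPPE configs, TraPPE energies
U_01 = np.loadtxt('Ethane/energy0_1.txt')[:, 0]   # TraPPE configs, Potoff energies
U_02 = np.loadtxt('Ethane/energy0_2.txt')[:, 0]   # TraPPE configs, Mess-UP energies

N = len(U_00)
u_kn = beta * np.vstack([U_00, U_01, U_02])       # reduced potentials, shape (3, N)
N_k = np.array([N, 0, 0])                         # all configurations sampled with TraPPE (state 0)
```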

I am trying to make sense of the results and I was hoping you could answer some of my questions.

  1. Is the computation of the expectation values on line 112 done properly? Note that on line 112 I am providing A_kn, which is U_kn (see line 110), which in turn is U_00 (see line 103). In other words, A_kn is just the internal energies of the TraPPE model evaluated with the TraPPE model (NOT with the new state, i.e., the Potoff or Mess-UP models).

  2. Or should I compute the expectation values as I did on lines 120-121? Note that I get the same result for EA_k[0] and EA_k_alt[0], but a very different value for EA_k[1] versus EA_k_alt[1]. The difference is that for EA_k_alt[1] (line 121) I use the MBAR weights multiplied by U_01, i.e., the internal energies evaluated with the new state (Potoff or Mess-UP) in a rerun over the configurations of the reference state (TraPPE). (Both variants are summarized in the sketch after this list.)

  3. Compare "MBAR" and "MBAR_alt" for internal energy on the top figure (titled "Comparison of Different..."). The goal is to have the MBAR blue line or the MBAR_alt orange line match the "Simulated" green line. Recall that "Simulated" is obtained by actually simulating the new model, rather than using MBAR or "Rerun". Whereas, "Rerun" just assumes that the configurations all have a weighting of 1 (the PCF-PSO approach).

You may notice that when the new state is the Potoff model (i.e., new_state = 1), the MBAR_alt orange line is not only better than the MBAR blue line but also better than the Rerun red line. (By "better" I mean that at "Model = 1.0" it matches the Simulated green line more closely.) On the other hand, when the new model is the Mess-UP model (new_state = 2), the "Rerun" red line agrees most closely with the "Simulated" green line, while the MBAR_alt orange line significantly underpredicts the internal energy and the MBAR blue line significantly overpredicts it. This is very surprising, since Mess-UP has better phase-space overlap (the number of effective samples is much greater for Mess-UP than for Potoff).

So why would the Potoff model work better with MBAR_alt than Mess-UP does, when Potoff has only one effective sample? Did Potoff just get "lucky"? Or is it because MBAR_alt assigned a weight of 1 to the only Potoff sample that had an internal energy similar to what the real Potoff system would produce? For example, the only sample that contributes (sample ~750) also has the most negative internal energy (see "Model 1 Rerun" in the second figure from the top). Or is it because MBAR_alt is incorrect? If MBAR_alt is incorrect (and MBAR is correct), why then does "Rerun" outperform MBAR for the Mess-UP model, when Mess-UP has over 500 effective samples? Could it be that the pressure fluctuations are so large in the liquid phase that some samples are assigned a greater weight than they really should have? For example, when using the Mess-UP model (i.e., new_state = 2), the second figure from the bottom shows that samples ~220 and ~750 contribute too much to the average. This can be seen in the second figure from the top, where samples ~220 and ~750 for "Model 2 Rerun" have internal energies that are much more negative than most of the internal energies for "Model 2 Simulated".
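In case it helps, this is the kind of check I run to see which samples dominate each target state; it uses the MBAR weight matrix from the sketch above and the standard Kish formula (one over the sum of squared normalized weights) for the number of effective samples:

```python
import numpy as np

# mbar as constructed above; W_nk has shape (N, K) and each column sums to 1.
W_nk = mbar.getWeights()

for k, label in enumerate(['TraPPE (ref)', 'Potoff', 'Mess-UP']):
    w = W_nk[:, k]
    N_eff = 1.0 / np.sum(w**2)            # Kish effective sample size
    top = np.argsort(w)[::-1][:3]         # indices of the three largest weights
    print('%s: N_eff = %.1f, top samples %s carry weights %s'
          % (label, N_eff, top, np.round(w[top], 3)))
```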

I know that it might be hard to follow exactly what I did and what I am referring to in my questions, so if something is unclear just let me know. Note that I have also included the main conclusions and my questions in the README file and the Python code.

Finally, based on the information you provided in your previous emails, I have tried smaller system sizes (50 molecules rather than 400) and longer simulation runs (20 times longer), but the results were essentially the same: Potoff still has around 1 effective sample, and MBAR_alt is better than Rerun, which is better than MBAR.

Also, sorry for the notation "MBAR_alt". I don't mean to imply that I have modified MBAR in any way. I am just not sure which "MBAR" result is the correct way to implement MBAR.

I look forward to hearing what ideas you might have.

Thanks again for your expertise.

Enjoy your time in Blacksburg!

Rich

Argon questions

Michael,

I have invited you to collaborate on two GitHub repositories, "MBAR_LJ_two_states" and "MBAR_sampling_LJ_methane".

MBAR_LJ_two_states is a very simple Python code that helps demonstrate what I was trying to explain yesterday. Essentially, if I only use configurations from state 0, I get a very poor prediction of the internal energy and density for state 1. You were correct in your assumption that this happens because state 1 has only a single configuration with a non-negligible weight (sample 986). The sigma and epsilon values differ by about 1% and 3%, respectively, between states 0 and 1. I guess I am surprised that this small a deviation in sigma and epsilon would result in such poor overlap (a quick diagnostic of this is sketched below).
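A quick way to see why the overlap collapses is to look at the spread of the per-configuration reduced-energy difference between the two states. This is only a minimal sketch; u_kn here stands for the two-state reduced-potential matrix built from the state-0 configurations, not necessarily the variable name used in the repository:

```python
import numpy as np

# du_n = u_1(x_n) - u_0(x_n): reduced-energy difference of each state-0 configuration
# between the two LJ parameter sets.
du = u_kn[1, :] - u_kn[0, :]
print('std of du (in kT):', du.std())

# With all samples drawn from state 0, the reweighting factors behave like exp(-du);
# once std(du) reaches several kT, a single configuration carries nearly all the weight.
w = np.exp(-(du - du.min()))     # shift by the minimum for numerical stability
w /= w.sum()
print('largest normalized weight:', w.max())
print('Kish effective samples:', 1.0 / np.sum(w**2))
```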

MBAR_sampling_LJ_methane is a modified version of the code that you provided me. It requires a few of the additional files you provided in lj_bayesian (outlined in the README file). I believe this code demonstrates that the conclusion described above for MBAR_LJ_two_states is a general one: the expectation values for all 91 other states are not accurate when they are obtained using configurations from only state 0. Again, it looks like this is a weighting/overlap issue. (Note that I have changed line 465 so that the data are only imported from state 0 when load_ukln is set to true.)

I guess you already explained why this is the case; my only questions are:

  1. Am I implementing MBAR correctly?
  2. Is this a typical result when only a single state is used for generating configurations?

Let me know if the code or plots are unclear.

Thanks

Rich
