ramess101 / jced_fomms_manuscript

jced_fomms_manuscript's Issues

Reviewer 3 Comment 2

@mrshirts @jpotoff @msoroush

The authors present a very clear example of cyclohexane to
demonstrate the utility of this approach in parametrizing new force
fields. Can the authors comment on the complexity of the optimization
process when more than two parameters need to be optimized (for example,
a molecule comprising more than one type of interaction site)?
Further, while this may be beyond the scope of this work, it may be
useful to comment on the ease/difficulty of using this approach when
vapor--liquid equilibria of mixtures are considered as part of the
scoring function.

I believe mixtures are outside of the scope of this work.

I will include a short statement explaining that we performed a single-site optimization so that the results are easy to visualize. GCMC-MBAR should be just as reliable for a higher-dimensional optimization; only the optimization approach would need to change.

Reviewer 1 Comment 7

@mrshirts @jpotoff @msoroush

In section 3.3, the authors show the effective snapshot results for GCMC-MBAR and use them to explain the bad performance of the method in predicting liquid phase properties. They also compare with MBAR-ITIC but the comparison is not very clear to me. It would be nice to have some data from MBAR-ITIC listed in the comparison. Also the authors listed two of their hypotheses that GCMC-MBAR would experience better overlap than MBAR-ITIC when 𝜃𝑟𝑟 ≉ 𝜃𝑟𝑒𝑓. Are these hypotheses proved to be correct after the comparison? Does GCMC-MBAR still do relatively better even for liquid phase since the two hypotheses the authors raise should hold true for liquid phase?

Any ideas for what the reviewer intended with the last sentence?

I think there is some confusion regarding MBAR-ITIC, which is understandable since it is not a common method and we only reference it briefly. The whole reason we mention MBAR-ITIC here is to point out that MBAR-ITIC fails miserably when lambda_rr \neq lambda_ref, while GCMC-MBAR is still relatively reliable. But there really is no way to compare MBAR-ITIC and GCMC-MBAR directly because the state points are completely different.

However, I do agree that we need to make it clear that these results support our hypothesis, namely, GCMC-MBAR yields more reliable estimates with greater Keff than MBAR-ITIC for theta_rr \not\approx theta_ref.
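
For reference, here is a minimal sketch of one common way to quantify overlap: the Kish effective sample size computed from the reweighting weights. This assumes Keff is defined that way, which may differ from the exact definition in the manuscript, and the names `effective_snapshots` and `log_w` are hypothetical.

```python
import numpy as np

def effective_snapshots(log_w):
    """Kish effective sample size from unnormalized log-weights.

    Assumed definition: K_eff = (sum_k w_k)**2 / sum_k w_k**2.
    """
    log_w = np.asarray(log_w, dtype=float)
    # Shift by the maximum so the exponentials cannot overflow;
    # the ratio below is unchanged by a constant shift.
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / np.sum(w ** 2)
```

With equal weights the result equals the number of snapshots, and it approaches 1 when a single snapshot dominates, which is the poor-overlap limit discussed above.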

Reviewer 3 Comment 3

@mrshirts @jpotoff @msoroush

For polar molecules, do the authors anticipate that the GCMC-MBAR
approach will be capable of parametrizing the electrostatic charges with
a similar efficiency as it can be done for the non-polar interactions?

The GCMC-MBAR method is applicable to parameterizing electrostatic charges, although it is unknown if GCMC-MBAR will be as reliable over a wide range of charge values. From an optimization standpoint, fitting charges and van der Waals parameters simultaneously can lead to a multi-modal problem which might require MBAR to explore very large regions of parameter space to hop between local optima. I am not sure we need to discuss the optimization aspect, but maybe we should just leave electrostatics as future work.

Comparison of MiPPE with other force fields

@jpotoff @msoroush

JCED requires that we compare our results with those for other force fields.

Previously I had only included the TraPPE literature values as validation of our GCMC results for TraPPE. Now I include the Exp-6 model of Errington, LJ model of Mauricio (Vrabec), LJ+quadrupole model of Eckl (Hasse), and the AUA model of Bourasseau et al.

I use lines for GCMC results since these are so close together that it doesn't make sense to use data points.

Unfortunately, I cannot find the tabulated DeltaHv for Exp-6. From their figures, I know the values are pretty similar to MiPPE, but it would be nice to show that here.

From this figure it is clear that no other literature model is far superior to MiPPE.

Reviewer 1 Comment 5

@mrshirts @jpotoff @msoroush

On page 6 in the force fields section, the authors mention that the bond stretching potential is neglected for all the force fields. They compared their result with those from Mick et al. but Mick also used fixed bond models. This is not a valid justification. Have the authors conducted any tests showing that the bond stretching has no influence on the simulation result, or is this reported in the literature for similar systems? The same question applies to electrostatic interactions. Please show evidence that it is valid to ignore these interactions.

We compare with Mick et al because we are validating that our MBAR results agree with HR. There would actually be much less justification if we compared MBAR results that used fixed bonds with HR results that used flexible bonds, or vice versa. But I think the reviewer's point is really aimed at whether VLE calculations, in general, are sensitive to flexible or fixed bonds. I know there have been a few (older) studies that looked into the impact of bond treatment. I will try to find those. @jpotoff @msoroush can you think of any studies off the top of your head?

The question regarding electrostatic interactions is somewhat misplaced. One could argue that performing simulations with fixed or flexible bonds is a set-up choice (not strictly tied to the force field), whereas electrostatics are entirely a model choice when fitting a force field. None of the UA or AUA models that I am aware of use point charges for normal, branched, and cyclic hydrocarbons. I don't think we need to justify a decision that has already been made for the OPLS-UA, TraPPE, NERD, AUA4, TAMie, and MiPPE force fields. Any thoughts?

Abstract

@mrshirts @jpotoff @msoroush

Let me know if you have any concerns/suggestions for the abstract. Here are the JCED MMS guidelines that I was trying to follow:

Hardware description

@msoroush

I would like to include a one sentence description of the hardware you are using to run the cyclohexane simulations. Here is what it says so far:

Could you help me fill in the missing information? Also, please correct if you used a different compiler or GOMC version. Thanks!

Name of Potoff Mie force field

@jpotoff @msoroush @jrelliottoh @mostafa-razavi

I know that we discussed the name of the Potoff Mie force field prior to our IFPSC10 submission. We decided just to call it "Potoff," and I have done this in all of my previous publications. However, @jpotoff was not included as an author in any of those publications. Referring to the "Potoff" force field in a manuscript co-authored by Potoff seems somewhat strange to me. Do we want to assign a more definitive name to the force field?

The most similar force fields (in construction, atom typing, target data, etc.) are probably TraPPE and TAMie. I have actually seen one article refer to the Potoff force field as "TraPPE-Mie." I don't think Ilja would be particularly pleased if we stole the iconic TraPPE trademark. TAMie stands for "Transferable Anisotropic Mie", so an obvious option would be TUMie (Transferable United-atom Mie) or TIMie (Transferable Isotropic Mie). But we don't need to limit ourselves to mimicking these two names. We could try something like NERD (Nath, Escobedo, and de Pablo revised, where they took some artistic liberties to rearrange the letters). In fact, TAMie referred to the Potoff force field for alkanes as PBB (Potoff and Bernard-Brunel). But there have been so many subsequent papers with different authors that neither NERD nor PBB seems appropriate.

Any more ideas?

Here are some that come to mind:

Mie-UA
Mie-Potoff
TraMP (Transferable Mie Potentials)
TraMie (Transferable Mie)

I guess the question is, what do we really want to emphasize? Is it the Mie potential? Transferability? United-atoms? Vapor-liquid coexistence properties? The authors?

Basis functions

@jpotoff @msoroush

In your comments and in Issue #8 we discussed whether the basis functions details should be in the text or SI. I decided to include it in the text since it only requires a paragraph and one equation. Does this help clarify how we compute basis functions? Any suggestions?
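
To make the basis-function idea concrete, here is a minimal sketch for a single site type with a Mie lambda-6 potential. It is not our actual implementation: the function names are hypothetical, and cutoffs, tail corrections, and multiple site types are ignored. The point is that the per-snapshot sums of r^-lambda and r^-6 are stored once, and the energy for any new (epsilon, sigma) follows without revisiting the coordinates.

```python
import numpy as np

def mie_prefactor(lam, m=6.0):
    """Mie prefactor C = lam/(lam - m) * (lam/m)**(m/(lam - m))."""
    return lam / (lam - m) * (lam / m) ** (m / (lam - m))

def basis_functions(r, lam, m=6.0):
    """Sums of r**-lam and r**-m over all pair distances in one snapshot."""
    r = np.asarray(r, dtype=float)
    return np.sum(r ** -lam), np.sum(r ** -m)

def energy_from_basis(basis, eps, sig, lam, m=6.0):
    """Recompute the Mie energy for (eps, sig) from the stored sums only."""
    s_rep, s_att = basis
    return mie_prefactor(lam, m) * eps * (sig ** lam * s_rep - sig ** m * s_att)
```

Because the repulsive sum depends on lambda, a separate repulsive basis function has to be stored for each lambda of interest, while epsilon and sigma can be varied continuously.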

Manuscript ready for further review

@mrshirts @jpotoff @msoroush

I have finished a long list of revisions to the manuscript.

Please read over any sections that you had concerns about previously. A full review would also be appreciated if possible. Please make sure to get your changes back to me no later than Tuesday evening so that I can address your comments before the Wednesday deadline.

To avoid any issues with GitHub, the current manuscript can be found here:

JCED_FOMMS_manuscript.pdf

Also, I will continue to make revisions to the supporting information. But I will post the SI when it is ready.

Thank you again!

Reviewer 3 Comment 1

@mrshirts @jpotoff @msoroush

In figure 2, the authors perform small local variations to identify a
slightly shifted local minima for the different molecules. Have the
authors considered performing larger variations in the scaling factor
for the well-depth to check if there is an even lower minimum in the
scoring (error) function? It may be useful to also show the sensitivity
of these plots to reasonable variations in the scoring function. (Let's
say vary the distribution of weights between liquid density and vapor
pressure by about 10%).

Two separate issues are raised here.

First, regarding the individualized scan over a wider range to find an even lower minimum: because this analysis does not modify sigma or lambda, the scoring function just continues to increase with large deviations in the scaling factor. We can see this clearly in the heat maps for cyclohexane, where the errors just continue to get worse over a wide range of epsilon for a fixed sigma/lambda.

Second, although it would be very simple to play around with the scoring function and it could be insightful from an optimization standpoint, I think it would distract from the main purpose of this analysis, namely, that MBAR provides very smooth optimization valleys when varying a single scaling factor and that this scaling factor is typically close to 1. If we changed the scoring function, we would no longer have any reason to hope for psi close to 1 because the transferable and individualized models would have different targets.

Deadline

@jpotoff @mrshirts @msoroush

Good news! I just found out that the deadline is actually not today, it is Wednesday December 19th. So we have a few more days to read through the manuscript and make sure that all of our data look good.

Comments follow-up

@jpotoff @msoroush

I just wanted to ask some questions about a few of your comments.

One issue with HS-GCMC is that it is unclear how much efficiency is gained by actually sampling the different states within one simulation. That was never quantified by Errington et al. MBAR has the benefit of not requiring any additional CPU time to sample different Hamiltonians.

Do we want to raise this point? I wasn't sure if I want to speculate on the efficiency gain of HS-GCMC. However, it is nice to point out that MBAR does not require any additional CPU time.

Would it be better to state that we found a typographical error in the table (missed negative sign)? What we have here sounds a bit ominous when this was really just a transcription error.

Is this any better?

BTW, this is how our reweighting code works. It's summing over snapshots. Nigel Wilding explained the method to me back around 1997. Originally we did it to get around hardware memory limitations.

So are the equations in the histogram reweighting section not actually representative of how you compute VLE?
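
For discussion, here is a minimal sketch of what "summing over snapshots" could look like for single-state reweighting in the grand canonical ensemble. It is an assumption about the general technique, not a description of the actual reweighting code, and the function and argument names are hypothetical.

```python
import numpy as np

def reweight_mean(obs, N, U, beta0, mu0, beta1, mu1):
    """Estimate <obs> at (beta1, mu1) from snapshots sampled at (beta0, mu0).

    Each snapshot k gets the weight
        ln w_k = -(beta1 - beta0) * U_k + (beta1 * mu1 - beta0 * mu0) * N_k,
    so no (N, U) histogram is ever built.
    """
    obs, N, U = (np.asarray(a, dtype=float) for a in (obs, N, U))
    log_w = -(beta1 - beta0) * U + (beta1 * mu1 - beta0 * mu0) * N
    w = np.exp(log_w - log_w.max())  # shift to avoid overflow
    return np.sum(w * obs) / np.sum(w)
```

If the snapshots were binned into a histogram first, every snapshot in a bin would receive the same weight, so the two routes should agree as long as the bins are narrow enough.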

How is the 3X array generated? Are you taking the snapshot lists produced by GOMC and reprocessing them with the basis functions to add the third column of data?

I plan on explaining this more in the supporting information. Do you think an explanation needs to be in the main text?

Seems a little strange that TraPPE to MiPPE is quite a bit better than MiPPE to TraPPE for 2,2,4-trimethylpentane (and most other compounds). Any ideas why?

I would say they are equally accurate in the vapor phase, but your observation appears to apply to the liquid phase. My only idea is that because TraPPE has a softer potential (lambda = 12 and smaller sigma), it samples a more diverse group of short-range distances, while MiPPE samples a much narrower group of short-range distances.

Physical explanation of Keff in vapor and liquid phases

@mrshirts

You mentioned that it would be helpful to have a physical explanation for why the overlap is worse in the liquid phase. I have added a paragraph to explain this more clearly. Let me know if you would like to modify this explanation at all.

Reviewer 2 Comment 3

@mrshirts

comment about equation 13: I am glad that the sign of the mu*N term is now corrected to be negative. I assumed the positive sign to be a misprint in the first MBAR paper. But when the wrong positive sign reappeared in subsequent publications I started to have doubt about whether some intricacy existed about the sign.

This is equation 13:

@mrshirts I assume the "first MBAR paper" the reviewer is referring to is "Statistically optimal analysis of samples from multiple equilibrium states," where the reduced potential energy is defined as:

I believe we have the sign correct in the present manuscript, but I just wanted to double check with you first.
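
As a sanity check on the sign, the reduced potential can be read off directly from the grand canonical weight. The block below is the textbook form written with generic symbols, which may not match the manuscript's notation; factors that depend only on N and T (such as 1/N! and the thermal wavelength) are omitted.

```latex
\begin{align}
  p(x \mid \mu, V, T) &\propto \exp\{-\beta\,[\,U(x) - \mu N(x)\,]\}, \\
  u(x) &\equiv \beta\,[\,U(x) - \mu N(x)\,], \qquad \beta = 1/(k_{\mathrm{B}} T).
\end{align}
```

With this convention, raising mu increases the weight of configurations with more molecules, which is the physically expected behavior and supports the negative sign on the mu*N term.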

Reviewer 1 Comment 4

@mrshirts @jpotoff @msoroush

On page 5 in the third paragraph the authors admit that GCMC-MBAR is very similar to HS-GCMC in nature. They are both powerful tools to optimize force field parameters and the authors list several advantages of GCMC-MBAR over HS-GCMC in their opinion. I think this is a very important comparison that the authors need to address more. Does HS-GCMC possess the same capability as GCMC-MBAR and does it suffer from the same problem that GCMC-MBAR has when predicting the liquid phase saturation properties at 𝜃𝑟𝑟 ≉ 𝜃𝑟𝑒𝑓? It will be very helpful to see some simulation result comparison in the paper between these two methods.

I agree that the comparison with HS-GCMC is important, but I certainly do not intend to perform simulations with the HS-GCMC method. From my understanding, HS-GCMC is limited to values of theta that are considered a priori. So the question of whether it works for 𝜃𝑟𝑟 ≉ 𝜃𝑟𝑒𝑓 is somewhat misguided, i.e., HS-GCMC only works for theta_ref which is why it uses a set of theta_ref. @jpotoff is there anything you think we need to add to our comparison between GCMC-MBAR and HS-GCMC?

Reviewer 1 Comment 1

@mrshirts @jpotoff @msoroush

I probably should have just posted all of the reviewer comments from the beginning. A few of the comments seemed simple enough that I could answer them by myself. But I think it is best to allow everyone to provide feedback, if desired. So I am now going to post the other (non-typographical) comments.

Optimization Algorithm

@mrshirts @jpotoff @msoroush

I thought it would be helpful to include the recommended optimization algorithm, namely, starting with the TraPPE force field, obtaining pseudo-optimals for each lambda, simulating at these points, and refining the optimization. In two small batches of simulations we go from TraPPE to MiPPE.

I put the algorithm at the end of Section 3.3 so that it is right before the Case Study. Here is what I have so far. Please provide feedback on how to make this more clear:

Equation 8 and 9

@mrshirts @jpotoff

Michael expressed a concern that I have had for a while but failed to mention. In the manuscript we refer to Pr_i(N,U) as the probability, when really I believe it is the number of snapshots that fall in the histogram bin N,U.

Here are the equations and text as they were found previously:

I think we want this to read K_snaps,i(N,U), since this would more clearly denote a histogram "count" rather than a probability.

Does this look correct?
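
To illustrate the distinction, here is a minimal sketch that builds both quantities from the same snapshots: the raw count per (N, U) bin and the normalized probability. The function name and the bin edges are hypothetical, and the sketch assumes at least one snapshot falls inside the bin range.

```python
import numpy as np

def count_and_probability(N, U, n_edges, u_edges):
    """Return the (N, U) histogram counts and the normalized probabilities."""
    # counts[i, j] is the number of snapshots in N-bin i and U-bin j.
    counts, _, _ = np.histogram2d(N, U, bins=[n_edges, u_edges])
    # Dividing by the total number of binned snapshots gives a probability.
    prob = counts / counts.sum()
    return counts, prob
```

The counts sum to the number of binned snapshots, while the probabilities sum to 1, so the two differ only by that normalization.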

Review of Outline

Here are @mrshirts comments for the outline:

Justification of these force fields? If it's that MBAR can be used to perform a parameter scan, justify doing something that isn't a parameter scan.

Want to make sure all the comparisons are as much apples-to-apples as possible, i.e., use the same amount of data (ideally, the same data!).

I like mentioning the importance sampling from the mixture distribution part - it's a little easier to understand.

They can be estimated analytically, but if computer time is available, bootstrap is better. NOTE: bootstrap can be done trivially on histograms as well - and that's better (at fixed cost) than running 5 simulations and taking the std.

Can be generalized to more complex systems (cite Levi's work) but a pain.

Statistically indistinguishable if there is enough data and bins are well chosen.

What do you see as the main scientific point to this part [estimating Potoff, NERD from TraPPE, etc.]?

Should quantify in terms of N_eff, i.e., explain why the difference. Will also explain how to correct if needed.

Doing it with both sigma and epsilon, or just a single overall scaling parameter?

This could be too much data to ask people to keep. Maybe just ensure that all programs are compatible with outputting configurations and reevaluating energies rapidly from stored configurations.

Be more precise about what you mean by improvements.

Be more precise about what you want to compute here.

Reviewer 2 Comment 4

@mrshirts @jpotoff @msoroush

Eq. 17: internal energies for vapor phases are suffering strong finite size scaling effects in GCMC simulations with low number / vanishing number of molecules. Among all quantities considered in this work, it is probably the least size-robust quantity. The enthalpy of vaporization determined from eq. 17 therefore suffers from substantial finite size effects. Robust values for enthalpies of vaporization that more closely resemble infinite size systems are determined from the Clausius-Clapeyron equation (as stated by Weidler and Gross, 2016, see above). The derivative dln(p_sat)/d(beta) is a native quantity in the spirit of histogram reweighting.

Here is Eq 17:

Here is what Weidler and Gross stated:

I don't think we want to change how we compute DeltaHv for two reasons. First, this will mess up the validation comparison with GCMC-HR (Figures 1 and 3). Second, the individualized optimizations (Figure 2) will be inconsistent with the original MiPPE parameterization (for branched alkanes only since alkynes do not include DeltaHv). We could modify the approach for the cyclohexane optimization, but I think this will only confuse the reader to have two DeltaHv approaches. @jpotoff @msoroush what are your thoughts?
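
For context, here is a minimal sketch of the alternative route the reviewer describes, assuming it means the Clapeyron relation DeltaHv = -R * DeltaZ * d ln(p_sat) / d(1/T) with DeltaZ = Z_vap - Z_liq. The function name, the finite-difference derivative, and the kJ/mol units are my assumptions, not something taken from Weidler and Gross.

```python
import numpy as np

R = 8.314462618e-3  # gas constant in kJ/(mol K)

def delta_hv_clausius_clapeyron(T, p_sat, delta_Z):
    """Enthalpy of vaporization from the slope of ln(p_sat) versus 1/T.

    T       : saturation temperatures in K
    p_sat   : saturation pressures (any consistent unit)
    delta_Z : Z_vap - Z_liq at each temperature
    """
    T = np.asarray(T, dtype=float)
    # Finite-difference estimate of d ln(p_sat) / d(1/T) at each temperature.
    slope = np.gradient(np.log(p_sat), 1.0 / T)
    return -R * np.asarray(delta_Z, dtype=float) * slope
```

In a reweighting context the derivative could be obtained analytically rather than by finite differences, but either way this route would give DeltaHv values that are not directly comparable with the Eq. 17 values used for the GCMC-HR validation.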

Reviewer 2 Comment 5

@mrshirts @jpotoff @msoroush

Cyclohexane torsions

@jpotoff @msoroush

We have discussed this previously in Issue #8 . The question is how/where do we want to address the use of n-alkane torsions for cyclohexane?

I have moved most of the bonded parameter discussion to supporting information, according to the JCED MMS guidelines:

So now I am not sure if we want to include the statement about TraPPE cyclohexane torsions in the manuscript or in the supporting information. If we put it in the supporting information it is much less likely that this error will be corrected and it is quite likely that future practitioners will continue to use the wrong parameters. So maybe we should include it in the manuscript. However, if we are trying not to offend Siepmann the SI might be a better spot.

I think we should include it in the manuscript but be very careful how we word it. The current version is probably still too harsh, so I would appreciate any additional feedback.

Reviewer 2 Comment 6

@mrshirts @jpotoff @msoroush

Why is the pressure defined along a slope extrapolation? In HR-GCMC it is simply determined from the probability density function at N=0, where the value relates to the grand canonical partition function.

Here is how this is described in the manuscript:

@jpotoff @msoroush Is our explanation not correct or inconsistent with the common approach for GCMC-HR?
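
For reference, the relation the reviewer appears to have in mind follows from the fact that the canonical partition function of an empty box is 1. The block below uses generic symbols (Pi for the particle-number probability) that may not match the manuscript.

```latex
\begin{align}
  \Pi(N=0;\mu,V,T) &= \frac{Q(0,V,T)}{\Xi(\mu,V,T)} = \frac{1}{\Xi(\mu,V,T)}, \\
  \beta P V &= \ln \Xi(\mu,V,T) = -\ln \Pi(N=0;\mu,V,T).
\end{align}
```

This route requires that the reweighted distribution retains reliable statistics at N=0. When it does not, ln Xi is known only up to an additive constant, and one common alternative (which may or may not be what our text describes) is to fix that constant by matching the ideal-gas limit at low density.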

Uncertainties

@mrshirts @jpotoff @msoroush

I have included the following paragraph at the end of Section 2.3.4 ("Computing saturation properties") to describe that we use bootstrap re-sampling to determine the VLE uncertainties. Note that the one exception is for the MiPPE cyclohexane results, where we have 20 replicates running right now.

Any suggestions to make this more clear?
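
As a point of comparison for the wording, here is a minimal sketch of bootstrap re-sampling over snapshots. It is not the script used for the manuscript: the names are hypothetical, and it assumes the snapshots are uncorrelated (correlated time series would need subsampling or block resampling first).

```python
import numpy as np

def bootstrap_std(samples, estimator, n_boot=200, seed=0):
    """Standard deviation of `estimator` over bootstrap resamples of `samples`."""
    samples = np.asarray(samples)
    rng = np.random.default_rng(seed)
    n = len(samples)
    # Each resample draws n snapshots with replacement and re-evaluates
    # the estimator (for VLE this would be the full saturation calculation).
    estimates = [
        estimator(samples[rng.integers(0, n, size=n)])
        for _ in range(n_boot)
    ]
    return np.std(estimates, ddof=1)
```

For example, `bootstrap_std(x, np.mean)` approximates the standard error of the mean of `x`; replacing `np.mean` with the saturation-property calculation gives the corresponding VLE uncertainty.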

Task list

Include bootstrapped uncertainties from MBAR:
- When comparing MBAR and HR
- When plotting the optimal epsilon-scaling factor
Analyze generalized Potoff and TraPPE for branched alkanes
Obtain branched alkane data
Basis function results
- Validation
- 2-D scan for CH3 and CH2
- 2-D scan for CH2 of cyclohexane with TraPPE as reference

Section 3.4: this case study ignores a paper where a Mie force field (also lambda=16) was earlier proposed for cyclohexane, with (I guess) essentially the same results. Differences should appear only because the objective function is slightly different here. The conclusion of this work is that the force field proposed here is "the most accurate". Omitting the previous study is (at least) awkward, because the authors seem to be aware of the line of developments from these authors, as the reference list shows. Is there a reason for omitting the study mentioned above from the list of "most reliable force fields from literature. Ref 22,24,27,55,56,57"?

The reviewer never clarifies which paper has a Mie force field for cyclohexane. @jpotoff @msoroush Any ideas which paper the reviewer is referring to? Who are "these authors" that are included in our reference list? I can't find one for TAMie (the only other Mie family we cite). I know there are parameters obtained with SAFT-gamma that use fractional lambda values and some single-site cyclohexane Mie models. But I don't think we have any data for those models.

Reviewer 2 Comment 2

@mrshirts @jpotoff @msoroush

Reviewer 1 Comment 6

@mrshirts @jpotoff @msoroush

On page 10 in the last paragraph, the authors state that they use CBMC moves to enhance the insertion acceptance rate. It would be nice to show the actual acceptance ratio from sample simulations in the supporting information.

@msoroush I assume GOMC outputs these values somewhere in the log file. I am not sure how we would want to present these results (figure or table?) and how many systems we would need to include (a single example or all of the systems?). How would you recommend we present the acceptance ratio data?

Title

@mrshirts @jpotoff @msoroush

I have thrown out a lot of different titles for the manuscript (although most of them are pretty similar). I would appreciate it if we could each vote on our favorite title and/or propose new ones.

Here are the top ones so far:

Histogram-free reweighting to estimate vapor-liquid coexistence properties of non-simulated force fields
Using histogram-free reweighting to estimate vapor-liquid coexistence properties of non-simulated force fields
Histogram-free reweighting enables the estimation of vapor-liquid coexistence properties for non-simulated force fields
A histogram-free reweighting approach to estimate vapor-liquid coexistence properties for non-simulated force fields
A histogram-free reweighting approach for estimating vapor-liquid coexistence properties of non-simulated force fields
Estimating vapor-liquid coexistence properties of non-simulated force fields with histogram-free reweighting

Reporting parameters from optimization

@mrshirts @jpotoff @msoroush

I thought we should include a table with the different parameter sets for each iteration of the optimization. I have updated the notation in the manuscript such that superscripts correspond to the stage. I still need to fill in the 14-6, 18-6 and 20-6 optimals for stage 2.

Any thoughts or suggestions?

Review of Introduction

@mrshirts @msoroush @jpotoff

I decided to keep track of the manuscript feedback on GitHub. You are welcome to respond via email because I will copy all comments to this issue tracker anyways:

I know that I just sent you the outline this morning, but I thought I would send you the manuscript as certain sections are completed. This way you can review the paper piecemeal at your convenience. Currently, I have a rough draft of the Introduction (see attached, alternatively, you can access the JCED_FOMMS_manuscript.tex file at https://github.com/ramess101/JCED_FOMMS_Manuscript).

I would especially appreciate your opinions whether you think the comparison of GEMC and GCMC-HR is necessary/helpful for the Introduction. Because Michael is our MBAR expert and Jeff and Mohammad are the GCMC-HR experts, please make sure my descriptions are accurate regarding these two fundamental aspects of the manuscript.
