openbiosim / sire Goto Github PK
View Code? Open in Web Editor NEWSire Molecular Simulations Framework
Home Page: https://sire.openbiosim.org
License: GNU General Public License v3.0
Sire Molecular Simulations Framework
Home Page: https://sire.openbiosim.org
License: GNU General Public License v3.0
The import of ProgressBar
is missing when trying to perform a custom measure across a trajectory.
To Reproduce
>>> import sire as sr
>>> mols = sr.load(sr.expand(sr.tutorial_url,
... "kigaki.gro", "kigaki.top"), silent=True)
>>> df = mols.atoms().trajectory().measures({
... "distance" : lambda atoms: sr.measure(atoms[8192], atoms[8191]),
... }, to_pandas=True)
This raises the error
NameError: name 'ProgressBar' is not defined
Expected behavior
It should work ;-)
(please complete the following information):
This works using the legacy API, but not with the new string grammar.
New units grammar:
In [1]: from sire import u as unit
In [2]: temp = unit("kelvin")
In [3]: print(temp)
1 K
In [4]: temp.to("celsius")
Out[4]: 1.0
Direct conversion using legacy units:
In [1]: import sire.units as units
In [2]: temp = units.kelvin
In [3]: print(temp)
1 K
In [4]: temp.to(units.celsius)
Out[4]: -272.15
Unlike for the constraint
option, the validation of perturbable_constraint
doesn't allow underscores, e.g.
sire/wrapper/Convert/SireOpenMM/openmmmolecule.cpp
Lines 86 to 162 in 4a1025e
This option is now parsed via sire.options.PerturbableConstraint
, which will sanitize all input to use underscores. As such, the creation of a dynamics object will fail for any perturbable constraint other than "none"
or "auto"
.
Hi, I'm using a cationic dummy model which incorporates 12-6-4 Lennard-Jones potential in building my system. Following system parameterisation, a new flag named LENNARD_JONES_CCOEF has been introduced to the topology file. However, upon loading and saving the topology file in biosimspace, BSS omits both this flag and another flag named IPOL.
LENNARD_JONES_CCOEF:
This section contains the LJ C-coefficients(Ci,j) for all pairs of distinct LJ types.
IPOL:
This section contains a single integer that is 0 for fixed-charge force fields and 1 for force fields that contain polarization.
I use Amber22/Ambertools23, and my parameterisation process is similar to this: https://ambermd.org/tutorials/advanced/tutorial20/12_6_4.php
Here are the prm7/rst7 files of my system before and after BioSimSpace and Sire. 1264.zip
Is it possible for BioSimSpace to consider the missing flags as the 12-6-4 LJ has been fully supported by Amber?
Additionally, I have checked that other MD engines do not support 12-6-4 LJ. Therefore, maybe could let BioSimSpace decide whether to omit these flags based on the output file format or engine chosen by the user?
Thanks!
Describe the bug
When simulations at different lambda windows are not all run for an equal amount of time, the TI result from analyse_freenrg
is incorrect by up to several kcal / mol. This is because the arrays used to hold the gradients are initialised with zeros according to the length of the longest simulation. For all simulations other than the longest, the last values will remain 0. This means that when finding the mean, the result is much closer to 0 than it should be.
I will submit a PR to fix this shortly.
(please complete the following information):
dev
releaseDescribe the bug
Errors writing Gro87 files when only some molecules have velocities defined.
This is being worked on in the fix_garbage_gro87_velocities
I have fixed the bug. Just need to add a regression test, and will then issue a PR.
I suspect there may be something similar with AmberRst7 that I will double-check too.
We'd like to support reading and writing for Schrödinger MAE (and MAEGZ) files. There is an open source reader for the underlying data structure, and an openbabel implementation that includes test files and shows how the keys relate to the molecular data.
Describe the solution you'd like
A working parser so that we can use sire.load()
and sire.save()
for MAE and MAEGZ type files.
Describe alternatives you've considered
We could look at wrapping OpenBabel and use that to convert file formats that we don't natively support. The file format is relatively simple, so a native solution shouldn't be too difficult.
It would be really useful if the existing trajectory alignment code could support alignment against arbitrary views or objects. Currently, it is only possible to align the trajectory against a sub-view of one frame from that trajectory.
Describe the solution you'd like
# align the first N atoms of this trajectory against the
# N atoms selected in `other_mols["search_string"]`
>>> aligned = mols.trajectory(align=other_mols["search_string"])
# perform an alignment with `other_mols` where the mapping
# is passed in the `mapping` argument
>>> mapping = sr.match_atoms(mols["search_string"],
... other_mols["search_string"])
>>> aligned = mols.trajectory(align=other_mols,
... mapping=mapping)
Note that we will also need a sr.match_atoms
function, which could draw heavy inspiration from BSS.matchAtoms
.
Describe the bug
toRDKit function is broken
To Reproduce
create the env with
mamba create -n BSS_test -c openbiosim/label/main -c conda-forge BioSimSpace rdkit ambertools
Then run
import BioSimSpace as BSS
mol = BSS.Parameters.openff_unconstrained_2_0_0('C').getMolecule()
BSS.Convert.toRDKit(mol)
Gives
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1 │
│ │
│ /home/ubuntu/miniconda3/envs/BSS_test/lib/python3.10/site-packages/BioSimSpace/Convert/_convert. │
│ py:518 in toRDKit │
│ │
│ 515 │ converted_obj : │
│ 516 │ The object in OpenMM format. │
│ 517 │ """ │
│ ❱ 518 │ return to(obj, format="rdkit", property_map=property_map) │
│ 519 │
│ 520 │
│ 521 def toSire(obj, property_map={}): │
│ │
│ /home/ubuntu/miniconda3/envs/BSS_test/lib/python3.10/site-packages/BioSimSpace/Convert/_convert. │
│ py:182 in to │
│ │
│ 179 │ format = format.lower().replace(" ", "") │
│ 180 │ │
│ 181 │ if not format in supportedFormats(): │
│ ❱ 182 │ │ raise ValueError( │
│ 183 │ │ │ f"Unsupported format '{format}', options are: {', '.join(supportedFormats()) │
│ 184 │ │ ) │
│ 185 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Unsupported format 'rdkit', options are: biosimspace, openmm, sire.
Expected behavior
It works.
(please complete the following information):
Describe the bug
Sorry for pushing this bug to you. It is a interchange 0.3 issue. Do you mind chat with the interchange people to fix this issue? Thanks.
To Reproduce
Create a new environment mamba create -n test -c conda-forge -c OpenBioSim/label/dev biosimspace=2023.2 AmberTools
import BioSimSpace as BSS
BSS.Parameters.parameterise("c1ccccc1C(=O)[O-]", "openff_unconstrained-2.0.0").getMolecule()
Gives
Segmentation fault (core dumped)
Expected behavior
Runs
(please complete the following information):
Additional context
Add any other context about the problem here.
In the process of creating the 2023.2.3
release I ran sire_bigtests against the release builds on all platforms. All passed, except for Windows where I see the following failure for all Python variants:
test_timer.test_timer ... FAIL
======================================================================
FAIL: test_timer.test_timer
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Miniconda3\envs\sire_build\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "D:\a\sire_bigtests\sire_bigtests\tests\SireQt\test_timer.py", line 33, in test_timer
assert ns >= 990000000
AssertionError:
-------------------- >> begin captured stdout << ---------------------
Minimum time is 0.0010999999999999998 ms
Slept for 1 s == 989.5609999999999 ms
(See here for the full log.)
This is a legacy test, i.e. it doesn't test new functionality, or anything that was updated for this patch release. The run used the exact same version of qt-main
as the last successful build, i.e. 5.15.8
.
(The 2023.2.3
packages are now labelled with main
and test
since I wanted to get out working BioSimSpace builds for the platforms that do work.)
This feature branch contains development of the code to support loading and saving of files in the SDF v3000 format.
This is proposed to be via a new SDF3000
molecule parser, which will be heavily inspired by the SDF
parser (perhaps even inherited). This will recognise files with the .sdf
and .sd3
file extensions.
>>> mols = sr.load("molecules_v3000.sdf")
>>> mols = sr.load("molecules.sd3")
The challenge will be telling the parser to save the file in v3000 format as opposed to the standard SDF format. This will either be by the file extension (if it is .sd3
) or by setting the appropriate format
flag, e.g.
>>> sr.save(mols, "molecules.sd3")
>>> sr.save(mols, "molecules.sdf", format="SDF3000")
Describe the bug
I'm running somd-freenrg
with a slurm submission script and my runs seem to only do 1 cycle (according to the somd.out
). Looking at somd.err
I only get a UserWarning and slurm does not produce any errors.
To Reproduce
I've attached one of my lambda run directories to the issue with all my files.
$BSS_HOME/somd-freenrg -C ./somd.cfg -l $lambda -c ./somd.rst7 -t ./somd.prm7 -m ./somd.pert -p CUDA 1> ./somd.out 2> ./somd.err
Traceback (most recent call last):
File "/home/jguven/Software/miniconda3/envs/obss/share/Sire/scripts/somd-freenrg.py", line 161, in <module>
OpenMMMD.runFreeNrg(params)
File "/home/jguven/Software/miniconda3/envs/obss/lib/python3.10/site-packages/sire/legacy/Tools/__init__.py", line 175, in inner
retval = func()
File "/home/jguven/Software/miniconda3/envs/obss/lib/python3.10/site-packages/sire/legacy/Tools/OpenMMMD.py", line 3041, in runFreeNrg
Sire.Stream.save([system, moves], restart_file.val)
UserWarning: std::bad_alloc
Expected behavior
All cycles should finish normally, and somd.err
should not raise the bad_alloc
warning.
Input files
scripts_for_issue.zip
issue.zip
(please complete the following information):
Describe the bug
Calling sire functions within a parallel region (for example using a worker
within concurrent.futures
) results in exceptions due to failures to import sire modules (despite sire being imported prior to parallelisation).
To Reproduce
Steps to reproduce the behavior:
Run the following code:
import sire as sr
import concurrent.futures
def get_info(mols):
map = {
"integrator": "langevin_middle",
"platform": "CPU",
"threads": 1,
"temperature": 300 * sr.units.kelvin,
}
omm = sr.convert.to(mols, "openmm", map=map)
s = omm.getState(getEnergy=True)
return s.getPotentialEnergy()
if __name__ == "__main__":
workers = 4
mols = sr.load(sr.expand(sr.tutorial_url, "ala.top", "ala.crd"), silent=True)
with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
jobs = []
results_done = []
for i in range(workers):
kw = {"mols": mols}
jobs.append(executor.submit(get_info, **kw))
for job in concurrent.futures.as_completed(jobs):
result = job.result()
results_done.append(result)
This gives the error
AttributeError: module 'sire.convert' has no attribute 'to'
Expected behavior
This code should run, in this case giving 4 identical potential energies (this is precisely what it does when run in serial)
(please complete the following information):
Additional context
This is almost certainly an issue with lazy importing, as of now the issue can be fixed by either importing sire.convert.to
within the parallel region, or calling this functionality prior to the parallel region.
The SIRE_NO_LAZY_IMPORT
doesn't appear to be functioning properly, so it doesn't fix the issue, instead it gives the error AttributeError: module 'sire' has no attribute 'units'
.
QString::compare
is used to compare property keys when swapping water topologies, i.e. to make sure that properties aren't updated if the molecule is flagged as non_searchable
. However, the return value is being treated as a bool
, when it is actually an int
, where a value of zero indicates different strings.
I have fixed an import error triggered by updates of the Sire Units code in StandardState.py
https://github.com/OpenBioSim/sire/blob/feature-fixstandardstatecorrection/wrapper/Tools/StandardState.py
This has allowed me to run the program from the command line, but I have encountered various other runtime errors.
Running
standardstatecorrection -C ./somd.cfg -r traj000000001.dcd -s 1 -b 5 -d 0.25 -o 4
with
somd.zip
(dcd file not included in the attached)
generates the following output
64], [30.611033320426944, 40.67983399118694, 40.89288466317314]]}
Traceback (most recent call last):
File "/home/julien/miniforge3/envs/sire-dev/share/Sire/scripts/standardstatecorrection.py", line 137, in <module>
StandardState.run(params)
File "/home/julien/miniforge3/envs/sire-dev/lib/python3.10/site-packages/sire/legacy/Tools/__init__.py", line 175, in inner
retval = func()
File "/home/julien/miniforge3/envs/sire-dev/lib/python3.10/site-packages/sire/legacy/Tools/StandardState.py", line 361, in run
weight_norm_factor = ROT / sum([x[1] for x in guest_orientations])
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
The TypeError happens at a line of code introduced in this previous commit e2e3709 by @fjclark
I don't understand what would have changed in Sire since November 2022 to cause this previous code to break ?
The "analyse_freenrg.py" script is currently broken. Around line 520, it tries to get the simulation temperature by parsing the input .dat file. It's assuming that it can split the line on whitespace and get the temperature from field 3 and the unit from field 4. However, the simfile.dat file now reads
#Generating temperature is 298°K
instead of
#Generating temperature is 298 K
so the parsing fails.
In addition, the code later in the script doesn't work as k_boltz now seems to have a bunch of units attached which causes an exception when it is applied to a bunch of matrix slices - I don't understand the units code well enough to work out how to fix this.
To Reproduce
Unpack the attached files. Run
sire_python --ignore-ipython analyse_freenrg.py mbar -i simfile_*.dat" -o freenrg-MBAR.dat --temperature 298.0 --percent 95 --overlap
This is with Sire 2023.1.1, Linux
Angles returned by the box_angles()
method have length units:
In [1]: import sire as sr
In [2]: s = sr.load_test_files("ala.crd", "ala.top")
In [3]: rst7 = sr.legacy.IO.AmberRst7(s._system)
In [4]: rst7.box_angles()
Out[4]: ( 90 Å, 90 Å, 90 Å )
In [5]: rst7.box_angles()[0].angle()
Out[5]: 0
In [6]: rst7.box_angles()[0].length()
Out[6]: 1
As discovered in Exscientia/biosimspace#22, the binary AMBER restart file requires a title attribute to be used with pmemd
for certain simulation types, specifically FEP. (No idea why this is the case.) From testing here all that seems to be required is that a title is present, i.e. it doesn't need to be anything specific.
We'd like to add support for searching for the closest N views to a selection. This would be useful for, e.g. saving trajectories with the closest 10 waters to a ligand.
Describe the solution you'd like
closest = mols["water and closest(5) to ligand"]
Hi,
I've recently been running RBFEs for the TYK2 benchmark set with constraints=none
, because we want to correct these results using MM -> MM/ML legs with ANI-2x. The results generated using Perses for the same network are fine:
But the SOMD results are very poor:
The largest outlier from the SOMD ddG plot is a H -> methyl transformation (ejm 46 to jmc 28):
The experimental ddG for this edge is -0.3 kcal / mol. Previous results in the group with OFF1.3 were in very good agreement with this. Without constraints, this edge gave an average ddG of ~ - 10 kcal /mol:
Leg | Run1 | Run2 | Run3 |
---|---|---|---|
Bound | 1.3 | 1.1 | 0.6 |
Free | 11.0 | 11.0 | 11.4 |
But rerunning this with constraints=hbonds
(I now realise I should have been using hbonds=hbonds-notperturbed
) resulted in an average ddG of ~ -1.0 kcal / mol:
Leg | Run1 | Run2 | Run3 |
---|---|---|---|
Bound | 1.6 | 1.8 | 1.9 |
Free | 2.7 | 2.8 | 2.7 |
The main issue seems to be with the free leg. Looking at the free overlap matrix without constraints, there are issues with poor overlap between lambda = 0.9 and 1.0:
#Overlap matrix
0.6718 0.2828 0.0431 0.0022 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.2828 0.4370 0.2372 0.0404 0.0025 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000
0.0431 0.2372 0.4320 0.2374 0.0467 0.0035 0.0001 0.0000 0.0000 0.0000 0.0000
0.0022 0.0404 0.2374 0.4188 0.2461 0.0501 0.0046 0.0004 0.0001 0.0000 0.0000
0.0000 0.0025 0.0467 0.2461 0.4034 0.2351 0.0557 0.0078 0.0019 0.0007 0.0000
0.0000 0.0001 0.0035 0.0501 0.2351 0.3998 0.2469 0.0514 0.0097 0.0034 0.0000
0.0000 0.0000 0.0001 0.0046 0.0557 0.2469 0.4395 0.2298 0.0188 0.0045 0.0001
0.0000 0.0000 0.0000 0.0004 0.0078 0.0514 0.2298 0.5658 0.1401 0.0046 0.0001
0.0000 0.0000 0.0000 0.0001 0.0019 0.0097 0.0188 0.1401 0.7699 0.0593 0.0001
0.0000 0.0000 0.0000 0.0000 0.0007 0.0034 0.0045 0.0046 0.0593 0.9231 0.0044
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0001 0.0001 0.0044 0.9953
There is a dramatic free energy change from lambda = 0.8 to lambda = 1.0:
#DG from neighbouring lambda in kcal/mol
0.0000 0.1000 0.0617 0.0281
0.1000 0.2000 0.0672 0.0291
0.2000 0.3000 -0.0325 0.0284
0.3000 0.4000 -0.0788 0.0263
0.4000 0.5000 -0.1024 0.0262
0.5000 0.6000 0.0170 0.0245
0.6000 0.7000 0.7415 0.0300
0.7000 0.8000 2.1797 0.0509
0.8000 0.9000 3.9390 0.0921
0.9000 1.0000 4.2300 0.3859
Adding constraints fixes the overlap issue:
#Overlap matrix
0.2070 0.1807 0.1584 0.1386 0.1201 0.0995 0.0626 0.0184 0.0089 0.0043 0.0014
0.1807 0.1685 0.1557 0.1423 0.1281 0.1100 0.0720 0.0231 0.0118 0.0058 0.0019
0.1584 0.1557 0.1507 0.1437 0.1345 0.1198 0.0821 0.0289 0.0158 0.0080 0.0025
0.1386 0.1423 0.1437 0.1427 0.1388 0.1286 0.0928 0.0364 0.0213 0.0111 0.0036
0.1201 0.1281 0.1345 0.1388 0.1404 0.1357 0.1046 0.0470 0.0297 0.0159 0.0052
0.0995 0.1100 0.1198 0.1286 0.1357 0.1384 0.1197 0.0673 0.0464 0.0260 0.0086
0.0626 0.0720 0.0821 0.0928 0.1046 0.1197 0.1433 0.1350 0.1044 0.0622 0.0213
0.0184 0.0231 0.0289 0.0364 0.0470 0.0673 0.1350 0.2341 0.2140 0.1429 0.0528
0.0089 0.0118 0.0158 0.0213 0.0297 0.0464 0.1044 0.2140 0.2432 0.2088 0.0956
0.0043 0.0058 0.0080 0.0111 0.0159 0.0260 0.0622 0.1429 0.2088 0.2835 0.2314
0.0014 0.0019 0.0025 0.0036 0.0052 0.0086 0.0213 0.0528 0.0956 0.2314 0.5758
and prevents the large jumps in ddG near lambda = 1:
#DG from neighbouring lambda in kcal/mol
0.0000 0.1000 0.1069 0.0007
0.1000 0.2000 0.1193 0.0006
0.2000 0.3000 0.1123 0.0005
0.3000 0.4000 0.1045 0.0005
0.4000 0.5000 0.1380 0.0007
0.5000 0.6000 0.3177 0.0019
0.6000 0.7000 0.2583 0.0031
0.7000 0.8000 0.0521 0.0017
0.8000 0.9000 0.4016 0.0026
0.9000 1.0000 1.1172 0.0050
Example inputs are here.
System Details
Additional context
This is not a pressing problem as we will use Perses for these calculations. Julien asked me to raise this as a reminder for when SOMD is redeveloped in future.
Thanks very much.
New feature request - adding the frame number of a trajectory (and possibly the time data?) to the NGLView widget, so it is easier to find a particular frame, and then extract it in the next cell.
Describe the solution you'd like
The trajectory frame number should just appear in one corner of the NGLView widget, and should change as the trajectory is played.
This is very minor but has confused me in the past: the simfile.dat
files produced by somd-freenrg
have columns containing the gradients and potential energies, with stated units of kcal/mol. However, the actual values seem to be unitless, as they are multiplied by beta before being written to the file and divided again by beta before being written to the output of analyse-freenrg
.
I'm not sure if it would be preferable to correct the headings or update the output.
Thanks very much.
While the logic used to handle TER
records, i.e. adjusting atom numbers for real TER
records and HETAM
entries, it fails when there are multiple molecules since the line index (which is used for numbering) is reset for each molecule. To fix this we will need to parse in serial, and keep a tally of the total line count as we loop over the molecules.
This feature branch contains code that generalised the positional restraints and constraints functionality available in somd-freenrg.
Is your feature request related to a problem? Please describe.
The current code offers limited options to control which atoms in a system are subject to positional restraints. Currently it is possible to appy positional restraints to every heavy atoms present a system, excluding atoms contained in a list of excluded residue names, as described here.
It is only possible to freeze atoms that are contained in a list of allowed residue names, as described here
Describe the solution you'd like
It should be possible to restrain or freeze some or all atoms present in a list of residues. Such list of atoms would be contained in a csv file autogenerated by a preprocessing script run on the prm7/rst7 files used as input to somd-freenrg. The format would look like
#ResidueNumber, #AtomNumber
1,all
2,45-54
8,all
10,185-199
The residue specific positional restraints would be activated by specifying a config file keyword option
positional restraints = path_to_restraint_file
Describe the bug
When running prepareFEP.py, water models apart from TIP3P appear to be broken
To Reproduce
sire_python /path/to/share/Sire/scripts/prepareFEP.py --input1 Input1.parm7 Input1.dcd --input2 Input2.parm7 Input2.dcd --mapping mapping.txt --output Output --allow_ring_breaking True
Expected behavior
The prepareFEP call should run OK.
Input files
Input files attached as a zip file.
(please complete the following information):
Additional context
It looks like Sire tries to work out the water model based on the number of points, given the comments in _set_water_topology in _system.py. For 3-point models it seems to assume TIP3P. For 4-point models it assumes TIP4P, but the share/Sire/templates/water doesn't contain files for TIP4P, only for TIP4Pew. Renaming the "tip4pew.rst" and "tip4pew.prm7" to "tip4p.rst7" and "tip4p.prm7" seems to let things work OK.
Questions:
if the input parmtop was set up with e.g. SPC/E water, what (if anything) bad will happen here from Sire assuming that the water is in fact TIP3P?
Similarly, what (if anything) bad will happen from renaming the tip4pew files to tip4p?
Hello,
I've been attempting to run ABFE calculations with OFF1 / OFF2. These run without crashing when constraint = hbonds
but crash immediately after minimisation upon changing to constraint = allbonds
(NaN or Inf has been generated along the simulation
). When using GAFF2, there are no issues with constraint = allbonds
.
Please see all input here. The system is a ligand in a box of water, and I've included the sdf file I used to parameterise the ligand with OFF2 (through BSS).
Thanks.
Describe the bug
Waterswap calculations will not restart successfully
To Reproduce
Run a short (2 or 3 iteration) Waterswap calculation. Restart it from the restart file.
The output you get is:
Error from WaterswapTask id 10 running on cresset22: 'Process crashed??Command: "/home/mark/.conan/data/Sire/2023.1.2/cresset/stable/package/b66dc3dc3eaad13727f79a27b0d1529ed5020351/bin/sire_python" "--ignore-ipython" "/home/mark/.conan/data/Sire/2023.1.2/cresset/stable/package/b66dc3dc3eaad13727f79a27b0d1529ed5020351/share/Sire/scripts/waterswap.py" "-l" "5KR" "-t" "/tmp/FieldEngine-mark/waterswap_350997_1/Input.prmtop" "-c" "/tmp/FieldEngine-mark/waterswap_350997_1/Input.inpcrd" "-C" "/tmp/FieldEngine-mark/waterswap_350997_1/Config.txt" "-n" "6"??Cwd: /tmp/FieldEngine-mark/waterswap_350997_1??Exit code: 0??Output:??Starting /home/mark/.conan/data/Sire/2023.1.2/cresset/stable/package/b66dc3dc3eaad13727f79a27b0d1529ed5020351/bin/sire_python: number of threads equals 16????Loading configuration information from file /tmp/FieldEngine-mark/waterswap_350997_1/Config.txt????Running a waterswap calculation using files /tmp/FieldEngine-mark/waterswap_350997_1/Input.prmtop and /tmp/FieldEngine-mark/waterswap_350997_1/Input.inpcrd.??The absolute binding free energy of the molecule containing residue 5KR will be calculated.????Number of iterations to perform == 6????Using parameters:??===============??ligand name == 5KR??nmoves == 6??pdb frequency == 1??protein crdfile == /tmp/FieldEngine-mark/waterswap_350997_1/Input.inpcrd??protein topfile == /tmp/FieldEngine-mark/waterswap_350997_1/Input.prmtop??restart frequency == 1??water monitor distance == 7 Å???===============??Loading the restart file took 3121 ms??Number of iterations to perform: 6. Number of iterations completed: 3.??Initialising / loading the free energy files took 0 ms??Performing iteration 4...??Test replicas 1 2??Test replicas 3 4??Test replicas 5 6??Test replicas 7 8??Test replicas 9 10??Test replicas 11 12??Test replicas 13 14??...iteration complete (took 104780 ms)??Analysing iteration 4...??Analysing iteration 4...??Saving PDBs of the system at iteration 4...??Traceback (most recent call last):?? File "/home/mark/.conan/data/Sire/2023.1.2/cresset/stable/package/b66dc3dc3eaad13727f79a27b0d1529ed5020351/share/Sire/scripts/waterswap.py", line 205, in <module>?? WSRC.run(params)?? File "/home/mark/.conan/data/Sire/2023.1.2/cresset/stable/package/b66dc3dc3eaad13727f79a27b0d1529ed5020351/lib/python3.9/site-packages/sire/legacy/Tools/__init__.py", line 175, in inner?? retval = func()?? File "/home/mark/.conan/data/Sire/2023.1.2/cresset/stable/package/b66dc3dc3eaad13727f79a27b0d1529ed5020351/lib/python3.9/site-packages/sire/legacy/Tools/WSRC.py", line 2979, in run?? analyseWSRC(wsrc_system, i, bennetts_freenrgs, fep_freenrgs, ti_freenrgs, bound_freenrgs, free_freenrgs,?? File "/home/mark/.conan/data/Sire/2023.1.2/cresset/stable/package/b66dc3dc3eaad13727f79a27b0d1529ed5020351/lib/python3.9/site-packages/sire/legacy/Tools/WSRC.py", line 2882, in analyseWSRC?? printComponents(res_freenrgs[iteration], FILE)?? File "/home/mark/.conan/data/Sire/2023.1.2/cresset/stable/package/b66dc3dc3eaad13727f79a27b0d1529ed5020351/lib/python3.9/site-packages/sire/legacy/Tools/WSRC.py", line 2765, in printComponents?? print("%s %s %s %s" % (comps.viewAt(i).residue(), \??KeyError: 'SireMol::missing_residue: This view does not contain any residues. (call sire.error.get_last_error_details() for more info)'??'
Expected behavior
The calculation should restart successfully.
Input files
Input files for an example restart are attached.
(please complete the following information):
The wrapper around the OpenMM LocalEnergyMinimizer
isn't catching exceptions, so there is no way to know if a minimisation fails. This means that subsequent dynamics might blow up and there is no-way to autominimise, since this will also silently fail. As an example consider this stream file for a solvated merged methane-ethane system.
First, let's do things with OpenMM directly. Here lambda = 0 works, whereas lambda = 0.4 fails:
In [1]: import BioSimSpace as BSS
In [2]: from openmm import LocalEnergyMinimizer
In [3]: system = BSS.Stream.load("Methane_Ethane_solv.bss")
In [4]: omm = BSS.Convert.toOpenMM(system)
In [5]: omm.set_lambda(0.0)
In [6]: LocalEnergyMinimizer.minimize(omm, maxIterations=100)
In [7]: omm = BSS.Convert.toOpenMM(system)
In [8]: omm.set_lambda(0.4)
In [9]: LocalEnergyMinimizer.minimize(omm, maxIterations=100)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1 │
│ │
│ /home/lester/.conda/envs/openbiosim/lib/python3.10/site-packages/openmm/openmm.py:17208 in │
│ minimize │
│ │
│ 17205 │ │ maxIterations : int │
│ 17206 │ │ │ the maximum number of iterations to perform. If this is 0, minimation is con │
│ 17207 │ │ """ │
│ ❱ 17208 │ │ return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations) │
│ 17209 │ __swig_destroy__ = _openmm.delete_LocalEnergyMinimizer │
│ 17210 │
│ 17211 # Register LocalEnergyMinimizer in _openmm: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OpenMMException: All Forces must have identical exclusions
Now let's use the Sire interface:
In [1]: import BioSimSpace as BSS
In [2]: import sire as sr
In [3]: system = BSS.Stream.load("Methane_Ethane_solv.bss")
In [4]: mols = sr.system.System(system._sire_object)
In [5]: m = mols.minimisation(lambda_value=0.0)
In [6]: m.run()
minimisation 0.3 s 3.1 checks / s
Out[6]: Minimisation()
In [7]: m = mols.minimisation(lambda_value=0.4)
In [8]: m.run()
Out[8]: Minimisation()
In [9]: m = mols.minimisation(lambda_value=0.0)
In [10]: m.run()
minimisation 0.3 s 2.9 checks / s
Out[10]: Minimisation()
Here the lambda = 0 minimisation runs, whereas the lambda = 0.4 one returns immediately with no exception. (I've run the lambda = 0 one twice to show that it wasn't trying to run on an already minimised system. In which case, I never committed the changes.)
Looking at the code here, it appears that the exception is being silently caught (and ignored) in the process of waiting for the run promise to finish. If I replace spinner.tick()
with raise
, then I get the same exception as I would when running directly, i.e.:
In [1]: import BioSimSpace as BSS
In [2]: import sire as sr
In [3]: system = BSS.Stream.load("Methane_Ethane_solv.bss")
In [4]: mols = sr.system.System(system._sire_object)
In [5]: m = mols.minimisation(lambda_value=0.4)
In [6]: m.run()
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1 │
│ │
│ /home/lester/.conda/envs/openbiosim/lib/python3.10/site-packages/sire/mol/_minimisation.py:98 in │
│ run │
│ │
│ 95 │ │ of max_iterations iterations. │
│ 96 │ │ """ │
│ 97 │ │ if not self._d.is_null(): │
│ ❱ 98 │ │ │ self._d.run(max_iterations=max_iterations) │
│ 99 │ │ │
│ 100 │ │ return self │
│ 101 │
│ │
│ /home/lester/.conda/envs/openbiosim/lib/python3.10/site-packages/sire/mol/_minimisation.py:39 in │
│ run │
│ │
│ 36 │ │ │ │ │
│ 37 │ │ │ │ while not run_promise.done(): │
│ 38 │ │ │ │ │ try: │
│ ❱ 39 │ │ │ │ │ │ run_promise.result(timeout=0.2) │
│ 40 │ │ │ │ │ except Exception: │
│ 41 │ │ │ │ │ │ raise │
│ 42 │ │ │ │ │ │ spinner.tick() │
│ │
│ /home/lester/.conda/envs/openbiosim/lib/python3.10/concurrent/futures/_base.py:458 in result │
│ │
│ 455 │ │ │ │ if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]: │
│ 456 │ │ │ │ │ raise CancelledError() │
│ 457 │ │ │ │ elif self._state == FINISHED: │
│ ❱ 458 │ │ │ │ │ return self.__get_result() │
│ 459 │ │ │ │ else: │
│ 460 │ │ │ │ │ raise TimeoutError() │
│ 461 │ │ finally: │
│ │
│ /home/lester/.conda/envs/openbiosim/lib/python3.10/concurrent/futures/_base.py:403 in │
│ __get_result │
│ │
│ 400 │ def __get_result(self): │
│ 401 │ │ if self._exception: │
│ 402 │ │ │ try: │
│ ❱ 403 │ │ │ │ raise self._exception │
│ 404 │ │ │ finally: │
│ 405 │ │ │ │ # Break a reference cycle with the exception in self._exception │
│ 406 │ │ │ │ self = None │
│ │
│ /home/lester/.conda/envs/openbiosim/lib/python3.10/concurrent/futures/thread.py:58 in run │
│ │
│ 55 │ │ │ return │
│ 56 │ │ │
│ 57 │ │ try: │
│ ❱ 58 │ │ │ result = self.fn(*self.args, **self.kwargs) │
│ 59 │ │ except BaseException as exc: │
│ 60 │ │ │ self.future.set_exception(exc) │
│ 61 │ │ │ # Break a reference cycle with the exception 'exc' │
│ │
│ /home/lester/.conda/envs/openbiosim/lib/python3.10/site-packages/sire/mol/_minimisation.py:27 in │
│ runfunc │
│ │
│ 24 │ │ from ..base import ProgressBar │
│ 25 │ │ │
│ 26 │ │ def runfunc(max_its): │
│ ❱ 27 │ │ │ LocalEnergyMinimizer.minimize( │
│ 28 │ │ │ │ self._omm_mols, maxIterations=max_its │
│ 29 │ │ │ ) │
│ 30 │
│ │
│ /home/lester/.conda/envs/openbiosim/lib/python3.10/site-packages/openmm/openmm.py:17208 in │
│ minimize │
│ │
│ 17205 │ │ maxIterations : int │
│ 17206 │ │ │ the maximum number of iterations to perform. If this is 0, minimation is con │
│ 17207 │ │ """ │
│ ❱ 17208 │ │ return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations) │
│ 17209 │ __swig_destroy__ = _openmm.delete_LocalEnergyMinimizer │
│ 17210 │
│ 17211 # Register LocalEnergyMinimizer in _openmm: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OpenMMException: All Forces must have identical exclusions
Describe the bug
bss.IO.readMolecules
is not working as excpected on lipid molecules.
To Reproduce
Steps to reproduce the behavior:
Copy the files in the working directory (they are one POPC lipd and its corresponded GROMACS topology files):
files.zip. I am Using Slipids2020 force field
Run the code
import BioSimSpace as bss
import sire
try:
membrane = bss.IO.readMolecules(["conf.gro", "topol.top"])
except:
print(sire.error.get_last_error_details())
{'where': 'FILE: /usr/share/miniconda3/envs/sire_build/conda-bld/sire_1675934655021/work/wrapper/IO/_IO_load.cpp, LINE: 160, FUNCTION: SireSystem::System load_molecules(const QStringList&, const SireBase::PropertyMap&)', 'why': 'Cannot load the molecules: There is not enough information in this parser (Gro87( title() = Protein in water, nAtoms() = 134, nResidues() = 1, nFrames() = 1, hasCoordinates() = 1, hasVelocities() = 0 )) to start the creation of a new System. You need to use a more detailed input file.', 'pid': '', 'type': 'SireError::io_error', 'backtrace': 'Backtrace disabled. Call sire.error.enable_backtrace_exceptions() to re-enable.', 'from': 'FILE: /usr/share/miniconda3/envs/sire_build/conda-bld/sire_1675934655021/work/wrapper/IO/_IO_load.cpp, LINE: 160, FUNCTION: SireSystem::System load_molecules(const QStringList&, const SireBase::PropertyMap&)'}
Expected behavior
Load the molecule
(please complete the following information):
cat /etc/*-release
:LSB_VERSION=1.4
DISTRIB_ID=Arch
DISTRIB_RELEASE=rolling
DISTRIB_DESCRIPTION="Arch Linux"
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://www.archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux
sire --version
: Sire version: 2023.1.2 (dc4c1cd42953583375f162f5f240169d568ef6a9/Christopher Woods/2023-02-09T09:22:32+00:00 | **UNCLEAN**)
Hi,
I've been encountering issues when trying to use the Align.merge() functionality.
Code:
paramd = []
for filebase in ['ejm31', 'ejm42']:
mol = BSS.IO.readMolecules([f"{filebase}.sdf"])
param_mol = BSS.Parameters.gaff2(mol.getMolecule(0)).getMolecule()
paramd.append(param_mol)
ejm31, ejm42 = paramd
merged = BSS.Align.merge(
ejm31,
ejm42,
force = True
)
Error:
Large stack trace so added as text file
Input files
I am using the tyk2 data files provided in the old BioSimSpaceTutorials repo.
ejm31.sdf, ejm42.sdf
(please complete the following information):
Many thanks,
Noah
I assume this is because the SMILES conversion is failing, but it's possible to get back an empty molecule, e.g:
In [1]: import sire as sr
In [2]: sr.smiles("c3cc[c+]2cc(C1CCCC1)[nH]c2c3")
[13:03:23] Can't kekulize mol. Unkekulized atoms: 0 1 2 4 5 12 13
Out[2]: Molecule( :0 num_atoms=0 num_residues=0 )
If an exception was raised by rdkit
then it would be good if this was caught. From working with BioSimSpace, I know that rdkit
often doesn't raise an exception and just returns nothing, so it might be necessary to check for this too.
If I go via rdkit
directly, then I see the following:
In [1]: from rdkit import Chem
In [2]: rdmol = Chem.MolFromSmiles("c3cc[c+]2cc(C1CCCC1)[nH]c2c3")
[13:05:40] Can't kekulize mol. Unkekulized atoms: 0 1 2 4 5 12 13
In [3]: type(rdmol)
Out[3]: NoneType
This feature branch contains the code developed to support calculating internal energies of molecules parameterised to include Gromacs-style dihedrals (e.g. specifically the harmonic improper term). This will also include adding support to the OpenMM converter for writing Urey-Bradley angle terms and harmonic improper terms.
This should be fully automatic and require no changes to the API. Essentially molecules that contain such internal terms will now not trigger exceptions saying that only amber-style dihedrals are supported.
>>> mols = sr.load("files_containing_gromacs_dihedrals")
>>> mols.energy()
Result!
>>> mols = mols.dynamics().run(5*sr.units.picosecond).commit()
Result!
Describe the bug
An error when BioSimSpace is imported : ImportError: DLL load failed while importing _Base: The specified module could not be found
To Reproduce
Steps to reproduce the behavior:
ImportError Traceback (most recent call last)
Cell In[4], line 1
----> 1 import BioSimSpace
File ~\anaconda3\envs\openbiosim\lib\site-packages\BioSimSpace_init_.py:71
68 pass
70 try:
---> 71 import sire
73 del sire
74 except ModuleNotFoundError:
File ~\anaconda3\envs\openbiosim\lib\site-packages\sire_init_.py:1
----> 1 from . import config
3 import os as _os
4 import warnings as _warnings
File ~\anaconda3\envs\openbiosim\lib\site-packages\sire\config_init_.py:13
1 all = [
2 "binary_directory",
3 "include_directory",
(...)
10 "version_string",
11 ]
---> 13 from ..legacy import Config as _Config
15 from ..legacy.Config import (
16 binary_directory,
17 include_directory,
(...)
23 sire_repository_branch,
24 )
26 version_string = _Config.versionString
File ~\anaconda3\envs\openbiosim\lib\site-packages\sire\legacy_init_.py:15
13 from . import Qt
14 from . import Error
---> 15 from . import Config
17 version = Config.version
19 branch = Config.sire_repository_branch
File ~\anaconda3\envs\openbiosim\lib\site-packages\sire\legacy\Config_init_.py:7
1 #################################
2 #
3 # Defines some variables that say
4 # where Sire is installed etc.
5 #
----> 7 import sire.legacy.Base as _Base
8 import os as _os
10 binary_directory = _Base.getBinDir()
File ~\anaconda3\envs\openbiosim\lib\site-packages\sire\legacy\Base_init_.py:2
1 from ..Units import _Units # Need to import so that we have GeneralUnit
----> 2 from ._Base import *
4 _wrap_functions = []
6 _base_wrap = wrap
ImportError: DLL load failed while importing _Base: he specified module could not be found.
(please complete the following information):
In trying to fix this issue I'd like to translate a molecule (or molecules) back to the box center based on their current center-of-mass. Doing this via the Sire object evaluator doesn't work if the molecule is wrapped across the boundary of any periodic box since the calculation doesn't take account of the space.
It is possible to fix this in BioSimSpace by manually computing the center of mass as shown below, but this is slow for large systems. Is it possible to get the evaluator to respect the space, or should I just add a C++ utility function to do this for me, e.g. like the other BIoSimSpace ones.
Here is a unit test that reproduces the issue:
from math import isclose as _isclose
from sire.legacy.Maths import Vector as _SireVector
import BioSimSpace as BSS
def test_com_wrapping():
# Load the single-molecule system.
system = BSS.IO.readMolecules("wrapped.*7")
# Compute the centre-of-mass using Sire's evaluator.
com_sire = system[0]._sire_object.evaluate().centerOfMass()
# Now compute it manually, taking into account periodic boundaries.
# We want the centre-of-mass of the molecule as a connected object.
# Store the space.
space = system._sire_object.property("space")
# Get the first atom in the molecule..
atom = system[0].getAtoms()[0]
total_mass = atom._sire_object.property("mass").value()
# Reference coordinate.
ref_coord = atom._sire_object.property("coordinates")
# Initialise the centre-of-mass.
com_bss = total_mass * atom._sire_object.property("coordinates")
# Sum over the rest of the atoms.
for atom in system[0].getAtoms()[1:]:
# Get the mass and update the total.
mass = atom._sire_object.property("mass").value()
total_mass += mass
# Get the coordinate and add to the reference coord using the
# its distance from the reference in the minimum image convention.
coord = atom._sire_object.property("coordinates")
coord = ref_coord + _SireVector(space.calcDistVector(ref_coord, coord))
# Update the centre of mass.
com_bss += mass * coord
# Normalise.
com_bss /= total_mass
# Make sure the centre-of-masses are approximately the same.
for x, y in zip(com_sire, com_bss):
assert _isclose(x.value(), y.value())
The input files are here.
As you can see, the molecule is wrapped across the periodic boundary in the input file.
The two center of masses that are computed are:
# Sire
( 18.4199 Å, 28.3353 Å, 3.9244 Å )
# Manual
( 31.0734 Å, 31.8911 Å, 0.871955 Å )
Note that I've needed to use the manual center-of-mass calculation when working out the correct location to place a funnel during the funnel metadynamics setup, i.e. locating the binding site of the protein, which is performed here.
I am also in the process of converting the existing ABFE restraint generation code, which uses MDAnalysis searches, over to a native Sire implementation. This performs some distance measurements along a trajectory so will also need to deal with wrapping.
Perhaps this needs to be solved by having a way to unwrap the molecules. (I think there might already be a way to do this with the new trajectory code, and I know that MDTraj has a simple way to do this.)
Let me know what you think. I'm happy to add my own function within biosimspace.cpp to achieve what I want, but was wondering if it was already possible with the existing code.
For example, to do this with mdtraj
we can do:
In [1]: import mdtraj
In [2]: traj = mdtraj.load("wrapped.rst7", top="wrapped.prm7")
In [3]: 10*mdtraj.compute_center_of_mass(traj)
Out[3]: array([[18.41984188, 28.33505072, 3.92459827]])
In [4]: traj.make_molecules_whole()
Out[4]: <mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, and unitcells at 0x7f36bd391090>
In [5]: 10*mdtraj.compute_center_of_mass(traj)
Out[5]: array([[31.07343482, 31.89108144, 0.87194654]])
Cheers.
Hello,
As suggested recently, I attempted to install BioSimSpace in a way which allowed for easy modification:
mamba create -n openbiosim-az -c conda-forge -c openbiosim/label/dev biosimspace --only-deps
mamba activate openbiosim-az
mamba install -c conda-forge ambertools gromacs pytest
git clone https://github.com/openbiosim/biosimspace
cd biosimspace/python
BSS_SKIP_DEPDENCIES=1 python setup.py develop
cd ..
pytest test
This works and all tests pass on my Ubuntu 22 machine. However, on my Windows 10 machine, I get the following error:
Traceback (most recent call last):
File "C:\Users\finla\Documents\research\test_graeme\biosimspace\python\setup.py", line 38, in <module>
import sire.legacy.Base
File "C:\Users\finla\anaconda3\envs\openbiosim-az\lib\site-packages\sire\__init__.py", line 1, in <module>
from . import config
File "C:\Users\finla\anaconda3\envs\openbiosim-az\lib\site-packages\sire\config\__init__.py", line 13, in <module>
from ..legacy import Config as _Config
File "C:\Users\finla\anaconda3\envs\openbiosim-az\lib\site-packages\sire\legacy\__init__.py", line 15, in <module>
from . import Config
File "C:\Users\finla\anaconda3\envs\openbiosim-az\lib\site-packages\sire\legacy\Config\__init__.py", line 7, in <module>
import sire.legacy.Base as _Base
File "C:\Users\finla\anaconda3\envs\openbiosim-az\lib\site-packages\sire\legacy\Base\__init__.py", line 3, in <module>
from ._Base import *
ImportError: DLL load failed while importing _Base: The specified module could not be found.
My path is:
['C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az\\Scripts',
'C:\\Program Files\\PerkinElmerInformatics\\ChemOffice2021\\ChemScript\\Lib',
'C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az\\python39.zip',
'C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az\\DLLs',
'C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az\\lib',
'C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az',
'',
'C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az\\lib\\site-packages',
'C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az\\lib\\site-packages\\win32',
'C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az\\lib\\site-packages\\win32\\lib',
'C:\\Users\\finla\\anaconda3\\envs\\openbiosim-az\\lib\\site-packages\\Pythonwin']
I've tried adding other likely paths to the DLL by adding
sys.path.append(r"C:\Users\finla\anaconda3\envs\openbiosim-az\Library\bin")
sys.path.append(r"C:\Users\finla\anaconda3\envs\openbiosim-az\Library")
to the start of setup.py
, but got the same result.
I then tried to install sire from source to see of this would help, starting with a clean environment and following the gh instructions:
mamba create -n openbiosim-dev "python<3.10"
mamba activate openbiosim-dev
mamba install cmake pip-requirements-parser
mamba install conda-build
git clone https://github.com/OpenBioSim/sire
cd sire
python setup.py --install-bss-deps install
However, setup.py
failed with:
Compiling the C++ code
Using compilers None | None
-- Building Sire 2023.1.2
-- Configuring on a machine with number of cores = 8
-- Building into an existing conda installation...
-- Sire will be compiled and installed to directory C:/Users/finla/anaconda3/envs/openbiosim-dev/pkgs/sire-2023.1.2
-- sire.app root directory is C:/Users/finla/anaconda3/envs/openbiosim-dev
-- ABOUT TO RUN PROJECT
CMake Error at CMakeLists.txt:105 (project):
Generator
Visual Studio 15 2017 Win64
could not find any instance of Visual Studio.
-- Configuring incomplete, errors occurred!
See also "C:/Users/finla/Documents/research/test_graeme/sire/build/openbiosim-dev_openbiosim-dev_corelib/CMakeFiles/CMakeOutput.log".
SOMETHING WENT WRONG WHEN USING CMAKE ON CORELIB!
== OUTPUT LOG ==
The system is: Windows - 10.0.19044 - AMD64
== ERROR LOG ==
[Errno 2] No such file or directory: 'CMakeFiles/CMakeError.log'
The only line in CMakeOutput.log was
The system is: Windows - 10.0.19044 - AMD64
Following this issue, I attempted to fix this by installing a few extras such as C++ CMake Tools for Windows and MSVC VS 2017 C++ build tools before restarting. However, I ran into the same issue.
Thanks.
Describe the bug
I've recently been trying to implement multiple distance restraints for absolute binding free energy calculations in BioSimSpace and SOMD (modified BSS, modified sire, and an example of how to run a complete calculation). The idea is that the ligand intermolecular interactions are removed with many distance restraints active, then there's a final stage where all but one of the restraints are released. The correction for releasing the ligand to the standard state volume can then easily be calculated.
During the final release stage, where the ligand is completely non-interacting I've been getting crashes. The config file I was using with my modified version of sire was:
save coordinates = True
ncycles = 50
nmoves = 25000
ncycles_per_snap = 4
buffered coordinates frequency = 500
timestep = 4 * femtosecond
reaction field dielectric = 78.4
cutoff type = cutoffperiodic
cutoff distance = 12 * angstrom
barostat = True
pressure = 1.00000 atm
thermostat = True
temperature = 300.00 kelvin
constraint = hbonds-notperturbed
energy frequency = 100
lambda array = (0.0, 0.125, 0.25, 0.375, 0.5, 1.0)
lambda_val = 0.0
perturbed residue number = 293
random seed = -1
gpu = 0
center solute = False
minimise = True
hydrogen mass repartitioning factor = 3
use distance restraints = True
use permanent distance restraints = True
distance restraints dictionary = {(1474, 4687): (4.432198728689354, 20.0, 0.5557321930560404), (1472, 4686): (3.164351608202915, 20.0, 0.6751681461642662), (1474, 4690): (4.700974930995326, 20.0, 0.6701990692583388), (1488, 4688): (6.7858681626072315, 20.0, 0.7488970818377805), (2444, 4677): (4.863475388033847, 20.0, 0.7671705708955612), (1472, 4691): (6.367026275053535, 20.0, 0.821252569114769), (2261, 4683): (6.773986513796716, 20.0, 0.8828616026118992), (1490, 4685): (7.720582022157998, 20.0, 0.9347068069890287), (286, 4674): (3.8127377709679453, 20.0, 0.9511268213071902), (374, 4680): (6.181106731821419, 20.0, 0.9637979205880889), (1521, 4692): (3.7598878700775633, 20.0, 0.9830902561404447), (1529, 4684): (7.87209938990775, 20.0, 1.0582870196044247), (374, 4693): (4.975946331072206, 20.0, 1.063773425343931), (346, 4676): (8.470528418491696, 20.0, 1.0492604125738865), (284, 4675): (4.725973194492003, 20.0, 1.1798376555744614), (390, 4682): (8.350568045090748, 20.0, 1.2118321370763052), (2436, 4679): (4.32826597759919, 20.0, 1.2565532487514752), (307, 4678): (7.55090821416431, 20.0, 1.3127014576568437), (1421, 4681): (8.80522670899781, 20.0, 1.3688291245090927), (259, 4673): (9.195983552051707, 20.0, 2.1342422686136686)}
permanent distance restraints dictionary = {(1474, 4689): (3.875183995573175, 20.0, 0.5185270232355128)}
turn on receptor-ligand restraints mode = True
And the pert file just mapped dummy atoms to dummy atoms, e.g.:
version 1
molecule LIG
atom
name C
initial_type du
final_type du
initial_LJ 0.00000 0.00000
final_LJ 0.00000 0.00000
initial_charge 0.00000
final_charge 0.00000
endatom
atom
name C1
initial_type du
final_type du
initial_LJ 0.00000 0.00000
final_LJ 0.00000 0.00000
initial_charge 0.00000
final_charge 0.00000
endatom
...
I still observed crashes at random values of lambda if I dropped the timestep to 1 fs and if I switched to Langevin dynamics.
To check if this was due to changes I had made to sire or the restraints, I ran some sets of calculations with Sire 2023.4.0 (a073701/GitHub/2023-08-07T17:45:41+01:00 | UNCLEAN) but with ligand still non-interacting and with no restraints. Please see the input files. For each test I ran four repeats of each window. I got two crashes at lambda = 0 (Nan) and all simulations at lambda =1 resulted in Segmentation fault (core dumped) somd-freenrg -C somd.cfg -p CUDA -m somd.pert -c somd.rst7 -t somd.prm7
.
Have I chosen some configuration parameters poorly, or should this not be happening?
Also, when I watch the trajectories back, the ligand makes large jumps after every 50th frame. There are 600 frames, and 600 / 50 =12 and ncycles / ncycles_per_snap = 12.5, so I assume this is an artifact of how the trajectories are written?
To Reproduce
Please see the input files. Run run.sh. with sire 2023.4.0. (I observed two crashes in four repeats of the stage).
Expected behavior
Stable MD.
(please complete the following information):
Thank you.
This feature branch contains development of the code needed to save in-memory trajectory frames to a file.
>>> mols = mols.dynamics().run(10*sr.units.picosecond).commit()
>>> sr.save(mols.trajectory(), "output_file", format="format_string")
The addition is to sr.save
to recognise when a Trajectory
is passed. When it is, each frame in the Trajectory
is iterated and placed into the output_file
, which is opened using the MoleculeParser
class dedicated to that file format (or file formats), as specified in the format_string
(or list of format_strings
).
The user controls how much of the trajectory to save by using the existing controls in the .trajectory()
function to limit / subset etc.
To append frames (which will be tricky given the flexibility afforded by the api...) we could do something like
>>> sr.save(mols.trajectory(), "output_file", format="format_string",
... append=True)
We would just have to trust that the coordinate/velocity files are compatible, that the frame spacing is compatible etc. Trying to write something that checks that this is sane may be difficult, but may also not be needed given that this function would normally be called as part of dynamics, e.g.
>>> d = mols.dynamics()
>>> for chunk in range(0, 5):
... d.run(10*sr.units.picosecond)
... mols = d.commit()
... sr.save(mols.trajectory(), "output.crd", append=True)
To save each chunk to a new file we could use
>>> last_frame_index = mols.num_frames()
>>> d = mols.dynamics()
>>> for chunk in range(0, 5):
... d.run(10*sr.units.picosecond)
... mols = d.commit()
... sr.save(mols.trajectory()[last_frame_index:],
... f"output_{chunk}.crd")
... last_frame_index = mols.num_frames()
(or similar)
Describe the bug
Running a somd-freenrg calculation with the latest sire/devel produces the following output (excerpt)
(...)
###===========================================================###
Initial energy: 2.25653e+08 g Å^2 mol-1 ps-2
###=======================Minimisation========================###
Running minimization.
Tolerance for minimization: 1
Maximum number of minimization iterations: 1000
Energy after the minimization: -6.70368e+07 g Å^2 mol-1 ps-2
Energy minimization done.
###===========================================================###
###====================somd-freenrg run=======================###
Starting somd-freenrg run...
10000 moves 10 cycles, 400 ps simulation time
(...)
Expected behavior
The units of energy used to be in kcal.mol-1 which are more intuitive to work with
To reproduce run the command below on the sample inputs attached, this should be reproducible with any somd input.
somd-freenrg -C ./somd.cfg -p CUDA
input.zip
Calling system.ensemble()
will return the wrong temperature units, e.g.:
In [1]: import sire as sr
In [2]: s = sr.load_test_files("ala.crd", "ala.top")
In [3]: d = s.dynamics(temperature="300K")
In [4]: d.run("10ps")
dynamics 2.0 s 4992.5 steps / s
Out[4]: Dynamics(completed=6010 ps, energy=-4493.34 kcal mol-1, speed=431.4 ns day-1)
In [5]: ss = d.commit()
In [6]: ss.ensemble()
Out[6]: canonical (NVT) ensemble { temperature = 300 C }
Looking at the code in corelib, it should be being converted to Celsius, i.e.:
sire/corelib/src/libs/SireMove/ensemble.cpp
Lines 359 to 362 in 1b249c3
If I pass in a value in Celsius, then it is being converted on output, i.e.:
In [1]: import sire as sr
In [2]: s = sr.load_test_files("ala.crd", "ala.top")
In [3]: d = s.dynamics(temperature=300*sr.units.celsius)
In [4]: d.run("10ps")
dynamics 1.5 s 6872.9 steps / s
Out[4]: Dynamics(completed=6010 ps, energy=-600.677 kcal mol-1, speed=593.7 ns day-1)
In [5]: ss = d.commit()
In [6]: ss.ensemble()
Out[6]: canonical (NVT) ensemble { temperature = 573.15 C }
In this case, it appears to be converted to Kelvin?
In [7]: (300*sr.units.celsius).to(sr.units.kelvin)
Out[7]: 573.15
Describe the bug
I am trying to modify BioSimSpace so that specifying restraint="backbone"
also works for nucleic acids. However, several nucleic acid backbone atoms contain ' in their names, which results in an error:
sys.search("atomname O5'")
Gives:
ValueError: 'Invalid search query: "atomname O5'" : SireMol::parse_error: Failed to parse any of the selection 'atomname
O5''. (call sire.error.get_last_error_details() for more info)
I am doing this correctly? Apologies if I have missed this in the tutorials. Thanks very much.
To Reproduce
Here is my RNA test system.
Steve in my group has come across this problem. He’s trying to put together an OpenMM tutorial showing how to import a solvated protein-ligand complex in OpenMM to run an MD simulation.
He started by getting inputs from here
# Clone the repository
git clone https://github.com/MCompChem/fep-benchmark.git
# Take the last ligand from the hif2a benchmark
tail -n90 fep-benchmark/hif2a/ligands.sdf > PT2385.sdf
# Take the prepared protein structure
cp fep-benchmark/hif2a/5tbm_prepared.pdb .
receptor_path = "5tbm_prepared.pdb"
ligand_path = "PT2385.sdf"
He then adapted this openff notebook to generate via interchange openmm inputs
I asked him if he could also use this BSS node
This doesn’t work because the pdb file 5tbm_prepared.pdb comes from Maestro and it contains a few atom and residue names that differ from those used in leap templates. Thus the BSS line in the above script
prot_p = BSS.Parameters.parameterise(protein, node.getInput("proteinff")).getMolecule()
generates in the backend a leap process that fails due to the following issues.
FATAL: Atom .R<ACE 1>.A<1H 7> does not have a type.
FATAL: Atom .R<ACE 1>.A<2H 8> does not have a type.
FATAL: Atom .R<ACE 1>.A<3H 9> does not have a type.
FATAL: Atom .R<LEU 2>.A<H1 20> does not have a type.
FATAL: Atom .R<HIE 11>.A<HD1 18> does not have a type.
FATAL: Atom .R<HIE 76>.A<HD1 18> does not have a type.
FATAL: Atom .R<NMA 112>.A<N 1> does not have a type.
FATAL: Atom .R<NMA 112>.A<CA 2> does not have a type.
FATAL: Atom .R<NMA 112>.A<H 3> does not have a type.
FATAL: Atom .R<NMA 112>.A<HA1 4> does not have a type.
FATAL: Atom .R<NMA 112>.A<HA2 5> does not have a type.
FATAL: Atom .R<NMA 112>.A<HA3 6> does not have a type.
FATAL: Atom .R<HOH 113>.A<O 1> does not have a type.
FATAL: Atom .R<HOH 113>.A<H1 2> does not have a type.
FATAL: Atom .R<HOH 113>.A<H2 3> does not have a type
FATAL: Atom .R<HOH 114>.A<O 1> does not have a type.
FATAL: Atom .R<HOH 114>.A<H1 2> does not have a type.
FATAL: Atom .R<HOH 114>.A<H2 3> does not have a type.
FATAL: Atom .R<HOH 115>.A<O 1> does not have a type.
FATAL: Atom .R<HOH 115>.A<H1 2> does not have a type.
FATAL: Atom .R<HOH 115>.A<H2 3> does not have a type.
FATAL: Atom .R<HOH 116>.A<O 1> does not have a type.
FATAL: Atom .R<HOH 116>.A<H1 2> does not have a type.
FATAL: Atom .R<HOH 116>.A<H2 3> does not have a type.
The OpenFF notebook is able to deal with these issues because it uses PDBFixer to rename atoms and residues in ACE, LEU and NMA using Amber naming conventions. The OpenFF Topology object is also able load a pdb that contains multiple molecules and is presumably clever enough to work out which type of histine it is dealing with on the basis of the atom names found.
These issues can be fixed by running PDBFixer on the downloaded pdb, and then manually renaming HIE 11 and HIE 76 to HID 11, HID 76, and deleting the water molecules to end up with a single protein molecule.
I thought I could put together a simple sire script to automate the above steps. That is
1) Rename atom 1H in residue ACE 1 to H1
2) Rename atom 2H in residue ACE 1 to H2
3) Rename atom 3H in residue ACE 1 to H3
4) Rename atom H1 in residue LEU 2 to H
5) Rename all atoms in residue HIE 11 to HID 11
6) Rename all atoms in residue HIE 76 to HID 76
7) Move HOH 114, 115, 116 to new molecules.
8) Save the updated protein, and the water molecules in separate pdb files.
This would allow me to then use the updated protein pdb with the BSS parameterisation node. I could also update the BSS parameterisation node to also parameterise the crystallographic waters and add them to the system.
I thought I could use cursors to select the atoms (https://sire.openbiosim.org/tutorial/part03/02_cursors.html) . While I can select them easily it’s not clear if there is with the new API a simple way to rename atom names, rename residue names, delete atoms etc… ?
Basically a few pointers on how this could be done would be great.
Currently the cursor.rotate function only rotates the coordinates
property, when it should really rotate all vector properties, i.e. velocity
too. At present this can be achieved by calling the function multiple times, using the map
option to remap coordinates to velocities, e.g.:
# Here m is a rotation matrix.
# First the coordinates.
cursor = cursor.rotate(matrix=m)
# Now the velocities.
cursor = cursor.rotate(matrix=m, map={"coordinates": "velocity"})
SOMD via OpenMMD.py
specifies and inverse friction
coefficient with units of picoseconds, e.g. here. The default value can be found via:
somd-freenrg -H | grep -A2 inverse
inverse friction = 0.1 ps
Inverse friction time for the Langevin thermostat.
This is set in the OpenMMFrEnergyST
object by using setFriction(inverse_friction.val)
. This is then applied to the OpenMM integrator as the friction coefficient directly, which should be in units of inverse picoseconds:
const double converted_friction = convertTo(friction.value(), picosecond);
integrator_openmm = new OpenMM::LangevinMiddleIntegrator(converted_Temperature, converted_friction, dt);
Here's the OpenMM docs showing that it should be in inverse picoseconds.
The unit for the inverse friction coefficient is correct (picoseconds) but the value is not being inverted when creating the integrator. To work around this in BioSimSpace I have applied the thermostat time constant in an inverse fashion to the other engines, but have incorrectly stripped the unit which is causing SOMD to crash on load.
Describe the bug
Running the below script will always (eventually) end in a segfault
To Reproduce
python test_sire_crash_py.txt
Expected behavior
Doesn't segfault ;-)
(please complete the following information):
This is consistent across all OSs and platforms.
(this will crash with any topology file - change the one in the script to one you want to test)
Previously we always automatically applied a lattice rotation and reduction when a TriclinicBox
object was instantiated. As such, the binary data stream only needed to store the original box vectors that were passed in and everything could be reconstructed from those. (This allowed us to store the minimum information.) Since we no longer automatically rotate and reduce, TricnlicBox
objects are no longer reconstructed correctly if they were rotated or reduced before being streamed to file. We now need to store all instance attributes, or some flags to say that a rotation and/or reduction have been performed.
This feature branch contains development of the code relating to aligning frames of a trajectory based on coordinates of selected views in the trajectory. This will be useful for analysis and also for visualisation via mol.view()
.
>>> aligned_trajectory = mols.trajectory().align("search string")
>>> type(aligned_trajectory)
Trajectory (with extra info about the alignment)
>>> ... access aligned_trajectory just like any other trajectory object
Each frame obtained from the aligned_trajectory
will just-in-time align the coordinates based on matching the atoms in search_string
. This search will turned into a list of atoms when the aligned trajectory is constructed. The match will result in a Transform
object that can be used to translate and rotate the coordinates in that trajectory Frame
to the aligned reference coordinate frame.
For visualisation, you could use the existing .view()
function on the trajectory, e.g.
>>> mols.trajectory().align("search string").view()
I propose also adding an extra align
option to the view()
function that does the same thing, e.g.
>>> mols.view(align="search_string")
Under the hood...
if align is not None:
view = view.trajectory().align(align)
Describe the bug
If a system has more than 100'000 atoms, BioSimSpace.IO.saveMolecules() messes up the PDB format
To Reproduce
import BioSimSpace as BSS
ligand_coord = BSS.IO.readMolecules('aspirine.sdf')[0]
ligand = BSS.Parameters.parameterise(ligand_coord, 'gaff2').getMolecule()
box = [110 * BSS.Units.Length.angstrom] * 3
angles = [90 * BSS.Units.Angle.degree] * 3
solvated = BSS.Solvent.solvate('tip3p', ligand, box=box, angles=angles)
BSS.IO.saveMolecules("test.pdb", solvated , 'pdb')
Expected behavior
Correct PDB formatting
Input files
aspirine.sdf (Can be downloaded from Pubchem)
(please complete the following information):
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
Steps to reproduce the behavior:
import BioSimSpace as BSS
mol = BSS.IO.readMolecules('TAD-03001_minimised.sdf')
Unable to read the file. Errors are below.
== /data/temp/TAD-03001_minimised.sdf ==
This file could not be parsed by any of the file parsers! It was recognised as a file of type sdf, but all parsers failed to parse this file. The errors from the parsers associated with the suffix sdf are printed below:
*-- Failed to parse '/data/temp/TAD-03001_minimised.sdf' with parser 'SDF'.
The file is not recognised as being of the required format.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module> │
│ │
│ ❱ 1 mola = BSS.IO.readMolecules('TAD-03001_minimised.sdf') │
│ 2 │
│ │
│ /data/miniconda3/envs/openbiosim-dev/lib/python3.9/site-packages/BioSimSpace/IO/_io.py:528 in │
│ readMolecules │
│ │
│ 525 │ │ │ │ if _isVerbose(): │
│ 526 │ │ │ │ │ raise IOError(msg) from e0 │
│ 527 │ │ │ │ else: │
│ ❱ 528 │ │ │ │ │ raise IOError(msg) from None │
│ 529 │ │
│ 530 │ return _System(system) │
│ 531 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: Failed to read molecules from: ['TAD-03001_minimised.sdf']
Expected behavior
It should not raise an exception.
dev
Describe the bug
Solvate a system with rhombic dodecahedron square box would give a non-integer degree
To Reproduce
import BioSimSpace.Sandpit.Exscientia as BSS
mol = BSS.Parameters.openff_unconstrained_2_0_0("C").getMolecule()
solvated = BSS.Solvent.tip3p(mol, box=[4*BSS.Units.Length.nanometer]*3, angles=[120*BSS.Units.Angle.degree, 120*BSS.Units.Angle.degree, 90*BSS.Units.Angle.degree])
solvated.getBox()[1][0].value()
gives 119.99998318630921
Expected behavior
gives 120 or at least to the precision of double
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.