Giter Club home page Giter Club logo

qubekit's Introduction

QUBEKit_logo

QUBEKit - Quantum Mechanical Bespoke force field toolkit

Newcastle University UK - Cole Group

Status Code style: black CI codecov Integration tests
Foundation License: MIT python platforms Conda (channel only)
Installation Anaconda-Server Badge Anaconda-Server Badge Anaconda-Server Badge

Table of Contents

What is QUBEKit?

QUBEKit is a Python 3.7+ based force field derivation toolkit for Linux operating systems. Our aims are to allow users to quickly derive molecular mechanics parameters directly from quantum mechanical calculations. QUBEKit pulls together multiple pre-existing engines, as well as bespoke methods to produce accurate results with minimal user input. QUBEKit aims to avoid fitting to experimental data where possible while also being highly customisable.

Users who have used QUBEKit to derive force field parameters should cite the following references (also given in the CITATION.cff file):

In Development

While QUBEKit is stable we are constantly working to improve the code and broaden its compatibilities.

We use software written by many people; if reporting a bug, please (to the best of your ability) make sure it is a bug with QUBEKit and not with a dependency. We welcome any suggestions, issues or pull requests for additions or changes.

Installation

QUBEKit is available through conda-forge; this is the recommended installation method. Github has our latest version which will likely have newer features but may not be stable.

# Recommended
conda install -c conda-forge qubekit

# Recommended for Developers (see below)
git clone https://github.com/qubekit/qubekit.git
cd <install location>
python setup.py develop    

Requirements

Download Anaconda from the above link and install with the linux command:

./Anaconda3<version>.sh

You may need to use chmod +x Anaconda3<version>.sh to make it executable.

We recommend you add conda to your .bashrc when prompted.

Optionally, you may choose to use Gaussian. Installation of Gaussian is likely handled by your institution. Gaussian is an option for QM optimisations, hessian calculations and density calculations. QUBEKit contains alternative approaches for these calculations.

Minimal conda packages are included in the conda-forge install with all optional engine packages left to the user to install. If there are dependency issues or version conflicts in your environment, packages can be installed individually.

conda install -c conda-forge qubekit

The following table details some optional calculation engine packages which are not included in the QUBEKit conda package and how to install them.

Package Conda Install
PSI4 >=1.4.1 conda install -c psi4 psi4
Chargemol conda install -c conda-forge chargemol
xtb-python conda install -c conda-forge xtb-python
torchani conda install -c conda-forge torchani

Adding lots of packages can be a headache. If possible, install using Anaconda through the terminal. This is generally safest, as Anaconda should deal with versions and conflicts in your environment. Generally, conda packages will have the conda install command on their website or github. As a last resort, either git clone them and install:

git clone https://<git_address_here>
cd <location of cloned package>
python setup.py install

or follow the described steps in the respective documentation.

Installing as dev

If downloading QUBEKit to edit the latest version of the source code, the easiest method is to install via conda, then remove the conda version of qubekit and git clone. This is accomplished with a few simple commands:

# Install QUBEKit as normal
conda install -c conda-forge qubekit

# Remove ONLY the QUBEKit package itself, leaving all dependencies installed
# and on the correct version
conda remove --force qubekit

# Re-download the latest QUBEKit through github
git clone https://github.com/qubekit/qubekit.git

# Re-install QUBEKit outside of conda
cd qubekit
python setup.py develop

Help

Below is general help for most of the commands available in QUBEKit. There is some short help available through the terminal (invoked with --help). --help may be called for any command you are unsure of:

qubekit --help

qubekit bulk run --help

qubekit config create --help

All necessary long-form help is within this document.

Config files

QUBEKit has a lot of familiar settings which are used in production and changing these can result in very different force field parameters. The settings are controlled using json style config files which are easy to edit.

QUBEKit does have a full suite of defaults built in. To create a config file template with the name "example", containing default options, run the command:

qubekit config create example.json

Simply edit the sections as desired to change the method, basis set, memory, etc.

This config file can then be used to execute a QUBEKit job using the --config or -c flag in the run command.

QUBEKit Commands: Running Jobs

A single molecule job can be executed with the run command.

Some form of molecule input must also be supplied; either a file (designated with the --input-file or -i flag), or a smiles string (--smiles or -sm).

In the case of a smiles string, the molecule must also be named with the --name or -n flag.

qubekit run -i methane.pdb

qubekit run -i ethane.mol2

qubekit run -sm CO -n methanol

QUBEKit Commands: High Throughput

Bulk commands are for high throughput analyses; they are invoked with the bulk keyword. A csv file must be used when running a bulk analysis. If you would like to generate a blank csv file, simply run the command:

qubekit bulk create example.csv 

where example.csv is the name of the file you want to create. This will automatically generate a csv file with the appropriate column headers. When writing to the csv file, simply append rows after the header row with the relevant information required.*

Importantly, different config files can be supplied for each molecule in each row.

name,smiles,multiplicity,config_file,restart,end

*Only the name column needs to be filled (which is filled automatically with the generated csv), any empty columns will simply use the default values:

  • The smiles string column only needs to be filled if a file is not supplied;
  • If the multiplicity column is empty, multiplicity will be set to 1;
  • If the config column is empty, the default config is used;
  • Leaving the restart column empty will start the program from the beginning;
  • Leaving the end column empty will end the program after a full analysis from its start point.

A bulk analysis is started with the run command, followed by the name of the csv file:

qubekit bulk run example.csv

Any necessary files should all be in the same place: where you're running QUBEKit from. Upon executing this bulk command, QUBEKit will work through the rows in the csv file. Each molecule will be given its own directory (the same as single molecule analyses).

Be aware that the names of the files are used as keys to find the configs. So, each molecule file being analysed should have a corresponding row in the csv file with the correct name (if using smiles strings, the name column will just be the name given to the created file).

For example (barring the header, csv row order does not matter, and you do not need to include smiles strings when a file is provided; column order does matter):

<location>/:
    benzene.pdb
    ethane.mol2
    bulk_example.csv

bulk_example.csv:
    name,smiles,multiplicity,config_file,restart,end
    methane,C,,default_config.json,,
    benzene,,,default_config.json,,
    ethane,,,default_config.json,,

QUBEKit Commands: Custom Start and End Points (single molecule)

QUBEKit also has the ability to run partial analyses, or redo certain parts of an analysis. For a single molecule analysis, this is achieved with the --end (-e) and restart commands.

The stages are:

Stage Description
parametrisation The molecule is parametrised using OpenFF, AnteChamber or an xml file. This step also loads in the molecule and extracts key information like the atoms and their coordinates.
optimisation There is a quick, preliminary optimisation which speeds up the following QM optimisation.
hessian This uses the same QM engine as the prior optimisation to calculate the Hessian matrix which is needed for calculating bonding parameters.
charges The electron density is calculated. This is where the solvent is applied as well (if configured). Next, the charges are partitioned and calculated
virtual_sites Virtual sites are added where the error in ESP would be large without them.
non_bonded Lennard-Jones parameters (sigma and epsilon values) are calculated.
bonded_parameters Using the Hessian matrix, the bond lengths, angles and force constants are calculated with the Modified Seminario Method.
torsion_scanner Using the molecule's geometry, a torsion scan is performed. The molecule can then be optimised with respect to these parameters.
torsion_optimisation The fitting and optimisation step for the torsional analysis.

In a normal run, all of these stages are executed sequentially, but with --end, restart and --skip-stages you are free to run from any step to any step inclusively while skipping any non-essential steps.

When using --end (or -e), specify the end-point in the proceeding command.

When using restart, specify the start-point in the proceeding command.

When using --skip-stages (or -s), specify any stage(s) which you needn't run. Be aware some stages depend on one another and cannot necessarily be skipped.

QUBEKit Commands: Custom Start and End Points (multiple molecules)

When using custom start and/or end points with bulk commands, the stages are written to the csv file, rather than the terminal. If no start point is specified, a new working directory be created and QUBEKit will run as normal. Otherwise, QUBEKit will find the correct directory based on the molecule name.

QUBEKit Commands: Checking Progress

Throughout an analysis, key information will be added to the run directory. This information can be quickly parsed by QUBEKit's progress command

To display the progress of all analyses in your current directory and below, use the command:

qubekit progress

QUBEKit will use the workflow_result.json which is in all QUBEKit directories and display a colour-coded table of the progress.

Indicator Meaning
โœ“ Completed successfully
~ Neither finished nor errored
S Skipped
E Started and failed for some reason

Viewing the QUBEKit.err file will give more information as to why it failed.

QUBEKit Commands: Other Commands and Information

You cannot run multiple kinds of analysis at once. For example:

qubekit bulk run example.csv run -i methane.pdb

is not a valid command. These should be performed separately:

qubekit bulk run example.csv
qubekit run -i methane.pdb

Be wary of running QUBEKit concurrently through different terminal windows. The programs QUBEKit calls often just try to use however much memory is assigned in the config files; this means they may try to take more than is available, leading to a hang-up, or--rarely--a crash.

qubekit's People

Contributors

bieniekmateusz avatar cringrose94 avatar jthorton avatar lgtm-com[bot] avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

qubekit's Issues

Ligand rework

It would be great if we could reduce the complexity of the ligand class which is currently a catch-all object which holds everything about the ligand at all times.

Improvements

  • Add properties to information which can be calculated quickly and does not need to be stored
  • Work out a minimal data structure which can be used to calculate all currently stored properties
  • Work out a way to better store fitting information like Hessians
  • Introduce a to/from rdkit molecule method (this allows better conversion to openff molecules)
  • Lift file io where possible to rdkit (we would then only support qc code file io)

Properties

This is a list of current functions and objects which could be quickly calculated on the fly

  • angles tuples / values
  • torsions tuples / values
  • impropers tuples / values
  • rings tuples / values
  • bond lengths
  • symmetry groups (bonds, angles dihedrals).
  • topology graph (networkx)
  • rdkit representation
  • find rotatable bonds

Optimisation always fails even for a simple methane molecule

Optimisation always fails even for a simple methane molecule which has been pre-optimized. No idea whats wronng

Warning: Unable to load toolkit 'OpenEye Toolkit'. The Open Force Field Toolkit does not require the OpenEye Toolkits, and can use RDKit/AmberTools instead. However, if you have a valid license for the OpenEye Toolkits, consider installing them for faster performance and additional file format support: https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html OpenEye offers free Toolkit licenses for academics: https://www.eyesopen.com/academic-licensing
Traceback (most recent call last):
File "/bin/QUBEKit", line 10, in
sys.exit(main())
File "/lib/python3.6/site-packages/QUBEKit/run.py", line 1038, in main
ArgsAndConfigs()
File "/lib/python3.6/site-packages/QUBEKit/run.py", line 129, in init
Execute(self.molecule)
File "/lib/python3.6/site-packages/QUBEKit/run.py", line 430, in init
self.run()
File "/lib/python3.6/site-packages/QUBEKit/utils/decorators.py", line 118, in wrapper
return func(*args, **kwargs)
File "/Q/lib/python3.6/site-packages/QUBEKit/run.py", line 602, in run
next_key = self.stage_wrapper(next_key, *stage_dict[next_key])
File "/Q/lib/python3.6/site-packages/QUBEKit/run.py", line 648, in stage_wrapper
mol = self.orderstart_key
File "/Q/lib/python3.6/site-packages/QUBEKit/run.py", line 774, in qm_optimise
raise OptimisationFailed("The optimisation did not converge")
QUBEKit.utils.exceptions.OptimisationFailed: The optimisation did not converge

Migrate to new Pydantic

Pydantic V2 breaks qubekit and we can not fully migrate to v2 until the rest of the openff and qcengine stack do. Until then we should consider supporting both versions via the v1 API exposed in v2.

Vsite error reduction factor

Due to random noise which can appear in the optimisation, density calculation and aim partitioning the vsite fitting of some molecules (mol 22 from the test set) can be inconsistent. In the next refit, we should look at changing the error reduction factor to make sure extra sites are only added if the reduction in error is significant.

OpenFF Parametrisation Error

When using Openff as the parent forcefield and a pdb as the molecule input file the molecule can be miss-typed. I think this is due to the aromaticity model not being set as required by openff, see here for the processing steps done before converting to an openff molecule. One way around this is to use a valid SDF as the input type.

Example using benzene
pdb

ATOM      1  C   UNL     1       1.885  -1.036  -0.112  0.00  0.00 
ATOM      2  C   UNL     1       2.921  -1.631   0.608  0.00  0.00 
ATOM      3  C   UNL     1       2.878  -1.652   2.002  0.00  0.00 
ATOM      4  C   UNL     1       1.800  -1.079   2.676  0.00  0.00 
ATOM      5  C   UNL     1       0.764  -0.484   1.955  0.00  0.00 
ATOM      6  C   UNL     1       0.807  -0.463   0.561  0.00  0.00 
ATOM      7  H   UNL     1       1.917  -1.020  -1.198  0.00  0.00 
ATOM      8  H   UNL     1       3.761  -2.077   0.084  0.00  0.00 
ATOM      9  H   UNL     1       3.685  -2.115   2.563  0.00  0.00 
ATOM     10  H   UNL     1       1.767  -1.095   3.761  0.00  0.00 
ATOM     11  H   UNL     1      -0.076  -0.038   2.479  0.00  0.00 
ATOM     12  H   UNL     1       0.000   0.000  -0.000  0.00  0.00 
CONECT    1    2    6    7
CONECT    2    3    8
CONECT    3    4    9
CONECT    4    5   10
CONECT    5    6   11
CONECT    6   12
END

sdf


  -OEChem-03012118033D

 12 12  0     0  0  0  0  0  0999 V2000
    1.8851   -1.0363   -0.1124 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9210   -1.6309    0.6081 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8783   -1.6520    2.0022 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7997   -1.0786    2.6757 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7638   -0.4841    1.9551 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8066   -0.4630    0.5610 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9174   -1.0203   -1.1978 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7613   -2.0769    0.0841 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.6848   -2.1150    2.5631 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.7665   -1.0955    3.7614 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0760   -0.0377    2.4791 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0004    0.0003   -0.0003 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  6  2  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
  5  6  1  0  0  0  0
  1  7  1  0  0  0  0
  2  8  1  0  0  0  0
  3  9  1  0  0  0  0
  4 10  1  0  0  0  0
  5 11  1  0  0  0  0
  6 12  1  0  0  0  0
M  END
$$$$
from QUBEKit.molecules import Ligand
from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField
# load the qube molecule
qb_mol = Ligand.from_file("mol.pdb")
# bulld the openff mol
ff_mol = Molecule.from_rdkit(qb_mol.rdkit_mol)
# type it and look at the parameters
ff = ForceField("openff_unconstrained-1.3.0.offxml")
labels = ff.label_molecules(ff_mol.to_topology())[0]
for torsion, param in labels["ProperTorsions"].items():
    print(torsion, param)

using pdb we get

<ProperTorsionType with smirks: [:1]~[#6X3:2]-[#6X3:3]~[:4] periodicity1: 2 phase1: 180.0 deg id: t43 k1: 1.048715180139 kcal/mol idivf1: 1.0 >

using sdf we get

<ProperTorsionType with smirks: [:1]~[#6X3:2]:[#6X3:3]~[:4] periodicity1: 2 phase1: 180.0 deg id: t44 k1: 3.739199932852 kcal/mol idivf1: 1.0 >

Where the only difference between the two smarts matches is an aromatic bond at the central bond.

Setting solvent dielectric in psi4 workflow

Hi all,

I was trying to use QUBEKit and psi4 to generate ff parameters, especially virtual sites. In the paper J. Chem. Inf. Model. 2019, 59, 4, 1366โ€“1381; a medium of dielectric constant = 4 was used to account for the inductive effect. However, in the protocols given in QUBEKit it is only possible to set the dielectric constant when using Gaussian.

Is it possible to set the dielectric constant of medium in psi4 so that it best reproduces the Gaussian workflow? If not, what is the psi4 set up that would be closest to the conditions reported in the paper?

Thanks very much,

Qi

Torsion scan failing

Hello, I am trying to run a molecule in QUBEKit but it keeps failing on the torsion_scanner step. The molecule is a fragment of a bigger molecule. For reference, I have attached a copy of the .sdf file I am using, the QUBEKit.err file that is being outputted, and an image outlining the fragment of the molecule I am using.

y6-fring5.sdf.txt
QUBEKit.err.txt
y6-endgroup.pdf

Pydantic and changes in python 3.10

qubekit run -i lig_3a.sdf -n ejm_44 -c prot4bxtb.json -s hessian -s bonded_parameters
Traceback (most recent call last):
  File ".../mamba/envs/qk/bin/qubekit", line 6, in <module>
    from qubekit.cli.cli import cli
  File ".../lib/python3.10/site-packages/qubekit/cli/cli.py", line 4, in <module>
    from qubekit.cli.bulk import bulk
  File ".../lib/python3.10/site-packages/qubekit/cli/bulk.py", line 6, in <module>
    from qubekit.cli.config import prep_config
  File ".../lib/python3.10/site-packages/qubekit/cli/config.py", line 5, in <module>
    from qubekit.workflow import WorkFlow, WorkFlowResult, get_workflow_protocol
  File ".../lib/python3.10/site-packages/qubekit/workflow/__init__.py", line 1, in <module>
    from qubekit.workflow.protocols import get_workflow_protocol
  File ".../lib/python3.10/site-packages/qubekit/workflow/protocols.py", line 1, in <module>
    from qubekit.charges import DDECCharges, MBISCharges, SolventGaussian
  File ".../lib/python3.10/site-packages/qubekit/charges/__init__.py", line 1, in <module>
    from qubekit.charges.ddec import DDECCharges
  File ".../lib/python3.10/site-packages/qubekit/charges/ddec.py", line 11, in <module>
    from pydantic import Field
  File "pydantic/__init__.py", line 2, in init pydantic.__init__
  File "pydantic/dataclasses.py", line 41, in init pydantic.dataclasses
    # + Value   | Meaning                                 |
ImportError: cannot import name dataclass_transform

"MissingParameterError" at torsion_optimisation stage

Hello,
I'm using qubekit to do some testing work. The program stopped at the "torsion_optimisation" stage.

The input molecule is a glycine.
image

I ran the following command:
qubekit run -i mol.sdf -n mol -c config.json -s hessian -s charges -s virtual_sites -s non_bonded -s bonded_parameters -s optimisation

I tried many times and it always stopped with a "MissingParameterError"

Error
Torsiondrive finished and QM results saved.
Performing torsion optimisations using ForceBalance.
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/workflow/workflow.py", line 335, in _run_stage
    result_mol = stage.run(
  File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/torsions/fitting/forcebalance_wrapper.py", line 274, in run
    return self._optimise(molecule=molecule)
  File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/torsions/fitting/forcebalance_wrapper.py", line 309, in _optimise
    self.generate_forcefield(molecule=molecule)
  File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/torsions/fitting/forcebalance_wrapper.py", line 353, in generate_forcefield
    master_parameter = copy_mol.TorsionForce[master_dihedral]
  File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/forcefield/force_groups.py", line 48, in __getitem__
    raise MissingParameterError
qubekit.utils.exceptions.MissingParameterError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/user/anaconda3/envs/qubekit/bin/qubekit", line 10, in
sys.exit(cli())
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/click/core.py", line 1130, in call
return self.main(*args, **kwargs)
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/cli/run.py", line 135, in run
workflow.new_workflow(molecule=molecule, skip_stages=skip_stages, end=end)
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/workflow/workflow.py", line 257, in new_workflow
return self._run_workflow(
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/workflow/workflow.py", line 295, in _run_workflow
molecule = self._run_stage(
File "/home/user/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/workflow/workflow.py", line 359, in _run_stage
raise WorkFlowExecutionError(
qubekit.utils.exceptions.WorkFlowExecutionError: The workflow stopped unexpectedly due to the following error at stage: torsion_optimisation

Then I checked that __getitem__ in force_groups.py. But I have no idea why the item (4, 0, 1, 2) (correspongding to (5,1,2,3) in the graph) can not be found in the parameter. Much appreciate for your help.

Here is my environment:

conda list
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
ambertools                23.0             py38h3974a27_0    conda-forge
amberutils                21.0                     pypi_0    pypi
arpack                    3.7.0                hdefa2d7_2    conda-forge
blosc                     1.21.4               h0f2a231_0    conda-forge
boltons                   23.0.0             pyhd8ed1ab_0    conda-forge
boost                     1.78.0           py38h4e30db6_4    conda-forge
boost-cpp                 1.78.0               h6582d0a_3    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.19.1               hd590300_0    conda-forge
ca-certificates           2023.5.7             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.3.0              pyhd8ed1ab_0    conda-forge
cairo                     1.16.0            hbbf8b49_1016    conda-forge
certifi                   2023.5.7           pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py38h4a40e3a_3    conda-forge
charset-normalizer        3.1.0              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
conda                     23.3.1           py38h578d9bd_0    conda-forge
conda-package-handling    2.0.2              pyh38be061_0    conda-forge
conda-package-streaming   0.7.0              pyhd8ed1ab_1    conda-forge
contourpy                 1.0.7            py38hfbd4bf9_0    conda-forge
cryptography              40.0.2           py38h3d167d9_0    conda-forge
cudatoolkit               11.8.0              h37601d7_11    conda-forge
curl                      8.1.1                h409715c_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
cython                    0.29.34          py38h8dc9893_0    conda-forge
edgembar                  0.2                      pypi_0    pypi
expat                     2.5.0                hcb278e6_1    conda-forge
fftw                      3.3.10          nompi_hc118613_107    conda-forge
fmt                       9.1.0                h924138e_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.39.4           py38h01eb140_0    conda-forge
forcebalance              1.9.5            py38h507a481_0    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
future                    0.18.3             pyhd8ed1ab_0    conda-forge
geometric                 1.0                pyhd8ed1ab_1    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
greenlet                  2.0.2            py38h17151c0_1    conda-forge
hdf4                      4.2.15               h501b40f_6    conda-forge
hdf5                      1.14.0          nompi_hb72d44e_103    conda-forge
icu                       72.1                 hcb278e6_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.6.0              pyha770c72_0    conda-forge
importlib-resources       5.12.0             pyhd8ed1ab_0    conda-forge
importlib_resources       5.12.0             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jsonpatch                 1.32               pyhd8ed1ab_0    conda-forge
jsonpointer               2.0                        py_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4            py38h43d8883_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lcms2                     2.15                 haa2dc70_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libarchive                3.6.2                h039dbb9_1    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcurl                   8.1.1                h409715c_0    conda-forge
libdeflate                1.18                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libglib                   2.76.3               hebfc3b9_0    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.5.1              h0b41bf4_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libmamba                  1.4.2                hcea66bb_0    conda-forge
libmambapy                1.4.2            py38h7fa060d_0    conda-forge
libnetcdf                 4.9.2           nompi_h0f3d0bb_105    conda-forge
libnghttp2                1.52.0               h61bc06f_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libsolv                   0.7.23               h3eb15da_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libtiff                   4.5.0                ha587672_6    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.3.0                h0b41bf4_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxml2                   2.11.4               h0d562d8_0    conda-forge
libxslt                   1.1.37               h0054252_1    conda-forge
libzip                    1.9.2                hc929e4a_1    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
lxml                      4.9.1            py38h0a891b7_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mamba                     1.4.2            py38haad2881_0    conda-forge
markupsafe                2.1.2            py38h1de0b5d_0    conda-forge
matplotlib-base           3.7.1            py38hd6c3c57_0    conda-forge
mctc-lib                  0.3.1                h74f4db8_0    conda-forge
mmpbsa-py                 16.0                     pypi_0    pypi
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
netcdf-fortran            4.6.1           nompi_h4f3791c_100    conda-forge
networkx                  3.1                pyhd8ed1ab_0    conda-forge
nomkl                     1.0                  h5ca1d4c_0    conda-forge
numexpr                   2.8.4           py38h69a160b_100    conda-forge
numpy                     1.24.3           py38h59b608b_0    conda-forge
ocl-icd                   2.3.1                h7f98852_0    conda-forge
ocl-icd-system            1.0.0                         1    conda-forge
openff-amber-ff-ports     0.0.3              pyh6c4a22f_0    conda-forge
openff-forcefields        2023.05.1          pyh1a96a4e_1    conda-forge
openff-toolkit-base       0.12.1             pyhd8ed1ab_2    conda-forge
openff-units              0.2.0              pyh1a96a4e_1    conda-forge
openff-utilities          0.1.8              pyh1a96a4e_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openmm                    8.0.0            py38hd11a18e_1    conda-forge
openssl                   3.1.0                hd590300_3    conda-forge
packaging                 23.1               pyhd8ed1ab_0    conda-forge
packmol                   20.010               h86c2bf4_0    conda-forge
packmol-memgen            2023.2.24                pypi_0    pypi
pandas                    2.0.1            py38h01efb38_1    conda-forge
parmed                    4.1.0            py38h8dc9893_0    conda-forge
pcre2                     10.40                hc3806b6_0    conda-forge
pdb4amber                 22.0                     pypi_0    pypi
perl                      5.32.1          2_h7f98852_perl5    conda-forge
pillow                    9.5.0            py38h885162f_1    conda-forge
pint                      0.21               pyhd8ed1ab_0    conda-forge
pip                       23.1.2             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               h36c2ea0_0    conda-forge
platformdirs              3.5.1              pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0              pyhd8ed1ab_5    conda-forge
pooch                     1.7.0              pyha770c72_3    conda-forge
psutil                    5.9.5            py38h1de0b5d_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
py-cpuinfo                9.0.0              pyhd8ed1ab_0    conda-forge
pybind11-abi              4                    hd8ed1ab_3    conda-forge
pycairo                   1.23.0           py38h190342e_0    conda-forge
pycosat                   0.6.4            py38h0a891b7_1    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pydantic                  1.10.8           py38h01eb140_0    conda-forge
pymbar                    3.1.1            py38h26c90d9_2    conda-forge
pymsmt                    22.0                     pypi_0    pypi
pyopenssl                 23.1.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.8.16          he550d4f_1_cpython    conda-forge
python-constraint         1.4.0                      py_0    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2023.3             pyhd8ed1ab_0    conda-forge
python_abi                3.8                      3_cp38    conda-forge
pytraj                    2.0.6                    pypi_0    pypi
pytz                      2023.3             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0              py38h0a891b7_5    conda-forge
qcelemental               0.25.1             pyhd8ed1ab_1    conda-forge
qcengine                  0.26.0             pyhd8ed1ab_0    conda-forge
qubekit                   2.1.1              pyhd8ed1ab_0    conda-forge
rdkit                     2023.03.1        py38h36d2b2f_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
reportlab                 3.6.13           py38h57c54bf_0    conda-forge
reproc                    14.2.4               h0b41bf4_0    conda-forge
reproc-cpp                14.2.4               hcb278e6_0    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
ruamel.yaml               0.17.26          py38h01eb140_0    conda-forge
ruamel.yaml.clib          0.2.7            py38h1de0b5d_1    conda-forge
sander                    22.0                     pypi_0    pypi
scipy                     1.10.1           py38h59b608b_3    conda-forge
setuptools                67.7.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
smirnoff99frosst          1.1.0              pyh44b312d_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sqlalchemy                2.0.15           py38h01eb140_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
torsiondrive              1.1.0              pyhd8ed1ab_0    conda-forge
tqdm                      4.65.0             pyhd8ed1ab_1    conda-forge
typing-extensions         4.6.0                hd8ed1ab_0    conda-forge
typing_extensions         4.6.0              pyha770c72_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
unicodedata2              15.0.0           py38h0a891b7_0    conda-forge
urllib3                   2.0.2              pyhd8ed1ab_0    conda-forge
wheel                     0.40.0             pyhd8ed1ab_0    conda-forge
xmltodict                 0.13.0             pyhd8ed1ab_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.8.4                h8ee46fc_1    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-libxt                1.2.1                h7f98852_2    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xtb                       6.5.1                h03160e7_1    conda-forge
xtb-python                22.1             py38h1de0b5d_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yaml-cpp                  0.7.0                h27087fc_2    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstandard                 0.19.0           py38h5945529_1    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge

RdKit _CIPError Keyerror

Hi, I'm trying to parameterize two molecules "NBD" & "Methoxy Coumarin" in their ground state but I keep landing on this error:
`"""
Traceback (most recent call last):
File "/home/akhanna2/anaconda3/envs/qubekit_2022/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/akhanna2/qubekit/ndb_nh2/qubekit_2022/qubekit/qubekit/engines/geometry_optimiser.py", line 155, in optimise
result_mol = self._handle_output(molecule=molecule, opt_output=opt_result)
File "/home/akhanna2/qubekit/ndb_nh2/qubekit_2022/qubekit/qubekit/engines/geometry_optimiser.py", line 211, in _handle_output
result_mol.to_file(file_name="opt.xyz")
File "/home/akhanna2/qubekit/ndb_nh2/qubekit_2022/qubekit/qubekit/molecules/ligand.py", line 220, in to_file
return RDKit.mol_to_file(rdkit_mol=self.to_rdkit(), file_name=file_name)
File "/home/akhanna2/qubekit/ndb_nh2/qubekit_2022/qubekit/qubekit/molecules/ligand.py", line 927, in to_rdkit
assert qb_atom.stereochemistry == rd_atom.GetProp(
KeyError: '_CIPCode'
"""
I tried googling the issue on RDKit GitHub and found this workaround:
"If R or S can't be achieved by trying CW/CCW, force-add them with rdatom.SetProp('_CIPCode', 'R')". I tried changing, rd_atom.GetProp --> rdatom.GetProp('_CIPCode', 'R') but then there was C++ issue.

Click on this link to get the PDB files and the configuration files. Could you please provide your feedback on this? Thank you

General QM support through QCEngine

We currently use QCEngine for optimization and hessian calculations, but only for very specific settings (PSI4 and Geometric) it would be good to allow more QC programs via qcengine for optimizations, hessians and torsiondrives. The single point calculations used to derive charges and LJ parameters are more specialized since they involve the use of an implicit solvent so we can be more restrictive on the use of program there. This could be achieved by changing the check done here to check the program is not gaussian as this is the only engine we support that is not currently available through QCEngine. A similar check could also be done at the hessian and torsion scan stages.

QUBEKit fails with linear molecules

Assume you want to parametrise Br2 (for clathrates calculations or whatever else). QUBEKit fails at stage bonded_parameters.

Reproducing the problem

qubekit config create -p 5b
qubekit run -c test.conf -sm BrBr -m 1
Error traceback
Traceback (most recent call last):
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/workflow/workflow.py", line 335, in _run_stage
    result_mol = stage.run(
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/bonded/mod_seminario.py", line 231, in run
    self._modified_seminario_method(molecule=molecule, hessian=hessian)
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/bonded/mod_seminario.py", line 264, in _modified_seminario_method
    self.calculate_angles(eigenvals, eigenvecs, molecule, bond_lens)
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/bonded/mod_seminario.py", line 283, in calculate_angles
    for count, angle in enumerate(molecule.angles):
TypeError: 'NoneType' object is not iterable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/unwado/.local/miniconda3/envs/qubekit/bin/qubekit", line 8, in <module>
    sys.exit(cli())
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/cli/run.py", line 135, in run
    workflow.new_workflow(molecule=molecule, skip_stages=skip_stages, end=end)
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/workflow/workflow.py", line 257, in new_workflow
    return self._run_workflow(
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/workflow/workflow.py", line 295, in _run_workflow
    molecule = self._run_stage(
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/workflow/workflow.py", line 359, in _run_stage
    raise WorkFlowExecutionError(
qubekit.utils.exceptions.WorkFlowExecutionError: The workflow stopped unexpectedly due to the following error at stage: bonded_parameters

What to do?

Main part of traceback (in my opinion)
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/bonded/mod_seminario.py", line 283, in calculate_angles
    for count, angle in enumerate(molecule.angles):
TypeError: 'NoneType' object is not iterable

According to spolier above, the main problem is in mod_seminario.py file, where molecule.angles returns None if there are no angles in molecule. I'm not sure if it is critical somewhere, but may be Molecule.angles should return empty list instead of None. If it is important for some check, probably the condition should be changed from molecule.angles is not None to len(molecule.angles) != 0. I can implement changes myself, but would like to hear your opinion on the problem

NB

HHal molecules (Hal = F, Cl, Br, I) fail on virtual_sites stage due to error below, but I have no idea why are they different from Hal2

Error with HHal

Traceback (most recent call last):
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/workflow/workflow.py", line 335, in _run_stage
    result_mol = stage.run(
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/nonbonded/virtual_sites.py", line 153, in run
    self._save_virtual_sites()
  File "/home/unwado/.local/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/nonbonded/virtual_sites.py", line 1180, in _save_virtual_sites
    close_b_coords = self._coords[closest_atoms[1]]
IndexError: list index out of range

Phosphorous valency 3 works but valency 5 does not work

Hello,
I am working on getting the parameters for a ligand in which the phosphorous has a valency of 5, but the sequence of calculation fails with the following error:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/aaojha/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/workflow/workflow.py", line 335, in _run_stage
result_mol = stage.run(
File "/home/aaojha/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/workflow/helper_stages.py", line 130, in run
preopt_geometries = self._pre_opt(
File "/home/aaojha/anaconda3/envs/qubekit/lib/python3.8/site-packages/qubekit/workflow/helper_stages.py", line 250, in _pre_opt
result_mol, opt_result = result.get()
File "/home/aaojha/anaconda3/envs/qubekit/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
qcelemental.exceptions.ValidationError: Inconsistent or unspecified chg/mult: sys chg: 0, frag chg: [None], sys mult: 1, frag mult: [None]

However the charge on the system is 0.

But, if I remove the atoms to make it a valency of 3, the workflow works.

Please help me in this issue. I am attaching the files:

files.zip

ONETEP support via ASE

It would be good to add support to automatically run the ONETEP calculations via ase. We should be able to then interface the output with chargemol to get non-bonded parameters and fit virtual sites.

Change MM optimise to initial optimise

It might be a good idea to change the MM optimise stage to be a general initial optimisation stage which aims to help speed up the full QM optimisation. If we allow the use of QCEngine at this stage we get access to QM/MM/ML methods and semi-empirical methods including xtb this way we expand the range of usable methods while keeping the current options and use a single interface which should keep development cost low. The downside of this would be that the openforcefield toolkit is needed to run gaff/openff optimisations via the qcengine interface, however, as these packages are now all on conda we can just add these to the recipe.

Add QForce hessian fitting support

We plan to add the ability for users to use hessian fitting provided via QForce instead of the modified seminario step. This issue will track the progress of this implementation.

Notes:
QForce install is currently only on pip and requires the following packages, numba means that python must be <3.9.
install-requires =
scipy
numpy
ase
matplotlib
seaborn
networkx
pulp<2.2
numba>0.40
pycolt>=0.3

How to generate GROMACS format files

Hi,
I'm trying to use qubekit to get bespoke force filed parameters. I tested it with ethanol. It worked without error. However, I checked the folder of "final parameter". The default format seems to be OpenMM. I created a config file and found the following command:
"parametrisation": {
"type": "OpenFF",
"force_field": "openff_unconstrained-1.3.0.offxml"
Is this the correct spot to change the format of derived parameters? If it is, how to change it for GROMACS? I have read the paper and it said that qubekit supports BOSS/MCPRO, OpenMM and GROMACS. But I could not find the instruction. Thanks a lot.

Torsiondrive state file format is wrong

The torsiondrive state file format is wrong meaning that restarting a torsiondrive is not possible without losing all current progress, the dihedral values need to be corrected.

XML combiner

Now we know the xml combiner here works it would be good to incorporate this into the qubekit CLI. I think it might make sense to expose the available fitting parameters as options on the CLI so users who want to fit the Rfree for elements H, C, N, O for a selection of molecules could use a command like
qubekit combine --output master.xml -p c -p h -p n -p o
where p is parameterise?

All Rfree would be options along with alpha and beta and they would start the optimisation from the current values, ie the ones used in the initial protocol.

Questions

  • should we have an option to reset the rfree and alpha and beta to some other values or just alpha and beta as each optimisation in the paper started from alpha=1, beta=0.5.

Logging Errors

When writing tests for the run.py file logging can cause them to hang. This is due to each stage function in the execute class looking for a log file to append messages to but these are not present. It might be best to have the level of logging be an execute class variable like verbose so we can toggle it off for testing.

Add Fragmentation support

To help speed up torsion drives on larges molecules we should use a fragmentation scheme like that in fragmenter, we should try and build an interface to fragmenter first to see if this fits our needs.

Add units to fields?

It might be helpful to start using a unit package like openff-units to help keep track of units for force field parameters, hessians and geometries as we regularly swap between nanometer, bohr and angstrom for the geometries for example. The package also provides an easy way to convert between different unit systems and is pulled in by the openff-toolkit so its already in the environment.

Calculate molecule net charge

While we do have a CLI option for users to supply a net charge however most of the time this can be calculated from the formal charges on the atoms returned from the rdkit molecule object, as done here in the openff-toolkit.

Output XML file format

Hello!

Before finding issue #328 I thought, that the resulting force field is in OpenMM format. However you noted in mentioned issue, that epsilon and sigma parameters can not be used in OpenMM directly

So what is the actual sense of parameters from final_parameters' .xml file? Do other parameters follow OpenMM notation? (it is important, e.g. for constant k in HarmonicBondForce, which may be already divided by 2 or not)

Do Amber combination rules refer to this letter and do sigma from ForceField mean the same as r* from the letter?

Would be glad to get aid on this problem

PSI4 density option permitted, but does nothing

The code for "density" calculations in the psi4 engine is currently commented out. The program does not throw any error until it can't find the expected .wfx file output from the density step during the charges step.

Calculating bonds and angles with modified Seminario method... Bond and angle parameters calculated
Performing density calculation with psi4... Density calculation complete
Chargemol calculating charges using DDEC3...

An exception occurred with: Execute.run
Exception: [Errno 2] No such file or directory: '/data/QUBEKit_ethane2_2020_09_07_ethane_example/06_density/ethane2.wfx'

It would be great to be able to use PSI4 for density calculations (g09 licenses etc etc), but if that's not an option right now it would still be nice if the program threw some error a little earlier on.

Really enjoying testing this package out!

Optimization always failed

Hi,
I tried a molecule with 63 atoms. And I keep getting this error:"qubekit.utils.exceptions.WorkFlowExecutionError: The workflow stopped unexpectedly due to the following error at stage: optimisation". The content in the QUBEKit.err file was copied in the following paragraph. Do you know what causes this? And how can I fix this? If you need more information about this error, please let me know. I really appreciate your help.

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/site-packages/qubekit/engines/geometry_optimiser.py", line 155, in optimise
result_mol = self._handle_output(molecule=molecule, opt_output=opt_result)
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/site-packages/qubekit/engines/geometry_optimiser.py", line 211, in _handle_output
result_mol.to_file(file_name="opt.xyz")
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/site-packages/qubekit/molecules/ligand.py", line 220, in to_file
return RDKit.mol_to_file(rdkit_mol=self.to_rdkit(), file_name=file_name)
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/site-packages/qubekit/molecules/ligand.py", line 927, in to_rdkit
assert qb_atom.stereochemistry == rd_atom.GetProp(
KeyError: '_CIPCode'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/site-packages/qubekit/workflow/workflow.py", line 335, in _run_stage
result_mol = stage.run(
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/site-packages/qubekit/workflow/helper_stages.py", line 130, in run
preopt_geometries = self._pre_opt(
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/site-packages/qubekit/workflow/helper_stages.py", line 250, in _pre_opt
result_mol, opt_result = result.get()
File "/home/user/miniconda3/envs/qubekit/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
KeyError: '_CIPCode'

Conflict between QUBEKit and psi4 in conda

Hi all,

I tried to install QUBEKit and the dependencies via conda, however, after installing QUBEKit using conda, I can't install psi4 into the same environment - there were many conflicts reported by conda. The same error occurred when I installed psi4 first and tried to install QUBEKit - I used the same conda commands as suggested in the README file. I built the conda envs using Python 3.9

Any suggestions that can help make QUBEKit and psi4 work in the same conda environment? A related question is that is it possible to use QUBEKit to generate input files for psi4 and read the results back to QUBEKit again to avoid using them in the same environment, if so, how can I do it?

Thanks very much!

Qi

QM Optimisation fails

Hi,

I am currently trying to generate parameters for a molecule and it consistently fails in the optimization step because no conformer was found that could be optimized to GAU TIGHT. I am running qubekit with psi4 using 6-311++G as basis and wb97x-d as method. Charges are handled with MBISCharges in psi4 with water as medium solvent and angstrom as units. Additionally, I added the Rfree parameters for sulfur from the Paper. Parametrisation is handled by OpenFF.

I have tried to up the convergece criteria of the pre-optimization run to GAU VERYTIGHT but it did not change the outcome. I also tried to ramp up the number of maximum iterations to 500 and the number of seed conformers to 60 but again this did not help.

Maybe there are some special configurations that I am not aware of that you could hint me to?

The molecule is: "C=c5cc(N1CCOCC1)oc6c(c3cccc4c2ccccc2sc34)cccc56"

Issues with Installation

There are currently a few issues with installation due to custom forks; conda-forge migrations of other packages; and rdkit. Current list of issues:

  • RDKit will not install properly if using python=3.7
  • OpenMM and OpenForceField aren't being pulled properly due to migrations to conda-forge. Currently need to be installed from omnia channel.
  • Geometric and Torsiondrive need to be installed from forks. This is only necessary if using virtual sites or Gaussian 09 for torsions.

We can fix the OpenForceField and Openmm issues by changing the imports and updating the required versions in the conda build. Could do with a good long-term solution for the fork installs.

Torsiondrive speed

Currently, we collect the torsion QM data using the CLI interface to torsiondrive and we run each grid point optimisation in series which can be quite slow even for simple small molecules. Here I want to collect together some ideas on how we can try and improve this.

QCArchive support

We plan to add this any way to allow us to reuse calculations, this framework also would allow us to parallelize torsiondrives and run multiple optimisations at the same across multiple dihedrals provided we have multiple workers set up which can execute the jobs. This would also work better if we supply multiple diverse conformers which sample the targeted rotatable bond allowing us to take advantage of parallel optimisations from the start and should help reduce the number of optimisations spawned by wavefront propagation. The downside would be that we lose the ability to use gaussian as only programs supported by qcengine can be used.

Torsiondrive Json API

Here we would be rebuilding some of the logic that qcarchive uses to parallelize the optimisations, but we would choose how to run them allowing us to still have access to gaussian. The API would give us a list of optimisations to run and the starting conformer, we would then have to build out a queue system and use subprocess to spawn multiple optimisations (replicating qcfractal workers). The upside is that we would not need a qcfractal instance to run the torsiondrive.

Initial torsiondrive using a xtb/ani

We could do the majority of the work using ani or xtb to get us conformers which are close to the qm minimum very quickly by running an initial torsiondrive with one of these methods. These conformers would then be used as input into the full QM optimisation which should only take a few optimisation steps to converge.

Fragmentation

Fragmentation should mean that the QM calculations are much faster and optimisations should converge in fewer steps, this would be great if we could combine it with one of the parallel methods above.

Are there any other methods which could help speed this up?

QCArchive support

To add the ability to store and reuse QM calculations we should support the use of QCArchive, we would require an interface to deposit and search for calculations, this could also help speed up QM calculations by taking advantage of the parallel computing offered by QCFractal.

Missing v-site test cases

Need to add test cases for (non-)symmetric sites with all variations of site vectors.

Currently testing only covers halogens; needs to also cover alcohols (symmetric and non-symmetric), amines, etc.

Once MBIS is implemented, testing should be much easier just using PSI4, rather than all the Chargemol output files.

Final Parameters Combination Rules

Hello,

I noticed that the combination rule used in the final_parameters output file in QUBEKit is in the Amber format. For instance, this is how it's written in .xml and .gro file format respectively.

<NonbondedForce coulomb14scale="0.8333333333" lj14scale="0.5" combination="amber">
[ defaults ]
; nbfunc        comb-rule       gen-pairs       fudgeLJ fudgeQQ
1               2               no              1            0.83333333  

Is it possible to change the format of the combination rule to that of other force fields?

I want to solvate this molecule with a given water model.

Assertion Error in dihedral.py

Hi there, I've been trying to parameterize ethanol molecule using qubekit but I keep getting this error:
Qubekit-Version: 2.6.3 (PyPI, Sep.2019)
Torsiondrive-Version: 0.9.5 (https://github.com/jthorton/torsiondrive)

File "/home/****/anaconda3/envs/qubekit/lib/python3.7/site-packages/QUBEKit/dihedrals.py", line 1063, in plot_results
assert len(angles) == len(self.qm_energy)
AssertionError

The parameterization process runs smoothly till the 10th step of "torsion_optimise", upon inspecting the "dihedral.py" file I found this comment "# Make sure we have the same angles as data points". Now, I was wondering, if you could please tell me what could be causing this issue:

  1. Is it something related to the master_config file
  2. Or Scipy's optimizer? (BFGS or NM I tried both)
  3. Or something else

Thank you for your time, have a good day

Regards
Ajay Khanna
Grad Student
UC Merced

Negative Force Constants

In some cases, we can calculate negative force constants using the modified Seminario method for bonds. This seems to be when we are not at the true minimum and we should probably raise an error if the force constants are calculated to be negative.

Gaussian Input format

  1. The default computational method is wb97X-D/6-311++G(d,p). How can I change it to some semi-empirical methods?
  2. Also the default command has "SCF=(XQC, MaxConventionalCycles=300) nosymm Force=NoStep Int(Grid=UltraFine)'. Can I modify it? I have installed the dev version. I couldn't find the right place to edit the source code.
  3. The limit of n_workers for torsion driver is 4. So the maximum number of gaussian jobs submitted for the torsion scan is 4. How do I change the limit to a bigger number?
    Sorry, I have been trying reading your python script. It's a bit hard for me to understand.

MissingRfreeError: The following elements have no reference Rfree values

Hi,

with the following molecule:

Oc1ccc(cc1)CC[C@@]1(OC(=O)C(=C(O)C1)Sc1cc(c(N)cc1C(C)(C)C)C)C1CCCC1

I get the error message

qubekit.utils.exceptions.MissingRfreeError: The following elements have no reference Rfree values which are required to parameterise the molecule {'s'}

which I assume is becaue of the S in the molecule ?
Is this chanceless or do you have any pointers on how I could proceed to get to an parametrization ?

Best regards,
Ben

Antechamber mm optimise step

Hi ๐Ÿ‘‹

For large systems, it looks like the bottleneck will be at the MM optimisation step which is currently done serially in Antechamber. Can you change this to run in parallel?

Write parameters to offxmls

To use more of the OpenFF stack, we need to be able to write force fields to the SMIRNOFF offxml format. We should be able to use chemper to generate the smirks patterns for the force field terms and encode the entire molecule into the pattern to ensure they are bespoke to the molecule we wish to parameterise.

One possible complication is the use of virtual sites and we will have to carefully map our general sites onto the ones available in the openff toolkit.

Got stuck in torsion_scanner

Hi @jthorton ,
I 'm running qubekit on a molecule using g09. The method I used is protocol 1a. All optimization, hessian calculations work well. During the torsion scanner, in the first dihedral bond (SCAN_8_9_10_11), it finished normally. However, in the second dehedral bond (SCAN_1_2_4_5), it got stuck in the grid_point_180_job_2. It generated the gaussion input(.com) file, and no results for a few days. No error is reported either. I tried restarting from the torsion_scanner and it got stuck in the same place. I also tried using g09 to just calculate the gasussin input file. It terminated normally and only took 27 mins. Could you help me figure out why it happended and how to solve it?
lig.txt
config_b3lyp.txt
workflow_result.txt

No molecule conformer could be optimised to GAU TIGHT

Sorry for opening yet another issue, I would not (yet) have done this if this was only with one molecule
but I even get the following error

Traceback (most recent call last):
  File "/work/scratch/b_mayer/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/workflow/workflow.py", line 335, in _run_stage
    result_mol = stage.run(
  File "/work/scratch/b_mayer/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/workflow/helper_stages.py", line 144, in run
    return self._run_qm_opt(
  File "/work/scratch/b_mayer/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/workflow/helper_stages.py", line 206, in _run_qm_opt
    raise GeometryOptimisationError(
qubekit.utils.exceptions.GeometryOptimisationError: No molecule conformer could be optimised to GAU TIGHT

while running one of the examples:

qubekit run -sm CO -n methanol

and as mentioned I also got this with one other molecule. So I have the feeling that something is wrong and this will
always be triggered.
Hope you have any ideas on how to solve this, I would really like to be able to use the parameters from your very nice toolkit in my MD simulations.

PSI4 error when not installed

When trying to make psi4 input files using the PSI4 class the init fails when checking if psi4 is installed, we probably should only fail with this error if the user requested an action that needs psi4 look building and executing an input file. This would be consistent with the Gaussian engine.

Use different qcoptions for optimisation and torsion scanner

Hi,
Since torsion scanner really takes time and ANI convergence issues on optimisation, I'm wondering if qubekit can use gaussian/psi4 for optimization/hessian calculations, and then use ANI only for single point energy calculations of tosion scan.

MBIS tests

In a recent PR #259 we added an interface to MBIS but the tests were left offline due to a conda environment issue. This caused psi4 to install with a missing dependency which caused all psi4 related tests to fail and seems to affect the newest dev versions of psi4 which are needed for the mbis functionality. Locally I was able to get around this by making an environment with an older version of psi4 and then updating it to the latest. This is just a reminder to come back to the tests.

Generate unique atom names

When an input is given that has no unique atom names we should generate them automatically as the final forcefield currently made generates an error when loaded into openmm. For example this pdb

REMARK   1 CREATED WITH QUBEKit 2021-01-28 10:45:36.697804
COMPND    MOL                 
HETATM    1   C  UNL     1       0.990   0.037  -0.098  1.00  0.00           C
HETATM    2   C  UNL     1       2.455  -0.145  -0.098  1.00  0.00           C
HETATM    3   O  UNL     1       3.212   0.987  -0.104  1.00  0.00           O
HETATM    4   C  UNL     1       4.512   0.594  -0.104  1.00  0.00           C
HETATM    5   C  UNL     1       4.600  -0.778  -0.098  1.00  0.00           C
HETATM    6   C  UNL     1       3.269  -1.258  -0.095  1.00  0.00           C
HETATM    7   H  UNL     1       0.719   1.098  -0.083  1.00  0.00           H
HETATM    8   H  UNL     1       0.547  -0.439   0.782  1.00  0.00           H
HETATM    9   H  UNL     1       0.549  -0.414  -0.992  1.00  0.00           H
HETATM   10   H  UNL     1       5.240   1.393  -0.108  1.00  0.00           H
HETATM   11   H  UNL     1       5.512  -1.358  -0.097  1.00  0.00           H
HETATM   12   H  UNL     1       2.940  -2.288  -0.090  1.00  0.00           H
CONECT    1    2    7    8    9
CONECT    2    1    3    6
CONECT    3    2    4
CONECT    6    2    5   12
CONECT    4    3    5   10
CONECT    5    4    6   11
END

will make the following XML

<Atom name="C" type="QUBE_0"/>
<Atom name="C" type="QUBE_1"/>
<Atom name="O" type="QUBE_2"/>
<Atom name="C" type="QUBE_3"/>
<Atom name="C" type="QUBE_4"/>
<Atom name="C" type="QUBE_5"/>
<Atom name="H" type="QUBE_6"/>
<Atom name="H" type="QUBE_7"/>
<Atom name="H" type="QUBE_8"/>
<Atom name="H" type="QUBE_9"/>
<Atom name="H" type="QUBE_10"/>
<Atom name="H" type="QUBE_11"/>

which will give the following error in openmm

QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM    2   C  UNL     1       2.455  -0.145  -0.098  1.00  0.00           C  , HETATM    1   C  UNL     1       0.990   0.037  -0.098  1.00  0.00           C  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))
/home/b6056633/miniconda3/envs/QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM    4   C  UNL     1       4.512   0.594  -0.104  1.00  0.00           C  , HETATM    2   C  UNL     1       2.455  -0.145  -0.098  1.00  0.00           C  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))
/home/b6056633/miniconda3/envs/QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM    5   C  UNL     1       4.600  -0.778  -0.098  1.00  0.00           C  , HETATM    4   C  UNL     1       4.512   0.594  -0.104  1.00  0.00           C  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))
/home/b6056633/miniconda3/envs/QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM    6   C  UNL     1       3.269  -1.258  -0.095  1.00  0.00           C  , HETATM    5   C  UNL     1       4.600  -0.778  -0.098  1.00  0.00           C  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))
/home/b6056633/miniconda3/envs/QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM    8   H  UNL     1       0.547  -0.439   0.782  1.00  0.00           H  , HETATM    7   H  UNL     1       0.719   1.098  -0.083  1.00  0.00           H  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))
/home/b6056633/miniconda3/envs/QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM    9   H  UNL     1       0.549  -0.414  -0.992  1.00  0.00           H  , HETATM    8   H  UNL     1       0.547  -0.439   0.782  1.00  0.00           H  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))
/home/b6056633/miniconda3/envs/QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM   10   H  UNL     1       5.240   1.393  -0.108  1.00  0.00           H  , HETATM    9   H  UNL     1       0.549  -0.414  -0.992  1.00  0.00           H  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))
/home/b6056633/miniconda3/envs/QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM   11   H  UNL     1       5.512  -1.358  -0.097  1.00  0.00           H  , HETATM   10   H  UNL     1       5.240   1.393  -0.108  1.00  0.00           H  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))
/home/b6056633/miniconda3/envs/QUBEKitdev/lib/python3.7/site-packages/simtk/openmm/app/internal/pdbstructure.py:536: UserWarning: WARNING: duplicate atom (HETATM   12   H  UNL     1       2.940  -2.288  -0.090  1.00  0.00           H  , HETATM   11   H  UNL     1       5.512  -1.358  -0.097  1.00  0.00           H  )
  warnings.warn("WARNING: duplicate atom (%s, %s)" % (atom, old_atom._pdb_string(old_atom.serial_number, atom.alternate_location_indicator)))

Parallelism

Hi,

I am currently running qubekit on a 96 core node on our cluster. I am only trying it out with one molecule right now my run command was

qubekit run -sm "O=C1C(=C(O)C[C@@](O1)(C1CCCC1)CCc1cc(nc(c1)CC)CC)Cc1nn2c(n1)nc(cc2C)C" -n 3FRZ

I am using guassian/g16 for this as it was available on our cluster via the module system.
I guess it is stil in the optimization phase as this output would suggest to me

  opt_result = qcng.compute_procedure(
/work/scratch/b_mayer/miniconda3/envs/qubekit/lib/python3.10/site-packages/qubekit/engines/geometry_optimiser.py:148: FutureWarning: Using the `local_options` keyword argument is depreciated in favor of using `task_config`, in version 0.30.0 it will stop working.
  opt_result = qcng.compute_procedure(

Now as far as I can see this only uses 4 cores of my available 96 cores which is an immense waste of computational power. So my question is: How can I make it use more cores, are there any flags for fine control of parallelization in qubekit ?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.