Giter Club home page Giter Club logo

espsim's Introduction

Comparison of electrostatic potential and shape

This repository contains a small code snippet to calculate similarities of shapes and electrostatic potentials between molecules, see manuscript. It is based on Python3, RDKit, Numpy and Scipy. The package furthermore contains functionalities to embed (create 3D coordinates) molecules with a constrained core using RDKit functions.

New

  • On June 23, 2022 RSC-CICAG hosted an in-depth ESPsim workshop. The recording is available on Youtube, and the workshop materials are available in the workshop folder. We hope this resource is valuable to you!
  • December 2021: ESPsim has exciting new features: A machine-learned partial charge model, as well as aligning and scoring of molecules without a common core. We furthermore added various benchmarks, see jupyter notebooks in the benchmarks folder.

Table of Contents

Installation

It is easiest to install all required packages inside a conda environment. Run

conda env create -f environment.yml

to install all necessary packages. Next, activate the environment:

conda activate espsim

(or source activate espsim if you have an older version of conda). If you don't want to use conda, install Python3, RDKit, Numpy, Scipy, Scikit-Learn, Pytorch, Tqdm, as well as optionally Jupyter, py3Dmol, Psi4, RESP (for calculation of quantum mechanical partial charges), and https://github.com/hesther/chemprop-atom-bond.git (for calculation of machine-learned partial charges) in any way convenient to you.

Now, install espsim as a pip package as follows:

pip install -e .

Then you can use import espsim or from espsim import ... in your code. To test your installation, run

python scripts/test_imports.py

which should print Test passed, imports work fine. or tell you which package/module could not be imported.

Usage

Within a python script, you can either use only the ESP similarity routine on previously embedded, aligned RDKit molecule objects using the function GetEspSim() or use EmbedAlignConstrainedScore() to embed, align and score molecules with a common core, or EmbedAlignScore() to embed, align and score molecules without a common core.

ESP similarity

An example is provided in scripts/test_esp_function.py. Execute it via

python scripts/test_esp_function.py

The script loads two RDKit molecules from file (mol, mol2 and sdf with optional custom charges in mol2 and sdf), that have been previously embedded and aligned, and calls GetEspSim(), which calculates the overlap integrals of the electrostatic potentials of the two molecules. The function returns the ESP similarity (target: 0.85). The integration is performed via fitting the 1/r term in the Coloumb potential by three Gaussians. The overlap integral between the potentials of molecules A and B then becomes a sum of a large number of two-center integrals of Gaussians for which an analytic solution exists (see Good et al, doi.org/10.1021/ci00007a002). After integration, the overall similarity is calculated as the Carbo similarity (Carbo et al, doi.org/10.1002/qua.560320412 and Carbo, Arnau, Intl. J. Quantum Chem 17 (1980) 1185 (no doi available)), which is the overlap integral of A and B divided by the square root of the norms of A and B.

The function also takes optionally the arguments prbCharge and refCharge as input (list or 1D array of partial charges for the probe and reference molecule). If not given, RDKit's Gasteiger charges are computed.

Further options

ESPsim provides a range of options to customize the behavior of GetEspSim():

  • prbCharge and refCharge to use custom partial charges.
  • metric determines the similarity metric used. Defaults to metric = "carbo" (Carbo similarity). Currently implemented: carbo with a range of -1 to 1 and tanimoto with a range of -1/3 to 1 (Tanimoto similarity, which is the overlap integral of A and B divided by the sum of the norms of A and B minus the overlap integral).
  • renormalize is a Boolean which determines whether the obtained similarities should be rescaled to the range [0:1] (or to a custom range, given by customrange)
  • integrate determines the integration routine. Defaults to integrate = "gauss" (analytic integration via Gaussian functions). Currently implemented: gauss and mc (Monte Carlo numeric integration). The main difference between the two is that integration via Gaussian functions encompasses integration over all space (including within van-der-Waals radii of atoms, and locations far away from the molecule), whereas the MC integration is local (up to a specified margin around the molecules), and excludes space within the van-der-Waals radii of atoms. The margin of the MC integration, i.e. the minimum distance between the van-der-Waals surface and the integration box can be specified via marginMC in Angstrom, and defaults to 10 Angstrom. Smaller integration margins yield a similarity comparison for the immediate vicinity of the molecule. The number of points of the integration per 1 cubic Angstrom can be set by nMC, and defaults to 1. Smaller numbers, such as 0.1 or 0.01 speed up the calculation but decrease the accuracy of the integration.
  • partialCharges determines the partial charge distribution to use. Defaults to partialCharges = "gasteiger". Currently implemented: gasteiger, mmff (Merk Molecular Force Field MMFF94), ml (Machine learning model adapted from this article - caution, only trained on neutral molecules!), or resp (quantum-mechanical determination of partial charges via Psi4 and fitting to the electrostatic potential under constraints (RESP)). Note that resp is much slower than using Gasteiger, MMFF94 or machine-learned charges. The method and basis set of the quantum mechanical calculation can be customized via methodPsi4 (default "scf", that is, Hartree-Fock) and basisPsi4 (default "3-21G"). For available options, refer to Psi4. The number of electrostatic potential grid points within Psi4 can be regulated via gridPsi4 (default 1, that is 1 grid point per cubic Angstrom).

Embedding, alignment and similarity

If you need to embed and align the the probe and reference molecule, espsim provides a convenient way to do so. The function EmbedAlignConstrainedScore() takes a probe molecule, one or more reference molecules, and a core that is to be constrained (with 3D coordinates!) as input, computes constrained embeddings, compares the shape similarities of all combinations and returns both the shape and ESP similarity. The script scripts/test_embedalignscore.py uses this function to create 10 conformers of three probe molecules each, and compares their similiarities to 10 conformers of nine reference compounds each:

python scripts/test_embedalignscore.py

The function EmbedAlignConstrainedScore() takes several optional input arguments as well: prbNumConfs and refNumConfs, which are the number of conformers to be created for each probe and reference molecule. (The default is 10 each). The more conformers, the more accurate the results, since a better alignment may be found. Thus, the computed shape and ESP similarities are actually lower bounds to the true values (with perfectly aligned conformers). The function also takes prbCharge (list or 1D array of partial charges for the probe molecule) and refCharges (list of list, or 2D array of partial charges of all reference molecules) as input. If not specified (default), RDKit partial charges are calculated (Gasteiger charges). The option to input partial charges is especially convenient if you have already pre-computed charges, for example from quantum mechanics. The order of the inputted partial charges must be the same as the atoms in the RDKit molecules. Further options (partial charge distribution, metric, renormalization, integration routine) can be specified analogous to the GetEspSim() function. Finally, you may choose to calculate ESP similarities at every alignment (not only the best alignment based on shape), which may be achieved via getBestESP = True (caution, this takes longer to compute).

Embedding without constrained core

(Caution: This feature is currently experimental and not well tested yet). For molecules without common core, the function EmbedAlignScore() can be used instead of EmbedAlignConstrainedScore() (with the same input arguments except the core argument). Here, identificiation of the best alignment via shape scores only often leads to poor ESP similarities, so that it is often necessary to use getBestESP = True.

Jupyter demo

The scripts folder also holds a short demo of espsim on Jupyter. To view it, power up a notebook (you must have jupyter and py3Dmol installed, see Installation instructions):

jupyter notebook

and click on scripts and short_demonstration.ipynb. The file holds more information and explanations about espsim, as well as some 3D visualizations of molecule embeddings. The 3D visualizations are especially helpful to view the results of the constrained embedding and alignment. The notebook furthermore holds more detailed information about the integration of the electrostatic potenital.

Benchmarks

Visit the benchmarks folder to show scripts to benchmark ESP-Sim on DUD-E and D4-rescore, compare performance across different partial charges, as well as against EON.

Contact

Feel free to post questions, feedback, errors or concerns on github, or email to [email protected]

espsim's People

Contributors

cmargreitter avatar hesther avatar ljmartin avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

espsim's Issues

Setting number of cores to use

Hi,

I really like the ESP-Sim package, but I would like to speed up the calculations for larger datasets. I see that ESP-Sim only uses one CPU core on my machine. Is there an easy way to set the number of cores I want to use for ESP-Sim calculations?

(Sorry if this is a noob question)

Thank you in advance!

Robin

Vectorizing `GaussianInt` speed up

Hi, thanks for this great package!

I wanted to speed it up for use in large-scale screening, so vectorized the gaussian integration step:

def GetIntegralsViaGaussians(prbCoor,
It gets about a 100x speedup, which is more obviously useful on larger molecules.

Here's a minimal-ish example, taken from the scripts dir:
setup:

## Setup:

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

from espsim import EmbedAlignConstrainedScore,ConstrainedEmbedMultipleConfs,GetEspSim, helpers

import numpy as np
from scipy.spatial.distance import cdist


## set up molecules:

refSmiles=['C1=CC=C(C=C1)C(C(=O)O)O','CCC(C(=O)O)O','OC(C(O)=O)c1ccc(Cl)cc1','C1=CC(=CC=C1C(C(=O)O)O)O','COc1ccc(cc1)C(O)C(O)=O','OC(C(O)=O)c1ccc(cc1)[N+]([O-])=O','CCCC(C(=O)O)O','CCC(C)C(C(=O)O)O','CC(C(=O)O)O']
prbSmile='C(C(C(=O)O)O)O'
refMols=[Chem.AddHs(Chem.MolFromSmiles(x)) for x in refSmiles]
prbMol=Chem.AddHs(Chem.MolFromSmiles(prbSmile))

patt=Chem.MolFromSmiles("[H]OC([H])(C)C(=O)O[H]",sanitize=False)
helper=Chem.AddHs(Chem.MolFromSmiles("[H]OC([H])(C)C(=O)O[H]"))

AllChem.EmbedMolecule(helper,AllChem.ETKDG()) #Embed first reference molecule, create one conformer
AllChem.UFFOptimizeMolecule(helper) #Optimize the coordinates of the conformer
core = AllChem.DeleteSubstructs(AllChem.ReplaceSidechains(helper,patt),Chem.MolFromSmiles('*')) #Create core molecule with 3D coordinates
core.UpdatePropertyCache()

# align the molecules:
simShape,simEsp=EmbedAlignConstrainedScore(prbMol,refMols,core)

vectorized integration function (should just be a drop-in for GaussInt):

## Define vectorized gaussian integration functions:
def VecGI(dist, charge1,charge2,):
    
    #These are precomputed coefficients:
    a=np.array([[ 15.90600036,   3.9534831 ,  17.61453176],
                [  3.9534831 ,   5.21580206,   1.91045387],
                [ 17.61453176,   1.91045387, 238.75820253]])
    b=np.array([[-0.02495   , -0.04539319, -0.00247124],
                [-0.04539319, -0.2513    , -0.00258662],
                [-0.00247124, -0.00258662, -0.0013    ]])
    
    a_flat = a.flatten()
    b_flat = b.flatten()
    dist = (dist**2).flatten()
    charges = (charge1[:,None]*charge2).flatten()
    
    return ((a_flat[:,None] * np.exp(dist * b_flat[:,None])).sum(0) * charges).sum()


def vecSim(refCoor, prbCoor, refCharge, prbCharge, metric):
    distPrbPrb = cdist(prbCoor,prbCoor)
    distPrbRef = cdist(prbCoor,refCoor)
    distRefRef = cdist(refCoor,refCoor)
    
    intPrbPrb= VecGI(distPrbPrb,prbCharge,prbCharge)
    intPrbRef= VecGI(distPrbRef,prbCharge,refCharge)
    intRefRef= VecGI(distRefRef,refCharge,refCharge)
    return SimilarityMetric(intPrbPrb,intRefRef,intPrbRef,metric)

test equivalence:

prbCoor = prbMol.GetConformer(0).GetPositions()
prbCharge = np.array([a.GetDoubleProp('_GasteigerCharge') for a in prbMol.GetAtoms()])

simEsp_vectorized = []

for refMol in refMols:
    refCoor = refMol.GetConformer(0).GetPositions()
    refCharge = np.array([a.GetDoubleProp('_GasteigerCharge') for a in refMol.GetAtoms()])
    
    metric = 'tanimoto'
    tanimoto_similarity = GetEspSim(prbMol, refMol, metric=metric, partialCharges='gasteiger')
    vectorized_tanimoto_similarity = vecSim(refCoor, prbCoor, refCharge, prbCharge,metric)

    print(np.isclose(tanimoto_similarity, vectorized_tanimoto_similarity), tanimoto_similarity, vectorized_tanimoto_similarity)
    

output:

True 0.6460256327061207 0.6460256327061223
True 0.6882976768462027 0.6882976768461678
True 0.6464181000546018 0.6464181000546553
True 0.4060487492735462 0.4060487492735308
True 0.30699421499052204 0.3069942149905257
True 0.23382614291885506 0.23382614291884982
True 0.6689556377160998 0.668955637716057
True 0.7182699415095307 0.718269941509474
True 0.7087896849268721 0.7087896849268774

These take ~6ms vs 100µs. The VecGI should just be a drop in replacement for GaussInt. Seems to work for Carbo and Tanimoto, so if it's of interest Im happy to submit a PR.
cheers
Lewis

Psi4 error

Hi, thank you for the great work.

I meed some problem when I try to run through the exaple notebook with partialCharges='resp'. The error is about key-values of options in helper.py. I try to modify it by removing the list in line 82. Then the charge output becomes only one value rather than a list/array in line 51 of electrostatics.py. I am not an expert of Psi4, hope these description can help you understand the issue.

Appreicate it if you can help, :D!

Best,

Chao

error when using `psi4/resp`: `Fatal Error: RHF: RHF reference is only for singlets.`

hello esther,

(I found a fix, please see my comment, but I don't know the root cause)
i've been trying out the espsim library on some molecules. the demo notebooks are very helpful and the analyses in the paper are also nicely done. great work!

i was trying to compare the ESP similarity of ATP against another known inhibitor of CDK2.
Specifically, I used the ATP co-crystal ligand from PDB ID 1B38,
(https://models.rcsb.org/v1/1b38/ligand?auth_seq_id=381&label_asym_id=C&encoding=sdf&filename=1b38_C_ATP.sdf)
and the Dinaciclib co-crystal ligand from PDB ID 5L2W
(https://models.rcsb.org/v1/5l2w/ligand?auth_seq_id=900&label_asym_id=C&encoding=sdf&filename=5l2w_C_1QK.sdf)

I am trying EmbedAlignScore() on these 2 mols, and the calculation works when I use gasteiger, mmff, ml, but not resp.
psi4 complains about this error:

RuntimeError: 
Fatal Error: RHF: RHF reference is only for singlets.
Error occurred in file: /build/source/psi4/src/psi4/libscf_solver/rhf.cc on line: 92
The most recent 5 function calls were:
psi::PsiException::PsiException(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char const*, int)

I've also attached the detailed run log for your reference: https://gist.github.com/linminhtoo/84182da4bf727361b23905f34a429a5d

Here was how I run the code (I did rename some variables for my convenience but I didn't change any code logic)

# load co-crystal ATP
mol_atp = Chem.SDMolSupplier(str(RELATIVE / "data/cdk2_knownhits/1b38_C_ATP.sdf"), removeHs=False)[0]
mol_atp = Chem.AddHs(mol_atp, addCoords=True)

# load co-crystal dinaciclib
mol_dina = Chem.SDMolSupplier(str(RELATIVE / "data/cdk2_knownhits/5l2w_C_1QK.sdf"), removeHs=False)[0]
mol_dina = Chem.AddHs(mol_dina, addCoords=True)

# run ESPSim
shape_sims, esp_sims = EmbedAlignScore(
    probe_mol=deepcopy(mol_dina),
    ref_mols=deepcopy(mol_atp),
    probe_num_confs=10,
    ref_num_confs=10,
    partial_charge_mode="resp",
    renormalize=True,  # to [0, 1]
    getBestESP=True,  # more accurate but slower
    randomseed=2342,
)

I tried swapping probe_mol to mol_atp and ref_mols to mol_dina and psi4 did calculate the charges for one molecule before crashing again with the same error, which means it is not happy with Dinaciclib for some reason, but is fine with ATP. (assuming it calculates probe_mol first.

Do you happen to have any idea what's the issue?

Best,
Min Htoo

Trouble while installing in Google Colab

Hey Hesther,

Looking at all the appreciation and its presence in various other sources, I feel at loss not being able to use it.
Can you please help me installing it in Google Colab?

Thanks in advance
-Hemant

how to generate figure 10

hello esther,

image
figure 10 in your paper is really nice but I didn't see the code to draw that in this repo. Is that using some other software?

Best,
Min Htoo

ESP similarity formala intuition

Hi @hesther ,

Appeciate your work. I am having difficulty understaning the esp similarity formula in the end of short_demonstration.ipynb.

image

It seems to focus on the sign alignment situation among all space, then why not integral of phi_a * phi_b / (|phi_a|*|phi_b|), maybe I get it wrong. What's the intuition behind the formula.

Fails to install

The installation fails to complete, even with a clean conda installation. Problems include resp and psi4 not being available for the modern version of python aka > 3.6

When viewing aligned conformations how to select the best aligned?

From the tutorial I have been using the following code block to view the alignment of my prbMol to my refMol.
p = py3Dmol.view(width=400,height=400) dt = {} for i in range(len(prbMols)): dt[i] = [prbMols[i], refMol] interact(draw, ms=dt,p=fixed(p),confIds=fixed([6,6]));

However, there appears to be no clear way to select the confIds that have given the best simShape or simEsp score, and instead you have to manually iterate through the confIds until you see a pair that give reasonable overlay. Is there anyway to automatically retrieve the confID of the 'best scoring' conformations?

Thanks,
Noah

conda pakage

Hello, thanks for this great package!

Do you have any plans to make this package available through conda or pip?

Tharindu

DM21 integration

Hi @hesther ,

Deepmind released a functional approximation, DM21, claiming they didn't break the fractional charge and spin constraints, which seems to essential to molecule DFT. But it is built with PYSCF, will you consider integrating it in this project?

Best,

Chao

ML partial charges

Hello @hesther, thank you for this nice tool!
I would like to try ML-based charges, but it doesn't work for me:

752 if train_args.features_scaling != predict_args.features_scaling:
    753     raise ValueError('If scaling of the additional features was done during training, the '
    754                      'same must be done during prediction.')
    756 # If atom descriptors were used during training, they must be used when predicting and vice-versa

AttributeError: 'Namespace' object has no attribute 'features_scaling'

Am I missing some reqirements?
Thank you++

substructure similarity

Does espsim support this function:
3D shape and chemical similarity of two identical substructures from two molecule
Or any suggestions?

Stochastic results

Hi-

I love the work you've done on this package. I've been toying around with it and noticed that it can provide stochastic results. Would you welcome a PR that would add seeds to all of the top level functions that gets passed to the conformer generation functions? This would be similar to the param in your function ConstrainedEmbedMultipleConfs but passed on in the rest of the functions as well.

import model

Hello,

I just downloaded this package and excited to use it. However, when I run benchmark_1_partial_charges.ipynb I'm getting this warning for the ML predictions:

Computing ML charges (will print three warnings for failed predictions)
Warning: could not obtain prediction, defaulting to Gasteiger charges for one molecule
Warning: could not obtain prediction, defaulting to Gasteiger charges for one molecule
Warning: could not obtain prediction, defaulting to Gasteiger charges for one molecule

Do I need to worry about this?

Thanks and have a nice day!

New use case - comparison of docking poses

Hi,

I was interested in using this library to compare docking poses generated using a consensus docking approach. Essentially, I would like to compare the electrostatic field similarity between different poses for a single compound, to use the similarity data to cluster the poses, in a similar fashion that you would do using RMSD values. I am just wondering if it is possible to calculate the field similarity without prior alignment of the two molecules, as the docking poses geometry would need to be conserved. Also the two molecules I would be comparing would be the same compound just in different conformations.

Thanks for any insight you may have,

Tony

Package versions not included in environment.yml file

Trying to install ESPSim locally and conda is taking an age to solve the environment when using the .yml file. It is likely this is due to the .yml file not including specification of the Python or package versions required for a working install. Is it possible to upload a version which contains the dependencies and their versions?

Thanks,
Noah

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.