DBSTEP

DFT-based Steric Parameters

Allows a user to compute steric parameters from chemical structures.

Calculate Sterimol parameters [1] (L, Bmin, Bmax), %Buried Volume [2], Sterimol2Vec and Vol2Vec parameters.

Features

  • Compute requested steric parameters from molecular structure files with input options:
    • -s or --sterimol - Sterimol Parameters (L, Bmin, Bmax)
    • -b or --volume - Percent Buried Volume
    • -s or --sterimol AND --scan [rmin:rmax:interval] - Sterimol2Vec Parameters
    • -b or --volume AND --scan [rmin:rmax:interval] - Vol2Vec Parameters
  • -r - Adjust radius of percent buried volume measurements (default 3.5 Angstrom)
  • Exclude atoms from steric measurement with --exclude [atom indices] option (no spaces, separated by commas)
  • Steric parameters can be computed from van der Waals radii or using a three dimensional grid (default is grid).
    • Change measurement type with --measure ['classic' or 'grid'] where classic will use vdw radii.
    • Grid point spacing can be adjusted (default spacing is 0.05 Angstrom), adjust with --grid [# in Angstrom]
  • Steric parameters can be measured from electron density .cube files generated by Gaussian (see Gaussian cubegen for information on how to generate these)
    • The --surface density command (default vdw) with a .cube input file will measure sterics from density values read in from the file.
    • A point in space is treated as occupied by the molecule when the density value read from the cube file exceeds a cutoff (default 0.002). The cutoff can be changed with --isoval [number]
  • --noH - exclude hydrogen atoms from steric measurements
  • --addmetals - add metals to steric measurements (traditionally metal centers are removed from steric measurements)

2-D Graph contribution features (Requires RDKit and Pandas packages to be installed):

  • Compute graph-based steric contributions in layers spanning outward from a reference functional group with the following input options:
    • --2d - Toggle 2D measurements on
    • --fg - Specify an atom or functional group to use as a reference as a SMILES string
    • --maxpath - The number of layers to measure. A connectivity matrix is used to compute the shortest path to each atom from the reference functional group.
    • --2d-type - The type of steric contributions to use. Options include Crippen molar refractivities or McGowan volume

Requirements & Dependencies

  • Python 3.6 or greater
  • Non-standard dependencies (numpy, numba, scipy, and cclib) are installed along with DBSTEP.

Install

  • To run as a module (python -m dbstep), download this repository and install with python setup.py install

Conda and PyPI (pip)

  • Install using conda: conda install -c conda-forge dbstep
  • Or using pip: pip install dbstep

Citing DBSTEP

Please reference the DOI of our Zenodo repository with:

Luchini, G.; Patterson, T.; Paton, R. S. DBSTEP: DFT Based Steric Parameters. 2022, DOI: 10.5281/zenodo.4702097

Usage

File parsing is done by the cclib module, which can parse many quantum chemistry output files along with other common chemical structure file formats (sdf, xyz, pdb). For a full list of supported file types, see the cclib documentation. In addition, DBSTEP can read Gaussian 16 cube files containing volumetric density information and, when used in a Python script, coordinates from RDKit mol objects that carry three-dimensional coordinates.

To execute the program:

  • Run as a command line module with: python -m dbstep file --atom1 a1idx --atom2 a2idx

  • Run in a Python program by importing: import dbstep.Dbstep as db (example below)

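    import dbstep.Dbstep as db

    # Input structure and 1-based reference atom indices (ethane example from examples/)
    file = "examples/Et.xyz"
    atom1, atom2 = 2, 5

    # Create DBSTEP object
    mol = db.dbstep(file, atom1=atom1, atom2=atom2, commandline=True, verbose=True, sterimol=True, measure='classic')

    # Grab Sterimol parameters
    L = mol.L
    Bmin = mol.Bmin
    Bmax = mol.Bmax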

DBSTEP currently takes a coordinate file (see information on appropriate file types above) along with reference atoms and other input options for steric measurement. Sterimol parameters are measured and output with the --sterimol argument; volume parameters can be requested with the --volume option.

Atoms are specified by their index in the coordinate file, with indexing starting at 1 (e.g. "2" refers to the second atom in the file).

For Sterimol parameters, two atoms need to be specified using the arguments --atom1 [atom1idx] and --atom2 [atom2idx]. The L parameter is measured starting from the specified atom1 coordinates, extending through the atom1-atom2 axis until the end of the molecule is reached. The Bmin and Bmax molecular width parameters are measured on the axis perpendicular to L.

For buried volume parameters, only the --atom1 [atom] argument is required.

If no atoms are specified, the first two atoms in the file will be used as reference.
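The geometric idea behind L and the width parameters can be illustrated with a minimal sketch. This is not DBSTEP's implementation: the function name `sterimol_l_bmax`, the `VDW_RADII` table (Bondi values) and the choice to consider every atom are assumptions made for illustration, and Bmin is left out because it requires scanning rotations around the axis.

```python
import math

# Illustrative van der Waals radii in Angstrom (Bondi values). This table is an
# assumption for the sketch, not necessarily what DBSTEP uses internally.
VDW_RADII = {"H": 1.20, "C": 1.70}


def sterimol_l_bmax(atoms, atom1, atom2):
    """Estimate L and Bmax from van der Waals spheres (hypothetical helper).

    atoms: list of (element, x, y, z) tuples.
    atom1, atom2: 1-based atom indices defining the L axis.
    """
    origin = atoms[atom1 - 1][1:]
    target = atoms[atom2 - 1][1:]
    axis = [t - o for o, t in zip(origin, target)]
    norm = math.sqrt(sum(c * c for c in axis))
    unit = [c / norm for c in axis]

    l_value = 0.0
    b_max = 0.0
    for element, *xyz in atoms:
        rel = [c - o for c, o in zip(xyz, origin)]
        # Component of the atom position along the atom1-atom2 axis
        along = sum(r * u for r, u in zip(rel, unit))
        # Distance of the atom from the axis (perpendicular component)
        perp = math.sqrt(max(sum(r * r for r in rel) - along * along, 0.0))
        radius = VDW_RADII[element]
        l_value = max(l_value, along + radius)
        b_max = max(b_max, perp + radius)
    return l_value, b_max
```

For two carbon atoms 1.5 Å apart, `sterimol_l_bmax([("C", 0, 0, 0), ("C", 0, 0, 1.5)], 1, 2)` gives an L of about 3.2 Å and a Bmax of 1.7 Å, since the far sphere extends one radius beyond the second atom and neither atom sits off the axis.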

Examples

Examples for obtaining Sterimol, Sterimol2Vec, Percent Buried Volume and Vol2Vec parameter sets are shown below (all example files found in examples/ directory).

  1. Sterimol Parameters for Ethane

    Obtain the Sterimol parameters for an ethane molecule along the C2-C5 bond on the command line:

>>>python -m dbstep examples/Et.xyz  --sterimol --atom1 2 --atom2 5

     Et.xyz / Bmin:  1.98 / Bmax:  2.13 / L:  3.20
where Et.xyz looks like: 
8
ethane
H	0.00	0.00	0.00
C	0.00	0.00    -1.10
H	-1.00	0.27	-1.47
H	0.27	-1.00	-1.47
C	1.03	1.03	-1.61
H	1.03	1.03	-2.71
H	2.03	0.76	-1.25
H	0.76	2.03	-1.25

A visualization of these parameters can be shown in the program PyMOL using the two output files created by DBSTEP, showing the L parameter in blue, Bmin parameter in green and Bmax parameter in red.

  2. Sterimol2Vec Parameters for Ph

    The --scan argument is formatted as rmin:rmax:interval, where rmin is the distance from the center along the L axis at which to start measurements, rmax is the distance at which to stop, and interval is the spacing between measurements. In this case the length of the molecule (~6 Å) is measured in 1.0 Å intervals.

>>>python -m dbstep examples/Ph.xyz --sterimol --atom1 1 --atom2 2 --scan 0.0:6.0:1.0

    Ph.xyz / R:  0.00 / Bmin:  1.65 / Bmax:  3.16 
    Ph.xyz / R:  1.00 / Bmin:  1.65 / Bmax:  3.16 
    Ph.xyz / R:  2.00 / Bmin:  1.65 / Bmax:  3.16 
    Ph.xyz / R:  3.00 / Bmin:  1.65 / Bmax:  3.16 
    Ph.xyz / R:  4.00 / Bmin:  1.65 / Bmax:  3.16 
    Ph.xyz / R:  5.00 / Bmin:  1.65 / Bmax:  3.11 
    Ph.xyz / R:  6.00 / Bmin:  1.15 / Bmax:  1.17 

    L parameter is  5.95 Ang

Displayed in PyMOL, each new Bmin and Bmax axis is added along the L axis.
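To make the rmin:rmax:interval format concrete, the sketch below expands a scan string into the list of distances at which measurements would be taken. The helper name `parse_scan` is hypothetical and this is not how DBSTEP itself parses the option; it only mirrors the seven R values in the output above.

```python
import math


def parse_scan(scan):
    """Expand 'rmin:rmax:interval' into a list of radii, including rmax
    when it falls on the grid (hypothetical helper)."""
    rmin, rmax, interval = (float(part) for part in scan.split(":"))
    if interval <= 0 or rmax < rmin:
        raise ValueError(f"invalid scan range: {scan!r}")
    # Small epsilon guards against floating-point error in the division
    steps = int(math.floor((rmax - rmin) / interval + 1e-9))
    return [round(rmin + i * interval, 6) for i in range(steps + 1)]


print(parse_scan("0.0:6.0:1.0"))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```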

  3. Percent Buried Volume

    %Vb is measured by constructing a sphere (typically with a 3.5 Å radius) around the center atom and measuring how much of the sphere is occupied by the molecule. The output includes the sphere radius, percent buried volume (%V_Bur) and percent buried shell volume (%S_Bur); %S_Bur is zero unless a scan is performed at the same time.

>>>python -m dbstep examples/1Nap.xyz --atom1 2 --volume

     R/Å     %V_Bur     %S_Bur
    3.50      41.77       0.00

For percent buried volume, the PyMOL script will overlay an appropriately sized sphere where the measurement took place.
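The measurement described above can be sketched with a simple grid count: sample points inside the probe sphere and report the fraction that falls within any atom's van der Waals sphere. This is an illustrative pure-Python version under assumed inputs (atoms given as `(x, y, z, vdw_radius)` tuples, a coarse 0.25 Å grid), not DBSTEP's own code, and the function name is hypothetical.

```python
import math


def percent_buried_volume(center, atoms, radius=3.5, spacing=0.25):
    """Estimate percent buried volume on a cubic grid (illustrative sketch).

    center: (x, y, z) of the probe sphere.
    atoms: list of (x, y, z, vdw_radius) tuples.
    """
    cx, cy, cz = center
    n = int(math.ceil(radius / spacing))
    inside = 0
    occupied = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                dx, dy, dz = i * spacing, j * spacing, k * spacing
                # Skip grid points outside the probe sphere
                if dx * dx + dy * dy + dz * dz > radius * radius:
                    continue
                inside += 1
                px, py, pz = cx + dx, cy + dy, cz + dz
                # A point is buried if it lies within any atomic sphere
                if any(
                    (px - ax) ** 2 + (py - ay) ** 2 + (pz - az) ** 2 <= r * r
                    for ax, ay, az, r in atoms
                ):
                    occupied += 1
    return 100.0 * occupied / inside
```

As a sanity check, a single sphere of radius 1.75 Å at the center of a 3.5 Å probe should bury roughly (1.75/3.5)^3 = 12.5% of the volume; the grid estimate approaches this value as the spacing is reduced.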

  4. Vol2Vec Parameters

    When invoking the --volume and --scan parameters simultaneously, Vol2Vec parameters can be obtained. In this case, a scan is performed using spheres with radii from 2.0 Å to 4.0 Å in 0.5 Å increments.

>>>python -m dbstep examples/CHiPr2.xyz --atom1 1 --volume --scan 2.0:4.0:0.5

     R/Å     %V_Bur     %S_Bur
    2.00      58.27      49.54
    2.50      53.53      46.14
    3.00      48.78      38.11
    3.50      43.37      29.19
    4.00      36.73      16.81

  5. 2D Additive sterics

    To calculate 2D graph-based additive sterics, the arguments --2d, --fg, --maxpath and --2d-type can be used. An input file listing the SMILES strings of the molecules to measure is required. The --fg argument specifies a SMILES string that is common to all provided SMILES inputs and serves as the reference point for layer 0. A connectivity matrix is then used to find atoms 1, 2, 3... N bonds away, where N is the max path length specified with the --maxpath argument. One of two per-atom quantities is summed at each layer, either Crippen molar refractivities or McGowan volumes. The quantity can be selected with the --2d-type argument.

>>>python -m dbstep examples/smiles.txt --2d --fg "C(O)=O" --maxpath 5 --2d-type mcgowan
where smiles.txt looks like: 
CC(O)=O
CCC(O)=O 
CCCC(O)=O 
CCCCC(O)=O
CC(C)C(O)=O
CCC(C)C(O)=O
The output will then be written to the file "smiles_2d_output.csv" in the format:

0_mcgowan  1_mcgowan  2_mcgowan  3_mcgowan  4_mcgowan  Structure
4.55       11.68      0          0          0          CC(O)=O
4.55       8.21       11.68      0          0          CCC(O)=O
4.55       8.21       8.21       11.68      0          CCCC(O)=O
4.55       8.21       8.21       8.21       11.68      CCCCC(O)=O
4.55       4.74       23.36      0          0          CC(C)C(O)=O
4.55       4.74       19.89      11.68      0          CCC(C)C(O)=O
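The layer-by-layer summation can be sketched as a breadth-first search outward from the reference group. The version below works on a plain adjacency dictionary rather than RDKit molecules, uses placeholder per-atom values of 1.0 instead of real Crippen or McGowan contributions, and returns one sum per column as in the table above. The name `layer_sums` and the data layout are assumptions for illustration only.

```python
from collections import deque


def layer_sums(adjacency, values, reference_atoms, max_path):
    """Sum per-atom values by bond distance from a reference group.

    adjacency: dict mapping atom index -> list of bonded atom indices.
    values: dict mapping atom index -> per-atom contribution.
    reference_atoms: atom indices forming layer 0.
    Returns max_path sums, one for each layer 0 .. max_path - 1.
    """
    # Breadth-first search gives the shortest bond path to every atom
    distance = {atom: 0 for atom in reference_atoms}
    queue = deque(reference_atoms)
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)

    sums = [0.0] * max_path
    for atom, dist in distance.items():
        if dist < max_path:
            sums[dist] += values[atom]
    return sums


# Heavy-atom graph of CC(C)C(O)=O: C0-C1(-C2)-C3(-O4)(=O5), reference = C(O)=O
isobutyric = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4, 5], 4: [3], 5: [3]}
unit_values = {atom: 1.0 for atom in isobutyric}  # placeholder contributions
print(layer_sums(isobutyric, unit_values, [3, 4, 5], 5))
# [3.0, 1.0, 2.0, 0.0, 0.0]
```

The two methyl carbons land in the same layer, which matches the pattern in the table where the layer-2 entry for CC(C)C(O)=O is twice the single-methyl value.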

Acknowledgements

DBSTEP is developed by Guilian Luchini, Toby Patterson and Robert Paton and is supported by the NSF Center for Computer-Assisted Synthesis, grant number CHE-1925607.

References

  1. Verloop, A., Drug Design. Ariens, E. J., Ed. Academic Press: New York, 1976; Vol. III
  2. Hillier, A. C.; Sommer, W. J.; Yong, B. S.; Petersen, J. L.; Cavallo, L.; Nolan, S. P. Organometallics 2003, 22, 4322-4326.

dbstep's Issues

Free space of unoccupied grid once it is done being used

This can be done by making sure there is no longer a reference to it. Currently, once it is done being used, the reference is not set to None, so it just continues to unnecessarily exist in memory. The variable name in dbstep.py is self.unocc_grid.

Only generate points for atoms of interest in volume calculations

For %Vbur calculations, memory use and speed could be improved by only generating points for atoms which intersect with the probe sphere. This should be determinable by a simple calculation: if the distance of their center points is less than the sum of their radii.

This should be determined before fitting the grid, and should generally allow for much smaller grids to be required (depending on the radii you are probing).
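The proposed pre-filter could look something like the sketch below: an atom is kept only if the distance between its center and the probe center is less than the sum of the two radii. The function name and the `(x, y, z, vdw_radius)` tuple layout are hypothetical, not taken from the DBSTEP code base.

```python
import math


def atoms_intersecting_probe(center, probe_radius, atoms):
    """Return the atoms whose vdW sphere overlaps the probe sphere.

    center: (x, y, z) of the probe sphere.
    atoms: list of (x, y, z, vdw_radius) tuples.
    """
    cx, cy, cz = center
    kept = []
    for x, y, z, r in atoms:
        separation = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
        # Two spheres intersect when their centers are closer than r1 + r2
        if separation < probe_radius + r:
            kept.append((x, y, z, r))
    return kept
```

With a 3.5 Å probe at the origin, an atom of radius 1.0 Å at x = 4.0 Å is kept (4.0 < 4.5), while an atom of radius 1.7 Å at x = 6.0 Å is dropped (6.0 > 5.2), so only the first would need grid points.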

Additions to README

Guilian can you update the sections for which I've added headers to the README? Also would be nice to show some graphics of sterimol and vectorised sterimol from pymol as well!

TypeError: Unexpected keyword argument {'n_jobs': -1}

When I try to use the python API to find the buried volume for a given molecule, I get the error TypeError: Unexpected keyword argument {'n_jobs': -1}. Below is a sample code to show how I get this error.
To install dbstep, I used conda install -c conda-forge dbstep and it installed version 1.0.0.

import urllib.request
import dbstep.Dbstep as db
url = "https://raw.githubusercontent.com/patonlab/DBSTEP/master/dbstep/examples/1Nap.xyz"
filename, headers = urllib.request.urlretrieve(url, filename="1Nap.xyz")
db.dbstep("1Nap.xyz", atom1=2, volume=True).bur_vol

Bug in calculating the 2D descriptors

When executing the following command:
python -m dbstep examples/smiles.txt --2d --fg "C(O)=O" --maxpath 5 --2d-type mcgowan
the error appears:
Traceback (most recent call last):
  File "C:\Users\DELL\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\DELL\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\DELL\anaconda3\lib\site-packages\dbstep\__main__.py", line 17, in <module>
    sys.exit(Dbstep.main())
  File "C:\Users\DELL\anaconda3\lib\site-packages\dbstep\Dbstep.py", line 531, in main
    vec_df = graph.mol_to_vec(file,options.shared_fg,options.voltype,options.max_path_length,options.verbose)
  File "C:\Users\DELL\anaconda3\lib\site-packages\dbstep\graph.py", line 106, in mol_to_vec
    mol,prop = make_mol_obj(line)
  File "C:\Users\DELL\anaconda3\lib\site-packages\dbstep\graph.py", line 75, in make_mol_obj
    return mol,prop
UnboundLocalError: local variable 'prop' referenced before assignment

DBSTEP PyPi/conda not updated

Just wanted to enquire about a potential update to the DBSTEP PyPi and conda. The versions listed are 1.0 and still use the old scipy variable “n_jobs” instead of “workers”, unlike the latest version on github (DBSTEP 1.1.0).
Many thanks and apologies if you are already aware of this.

Improve duplicate handling in sterics.occupied

Currently, we convert the list of indices to a set and then back to a list to remove duplicates. This does not scale very well compared to a simple alternative: make a bit mask (each bit marking whether a point is occupied or not) and, as you scan out the spheres, set the needed indices to true. There can then be no duplicates, as each value is restricted to true or false. The overall grid can then be indexed using this mask to make the occupied grid.
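A minimal sketch of the mask idea follows, using a `bytearray` with one byte per grid point as a stand-in for a true bit mask (a boolean array from a numerical library would play the same role). The function names and the batch-of-indices input are assumptions for illustration; they do not correspond to the existing code in sterics.occupied.

```python
def occupied_mask(n_points, index_batches):
    """Mark occupied grid points without building duplicate index lists.

    n_points: total number of points in the grid.
    index_batches: iterable of index lists, e.g. one list per atomic sphere.
    """
    mask = bytearray(n_points)  # all zeros: nothing occupied yet
    for batch in index_batches:
        for idx in batch:
            mask[idx] = 1  # setting the same index twice is harmless
    return mask


def select_occupied(points, mask):
    """Index the overall grid with the mask to get the occupied points."""
    return [point for point, flag in zip(points, mask) if flag]


grid = ["p0", "p1", "p2", "p3", "p4"]
mask = occupied_mask(len(grid), [[0, 2], [2, 3]])
print(select_occupied(grid, mask))  # ['p0', 'p2', 'p3']
```

Index 2 appears in both batches but is selected only once, which is the property the set-based approach currently provides at a higher cost.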
