RingSystems

Ring systems in natural products: structural diversity, physicochemical properties, and coverage by synthetic compounds

This is the code used in the cheminformatic analysis of ring systems in natural products. For more details, please see the publication: Natural Product Reports, 2022, DOI: 10.1039/D2NP00001F

Requirements

Anaconda (or Miniconda) and Git should be installed.
A license is needed to use OpenEye applications and toolkits.

git clone https://github.com/anya-chen/RingSystems  
cd RingSystems  
conda env create -n ringsys -f environment.yml  
conda activate ringsys  
pip install -e .  

If you are installing manually or using only certain parts:

  • Create the ringsys environment with Python 3.8 and RDKit
conda create -n ringsys python=3.8
conda activate ringsys
conda install -c conda-forge rdkit

conda install -c conda-forge chembl_structure_pipeline

conda install -c openeye openeye-toolkits

  • Install scikit-learn, numpy, pandas, seaborn...
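
To confirm that a manually built environment is complete, a quick check along the following lines may help. It is a minimal sketch and not part of the repository; the import names listed (for example `sklearn` for scikit-learn and `openeye` for the OpenEye toolkits) are assumptions about how the packages above are imported.

```python
from importlib.util import find_spec

# Assumed import names for the packages mentioned above
# (not taken from the repository itself).
REQUIRED = ["rdkit", "chembl_structure_pipeline", "openeye",
            "sklearn", "numpy", "pandas", "seaborn"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All packages found.")
```

Run it inside the activated ringsys environment; an empty result means every listed package can be located.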

Input datasets needed

  • Data_prep/get_refined_coconut.py and Data_prep/get_organism_sets.py
    coconut.sourceNP.csv: from COCONUT database MongoDB dump, version 2020-10
  • Preprocessing/preprocess_Zinc.py
    zinc20/ and zinc_catalogs/: in-stock subset and biogenic sets from ZINC 20 database
  • Preprocessing/preprocessing_approveddrug.py
    approveddrug.sdf: from DrugBank, version 5.1.8

Algorithm to get ring systems from molecules

RingSystems/RingSystemClass.py

Ring systems are defined as all atoms forming a ring, plus any proximate exocyclic atom(s) connected via any type of bond other than a single bond. Two rings sharing at least one atom (i.e. fused and spiro rings) are considered as one ring system. To obtain the ring systems, the following algorithm was applied to each chemical structure with at least one ring:

  1. Split the molecule into individual rings (with the RDKit function ringInfo). This process results in one or more ring atom sets.
  2. If two ring atom sets share at least one atom, the sets are merged into one.
  3. The resulting ring systems (i.e. processed ring atom sets) are extended by all atoms directly connected to the ring via any type of bond other than a single bond.
  4. All other substituents are replaced by a hydrogen atom.
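
Steps 2 and 3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in RingSystems/RingSystemClass.py: the function names are hypothetical, rings are given as tuples of atom indices (the form returned by RDKit's `mol.GetRingInfo().AtomRings()`), and bonds are given as hypothetical `(atom_a, atom_b, bond_order)` triples. Step 4 (replacing the remaining substituents with hydrogen) is omitted.

```python
def merge_ring_atom_sets(rings):
    """Step 2: merge ring atom sets that share at least one atom."""
    systems = []  # pairwise disjoint sets of atom indices
    for ring in rings:
        merged = set(ring)
        remaining = []
        for system in systems:
            if system & merged:      # shared atom: fused or spiro
                merged |= system
            else:
                remaining.append(system)
        remaining.append(merged)
        systems = remaining
    return systems


def extend_with_exocyclic_atoms(system, bonds):
    """Step 3: add atoms bonded to the system by anything but a single bond."""
    extended = set(system)
    for a, b, order in bonds:
        if order == 1.0:
            continue                 # single bonds do not extend the system
        if a in system and b not in system:
            extended.add(b)
        elif b in system and a not in system:
            extended.add(a)
    return extended


# Toy input: a six-membered and a five-membered ring sharing atom 5 (spiro),
# plus a separate three-membered ring.
rings = [(0, 1, 2, 3, 4, 5), (5, 6, 7, 8, 9), (11, 12, 13)]
# Atom 0 carries a double-bonded exocyclic atom 10; atom 3 a single-bonded atom 14.
bonds = [(0, 10, 2.0), (3, 14, 1.0)]

systems = merge_ring_atom_sets(rings)
ring_systems = [extend_with_exocyclic_atoms(s, bonds) for s in systems]
```

In this toy input the two rings sharing atom 5 end up in one ring system, which is then extended by atom 10 but not by atom 14.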

Algorithm to test whether or not two molecules are identical (if there is no evidence that the molecules are not identical)

RingSystems/superpose.py

In the scenario considering stereochemistry (i.e. tetrahedral atom configuration), pairs of molecules were tested for identity according to a procedure that builds on the evidence-based approach. The procedure returns TRUE for a pair of molecules, m1 and m2, if the two molecules are identical (more accurately, if there is no evidence that the molecules are not identical):

  • If the constitution of m1 and m2 is distinct (based on their SMILES notations, with any stereochemical information removed):
    • return FALSE
  • If the constitution of m1 and m2 is identical (based on their SMILES notations, with any stereochemical information removed):
    • Generate all possible substructure matches between m1 and m2 (using the GetSubstructMatches function of RDKit; stereochemical information disregarded with useChirality=False)
    • For each substructure match:
      • For each pair of matching atoms:
        • If the configuration of exactly one atom is not specified:
          • Add the unspecified atom to unspecified_atoms (a list of atoms for which their configuration will be enumerated)
      • Enumerate all possible enantiomers based on all atoms in unspecified_atoms (this results in 2^n enantiomers, where n is the number of atoms in unspecified_atoms)
      • For each enantiomer:
        • Test whether m1 and m2 can be superposed (with the HasSubstructMatch function in RDKit; this time with useChirality=True)
          • If yes:
            • return TRUE
    • return FALSE
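
The control flow of the procedure above can be sketched without RDKit on a toy model. This is not the code in RingSystems/superpose.py. Each molecule is represented here by a hypothetical list of per-atom configuration labels (`"R"`, `"S"`, or `None` when unspecified), and each match is a tuple giving, for atom i of m1, the index of its partner in m2, similar in layout to the tuples returned by GetSubstructMatches. Comparing labels stands in for the chirality-aware HasSubstructMatch call, which is a simplification, and the constitution check is assumed to have passed already.

```python
from itertools import product


def no_evidence_of_difference(config1, config2, matches):
    """Return True if some match and some enumeration make all labels agree."""
    for match in matches:
        # Pair up the configuration labels of matching atoms.
        pairs = [(config1[i], config2[j]) for i, j in enumerate(match)]

        # Positions where exactly one of the two atoms is unspecified.
        unspecified = [k for k, (a, b) in enumerate(pairs)
                       if (a is None) != (b is None)]

        # Enumerate all 2^n assignments for the n unspecified atoms.
        for assignment in product("RS", repeat=len(unspecified)):
            filled = list(pairs)
            for k, label in zip(unspecified, assignment):
                a, b = filled[k]
                filled[k] = (label, b) if a is None else (a, label)

            # Toy stand-in for the superposition test with useChirality=True.
            if all(a == b for a, b in filled):
                return True
    return False
```

For instance, `no_evidence_of_difference(["R", None], ["R", "S"], [(0, 1)])` yields `True` because the unspecified atom can take the configuration of its partner, whereas `no_evidence_of_difference(["R"], ["S"], [(0,)])` yields `False` because both atoms are specified and disagree.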

ringsystems's Issues

Where is MoleculePreprocessorExtended?

Hi,

Your preprocessing scripts all include the line

from cheminformatics import MoleculePreprocessorExtended

Where does the cheminformatics library come from? This doesn't seem to be in the installation requirements.

Thanks
