ChemBO

Authors: Ksenia Korovina, Sailun (Celsius) Xu, Kirthevasan Kandasamy

ChemBO is a library for joint molecular optimization and synthesis. It is based on Dragonfly, a framework for scalable Bayesian optimization.

Structure of the repo

  • experiments package contains experiment scripts. In particular, the run_chemist.py script illustrates usage of the classes.
  • chemist_opt package isolates the Chemist class, which performs joint optimization and synthesis. It contains harnesses for calling molecular functions (MolFunctionCaller) and for handling optimization over molecular domains (MolDomain). Depends on mols and explorer.
  • explorer implements exploration of the molecular domain. Currently, a RandomExplorer is implemented, which explores reactions randomly, starting from a given pool. Depends on synth.
  • mols contains the Molecule class, the Reaction class, a few examples of objective function definitions, and implementations of molecular versions of all components needed for BO to work: the MolCPGP and MolCPGPFitter classes and molecular kernels.
  • synth is responsible for performing forward synthesis.
  • rdkit_contrib is an extension to rdkit that provides computation of a few molecular scores (for older versions of rdkit).
  • baselines contains wrappers for models we compare against.
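The idea behind random exploration from a starting pool can be illustrated with a toy sketch. Everything below is hypothetical and is not ChemBO's API: plain strings stand in for molecules, `toy_react` stands in for forward synthesis (it only joins strings and is not real chemistry), and `random_explore` is an illustrative name rather than the RandomExplorer class.

```python
import random
from typing import Callable, List, Tuple


def toy_react(reactants: List[str]) -> str:
    """Hypothetical stand-in for forward synthesis: joins the reactant strings."""
    return "".join(sorted(reactants))


def random_explore(
    pool: List[str],
    objective: Callable[[str], float],
    n_steps: int,
    seed: int = 0,
) -> Tuple[str, float, List[str]]:
    """Randomly combine pool members, grow the pool, and return the best candidate seen."""
    rng = random.Random(seed)
    pool = list(pool)  # copy so the caller's pool is not modified
    best = max(pool, key=objective)
    for _ in range(n_steps):
        reactants = rng.sample(pool, k=2)  # two distinct pool entries, chosen at random
        product = toy_react(reactants)
        if product not in pool:
            pool.append(product)  # products become available as future reactants
        if objective(product) > objective(best):
            best = product
    return best, objective(best), pool


if __name__ == "__main__":
    # Toy objective: reward 'C' characters, penalise length.
    best, value, final_pool = random_explore(
        ["C", "N", "O"], lambda s: s.count("C") - 0.1 * len(s), n_steps=20
    )
    print(best, value, len(final_pool))
```

Because each product is added back to the pool, later steps can build on earlier results, which is what lets a purely random strategy reach candidates several reaction steps away from the initial pool.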

Current work

In the coming few weeks, we will try to clean up, refactor and further comment the code.

Getting started

Python 3 is recommended.

Python packages

First, set up a conda environment for RDKit and Dragonfly:

conda create -c rdkit -n chemist-env rdkit python=3.6
# optionally: export PATH="/opt/miniconda3/bin:$PATH"
conda activate chemist-env  # or source activate chemist-env with older conda

Install basic requirements with pip:

pip install -r requirements.txt

Kernel-related packages

Certain functionality (some of the graph-based kernels) requires the graphkernels package, which can be installed separately. First, install eigen3 and pkg-config, then graphkernels itself:

sudo apt-get install libeigen3-dev; sudo apt-get install pkg-config  # on Linux
brew install eigen; brew install pkg-config  # on MacOS
pip install graphkernels

If the above fails on MacOS (see stackoverflow), the simplest fix is to set the deployment target explicitly:

MACOSX_DEPLOYMENT_TARGET=10.9 pip install graphkernels

To use distance-based kernels, you need Cython and POT (Python Optimal Transport), which computes the optimal transport (OT) distances:

pip install Cython
pip install cython POT  # prepended with MACOSX_DEPLOYMENT_TARGET=10.9 if needed
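To confirm that the optional kernel packages are visible to the active environment, a small check like the following can help. This is a sketch, not part of ChemBO; `check_modules` is a hypothetical helper. The import names used are `graphkernels` for graphkernels, `Cython` for Cython, and `ot` for POT.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Optional dependencies for graph-based and distance-based kernels.
    for name, found in check_modules(["graphkernels", "Cython", "ot"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

The check only locates the modules without importing them, so it does not detect a package that is installed but fails to load (for example, because of a broken compiled extension).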

Synthesis path plotting functionality

For plotting the synthesis path for an optimal molecule, install graphviz via:

pip install graphviz

However, the above only works on Linux, because Homebrew removed the --with-pango option (see this).
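As a rough picture of what a synthesis path looks like as a graph, the sketch below writes one out in Graphviz's DOT text format using only the standard library. It is an illustration under assumptions, not ChemBO's plotting code: the `{product: [reactants]}` dictionary layout and the name `synthesis_path_to_dot` are hypothetical, and molecules are written as SMILES strings.

```python
from typing import Dict, List


def _quote(node_id: str) -> str:
    """Wrap a node id in double quotes, escaping any embedded double quotes."""
    return '"' + node_id.replace('"', '\\"') + '"'


def synthesis_path_to_dot(reactions: Dict[str, List[str]]) -> str:
    """Render {product: [reactants]} as DOT text with one edge per reactant -> product."""
    lines = ["digraph synthesis_path {"]
    for product, reactants in reactions.items():
        for reactant in reactants:
            lines.append(f"    {_quote(reactant)} -> {_quote(product)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Acetic acid + ethanol -> ethyl acetate, written as SMILES.
    print(synthesis_path_to_dot({"CCOC(C)=O": ["CC(=O)O", "CCO"]}))
```

The resulting text can be handed to the graphviz Python package (for example via `graphviz.Source`) for rendering, which additionally requires the Graphviz executables to be installed on the system.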

Environment

Set PYTHONPATH for imports:

source setup.sh 

Getting data

ChEMBL data in txt format can be found in kevinid's repo or in the official downloads. The ZINC database can be downloaded from the official site. Run the following to download the datasets automatically and place them in the right directory:

bash download_data.sh

Running tests

TODO

Running experiments

See experiments/run_chemist.py for the Chemist usage example.

Citation

If you found this code helpful, please consider citing this manuscript:

@misc{korovina2019chembo,
    title={ChemBO: Bayesian Optimization of Small Organic Molecules with Synthesizable Recommendations},
    author={Ksenia Korovina and Sailun Xu and Kirthevasan Kandasamy and Willie Neiswanger and Barnabas Poczos and Jeff Schneider and Eric P. Xing},
    year={2019},
    eprint={1908.01425},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
