
PythiaCHEM (PYthon Toolkit for macHIne leArning in CHEMistry)

A modular toolkit, implemented in Python and organized in Jupyter Notebooks. It uses fingerprints, Mordred descriptors and precalculated quantum-mechanical (QM) descriptors as input features for shallow learners and ensemble models in regression and classification tasks.

Paper: https://chemrxiv.org/engage/chemrxiv/article-details/6504251fb338ec988a83e057

Supported functionality

Tasks:

  • Classification
  • Regression

Data types:

  • SMILES strings
  • Precalculated descriptors

Modules:

  • classification metrics: calculation of the confusion matrix, accuracy, G-mean, precision, recall, generalized F-score, MCC and AUC
  • fingerprints generation: generation of Morgan, RDKit, atom pair and torsion fingerprints and MACCS keys with RDKit
  • molecules and structures: conversion of SMILES strings to molecules and images with RDKit
  • plots: parity plots, ROC curves and confusion matrices with matplotlib
  • scaling: z-score, min-max and logarithmic scaling
  • workflow functions: correlation tests, training of regression and classification models, and ensemble learning with scikit-learn
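To make the classification metrics listed above concrete, here is a minimal plain-Python sketch for binary labels (1 = positive class). It is illustrative only and is not PythiaCHEM's own implementation: the function names `confusion_counts`, `binary_metrics` and `roc_auc` are hypothetical, and we assume the module's "generalized F-score" corresponds to the F-beta score.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def _ratio(num, den):
    # Return 0.0 instead of raising when a metric is undefined (zero denominator).
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, beta=1.0):
    """Threshold-based metrics derived from the confusion matrix."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)        # sensitivity, true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    b2 = beta ** 2
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        # Geometric mean of sensitivity and specificity; less sensitive to class imbalance than accuracy.
        "g_mean": math.sqrt(recall * specificity),
        # F-beta: beta > 1 weights recall more, beta < 1 weights precision more.
        "f_beta": _ratio((1 + b2) * precision * recall, b2 * precision + recall),
        # Matthews correlation coefficient, in [-1, 1].
        "mcc": _ratio(tp * tn - fp * fn,
                      math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
    print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
    print(binary_metrics(y_true, y_pred)["mcc"])  # 0.5
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For real workflows, scikit-learn's `sklearn.metrics` (for example `confusion_matrix`, `fbeta_score`, `matthews_corrcoef` and `roc_auc_score`) and imbalanced-learn's `imblearn.metrics.geometric_mean_score` provide tested implementations of the same quantities.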

We are working on populating this package with more models and other building blocks.

Notebooks:

Please refer to 'Example-enantioselective-Strecker-synthesis' for a full demonstration of all the notebooks.

  • data analysis: data exploration, visualization, scaling
  • regression-fingerprints: regression with fingerprints, data set split, ensemble models
  • regression-Mordred: regression with Mordred, feature elimination techniques, data set split
  • regression-DFT: regression with DFT descriptors, PCA, data set split
  • classification-fingerprints: classification with fingerprints, feature exploration, synthetic data, data set split
  • classification-Mordred: classification with Mordred, feature elimination and exploration, synthetic data, data set split
  • classification-DFT: classification with DFT descriptors, synthetic data, data set split, interpretability

Please mix and match Notebook cells and Modules: the world is your oyster and the sky is the limit. Use the provided .csv files to run the Notebooks, and follow the comments in them for guidance.
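The scaling module and the data analysis notebook both rely on z-score, min-max and logarithmic scaling. The sketch below shows the three transforms in plain Python; the helper names are hypothetical and not part of PythiaCHEM, and the base-10 logarithm is our assumption. Parameters are fitted on the training set only and then reused for the test set, so no information from the test data leaks into training.

```python
import math
from statistics import mean, pstdev


def fit_z(train):
    """Learn z-score parameters (mean, population std) from training data."""
    return mean(train), pstdev(train)


def apply_z(values, params):
    """Z-score: (x - mean) / std; a constant feature maps to 0."""
    mu, sigma = params
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def fit_min_max(train):
    """Learn min-max parameters (min, max) from training data."""
    return min(train), max(train)


def apply_min_max(values, params):
    """Min-max: map the training range onto [0, 1]; test values may fall outside it."""
    vmin, vmax = params
    if vmax == vmin:
        return [0.0 for _ in values]
    return [(v - vmin) / (vmax - vmin) for v in values]


def log_scale(values):
    """Base-10 logarithmic scaling; only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithmic scaling requires strictly positive values")
    return [math.log10(v) for v in values]


if __name__ == "__main__":
    train, test = [2.0, 4.0, 6.0], [3.0, 8.0]
    print(apply_min_max(test, fit_min_max(train)))  # [0.25, 1.5]
    print(apply_z(train, fit_z(train)))
    print(log_scale([1.0, 10.0, 100.0]))
```

The population standard deviation (`pstdev`) matches the convention of scikit-learn's `StandardScaler`, which is the usual choice when scaling features before fitting a model.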

Installation

From the environment.yml file:

git clone https://github.com/duartegroup/PythiaChem.git
cd PythiaChem
conda env create -f environment.yml
conda activate pythiachem
conda install -c anaconda ipykernel -y
python -m ipykernel install --user --name=pythiachem
pip install -e .

If installation with environment.yml fails, you can install manually with the following steps:

conda create -n pythiachem python -y
conda activate pythiachem
pip install rdkit 'mordred[full]' mlxtend imbalanced-learn scikit-learn scikit-plot seaborn notebook matplotlib matplotlib_venn
git clone https://github.com/duartegroup/PythiaChem.git
cd PythiaChem
pip install -e .
conda install -c anaconda ipykernel -y
python -m ipykernel install --user --name=pythiachem
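After either installation route, one quick way to confirm that the pythiachem kernel can see the dependencies is to check that each package can be located. The sketch below uses only the standard library (`importlib.util.find_spec`); the mapping from pip names to import names is our own (several differ, e.g. scikit-learn is imported as `sklearn` and imbalanced-learn as `imblearn`), and it does not check the PythiaChem package itself, whose import name is not stated here.

```python
import importlib.util

# pip/conda distribution name -> top-level import name (assumed mapping).
REQUIRED = {
    "rdkit": "rdkit",
    "mordred": "mordred",
    "mlxtend": "mlxtend",
    "imbalanced-learn": "imblearn",
    "scikit-learn": "sklearn",
    "scikit-plot": "scikitplot",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "matplotlib_venn": "matplotlib_venn",
    "notebook": "notebook",
}


def check_environment(required=REQUIRED):
    """Return {distribution: True/False} depending on whether its module can be found."""
    return {dist: importlib.util.find_spec(mod) is not None
            for dist, mod in required.items()}


if __name__ == "__main__":
    for dist, ok in check_environment().items():
        print(f"{dist:20s} {'ok' if ok else 'MISSING'}")
```

Run it in a notebook with the pythiachem kernel selected. `find_spec` locates a package without importing it, so the check is fast but will not catch a package that is present yet fails at import time.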

Contributors

allybo, matzav
