RetroViz

Visual analysis of the performance of state-of-the-art models for retrosynthesis. The 3 models are:

I retrained all 3 models from scratch on a slightly cleaner version of the USPTO-50K dataset than the one the original authors used. The original data has some overlap between train and test reactions; I removed the overlapping reactions from the training dataset, which makes the test set slightly harder. As a result, the predictions and evaluation statistics I obtained are slightly different from (in fact slightly worse than) those of the original authors. Unfortunately, I cannot yet release the cleaned data files or the Python scripts, as they are still a work in progress for another project. See below for the link to the prediction CSV files.

Sample plots

  1. Comparing performance across the 10 reaction types in the USPTO-50K dataset. There are not many surprises here, except perhaps that GLN matches or beats RetroXpert in classes 4 & 10 in terms of top-1 accuracy. For top-10, though, GLN is very strong and almost always beats RetroXpert. A key conclusion is that a model's superiority on the whole test set (regardless of reaction type) is closely mirrored within each reaction type. This is good news in that a better model will do better across all reaction types, so class-specific performance is less of a concern. At the same time, it is bad news if we want to squeeze more out of an ensemble of these models: it does not seem possible to, say, use GLN for one reaction type and RetroXpert for another to yield significantly better overall performance. plot

  2. Also comparing across the 10 reaction types, but now binning reactions by whether either or both models predicted the reactants correctly up to some rank. Note that rank is 0-indexed: rank = 0 means the proposer recovered the true reactants as its top-1 prediction, while rank = 1 means the proposer's top-2 prediction is the ground truth. These pairwise plots were generated for each pair of proposers. plot
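The pairwise binning described in item 2 can be sketched roughly as follows. This is a minimal illustration and not the script used to produce the plots: the function name `bin_pairwise`, the bin labels, the explicit "neither" bin, and the sample ranks are all assumptions made for this example. It only relies on the 0-indexed rank convention, so a prediction counts as a top-k hit when its rank is below k.

```python
from collections import Counter


def bin_pairwise(ranks_a, ranks_b, top_k):
    """Bin reactions by which of two proposers hit the ground truth within top_k.

    ranks_a and ranks_b hold the 0-indexed rank of the true reactants for the
    same test reactions, in the same order (hypothetical input layout).
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rank lists must cover the same reactions")

    counts = Counter({"both": 0, "only_a": 0, "only_b": 0, "neither": 0})
    for rank_a, rank_b in zip(ranks_a, ranks_b):
        hit_a = rank_a < top_k  # rank 0 is the top-1 prediction
        hit_b = rank_b < top_k
        if hit_a and hit_b:
            counts["both"] += 1
        elif hit_a:
            counts["only_a"] += 1
        elif hit_b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return dict(counts)


# Made-up ranks for five reactions (not real results); 9999 marks a miss.
ranks_a = [0, 3, 9999, 0, 12]
ranks_b = [0, 0, 1, 9999, 40]
example_bins = bin_pairwise(ranks_a, ranks_b, top_k=10)
print(example_bins)
```

Running the same function once per reaction type and per pair of proposers would give the counts behind one pairwise plot each.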

Data

CSV files containing the top-200 proposals from each of the 3 models are uploaded to this Google Drive folder. As the name suggests, the ground truth precursors have been filtered out since we don't need them; instead, a single column, 'rank_of_true_precursor', is enough to record the performance of the proposer. This column is 0-indexed: rank = 0 means the proposer recovered the true reactants as its top-1 prediction, and rank = 9999 means none of the top-200 predictions matched the ground truth.
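As a rough sketch of how the 'rank_of_true_precursor' column can be turned into top-k accuracy, the snippet below reads a tiny made-up CSV with Python's standard `csv` module. The helper name `top_k_accuracy` and the sample rows are assumptions for illustration; the real files contain more columns, which are not reproduced here. Because the sentinel 9999 is larger than any k up to 200, misses need no special handling.

```python
import csv
import io


def top_k_accuracy(ranks, k):
    """Fraction of reactions whose true reactants appear within the top k proposals."""
    ranks = list(ranks)
    if not ranks:
        raise ValueError("no ranks given")
    # Ranks are 0-indexed, so a top-k hit means rank < k; 9999 never qualifies.
    hits = sum(1 for rank in ranks if rank < k)
    return hits / len(ranks)


# Made-up stand-in for one prediction CSV (not real data).
csv_text = """rank_of_true_precursor
0
2
9999
0
"""

reader = csv.DictReader(io.StringIO(csv_text))
ranks = [int(row["rank_of_true_precursor"]) for row in reader]

top1 = top_k_accuracy(ranks, 1)
top3 = top_k_accuracy(ranks, 3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

The same helper accepts any iterable of integer ranks, so a pandas column loaded from one of the CSV files could be passed in directly.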

Requirements & Setup instructions

RDKit is the main package used to generate the molecular drawings. Pandas is used to manipulate the prediction data from the CSV files. Tested on Python 3.6.

    # ensure conda is initialized first
    conda create -n retroviz python=3.6 tqdm pathlib typing pandas -y
    conda activate retroviz

    conda install -y rdkit -c rdkit

Contributors

linminhtoo