chem-usecase

Chemical Similarity Usecase

Background

The chemical similarity use case uses the Tanimoto coefficient to calculate the percentage similarity between one molecule and every other molecule in the data set. A query takes a molecule and a threshold as input and scans the whole data set to find the closest molecules. For example, given a molecule with the SMILES string "IC=C1/CCC(C(=O)O1)c2cccc3ccccc23" and a threshold of 90%, it returns the set of molecules whose similarity is >= 90%.
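As a rough illustration of the calculation described above: the Tanimoto coefficient of two fingerprints is the number of bits set in both, divided by the number of bits set in either. The sketch below assumes fingerprints are represented as Python sets of on-bit positions. The function names and the toy data are hypothetical and are not taken from this repository's scripts.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similar_molecules(query, fingerprints, threshold):
    """Return the sorted IDs whose similarity to `query` is >= threshold (percent).

    The comparison is cross-multiplied so it stays in integer arithmetic
    and avoids floating-point rounding at the boundary.
    """
    query = set(query)
    hits = []
    for mol_id, bits in fingerprints.items():
        bits = set(bits)
        union = len(query | bits)
        if union and 100 * len(query & bits) >= threshold * union:
            hits.append(mol_id)
    return sorted(hits)


# Hypothetical toy data: molecule ID -> on-bit positions.
toy_fingerprints = {
    1: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},  # identical to the query
    2: {1, 2, 3, 4, 5, 6, 7, 8, 9},      # 9 of 10 bits shared
    3: {1, 2, 3, 4, 5},                  # 5 of 10 bits shared
    4: {20, 21},                         # nothing shared
}
toy_query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
toy_hits = similar_molecules(toy_query, toy_fingerprints, 90)
```

A full scan like this is linear in the size of the data set, which is the cost the background section refers to when it says the whole data set must be scanned.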

Requirements

Import dataset to pilosa

Export the data from an SDF file or from MySQL to a CSV file, then import the CSV into Pilosa. In the SD file, every chembl_id carries a CHEMBL prefix (e.g. CHEMBL6329). Pilosa does not support string IDs yet, so the export strips the CHEMBL prefix and the remaining integer is imported as chembl_id. If you export from MySQL, molregno in compound_structures serves as the unique integer ID for each molecule, so you only need to import molregno as chembl_id.
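The prefix handling described above can be sketched as follows. This is a minimal illustration, and `chembl_to_int` is a hypothetical helper name, not a function from this repository.

```python
def chembl_to_int(chembl_id):
    """Convert an ID such as 'CHEMBL6329' to the integer 6329."""
    prefix = "CHEMBL"
    text = chembl_id.strip()
    if not text.startswith(prefix):
        raise ValueError("expected an ID starting with %r, got %r" % (prefix, chembl_id))
    # int() raises ValueError if anything other than digits follows the prefix.
    return int(text[len(prefix):])


example_id = chembl_to_int("CHEMBL6329")
```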

  • Export to CSV from an SDF file: python import_from_sdf -p ~/Downloads/chembl_22.sdf -file id_fingerprint.csv

  • Export to CSV from MySQL: python import_from_mysql -d chembl_22 -u root -file id_fingerprint.csv

  • Import the CSV file into Pilosa (mol and inverse-mol are the default databases, mole.n is the default frame):

    • Start the Pilosa server:

       pilosa server

    • Create the mole index:

       curl localhost:10101/index/mole -X POST -d '{"options": {"columnLabel": "position_id"}}'

    • Create the fingerprint frame:

       curl localhost:10101/index/mole/frame/fingerprint -X POST -d '{"options": {"rowLabel": "chembl_id", "inverseEnabled": true, "cacheSize": 2000000}}'

    • Run the import:

       pilosa import -d mole -f fingerprint id_fingerprint.csv
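To make the shape of id_fingerprint.csv concrete, here is a minimal sketch of writing one molecule's fingerprint as CSV rows. It assumes each line is a `chembl_id,position_id` pair (row ID first, then column ID), matching the rowLabel and columnLabel used above. That layout is an assumption about what the export scripts produce, and `write_fingerprint_rows` is a hypothetical name.

```python
import csv
import io


def write_fingerprint_rows(out, chembl_id, on_bits):
    """Write one 'chembl_id,position_id' line per set bit of the fingerprint."""
    writer = csv.writer(out, lineterminator="\n")
    for position_id in sorted(on_bits):
        writer.writerow([chembl_id, position_id])


# Hypothetical example: molecule 6329 with three bits set.
buf = io.StringIO()
write_fingerprint_rows(buf, 6329, {5, 1, 1024})
csv_text = buf.getvalue()
```

In a real export, `out` would be a file opened for writing rather than an in-memory buffer.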

Queries

  • Retrieve the molecule_ids with similarity >= 90% (defaults: db="mole", frame="fingerprint", hosts=127.0.0.1:10101)

    python similar.py -s "I\C=C/1\CCC(C(=O)O1)c2cccc3ccccc23" -t 90
    
  • Retrieve the molecule_id for a SMILES string

    python get_mol_fr_smile.py -s "I\C=C/1\CCC(C(=O)O1)c2cccc3ccccc23"
    
  • Run the benchmarks for thresholds = [50, 70, 75, 80, 85, 90]

    python benchmarks -id 24
    

License

chem-usecase by Linh Vo

To the extent possible under law, the person who associated CC0 with pilosa chem-usecase has waived all copyright and related or neighboring rights to chem-usecase.

You should have received a copy of the CC0 legalcode along with this work. If not, see http://creativecommons.org/publicdomain/zero/1.0/.

chem-usecase's Issues

Remove dependency on inverse frames

The inverse frames feature is being deprecated, so this use case should not depend on it. If the use case requires inverse frames, they could be implemented externally. In that case we may need to rethink the use case more broadly, as inverse frames won't scale well.
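One possible reading of "implemented externally" is to keep the position-to-molecule mapping outside Pilosa. The sketch below builds such a mapping in memory from per-molecule fingerprints. It is only an illustration under that assumption, with hypothetical names, and it is not the approach the project has adopted.

```python
from collections import defaultdict


def build_inverse(fingerprints):
    """Map each position_id to the set of chembl_ids that have that bit set.

    `fingerprints` maps chembl_id -> iterable of on-bit positions.
    """
    inverse = defaultdict(set)
    for chembl_id, positions in fingerprints.items():
        for position_id in positions:
            inverse[position_id].add(chembl_id)
    return dict(inverse)


# Hypothetical toy data.
toy_inverse = build_inverse({10: [1, 2], 11: [2, 3]})
```

Holding the whole mapping in memory is subject to the same scaling concern the issue raises, so this is useful mainly for small data sets or for testing.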
