Smartclass

Smartclass classifies structures using SMARTS.

🚀 Installation

poetry install

💪 Getting Started

A small Quarto notebook is available as a starting point. Render it with:

poetry run quarto render notebooks/smartclass.qmd

🌟 Main steps

The main steps are briefly shown below.

Get defined chemical classes

poetry run smartclass querywikidata -q src/smartclass/data/queries/classes_smarts.rq -o scratch/wikidata_classes_smarts.tsv

Get some SMILES to classify

poetry run smartclass querywikidata -q src/smartclass/data/queries/chemicals_smiles_canonical.rq -o scratch/wikidata_chemicals_smiles.tsv
poetry run smartclass querywikidata -q src/smartclass/data/queries/classes_fake_smiles.rq -o scratch/wikidata_classes_smiles.tsv
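Once the queries above have run, the TSV files under scratch/ can be inspected before classification. The following is a minimal sketch, assuming the files are tab-separated with a header row; the column names depend on each SPARQL query and are not assumed here. `read_tsv` and `summarize` are hypothetical helpers, not part of smartclass.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts.

    Hypothetical helper for inspection only; not part of smartclass.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def summarize(rows):
    """Return (number of rows, column names) for a quick sanity check."""
    columns = list(rows[0].keys()) if rows else []
    return len(rows), columns


# Example usage (assumes the query above has already produced this file):
# rows = read_tsv("scratch/wikidata_classes_smarts.tsv")
# print(summarize(rows))
```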

Classify compounds

poetry run smartclass searchclasses --help
# poetry run smartclass searchclasses
poetry run smartclass searchclasses -s "O=C1OC2CC3C(C=C(OC)C(=O)C3(C)C4C(=O)C(OC)=C(C)C(C1)C24C)C" -c scratch/wikidata_classes_smarts.tsv --verbose
poetry run smartclass searchclasses -s "O=C1OC2CC3C(C=C(OC)C(=O)C3(C)C4C(=O)C(OC)=C(C)C(C1)C24C)C" -c scratch/wikidata_classes_smarts.tsv --closest-only False --verbose
poetry run smartclass searchclasses -i src/smartclass/data/bitter_smiles.tsv -c scratch/wikidata_classes_smarts.tsv
# poetry run smartclass searchclasses -i scratch/wikidata_chemicals_smiles.tsv -c scratch/wikidata_classes_smarts.tsv --closest-only False
# TODO Improve classes taxonomy
# poetry run smartclass searchclasses -i scratch/wikidata_classes_smiles.tsv -c scratch/wikidata_classes_smarts.tsv --closest-only False
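To classify a handful of structures from a script, the single-SMILES `searchclasses` call shown above can be wrapped with `subprocess`. This is a minimal sketch, assuming it is run from the repository root with the Poetry environment installed. It uses only the `-s`, `-c`, and `--verbose` options that appear above; the helper names are hypothetical.

```python
import subprocess

# Classes file produced by the "Get defined chemical classes" step.
CLASSES_FILE = "scratch/wikidata_classes_smarts.tsv"


def build_command(smiles, classes_file=CLASSES_FILE, verbose=False):
    """Build the argument list for one `smartclass searchclasses` call."""
    command = [
        "poetry", "run", "smartclass", "searchclasses",
        "-s", smiles,
        "-c", classes_file,
    ]
    if verbose:
        command.append("--verbose")
    return command


def classify_all(smiles_list, classes_file=CLASSES_FILE):
    """Run the CLI once per SMILES; raises CalledProcessError on failure."""
    for smiles in smiles_list:
        subprocess.run(build_command(smiles, classes_file), check=True)
```

Passing the arguments as a list, without a shell, avoids quoting problems with SMILES characters such as parentheses and `=`. For larger sets, the `-i` input file option shown above is probably the better fit.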

🤯 Future steps (not fully available now)

Measure substructures distances

poetry run python3 smartclass/measure_mhfp.py -i scratch/wikidata_classes_smarts.tsv -o scratch/distances_wikidata.tsv
poetry run python3 smartclass/measure_mhfp.py -i data/mia_smarts.tsv -o scratch/distances_mia.tsv

๐Ÿ› ๏ธ Wikidata maintenance

Generic maintenance

poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_smiles_canonical_no_formula.rq -t transform_smiles_to_formula -o scratch/formulas_canonical.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_no_smiles_isomeric_no_mass.rq -t transform_inchi_to_mass -o scratch/masses_inchi.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_smiles_isomeric_no_mass.rq -t transform_smiles_to_mass -o scratch/masses_isomeric.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_smiles_isomeric_no_inchi.rq -t transform_smiles_to_inchi -o scratch/inchis_isomeric.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_isomeric_smiles_inchi_no_stereo.rq -t transform_smiles_to_inchi -o scratch/incorrect_inchis.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_smiles_isomeric_no_formula.rq -t transform_smiles_to_formula -o scratch/formulas_isomeric.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_smiles_isomeric_no_canonical.rq -t transform_smiles_i_to_smiles_c -o scratch/smiles_isomeric.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_no_smiles_isomeric_inchi.rq -t transform_inchi_to_smiles_isomeric -o scratch/smiles_i_inchi.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_inchi_no_isomeric_smiles.rq -t transform_inchi_to_smiles_isomeric -o scratch/smiles_i_inchi_2.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_inchi_no_canonical_smiles.rq -t transform_inchi_to_smiles_canonical -o scratch/smiles_c_inchi.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_inchi_no_inchikey.rq -t transform_inchi_to_inchikey -o scratch/inchikeys.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_formula_malformed.rq -t transform_formula_to_formula -o scratch/formulas_malformed.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_smiles_isomeric_incorrect_mass.rq -t transform_smiles_mass_to_smiles_mass -o scratch/masses_incorrect_isomeric.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/maintenance_no_smiles_isomeric_incorrect_mass.rq -t transform_inchi_mass_to_inchi_mass -o scratch/masses_incorrect_inchi.csv

poetry run smartclass querywikidata -q src/smartclass/data/queries/chemicals_smiles_canonical_no_ref.rq -t transform_smiles_c_to_smiles_c_tauto -o scratch/smiles_c_tauto.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/chemicals_smiles_isomeric_no_ref.rq -t transform_smiles_i_to_smiles_i -o scratch/smiles_i.csv
poetry run smartclass querywikidata -q src/smartclass/data/queries/chemicals_smiles_isomeric_no_ref.rq -t transform_smiles_i_to_smiles_i_tauto -o scratch/smiles_i_tauto.csv
# not working for now
# poetry run smartclass querywikidata -q src/smartclass/data/queries/stereoisomers_smiles_isomeric_inchi.rq -t transform_stereoisomers_to_entities -o scratch/stereo_i_to_entities.csv
# poetry run smartclass querywikidata -q src/smartclass/data/queries/stereoisomers_smiles_canonical_no_isomeric.rq -t transform_stereoisomers_to_entities -o scratch/stereo_c_to_entities.csv
# poetry run smartclass querywikidata -q src/smartclass/data/queries/chemical_entities_smiles_isomeric_inchi.rq -t transform_entities_to_stereoisomers -o scratch/entities_i_to_stereo.csv
# poetry run smartclass querywikidata -q src/smartclass/data/queries/chemical_entities_smiles_canonical_no_isomeric.rq -t transform_entities_to_stereoisomers -o scratch/entities_c_to_stereo.csv
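Each maintenance command above pairs a query file (`-q`) with a transform (`-t`) and an output file (`-o`). The following is a minimal sketch of driving a few of them from a table rather than typing each one. The three entries are copied from the commands above, the helper names are hypothetical, and the working directory is assumed to be the repository root.

```python
import subprocess

QUERIES_DIR = "src/smartclass/data/queries"

# (query file, transform name, output file), copied from the commands above.
MAINTENANCE_JOBS = [
    ("maintenance_inchi_no_inchikey.rq",
     "transform_inchi_to_inchikey",
     "scratch/inchikeys.csv"),
    ("maintenance_formula_malformed.rq",
     "transform_formula_to_formula",
     "scratch/formulas_malformed.csv"),
    ("maintenance_smiles_isomeric_no_formula.rq",
     "transform_smiles_to_formula",
     "scratch/formulas_isomeric.csv"),
]


def build_job_command(query, transform, output):
    """Build the argument list for one `smartclass querywikidata` call."""
    return [
        "poetry", "run", "smartclass", "querywikidata",
        "-q", f"{QUERIES_DIR}/{query}",
        "-t", transform,
        "-o", output,
    ]


def run_jobs(jobs=MAINTENANCE_JOBS):
    """Run each job in order; stops at the first failing command."""
    for query, transform, output in jobs:
        subprocess.run(build_job_command(query, transform, output), check=True)
```

Keeping the jobs in one table makes it easier to see which query feeds which transform, and to comment out entries that are not working yet.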

Improve current classes

poetry run smartclass querywikidata -q src/smartclass/data/queries/chemicals_classes.rq -o scratch/wikidata_chemicals_classes.tsv
poetry run smartclass querywikidata -q src/smartclass/data/queries/chemical_entities_smiles_isomeric_inchi.rq -o scratch/wikidata_chemical_entities_smiles_inchi.tsv
poetry run smartclass querywikidata -q src/smartclass/data/queries/stereoisomers_smiles_isomeric_inchi.rq -o scratch/wikidata_stereoisomers_smiles_isomeric_inchi.tsv
poetry run smartclass querywikidata -q src/smartclass/data/queries/stereoisomers_smiles_canonical_no_isomeric_inchi.rq -o scratch/wikidata_stereoisomers_smiles_canonical_inchi.tsv
poetry run smartclass querywikidata -q src/smartclass/data/queries/chemicals_tautomer_of.rq -o scratch/wikidata_chemicals_tautomer_of.tsv
poetry run python3 notebooks/improve_classes.py
poetry run python3 src/smartclass/helpers/split_csv.py

Add stereoisomer pairs

poetry run smartclass querywikidata -q src/smartclass/data/queries/chemical_entities_inchi.rq -o scratch/wikidata_chemical_entities_inchis.tsv
poetry run smartclass querywikidata -q src/smartclass/data/queries/chemicals_stereosiomer_of.rq -o scratch/wikidata_chemicals_stereoisomer_of.tsv
poetry run python3 notebooks/pair_stereoisomers.py

🖥️ Command Line Interface

The smartclass command line tool is automatically installed. It can be used from the shell with the --help flag to show all subcommands:

poetry run smartclass --help

๐Ÿ‘ Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

Much of the inspiration and initial data comes from the extensive work done by @rwst on YACCL.

โš–๏ธ License

See LICENSE

๐Ÿช Cookiecutter

This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.
