idg_dream

IDG Dream challenge

Working environment

Create the conda environment by running:

conda env create -n idg_dream -f environment.yml

Then activate it:

conda activate idg_dream

Then install the pip requirements that match your setup, depending on whether CUDA is available:

pip install -r cpu_requirements.txt

or

pip install -r cuda_9_requirements.txt

If you have another CUDA version, check the PyTorch documentation.

Additional data

In order to use machine learning algorithms we will need additional information about the compounds and proteins.

  • For compound information I decided to use CHEMBL, since the compound_id provided in the dataset is a CHEMBL identifier.
  • For protein information I used Uniprot for the same reason.

For both sources I decided to download the databases and restore them into a local postgres database dedicated to the project.

To start the postgres container, run:

IDG_DREAM_DB_PORT=5432 docker-compose -f idg_dream_db/docker-compose.yml up -d

Note that IDG_DREAM_DB_PORT may be any port you like.
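
As a minimal sketch of how a script could pick up the chosen port, the helper below builds a connection URL from the same environment variable. The function name database_url is hypothetical and not part of the project. The user and database names are taken from the pg_restore command used in the restore step, and the sketch assumes the compose file publishes the container on that host port and omits any password.

```python
import os


def database_url(host="localhost", user="idg_dream", database="idg_dream", env=os.environ):
    """Build a postgres connection URL (hypothetical helper, not project code)."""
    # Fall back to the port used in the docker-compose example when the variable is unset.
    port = env.get("IDG_DREAM_DB_PORT", "5432")
    # No password is included here; add one if your container requires it.
    return f"postgresql://{user}@{host}:{port}/{database}"


print(database_url())
```

The resulting string can be handed to any client that accepts libpq-style connection URLs.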

Then, to restore the data, follow the steps below in order.

Restore CHEMBL

You can download the postgres dump from:

download chembl database

Extract the archive and load the dump into the database by running:

cat PATH_TO_CHEMBL_DUMP | docker exec -i idg-dream-db pg_restore -O --username=idg_dream -d idg_dream

This takes some time.

Restore UNIPROT

For now we only use the protein sequences, so only the FASTA dump is required. You can get it from:

download uniprot

Extract the file and run:

PYTHONPATH=PATH_TO_PROJECT:$PYTHONPATH python bin/import_uniprot.py FASTA_PATH

The database host and port may be specified as options.

This can take some time too.
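
To illustrate what extracting sequences from the FASTA dump involves, here is a minimal sketch of a parser that yields (accession, sequence) pairs. It is not the code in bin/import_uniprot.py: the function parse_fasta is hypothetical, the example entries are made up, and the sketch assumes the usual UniProt header layout ">db|ACCESSION|ENTRY_NAME description".

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) pairs from UniProt-style FASTA lines."""
    accession = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if accession is not None:
                yield accession, "".join(chunks)
            # Assumed header layout: ">sp|P12345|NAME_HUMAN description".
            parts = line[1:].split("|")
            if len(parts) >= 3:
                accession = parts[1]
            else:
                # Fall back to the first word for non-UniProt headers.
                accession = (line[1:].split() or [""])[0]
            chunks = []
        elif accession is not None:
            # Sequence lines are wrapped, so collect them until the next header.
            chunks.append(line)
    if accession is not None:
        yield accession, "".join(chunks)


# Made-up entries, for illustration only.
example = [
    ">sp|P00001|TEST_HUMAN Hypothetical protein OS=Homo sapiens",
    "MKTAYIAK",
    "QRQISFVK",
    ">tr|Q00002|Q00002_HUMAN Another hypothetical entry",
    "MSDNE",
]
records = list(parse_fasta(example))
print(records)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, which avoids loading a large dump into memory.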

Create training set

The data used to create the training set comes from the DTC website:

download dtc

For details on how the training set is built, see the IPython notebook data_analysis.ipynb.

To create the table containing the training set, use the following script:

PYTHONPATH=PATH_TO_PROJECT:$PYTHONPATH python bin/create_training_set.py DTC_DATA_PATH

Again, you can provide a port and host if the database is on a remote server.

Tests

To run the tests, you need a running postgres container. You can use the provided docker-compose file:

docker-compose -f idg_dream_db/docker-compose.test.yml up -d

and run:

python -m unittest discover tests/
