The machine-learning-models-predict-inhibitor-dengue-virus-2 from small-samori

Machine Learning Models Predict Inhibitors of Dengue Virus 2

This repository is the code implementation of the paper: Machine Learning Models Predict Inhibitors of Dengue Virus 2. Random Forest was the best-performing model (accuracy = 0.94, precision = 0.94, recall = 0.94, F1-score = 0.94). It had an AUC of 0.8 and a Matthews correlation coefficient (MCC) of 0.61.
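
For readers unfamiliar with these metrics, the sketch below shows how accuracy, precision, recall, F1-score and MCC are derived from the four confusion-matrix counts. It is an illustration only: the function name `classification_metrics` and the counts used are hypothetical and are not taken from the paper or from this repository's scripts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for an imbalanced test set (NOT the paper's results).
metrics = classification_metrics(tp=30, fp=20, tn=930, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts the accuracy is high while the MCC is noticeably lower, which is the general pattern to expect when one class dominates the dataset.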

The methodology employed in this project is summarized in the figure below:

The dataset utilized in this project is a dataset of compounds screened against the Dengue Virus 2 (DENV2). It can be found here. The scripts for each step of the project are saved in separate folders. To reproduce this work, rerun the .py files in each folder (except the Molecular Descriptors step, which requires running PaDEL) as follows:

1. Download the SMILES structure dataset (as a .txt file) and the compounds datatable from PubChem.
2. Copy the two files into the 1. Data Preparation folder and run the scripts in the order they are numbered. Make sure the filenames match those in the scripts.
3. Copy the generated smile_activity_data.csv file into the 2. Molecular Descriptors folder.
4. Run the csv_smi_format_conversion.py script to convert the copied file from CSV to SMI format. This will generate three .smi files: actives.smi, inactives_1.smi and inactives_2.smi.
5. Download PaDEL-Descriptor and use it to compute the molecular descriptors.
6. Copy the generated descriptor files into the 3. Train test split/Imputation folder.
7. Within the Imputation folder, run combine_inactives.py to combine the inactives into one file, then run impute.py to impute all NaNs.
8. Copy the generated files into the 3. Train test split/Dataset folder and run the train_test_split_df.py script. NOTE: Edit filenames and paths where necessary.
9. Copy the generated training and testing data into the 4. SMOTE analysis/Dataset folder and run SMOTE analysis.py to perform SMOTE.
10. Copy the generated files into the 5. EDA/Dataset folder and run the PCA.py script.
11. Copy the data from step 10 into the 6. Data Preprocessing/Dataset folder and run mannwhitney_test_feature_selection.py to perform feature selection.
12. Copy the reduced dataset into the 7. Build Models/Dataset folder and run the models.py script to train and evaluate the models.
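
The feature selection step relies on the Mann-Whitney U test, which checks whether a descriptor's values tend to differ between active and inactive compounds. The sketch below illustrates the idea in plain Python with a two-sided test based on the normal approximation. It is not the code in mannwhitney_test_feature_selection.py: the function names, the descriptor names and values, and the 0.05 threshold are all assumptions made for illustration. In practice a library routine such as `scipy.stats.mannwhitneyu` would typically be used instead.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties get the mean rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Assumes both groups are non-empty."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


def select_features(actives, inactives, alpha=0.05):
    """Keep descriptors whose values differ between classes at level alpha.
    The alpha of 0.05 is an assumed threshold, not the repository's setting."""
    selected = []
    for name in actives:
        _, p_value = mann_whitney_u(actives[name], inactives[name])
        if p_value < alpha:
            selected.append(name)
    return selected


# Hypothetical descriptor values for a handful of compounds.
actives = {
    "MW": [310.2, 325.4, 298.7, 340.1, 332.8, 305.5],
    "nRotB": [3, 4, 5, 4, 3, 5],
}
inactives = {
    "MW": [210.5, 198.3, 225.9, 240.0, 215.7, 230.2],
    "nRotB": [4, 3, 5, 4, 5, 3],
}
selected = select_features(actives, inactives)
print(selected)
```

In this toy data only the first descriptor separates the two classes, so it is the only one retained. The normal approximation is rough for samples this small; it is used here only to keep the sketch dependency-free.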

To run only the training and evaluation of the models, run the models.py script in the 7. Build Models folder. The performance of each model will be saved as text files in the Evaluation folder. The dataset paths in the models.py script can be changed to train and/or test the algorithms on a new dataset. The trained models are saved in the 7. Build Models/Models folder.

Credit

Data Professor YouTube Channel
