Python functions and command line utilities for working with standard pilot1 data sets.
git clone https://github.com/levinas/p1h
pip install -U pandas scikit-learn xgboost
Scripts are provided for dataframe export and prediction tasks.
Save molecular features and dose response data for given drugs to CSV files:
$ python dataframe.py --by drug --drugs 100071 --feature_subsample 10
NSC 100071: saved 52 rows and 11 columns to NSC_100071.csv
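To sanity-check an export, the row and column counts reported above can be compared against the file on disk. The following is a minimal sketch using only the Python standard library. `csv_shape` is a hypothetical helper, not part of this repository, and it assumes the CSV has exactly one header row and that the reported column count covers every column in the file.

```python
import csv
import os


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file with one header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])        # first line holds the column names
        n_rows = sum(1 for _ in reader)  # remaining lines are data rows
    return n_rows, len(header)


if __name__ == "__main__":
    path = "NSC_100071.csv"  # file name taken from the example output above
    if os.path.exists(path):
        print(csv_shape(path))
```

If the assumptions hold, the printed pair should agree with the counts in the script's own message.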
Save drug features and dose response data for given cell lines to CSV files:
$ python dataframe.py --by cell --cells BR:MCF7 CNS:SF_268
BR:MCF7: saved 15628 rows and 3811 columns to BR:MCF7.csv
CNS:SF_268: saved 28151 rows and 3811 columns to CNS:SF_268.csv
Run three regression models on drug set A (defined by Jason to include 306 drugs) using all types of cell line features (expression, miRNA, and proteome), and save feature importance and model performance, evaluated on various metrics, to files:
$ python by_drug.py --drugs A --models randomforest lasso elasticnet --cell_features all
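The command above saves model performance evaluated on several metrics, but which metrics the script computes is not stated here. As an illustration only, the sketch below computes three common regression metrics (mean absolute error, mean squared error, and R^2) in plain Python. The function name `regression_metrics` and the choice of metrics are assumptions, not a description of what `by_drug.py` writes.

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    # R^2 is undefined when the targets are constant; report NaN in that case
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "mse": mse, "r2": r2}
```

Lower MAE and MSE and a higher R^2 indicate a better fit, which is useful to keep in mind when comparing the saved results across models.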
from datasets import NCI60
from skwrapper import regress
df = NCI60.load_by_drug_data(drug='100071')
regress('XGBoost', df)
regress('Lasso', df)
from datasets import NCI60
from skwrapper import regress
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=20)
cells = NCI60.all_cells()
for cell in cells:
    # Load the dataframe for this cell line and evaluate the model on it
    df = NCI60.load_by_cell_data(cell)
    regress(model, df, cv=3)
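The `cv=3` argument above presumably requests 3-fold cross-validation. That reading is an assumption, since `regress` is not documented here. The idea behind k-fold splitting can be sketched in plain Python: the samples are divided into k folds, and each fold serves once as the test set while the remaining folds form the training set. `kfold_indices` is a hypothetical helper for illustration and does not reflect how `skwrapper` is implemented.

```python
def kfold_indices(n_samples, n_folds=3):
    """Yield (train_idx, test_idx) pairs for contiguous k-fold splits."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one more sample, so sizes differ by at most 1
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

Each sample appears in exactly one test fold, so every row of the dataframe contributes to the performance estimate once.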