kerepesi / aging_ml Goto Github PK
View Code? Open in Web Editor NEWPrediction and characterization of human ageing-related proteins by using machine learning. Related publication: https://doi.org/10.1038/s41598-018-22240-w
Prediction and characterization of human ageing-related proteins by using machine learning. Related publication: https://doi.org/10.1038/s41598-018-22240-w
******************************************************************************************* Feature tables and main scripts of the paper "Prediction and characterization of human ageing-related proteins by using machine learning" ******************************************************************************************** Citation of the paper: Kerepesi C, Daróczy B, Sturm Á, Vellai T, Benczúr A Prediction and characterization of human ageing-related proteins by using machine learning Scientific Reports, Vol 8, 4094 (2018). Requirements (versions used by us in parenthesis): - Linux (Ubuntu 16.04.3 LTS) - Python 3.5.2 - Python packages: pandas (0.20.3), numpy (1.13.1), sklearn (0.19.0) Running commands: - Running 20 experiments of XGBoost with 5 fold CV (predictions are averaged, parameters: max_depth=1, n_est=50, n_exp=20) $ python XGBoost_CV.py 1 50 20 Final_32_features.csv - Dumping trees of the XGBoost model (max_d=1, n_est=50, inputfile=Final_feature_table.csv): $ python TreeDumper.py 1 50 Final_32_features.csv Description of the files (for more information please see Methods of the paper): - aging_labels.csv: Labels of the classification ('1' if the given protein included in GenAge database, 0 otherwise) - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv.zip: Gene Ontology features with ancestors and with ageing GOs. - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv-woAgingGOs.csv.zip Gene Ontology features with ancestors and without ageing GOs ( called 'GO' in Table 4 of the paper). - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv-woAgingGOs.csv-f_sel_based_on_GO_stats.py-thr1000.csv.zip Feature set containing only the GO features that occur in least 1000 proteins (called 'Frequent GOs' in Table 5 of the paper). - uniprot_sprot_human.dat-InteractionDigger.py.txt-CreateNetworkFT_paired.py.csv-join_CytoscapeStats.csv: PPI Network features. - RNA_CoEx_vs_GenAge_FT.csv: Co-expression features. - Final_32_features.csv: Table of the final 32 features (selected by XGBoost). - Final_32_features.csv-XGBoost_CV_preds-n_est50-exp20.csv: Output file of the command 'python XGBoost_CV.py 1 50 20 Final_32_features.csv'. - Final_32_features.csv_Trees-n_est50-max_d1.txt: Output file of the command 'python TreeDumper.py 1 50 Final_32_features.csv'.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.