*******************************************************************************************
Feature tables and main scripts of the paper 
"Prediction and characterization of human ageing-related proteins by using machine learning"
********************************************************************************************

Citation of the paper:

    Kerepesi C, Daróczy B, Sturm Á, Vellai T, Benczúr A
    Prediction and characterization of human ageing-related proteins by using machine learning
    Scientific Reports, Vol 8, 4094 (2018).

Requirements (the versions we used are given in parentheses):

   - Linux (Ubuntu 16.04.3 LTS) 
   - Python 3.5.2 
   - Python packages: pandas (0.20.3), numpy (1.13.1), scikit-learn (sklearn, 0.19.0)

Running commands: 

  - Running 20 experiments of XGBoost with 5-fold cross-validation, averaging the predictions (max_depth=1, n_est=50, n_exp=20, inputfile=Final_32_features.csv):
    $ python XGBoost_CV.py 1 50 20 Final_32_features.csv

  - Dumping the trees of the XGBoost model (max_depth=1, n_est=50, inputfile=Final_32_features.csv):
    $ python TreeDumper.py 1 50 Final_32_features.csv
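The repeated cross-validation in the first command (n_exp experiments of 5-fold CV, with each protein's out-of-fold predictions averaged over the experiments) can be sketched as follows. This is a minimal, standard-library-only sketch, not the code of XGBoost_CV.py: parse_args, repeated_cv_predictions, positive_rate_model and the fit_predict callback are hypothetical names, the folds are plain random splits (whether the original script stratifies them is not stated here), and a toy model stands in for XGBoost. In practice fit_predict would train an XGBoost classifier with the given max_depth and n_est.

```python
import random


def parse_args(argv):
    """Hypothetical parser for: XGBoost_CV.py max_depth n_est n_exp inputfile."""
    max_depth, n_est, n_exp = (int(a) for a in argv[1:4])
    return max_depth, n_est, n_exp, argv[4]


def repeated_cv_predictions(X, y, fit_predict, n_exp=20, n_folds=5, seed=0):
    """Average out-of-fold scores over n_exp repetitions of n_folds-fold CV.

    fit_predict(X_train, y_train, X_test) must return one score per test row;
    it is a placeholder for training and applying the classifier.
    """
    n = len(X)
    totals = [0.0] * n
    rng = random.Random(seed)
    for _ in range(n_exp):
        order = list(range(n))
        rng.shuffle(order)  # fresh random fold assignment in every experiment
        for k in range(n_folds):
            test_idx = order[k::n_folds]  # every n_folds-th shuffled sample
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            scores = fit_predict([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx])
            for i, s in zip(test_idx, scores):
                totals[i] += s
    # Each sample lands in exactly one test fold per experiment, so dividing
    # by n_exp gives its mean out-of-fold score.
    return [t / n_exp for t in totals]


def positive_rate_model(X_train, y_train, X_test):
    """Toy stand-in model: scores every test row with the training positive rate."""
    rate = sum(y_train) / len(y_train)
    return [rate] * len(X_test)


if __name__ == "__main__":
    # (1, 50, 20, 'Final_32_features.csv')
    print(parse_args(["XGBoost_CV.py", "1", "50", "20", "Final_32_features.csv"]))
    X = [[i] for i in range(10)]
    y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
    print(repeated_cv_predictions(X, y, positive_rate_model, n_exp=20))
```

Averaging over several random fold assignments reduces the dependence of each protein's score on one particular split, which is why every protein receives exactly one out-of-fold prediction per experiment.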

Description of the files (for more information, see the Methods section of the paper):
   
  - aging_labels.csv:
    Labels of the classification (1 if the given protein is included in the GenAge database, 0 otherwise).
    
  - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv.zip: 
    Gene Ontology (GO) features, with ancestor terms included and ageing-related GO terms kept.
    
  - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv-woAgingGOs.csv.zip:
    Gene Ontology features with ancestor terms included and ageing-related GO terms removed (called 'GO' in Table 4 of the paper).
    
  - uniprot_sprot_human.dat-GO_Digger_Sparse.py.txt-TableGen.py.csv-woAgingGOs.csv-f_sel_based_on_GO_stats.py-thr1000.csv.zip:
    Feature set containing only the GO features that occur in at least 1000 proteins (called 'Frequent GOs' in Table 5 of the paper).
    
  - uniprot_sprot_human.dat-InteractionDigger.py.txt-CreateNetworkFT_paired.py.csv-join_CytoscapeStats.csv:
    Protein-protein interaction (PPI) network features.
   
  - RNA_CoEx_vs_GenAge_FT.csv: 
    Co-expression features.
    
  - Final_32_features.csv: 
    Table of the final 32 features (selected by XGBoost).
    
  - Final_32_features.csv-XGBoost_CV_preds-n_est50-exp20.csv: 
    Output file of the command 'python XGBoost_CV.py 1 50 20 Final_32_features.csv'.

  - Final_32_features.csv_Trees-n_est50-max_d1.txt:
    Output file of the command 'python TreeDumper.py 1 50 Final_32_features.csv'.
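The 'Frequent GOs' feature set keeps only GO features that occur in at least 1000 proteins. A minimal, standard-library-only sketch of such a frequency filter is shown below. It is not the code of f_sel_based_on_GO_stats.py: it assumes (hypothetically) an unzipped CSV with one protein ID column and one numeric column per GO term, where a non-zero value means the protein is annotated with that term; frequent_go_columns and filter_frequent_gos are illustrative names.

```python
import csv
import io


def frequent_go_columns(rows, feature_names, threshold=1000):
    """Return the feature columns that are non-zero in at least `threshold` rows."""
    counts = dict.fromkeys(feature_names, 0)
    for row in rows:
        for name in feature_names:
            if float(row[name]) != 0.0:
                counts[name] += 1
    return [name for name in feature_names if counts[name] >= threshold]


def filter_frequent_gos(csv_text, id_column, threshold=1000):
    """Keep the ID column plus the frequent GO columns of a protein x GO table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    features = [c for c in reader.fieldnames if c != id_column]
    keep = frequent_go_columns(rows, features, threshold)
    out = io.StringIO()
    # extrasaction="ignore" drops the infrequent GO columns from each row
    writer = csv.DictWriter(out, fieldnames=[id_column] + keep,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For example, on a toy table with columns protein, GO:0001, GO:0002 and GO:0003 and rows P1,1,0,1 / P2,1,0,0 / P3,0,1,1, calling filter_frequent_gos(table, "protein", threshold=2) keeps only GO:0001 and GO:0003, since GO:0002 annotates a single protein.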
    
    

