Fast Training of Support Vector Machines for Survival Analysis

This repository contains an efficient implementation of Survival Support Vector Machines as proposed in:

Pölsterl, S., Navab, N., and Katouzian, A., Fast Training of Support Vector Machines for Survival Analysis, Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, Lecture Notes in Computer Science, vol. 9285, pp. 243-259 (2015)

Pölsterl, S., Navab, N., and Katouzian, A., An Efficient Training Algorithm for Kernel Survival Support Vector Machines, 4th Workshop on Machine Learning in Life Sciences, 23 September 2016, Riva del Garda, Italy

‼️ This repository is not actively maintained. Please use sebp/scikit-survival instead. ‼️

Requirements

  • Python 3.4 or later
  • cvxpy
  • cvxopt
  • numexpr
  • numpy 1.9 or later
  • pandas 0.18
  • scikit-learn 0.17
  • scipy 0.16 or later
  • C/C++ compiler
  • ipyparallel (optional)
  • seaborn (optional)
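
To see whether an existing environment satisfies the minimum versions listed above, a small check like the following can help. This is only a sketch: the helper names (`parse_version`, `meets_minimum`, `installed_version`, `report`) are hypothetical and not part of this package, and it assumes each library exposes a `__version__` attribute. Only numpy and scipy are checked, because they are the entries listed with "or later".

```python
import importlib
import re


def parse_version(text):
    """Turn '1.9.2' or '0.18.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component without leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.9' and '1.9.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(module_name):
    """Return the module's __version__ string, or None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Minimum versions copied from the requirements list above.
MINIMUMS = {"numpy": "1.9", "scipy": "0.16"}


def report():
    """Print one status line per checked library."""
    for name, minimum in sorted(MINIMUMS.items()):
        found = installed_version(name)
        if found is None:
            status = "missing"
        elif meets_minimum(found, minimum):
            status = "ok"
        else:
            status = "too old"
        print("{}: found {}, need >= {} ({})".format(name, found, minimum, status))
```

Calling `report()` prints the status of each checked library; nothing is imported until the function is called.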

Installation

The easiest way to get started is to install Anaconda and set up an environment.

conda install -c sebp survival-svm

Installation from source

First, create a new environment, named ssvm:

conda create -n ssvm python=3 --file requirements.txt

To work in this environment, activate it as follows:

source activate ssvm

If you are on Windows, run the above command without the leading source.

Once you have set up your build environment, compile the C/C++ extensions and install the package by running:

python setup.py install

Alternatively, if you want to use the package without installing it, you can compile the extensions in place by running:

python setup.py build_ext --inplace

To check that everything is set up correctly, run the test suite by executing:

nosetests

Examples

A simple example of how to use our implementation of Survival Support Vector Machines is provided in an IPython/Jupyter notebook.

A more elaborate script that can be used to reproduce the results in the paper is grid_search_parallel.py in the examples directory. When running it, you need to specify the algorithm (--method) and the dataset (--dataset) to use:

# Start IPython cluster to run grid search in parallel
ipcluster start &
# Run cross-validation. Results are stored in results-veteran-l2_ranking.csv
python examples/grid_search_parallel.py --dataset veteran --method l2_ranking
# Find best hyper-parameter configuration and visualize the results
python examples/plot-performance.py -o results.pdf results-veteran-l2_ranking.csv

The example above requires IPython with ipyparallel (which provides the ipcluster command) and seaborn.
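
To run the grid search for every combination of method and dataset, the command lines can be generated programmatically. The sketch below only uses the two documented options (--dataset and --method) and the names listed in the Methods and Datasets sections. The helpers `build_commands` and `results_file` are hypothetical, and the output file pattern is an assumption inferred from the results-veteran-l2_ranking.csv example above.

```python
import itertools

# Names taken from the Methods and Datasets tables of this README.
METHODS = ["l1", "l2_ranking", "l2_regression", "l2_ranking_regression"]
DATASETS = ["actg320_aids", "actg320_death", "breast-cancer", "veteran", "whas500"]

SCRIPT = "examples/grid_search_parallel.py"


def build_commands(datasets=DATASETS, methods=METHODS):
    """Return one argument list per (dataset, method) combination."""
    commands = []
    for dataset, method in itertools.product(datasets, methods):
        commands.append(["python", SCRIPT, "--dataset", dataset, "--method", method])
    return commands


def results_file(dataset, method):
    """Assumed naming pattern, inferred from results-veteran-l2_ranking.csv."""
    return "results-{}-{}.csv".format(dataset, method)


# Show the commands for a single dataset without executing anything.
for argv in build_commands(datasets=["veteran"]):
    print(" ".join(argv))
```

Each argument list can be passed to `subprocess.run` once the IPython cluster is running.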

The script runs cross-validation with 200 randomly selected 50/50 splits of the dataset. This is repeated for each possible configuration of hyper-parameters (see Methods section below). Each time the following performance measures are computed:

  1. Harrell's concordance index, and
  2. root mean squared error (RMSE) with respect to uncensored records.
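
For illustration, the two measures above can be sketched in plain Python. This is not the code used by the script. It assumes that the model outputs a predicted survival time (larger means longer survival), counts tied predictions as half concordant, and ignores pairs with identical observed times.

```python
import math


def concordance_index(times, events, predicted):
    """Harrell's concordance index for predicted survival times.

    A pair (i, j) is comparable if record i had an event and
    times[i] < times[j]; it is concordant if predicted[i] < predicted[j].
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored record cannot be the earlier one of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rmse_uncensored(times, events, predicted):
    """Root mean squared error, restricted to records with an observed event."""
    errors = [(p - t) ** 2 for t, e, p in zip(times, events, predicted) if e]
    if not errors:
        raise ValueError("no uncensored records")
    return math.sqrt(sum(errors) / len(errors))


# Toy data: four records, the third one is censored.
times = [1, 2, 3, 4]
events = [True, True, False, True]
predicted = [1.5, 2.5, 2.0, 5.0]

print(concordance_index(times, events, predicted))  # prints 0.8
print(round(rmse_uncensored(times, events, predicted), 4))  # prints 0.7071
```

In the toy data, five pairs are comparable and four of them are ordered correctly, which gives 0.8. The RMSE uses only the three uncensored records.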

The output is a CSV file that contains the performance on the test set for each fold and hyper-parameter configuration. Run the script with the --help argument to list its additional options.
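
The splitting scheme described above (200 random 50/50 splits) can be sketched as follows. The function name `random_half_splits` and the seed are hypothetical, and the actual script may draw its splits differently, for example with a different random number generator or stratification.

```python
import random


def random_half_splits(n_samples, n_splits=200, seed=0):
    """Yield (train_indices, test_indices) for repeated random 50/50 splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # With an odd number of samples, the test half gets the extra record.
        yield sorted(shuffled[:half]), sorted(shuffled[half:])


# The veteran dataset has 137 samples.
splits = list(random_half_splits(137))
train, test = splits[0]
print(len(splits), len(train), len(test))  # prints 200 68 69
```

Every hyper-parameter configuration would then be fitted on each training half and evaluated on the matching test half, producing one row of results per split and configuration.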

Methods

The grid search for all methods covers 13 values of the regularization parameter alpha: 2^i for i = -12 to 12 in steps of 2. When using the hybrid ranking-regression loss, the ratio between the two losses is additionally varied from 0.05 to 0.95 in steps of 0.05, which gives 19 values (21 if the endpoints 0.0 and 1.0, which correspond to l2_regression and l2_ranking, are counted).

| Method | Description | rank_ratio |
| --- | --- | --- |
| l1 | Naive implementation of Survival SVM using hinge loss. | - |
| l2_ranking | Fast implementation of Survival SVM using squared hinge loss (ranking objective only). | 1.0 |
| l2_regression | Fast implementation of Survival SVM using squared loss (regression objective only). | 0.0 |
| l2_ranking_regression | Fast implementation of Survival SVM using hybrid of squared hinge loss for ranking and squared loss for regression. | 0.05, 0.10, …, 0.95 |
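
The hyper-parameter grid described in this section can be enumerated with a few lines of Python. This is a sketch and not the code used by grid_search_parallel.py. The dictionary keys `alpha` and `rank_ratio` follow the names used above, while the helpers `rank_ratios` and `grid` are hypothetical. Note that the range 0.05 to 0.95 in steps of 0.05 contains 19 values.

```python
# alpha = 2^i for i = -12, -10, ..., 12 (13 values)
ALPHAS = [2.0 ** i for i in range(-12, 13, 2)]

# rank_ratio = 0.05, 0.10, ..., 0.95 (19 values); rounding avoids float noise
RATIOS = [round(0.05 * k, 2) for k in range(1, 20)]


def rank_ratios(method):
    """Return the rank_ratio values searched for a given method."""
    if method == "l2_ranking_regression":
        return list(RATIOS)
    if method == "l2_ranking":
        return [1.0]
    if method == "l2_regression":
        return [0.0]
    if method == "l1":
        return [None]  # the l1 method has no rank_ratio
    raise ValueError("unknown method: {}".format(method))


def grid(method):
    """Return all (alpha, rank_ratio) configurations for a method."""
    return [
        {"alpha": alpha, "rank_ratio": ratio}
        for alpha in ALPHAS
        for ratio in rank_ratios(method)
    ]


for name in ["l1", "l2_ranking", "l2_regression", "l2_ranking_regression"]:
    print(name, len(grid(name)))
```

Under these assumptions, the hybrid method has 13 × 19 = 247 configurations, and each of the other methods has 13.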

Datasets

The repository contains four datasets that are freely available and can be used to reproduce the results in the paper.

| Dataset | Description | Samples | Features | Events | Outcome |
| --- | --- | --- | --- | --- | --- |
| actg320_aids or actg320_death | AIDS study | 1,151 | 13 | 96 (8.3%) | AIDS defining event or death |
| breast-cancer | Breast cancer | 198 | 80 | 62 (31.3%) | Distant metastases |
| veteran | Veteran's Lung Cancer | 137 | 6 | 128 (93.4%) | Death |
| whas500 | Worcester Heart Attack Study | 500 | 14 | 215 (43.0%) | Death |

Documentation

The source code is thoroughly documented, and an HTML version of the API documentation is available at https://tum-camp.github.io/survival-support-vector-machine/

You can generate the documentation yourself using Sphinx 1.4 or later.

cd doc
PYTHONPATH="..:sphinxext" sphinx-autogen api.rst
make html
xdg-open _build/html/index.html
