Fast Training of Support Vector Machines for Survival Analysis

This repository contains an efficient implementation of Survival Support Vector Machines as proposed in

Pölsterl, S., Navab, N., and Katouzian, A., Fast Training of Support Vector Machines for Survival Analysis, Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, Lecture Notes in Computer Science, vol. 9285, pp. 243-259 (2015)

Pölsterl, S., Navab, N., and Katouzian, A., An Efficient Training Algorithm for Kernel Survival Support Vector Machines, 4th Workshop on Machine Learning in Life Sciences, 23 September 2016, Riva del Garda, Italy

‼️ This repository is not actively maintained; please use sebp/scikit-survival instead ‼️

Requirements

Python 3.4 or later
cvxpy
cvxopt
numexpr
numpy 1.9 or later
pandas 0.18
scikit-learn 0.17
scipy 0.16 or later
C/C++ compiler
ipyparallel (optional)
seaborn (optional)

Installation

The easiest way to get started is to install Anaconda and set up an environment.

conda install -c sebp survival-svm

Installation from source

First, create a new environment, named ssvm:

conda create -n ssvm python=3 --file requirements.txt

To work in this environment, activate it as follows:

source activate ssvm

If you are on Windows, run the above command without the leading source.

Once you have set up your build environment, compile the C/C++ extensions and install the package by running:

python setup.py install

Alternatively, if you want to use the package without installing it, you can compile the extensions in place by running:

python setup.py build_ext --inplace

To check that everything is set up correctly, run the test suite by executing:

nosetests

Examples

A simple example on how to use our implementation of Survival Support Vector Machines is described in an IPython/Jupyter notebook.

A more elaborate script that can be used to reproduce the results in the paper is grid_search_parallel.py in the examples directory. When running it you need to specify the algorithm (--method) and dataset (--dataset) to use:

# Start IPython cluster to run grid search in parallel
ipcluster start &
# Run cross-validation. Results are stored in results-veteran-l2_ranking.csv
python examples/grid_search_parallel.py --dataset veteran --method l2_ranking
# Find best hyper-parameter configuration and visualize the results
python examples/plot-performance.py -o results.pdf results-veteran-l2_ranking.csv

The example above requires the IPython (ipyparallel) and seaborn packages.

The script runs cross-validation with 200 randomly selected 50/50 splits of the dataset. This is repeated for each possible configuration of hyper-parameters (see Methods section below). Each time the following performance measures are computed:

Harrell's concordance index, and
root mean squared error (RMSE) with respect to uncensored records.
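To make the two measures concrete, here is a minimal pure-Python sketch. It is not the code used by grid_search_parallel.py; the function names `concordance_index` and `rmse_uncensored` are hypothetical, the concordance index uses a naive quadratic loop, and pairs with tied observed times are simply skipped (a simplifying assumption).

```python
import math


def concordance_index(time, event, risk):
    """Naive O(n^2) Harrell's concordance index (illustrative only).

    A pair (i, j) is comparable if sample i has the shorter observed time
    and experienced an event. The pair is concordant if sample i also has
    the higher predicted risk; ties in predicted risk count as 0.5.
    Pairs with identical observed times are skipped (assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored sample cannot be the earlier one of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rmse_uncensored(time, event, predicted_time):
    """Root mean squared error restricted to uncensored records."""
    squared = [(t - p) ** 2
               for t, e, p in zip(time, event, predicted_time) if e]
    if not squared:
        raise ValueError("no uncensored records")
    return math.sqrt(sum(squared) / len(squared))


# Toy data: the second record is censored.
time = [2, 4, 6, 8]
event = [True, False, True, True]
risk = [4, 3, 1, 2]            # higher value = shorter expected survival
predicted_time = [3, 5, 6, 6]

cindex = concordance_index(time, event, risk)
rmse = rmse_uncensored(time, event, predicted_time)
```

In this toy example four pairs are comparable and three of them are ordered correctly, so the concordance index is 0.75. The RMSE only uses the three uncensored records.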

The output is a CSV file that contains the performance on the test set for each fold and hyper-parameter configuration. Additional options of the script are available when running the script with the --help argument.
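The splitting scheme described above (200 random 50/50 splits) can be sketched as follows. This is only an illustration under stated assumptions: `random_half_splits` is a hypothetical helper, it does not stratify by event status, and the actual script may generate its splits differently.

```python
import random


def random_half_splits(n_samples, n_splits=200, seed=0):
    """Yield (train, test) index lists for repeated random 50/50 splits."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # First half is used for training, the remainder for testing.
        yield shuffled[:half], shuffled[half:]


# Example with the size of the veteran dataset (137 samples).
splits = list(random_half_splits(137))
```

Every hyper-parameter configuration would then be fitted on each training half and evaluated on the matching test half, giving one row of performance measures per split and configuration.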

Methods

The grid search for all methods contains 13 configurations for the regularization parameter alpha: 2ⁱ, from i = -12 to 12 in steps of 2. When using the hybrid ranking-regression loss, an additional 19 configurations for the ratio between the two losses are considered: 0.05 to 0.95 in steps of 0.05 (21 ratios when the pure regression and pure ranking cases, 0.0 and 1.0, are counted as well).

Method	Description	rank_ratio
l1	Naive implementation of Survival SVM using hinge loss.	-
l2_ranking	Fast implementation of Survival SVM using squared hinge loss (ranking objective only).	1.0
l2_regression	Fast implementation of Survival SVM using squared loss (regression objective only).	0.0
l2_ranking_regression	Fast implementation of Survival SVM using hybrid of squared hinge loss for ranking and squared loss for regression.	0.05, 0.10, …, 0.95
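The grid described above can be enumerated with a few lines of Python. This is a sketch rather than the code in grid_search_parallel.py: `param_grid` is a hypothetical helper, and the dictionary keys `alpha` and `rank_ratio` are taken from the description and table above, not from the script itself.

```python
# Regularization strengths: 2^i for i = -12, -10, ..., 12.
alphas = [2.0 ** i for i in range(-12, 13, 2)]

# Ratios for the hybrid loss: 0.05, 0.10, ..., 0.95.
rank_ratios = [k / 20 for k in range(1, 20)]


def param_grid(method):
    """Return the list of hyper-parameter configurations for a method."""
    if method == "l1":
        # The naive hinge-loss variant has no rank_ratio.
        return [{"alpha": a} for a in alphas]
    if method == "l2_ranking":
        return [{"alpha": a, "rank_ratio": 1.0} for a in alphas]
    if method == "l2_regression":
        return [{"alpha": a, "rank_ratio": 0.0} for a in alphas]
    if method == "l2_ranking_regression":
        return [{"alpha": a, "rank_ratio": r}
                for a in alphas for r in rank_ratios]
    raise ValueError("unknown method: %s" % method)


grid = param_grid("l2_ranking_regression")
```

Enumerating the ranges this way gives 13 values for alpha and 19 ratios between 0.05 and 0.95, so the hybrid method is evaluated on 13 × 19 = 247 configurations per split.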

Datasets

The repository contains four datasets that are freely available and can be used to reproduce the results in the paper.

Dataset	Description	Samples	Features	Events	Outcome
actg320_aids or actg320_death	AIDS study	1,151	13	96 (8.3%)	AIDS defining event or death
breast-cancer	Breast cancer	198	80	62 (31.3%)	Distant metastases
veteran	Veteran's Lung Cancer	137	6	128 (93.4%)	Death
whas500	Worcester Heart Attack Study	500	14	215 (43.0%)	Death

Documentation

The source code is thoroughly documented, and an HTML version of the API documentation is available at https://tum-camp.github.io/survival-support-vector-machine/

You can generate the documentation yourself using Sphinx 1.4 or later.

cd doc
PYTHONPATH="..:sphinxext" sphinx-autogen api.rst
make html
xdg-open _build/html/index.html
