square / pysurvival Goto Github PK

330.0 19.0 105.0 7.86 MB

Open source package for Survival Analysis modeling

Home Page: https://www.pysurvival.io/

License: Apache License 2.0

HTML 56.21% Jupyter Notebook 36.20% Python 4.44% C++ 3.16%

survival-analysis machine-learning deep-learning python pytorch numpy

pysurvival's Introduction

PySurvival

What is PySurvival?

PySurvival is an open source Python package for Survival Analysis modeling, the modeling concept used to analyze or predict when an event is likely to happen. It is built upon the most commonly used machine learning packages, such as NumPy, SciPy and PyTorch.

PySurvival is compatible with Python 2.7-3.7.

Check out the documentation at https://www.pysurvival.io/

Content

PySurvival provides a very easy way to navigate between theoretical knowledge on Survival Analysis and detailed tutorials on how to conduct a full analysis, build and use a model. Indeed, the package contains:

10+ models ranging from the Cox Proportional Hazard model, the Neural Multi-Task Logistic Regression to Random Survival Forest
Summaries of the theory behind each model as well as API descriptions and examples.
Tutorials showing in great detail how to perform exploratory data analysis, survival modeling, cross-validation and prediction, for churn modeling and credit risk, to name a few.
Performance metrics to assess the models' abilities, such as the c-index or the Brier score
Simple ways to load and save models
... and more !
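
To make the c-index mentioned in the list above concrete, here is a minimal pure-Python sketch of a Harrell-style concordance index for right-censored data. It is an illustration only, not PySurvival's implementation; the function name concordance_index_sketch and the toy data are assumptions made for this example.

```python
from itertools import combinations


def concordance_index_sketch(risk, T, E):
    """Harrell-style c-index for right-censored data (illustrative, O(n^2))."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(T)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if T[i] < T[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier one is an event.
        if T[a] == T[b] or not E[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0   # the higher-risk sample failed first
        elif risk[a] == risk[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): times, event indicators (1 = event, 0 = censored), risk scores.
T = [2, 4, 6, 8]
E = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
print(concordance_index_sketch(risk, T, E))
```

A value close to 1 means higher risk scores go with earlier events, and 0.5 corresponds to random ordering.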

Installation

If you have already installed a working version of gcc, the easiest way to install PySurvival is using pip:

pip install pysurvival

The full description of the installation steps can be found here.

Get Started

PySurvival has been built with a simple API to provide the best user experience when it comes to modeling. Here's a quick modeling example to get you started:

# Loading the modules
from pysurvival.models.semi_parametric import CoxPHModel
from pysurvival.models.multi_task import LinearMultiTaskModel
from pysurvival.datasets import Dataset
from pysurvival.utils.metrics import concordance_index

# Loading and splitting a simple example into train/test sets
X_train, T_train, E_train, X_test, T_test, E_test = \
	Dataset('simple_example').load_train_test()

# Building a CoxPH model
coxph_model = CoxPHModel()
coxph_model.fit(X=X_train, T=T_train, E=E_train, init_method='he_uniform', 
                l2_reg = 1e-4, lr = .4, tol = 1e-4)

# Building a MTLR model
mtlr = LinearMultiTaskModel()
mtlr.fit(X=X_train, T=T_train, E=E_train, init_method = 'glorot_uniform', 
           optimizer ='adam', lr = 8e-4)

# Checking the model performance
c_index1 = concordance_index(model=coxph_model, X=X_test, T=T_test, E=E_test )
print("CoxPH model c-index = {:.2f}".format(c_index1))

c_index2 = concordance_index(model=mtlr, X=X_test, T=T_test, E=E_test )
print("MTLR model c-index = {:.2f}".format(c_index2))

Citation and License

Citation

If you use PySurvival in your research, we would greatly appreciate it if you could cite it using the following:

@Misc{ pysurvival_cite,
  author =    {Stephane Fotso and others},
  title =     {PySurvival: Open source package for Survival Analysis modeling},
  year =      {2019--},
  url = "https://www.pysurvival.io/"
}

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

pysurvival's People

Contributors

Stargazers

Watchers

Forkers

tomcat123a jieyuhe dinkarjuyal kush99993s dimatolsto wingkitlee0 bacalfa maraujo kgulpinar davidboren brucebismarck fyc0803 yaxche-io awlange iagomez astha21 carrielui aburkard msloma144 hnuwxw pphilemo hartb yanlirock huangzhii mkazmier chenyang-tao gnodar01 artem-lysenko isabella232 tinika91 shalevy1 ernc pooyam jeanselme alexlyan chaitanyapatil1996 dsciencelabs xujiangyu kamalsky sajanbhagat xjiang1024 ironrebel72 berleon asfnsdafhnisdjfods danielmcp2302 jennifer-w0527 empyriumz wenyasun belalanik jfly kevinlyy algolink sarthakpati dadekandrew2010 sinvl gregoryperkins rm-asif-amin-bkasheda g202112430 andre-vauvelle ryuzakizh ziyit ahmad-abdellatif shubhaminnani yanyinuo1023 sjkinghorn marianeayumi cpufxb cipher-wzy jane-lin0 pj-mathematician hodatorabi tokybe cctrotte yjan2088 yujing1997 utkarshsaraf19 mertbilgin wanderer2014 schlerp djun tommyngx

pysurvival's Issues

Error when installing on Mac

I recently bought a MacBook Pro 2 GHz Intel Core i5, and RAM of 16Gb.
I have installed the latest version of Python, 3.9.0.
I am trying to install PySurvival on my new laptop following the usual procedure, however, I get a lot of error messages when running the command
pip3 install pysurvival

How could I solve this issue please?

ERROR: Command errored out with exit status 1:
   command: /usr/local/opt/python@3.9/bin/python3.9 /usr/local/lib/python3.9/site-packages/pip install --ignore-installed --no-user --prefix /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-build-env-gazj7qjl/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'cython >= 0.29' 'numpy==1.14.5; python_version<'"'"'3.7'"'"'' 'numpy==1.16.0; python_version>='"'"'3.7'"'"'' setuptools setuptools_scm wheel
       cwd: None
  Complete output (3595 lines):
  Ignoring numpy: markers 'python_version < "3.7"' don't match your environment
  Collecting cython>=0.29
    Using cached Cython-0.29.21-py2.py3-none-any.whl (974 kB)
  Collecting numpy==1.16.0
    Using cached numpy-1.16.0.zip (5.1 MB)
  Collecting setuptools
    Using cached setuptools-50.3.2-py3-none-any.whl (785 kB)
  Collecting setuptools_scm
    Using cached setuptools_scm-4.1.2-py2.py3-none-any.whl (27 kB)
  Collecting wheel
    Using cached wheel-0.35.1-py2.py3-none-any.whl (33 kB)
  Building wheels for collected packages: numpy
    Building wheel for numpy (setup.py): started
    Building wheel for numpy (setup.py): still running...
    Building wheel for numpy (setup.py): finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /usr/local/opt/python@3.9/bin/python3.9 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/setup.py'"'"'; __file__='"'"'/private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-wheel-obu2ycmj
         cwd: /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/
    Complete output (3194 lines):
    Running from numpy source directory.
    /private/var/folders/ys/zf1_7nr579qfkdcvpjnwhnv00000gn/T/pip-install-ts_1r59v/numpy/numpy/distutils/misc_util.py:476: SyntaxWarning: "is" with a literal. Did you mean "=="?
      return is_string(s) and ('*' in s or '?' is s)
    blas_opt_info:
    blas_mkl_info:
    customize UnixCCompiler
      libraries mkl_rt not found in ['/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE
  
    blis_info:
    customize UnixCCompiler
      libraries blis not found in ['/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib', '/usr/local/lib', '/usr/lib']
      NOT AVAILABLE

Is this project alive?

Anyone out there?

Some questions about the credit risk notebook

Hey,

When I run the code and follow along with the tutorial (https://square.github.io/pysurvival/tutorials/credit_risk.html), I'm confused, especially at the end.

I have a few questions:

Are these 2 graphs similar? (same y-axis) Because I'm not sure what is the y-axis in the 2nd graph...
How is it possible that the high-risk line is higher than the low risk? Does it mean, he repays faster?
What is the "actual time", called T the one which is represented in the 2nd graphs?

Can you help me with that please, it's for a school project 🙏

Thanks for your consideration and have a good day!

Multilabel

Can MTLR or other models handle multilabel classification?

Multiclass methods: Gradient always exploding

I've had some success using other methods to predict survival for my business problem. However, I can never get either of the multi-class methods to work. Each time I'm met with "The gradient exploded... You should reduce the learningrate (lr) of your optimizer." I've tried extremely small learning rates, but still get the same result.

Feature Importance Linear Multitask Logistic Regression

Is there a way to know the feature importances (best fitting parameters) of the Linear and Non-Linear Multitask Logistic Regression models?

support time-varying covariates?

Hi, thanks for this package! It looks great for time-to-event modeling. I may have missed something in the documentation, but do the MTLR estimators support time-varying covariates, and if so, how would I set up the data to train such a model?

Add method "apply(self, X)" to forest-based survival models

In the sklearn API for tree-based models there's a method "apply(self, X)" that returns leaf indices, applying trees in the forest to X. It is very useful to diagnose which elements ended up in the same leaf node and cluster them.

Is it possible to add this functionality?

predict_survival function output

Does the predict_survival function predict from the beginning of the employee's tenure, or does it automatically subtract the time observed so far?

I am trying to find something similar to the lifelines package, which offers `predict_survival_function(X, conditional_after)`, where conditional_after takes the tenure to date into account and gives the probability of survival after that point.
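
The conditioning asked about here can be sketched independently of any library: given a survival curve S(t) predicted from time zero, the probability of surviving a further h units after having already survived s units is S(s + h) / S(s). The sketch below assumes a step-function survival curve on a made-up time grid; the helper names survival_at and conditional_survival are hypothetical and not part of PySurvival or lifelines.

```python
from bisect import bisect_right


def survival_at(times, surv, t):
    """Step-function lookup: S(t) is the value at the last grid time <= t."""
    k = bisect_right(times, t)
    return 1.0 if k == 0 else surv[k - 1]


def conditional_survival(times, surv, elapsed, horizon):
    """P(T > elapsed + horizon | T > elapsed) = S(elapsed + horizon) / S(elapsed)."""
    s_now = survival_at(times, surv, elapsed)
    if s_now == 0.0:
        raise ValueError("S(elapsed) is zero; the conditional probability is undefined")
    return survival_at(times, surv, elapsed + horizon) / s_now


# Made-up survival curve for one individual, predicted from time zero.
times = [1, 2, 3, 4, 5]
surv = [0.9, 0.8, 0.6, 0.5, 0.4]

# Probability of surviving 2 more units, given 2 units already observed.
print(conditional_survival(times, surv, elapsed=2, horizon=2))
```

Under these assumptions, a curve predicted from the start of the tenure can be turned into a conditional one after the fact, whatever the model does internally.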

pip install pysurvival failed to build cpp extensions on Windows 10 vsStudio2019

Trying to install this package on Windows 10. The pip install tries to use Visual Studio 2019 to compile and link the C++ extensions and fails. Here are the relevant build commands:

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\ProgramData\Anaconda3\include -IC:\ProgramData\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tppysurvival/cpp_extensions/non_parametric.cpp /Fobuild\temp.win-amd64-3.7\Release\pysurvival/cpp_extensions/non_parametric.obj -std=c++11 -O3
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
cl : Command line warning D9002 : ignoring unknown option '-O3'
non_parametric.cpp
pysurvival/cpp_extensions/non_parametric.cpp(152): warning C4554: '&': check operator precedence for possible error; use parentheses to clarify precedence
pysurvival/cpp_extensions/non_parametric.cpp(284): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
pysurvival/cpp_extensions/non_parametric.cpp(303): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
pysurvival/cpp_extensions/non_parametric.cpp(308): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
pysurvival/cpp_extensions/non_parametric.cpp(317): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
pysurvival/cpp_extensions/non_parametric.cpp(318): warning C4554: '&': check operator precedence for possible error; use parentheses to clarify precedence
pysurvival/cpp_extensions/non_parametric.cpp(349): error C2065: 'M_PI': undeclared identifier
pysurvival/cpp_extensions/non_parametric.cpp(364): error C2065: 'M_PI': undeclared identifier
pysurvival/cpp_extensions/non_parametric.cpp(364): error C2065: 'M_PI': undeclared identifier
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\HostX86\x64\cl.exe' failed with exit status 2

auto_scaler=False error

Hi, there is a bug when we turn off the auto_scaler.

Sample code:
coxph = NonLinearCoxPHModel(auto_scaler=False)
coxph.fit(feature_train, time_train, event_train)

Error:

UnboundLocalError Traceback (most recent call last)
in
1 coxph = NonLinearCoxPHModel(auto_scaler=False)
----> 2 coxph.fit(feature_train, time_train, event_train)

/opt/conda/lib/python3.7/site-packages/pysurvival/models/semi_parametric.py in fit(self, X, T, E, init_method, optimizer, lr, num_epochs, dropout, batch_normalization, bn_and_dropout, l2_reg, verbose)
608 T = T[order]
609 E = E[order]
--> 610 X_original = X_original[order, :]
611 self.times = np.unique(T[E.astype(bool)])
612 self.nb_times = len(self.times)

UnboundLocalError: local variable 'X_original' referenced before assignment

pysurvival/pysurvival/models/semi_parametric.py

Line 367 in 841b9bc

def __init__(self, structure=None, auto_scaler = True):

fix suggestion (starting from line 602):

    # Scaling data
    if self.auto_scaler:
        X_original = self.scaler.fit_transform( X )
    else:
        X_original = X
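
The failure mode and the suggested fix can be reproduced in isolation. The sketch below does not use PySurvival at all: fit_buggy and fit_fixed are hypothetical stand-ins, and the list comprehension merely imitates a scaler.

```python
def fit_buggy(X, auto_scaler):
    if auto_scaler:
        X_original = [x * 2 for x in X]   # stand-in for scaler.fit_transform(X)
    # X_original is never bound when auto_scaler is False.
    return X_original


def fit_fixed(X, auto_scaler):
    if auto_scaler:
        X_original = [x * 2 for x in X]   # stand-in for scaler.fit_transform(X)
    else:
        X_original = X                    # fall back to the unscaled input
    return X_original


try:
    fit_buggy([1, 2, 3], auto_scaler=False)
except UnboundLocalError as exc:
    print("buggy:", exc)

print("fixed:", fit_fixed([1, 2, 3], auto_scaler=False))
```

Binding the variable in both branches is what removes the UnboundLocalError.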

Is there a way to manually set the time buckets?

Hey

I would like to fix the time buckets before training, but I don't think the library supports that for now.
I tried setting the attributes times and time_buckets, but that doesn't work, as the model calculates its own ones during training.
I think that it would be useful to be able to specify that

RSF implementation seems to hang when predicting

I've installed pysurvival using brew for gcc and pip on my MacBook Pro (macOS 10.13.6) and been able to train a RSF model in a Jupyter Notebook (though this took several minutes of high CPU activity). The training data has around 70 factors and 5000 rows.

I'm now trying to work with the model but when I call e.g.

risks = rsf.predict_risk(X_test)

the notebook just hangs indefinitely with no sign of CPU activity.

Fail set calculation issue

https://github.com/square/pysurvival/blob/master/pysurvival/models/semi_parametric.py#L395

This line calculates the events that occur at a given time.

It should be index_fail = np.argwhere( self.times == T[i] ).flatten()

index_fail = np.argwhere( self.times == T[i] )[0] only considers a single event at the given time. All ties should be considered.
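
The difference between the two indexing forms is easy to see on a small array with repeated values. This is only a NumPy illustration on made-up data; whether the array in the library can actually contain duplicates is not verified here.

```python
import numpy as np

# Made-up array with the value 2.0 appearing three times.
times = np.array([1.0, 2.0, 2.0, 3.0, 2.0])
t = 2.0

first_only = np.argwhere(times == t)[0]        # first matching row only
all_ties = np.argwhere(times == t).flatten()   # every matching index

print(first_only)
print(all_ties)
```

Indexing with [0] keeps only the first match, while flatten() returns the positions of all ties.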

Installing under Python 3.9

There is a problem with installing pysurvival under python 3.9. (Maybe #39 is related)

The solution is to recompile the cython module with a recent cython version.

I did this here: https://github.com/berleon/pysurvival

Best,
Leon

compare_to_actual returns `rmse` 3 times in `metrics.py`

The offending line is here: https://github.com/square/pysurvival/blob/master/pysurvival/utils/metrics.py#L340

And 8 lines below it. May I submit a quick fix? Not sure what the process is for contributing but I'd love to help :)

Estimator used for p-value approximation

Hello! There were several approximations proposed in the original paper here, but which method was used to approximate the p-value for Conditional Survival Forests? Additionally, is the final p-value used for comparison corrected for multiple hypothesis testing using Bonferroni's correction? Thanks ahead.

cox coefs don't match that of most other packages

I ran a comparison from lifelines loaded data (rossi)

# lifelines
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter

# load data
rossi = load_rossi()

And the pysurvival cox coefficients don't match. I also repeated for scikit-survival, and statsmodels. The age effect is an order of magnitude larger than the other packages.

Kernel crashes when running concordance_index()

I have a df with 40k rows and 21 variables. I am following the Churn prediction tutorial. csf_fit() works fine and takes 45min to run. But when I then run concordance_index() my session crashes and I lose my csf object.

I was able to reproduce the issue by running the example code for Conditional Survival Forest (CSF) but by increasing the N and number of features to:

# Generating N random samples 
N = 10000
dataset = sim.generate_data(num_samples = N, num_features=6)

I used the environment which the following Dockerfile provides:

FROM jupyter/scipy-notebook

RUN conda update -n base conda
RUN conda install pytorch-cpu torchvision-cpu -c pytorch
RUN conda install matplotlib pandas scikit-learn pyarrow progressbar scipy boost
RUN pip install --upgrade pip \
  && pip install pysurvival

Tutorials use time-varying covariates as if they are fixed

I could be wrong about this, as I am new to the survival analysis literature, but my understanding is that time-varying covariates must be given special treatment in any survival analysis:

https://www.jstor.org/stable/pdf/27643698.pdf?ab_segments=0%252Fbasic_SYC-5187_SYC-5188%252F5187&refreqid=excelsior%3A0aaa616b818456a7135b22942f8307e0
https://lifelines.readthedocs.io/en/latest/Time%20varying%20survival%20regression.html
https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf
https://www.annualreviews.org/doi/pdf/10.1146/annurev.publhealth.20.1.145

However, the tutorials for this project use time-varying covariates as if they are fixed over time:

https://square.github.io/pysurvival/tutorials/employee_retention.html

Is this problematic ?

two different predict_risk implementations

For other methods such as predict_survival, predict_hazard and predict_cdf, Simulations uses the implementations in models.py.

For predict_risk, however, Simulations uses the one in simulations.py instead of the one in models.py.

The two predict_risk implementations (models.py, simulations.py) give quite different results.

The one in models.py is a sum over the cumulative hazard, whereas the one in simulations.py is a sum over weight values (or more complex forms, depending on the choice).

I wonder why that is the case.

AttributeError: 'ConditionalSurvivalForestModel' object has no attribute 'model'

Hello! It looks like the Conditional Survival Forest model I fitted ran successfully, but I'm unable to save the model or use it to predict.

Windows Installation

Doesn't seem to work with pip install pysurvival.

...
pysurvival/cpp_extensions/non_parametric.cpp(349): error C2065: 'M_PI': undeclared identifier
pysurvival/cpp_extensions/non_parametric.cpp(364): error C2065: 'M_PI': undeclared identifier
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2

See https://stackoverflow.com/questions/6563810/m-pi-works-with-math-h-but-not-with-cmath-in-visual-studio.

so cool ！

How to visualize ConditionalSurvivalForestModel?

I am pretty new to pysurvival. How can I visualize a forest model with other packages?

Issue when setting "use_log = True" in create_risk_groups

The code currently works as follows:

    risk = model.predict_risk(X)
    if use_log:
        risk = np.log(risk)

However, if model.predict_risk(X) returns inf values, np.log(risk) will still be inf. If the values are calculated in log space from the beginning, inf values are less likely to appear.
Maybe use something like

    if use_log:
        risk = model.predict_risk(X, use_log = True)
    else:
        risk = model.predict_risk(X, use_log = False)
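
The numerical point can be shown with NumPy alone. The sketch assumes the risk is the exponential of a linear score, as in Cox-type models; the score values are made up, and it is not verified here that every model's predict_risk accepts use_log.

```python
import numpy as np

# Made-up linear scores (log-risks); the last one is deliberately large.
linear_score = np.array([1.0, 50.0, 1000.0])

with np.errstate(over="ignore"):
    risk = np.exp(linear_score)   # exp(1000) overflows float64 to inf
log_after = np.log(risk)          # log(inf) is still inf

log_direct = linear_score         # staying in log space never overflows

print(log_after)
print(log_direct)
```

Taking the log after the overflow cannot recover the lost value, whereas working in log space throughout keeps it finite.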

Hyperparameters tuning

Hello,

Thank you for your great package.
I would like to know why no hyperparameter tuning is performed for any of the models, and whether you could add it to one of the methods, such as DeepSurv, so that we could do it for the other methods ourselves.

Thank you in advance,

Afshin

Typos in utils.metrics.compare_to_actual function

There might be typos in utils.metrics at lines 340 and 348 (compare_to_actual function)

I assume this should be:

(340) results['median_absolute_error'] = med_ae instead of results['median_absolute_error'] = rmse
(348) results['mean_absolute_error'] = mae instead of results['mean_absolute_error'] = rmse
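
For reference, the three error measures involved are distinct quantities. The standalone sketch below computes them on made-up numbers with the standard library only; the function name error_summary and the key root_mean_squared_error are assumptions for this example, while the two other keys follow the ones quoted above.

```python
from math import sqrt
from statistics import mean, median


def error_summary(actual, predicted):
    """Return RMSE, median absolute error and mean absolute error separately."""
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    return {
        "root_mean_squared_error": sqrt(mean(e * e for e in errors)),
        "median_absolute_error": median(abs_errors),
        "mean_absolute_error": mean(abs_errors),
    }


# Made-up actual and predicted values.
print(error_summary([10, 20, 30, 40], [12, 18, 33, 50]))
```

On data with an outlier such as this, the three values differ noticeably, which is why returning rmse under all three keys hides information.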

Thanks for the great repo !

Time bucket error after fitting coxph

I tried to fit CoxPHModel with my own dataset. I am sure that the format for X, T, and E vectors are correct. However, I got "AttributeError: The time axis needs to be created before using the method get_time_buckets." after optimization reached max number of iterations. How can I solve this problem?

High Concordance Value

Does anyone know how to tune the Random Survival Forests model?

about: local variable 'step_size_min' referenced before assignment in pysurvival

The error is raised by the step function in the rprop.py file of the installed torch package. It is enough to initialize the variable named in the error inside that function, for example step_size_min = []:
def step(self, closure=None):
    ......
    F.rprop(params,
            grads,
            prevs,
            step_sizes,
            step_size_min=step_size_min,
            step_size_max=step_size_max,
            etaminus=etaminus,
            etaplus=etaplus)

Make pysurvival work with scikit-learn

I have noticed that PySurvival does not really follow the principles of scikit-learn, starting with the fact that you input X, T, E instead of X, y. Furthermore, GridSearchCV cannot be used, both because of that and because the model objects have no set_params method (see also scikit-learn's Pipeline, which only works after extensive reworking of many classes and functions in scikit-learn). I think it is very unfortunate that this great package stays outside of sklearn. Is there any plan to fix this and make PySurvival connectable to scikit-learn? Or am I missing something?

Calibration Plots

Is it possible to implement calibration plots with probabilities like in scikit-learn?

No module named 'pysurvival.utils._functions'

Hi,

I can't load this module 'pysurvival.utils._functions'. I looked into pysurvival/utils/ directory but didn't find it.

How do I fix this? Thank you!

Can not import custom class depend on pysurvival

Hello,

I have the following class:

from pysurvival.models.multi_task import NeuralMultiTaskModel

import joblib
import numpy as np

from ml_models.preprocessing.one_hot_encoding import PreProcessingWOneHot
from ml_models.templates.model import Model
from pysurvival.utils import save_model, load_model

from util import get_my_logger


class PySurvival(Model):
    def __init__(self):
        super().__init__()
        self.pre_processing = PreProcessingWOneHot()

    def build_survival_model(self, parm: dict, row_data: dict, target: np.array) -> (np.ndarray, np.ndarray, np.float64, np.float64, np.ndarray):
        """

        Args:
            row_data: list
            target: np.array
            parm: what parameters that I need to pass to models

        Returns:
            hold_out_y: target variable for hold out set
            predict_proba: predicted score on hold out set
            logloss: logloss on hold out
            auc: auc on hold out
            feature_score_array: feature score of features
        """
        structure = parm.pop('structure')
        self.model_instance = NeuralMultiTaskModel(bins=parm.pop('bins'), structure=structure)
        self.logger.info("building training data started")
        train_x, time, event = self.build_data_survival(row_data, target)
        self.logger.info("time is {0}".format(time[:10]))
        self.logger.info("event is {0}".format(event[:10]))
        self.logger.info("target is {0}".format(target[:10]))

        self.logger.info("final model building starting")
        self.model_instance.fit(train_x, time, event, **parm)

        hazard, density, survival = self.model_instance.predict(train_x)
        risk = self.model_instance.predict_risk(train_x)

        return {"time": time, "event": event, "hazard": hazard, "density": density, "survival": survival, "risk": risk}

In another file I import that class as follows:

from ml_models.templates.py_survival.py_survival_model import PySurvival

However, this doesn't work; it throws the following error:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

It does work if I import pysurvival first, before importing the class, as follows:

import pysurvival
from ml_models.templates.py_survival.py_survival_model import PySurvival

Do you know what is happening ?
Any help is appreciated.

This is a great package. Thank you for making it open source.

Add conda recipe

Hi,

It would be great if you could provide a conda recipe.

I am already working on this and it is ready to be previewed and merged by the conda team here: conda-forge/staged-recipes#15709

I'll be happy to add other maintainers for that package!

Cheers.

Gradient always explodes when modelling LogNormal AFT model

Hello,

For quite some time I have been unable to fit the LogNormal AFT model: no matter which learning rate or optimizer I select, the gradient always explodes... Could there be something specific to this method?

Thanks in advance for your answer!

Tutorial - Employee Retention - Dropping low salary feature

I don't fully understand how the salary feature is handled in the Employee Retention tutorial. There appears to be an ordinal with 3 categories: low, medium and high. What happens here is that:

The salary feature is one-hot encoded - Why wouldn't an ordinal encoding work here, considering the tree model?
The correlation is then tested on the "low" and "medium" columns, which is very negative - Isn't this quite expected, considering it's a categorical feature?
The "low" column is dropped - Doesn't that mean that we effectively grouped "high" and "low" salary together?
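
On the last question, a small sketch of standard one-hot encoding may help. It does not reproduce the tutorial's code; the column names salary_low, salary_medium and salary_high and the helper encode_without_low are hypothetical.

```python
CATEGORIES = ["low", "medium", "high"]


def encode_without_low(salary):
    """One-hot encode salary, then drop the 'low' indicator column."""
    row = {f"salary_{c}": int(salary == c) for c in CATEGORIES}
    del row["salary_low"]
    return row


for salary in CATEGORIES:
    print(salary, encode_without_low(salary))
```

With this encoding, "low" becomes the all-zeros reference row, so it stays distinguishable from "high" as long as both remaining indicator columns are kept.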

Warning Thrown when using `save_model`

I'm using the save_model function to save a ConditionalRandomSurvivalForest model for churn. When I do, I repeatedly see a warning:

python3.5/site-packages/pyarrow/pandas_compat.py:113: FutureWarning: A future version of pandas will default to `skipna=True`. To silence this warning, pass `skipna=True|False` explicitly

I would like to remove this warning, but unfortunately neither the save_model func nor the random forest object's save method allow me to pass in kwargs like skipna.

Package Versions:

pandas: 0.24.2
pyarrow: 0.8.0
pysurvival: 0.1.2

My best guess is that the issue is on this line:

pysurvival/pysurvival/models/__init__.py

Line 83 in dd4c5bf

serialized_to_save = pa.serialize(parameters_to_save)

Can we update save_model to accept kwargs?
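
Until such an option exists, one possible workaround is to silence the warning on the caller's side with the standard warnings module. In this sketch noisy_save is a hypothetical stand-in for the real save call; only the warnings API itself is taken as given.

```python
import warnings


def noisy_save():
    """Hypothetical stand-in for a save call that triggers a FutureWarning."""
    warnings.warn("A future version of pandas will default to `skipna=True`.",
                  FutureWarning)
    return "saved"


# Suppress FutureWarning only inside this block; filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy_save()

print(result)
```

Scoping the filter with catch_warnings keeps other FutureWarnings visible elsewhere in the program.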

Model explainability

Is there a way, given a previously trained model, to interpret the survival prediction such as Lime or Shap?

Way to extend the survival curve beyond the range of the time buckets ?

Hi,

Currently the survival curve given by the model is limited to the range of the time buckets. If I give a time that is outside that range, the survival value is just the value predicted for the last bucket, so the survival probability basically flatlines beyond the range. Is there a way to extrapolate the model beyond it?

Thanks

Install successful but can not import some modules

I successfully installed pysurvival (after doing brew install gcc and export CC and CXX as per instructions), but when I try to import pysurvival into jupyter notebook I get this error.

And when I try to import some pysurvival modules there is no problem.

Is there maybe some conflict between GCC and clang when compiling C++ code?
My OS is Mac OSX Catalina 10.15.1 and I am using Python 3.7.4 but Python 2.7 also exist on my computer.

Class Imbalance

Does this package address class/label imbalance?

Mistake in "CoxPHModel.predict_risk"---line 308 in "models/semi_parametric.py"

There seems to be a mistake in calculating risk_score for Cox.
In "models/semi_parametric.py" line 308-312:

        risk_score = np.exp(np.dot(x, self.weights))
        if not use_log:
            risk_score = np.exp(risk_score)

        return risk_score

There is a redundant "np.exp()" on line 308.
The logged risk score should just be "np.dot(x, self.weights)"

Hope you check it soon. Please tell me if I misunderstand it.
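
The behavior the reporter expects can be written down as a short standalone sketch: the log-risk is the linear predictor, and the risk is its exponential, applied once. This is an illustration of the proposal, not the library's code; predict_risk_sketch and the toy inputs are assumptions.

```python
from math import exp


def predict_risk_sketch(x, weights, use_log=False):
    """Cox-style risk: log-risk is the linear predictor, risk is its exponential."""
    log_risk = sum(xi * wi for xi, wi in zip(x, weights))
    return log_risk if use_log else exp(log_risk)


# Toy inputs chosen so that the linear predictor is exactly zero.
x, w = [1.0, 2.0], [0.5, -0.25]
print(predict_risk_sketch(x, w, use_log=True))
print(predict_risk_sketch(x, w, use_log=False))
```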

Windows

Will this work on Windows?

Feature importance in Forest Models

To train any forest model with categorical variables, they first need to be converted to dummy variables. After training, how does one get a single importance per original variable, instead of one per dummy column?
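
One common heuristic, sketched below, is to sum the importances of the dummy columns that belong to the same source variable. This is an assumption about how one might post-process the values, not a PySurvival feature; the column names, the importance numbers and the helper group_importances are all made up.

```python
from collections import defaultdict


def group_importances(importances, column_to_variable):
    """Sum dummy-column importances back onto their source variable."""
    grouped = defaultdict(float)
    for column, value in importances.items():
        # Columns without a mapping entry are treated as their own variable.
        grouped[column_to_variable.get(column, column)] += value
    return dict(grouped)


# Made-up per-column importances and an explicit dummy-to-variable mapping.
importances = {"salary_low": 0.10, "salary_medium": 0.05,
               "salary_high": 0.15, "age": 0.70}
mapping = {"salary_low": "salary", "salary_medium": "salary",
           "salary_high": "salary"}

print(group_importances(importances, mapping))
```

An explicit mapping avoids guessing the source variable from column-name prefixes, which can fail when variable names themselves contain the separator.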

Non-determinism in model estimation

Hi, for the MTLR and RandomSurvivalForest I get different estimates for survival probabilities on each run.

Is there any parameter to regulate training?

Export utilities for use without model object

Hi, this seems like a very useful library with cool features, including the metrics concordance index, brier score etc.
For users who want to use their own model fit / estimation methods (like me), would it be a good addition to export the metrics (and maybe other functions) to work without having to specify a model argument?

Error message : ImportError: dlopen, ... Symbol not found...

Hi,
I am trying to run the code from the tutorial page https://square.github.io/pysurvival/tutorials/credit_risk.html . The pysurvival package installed successfully, but when I run this code:

from pysurvival.utils.display import correlation_matrix

it shows the following error message:

ImportError: dlopen(/Users/ju/opt/anaconda3/lib/python3.7/site-packages/pysurvival/models/_non_parametric.cpython-37m-darwin.so, 2): Symbol not found: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_assignERKS4
Referenced from: /Users/ju/opt/anaconda3/lib/python3.7/site-packages/pysurvival/models/_non_parametric.cpython-37m-darwin.so
Expected in: /usr/lib/libstdc++.6.dylib
in /Users/ju/opt/anaconda3/lib/python3.7/site-packages/pysurvival/models/_non_parametric.cpython-37m-darwin.so

After that, I uninstalled and reinstalled pysurvival several times, from both Spyder/Anaconda and the command line, but it still failed.

My laptop is MacBook Pro (13-inch, 2019, Two Thunderbolt 3 ports), version 10.15.7 (19H2)
IDE: Spyder

How can I solve it?
Thank you!
