jojocarson / hcv_rt_classifier

Classification of retreatment for reinfection and virological failure among people treated with direct-acting antiviral therapy for hepatitis C in national pharmaceutical dispensing administrative data

License: MIT License

Jupyter Notebook 100.00%
gradientboostingclassifier machinelearning pharmacoepidemiology randomforestclassifier directactingantiviral hepatitisc reinfection retreatment virologicalfailure

00 Overview

Background

The treatment of hepatitis C virus (HCV) infection has evolved considerably with the development of direct-acting antiviral (DAA) therapies that are well-tolerated and yield high cure rates (≥95%). However, a small proportion of those treated will have virological failure while others may become reinfected. Retreatment of reinfection and virological failure is essential to prevent transmission, liver disease progression and HCV-related mortality.

In Australia, there is unrestricted access to government-subsidized DAAs, which can be prescribed by any medical practitioner, with no restrictions on the prescribing of retreatment. While treatment uptake was initially greatest among older individuals with advanced disease (i.e., with higher risk of virological failure), there has been increasing treatment uptake among younger people who inject drugs and people who are incarcerated (i.e., with higher risk of reinfection).

All dispensation of DAAs is reported through the Australian Pharmaceutical Benefits Scheme (PBS), including retreatment dispensation. The PBS provides high-coverage, structured information on patient demography and pharmacy dispensing; however, these data are collected for administrative purposes and lack clinical granularity. Reasons for retreatment are important for assessing HCV elimination strategies but are not captured in the PBS data.

The REACH-C study is a national cohort of individuals receiving DAA treatment through the PBS that reported details of retreatments, including the retreatment reason. For this analysis, we used retreatment data from REACH-C to train a machine learning model to classify retreatments in PBS data as retreatment for reinfection or retreatment for virological failure.

Methods

A total of 10,843 individuals initiated DAA treatment in the REACH-C cohort between 2016 and 2019; retreatment data were collected for 320 retreatments from 2016 to 2020. A total of 95,274 individuals initiated DAA treatment through the PBS between 2016 and 2021; retreatment data were available for 7,948 retreatments from 2016 to 2022.

The models were developed and trained to predict the reason for retreatment using variables in REACH-C that were also available in the PBS data. Variables included age, gender, HIV co-infection, prescriber type, DAA class (i.e., genotype-specific, pangenotypic, salvage), regimen and duration at (re)treatment, addition of ribavirin at (re)treatment, year of (re)treatment, time between the end of initial treatment and commencing retreatment, and missed dispensations (as a proportion of authorised duration). Categorical variables were converted to binary dummy variables; missed doses were included as a proportion of authorised duration; and the year (re)treatment commenced and age were included as continuous variables. Random Forest and Gradient Boosting classifiers were considered.
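The encoding described above can be sketched with pandas. This is only an illustration: the column names, the toy values, and the way the missed proportion is derived (missed weeks divided by authorised weeks) are assumptions, not the actual REACH-C or PBS variable definitions.

```python
import pandas as pd

# Hypothetical toy records; names and values are illustrative only.
records = pd.DataFrame(
    {
        "age": [34, 58, 45, 29],
        "retreatment_year": [2017, 2018, 2019, 2020],
        "gender": ["F", "M", "M", "F"],
        "daa_class": ["pangenotypic", "genotype_specific", "salvage", "pangenotypic"],
        "authorised_weeks": [8, 12, 12, 8],
        "missed_weeks": [0, 3, 0, 4],
    }
)

# Assumed definition: missed dispensing as a proportion of authorised duration.
records["missed_proportion"] = records["missed_weeks"] / records["authorised_weeks"]

# Categorical columns become one binary indicator per level;
# age, year and missed_proportion pass through as continuous variables.
features = pd.get_dummies(
    records.drop(columns=["authorised_weeks", "missed_weeks"]),
    columns=["gender", "daa_class"],
    dtype=int,
)

print(features.columns.tolist())
```

Applying the same encoding to both the training data and the PBS data would keep the feature columns aligned between the two sources.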

Because of the modest sample size of the REACH-C retreatments (n=320), we divided the data into randomized training and validation datasets using 3 x 10-fold nested cross-validation. Nested cross-validation is an approach to hyperparameter optimization and model selection that aims to avoid overfitting: the k-fold cross-validation used for hyperparameter optimization is nested inside the cross-validation used for model selection. In the 3 inner loops, a model is fitted to each training set for every hyperparameter combination, and the hyperparameters that maximize the score on the validation set are selected. In the 10 outer loops, generalization error is estimated by averaging test set scores over the dataset splits. Nested cross-validation reduces the likelihood of overfitting to a specific subset of the training data because performance metrics are averaged across all folds of the training set. Within the nested cross-validation we used GridSearchCV to exhaustively evaluate all hyperparameter combinations within the defined search space. We then configured the hyperparameter search to refit a final model on the entire training dataset using the best hyperparameters found during the search.
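A minimal sketch of this procedure with scikit-learn is shown below. It runs on synthetic data rather than REACH-C, and the search space, the stratified splitting, and the ROC AUC scoring metric are assumptions made for illustration; the notebooks in this repository define the actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the 320 labelled retreatments (not real data).
X, y = make_classification(
    n_samples=320, n_features=12, n_informative=5, random_state=0
)

# Hypothetical search space, kept small so the example runs quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Inner loop (3 folds) tunes hyperparameters; outer loop (10 folds) estimates error.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=inner_cv,
    scoring="roc_auc",  # assumed metric
    refit=True,         # refit on the full training fold with the best parameters
)

# Each outer fold runs its own grid search on the outer training split,
# then scores the refitted model on the held-out outer test split.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
print(f"Nested CV AUC: {outer_scores.mean():.3f} (SD {outer_scores.std():.3f})")

# Final model: repeat the search on all available data and keep the refitted estimator.
final_model = search.fit(X, y).best_estimator_
```

Swapping `RandomForestClassifier` for `GradientBoostingClassifier`, with a matching parameter grid, would give the corresponding sketch for the second model family.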

Figure 1. Nested cross-validation

image

Results

Table 1. Performance metrics for Random Forest and Gradient Boosting Classifiers

image

Figure 2. Six-monthly number of individuals receiving first retreatment for HCV reinfection and virological failure and the total retreatment courses dispensed for HCV reinfection and virological failure in Australia during 2016-21, with 95% confidence intervals

rt_all_ci

Figure 3. Bootstrapped confidence intervals for model predictions for first retreatment and total retreatment

bootstrapCI_github_proportion1

bootstrapCI_github_proportion
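This README does not describe how the bootstrapped intervals were computed, so the following is only a generic sketch of one common approach: a percentile bootstrap over predicted labels. The helper `bootstrap_proportion_ci`, the label coding, and the simulated predictions are all hypothetical.

```python
import numpy as np


def bootstrap_proportion_ci(labels, n_resamples=2000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for the proportion of positive labels (hypothetical helper)."""
    rng = rng if rng is not None else np.random.default_rng()
    labels = np.asarray(labels)
    n = labels.size
    # Each row of idx is one bootstrap replicate drawn with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    proportions = labels[idx].mean(axis=1)
    lower, upper = np.quantile(proportions, [alpha / 2, 1 - alpha / 2])
    return labels.mean(), lower, upper


rng = np.random.default_rng(0)

# Simulated predictions, assumed coding: 1 = reinfection, 0 = virological failure.
predicted = rng.integers(0, 2, size=500)

estimate, lower, upper = bootstrap_proportion_ci(predicted, rng=rng)
print(f"Proportion reinfection: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Resampling predicted labels captures sampling variability only; an approach that also refits the classifier on each resample would reflect model uncertainty as well.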

Contents

01. Code for random forest classifier with nested cross validation
02. Code for gradient boosting classifier with nested cross validation
03. Code for obtaining predictions and computing bootstrapped CIs

Contributors

Joanne Carson
Sebastiano Barbieri
Greg Dore
Gail Matthews

Contact Information

For any question regarding the code or the model, please send an email to: [email protected]
