rstudio-telecom-log-reg-prediction's Introduction

log-reg-rstudio

Performance measures of customer churn at a telecom company in a logistic regression model.

Customer churn is an important metric for service companies such as telecommunications providers. Retaining customers is more cost-effective than acquiring new ones (Gallo, 2014), so a good predictive model can help organizations anticipate and prevent churn. In this project, five performance measures are calculated for a logistic regression model of customer churn at a telecommunications company; two use all variables, and three use the most significant predictors. A series of screenshots, a procedure summary, an interpretation of the results, and impressions of the experience follow.

The Process

The project reads the telecommunications CSV file from the current working directory, located with getwd(), using the read.csv() function, and saves it in the telco data frame. The str() command displays the telco object's structure, as seen in Figure 1. This output shows the number of observations and variables, along with a list of the independent and dependent variables and their characteristics. After installing the caret package and loading its library, the script prepares the data and partitions it into training and testing sets, as seen in Figure 2. The intrain object holds the partition, which is based on the dependent variable and assigns a proportion of 0.7, or 70%, of the observations to the training set.
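The loading and partitioning steps described above might look roughly like the following sketch. The file name telco.csv, the synthetic rows, and the tenure_group labels are assumptions added so the block runs without the original file; the column names come from the results section. The stratified split is written in base R as a stand-in for the caret call, which the text does not name.

```r
set.seed(2017)  # arbitrary seed so the split is reproducible

# Assumption: the original script reads a CSV from the working directory, e.g.
#   telco <- read.csv(file.path(getwd(), "telco.csv"))   # hypothetical file name
# A small synthetic stand-in is built here so the sketch runs on its own.
n <- 1000
telco <- data.frame(
  Contract         = factor(sample(c("Month-to-month", "One year", "Two year"), n, replace = TRUE)),
  PaperlessBilling = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  tenure_group     = factor(sample(c("0-12", "12-24", "24-48", "48-60", "60+"), n, replace = TRUE)),
  Churn            = factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.73, 0.27)))
)

# Number of observations and variables, plus each column's type.
str(telco)

# Stratified 70/30 split in base R: sample 70% of the row indices within each
# Churn class. With caret, createDataPartition(telco$Churn, p = 0.7, list = FALSE)
# plays a similar role.
intrain <- unlist(lapply(split(seq_len(n), telco$Churn),
                         function(idx) sample(idx, size = round(0.7 * length(idx)))))
training <- telco[intrain, ]
testing  <- telco[-intrain, ]

dim(training)
dim(testing)
```

Sampling within each class keeps the share of churners roughly equal in the training and testing sets.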

Next, the logistic model for the churn variable is fitted on the training data with the glm() function and the binomial family, as shown in Figure 3. A summary of the model identifies the most significant predictors of churn. Figure 4 evaluates the model's error rate on the testing data by recoding “yes” as 1 and “no” as 0 and calling the predict() function. Predicted values over 0.5 are classified as “yes” for churn, and those below 0.5 as “no.” The mean() function then gives the proportion of misclassified observations in the testing data. Figure 5 calculates and prints the logistic model accuracy using the paste() function and the misClasificError variable. Figure 6 displays the confusion matrix, built with the table() command from the fitted.results values of the prediction model. Finally, Figure 7 contains four images showing performance measures for four further models: three that each use one of the most significant predictors on its own, and a last one that uses all three together. These reuse the script above, with modifications for each predictor.
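A minimal sketch of the fit, predict, and evaluate sequence is given here, assuming a synthetic data set in place of the telco file. The names fitted.results and misClasificError come from the text; model, actual, conf_matrix, and the simulated churn probabilities are hypothetical and only illustrate the workflow.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Synthetic stand-in for the telco data (assumption: the real file has more columns).
n <- 2000
telco <- data.frame(
  Contract         = factor(sample(c("Month-to-month", "One year", "Two year"), n, replace = TRUE)),
  PaperlessBilling = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
# Simulate churn so that month-to-month, paperless customers churn more often.
p_churn <- plogis(-2 + 2.5 * (telco$Contract == "Month-to-month") +
                    1.0 * (telco$PaperlessBilling == "Yes"))
telco$Churn <- factor(ifelse(runif(n) < p_churn, "Yes", "No"), levels = c("No", "Yes"))

intrain  <- sample(seq_len(n), size = 0.7 * n)   # simple 70/30 split
training <- telco[intrain, ]
testing  <- telco[-intrain, ]

# Fit the logistic regression; summary() reports each coefficient's significance.
model <- glm(Churn ~ ., family = binomial(link = "logit"), data = training)
print(summary(model))

# Recode the outcome as 0/1 and classify predictions with a 0.5 cutoff.
actual         <- ifelse(testing$Churn == "Yes", 1, 0)
fitted.results <- predict(model, newdata = testing, type = "response")
fitted.results <- ifelse(fitted.results > 0.5, 1, 0)

# Proportion of misclassified test rows, then accuracy as its complement.
misClasificError <- mean(fitted.results != actual)
print(paste("Logistic Regression Accuracy", 1 - misClasificError))

# Confusion matrix: rows are actual classes, columns are predicted classes.
conf_matrix <- table(Actual    = factor(actual, levels = 0:1),
                     Predicted = factor(fitted.results, levels = 0:1))
print(conf_matrix)
```

Fixing the factor levels to 0 and 1 keeps the table at two rows and two columns even if the model never predicts one of the classes.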

Interpretation of Results

The logistic model identifies Contract, PaperlessBilling, and tenure_group as the most significant predictors. The evaluation on the testing data yields a value of 0.2011385, an error rate of about 20%, so the model is approximately 80% accurate. The accuracy of the logistic model confirms this, showing a value of 0.799. The confusion matrix details the errors and accuracy. Of the 1,704 actual “0” or “no” responses (the customer did not churn), 290 were misclassified as “1” or “yes.” This gives a Class 1 error rate of about 17% (290/1704) and, taking “yes” as the positive class, a specificity (the ability to correctly identify negative results) of about 83% (Berrier et al., 2018). Of the 404 “yes” responses, 134 were incorrectly classified as “no,” giving a Class 0 error rate of about 33% (134/404) and a sensitivity (the ability to correctly identify positive results) of about 67% (Berrier et al., 2018). With roughly four in five test observations classified correctly, this is a reasonably good model for predicting churn.
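The rates quoted above can be checked directly from the four counts given in the text. The sketch below only redoes that arithmetic; the variable names are hypothetical and no model is fitted.

```r
# Counts as reported in the text above.
n_no      <- 1704  # "no" responses
no_as_yes <- 290   # "no" responses misclassified as "yes"
n_yes     <- 404   # "yes" responses
yes_as_no <- 134   # "yes" responses misclassified as "no"

total      <- n_no + n_yes                       # test observations
error_rate <- (no_as_yes + yes_as_no) / total    # overall misclassification rate
accuracy   <- 1 - error_rate

no_error  <- no_as_yes / n_no    # share of "no" responses misclassified
yes_error <- yes_as_no / n_yes   # share of "yes" responses misclassified

print(round(c(error_rate  = error_rate,
              accuracy    = accuracy,
              no_error    = no_error,
              yes_error   = yes_error,
              no_correct  = 1 - no_error,
              yes_correct = 1 - yes_error), 3))
```

The overall error rate computed this way, 424/2108, matches the 0.2011385 reported for the testing data.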

Performance measures are also computed for several models built from the three significant predictors. The contract, paperless billing, and tenure variables are first evaluated independently. All three models show that 560 of the 2,108 values in the testing set were classified as “yes” when they should have been “no.” This gives an error rate of about 27% (560/2108), so about 73% of the test values were classified correctly, lower than in the primary model. Collectively, the three significant predictors show a Class 1 error rate (“no” misclassified as “yes”) of approximately 20%, for a specificity of 80%, and a Class 0 error rate (“yes” misclassified as “no”) of about 41%, for a sensitivity of 59%, taking “yes” as the positive class. Overall, these are fair results, but the primary model with all variables is more accurate and appears to be the better fit.

References

Berrier, J., Nestler, S., Pardoe, I., Sturdivant, R. X., & Watts, K. (2018). Fundamentals of Data Analytics R. Zyante Inc.

Gallo, A. (2014, October 29). The value of keeping the right customers. Harvard Business Review. Retrieved on September 14, 2022, from https://hbr.org/2014/10/the-value-of-keeping-the-right-customers
