Library Defaulters Data Challenge

This repository contains code for analysis and prediction of default (not returning borrowed books) rates for library data.

File Organization

Repository

.
├── README.md
├── figures
│   └── library_map.html
├── prediction_models.ipynb
└── preprocessing_eda
    ├── books.ipynb
    ├── checkouts.ipynb
    ├── customers.ipynb
    └── libraries.ipynb

Data Directory (not included)

.
├── Data\ Challenge
│   ├── books.csv
│   ├── checkouts.csv
│   ├── customers.csv
│   └── libraries.csv
├── cleaned_data
│   ├── books_checkouts_merged.csv
│   ├── books_cleaned.csv
│   ├── checkouts_cleaned.csv
│   ├── customers_cleaned_geocoded.csv
│   ├── libraries_cleaned_geocoded.csv
│   └── merged.csv
└── library_defaulters

Business Questions

Are there any factors you can find that are connected with late returns?
I found several factors related to late returns. The most prominent were:

  • distance between the patron's home address and the library address. Patrons who returned books late lived, on average, 3 kilometers farther away than those who returned books on time.
  • proximity of the due date to a holiday. I checked both whether the due date was a holiday and the number of days between the due date and the closest holiday. Checkouts associated with late returns were, on average, closer to holidays (when excluding checkouts in the 19th century).
  • the day of the week on which the due date falls. Late returns were more common on Tuesdays than on other days of the week.
  • number of pages in the book. Longer books were returned late more often.
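
The first two factors can be derived from the geocoded addresses and the due dates. The sketch below is illustrative and is not the notebook code: it assumes coordinates are available as latitude/longitude pairs in degrees and that holidays are supplied as a list of dates. The function names `haversine_km` and `days_to_nearest_holiday`, and all sample values, are hypothetical.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def days_to_nearest_holiday(due_date, holidays):
    """Absolute number of days between due_date and the closest holiday."""
    return min(abs((due_date - h).days) for h in holidays)


# Hypothetical inputs, for illustration only.
patron = (45.5231, -122.6765)
library = (45.5152, -122.6784)
holidays = [date(2018, 7, 4), date(2018, 12, 25)]

distance_km = haversine_km(*patron, *library)
gap_days = days_to_nearest_holiday(date(2018, 12, 20), holidays)
```

Both values can then be averaged separately for late and on-time checkouts to reproduce the kind of comparison described above.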

What would you recommend the library do to mitigate the risks you find?
I would make two primary recommendations:

  • to install drop-boxes, especially in areas farther from the library, where patrons can return their books. This would reduce the distance between the patron address and the nearest "return point".
  • to allow for extended return windows around holidays. Since holidays appear to be an important feature in the data, my interpretation is that people are preoccupied (traveling, hosting family, etc.) during holidays and so miss their due dates. Once a due date is missed, they might slip into the "it's late anyway" thought process, causing further delays. This is something I'd like to test, since we have data on borrowing duration and actual return dates.
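
One way the second recommendation could be expressed as a rule is sketched below. This is an assumption about how such a policy might look, not something taken from the analysis: the function name `extended_due_date` and the parameters `window_days` and `extension_days` are hypothetical, and their default values are placeholders.

```python
from datetime import date, timedelta


def extended_due_date(due_date, holidays, window_days=3, extension_days=7):
    """Push the due date back when it falls within window_days of any holiday.

    window_days and extension_days are hypothetical policy settings.
    """
    near_holiday = any(abs((due_date - h).days) <= window_days for h in holidays)
    if near_holiday:
        return due_date + timedelta(days=extension_days)
    return due_date


# Hypothetical holiday list, for illustration only.
holidays = [date(2018, 12, 25)]
near = extended_due_date(date(2018, 12, 24), holidays)  # within the window, extended
far = extended_due_date(date(2018, 11, 1), holidays)    # outside the window, unchanged
```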

What other stories can you tell with this data?
I think the patterns in when people return books after they have already missed the due date would be interesting to tease out, and informative about the optimal conditions for returning books. For example:

  • how long do patrons keep a book after they've missed the due date?
  • on which days of the week are "past due date" books often returned?
  • what is the relationship between the number of pages and the number of days a book is borrowed? Is there a threshold such that books shorter than it are always returned on time?
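
The first two questions reduce to simple date arithmetic. The following is a minimal sketch on made-up records, assuming each checkout carries a due date and an actual return date; the tuple layout and the sample dates are hypothetical and do not reflect the real column names.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical (due_date, returned_date) pairs, for illustration only.
checkouts = [
    (date(2018, 3, 6), date(2018, 3, 5)),    # returned before the due date
    (date(2018, 3, 6), date(2018, 3, 10)),   # returned late
    (date(2018, 3, 13), date(2018, 3, 17)),  # returned late
    (date(2018, 3, 20), date(2018, 3, 26)),  # returned late
]

# Keep only checkouts returned after the due date.
late = [(due, ret) for due, ret in checkouts if ret > due]

# How long past the due date each late book was kept.
days_overdue = [(ret - due).days for due, ret in late]
mean_days_overdue = sum(days_overdue) / len(days_overdue)

# Which weekdays late books come back on.
return_weekdays = Counter(WEEKDAYS[ret.weekday()] for _, ret in late)
```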

Additional Exercises

These are the exercises I would have undertaken if I had more time:

  • the data suffers from class imbalance, so apply techniques that account for it, such as minority-class oversampling or majority-class undersampling
  • recursive feature elimination for improved feature selection
  • a simple neural network (with softmax layer) to get the probabilities of late and timely return
  • set up pipelines and create classes to avoid repetition when reusing code for different models
  • use Google's Places API to see if the library is close to a grocery store, or downtown, or any other area that might be visited frequently
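
As an illustration of the first item, random minority-class oversampling can be sketched in a few lines of standard-library Python. This is a simplified stand-in, not code from the repository: the helper name `oversample_minority` and the toy data are hypothetical, and a real run would apply it to the training split only so that duplicated rows do not leak into evaluation.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the largest class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: label 1 (late return) is the minority class.
rows = [[1.2], [0.4], [3.1], [2.2], [5.0]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = oversample_minority(rows, labels)
```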
