Crimes-NYC

Binary Classification of NYC Crimes from 2006-2019

Workflow:

I built many models, each trying to predict accurately whether a crime was attempted or completed based on the features. For the models that worked well on this dataset, I tuned the hyperparameters with recall as the optimization target, because we wanted to make sure attempted crimes were correctly identified. Attempted crimes turned out to be very hard to predict, while completed crimes were much easier. This was partly because the dataset is heavily imbalanced: there were 719,618 completed crimes compared with only 9,550 attempted crimes. I first took a sample of 100,000 crimes from this dataset. Then I made a copy of the cleaned dataset, increased the sample size to 500,000, and compared the classification reports. As expected, the results were better overall with the larger sample.
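The imbalance is why recall, not accuracy, was the optimization target. The arithmetic below uses only the class counts quoted above: a trivial model that labels every crime "completed" already scores about 98.7% accuracy while catching none of the attempted crimes.

```python
# Class counts reported above (after cleaning).
completed = 719_618
attempted = 9_550
total = completed + attempted

attempted_share = attempted / total       # fraction of the positive (attempted) class
imbalance_ratio = completed / attempted   # completed crimes per attempted crime

# A trivial model that always predicts "completed" is right on every completed
# crime and wrong on every attempted one.
baseline_accuracy = completed / total
baseline_recall_attempted = 0 / attempted  # it never flags an attempted crime

print(f"total = {total:,}")
print(f"attempted share = {attempted_share:.2%}")
print(f"imbalance ratio = {imbalance_ratio:.1f} : 1")
print(f"baseline accuracy = {baseline_accuracy:.2%}, recall on attempted = {baseline_recall_attempted:.0%}")
```

High accuracy with zero recall on the minority class is exactly the failure mode that optimizing for recall guards against.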

Objective:

To use historical NYC crime data to make accurate predictions about whether crimes were attempted or completed.

Crime Map

Stakeholder: City of New York

-Will decide where to invest resources to help reduce crime, based on whether crimes in each area were attempted or completed

Dataset:

NYPD Open Data Crime Complaints from 2006-2019. After data cleaning, there were 729,168 crimes, from which I drew samples of 100,000 and 500,000. The 500,000 sample yielded better results overall.
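The README does not say how the samples were drawn. A minimal sketch, assuming a stratified sample so that the rare attempted class keeps roughly its 1.3% share, using scikit-learn's `train_test_split` on a stand-in label column. The `attempted` column and the `stratified_sample` helper are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the cleaned data: only the binary label is kept here.
# 1 = attempted, 0 = completed, using the class counts reported in this README.
crimes = pd.DataFrame({"attempted": np.repeat([0, 1], [719_618, 9_550])})


def stratified_sample(df: pd.DataFrame, n: int, target: str = "attempted", seed: int = 42) -> pd.DataFrame:
    """Return n rows drawn without replacement, preserving the target's class proportions."""
    sample, _rest = train_test_split(df, train_size=n, stratify=df[target], random_state=seed)
    return sample


samples = {n: stratified_sample(crimes, n) for n in (100_000, 500_000)}
for n, s in samples.items():
    print(f"n={n:,}: {s['attempted'].sum():,} attempted ({s['attempted'].mean():.2%})")
```

A plain `df.sample(n=...)` would also work, but stratifying keeps the attempted-crime count stable across random seeds, which matters when there are only about 1,300 positives in a 100,000-row sample.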

Features in the Dataset:

-Complaint date and time
-Crime and crime description
-Suspect and victim demographics
-Police precinct location and location of the crimes
-Borough where the crime occurred and police station borough

We created new columns for the month, year, and day of the week of each crime, and for the duration of the crime.
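A minimal pandas sketch of these derived columns. The column names `CMPLNT_FR_DT`, `CMPLNT_FR_TM`, `CMPLNT_TO_DT`, and `CMPLNT_TO_TM` (complaint start and end date/time) are assumptions modeled on the NYPD complaint data, and the two rows are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; column names are assumed, not taken from this project's code.
df = pd.DataFrame({
    "CMPLNT_FR_DT": ["01/15/2018", "07/04/2019"],
    "CMPLNT_FR_TM": ["22:30:00", "13:00:00"],
    "CMPLNT_TO_DT": ["01/16/2018", "07/04/2019"],
    "CMPLNT_TO_TM": ["01:30:00", "13:45:00"],
})


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add month, year, day-of-week, and duration (hours) columns."""
    out = df.copy()
    fmt = "%m/%d/%Y %H:%M:%S"
    # errors="coerce" turns malformed or missing timestamps into NaT instead of raising.
    start = pd.to_datetime(out["CMPLNT_FR_DT"] + " " + out["CMPLNT_FR_TM"], format=fmt, errors="coerce")
    end = pd.to_datetime(out["CMPLNT_TO_DT"] + " " + out["CMPLNT_TO_TM"], format=fmt, errors="coerce")
    out["month"] = start.dt.month
    out["year"] = start.dt.year
    out["day_of_week"] = start.dt.day_name()
    out["duration_hours"] = (end - start).dt.total_seconds() / 3600
    return out


features = add_time_features(df)
print(features[["month", "year", "day_of_week", "duration_hours"]])
```

The first row crosses midnight, which is why duration is computed from full timestamps rather than from the time columns alone.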

Target:

We wanted to compare attempted vs completed crimes and find which features were the strongest predictors of whether a crime was attempted or completed.
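A sketch of turning the outcome into a binary target, with attempted as the positive class since that is the class whose recall we care about. The column name `CRM_ATPT_CPTD_CD` and its `"ATTEMPTED"`/`"COMPLETED"` values are assumed from the NYPD complaint data; the rows are made up.

```python
import pandas as pd

# Hypothetical raw outcome column, including one missing value.
raw = pd.DataFrame({"CRM_ATPT_CPTD_CD": ["COMPLETED", "ATTEMPTED", "COMPLETED", None, "COMPLETED"]})

# Positive class = attempted (1), negative class = completed (0).
target_map = {"COMPLETED": 0, "ATTEMPTED": 1}
clean = raw.dropna(subset=["CRM_ATPT_CPTD_CD"]).copy()
clean["attempted"] = clean["CRM_ATPT_CPTD_CD"].map(target_map).astype(int)

print(clean["attempted"].value_counts())
```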

Exploratory Data Analysis:

EDA

Crimes by Borough

Attempted vs Completed

Crime Map I constructed using Folium:

Crime Map

More Specific Data from NYC Subway Lines

Subway Crime Map

Models:

-Logistic Regression with GridSearch
-Random Forest with Random Oversampler
-Random Forest with Random Undersampler
-Random Forest with SMOTE, with GridSearch

Feature Importances
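A sketch of how a ranking like this can be read off a fitted Random Forest. The feature names and data are hypothetical; the toy target depends only on `hour`, so that feature should come out on top.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2_000
# Hypothetical features; only "hour" carries signal, the rest are noise.
X = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "month": rng.integers(1, 13, n),
    "day_of_week": rng.integers(0, 7, n),
    "borough_code": rng.integers(0, 5, n),
})
y = (X["hour"] >= 20).astype(int)  # toy target that depends only on hour

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one value per column, summing to 1.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor features with many distinct values; `sklearn.inspection.permutation_importance` on a held-out set is a common cross-check.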

Conclusion:

-It is very difficult to predict attempted crimes from this dataset, and much easier to predict completed crimes
-Harassment and robbery are especially ambiguous as to whether they were attempted or completed, which makes these two crimes particularly difficult to predict
-Business recommendation: allocate more resources to the locations with the most completed crimes
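A small pandas sketch of the ranking behind the business recommendation: count completed crimes per location, most first. Borough is used as the location here; the `BORO_NM` column name and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical slice of the cleaned data: borough plus the binary target (1 = attempted).
crimes = pd.DataFrame({
    "BORO_NM": ["BROOKLYN", "BRONX", "BROOKLYN", "MANHATTAN", "BRONX", "BROOKLYN", "QUEENS", "BROOKLYN"],
    "attempted": [0, 0, 1, 0, 0, 0, 0, 0],
})

# Count completed crimes (attempted == 0) per borough, highest first.
completed_by_boro = (
    crimes.loc[crimes["attempted"] == 0]
    .groupby("BORO_NM")
    .size()
    .sort_values(ascending=False)
)
print(completed_by_boro)
```

The same grouping works at a finer level, such as precinct, by swapping the grouping column.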

Model Optimization:

We first ran the models on the original sample of 100,000. Because increasing the sample size improved the recall score, we reran the models with a sample size of 300,000.
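One way to check whether more data helps recall is a learning curve: refit the model on growing subsets of the training data and track validation recall. A sketch with scikit-learn's `learning_curve` on synthetic, imbalanced data (not this project's data or results):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, learning_curve

# Synthetic, imbalanced stand-in (~5% positive class).
X, y = make_classification(n_samples=4_000, n_features=10, weights=[0.95], random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% to 100% of each training fold
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="recall",
)

# Mean validation recall at each training-set size.
for size, scores in zip(train_sizes, val_scores):
    print(f"train size {size:>5}: recall = {scores.mean():.3f}")
```

If the validation recall is still rising at the largest size, a bigger sample is likely to help; if it has flattened, the extra rows mostly add training time.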

Models that did not work well for this dataset:

-Multinomial Naive Bayes
-XGBoost with SMOTE
-KNN with GridSearchCV

Contributors

daniellejenkins17
