aidanstack / cycling_classification_project
Using 1.8 million rows of traffic collision data from NYC's Open Data initiative, we ran thousands of classification models to discover which variables made collisions lethal for cyclists. The data was cleaned with Pandas, then Scikit-learn was used to instantiate and grid-search Decision Tree, Random Forest, Logistic Regression, and K-Nearest Neighbors models. Logistic Regression proved the most reliable and was tuned for recall, because fatal collisions made up only 0.04% of all collisions involving cyclists. The variables most associated with lethal cycling collisions were then extracted as actionable insights.
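
The choice of recall over accuracy can be illustrated with a small sketch. The labels and predictions below are hypothetical toy data built to match the stated 0.04% fatality rate; they are not the project's data or results, and the helper names `accuracy` and `recall` are assumptions made for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical toy labels: 4 fatal collisions (1) in 10,000 records, i.e. 0.04%.
y_true = [1] * 4 + [0] * 9996

# Baseline that never predicts a fatality.
always_nonfatal = [0] * 10000

# Hypothetical recall-oriented model: catches 3 of the 4 fatal cases,
# at the cost of 500 false alarms among the non-fatal records.
recall_tuned = [1, 1, 1, 0] + [1] * 500 + [0] * 9496

baseline_accuracy = accuracy(y_true, always_nonfatal)
baseline_recall = recall(y_true, always_nonfatal)
tuned_accuracy = accuracy(y_true, recall_tuned)
tuned_recall = recall(y_true, recall_tuned)

print(f"baseline: accuracy={baseline_accuracy:.4f}, recall={baseline_recall:.2f}")
print(f"tuned:    accuracy={tuned_accuracy:.4f}, recall={tuned_recall:.2f}")
```

In this toy setup the baseline scores higher on accuracy while finding none of the fatal collisions, which is why accuracy is a poor tuning target when the positive class is this rare.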