Giter Club home page Giter Club logo

-taxi-trip-time-prediction's Introduction

Taxi-trip-time-Prediction

Problem Statement

More than 7 billion people exist on earth. With necessities of food, water and shelter there also a key requirement of commutating from one place to other. Rapid advancement in technology in the last two decades leads to adaption of a more efficient way of transportation via internet and app-based transport system. New York city is one of such advanced city with extensive use of transportation via subways, buses and taxi services. New York has more then 10,000 plus taxi and nearly 50% of population doesn’t have a personal vehicle. Due to this facts most people used taxi has a there primary mode of transport and it accounts for more than 100 millions taxi trips per year.

The main objective is to build a predictive model, which could help them in predicting the trip duration of taxi. This would in turn help them in matching the right cabs with the right customers quickly and efficiently.

Data Overview

Data Description:

The dataset is based on the 2016 NYC Yellow Cab trip record data made available in Big Query on Google Cloud Platform. The data was originally published by the NYC Taxi and Limousine Commission (TLC). The data was sampled and cleaned for the purposes of this project. Based on individual trip attributes, you should predict the duration of each trip in the test set.

Data fields:

  • id - a unique identifier for each trip

  • vendor_id - a code indicating the provider associated with the trip record

  • pickup_datetime - date and time when the meter was engaged

  • dropoff_datetime - date and time when the meter was disengaged

  • passenger_count - the number of passengers in the vehicle (driver entered value)

  • pickup_longitude - the longitude where the meter was engaged

  • pickup_latitude - the latitude where the meter was engaged

  • dropoff_longitude - the longitude where the meter was disengaged

  • dropoff_latitude - the latitude where the meter was disengaged

  • store_and_fwd_flag - This flag indicates whether the trip record was held in vehicle memory before sending to the vendor because the vehicle did not have a connection to the server - Y=store and forward; N=not a store and forward trip

  • trip_duration - duration of the trip in seconds

Approach

The following steps were followed in the project:

Data Preprocessing and cleaning: Done data preprocessing by converting some columns' datatypes , by adding some new columns and removimng outliers

Exploratory Data Analysis: Did Exploratory Analysis which includes Univariate as well as Bivariate Analysis.

Data Split: The preprocessed data was split into training and test sets on a random state of 0. By using training data we trained our predictive model and we used testing data to evaluate our prediction.

Model Training: The models were trained using the training data, and these models are optimized by some Hyperparameter settings.

Model Evaluation: The performance of the trained models was evaluated using metrics such as mean absolute error, root mean squared error, and R-squared. The model that performed the best on the test data was selected.

Model Deployment: The selected model was deployed in a live production setting, where it could make real-time predictions of bike demand. The model's performance was monitored over time to ensure its accuracy and usefulness.

Models Used

Models Used For modeling we tried various regression models such as

1)Linear Regression

2)Random Forest

3)XG Boosting

All these models were fine tuned using a random search method with repeated cross-validation (CV) to find the best hyperparameters. We also found out Feature importance so that we get to know about the features which are highly important.

Result

We got the best model accuracy in Xgboost model,r2 score of 0.83 in test data..The RMSE score of Xgboost model is 0.28

Output

taxi

Conclusion

During our analysis, Firstly did data preprocessing by creating few columns . I cleaned the data and removed outliers. Then conducted an exploratory data analysis (EDA) on all the features in our dataset. I have done Monovariate and Bivariate analysis on these variables . I also studied the numerical variables, calculated their correlations, and the their relationships with the dependent variable. I also handled multicollinearity issue using VIF. I also done OneHotencoding by adding dummy columns .

I employed 3 machine learning algorithms including Linear Regression, Random Forest and XG Boosting . I also performed hyperparameter tuning to enhance the performance of our models.

Some Facts:

1)Our base model (Linear Regression) gave us a r2_score of 0.65 in test data and keep this model accuracy in reference to compare other model.

2)We got the best model accuracy in Xgboost model,r2 score of 0.83 in test data..The RMSE score of Xgboost model is 0.28

3)Our second best model is RandomForest Regressor with a r2 score of 0.72 in test data. The RMSE score of random forest is 0.36

4)If we dont use proper hypertunning parameter for Random Forest , it gives overfitting issues and takes to much time to run.(Practical observation)

5)As far as data is concern , we can see that there is rise in trip duration and number of rides at the time of evening , this looks like some natural pattern because of the trafic issues .

6)Also there are more pickup numbers durig 4-6 pm in the evening. This is due to Closing office hours may be.

7)March is the month where most of the Taxi rides happen this cound be due to all salaries are created due to budget end month or else it can be due to vacations as well

8)Taxi trips are lowest on Mondays and comparatively low at sSundays as well.

9)On Sundays people prefer to stay at home other than going outside in the traffic.

-taxi-trip-time-prediction's People

Contributors

ady909 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.