kokolipa / tableau_citibike

Using Tableau, this project analyses CitiBike's July 2023 data to understand user behaviour (Python is employed for ETL processes). Objectives include user segmentation, popular stations, ride characteristics, and geographical concentration. Key findings are summarised in two dashboards for member and casual users.

Home Page: https://public.tableau.com/app/profile/gal.beeri/viz/CitiBike_GalBeeri/CasualRidersDashboard

Tableau - CitiBike July 2023 Analysis

Project Description & Background- Tableau-Dashboard

Background:

Since 2013, the Citi Bike program has maintained robust infrastructure for collecting data on the program's utilisation. Each month, bike data is collected, organised, and made public on the Citi Bike Data page.

However, while the data has been regularly updated, the team has yet to implement a dashboard or sophisticated reporting process. City officials have questions about the program, so your first task on the job is to build a set of data reports to provide the answers.

Project Description:

This project analyses CitiBike's data for July 2023 (the warmest month) to provide insights and explore the following questions:

  1. Do members or casuals have higher usage?
  2. Which stations are most popular?
  3. What is the overall average distance travelled?
  4. What days of the week are most rides taken on?
  5. What type of bicycle is used most?
  6. On average, how long do users rent a bicycle?
  7. Which zip codes have the largest concentration of usage (approx.)?

To answer these questions, two dashboards were created, one for each user segment: members and casual users. Members are users who subscribed to an annual membership (Citi Bike plan / Lyft Pink plan pricing); casual users are users who purchased a 24-hour pass or a 3-day pass.

Description of the data:

The July 2023 CitiBike dataset contains 13 columns and 3,767,347 records. A 14th column, distance, is added during the transformation step.

Columns:
# 1. ride_id
# 2. rideable_type
# 3. started_at
# 4. ended_at
# 5. start_station_name
# 6. start_station_id
# 7. end_station_name
# 8. end_station_id
# 9. start_lat
# 10. start_lng
# 11. end_lat  --> end_lat and end_lng: the latitude and longitude of the endpoint of a given ride
# 12. end_lng
# 13. member_casual  --> Segmentation column identifying members and casual users
# 14. distance  --> Added during transformation: the distance between two geolocation points on a sphere, returned by a function implementing the Haversine formula

Assumption & Note:

Assumption:

  • CitiBike's July data includes multiple geolocations per station. The assumption here is that each station has a "static" geolocation as well as "dynamic" geolocations for each bicycle docked in the station (each bicycle has a "tablet" mounted on its handlebars).

Note:

  • What is the distance metric? The dataset does not include intermediate geolocations that would indicate the route of a given ride. The Haversine formula gives the great-circle (straight-line) distance between the start and end geolocations of a ride, so it is a lower bound on the distance actually ridden. This distance is used in the dashboards to report the average distance for members and casual users.

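
The distance metric described in the note can be sketched as follows. This is a minimal illustration of the Haversine formula using only the math functions listed under "Python Libraries Used", not the notebook's actual code: the function name haversine_km, the choice of kilometres, and the Earth radius constant are assumptions made for the example.

```python
from math import radians, sin, cos, sqrt, atan2

# Mean Earth radius in kilometres (assumed constant; the notebook may use another value or unit).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points given in degrees."""
    # The trigonometric functions expect radians, so convert the inputs first.
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    # Haversine of the central angle between the two points.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    # Central angle in radians.
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return EARTH_RADIUS_KM * c
```

Applied to the dataset, the four arguments would be start_lat, start_lng, end_lat, and end_lng for each ride. A ride that starts and ends at the same dock gets a distance of 0, which is one reason the averages shown in the dashboards are approximate.
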
Members Dashboard

members_dashboard

Casual Members Dashboard

casual_members_dashboard

Tableau Story

Main

ETL

Extract:

  • Data Extraction: Downloading the zip file from CitiBike's data source (202307-citibike-tripdata.csv.zip).
  • Loading the CSV data extracted from the zip file into a Jupyter Notebook using pandas.
  • Exporting the cleaned data as a compressed zip archive using Python's zipfile library with compression level 9.
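
The extract steps above might look roughly like the sketch below. It is written under assumptions rather than copied from the notebook: the helper names read_trips and write_compressed, the member name cleaned.csv, and the filter that skips "__MACOSX" entries are all hypothetical.

```python
import zipfile

import pandas as pd


def read_trips(zip_path):
    """Read every CSV member of the archive into a single DataFrame."""
    with zipfile.ZipFile(zip_path) as zf:
        # Keep only real CSV members; skip macOS resource-fork entries if present.
        members = [name for name in zf.namelist()
                   if name.endswith(".csv") and not name.startswith("__MACOSX")]
        frames = []
        for name in members:
            with zf.open(name) as fh:
                frames.append(pd.read_csv(fh, low_memory=False))
    return pd.concat(frames, ignore_index=True)


def write_compressed(df, zip_path, member_name="cleaned.csv"):
    """Write df as a CSV inside a DEFLATE zip archive at compression level 9."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        # to_csv() without a path returns the CSV text, which writestr stores as one member.
        zf.writestr(member_name, df.to_csv(index=False))
```

Level 9 is the strongest (and slowest) DEFLATE setting that zipfile offers. For a file with several million rows, building the whole CSV string in memory as done here may be too costly, so writing the CSV to disk first and adding it with ZipFile.write is a reasonable alternative.
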

Transform:

  • Removing null values.
  • Memory optimisation -> Transforming the data types and reducing the byte size for each dtype.
  • Manipulation -> Using the Haversine formula to calculate the distance between two geolocations (adding the distance column to the dataset).

Download the clean data here -> Cleaned Data - Drop Box
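
As an illustration of the first two transform steps, the sketch below drops rows with nulls and then stores each column in a smaller dtype. The function name clean_and_shrink and the specific dtype choices (float32 for coordinates, categoricals for the two low-cardinality text columns) are assumptions; the notebook may make different choices.

```python
import pandas as pd


def clean_and_shrink(df):
    """Drop rows containing nulls, then downcast columns to smaller dtypes."""
    out = df.dropna().reset_index(drop=True)
    for col in out.columns:
        if pd.api.types.is_float_dtype(out[col]):
            # float64 -> float32 halves the memory used by coordinate columns.
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_integer_dtype(out[col]):
            # Pick the smallest integer type that holds every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
    # Columns with only a few distinct values are stored compactly as categoricals.
    for col in ("rideable_type", "member_casual"):
        if col in out.columns:
            out[col] = out[col].astype("category")
    return out
```

Comparing df.memory_usage(deep=True).sum() before and after the call shows how much memory the conversion saves. Note that float32 keeps roughly seven significant digits, which is about metre-level precision for latitude and longitude values; if that is too coarse, the coordinate columns can be left as float64.
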

Load:

  • Load the data (CSV) into Tableau, analyse the data, and upload the visuals to the dashboards.

Python Libraries Used:

  1. pandas
  2. os
  3. math --> radians, sin, cos, sqrt, atan2
  4. zipfile

Folder structure

.
├── Images
│   ├── CitiBike Logo.png
│   ├── CitiBike_Bike.png
│   ├── customers.png
│   ├── Distance.png
│   ├── docked_bike.png
│   └── Ranting.png
├── DataTransformation.ipynb
├── Dashboard_Images
│   ├── Casual_Dashboard.png
│   └── Members_Dashboard.png
├── README.md
└── .gitignore
