Electric Vehicle Data Engineering Project

Overview

This project is my submission for the Data Engineering Zoomcamp 2024 Capstone. Its aim is to explore and gain insights into the use of electric vehicles (EVs) in the United States. The dataset was obtained from Data.gov; more information about the data can be found here.

Project Architecture

[Figure: project architecture diagram]

Project Workflow

Cloud Infrastructure Setup

  1. GCP Infrastructure Creation

    • Create a project in Google Cloud Platform (GCP).
    • Create a service account with the following roles: Storage Admin, BigQuery Admin, and Compute Admin.
    • Generate service account keys and download the key (.json file).
  2. Terraform Setup

    • Install Terraform:
      wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
      echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
      sudo apt update && sudo apt install terraform
    • Create a new folder (e.g., terraform) and navigate into it.
    • Create main.tf and variables.tf files.
    • Update main.tf with project ID, credentials, region, and location using the variables declared in variables.tf.
    • Update main.tf with Google Cloud Storage bucket name and BigQuery dataset name.
    • Set the path to the service account key as an environment variable so Terraform can authenticate:
      export GOOGLE_CREDENTIALS='/path/to/your/keys.json'
    • Initialize Terraform and apply changes:
      terraform init
      terraform plan
      terraform apply
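
Step 1 above is described in terms of the GCP console, but it can also be scripted. The following is a minimal sketch using the gcloud CLI, not the exact procedure used for this project. The project ID, service account name, and key file name are hypothetical placeholders, and the `run` helper is only an illustration: it prints each command and executes it only when `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the GCP setup steps with the gcloud CLI.
# All names below are hypothetical placeholders; replace them with your own.
set -eu

PROJECT_ID="my-ev-project"
SA_NAME="ev-pipeline-sa"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
KEY_FILE="keys.json"

# Print each command; execute it only when DRY_RUN=0 (dry run is the default).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Create the project and the service account.
run gcloud projects create "$PROJECT_ID"
run gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"

# Grant the three roles listed above: Storage Admin, BigQuery Admin, Compute Admin.
for role in roles/storage.admin roles/bigquery.admin roles/compute.admin; do
  run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:${SA_EMAIL}" --role "$role"
done

# Generate and download the JSON key.
run gcloud iam service-accounts keys create "$KEY_FILE" --iam-account "$SA_EMAIL"
```

By default the script only prints the commands it would run. Review them, then rerun with `DRY_RUN=0` in the environment to execute them against your account.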

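The Terraform files themselves are not reproduced in this README. As a rough sketch of what main.tf and variables.tf might contain, the script below writes a minimal pair of files. The variable names, default values, and resource labels are assumptions made for illustration, and credentials are assumed to come from the GOOGLE_CREDENTIALS environment variable set above rather than from a variable in main.tf.

```shell
#!/bin/sh
# Write a minimal variables.tf and main.tf into ./terraform.
# Variable names and defaults are illustrative assumptions, not the project's actual values.
set -eu

mkdir -p terraform

cat > terraform/variables.tf <<'EOF'
variable "project" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "Default region for the provider"
  type        = string
  default     = "us-central1"
}

variable "location" {
  description = "Location of the bucket and the dataset"
  type        = string
  default     = "US"
}

variable "gcs_bucket_name" {
  description = "Globally unique Cloud Storage bucket name"
  type        = string
}

variable "bq_dataset_name" {
  description = "BigQuery dataset ID"
  type        = string
  default     = "ev_dataset"
}
EOF

cat > terraform/main.tf <<'EOF'
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

# Credentials are read from the GOOGLE_CREDENTIALS environment variable.
provider "google" {
  project = var.project
  region  = var.region
}

resource "google_storage_bucket" "data_lake" {
  name          = var.gcs_bucket_name
  location      = var.location
  force_destroy = true
}

resource "google_bigquery_dataset" "dataset" {
  dataset_id = var.bq_dataset_name
  location   = var.location
}
EOF
```

Variables without a default (`project` and `gcs_bucket_name` in this sketch) are prompted for by `terraform plan` and `terraform apply`, or can be passed with `-var`.
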
Data Orchestration

  1. Mage and Docker Setup

    • Create a folder for Mage (e.g., Mage).
    • In that folder, add a Dockerfile based on the Mage image:
      FROM mageai/mageai:latest
      
      ARG USER_CODE_PATH=/home/src/${PROJECT_NAME}
      
      COPY requirements.txt ${USER_CODE_PATH}requirements.txt 
      
      RUN pip3 install -r ${USER_CODE_PATH}requirements.txt
    • Add a docker-compose.yml file that sets the project name, references the Dockerfile, and passes in the GCP service account key and other environment variables.
  2. Running Mage

    • Build and spin up Mage on localhost:
      docker-compose up
    • This may take some time to initialize.
  3. Pipeline Creation

    • In Mage, create a pipeline to load the data into Google Cloud Storage and BigQuery.
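
The docker-compose.yml mentioned in step 1 is not shown here. The script below writes one possible version, as a sketch under stated assumptions: the service name `magic`, the project name in `.env`, the key path inside the container, and port 6789 (Mage's default UI port) are illustrative choices rather than this project's actual settings. It expects the Dockerfile above to sit in the same Mage folder.

```shell
#!/bin/sh
# Write a minimal .env and docker-compose.yml into ./Mage.
# Service name, project name, and mount paths are illustrative assumptions.
set -eu

mkdir -p Mage

# Compose reads .env from the project folder to fill in ${PROJECT_NAME}.
cat > Mage/.env <<'EOF'
PROJECT_NAME=electric_vehicle_de
EOF

# The heredoc is quoted so ${PROJECT_NAME} is left for Compose to substitute.
cat > Mage/docker-compose.yml <<'EOF'
services:
  magic:
    build:
      context: .
      dockerfile: Dockerfile
    command: mage start ${PROJECT_NAME}
    env_file:
      - .env
    environment:
      USER_CODE_PATH: /home/src/${PROJECT_NAME}
      GOOGLE_APPLICATION_CREDENTIALS: /home/src/keys.json
    ports:
      - "6789:6789"
    volumes:
      - .:/home/src/
      - /path/to/your/keys.json:/home/src/keys.json:ro
    restart: on-failure:5
EOF
```

With these assumed settings, running `docker-compose up` from the Mage folder would serve the Mage UI at http://localhost:6789, and the mounted key lets pipeline blocks authenticate to Google Cloud Storage and BigQuery.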

Data Transformation

dbt Cloud Setup

  1. Create a New Project in dbt

    • Sign in to dbt Cloud and create a new project.
    • Add a project name and select BigQuery as the data source.
    • Include the service account keys for authentication.
    • Set up a repository to run your transformations and create the project.
  2. Initialize dbt Project

    • Click 'initialize dbt project' and then 'commit and sync'.
  3. Model Creation and Transformation

    • Create your SQL files in the models directory for transformation.
    • Perform data transformation using dbt:
      dbt run
  4. Resources
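
As an illustration of the model creation step (step 3), the script below writes a source definition and one staging model into the models directory. This is only a sketch: the source name, the project and dataset IDs, the table name, and the column names are all hypothetical and must be matched to the table that the Mage pipeline actually loads into BigQuery.

```shell
#!/bin/sh
# Write a hypothetical dbt source definition and staging model.
# Project, dataset, table, and column names are placeholders, not the project's real schema.
set -eu

mkdir -p models/staging

cat > models/staging/sources.yml <<'EOF'
version: 2

sources:
  - name: staging
    database: my-ev-project   # GCP project ID (placeholder)
    schema: ev_dataset        # BigQuery dataset (placeholder)
    tables:
      - name: ev_population
EOF

cat > models/staging/stg_ev_population.sql <<'EOF'
{{ config(materialized='view') }}

-- Count registered vehicles per make, model, and model year.
select
    make,
    model,
    model_year,
    count(*) as vehicle_count
from {{ source('staging', 'ev_population') }}
group by make, model, model_year
EOF
```

After committing files like these in dbt Cloud, `dbt run` builds the model as a view in the target dataset, which Looker Studio can then use as a data source.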

Dashboard Creation

  1. Access Looker Studio

  2. Create a New Report

    • Create a new report in Looker Studio.
    • Connect to BigQuery as the data source.
    • Select datasets to use for building the dashboard.
  3. Dashboard Design

    • Design your dashboard in Looker Studio.
    • Include relevant visualizations and insights.
  4. Electric Vehicle Dashboard

    [Figure: Electric Vehicle Dashboard screenshot]

Resources

Contributors

justus-coded
