margotmarchais / chocolate-e-commerce

Python x GCP x dbt x Power BI project. I built a data warehouse in Google BigQuery from data I scraped from the websites of several chocolate makers and distributors in France. The data were transformed in dbt and visualized in a Power BI dashboard.

Home Page: https://margot-marchais-maurice.webflow.io/chocolate-french-market

Python 4.18% Jupyter Notebook 95.82%
bigquery chocolate dbt powerbi python requests scrapy

Market study | Chocolate e-commerce in France

Executive summary: My goal for this project was to collect data from various chocolate makers and distributors in France, and eventually build a comprehensive dataset about the online chocolate market in France.

Methodology:

  • Web scraping: I scraped product data from the chocolate sections of several French e-commerce websites, using the Python libraries scrapy and requests.
  • Python: I built an additional script to:
    • apply some minor transformations to the .csv files produced by the web scraping (data cleaning)
    • automatically load the resulting dataframes into BigQuery using the BigQuery Python client (bigquery.Client)
  • GCP BigQuery: I created an account and an empty project in GCP.
  • dbt: I created a dbt Cloud project that is connected to my BigQuery project. In this dbt project, I created 3 sections:
    • Bronze (staging): raw data, with few modifications.
    • Silver (transformations): the raw data with some transformations (new columns, filters, etc.).
    • Gold (final): final datasets that will be used for analysis.
    With dbt, I could build the SQL views and tables in GCP BigQuery. I also set up the data lineage, data quality (QoD) tests and documentation. All dbt changes were saved through the dbt-GitHub integration.
  • Power BI: Finally, I connected my Gold datasets to Power BI to create a visual overview of the market.
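
To make the scraping and cleaning steps above more concrete, here is a minimal sketch and not the project's actual code. The project used scrapy and requests; this sketch swaps scrapy for the standard-library HTMLParser so the parsing part can be tried offline. The CSS class names (product-name, product-price), the column names and the sample HTML are all hypothetical, since real shop pages use their own markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical product-card markup; real sites will differ.
SAMPLE_HTML = (
    '<div><span class="product-name">Tablette noir 70%</span>'
    '<span class="product-price">5,90 €</span></div>'
)


class ProductParser(HTMLParser):
    """Collect name/price text from the assumed product-card markup."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if css_class == "product-name":
            self._field = "name"
            self.products.append({"name": "", "price": ""})
        elif css_class == "product-price" and self.products:
            self._field = "price"

    def handle_data(self, data):
        if self._field:
            self.products[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        self._field = None


def fetch_html(url):
    """Download a page. requests is imported here so parsing works without it."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def clean_price(raw):
    """Turn a French-formatted price such as '5,90 €' into a float."""
    text = raw.replace("€", "").replace("\u00a0", "").replace(" ", "")
    return float(text.replace(",", "."))


def parse_products(html):
    """Extract products from HTML and apply the minor cleaning step."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return [
        {"name": p["name"], "price_eur": clean_price(p["price"])}
        for p in parser.products
    ]


def to_csv(rows):
    """Serialize cleaned rows to CSV text, ready to be saved or loaded."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price_eur"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_products(SAMPLE_HTML)))
```

On a live page, the HTML string would come from fetch_html(url) instead of SAMPLE_HTML, and the class names would need to be adapted to the target site.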

Final output

  • A comprehensive dataset about the French online chocolate market
  • A visual Power BI dashboard

Go further: To learn more about the project, you can read my non-technical article here: https://margot-marchais-maurice.webflow.io/chocolate-french-market

Technical learnings: I did this project to acquire new skills: building a data warehouse (DWH) in GCP BigQuery, automatically feeding this DWH with scraped data through a Python script, and learning how to use dbt (data transformations, tests and docs generation). It also let me refresh my web scraping skills (scrapy and requests libraries).
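
As an illustration of feeding BigQuery from a Python script, the sketch below reads a cleaned CSV and appends its rows to a table with bigquery.Client. It is only an assumption about how such a load could look: the column names, the schema and the table ID are hypothetical. The project loaded dataframes, for which the client offers load_table_from_dataframe. This sketch uses load_table_from_json instead, so it does not depend on pandas.

```python
import csv
import io


def read_clean_rows(csv_text):
    """Parse cleaned CSV text into dicts with typed values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"name": row["name"].strip(), "price_eur": float(row["price_eur"])}
        for row in reader
    ]


def load_rows(rows, table_id):
    """Append rows to a BigQuery table and return the number of rows loaded.

    Needs the google-cloud-bigquery package and valid GCP credentials.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        # Hypothetical schema matching the columns assumed above.
        schema=[
            bigquery.SchemaField("name", "STRING"),
            bigquery.SchemaField("price_eur", "FLOAT"),
        ],
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish
    return job.output_rows


if __name__ == "__main__":
    sample = "name,price_eur\nTablette noir 70%,5.9\n"
    rows = read_clean_rows(sample)
    # Hypothetical table ID in the form project.dataset.table.
    print(load_rows(rows, "my-project.bronze.products"))
```

Appending with WRITE_APPEND keeps earlier scrapes in the table. A full refresh on each run would use WRITE_TRUNCATE instead.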

Dashboard preview:

[Screenshots: 2024-05-06_11h58_36, 2024-05-06_11h58_45, 2024-05-06_12h39_09]

Brands positioning:

[Screenshots: 2024-05-06_12h39_36, 2024-05-06_11h59_27]
