Lab - EDA Bivariate Analysis: Diving into Amazon UK Product Insights Part II

Objective: Delve into the dynamics of product pricing on Amazon UK to uncover insights that can inform business strategies and decision-making.

Dataset: This lab uses the Amazon UK product dataset, which provides information on product categories, brands, prices, ratings, and more from Amazon UK. You'll need to download it before you start.


Part 1: Analyzing Best-Seller Trends Across Product Categories

Objective: Understand the relationship between product categories and their best-seller status.

  1. Crosstab Analysis:

    • Create a crosstab between the product category and the isBestSeller status.

    • Are there categories where being a best-seller is more prevalent?

      Hint: one option is to calculate the proportion of best-sellers for each category and then sort the categories based on this proportion in descending order.

  2. Statistical Tests:

    • Conduct a Chi-square test to determine whether best-seller status is independent of product category.
    • Compute Cramér's V to understand the strength of association between best-seller status and category.
  3. Visualizations:

    • Visualize the relationship between product categories and the best-seller status using a stacked bar chart.
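The three steps above can be sketched in pandas and SciPy as follows. This is a minimal sketch, not the lab's reference solution: it uses a small hypothetical DataFrame in place of the real download and assumes the category column is named `category` (only `isBestSeller` is named in the lab text). Cramér's V is computed by hand from the chi-square statistic as V = sqrt(chi2 / (n * (min(r, c) - 1))), where n is the total count and r, c are the crosstab's numbers of rows and columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency

# Hypothetical toy data standing in for the Amazon UK dataset.
# Assumption: the category column is called "category".
df = pd.DataFrame({
    "category": ["Toys"] * 6 + ["Books"] * 6 + ["Kitchen"] * 6,
    "isBestSeller": [True, True, True, False, False, False,
                     True, False, False, False, False, False,
                     True, False, False, False, False, False],
})

# 1. Crosstab: counts of best-sellers (True) and non-best-sellers (False) per category
ct = pd.crosstab(df["category"], df["isBestSeller"])

# Proportion of best-sellers in each category, highest first
best_share = (ct[True] / ct.sum(axis=1)).sort_values(ascending=False)

# 2a. Chi-square test of independence on the raw counts
chi2, p_value, dof, expected = chi2_contingency(ct)

# 2b. Cramér's V: 0 = no association, 1 = perfect association
n = ct.to_numpy().sum()
cramers_v = np.sqrt(chi2 / (n * (min(ct.shape) - 1)))

print(best_share)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}, Cramér's V={cramers_v:.3f}")

# 3. Stacked bar chart: one bar per category, split by best-seller status
ax = ct.plot(kind="bar", stacked=True)
ax.set_xlabel("Category")
ax.set_ylabel("Number of products")
ax.legend(title="isBestSeller")
plt.tight_layout()
```

Note that `chi2_contingency` applies Yates' continuity correction only when the table has one degree of freedom (a 2x2 table); with many categories, as in the real dataset, no correction is applied. On the real data, a very small p-value with a small Cramér's V would suggest an association that is statistically detectable but weak.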

Part 2: Exploring Product Prices and Ratings Across Categories and Brands

Objective: Investigate how product prices and ratings vary across product categories.

  1. Preliminary Step: Remove outliers in product prices.

    For this purpose, we can use the IQR (interquartile range) method. Products priced below the first quartile minus 1.5 times the IQR, or above the third quartile plus 1.5 times the IQR, are considered outliers and removed from the dataset. All subsequent steps use the dataframe with the outliers removed.

    Hint: you can check the last Check For Understanding at the end of the lesson EDA Bivariate Analysis for a hint on how to do this.

  2. Violin Plots:

    • Use a violin plot to visualize the distribution of price across product categories. For readability, keep only the top 20 categories by product count.
    • Which product category tends to have the highest median price? Answer this using all categories, not only the top 20.
  3. Bar Charts:

    • Create a bar chart comparing the average price of products for the top 10 product categories (based on count).
    • Which product category commands the highest average price? Answer this using all categories, not only the top 10.
  4. Box Plots:

    • Visualize the distribution of product ratings by category using side-by-side box plots. For readability, keep only the top 10 categories by product count.
    • Which category tends to receive the highest median rating from customers? Answer this using all categories, not only the top 10.
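Part 2 can be sketched as below. This is one possible approach under stated assumptions, not the lab's solution: the data are hypothetical and randomly generated, the column names `category`, `price` and `stars` are assumed, and the helper functions `top_n` and `groups` are illustrative names introduced here. Plots use plain matplotlib so the sketch has no extra dependencies; `seaborn.violinplot` and `seaborn.boxplot` are common alternatives. The top-N filter is applied only to the plots, while the "which category" questions are answered over all categories.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data; "category", "price" and "stars" are assumed column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "category": np.repeat(["Toys", "Books", "Kitchen"], [40, 30, 20]),
    "price": rng.gamma(shape=2.0, scale=15.0, size=90),
    "stars": rng.uniform(3.0, 5.0, size=90).round(1),
})
df.loc[0, "price"] = 5000.0  # plant one extreme price so the filter has something to remove

# 1. IQR rule: keep prices inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df_clean = df[df["price"].between(lower, upper)]

def top_n(frame, n):
    """Hypothetical helper: rows belonging to the n most frequent categories."""
    keep = frame["category"].value_counts().head(n).index
    return frame[frame["category"].isin(keep)]

def groups(frame, col):
    """Hypothetical helper: sorted category names and the matching arrays of `col`."""
    names = sorted(frame["category"].unique())
    return names, [frame.loc[frame["category"] == c, col].to_numpy() for c in names]

# Answers computed over all categories (no top-N filter)
median_price = df_clean.groupby("category")["price"].median().sort_values(ascending=False)
mean_price = df_clean.groupby("category")["price"].mean().sort_values(ascending=False)
median_stars = df_clean.groupby("category")["stars"].median().sort_values(ascending=False)
print(median_price.head(1), mean_price.head(1), median_stars.head(1), sep="\n")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 2. Violin plot of price for the top 20 categories
names, data = groups(top_n(df_clean, 20), "price")
axes[0].violinplot(data, showmedians=True)
axes[0].set_xticks(range(1, len(names) + 1))
axes[0].set_xticklabels(names, rotation=45)
axes[0].set_title("Price by category (top 20)")

# 3. Bar chart of mean price for the top 10 categories
top_n(df_clean, 10).groupby("category")["price"].mean().plot(kind="bar", ax=axes[1])
axes[1].set_title("Average price (top 10)")

# 4. Side-by-side box plots of stars for the top 10 categories
names, data = groups(top_n(df_clean, 10), "stars")
axes[2].boxplot(data)
axes[2].set_xticks(range(1, len(names) + 1))
axes[2].set_xticklabels(names, rotation=45)
axes[2].set_title("Rating by category (top 10)")
plt.tight_layout()
```

With the real dataset, the category list is much longer, so the top-N filter matters for readability while the median and mean rankings still cover every category.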

Part 3: Investigating the Interplay Between Product Prices and Ratings

Objective: Analyze how product ratings (stars) correlate with product prices.

  1. Correlation Coefficients:

    • Calculate the correlation coefficient between price and stars.
    • Is there a significant correlation between product price and its rating?
  2. Visualizations:

    • Use a scatter plot to visualize the relationship between product rating and price. What patterns can you observe?
    • Use a correlation heatmap to visualize correlations between all numerical variables.
    • Use a QQ plot to examine whether product prices follow a normal distribution.
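A minimal sketch of Part 3, assuming numeric columns named `stars` and `price` as in the lab text plus a hypothetical `reviews` column so the heatmap has more than two variables. The data are synthetic and the positive price/rating relationship is planted on purpose, so the printed coefficients say nothing about the real dataset. Pearson measures linear association and Spearman measures monotonic (rank) association; each p-value tests the null hypothesis of zero correlation. `seaborn.heatmap` is a common alternative to the matplotlib heatmap used here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical toy data; "stars" and "price" follow the lab text,
# "reviews" is an assumed extra numeric column for the heatmap.
rng = np.random.default_rng(1)
n = 200
stars = rng.uniform(1.0, 5.0, size=n).round(1)
df = pd.DataFrame({
    "stars": stars,
    "price": 10 + 4 * stars + rng.normal(0, 5, size=n),  # planted positive link
    "reviews": rng.integers(0, 500, size=n),
})

# 1. Pearson (linear) and Spearman (rank-based) correlation with p-values
pearson_r, pearson_p = stats.pearsonr(df["price"], df["stars"])
spearman_rho, spearman_p = stats.spearmanr(df["price"], df["stars"])
print(f"Pearson r={pearson_r:.2f} (p={pearson_p:.3g}), "
      f"Spearman rho={spearman_rho:.2f} (p={spearman_p:.3g})")

# Correlation matrix of every numeric column
corr = df.select_dtypes("number").corr()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 2a. Scatter plot of price against rating
axes[0].scatter(df["stars"], df["price"], alpha=0.5)
axes[0].set_xlabel("stars")
axes[0].set_ylabel("price")

# 2b. Heatmap of the correlation matrix, colour scale fixed to [-1, 1]
im = axes[1].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1].set_xticks(range(len(corr.columns)))
axes[1].set_xticklabels(corr.columns)
axes[1].set_yticks(range(len(corr.columns)))
axes[1].set_yticklabels(corr.columns)
fig.colorbar(im, ax=axes[1])

# 2c. QQ plot of price against a normal distribution
stats.probplot(df["price"], dist="norm", plot=axes[2])
plt.tight_layout()
```

In the QQ plot, points that stay close to the reference line are consistent with normality; systematic curvature away from the line in one tail points to skew.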

Submission: Submit a Jupyter Notebook that contains your code and a business-centric report summarizing your findings.

Bonus:

  • Repeat the analysis without removing the outliers. How do your insights change?

Contributors

ironhack-edu, toantranngoc84, debironhack