shagunsharma14 / big_mart_sales_prediction

This project is focused on analyzing and predicting sales data for a Big Mart. It involves data collection, processing, exploratory data analysis, preprocessing, and training a machine learning model using XGBoost regression. The goal is to gain insights from the data and create a predictive model to estimate sales based on various features.

Big Mart Sales Prediction

Predicting sales for Big Mart stores

Introduction

The Big Mart Sales Prediction project aims to predict the sales of products in Big Mart stores. It uses features such as item weight, item visibility, and store location to train an XGBoost regression model that estimates sales. Store management can use these estimates to inform decisions about inventory, promotions, and overall business strategy.

Dependencies

To run this project, the following dependencies are required:

  • numpy
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn
  • xgboost

Data Collection and Processing

The project reads its data from a CSV file. The data is loaded into a Pandas DataFrame, and basic information about the dataset (shape, column types, non-null counts) is printed. Missing values and categorical features are handled later, in the pre-processing step.

Code snippet:

# Importing the Dependencies
import numpy as np
import pandas as pd

# Loading the data from a CSV file to Pandas DataFrame
big_mart_data = pd.read_csv('Train.csv')

# First 5 rows of the dataframe
big_mart_data.head()

# Number of data points & number of features
print("Shape of the dataframe:", big_mart_data.shape)

# Getting some information about the dataset
big_mart_data.info()

Screenshot:

Loaded Data
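
Before deciding how to handle missing values, it helps to see where they are. The following is a minimal sketch on a toy DataFrame, not the project's own code: the column names come from this README, but the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for Train.csv; the values are made up for illustration.
toy = pd.DataFrame({
    "Item_Weight": [9.3, np.nan, 17.5, 19.2, np.nan],
    "Item_Visibility": [0.016, 0.019, 0.017, 0.0, 0.0],
    "Outlet_Size": ["Medium", "Medium", None, "Small", "High"],
})

# Count missing entries per column.
missing_counts = toy.isnull().sum()

# Express the same information as a share of all rows.
missing_share = toy.isnull().mean()

print(missing_counts)
print(missing_share.round(2))
```

Running the same two calls on big_mart_data should show which columns need imputation.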

Data Analysis

The project explores the data visually. Distribution plots show how the numerical features (Item_Weight, Item_Visibility, Item_MRP, Item_Outlet_Sales) are spread, and count plots show how often each category occurs in the categorical features (Outlet_Establishment_Year, Item_Fat_Content, Item_Type, Outlet_Size). These plots help to identify skew, outliers, and imbalanced categories before modeling.

Code snippet:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

# Histograms with a KDE overlay, normalized so the y-axis is a density

# Item_Weight distribution
plt.figure(figsize=(6,6))
sns.histplot(big_mart_data['Item_Weight'], kde=True, stat='density')
plt.title('Item Weight Distribution')
plt.xlabel('Item Weight')
plt.ylabel('Density')
plt.show()

# Item Visibility distribution
plt.figure(figsize=(6,6))
sns.histplot(big_mart_data['Item_Visibility'], kde=True, stat='density')
plt.title('Item Visibility Distribution')
plt.xlabel('Item Visibility')
plt.ylabel('Density')
plt.show()

# Item MRP distribution
plt.figure(figsize=(6,6))
sns.histplot(big_mart_data['Item_MRP'], kde=True, stat='density')
plt.show()

# Item_Outlet_Sales distribution
plt.figure(figsize=(6,6))
sns.histplot(big_mart_data['Item_Outlet_Sales'], kde=True, stat='density')
plt.show()

# Outlet_Establishment_Year column
plt.figure(figsize=(6,6))
sns.countplot(x='Outlet_Establishment_Year', data=big_mart_data)
plt.show()

# Item_Fat_Content column
plt.figure(figsize=(6,6))
sns.countplot(x='Item_Fat_Content', data=big_mart_data)
plt.show()

# Item_Type column
plt.figure(figsize=(30,6))
sns.countplot(x='Item_Type', data=big_mart_data)
plt.show()

# Outlet_Size column
plt.figure(figsize=(6,6))
sns.countplot(x='Outlet_Size', data=big_mart_data)
plt.show()

Screenshots:

  • Item Weight distribution
  • Item Visibility distribution
  • Item MRP distribution
  • Item_Outlet_Sales distribution
  • Outlet_Establishment_Year column
  • Item_Fat_Content column
  • Outlet_Size column
  • Item_Type column
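
The plots can be complemented with the numbers behind them. The following is a small sketch on a toy DataFrame with made-up values, not the project's own code: describe() gives the summary statistics that a distribution plot visualizes, and value_counts() gives the per-category counts that a count plot draws as bars.

```python
import pandas as pd

# Toy stand-in for the real data; the values are made up for illustration.
toy = pd.DataFrame({
    "Item_MRP": [249.8, 48.3, 141.6, 182.1, 53.9, 51.4],
    "Outlet_Size": ["Medium", "Medium", "Small", "High", "Small", "Medium"],
})

# Numerical column: count, mean, spread, and quartiles.
mrp_summary = toy["Item_MRP"].describe()

# Categorical column: how often each category occurs.
size_counts = toy["Outlet_Size"].value_counts()

print(mrp_summary)
print(size_counts)
```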

Data Pre-Processing

Pre-processing handles missing values and prepares categorical features for modeling. Missing values are filled by imputation: the snippet below fills Item_Weight with the column mean, and mode imputation serves the same purpose for categorical columns. Categorical features are then label-encoded, which maps each category to an integer so that the model can process them.

Code snippet:

from sklearn.preprocessing import LabelEncoder

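# Handling missing values: fill missing Item_Weight entries with the column mean
big_mart_data['Item_Weight'] = big_mart_data['Item_Weight'].fillna(big_mart_data['Item_Weight'].mean())

# Label Encoding: map each Item_Fat_Content category to an integer code
encoder = LabelEncoder()
big_mart_data['Item_Fat_Content'] = encoder.fit_transform(big_mart_data['Item_Fat_Content'])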

Screenshot:

Preprocessed Data
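
The snippet above covers one numerical and one categorical column. The sketch below shows how the same idea might extend to mode imputation and to several categorical columns at once. It runs on a toy DataFrame and is not the project's own code: the category labels ("Low Fat", "Regular", "Medium") and the list categorical_columns are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the real data; values and category labels are made up.
toy = pd.DataFrame({
    "Item_Weight": [9.0, np.nan, 15.0],
    "Outlet_Size": ["Medium", None, "Medium"],
    "Item_Fat_Content": ["Low Fat", "Regular", "Low Fat"],
})

# Mean imputation for the numerical column.
toy["Item_Weight"] = toy["Item_Weight"].fillna(toy["Item_Weight"].mean())

# Mode imputation for the categorical column.
# mode() returns a Series, so take its first entry.
toy["Outlet_Size"] = toy["Outlet_Size"].fillna(toy["Outlet_Size"].mode()[0])

# Hypothetical list of columns to encode; adjust it to the real dataset.
categorical_columns = ["Outlet_Size", "Item_Fat_Content"]

# Keep one fitted encoder per column so the codes can be mapped back later.
encoders = {}
for column in categorical_columns:
    encoders[column] = LabelEncoder()
    toy[column] = encoders[column].fit_transform(toy[column])

print(toy)
```

Keeping a separate encoder per column is a design choice: each encoder's classes_ attribute records which integer belongs to which category, and inverse_transform can recover the original labels.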

Machine Learning Model Training

The project trains an XGBoost regressor. The features and the target variable (Item_Outlet_Sales) are separated and then split into training and test sets with train_test_split, holding out 20% of the rows for testing. The regressor is fit on the training set only.

Code snippet:

from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Splitting features and target
X = big_mart_data.drop(columns='Item_Outlet_Sales')
Y = big_mart_data['Item_Outlet_Sales']

# Splitting the data into training and testing data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)

# XGBoost Regressor (all feature columns must already be numeric at this point)
regressor = XGBRegressor()
regressor.fit(X_train, Y_train)
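
To see what the split parameters do, here is a small sketch on made-up arrays (the names X_toy and y_toy are hypothetical). With test_size=0.2, one fifth of the rows go to the test set, and a fixed random_state makes the partition reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up rows with two features each; the target is a placeholder.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# 80% of the rows go to training, 20% to testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=2
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state gives the same partition.
_, _, _, y_te_again = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=2
)
print(np.array_equal(y_te, y_te_again))
```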

Evaluation

The trained model is evaluated with the R-squared score, computed on both the training data and the test data. The test score indicates how well the model generalizes to unseen data. A training score far above the test score suggests overfitting.

Code snippet:

from sklearn import metrics

# Prediction on training data
training_data_prediction = regressor.predict(X_train)
r2_train = metrics.r2_score(Y_train, training_data_prediction)
print('R-squared value on training data:', r2_train)

# Prediction on test data
test_data_prediction = regressor.predict(X_test)
r2_test = metrics.r2_score(Y_test, test_data_prediction)
print('R-squared value on test data:', r2_test)

Screenshot:

Evaluation Metrics
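
For reference, R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand with NumPy on made-up numbers. The helper names r_squared and rmse are hypothetical, and RMSE is included only as one example of an additional metric; it is not part of the project's code.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (undefined when y_true is constant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up sales figures and predictions, for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

print("R-squared:", r_squared(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

An R-squared of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean.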
