abhipatel35 / ml-regression-lifecycle


Explore the complete lifecycle of a machine learning project focused on regression. This repository covers data acquisition, preprocessing, and training with Linear Regression, Decision Tree Regression, and Random Forest Regression models. The models are evaluated and compared using the R2 score. Ideal for learning and implementing regression use cases.



Machine Learning Project: Complete Lifecycle for Regression Use Case

Overview

This repository presents a comprehensive guide to the end-to-end lifecycle of a machine learning project, focusing on solving a regression problem. The project encompasses key stages such as data acquisition, preprocessing, model training, testing, and evaluation.

Key Features

  • Data Preparation:

    • Load the dataset and perform exploratory data analysis.
    • Encode categorical features for modeling.
  • Model Training and Evaluation:

    • Utilize three regression models: Linear Regression, Decision Tree Regression, and Random Forest Regression.
    • Evaluate model performance using the R2 score as the evaluation metric.
  • Model Comparison:

    • Compare the performance of the three models to identify the most suitable for the regression use case.

Usage

  1. Clone the Repository:

    git clone https://github.com/abhipatel35/ML-Regression-Lifecycle.git
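  2. Navigate to the Project Directory:

    cd ML-Regression-Lifecycle
  3. Install Dependencies:

    • The code imports pandas and scikit-learn, so both must be installed in your Python environment.
  4. Run the Jupyter Notebook or Python Script:

    • Run main.py from a Jupyter Notebook, PyCharm, or the command line to explore the complete project.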

Project Structure

  • main.py: Python script containing the complete project code with explanations. It can be run from a Jupyter Notebook or PyCharm.
  • insurance.csv: Sample dataset for the regression use case.

Data Preparation

Loading the Dataset

import pandas as pd

# Load the dataset into a DataFrame
df = pd.read_csv('insurance.csv')

Exploratory Data Analysis

# Display the first few rows of the dataset
print(df.head())

# Display the number of rows and columns
print(df.shape)

# Display data types of each column
df.info()  # info() prints its report directly and returns None

# Statistical summary of numerical features
print(df.describe())

# Check for null values
print(df.isnull().sum())

Data Encoding for Categorical Features

# Encode categorical features
df.replace({'sex': {'male': 0, 'female': 1}}, inplace=True)
df.replace({'smoker': {'yes': 0, 'no': 1}}, inplace=True)
df.replace({'region': {'southwest': 0, 'southeast': 1, 'northwest': 2, 'northeast': 3}}, inplace=True)
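
The integer codes assigned to region above imply an ordering (southwest < southeast < northwest < northeast) that a linear model will treat as meaningful. One common alternative, not used in the original script, is one-hot encoding with pandas.get_dummies. The sketch below is a minimal illustration under stated assumptions: the rows in `sample` are made up for the example and are not taken from insurance.csv.

```python
import pandas as pd

# Illustrative rows only; the column names mirror insurance.csv.
sample = pd.DataFrame({
    "age": [19, 33, 45, 52],
    "region": ["southwest", "southeast", "northwest", "northeast"],
})

# One indicator column per region. drop_first=True drops the first
# category in sorted order ("northeast") so the columns are not redundant.
encoded = pd.get_dummies(sample, columns=["region"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
```

A row whose three indicator columns are all 0 belongs to the dropped category. Tree-based models are generally less sensitive to the choice between integer codes and indicator columns than Linear Regression is.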

Separating Dependent and Independent Variables

# Separate dependent/target variable (y) and independent features (x)
x = df.drop(columns=['charges'])  # columns= already selects the axis, so axis=1 is redundant
y = df['charges']

Train-Test Split

# Split the data into training and testing sets
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)  # fixed seed so the split is reproducible
print(x_train.shape)
print(x_test.shape)

Model Training and Evaluation

Linear Regression

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score  # used to evaluate every model below

# Create and train the Linear Regression model
lr = LinearRegression()
lr.fit(x_train, y_train)

# Make predictions and evaluate
lr_pred = lr.predict(x_test)
print("Linear Regression ->", r2_score(y_test, lr_pred))
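
The R2 score printed above is the coefficient of determination: 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. As a rough illustration of what r2_score computes, here is a plain-Python sketch. The helper name `r2` and the toy values are assumptions made for this example, not part of the repository.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values, not taken from insurance.csv.
score = r2([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(round(score, 3))  # 0.949
```

A score of 1.0 means perfect predictions, 0.0 means the model does no better than predicting the mean, and negative values mean it does worse. This sketch does not handle the edge case where every value in `y_true` is identical (SS_tot would be zero).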

Decision Tree Regression

from sklearn.tree import DecisionTreeRegressor

# Create and train the Decision Tree Regression model
dtr = DecisionTreeRegressor()
dtr.fit(x_train, y_train)

# Make predictions and evaluate
dtr_pred = dtr.predict(x_test)
print("Decision Tree Regression ->", r2_score(y_test, dtr_pred))

Random Forest Regression

from sklearn.ensemble import RandomForestRegressor

# Create and train the Random Forest Regression model
rfr = RandomForestRegressor()
rfr.fit(x_train, y_train)

# Make predictions and evaluate
rfr_pred = rfr.predict(x_test)
print("Random Forest Regression ->", r2_score(y_test, rfr_pred))

Model Comparison

After training and evaluating the three models, compare their R2 scores on the test set: a score closer to 1 means the model explains more of the variance in charges. Choose the model that best suits your use case.
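
The comparison can be collected into one loop instead of three separate print statements. The following is a sketch only: it uses synthetic data from make_regression as a stand-in for insurance.csv so that it runs on its own, and the fixed random_state values are assumptions added for reproducibility.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the insurance data (an assumption for this sketch).
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(random_state=0),
}

# Fit every model on the same split and record its test-set R2 score.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Print the models from best to worst score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name} -> {score:.3f}")

best = max(scores, key=scores.get)
```

The ranking on synthetic data says nothing about the ranking on the insurance data. To reproduce the comparison for the project, swap in the x_train, x_test, y_train, and y_test variables from the train-test split.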

Once you are satisfied with the chosen model's performance, you can deploy it for real-world applications.
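
The repository does not cover deployment itself. A common first step is to save the fitted model so another process can load it and make predictions. The sketch below assumes Python's standard pickle module and a tiny toy dataset. Neither appears in the original script.

```python
import pickle

from sklearn.linear_model import LinearRegression

# Tiny toy data following y = 2 * x + 1 (illustrative only).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
model = LinearRegression().fit(X, y)

# Serialize the fitted model to bytes; the same bytes could be written to a file.
blob = pickle.dumps(model)

# Later, for example in a serving process: restore the model and predict.
restored = pickle.loads(blob)
print(restored.predict([[4.0]]))
```

Only unpickle data from sources you trust, and load the model with the same scikit-learn version that was used to save it.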
