ex-1-nn's Introduction

NAME : Shriram S

REGISTER NO. 212222240098

EX. NO.1

DATE : 22/02/2024

Introduction to Kaggle and Data preprocessing

AIM:

To perform data preprocessing on a data set downloaded from Kaggle.

EQUIPMENT REQUIRED:

Hardware – PCs

Anaconda – Python 3.7 Installation / Google Colab / Jupyter Notebook

RELATED THEORETICAL CONCEPT:

Kaggle:

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Data preprocessing:

Preprocessing refers to the transformations applied to data before it is fed to an algorithm. Data gathered from different sources arrives in a raw format that is not suitable for analysis, and data preprocessing converts this raw data into a clean data set that can be used to train a machine learning model. The data set initially provided for training might not be in a ready-to-use state: for example, it might not be formatted properly, or it may contain missing or null values. Resolving these problems with various methods is called data preprocessing. Training on a properly processed data set is not only easier but can also improve the efficiency and accuracy of the model.

Need for Data Preprocessing:

To get good results from a machine learning model, the data has to be in a suitable format. Some models require their input in a specific form. For example, many Random Forest implementations do not accept null values, so null values in the raw data set have to be handled before the algorithm can run. In addition, the data set should be formatted so that more than one machine learning or deep learning algorithm can be run on it and the best of them chosen.

ALGORITHM:

STEP 1:

Importing the libraries

STEP 2:

Importing the dataset

STEP 3:

Taking care of missing data

STEP 4:

Encoding categorical data

STEP 5:

Normalizing the data

STEP 6:

Splitting the data into test and train
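
The steps above can also be chained with scikit-learn's `Pipeline` and `ColumnTransformer`. The following is only a sketch under stated assumptions: the `toy` frame and its values are invented stand-ins for a Kaggle CSV, and the column grouping is a hypothetical choice. Unlike the step order listed above, the sketch splits first and fits the imputer, encoder and scaler on the training rows only, which is one common way to keep test data out of the fitted statistics.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Step 2 (assumed data): a small invented frame instead of a real CSV.
toy = pd.DataFrame({
    "Age": [42, 41, np.nan, 39, 43, 44, 50, 29],
    "Balance": [0.0, 83807.86, 159660.80, 0.0, 125510.82, np.nan, 0.0, 115046.74],
    "Geography": ["France", "Spain", "France", "France",
                  "Spain", "Spain", "Germany", "Germany"],
    "Exited": [1, 0, 1, 0, 0, 1, 0, 1],
})

X = toy.drop(columns="Exited")
y = toy["Exited"]

# Steps 3 and 5: fill numeric gaps with the median, then scale to [0, 1].
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", MinMaxScaler()),
])

# Step 4: one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["Age", "Balance"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Geography"]),
])

# Step 6: split, then fit the preprocessing on the training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)

print(X_train_ready.shape, X_test_ready.shape)
```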

PROGRAM:

Import Libraries

# Needed only when uploading the CSV by hand in Google Colab:
# from google.colab import files
import pandas as pd
import seaborn as sns
import io
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from scipy import stats
import numpy as np

Read the dataset

df=pd.read_csv("Churn_Modelling.csv")

Checking Data

df.head()
df.tail()
df.columns

Check the missing data

df.isnull().sum()
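
`df.isnull().sum()` only counts the missing values; Step 3 of the algorithm also calls for handling them. A minimal sketch of one possible approach, assuming a hypothetical `toy` frame whose values are invented (not taken from Churn_Modelling.csv): numeric gaps are filled with the column median and categorical gaps with the most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column names mirror the churn data set,
# but the values are invented for illustration.
toy = pd.DataFrame({
    "CreditScore": [619, 608, np.nan, 699],
    "Geography": ["France", None, "France", "Spain"],
})

print(toy.isnull().sum())  # missing values per column

filled = toy.copy()
# Numeric column: replace NaN with the median of the observed values.
filled["CreditScore"] = filled["CreditScore"].fillna(filled["CreditScore"].median())
# Categorical column: replace missing entries with the most frequent value.
filled["Geography"] = filled["Geography"].fillna(filled["Geography"].mode()[0])

print(filled)
```

Whether the median, the mean, or dropping the rows is the better choice depends on the data set, so this is a starting point rather than a rule.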

Check for Duplicates

df.duplicated()
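
`df.duplicated()` returns one boolean per row, which is hard to read on a large data set. As a sketch on an invented `toy` frame (an assumption, not the churn data), the flags can be summed to get a count, and `drop_duplicates()` can remove the repeated rows:

```python
import pandas as pd

# Hypothetical frame in which the third row repeats the second.
toy = pd.DataFrame({
    "CustomerId": [1, 2, 2, 3],
    "Balance": [0.0, 100.0, 100.0, 250.0],
})

dup_mask = toy.duplicated()   # True for each row that repeats an earlier row
print(dup_mask.sum())         # number of duplicate rows

# Keep the first occurrence of each row and renumber the index.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(deduped)
```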

Assigning Y

y = df.iloc[:, -1].values
print(y)

Check for outliers

df.describe()
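
`df.describe()` shows summary statistics, from which outliers have to be judged by eye. One common convention, shown here only as a sketch and not as the method used in this exercise, flags values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. The `balance` series is invented for illustration.

```python
import pandas as pd

# Hypothetical values with one obviously extreme entry.
balance = pd.Series([10, 12, 11, 13, 12, 11, 95], name="Balance")

q1 = balance.quantile(0.25)
q3 = balance.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = balance[(balance < lower) | (balance > upper)]
print(outliers)
```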

Drop the string-valued columns from the dataset

data = df.drop(['Surname', 'Geography','Gender'], axis=1)
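
Dropping the columns discards their information, whereas Step 4 of the algorithm speaks of encoding categorical data. As an alternative sketch, assuming an invented `toy` frame with the same column names, low-cardinality columns such as Geography and Gender could be one-hot encoded with `pd.get_dummies`:

```python
import pandas as pd

# Hypothetical rows; only the column names are borrowed from the churn data.
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [42, 41, 39, 43],
})

# Each category becomes its own 0/1 column; Age is left untouched.
encoded = pd.get_dummies(toy, columns=["Geography", "Gender"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```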

Check the dataset after dropping the string-valued columns

data.head()

Normalize the dataset

scaler = MinMaxScaler()
# Keep the column names so the scaled frame stays readable
df1 = pd.DataFrame(scaler.fit_transform(data), columns=data.columns)
print(df1)
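
With its default settings, `MinMaxScaler` rescales each column to the range [0, 1] using x' = (x - min) / (max - min). The sketch below, on an invented `toy` frame, computes the same formula by hand and checks that both results agree:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values chosen so the scaled results are easy to verify.
toy = pd.DataFrame({
    "Age": [20, 30, 40, 60],
    "Balance": [0.0, 50.0, 100.0, 200.0],
})

# Column-wise min-max formula written out with pandas.
manual = (toy - toy.min()) / (toy.max() - toy.min())

# The same transformation through scikit-learn.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(toy), columns=toy.columns)

print(scaled)
print(np.allclose(manual.values, scaled.values))
```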

Split the dataset

# Use the normalized frame df1 so the preprocessing carries over into X and y
X = df1.iloc[:, :-1].values
y = df1.iloc[:, -1].values
print(X)
print(y)

Split into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("X_train\n")
print(X_train)
print("\nLength of X_train ", len(X_train))
print("\nX_test\n")
print(X_test)
print("\nLength of X_test ", len(X_test))
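
Without further arguments, `train_test_split` shuffles differently on every run. As a sketch on invented arrays (an assumption, not the churn data), `random_state` makes the split repeatable and `stratify` keeps the class proportions of `y` the same in both parts, which can matter when one class is rare:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, of which 5 (25 %) belong to class 1.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 4 of the 20 samples go to the test set
    random_state=42,   # fixed seed so the split is reproducible
    stratify=y,        # preserve the 75/25 class ratio in both parts
)

print(len(X_train), len(X_test))
print(y_train.mean(), y_test.mean())  # share of class 1 in each part
```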

OUTPUT:

Data checking

image

Missing Data

image

Duplicates identification

image

Values of 'Y'

image

Outliers

image

Check the dataset after dropping the string-valued columns

image

Normalize the dataset

image

Split the dataset

image

Split into training and test sets

image

RESULT:

Thus, data preprocessing was implemented in Python on a data set downloaded from Kaggle.
