Giter Club home page Giter Club logo

lazy-cleaner's Introduction

Lazy-cleaner:

What it is about :

Lazy-cleaner is a python package that cleans your dataframe and that includes correcting data types and class labeling techniques and filling the Null values with more than one method saving the time of coding exeplicit methods.


Advanteges

  1. saving time.
  2. preprocessing data like a pro.
  3. user friendly for begineer Data learners.

Installation Guide

To install Lazy-cleaner Package:

  • Make sure python3 and pip is installed in your machine.
  • Use the command line and type: pip install Lazy-cleaner or pip3 install Lazy-cleaner

How to Use?

The most simple format of using Lazy-cleaner is as follow:

from Lazy_cleaner.Lazy_cleaner import stat
from Lazy_cleaner.Lazy_cleaner import fillnan
from Lazy_cleaner.Lazy_cleaner import label
from Lazy_cleaner.Lazy_cleaner import Clean_Data

import pandas as pd

Data = pd.read_csv("FileName.csv")
fill = fillnan()

New_Data = fill.simpleimputer(Data)

Existing functions:

Lazy_cleaner.fill_lin_rand( messy_df, metric , colnames ):

fill the Nan values by Linear Regression or by Random Forest.

also consider filling by median manually before using this method

  • messy_df -> data frame you want to work on.
  • metric -> model you want to work with (Linear Reg , Random forest).
  • colnames-> list of column names that contains Null vlaues.

Returns : the cleaned dataframe after filling the Null values and a list of the missing values


Lazy_cleaner.stat().p_val_of_features( df , label_column ):

calculate Ordinary Least Squares to know which features are important and make overview of the regression model.

  • df -> the dataframe.
  • label_column -> the Target column Name in string format.

Returns : summary report about the features and their p value


Lazy_cleaner.stat().na_per_data( df ):

calculate the percentage of Nan values in the whole dataset

  • df -> dataframe

returns: the percentage of Nan values in the whole dataset


Lazy_cleaner.fillnan().simpleimputer( df ):

this function fill the Null values with simple techniques like (mean,mode) depending on the data type of the column

also consider filling by median manually before using this method

  • df -> dataframe

returns: the dataframe after editing


Lazy_cleaner.fillnan().hyperimputer( df , metric = "Linear Regression" ):

this function compines between the simple imputation , the linear regression imputation and random forests imputation as in object columns it will impute with the mode and for any numerical number it will impute with linear regression or random forest

also consider filling by median manually before using this method

  • df->dataframe
  • metric ->"Linear Regression" or "Random Forests" models to fill you numeric nan values with

returns: the dataframe after editing


Lazy_cleaner.fillnan().fill_by_nex(df,columns=[]):

fill nan values by the next value

  • df -> dataframe
  • columns -> a list of columns you want to fill

returns: the dataframe after editing


Lazy_cleaner.fillnan().fill_by_perv(df,columns=[]):

this one fill nan values by the previous value

  • df -> dataframe
  • columns -> a list of columns you want to fill

returns: the dataframe after editing


Lazy_cleaner.label().to_category( df ):

change from and object datatype column into categorie datatype column

  • df-> dataframe

returns: the dataframe after editing


Lazy_cleaner.label().freq_labeling( df , column ):

replace objects by how frequent they are in a certain column

  • df -> dataframe
  • column -> column you want to apply this method on

returns: the dataframe after editing


Lazy_cleaner.Clean_Data().reduce_mem_usage( props ):

reduce memory usage of the dataframe.

  • props-> dataset you want to reduce

returns: the dataframe after editing and Nan values list


Lazy_cleaner.Clean_Data().replace_matches( df , column , string_to_match , min_ratio ):

if there is simillar words but different format due to a data entry you can apply this function to calculate similarity percentage and then change them

  • df -> dataframe
  • column -> column to edit
  • string_to_match -> string you want to match and replace
  • min_ratio -> minimum probability to excange

returns: the dataframe after editing


Lazy_cleaner.Clean_Data().drop_missing( df , thresh=55 ):

drop the columns if the missing values exceed 60% of it

  • df-> dataframe
  • thresh->percentage of the missing threshold to Delete above

returns: the dataframe after editing


Lazy_cleaner.Clean_Data().log_features( df=None ):

apply Log function to the features.

  • df -> dataframe you want to apply log func to

returns: dataset after applying log


Lazy_cleaner.Clean_Data().dealing_with_outliers( df , type_o = "z-score"):

removes outliers with two techniques z-score and IQR ("Inter-Quartile Range").

  • df -> dataframe
  • type_o -> type of the method you want to choose

Returns : the dataframe after removing the outliers


Lazy_cleaner.Clean_Data().normalize( df , columns ):

normalize columns by your choice in a dataframe

  • df -> dataframe
  • columns -> list of columns to normalize its values

returns: the dataframe after editing


Lazy_cleaner.Clean_Data().en_de( decode , typeo ):

encode and decode any string format

  • decode -> if you want to encode assign as False if decode assign True
  • typeo -> the format you want to encode to or decode from

returns: the result after encoding or decoding

lazy-cleaner's People

Contributors

ahmed-ashraf-khalil avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.