hamr

Handle All Missing (Values)

Project contributors:

  1. Duong Vu
  2. Jordan Dubchak
  3. Linsey Yao

Introduction

hamr explores the pattern of missing values in a user's dataset and imputes those missing values using one of several methods.

We built this package because we could not find one that handles both tasks in either R or Python. In R, we found the Amelia and vis_dat packages, which only visualize missing data. In Python, we found fancyimpute, which imputes missing values but offers no visualization, and missingno, which only visualizes missing data. We think a single package covering both tasks is a better fit for users who do not have much experience in data wrangling.

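To install, run the following in R:

devtools::install_github("UBC-MDS/hamr")

If you are working from a local clone of the repository instead, devtools::load_all() loads the package from the current directory:

devtools::load_all()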

How to use:

Usage: vis_missing(dfm, colour="default", missing_val_char = NA)
Input:

  • dfm: a data frame or matrix containing missing values
  • colour: a base R or ggplot2 colour map, defaults to ggplot2 default
  • missing_val_char: the character that represents missing values in the data frame, one of: c(NA, " ", "", "?"); defaults to NA

Output: A visualization of missing data across the data frame.

Example:

df <- data.frame(x = c(1, " ", 3), y = c(1, 8, 9))
vis_missing(df, missing_val_char = " ")
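
Before plotting, it can help to see which cells a given missing_val_char would flag. The following is a minimal base R sketch of that idea. It is not hamr's implementation: the helper name missing_mask is hypothetical, and treating real NA values as missing alongside the marker is an assumption.

```r
# Hypothetical helper, not part of hamr: build a logical matrix that is TRUE
# wherever a cell counts as missing.
missing_mask <- function(dfm, missing_val_char = NA) {
  dfm <- as.data.frame(dfm, stringsAsFactors = FALSE)
  flags <- lapply(dfm, function(column) {
    if (is.na(missing_val_char)) {
      is.na(column)
    } else {
      # Assumption: real NA values are treated as missing as well as the marker.
      is.na(column) | as.character(column) == missing_val_char
    }
  })
  # One column of flags per column of the input, keeping the column names.
  do.call(cbind, flags)
}

df <- data.frame(x = c(1, " ", 3), y = c(1, 8, 9))
mask <- missing_mask(df, missing_val_char = " ")
colSums(mask)  # count of missing cells per column
```

A matrix like this could be passed to a heatmap-style plot, which is roughly the kind of picture vis_missing is described as producing.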

--

Usage: impute_missing(dfm, col, method, missing_val_char)
Input:

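  • dfm: a data frame or a matrix with missing values
  • col: the name of the column to impute (string)
  • method: the method to apply, one of "CC" (complete case), "MIP" (imputation with mean value), or "DIP" (imputation with median value)
  • missing_val_char: the character that represents missing values, one of: NA, NaN, "", "?"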

Output: a data frame with no missing values in the specified column

Example:

> df <- data.frame(exp = c(1, 2, 3), res = c(0, 10, ""))
> impute_missing(df, "res", "MIP", "")
  exp res
1   1   0
2   2  10
3   3   5
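
The three methods can be sketched in base R as follows. This is an illustration under stated assumptions, not hamr's source code: the function name impute_sketch is hypothetical, and it assumes the column holds numeric values apart from the missing-value marker.

```r
# Hypothetical sketch, not hamr's source: handle missing values in one column.
impute_sketch <- function(dfm, col, method = c("CC", "MIP", "DIP"),
                          missing_val_char = NA) {
  method <- match.arg(method)
  dfm <- as.data.frame(dfm, stringsAsFactors = FALSE)
  column <- dfm[[col]]
  raw <- as.character(column)

  # A cell is missing if it is NA/NaN or equal to the marker.
  is_missing <- is.na(column)
  if (!is.na(missing_val_char)) {
    is_missing <- is_missing | raw == missing_val_char
  }

  # Assumption: every non-missing cell can be read as a number.
  values <- rep(NA_real_, length(raw))
  values[!is_missing] <- as.numeric(raw[!is_missing])

  if (method == "CC") {
    # Complete case: keep only rows where the column is observed.
    dfm[[col]] <- values
    return(dfm[!is_missing, , drop = FALSE])
  }

  # MIP fills with the mean of the observed values, DIP with their median.
  fill <- if (method == "MIP") {
    mean(values, na.rm = TRUE)
  } else {
    median(values, na.rm = TRUE)
  }
  values[is_missing] <- fill
  dfm[[col]] <- values
  dfm
}

df <- data.frame(exp = c(1, 2, 3), res = c(0, 10, ""))
imputed <- impute_sketch(df, "res", "MIP", "")
complete <- impute_sketch(df, "res", "CC", "")
```

With the data frame from the example above, "MIP" fills the blank cell with 5, the mean of 0 and 10, while "CC" returns only the two complete rows.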

--

Usage: compare_model(df, feature, methods, missing_val_char)
Input:

  • df (data frame) -- the original dataset with missing values to impute.

  • feature (str) -- the name of a feature in the original dataset that contains missing values to impute.

  • methods (str or character vector) -- the methods that users want to compare (default: c("CC", "MIP"))

    • Supported methods are:

      • CC - Complete Case
      • MIP - Imputation with mean value
      • DIP - Imputation with median value

  • missing_val_char (str) -- the marker used for missing values.

    • Supported types are:

      • NaN - Not a Number
      • "" - Blank
      • "?" - Question mark

Output: a summary table comparing summary statistics of the feature after each method. In the example below, these are mean, sd, min, median, and max.

Example:

> df <- data.frame(exp = c(1, 2, 3), res = c(0, 10, ""))
> compare_model(df, "res", c("CC","MIP"), "")
         column mean       sd min median max
2  res_after_CC    5 7.071068   0      5  10
3 res_after_MIP    5 5.000000   0      5  10

--

HAM in Python

This package is also available in Python.
