getting-and-cleaning-data

final course project

Data download and unzip

string variables for file download

fileName <- "UCIdata.zip"
url <- "http://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
dir <- "UCI HAR Dataset"

File download verification. If file does not exist, download to working directory.

if(!file.exists(fileName)){ download.file(url,fileName, mode = "wb") }

File unzip verification. If the directory does not exist, unzip the downloaded file.

if(!file.exists(dir)){ unzip(fileName, files = NULL, exdir=".") }

Read Data

subject_test <- read.table("UCI HAR Dataset/test/subject_test.txt")
subject_train <- read.table("UCI HAR Dataset/train/subject_train.txt")
X_test <- read.table("UCI HAR Dataset/test/X_test.txt")
X_train <- read.table("UCI HAR Dataset/train/X_train.txt")
y_test <- read.table("UCI HAR Dataset/test/y_test.txt")
y_train <- read.table("UCI HAR Dataset/train/y_train.txt")

activity_labels <- read.table("UCI HAR Dataset/activity_labels.txt")
features <- read.table("UCI HAR Dataset/features.txt")

Analysis

1. Merges the training and the test sets to create one data set.

dataSet <- rbind(X_train,X_test)

2. Extracts only the measurements on the mean and standard deviation for each measurement.

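Create an index vector of the mean() and std() features, then use it to subset the columns. The parentheses are escaped so that only names containing the literal "mean()" or "std()" match; unescaped, the pattern would also pick up features such as meanFreq().

MeanStdOnly <- grep("mean\\(\\)|std\\(\\)", features[, 2])
dataSet <- dataSet[, MeanStdOnly]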
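As a small illustration of how the grep pattern decides which columns are kept, the sketch below compares an unescaped and an escaped pattern. The feature names are made up for the example (modelled on the style of features.txt), and the variable names loose and strict are hypothetical.

```r
# Hypothetical feature names in the style of features.txt (not read from the file)
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# Unescaped: "()" is an empty group, so this behaves like "mean|std"
loose <- grep("mean()|std()", feature_names)

# Escaped: matches only the literal text "mean()" or "std()"
strict <- grep("mean\\(\\)|std\\(\\)", feature_names)

print(feature_names[loose])   # includes the meanFreq() feature
print(feature_names[strict])  # mean() and std() features only
```

Which of the two is wanted depends on whether meanFreq() should count as a "mean" measurement; either reading can be defended as long as it is stated.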

4. Appropriately labels the data set with descriptive variable names.

Create a vector of "clean" feature names by removing the "()" characters, then apply it to dataSet as the column names.

CleanFeatureNames <- gsub("[()]", "", features[, 2])
names(dataSet) <- CleanFeatureNames[MeanStdOnly]
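A minimal sketch of the name clean-up on two made-up feature names (raw_names and clean_names are hypothetical). gsub is vectorised, so it can be applied to the whole vector of names at once.

```r
# Hypothetical raw names in the style of features.txt
raw_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-Y")

# "[()]" is a character class matching either "(" or ")"
clean_names <- gsub("[()]", "", raw_names)

print(clean_names)
```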

Combine the train and test subject data and activity data, and give each a descriptive label.

subject <- rbind(subject_train, subject_test)
names(subject) <- 'subject'
activity <- rbind(y_train, y_test)
names(activity) <- 'activity'

Combine the subject, activity, and mean/std measurement columns to create the final data set.

dataSet <- cbind(subject, activity, dataSet)

3. Uses descriptive activity names to name the activities in the data set

Convert the activity column of dataSet to a factor, using the codes in the first column of activity_labels as levels and the names in the second column as labels, and write it back to dataSet.

act_group <- factor(dataSet$activity, levels = activity_labels[, 1], labels = activity_labels[, 2])
dataSet$activity <- act_group
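The sketch below shows one way to map numeric activity codes to names by matching on the code itself, so the result does not depend on every code appearing in the data. The lookup table and codes are invented for the example; labels_tbl, codes, and by_id are hypothetical names.

```r
# Hypothetical lookup table in the shape of activity_labels.txt
labels_tbl <- data.frame(id = 1:3,
                         name = c("WALKING", "SITTING", "LAYING"))

# Hypothetical activity codes; note that code 2 never occurs
codes <- c(3, 1, 3)

# Match each code against the id column and attach the corresponding name
by_id <- factor(codes, levels = labels_tbl$id, labels = labels_tbl$name)

print(as.character(by_id))
print(levels(by_id))
```

All three names remain as levels even though only two occur, which keeps later grouping consistent across subsets of the data.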

5. Creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Check whether the reshape2 package is installed, install it if it is missing, then load it.

if (!requireNamespace("reshape2", quietly = TRUE)) { install.packages("reshape2") }
library("reshape2")

Melt the data into a tall, skinny format and cast it back with the mean of each variable per subject and activity. Finally, write the tidy data to the working directory as "tidy_data.txt".

baseData <- melt(dataSet, id.vars = c("subject", "activity"))
secondDataSet <- dcast(baseData, subject + activity ~ variable, mean)
names(secondDataSet)[-c(1:2)] <- paste("[mean of]", names(secondDataSet)[-c(1:2)])
write.table(secondDataSet, "tidy_data.txt", sep = ",")
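For comparison, the same per-subject, per-activity averaging can be sketched in base R with aggregate, without reshape2. This is an alternative technique, not the approach used in this project, and the toy data frame and its column names m1 and m2 are invented for the example.

```r
# Hypothetical toy data: two subjects, one activity, two measurements
toy <- data.frame(
  subject  = c(1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "WALKING", "WALKING"),
  m1 = c(1, 3, 10, 20),
  m2 = c(2, 4, 30, 50)
)

# Average every remaining column within each subject/activity group
averages <- aggregate(. ~ subject + activity, data = toy, FUN = mean)

print(averages)
```

Note that the formula interface of aggregate drops rows containing missing values by default, so results may differ from melt and dcast on data with NAs.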
