lzcheng / jhudss03-getting-and-cleaning-data

Repo for Coursera course Getting and Cleaning Data by Johns Hopkins University

R 100.00%

JHUDSS03-Getting-and-Cleaning-Data

This repo contains the following files:

run_analysis.R script
ReadMe markdown document
Codebook markdown document
data folder that includes all the raw data

Instructions for running the R script:

First, download the data from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip and unzip it, then rename the extracted folder to "data". Alternatively, download the data folder directly from this repo.
Make sure the data folder and the run_analysis.R file are both in the current working directory.
If the tidyverse package is not installed, first run install.packages("tidyverse") in RStudio.
Run source("run_analysis.R") in the RStudio console.
The script writes one output file, "tidyset.txt", to your working directory.
Read the file back with checkdata <- read.table("./tidyset.txt", header = TRUE). The dataset has 180 rows (30 subjects × 6 activities) and 81 columns: subject, activity, and the means of 79 features.
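To confirm that the output has the expected shape, a check along these lines can be run after reading the file back. This is a minimal sketch, not part of run_analysis.R: the helper name check_tidyset() is hypothetical, and it assumes the grouping columns are named subject and activity.

```r
# Hypothetical helper (not part of run_analysis.R): sanity-checks a data
# frame read back from tidyset.txt
check_tidyset <- function(df, n_subjects = 30, n_activities = 6) {
  # One row per subject-activity pair is expected
  stopifnot(nrow(df) == n_subjects * n_activities)
  # Assumed names of the two grouping columns
  stopifnot(all(c("subject", "activity") %in% names(df)))
  # No subject-activity pair may appear twice
  stopifnot(anyDuplicated(df[, c("subject", "activity")]) == 0)
  invisible(TRUE)
}

# Usage, after source("run_analysis.R"):
# checkdata <- read.table("./tidyset.txt", header = TRUE)
# check_tidyset(checkdata)
```

The check stops with an error on the first failed condition, so a silent return means the file has one row for each of the 180 subject-activity pairs.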

Outline of the tidying process:

Combine the measurement set (X), the activity label set (Y) and the subject set, separately for the training set and the test set; then merge the training and test sets into a single dataset named all.
Rename the variables in all after the measured features. Then use grep() to extract only the features computed with mean(), std() or meanFreq(), creating a new dataset all_sub. I chose to exclude the angle() features because they measure the angle between two vectors rather than a mean. The following variables are selected:

mean(): Mean value
std(): Standard deviation
meanFreq(): Weighted average of the frequency components to obtain a mean frequency
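The merge and selection steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the code in run_analysis.R: the tiny in-memory data frames stand in for the X, Y and subject files (which the script would read with read.table()), combine_split() is a hypothetical helper, and the grep() pattern is an assumed one that keeps the mean(), std() and meanFreq() features while leaving out the angle() features.

```r
# Toy stand-ins (hypothetical values): 5 features instead of 561,
# two training rows and one test row
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
              "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)")
x_train <- as.data.frame(matrix(1:10 / 10, nrow = 2))
x_test  <- as.data.frame(matrix(11:15 / 10, nrow = 1))

# Hypothetical helper: put subject, activity code and measurements side
# by side (all three inputs must share the same row order)
combine_split <- function(x, y, subject) {
  cbind(subject = subject, activity = y, x)
}

train <- combine_split(x_train, y = c(1, 2), subject = c(1, 1))
test  <- combine_split(x_test,  y = 3,       subject = 2)

# Stack training rows on top of test rows, then name the columns
all <- rbind(train, test)
names(all) <- c("subject", "activity", features)

# Assumed pattern: keep mean(), std() and meanFreq() features; requiring
# "()" right after the name leaves out angle(tBodyAccMean,gravity)
keep <- grep("(mean|std|meanFreq)\\(\\)", names(all))
all_sub <- all[, c(1, 2, keep)]
```

Because grep() returns column positions, the subject and activity columns (positions 1 and 2) are added back explicitly when subsetting.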

In the activity column, replace the numeric codes 1-6 with descriptive labels such as "walking" and "sitting".
In the variable names, remove "()" and "-", and replace "mean" with "Mean" and "std" with "Std"; for example, tBodyAcc-mean()-X becomes tBodyAccMeanX.
Group all_sub by subject and activity with the group_by() function from the dplyr package, then use summarize_all() to compute the average of each variable for each subject and activity. The resulting dataset is named tidyset.
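The last three steps can be sketched on a toy all_sub. Assumptions: the lower-case labels follow the code order of activity_labels.txt in the UCI HAR Dataset, clean_names() is a hypothetical helper implementing the renaming rules above, and base R's aggregate() is swapped in for dplyr's group_by() and summarize_all() so the sketch runs without extra packages.

```r
# Toy all_sub (hypothetical values); activity still holds numeric codes
all_sub <- data.frame(
  subject  = c(1, 1, 1, 2),
  activity = c(1, 1, 4, 1),
  "tBodyAcc-mean()-X" = c(0.2, 0.4, 0.1, 0.3),
  "tBodyAcc-std()-X"  = c(0.5, 0.7, 0.6, 0.9),
  check.names = FALSE
)

# Step 1: replace codes 1-6 with labels (assumed activity_labels.txt order)
activity_labels <- c("walking", "walking_upstairs", "walking_downstairs",
                     "sitting", "standing", "laying")
all_sub$activity <- factor(activity_labels[all_sub$activity],
                           levels = activity_labels)

# Step 2: hypothetical helper applying the renaming rules
clean_names <- function(x) {
  x <- gsub("()", "", x, fixed = TRUE)  # drop "()"
  x <- gsub("-", "", x, fixed = TRUE)   # drop "-"
  x <- gsub("mean", "Mean", x, fixed = TRUE)
  gsub("std", "Std", x, fixed = TRUE)
}
names(all_sub) <- clean_names(names(all_sub))
# names are now "subject" "activity" "tBodyAccMeanX" "tBodyAccStdX"

# Step 3: average every variable within each subject-activity pair
tidyset <- aggregate(. ~ subject + activity, data = all_sub, FUN = mean)
```

With dplyr loaded, the final line corresponds to all_sub %>% group_by(subject, activity) %>% summarize_all(mean). aggregate() drops subject-activity combinations that never occur, so the toy result has three rows rather than one per factor level.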
