
Course Project

Getting & Cleaning Data - version 1.0, last updated 22/06/2014

Codebook

The codebook describes the origin of the raw data and the calculations applied to it. It also documents the tidy data set produced by the 'run_analysis' script: the total number of instances, the variables/features, the variable names and their data types.
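The variable list in a codebook (names, data types, and the instance and variable counts) can be generated from the data frame itself rather than typed by hand. The sketch below is a hypothetical helper, not part of run_analysis.R, applied to a small made-up stand-in for the tidy data set.

```r
# Hypothetical helper: one row per variable with its (first) class,
# i.e. the "variable name / data type" part of a codebook.
describe_variables <- function(df) {
  data.frame(
    variable = names(df),
    type = unname(vapply(df, function(col) class(col)[1], character(1))),
    stringsAsFactors = FALSE
  )
}

# Made-up stand-in for the tidy data set (not real output).
tidy <- data.frame(
  subject = c(1L, 1L),
  activity = factor(c("WALKING", "SITTING")),
  tBodyAccMeanX = c(0.27, 0.28)
)

cat("Instances:", nrow(tidy), " Variables:", ncol(tidy), "\n")
print(describe_variables(tidy))
```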

Run_analysis.R Script

The data were collected from the accelerometers of the Samsung Galaxy S smartphone. A link to the source website is provided in the codebook. The script can be copied and pasted into RStudio and run as-is. The steps below describe how the script works, with the corresponding line numbers in the script file.

  1. The function in 'run_analysis.R' takes two parameters, each given as a quoted string (""): the name of the directory for the project data (created if it does not exist) and the file name for the tidy data file. - code lines 9-12
  2. The .zip file is downloaded from the 'https' URL and unzipped into the project data directory. - code lines 15-22
  3. The relevant training and test files are read into separate variables and then joined, giving one combined set each for the measurement data, the subject IDs and the activity numbers. - code lines 24-49
  4. Descriptive variable names are read from the file 'features_info.txt' and assigned to the merged measurement data. - code lines 51-58
  5. A subset of the measurement data is taken, keeping only the columns whose names contain 'mean' or 'std'. - code lines 60-68
  6. The subject IDs and activity numbers are merged with this subset of the measurement data. - code line 71
  7. Activity numbers are replaced with descriptive activity names. - code lines 73-81
  8. Column names are cleaned up to make them easier to read. - code lines 84-90
  9. Finally, an independent tidy data set is produced, containing the average of each measurement variable for each activity and each subject. - code line 95
  10. The output is saved as a text file containing 180 records and 68 columns. The file is written to the current working directory, not to the project data directory. - code lines 100-103
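As a rough illustration of step 3, the base-R sketch below reads a training split and a test split and stacks them with `rbind()`. The file layout (`X_<split>.txt`, `y_<split>.txt` and `subject_<split>.txt` under `train/` and `test/`) is an assumption modelled on the UCI HAR dataset, and the helpers `read_split()` and `merge_splits()` are hypothetical, not taken from run_analysis.R. A tiny fake data directory is written to `tempdir()` so the example runs without the real download.

```r
# Hypothetical helper: read one split ("train" or "test"), assuming the layout
# <dir>/<split>/X_<split>.txt, y_<split>.txt and subject_<split>.txt.
read_split <- function(dir, split) {
  path <- function(prefix) {
    file.path(dir, split, paste0(prefix, "_", split, ".txt"))
  }
  list(
    x = read.table(path("X")),
    y = read.table(path("y"), col.names = "activity"),
    subject = read.table(path("subject"), col.names = "subject")
  )
}

# Hypothetical helper: stack test rows under training rows for each file type.
# Row order is preserved, so row i of x, y and subject is the same observation.
merge_splits <- function(dir) {
  train <- read_split(dir, "train")
  test <- read_split(dir, "test")
  list(
    x = rbind(train$x, test$x),
    y = rbind(train$y, test$y),
    subject = rbind(train$subject, test$subject)
  )
}

# Write a tiny fake data directory so the sketch runs without the real download.
demo_dir <- file.path(tempdir(), "har_demo")
for (s in c("train", "test")) {
  dir.create(file.path(demo_dir, s), recursive = TRUE, showWarnings = FALSE)
}
writeLines(c("1 2 3", "4 5 6"), file.path(demo_dir, "train", "X_train.txt"))
writeLines(c("1", "2"), file.path(demo_dir, "train", "y_train.txt"))
writeLines(c("1", "1"), file.path(demo_dir, "train", "subject_train.txt"))
writeLines("7 8 9", file.path(demo_dir, "test", "X_test.txt"))
writeLines("3", file.path(demo_dir, "test", "y_test.txt"))
writeLines("2", file.path(demo_dir, "test", "subject_test.txt"))

merged <- merge_splits(demo_dir)
str(merged)
```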
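Steps 4 and 5 come down to naming the measurement columns and filtering them with a regular expression. The sketch below uses made-up feature names and a made-up 2 x 4 matrix. The exact pattern the script uses is not stated, so matching only the literal `mean()` and `std()` is an assumption here: a plain `grepl("mean", ...)` would also keep columns such as `meanFreq()`, so the choice of pattern changes the final column count.

```r
# Hypothetical feature names standing in for the real feature list.
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "tBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# Fake measurement data: 2 observations x 4 features.
measurements <- as.data.frame(matrix(1:8, nrow = 2))

# Step 4: assign the descriptive names to the measurement columns.
names(measurements) <- feature_names

# Step 5: keep only mean() and std() columns. The escaped parentheses stop
# the pattern from also matching names such as "meanFreq()".
keep <- grepl("mean\\(\\)|std\\(\\)", names(measurements))
mean_std <- measurements[, keep, drop = FALSE]
names(mean_std)
```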
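Steps 6 to 8 can be illustrated together: bind the subject IDs and activity numbers to the measurements, turn the activity numbers into a labelled factor, and tidy the column names with `gsub()`. The labels below are the six activities of the UCI HAR dataset, hard-coded here as an assumption rather than read from a file. The renaming rules in the hypothetical `clean_names()` are illustrative only, because the exact changes the script makes to column names are not described.

```r
# Assumed activity code -> label mapping (codes 1 to 6).
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")

# Made-up inputs: two observations of two mean/std features.
measurements <- data.frame(`tBodyAcc-mean()-X` = c(0.2, 0.3),
                           `tBodyAcc-std()-X` = c(0.1, 0.4),
                           check.names = FALSE)
subject <- c(1L, 2L)
activity <- c(1L, 4L)

# Step 6: put subject and activity in front of the measurement columns.
combined <- data.frame(subject = subject, activity = activity, measurements,
                       check.names = FALSE)

# Step 7: replace activity numbers with descriptive names.
combined$activity <- factor(combined$activity,
                            levels = seq_along(activity_labels),
                            labels = activity_labels)

# Step 8 (hypothetical rules): drop "()", fold "-mean"/"-std" into camel case,
# then remove any remaining dashes.
clean_names <- function(x) {
  x <- gsub("()", "", x, fixed = TRUE)
  x <- gsub("-mean", "Mean", x, fixed = TRUE)
  x <- gsub("-std", "Std", x, fixed = TRUE)
  gsub("-", "", x, fixed = TRUE)
}
names(combined) <- clean_names(names(combined))
str(combined)
```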
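For steps 9 and 10, base R's `aggregate()` with the formula `. ~ subject + activity` averages every remaining column within each subject/activity group, and `write.table()` saves the result. This is a sketch on made-up data, not the script's own code; the column name `tBodyAccMeanX` and the output path are illustrative. The UCI HAR data covers 30 subjects and 6 activities, so one row per pair is consistent with the 180 records reported in step 10. Passing a bare file name to `write.table()` resolves it against `getwd()`, which is why the output lands in the current working directory.

```r
# Made-up merged data: 2 subjects x 2 activities, 2 observations per pair.
combined <- data.frame(
  subject = rep(c(1L, 2L), each = 4),
  activity = factor(rep(c("WALKING", "SITTING"), times = 4)),
  tBodyAccMeanX = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Average every measurement column for each subject/activity pair.
tidy <- aggregate(. ~ subject + activity, data = combined, FUN = mean)
tidy <- tidy[order(tidy$subject, tidy$activity), ]

# Save without row names. A bare file name here would write to getwd();
# tempdir() is used only so the example leaves no files behind.
out_file <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy, out_file, row.names = FALSE)
print(tidy)
```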

The data set stores each variable in its own column. Each row holds exactly one observation: the averaged measurements for one subject performing one activity. Every measurement variable has its own column.
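The one-row-per-subject-per-activity property can be checked mechanically. The hypothetical helper below returns `TRUE` only when no (subject, activity) pair repeats; the sample data frame is made up.

```r
# Hypothetical check: TRUE when each (subject, activity) pair appears once.
is_one_row_per_pair <- function(df) {
  anyDuplicated(df[, c("subject", "activity")]) == 0
}

tidy <- data.frame(
  subject = c(1L, 1L, 2L),
  activity = c("WALKING", "SITTING", "WALKING"),
  tBodyAccMeanX = c(0.27, 0.25, 0.28)
)

is_one_row_per_pair(tidy)                  # no repeated pairs
is_one_row_per_pair(rbind(tidy, tidy[1, ])) # first pair now appears twice
```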

The script was created and tested successfully in RStudio version 0.98.507 (© 2009-2013 RStudio, Inc.) on Windows 7 Home Premium SP1.
