datasharing's Introduction

Workflow

The 'run_analysis.R' script performs the following sequence of actions:

  1. Downloads and extracts the raw data.
  2. Reads the dictionary entities, Activity and Feature.
  3. Reads and cleans the test dataset as follows:
    1. Reads the file 'subject_test.txt' with subjects and 'y_test.txt' with activities.
    2. Merges the observation set (the 'X_test.txt' file) with the Subject and Activity data loaded in the previous step. The activity name, not its ID, is used as the activity value.
    3. Sets the column names from the feature list, since each column is a feature.
    4. Extracts only the columns 'activity' and 'subject' and the columns whose names contain 'mean()' or 'std()'. The 'sqldf' package is used for this.
  4. Reads and cleans the train dataset in the same way as the test dataset (see item 3). Dev note: the reading and cleaning code was extracted into a separate function, which is reused in items 3 and 4.
  5. Unions the test and train datasets and sorts the resulting dataset by Subject and Activity.
  6. Transforms the column names into a readable form:
    1. During data cleaning, the raw symbols '-', '(' and ')' were replaced with underscores; for example, 'mean()' became 'mean__'. So the script first collapses each run of underscores into one: '__' becomes '_'.
    2. The script then removes underscores from the end of each label.
    3. Observations contain values for the time and frequency domains, and their labels start with 't' and 'f' respectively. The script expands these to 'Time' and 'Frequency'. For example, 'tBodyAcc' becomes 'TimeBodyAcc'.
    4. 'Mag' in the labels is expanded to 'Magnitude'.
    5. The 'BodyBody' typo in the labels is replaced with 'Body'.
  7. Creates a second dataset from the first one. The new dataset contains only the average of each variable in the first dataset, grouped by subject and activity.
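
The renaming rules in step 6 can be sketched in base R as shown below. This is a minimal illustration, not the code from 'run_analysis.R': the helper name tidy_labels is hypothetical, and the first substitution assumes the raw symbols are turned into underscores with a plain gsub call, which the original script may do differently.

```r
# Hypothetical helper (not from the original script): applies the
# renaming rules of step 6 to a character vector of column labels.
tidy_labels <- function(labels) {
  # Assumption: raw symbols '-', '(' and ')' become underscores.
  labels <- gsub("[-()]", "_", labels)
  # Collapse each run of underscores into a single one.
  labels <- gsub("_+", "_", labels)
  # Remove an underscore left at the end of a label.
  labels <- sub("_$", "", labels)
  # Expand the domain prefixes 't' and 'f'.
  labels <- sub("^t", "Time", labels)
  labels <- sub("^f", "Frequency", labels)
  # Expand 'Mag' and fix the 'BodyBody' typo.
  labels <- gsub("Mag", "Magnitude", labels, fixed = TRUE)
  labels <- gsub("BodyBody", "Body", labels, fixed = TRUE)
  labels
}

raw <- c("subject", "activity", "tBodyAcc-mean()-X", "fBodyBodyGyroMag-std()")
print(tidy_labels(raw))
```

Because 'subject' and 'activity' start with neither 't' nor 'f', the prefix rules leave them unchanged in this sketch. A label that legitimately starts with a lowercase 't' or 'f' would need to be excluded explicitly.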

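The averaging in step 7 can be illustrated with base R's aggregate(). This is a sketch under assumptions: the data frame is a made-up toy stand-in for the merged dataset, its column names are only examples of renamed labels, and the original script may group the data by other means.

```r
# Toy stand-in for the merged dataset; all values are invented for illustration.
tidy <- data.frame(
  subject = c(1, 1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  TimeBodyAcc_mean_X = c(0.2, 0.4, 0.1, 0.3, 0.5),
  TimeBodyAcc_std_X = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# Average every measurement column within each (subject, activity) group.
averages <- aggregate(. ~ subject + activity, data = tidy, FUN = mean)

# Sort by subject, then activity, mirroring the ordering used in step 5.
averages <- averages[order(averages$subject, averages$activity), ]
rownames(averages) <- NULL
print(averages)
```

The result has one row per subject and activity pair, so the five toy observations collapse into three rows.
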
datasharing's People

Contributors

jtleek, kfeoktistoff, nickreich, rpglover64, jimktrains, mmparker, nikai3d, snoldak924, lcolladotor
