datasciencecaptoneproject's Introduction

Introduction

This work corresponds to the Peer-graded Assignment: Milestone Report. The aim is to show the data exploration and text mining done so far in preparation for the final project of the Data Science Capstone. The work has three parts. First, the English text data from blogs, news and Twitter is read and a 10% sample is selected. Second, we create the corpus, clean the data, and analyze the most frequent words and the most frequent combinations of two and three words (n-grams). Finally, we make some plots to show the results of the exploratory analysis. At the end, the approach for the next step of the project is stated.
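The three steps above (sample, clean, count n-grams) can be sketched in a few lines. The report itself is written in R Markdown; the Python below is only an illustrative sketch, not the code used in the report. The function names, the tokenization rule, and the default seed are all assumptions made for the example.

```python
import random
import re
from collections import Counter

# Hypothetical tokenization rule: lowercase words, optionally with one
# internal apostrophe (e.g. "don't"). The report may clean text differently.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def sample_lines(lines, fraction=0.10, seed=42):
    """Keep each line independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]


def tokenize(line, bad_words=frozenset()):
    """Lowercase the line, extract word tokens, and drop excluded words."""
    tokens = TOKEN_RE.findall(line.lower())
    return [t for t in tokens if t not in bad_words]


def ngram_counts(lines, n, bad_words=frozenset()):
    """Count n-grams per line; n-grams never span two lines.

    Excluded words are removed before n-grams are formed, so the words
    on either side of a removed word become adjacent.
    """
    counts = Counter()
    for line in lines:
        tokens = tokenize(line, bad_words)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```

Calling `ngram_counts(sample_lines(lines), 2).most_common(20)` would give the twenty most frequent two-word combinations in the sample, which is the kind of table the exploratory plots are built from.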

How to see the work

You can read the final HTML report at https://mcastrol.github.io/dataScienceCaptoneProject/DCCaptoneProject_FirstPart.html

Content of the repository

DCCaptoneProject_FirstPart.Rmd: R Markdown source of the milestone report.
DCCaptoneProject_FirstPart.html: rendered HTML, viewable directly from the repository.
full-list-of-bad-words-text-file_2018_03_26.txt: list of bad words excluded from the analysis.

The input data can be downloaded from https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
