FGVR

An R package that powers up data science analysis using techniques learned in the FGV MBA course.

Don't panic! (Douglas Adams, "The Hitchhiker's Guide to the Galaxy")

This package gathers a set of R functions that help FGV MBA students perform repetitive activities during the following steps: Data Cleaning, Data Enhancement, Data Preparation... and more!

All functions and resources available in this package were inspired by the Business Analytics and Big Data classes, where the following professors shed some light into our minds:

Name (Discipline)	Assignment Repository
Gustavo Mirapalheta (Exploratory Data Analysis)	[https://github.com/ldaniel/Exploratory-Data-Analysis]
João Rafael Dias (Predictive Analytics)	[https://github.com/ldaniel/Predictive-Analytics]
Eduardo Francisco (Spatial statistics)	[https://github.com/ldaniel/Spatial-Statistics]
Rafael Scopel (Time Series Analysis)	[https://github.com/ldaniel/Time-Series-Analysis]
Rodrigo Togneri (Matrix Methods and Cluster Analysis)	[https://github.com/ldaniel/Matrix-Methods-Cluster-Analysis]

Thank you all for that! 😄

Contributors

Special thanks to these awesome contributors: @Daniel, @Rodrigo and @Ygor, who devoted a lot of time and dedication to achieve such great work! 👊

Contributor	E-mail
Daniel Campos	([email protected])
Leandro Daniel	([email protected])
Rodrigo Goncalves	([email protected])
Ygor Lima	([email protected])

Installation

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("ldaniel/fgvr")

Running

The fgvr package has a set of handy functions.

createProjectFromTemplate

This function creates an initial R project setup focused on data science.

fgvr::createProjectFromTemplate("Predictive-Analytics", "c:/temp")

The following structure will be created:

[Project root directory]
|   README.md
|   __myproject__.Rproj
|
+---data
|   +---processed
|   |       bigtable.feather
|   |       readme.txt
|   |
|   \---raw
|           game-of-thrones-deaths-data.txt
|           readme.txt
|
+---docs
|       readme.txt
|
+---images
|       readme.txt
|
+---markdown
|       01_about_the_data.Rmd
|       02_data_preparation.Rmd
|       03_exploration_report.Rmd
|       conclusion.Rmd
|       index.Rmd
|       references.Rmd
|       _pdf.Rmd
|       _site.yml
|
+---models
|       readme.txt
|       source_train_test_dataset.rds
|
\---src
    +---datapreparation
    |       execute_data_preparation.R
    |       step_01_config_environment.R
    |       step_02_data_ingestion.R
    |       step_03_data_cleaning.R
    |       step_04_label_translation.R
    |       step_05_data_enhancement.R
    |       step_06_dataset_preparation.R
    |
    +---playground
    |       playground.R
    |
    \---util
            auxiliary_functions.R
            generate_markdown_website.R
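
After generating a project, it can be handy to confirm that the folders from the tree above are actually in place. The sketch below is a hypothetical helper and is not part of fgvr. It uses only base R, and it is demonstrated on a throwaway directory so that it runs without the package installed. The README does not state whether the project is created directly in the target folder or in a subfolder named after the project, so pass whichever path turns out to be the project root.

```r
# Folders taken from the directory tree shown above.
expected_dirs <- c(
  "data/processed", "data/raw", "docs", "images", "markdown", "models",
  "src/datapreparation", "src/playground", "src/util"
)

# Hypothetical helper (NOT an fgvr function): returns the expected folders
# that do not exist under `project_root`.
missing_template_dirs <- function(project_root) {
  present <- dir.exists(file.path(project_root, expected_dirs))
  expected_dirs[!present]
}

# Demonstration on a temporary directory that mimics only part of the layout.
demo_root <- file.path(tempdir(), "fgvr-template-demo")
unlink(demo_root, recursive = TRUE)
for (d in c("data/raw", "docs", "src/util")) {
  dir.create(file.path(demo_root, d), recursive = TRUE)
}

# Lists the six folders that were deliberately left out of the demo.
missing <- missing_template_dirs(demo_root)
print(missing)
```

For a real project, an empty result (`character(0)`) means every expected folder was found.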

createTestAndTrainSamples

This function creates train and test datasets given a dataset and the name of its Y variable. In addition, it returns the sample proportion for each dataset.

# using, just as an example, the sample dataset loansdefaulters, also included in the package 
base <- fgvr::loansdefaulters

# example calling the function by passing all parameters:
#   dataset    = the dataset you want to split into test and train samples.
#   yvar       = the Y variable in your dataset.
#   seed       = the seed number used to generate the train and test samples.
#                the default value is 12345.
#   percentage = the percentage of data that goes to training sample.
#                the default value is 0.7.
mydataset <- fgvr::createTestAndTrainSamples(dataset = base, yvar = "y_loan_defaulter", 
                                             seed = 12345, percentage = 0.7)

# or omit the 'seed' and 'percentage' parameters to use their default values.
mydataset <- fgvr::createTestAndTrainSamples(dataset = base, yvar = "y_loan_defaulter")

# getting the final samples and proportion.
mydataset$data.train
mydataset$data.test
mydataset$event.proportion
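
To make the role of `seed` and `percentage` more concrete, here is a minimal base-R sketch of a seeded random split. It is only an illustration under assumptions: `split_train_test` is a hypothetical name, the logic is not taken from the fgvr source, and the real function may sample differently (for example, stratified by the Y variable). The result mirrors the field names used above, and `event.proportion` is assumed here to mean the share of each Y value in each sample.

```r
# Hypothetical illustration, NOT the fgvr implementation.
split_train_test <- function(dataset, yvar, seed = 12345, percentage = 0.7) {
  stopifnot(is.data.frame(dataset), nrow(dataset) >= 2,
            yvar %in% names(dataset), percentage > 0, percentage < 1)

  set.seed(seed)  # same seed => same split
  n <- nrow(dataset)
  train_idx <- sample.int(n, size = max(1, floor(percentage * n)))

  data.train <- dataset[train_idx, , drop = FALSE]
  data.test  <- dataset[-train_idx, , drop = FALSE]

  # Share of each Y value in a sample, using the levels of the full dataset
  # so that both rows always have the same columns.
  lv <- sort(unique(dataset[[yvar]]))
  prop <- function(x) {
    counts <- table(factor(x, levels = lv))
    setNames(as.numeric(counts) / length(x), names(counts))
  }

  list(
    data.train = data.train,
    data.test = data.test,
    event.proportion = rbind(train = prop(data.train[[yvar]]),
                             test  = prop(data.test[[yvar]]))
  )
}

# Usage with a dataset that ships with R (32 rows, binary column "am").
res <- split_train_test(mtcars, yvar = "am")
nrow(res$data.train)   # 22
nrow(res$data.test)    # 10
res$event.proportion
```

Comparing the two rows of `event.proportion` is a quick way to spot a split in which the Y variable is distributed very differently between training and test data.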
