FGVR

An R package to power up data science analysis, based on techniques learned in the FGV MBA course.

Don't panic! -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"

The premise of this package is to gather a set of R functions that help FGV MBA students perform repetitive activities during the following steps: data cleaning, data enhancement, data preparation, and more.

All functions and resources available in this package were inspired by the Business Analytics and Big Data classes, where the following professors shed some light on our minds:

Name (Discipline): Assignment Repository
Gustavo Mirapalheta (Exploratory Data Analysis): https://github.com/ldaniel/Exploratory-Data-Analysis
João Rafael Dias (Predictive Analytics): https://github.com/ldaniel/Predictive-Analytics
Eduardo Francisco (Spatial Statistics): https://github.com/ldaniel/Spatial-Statistics
Rafael Scopel (Time Series Analysis): https://github.com/ldaniel/Time-Series-Analysis
Rodrigo Togneri (Matrix Methods and Cluster Analysis): https://github.com/ldaniel/Matrix-Methods-Cluster-Analysis

Thank you all for that! 😄

Contributors

Special thanks to these awesome contributors: @Daniel, @Rodrigo and @Ygor, who devoted a lot of time and effort to achieve such great work! 👊

Contributor (E-mail)
Daniel Campos ([email protected])
Leandro Daniel ([email protected])
Rodrigo Goncalves ([email protected])
Ygor Lima ([email protected])

Installation

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("ldaniel/fgvr")

Running

The fgvr package has a set of handy functions.

createProjectFromTemplate

This function creates an initial R project setup focused on data science.

fgvr::createProjectFromTemplate("Predictive-Analytics", "c:/temp")

The following structure will be created:

[Project root directory]
|   README.md
|   __myproject__.Rproj
|
+---data
|   +---processed
|   |       bigtable.feather
|   |       readme.txt
|   |
|   \---raw
|           game-of-thrones-deaths-data.txt
|           readme.txt
|
+---docs
|       readme.txt
|
+---images
|       readme.txt
|
+---markdown
|       01_about_the_data.Rmd
|       02_data_preparation.Rmd
|       03_exploration_report.Rmd
|       conclusion.Rmd
|       index.Rmd
|       references.Rmd
|       _pdf.Rmd
|       _site.yml
|
+---models
|       readme.txt
|       source_train_test_dataset.rds
|
\---src
    +---datapreparation
    |       execute_data_preparation.R
    |       step_01_config_environment.R
    |       step_02_data_ingestion.R
    |       step_03_data_cleaning.R
    |       step_04_label_translation.R
    |       step_05_data_enhancement.R
    |       step_06_dataset_preparation.R
    |
    +---playground
    |       playground.R
    |
    \---util
            auxiliary_functions.R
            generate_markdown_website.R
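
To make the layout above more concrete, here is a minimal base-R sketch that builds a comparable directory skeleton by hand. It is not how createProjectFromTemplate is implemented (the package source is not shown here). The helper name sketch_project_skeleton, and the choice to create only the directories plus a placeholder README.md, are assumptions made for illustration.

```r
# Hypothetical helper (not part of fgvr): creates a directory skeleton
# that mirrors the tree shown above, using base R only.
sketch_project_skeleton <- function(project_name, destination) {
  root <- file.path(destination, project_name)

  # Sub-directories taken from the tree above.
  subdirs <- c(
    file.path("data", "processed"),
    file.path("data", "raw"),
    "docs",
    "images",
    "markdown",
    "models",
    file.path("src", "datapreparation"),
    file.path("src", "playground"),
    file.path("src", "util")
  )

  # recursive = TRUE also creates the project root and any missing parents.
  for (d in subdirs) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }

  # A placeholder README at the project root.
  writeLines(paste("#", project_name), file.path(root, "README.md"))

  invisible(root)
}

# Usage: build the skeleton in a temporary directory and list its folders.
root <- sketch_project_skeleton("Predictive-Analytics", tempdir())
list.dirs(root, full.names = FALSE, recursive = TRUE)
```

Writing to tempdir() keeps the experiment away from real project folders; the package function itself also fills the directories with template files, which this sketch does not attempt.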

createTestAndTrainSamples

This function creates train and test datasets from a dataset and the name of its Y variable. It also returns the sample proportion for each dataset.

# using, just as an example, the sample dataset loansdefaulters, also included in the package 
base <- fgvr::loansdefaulters

# example calling the function by passing all parameters:
#   dataset    = the dataset you want to split into test and train samples.
#   yvar       = the Y variable in your dataset.
#   seed       = the seed number used to generate the train and test samples.
#                the default value is 12345.
#   percentage = the percentage of data that goes to training sample.
#                the default value is 0.7.
mydataset <- fgvr::createTestAndTrainSamples(dataset = base, yvar = "y_loan_defaulter", 
                                             seed = 12345, percentage = 0.7)

# or omit the 'seed' and 'percentage' parameters to use their default values.
mydataset <- fgvr::createTestAndTrainSamples(dataset = base, yvar = "y_loan_defaulter")

# getting the final samples and proportion.
mydataset$data.train
mydataset$data.test
mydataset$event.proportion
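
For readers curious about what a seeded split looks like, the idea can be sketched in base R. This illustrates a plain random split and is not the package's actual implementation. The helper name sketch_split, the toy data frame, and the reading of event.proportion as the share of rows where Y equals 1 in each sample are all assumptions.

```r
# Hypothetical helper (not part of fgvr): seeded random train/test split.
sketch_split <- function(dataset, yvar, seed = 12345, percentage = 0.7) {
  set.seed(seed)  # the same seed reproduces the same split

  n <- nrow(dataset)
  train_idx <- sample(seq_len(n), size = round(percentage * n))

  data.train <- dataset[train_idx, , drop = FALSE]
  data.test  <- dataset[-train_idx, , drop = FALSE]

  # Assumption: "proportion" means the share of rows where Y equals 1.
  event.proportion <- c(
    train = mean(data.train[[yvar]] == 1),
    test  = mean(data.test[[yvar]] == 1)
  )

  list(data.train = data.train,
       data.test = data.test,
       event.proportion = event.proportion)
}

# Toy data, made up for this example only.
toy <- data.frame(
  x = seq_len(100),
  y_loan_defaulter = rep(c(0, 1), times = c(80, 20))
)

result <- sketch_split(toy, yvar = "y_loan_defaulter")
nrow(result$data.train)
nrow(result$data.test)
result$event.proportion
```

Comparing the two values in event.proportion is a quick check that the event rate in the training sample is close to the one in the test sample; a large gap suggests trying a different seed or a stratified split.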
