Giter Club home page Giter Club logo

data-science-tidy's Introduction

Introduction to Data Science in the Tidyverse

rstudio::conf 2020

by Amelia McNamara and Hadley Wickham


๐Ÿ—“๏ธ January 27 and 28, 2020
โฐ 09:00 - 17:00
๐Ÿจ Plaza B (lobby level)
โœ๏ธ rstd.io/conf


Overview

This is a two-day, hands-on workshop designed for people who are brand new to R & RStudio and who learn best in person.

You will learn the basics of R and data science, and practice using the RStudio IDE (integrated development environment). We'll discuss much of the material from the book R for Data Science, including data visualization (ggplot2), data transformation and tidying (dplyr, tidyr), understanding special data types (stringr, forcats, lubridate), and modeling (broom). Throughout the workshop, we'll work in RMarkdown documents, and learn best practices for data computing.

If you want to transition from coding in base R to the tidyverse, or just jump into doing data science in the tidyverse without any prior R experience, this is the workshop for you! We will have a team of TAs on hand to show you the ropes, and help you out when you get stuck.

Learning objectives

The basics of doing data science in the tidyverse, including data visualization, data transformation and tidying, understanding special data types, and basic modeling.

Is this course for me?

Consider these questions:

  1. You have a dataset of prices of diamonds, as well as their size. Could you make a scatterplot of the two variables using ggplot2?

  2. You have two datasets, one with information on music genres and age ranges, the other with genres and radio station call names. Can you imagine how you would join them together with a dplyr verb?

  3. We want to model the wages of people in the United States, using their height and education as predictors. Then, we would like to plot model predictions for each level of educational attainment. Can you imagine how to do this in R?

If you answered "no" to any/all of those questions... great! This workshop is for you. By the end of the two days, you should be able to accomplish all those tasks. If you answered "yes" to all three questions, you may want to consider taking a different workshop.

Prework

You'll be using RStudio Cloud, a cloud-based version of R and RStudio available through your web browser. So (all going well) on the day of the workshop all you'll need is a laptop that can access the internet (wifi will be available). Please do sign up for an account on RStudio Cloud before the workshop. You can make an account directly on RStudio Cloud, or use single-sign-on with a service like GitHub or Google.

In the unlikely event that there are problems with the conference internet connection, you may want to have a local installation on your computer as a backup. If you'd like, install the following:

  1. A recent version of R (~3.6), which is available for free at cran.r-project.org

  2. A recent version of RStudio IDE (~1.2.5033), available for free at www.rstudio.com/download

  3. The set of relevant R packages, which you can install by connecting to the internet, opening RStudio, and running:

    install.packages(c("babynames", "fivethirtyeight", "formatR", "gapminder", "hexbin", "mgcv", "maps", "mapproj", "nycflights13", "rmarkdown", "skimr", "tidyverse", "viridis"))

Don't forget to bring your power cord!

Schedule

Time Activity
09:00 - 10:30 Session 1
10:30 - 11:00 Coffee break
11:00 - 12:30 Session 2
12:00 - 13:30 Lunch break Grand Ballroom
13:30 - 15:00 Session 3
15:00 - 15:30 Coffee break
15:30 - 17:00 Session 4

Instructors

Amelia McNamara (she/her) is an assistant professor of statistics in the Department of Computer and Information Sciences at the University of St Thomas, in Minnesota. She is an RStudio Certified Trainor, as well as a Carpentries Certified Instructor. This is her third year teaching an rstudio::conf workshop.

Hadley Wickham (he/him) is the Chief Scientist at RStudio. He is the author of many R packages, including major components of the tidyverse, including ggplot2, dplyr, tidyr, purrr, and readr. He is also the author of the book R for Data Science with Garrett Grolemund, on which much of this workshop is based.

TAs

We have a fantastic slate of TAs assisting at this workshop! They are:

  • Jesse Mostipak
  • Ben Baumer
  • Matthew Flickinger
  • Mike Smith
  • David Keyes

This work is licensed under a Creative Commons Attribution 4.0 International License.

data-science-tidy's People

Contributors

ameliamn avatar mine-cetinkaya-rundel avatar hadley avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.