Giter Club home page Giter Club logo

tidyr's Introduction

tidyr

CRAN status Travis build status R build status Codecov test coverage

Overview

The goal of tidyr is to help you create tidy data. Tidy data is data where:

  1. Every column is variable.
  2. Every row is an observation.
  3. Every cell is a single value.

Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis. Learn more about tidy data in vignette("tidy-data").

Installation

# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just tidyr:
install.packages("tidyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/tidyr")

Cheatsheet

Getting started

library(tidyr)

tidyr functions fall into five main categories:

  • “Pivotting” which converts between long and wide forms. tidyr 1.0.0 introduces pivot_longer() and pivot_wider(), replacing the older spread() and gather() functions. See vignette("pivot") for more details.

  • “Rectangling”, which turns deeply nested lists (as from JSON) into tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), and vignette("rectangle") for more details.

  • Nesting converts grouped data to a form where each group becomes a single row containing a nested data frame, and unnesting does the opposite. See nest(), unnest(), and vignette("nest") for more details.

  • Splitting and combining character columns. Use separate() and extract() to pull a single character column into multiple columns; use unite() to combine multiple columns into a single character column.

  • Make implicit missing values explicit with complete(); make explicit missing values implicit with drop_na(); replace missing values with next/previous value with fill(), or a known value with replace_na().

Related work

tidyr replaces reshape2 (2010-2014) and reshape (2005-2010). Somewhat counterintuitively, each iteration of the package has done less. tidyr is designed specifically for tidying data, not general reshaping (reshape2), or the general aggregation (reshape).

data.table provides high-performance implementations of melt() and dcast()

If you’d like to read more about data reshaping from a CS perspective, I’d recommend the following three papers:

To guide your reading, here’s a translation between the terminology used in different places:

tidyr gather spread
reshape(2) melt cast
spreadsheets unpivot pivot
databases fold unfold

Getting help

If you encounter a clear bug, please file a minimal reproducible example on github. For questions and other discussion, please use community.rstudio.com.


Please note that the tidyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

tidyr's People

Contributors

hadley avatar lionel- avatar batpigandme avatar jennybc avatar krlmlr avatar markdly avatar jimhester avatar williamlai2 avatar davisvaughan avatar aaronwolen avatar joshkatz avatar atusy avatar romainfrancois avatar wibeasley avatar rmsharp avatar mine-cetinkaya-rundel avatar charliejhadley avatar lorenzwalthert avatar juliasilge avatar yutannihilation avatar ax42 avatar stufield avatar simoncoulombe avatar zeehio avatar samanthatoet avatar seabbs avatar salim-b avatar ryo-n7 avatar rbloehm avatar schoonees avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.