
frictionless-r's Introduction

frictionless


Frictionless is an R package to read and write Frictionless Data Packages. A Data Package is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR and open datasets.

To get started, see the sections below, the Get started vignette and the function reference.

Installation

Install the latest released version from CRAN:

install.packages("frictionless")

Or the development version from GitHub or R-universe:

# install.packages("devtools")
devtools::install_github("frictionlessdata/frictionless-r")

# Or rOpenSci R-universe
install.packages("frictionless", repos = "https://ropensci.r-universe.dev")

Usage

With frictionless you can read data from a Data Package (local or remote) into your R environment. Here we read bird GPS tracking data from a Data Package published on Zenodo:

library(frictionless)

# Read the datapackage.json file
# This gives you access to all Data Resources of the Data Package without 
# reading them, which is convenient and fast.
package <- read_package("https://zenodo.org/records/10053702/files/datapackage.json")

package
#> A Data Package with 3 resources:
#> • reference-data
#> • gps
#> • acceleration
#> For more information, see <https://doi.org/10.5281/zenodo.10053702>.
#> Use `unclass()` to print the Data Package as a list.

# List resources
resources(package)
#> [1] "reference-data" "gps"            "acceleration"

# Read data from the resource "gps"
# This will return a single data frame, even though the data are split over 
# multiple zipped CSV files.
read_resource(package, "gps")
#> # A tibble: 73,047 × 21
#>     `event-id` visible timestamp           `location-long` `location-lat`
#>          <dbl> <lgl>   <dttm>                        <dbl>          <dbl>
#>  1 14256075762 TRUE    2018-05-25 16:11:37            4.25           51.3
#>  2 14256075763 TRUE    2018-05-25 16:16:41            4.25           51.3
#>  3 14256075764 TRUE    2018-05-25 16:21:29            4.25           51.3
#>  4 14256075765 TRUE    2018-05-25 16:26:28            4.25           51.3
#>  5 14256075766 TRUE    2018-05-25 16:31:21            4.25           51.3
#>  6 14256075767 TRUE    2018-05-25 16:36:09            4.25           51.3
#>  7 14256075768 TRUE    2018-05-25 16:40:57            4.25           51.3
#>  8 14256075769 TRUE    2018-05-25 16:45:55            4.25           51.3
#>  9 14256075770 TRUE    2018-05-25 16:50:49            4.25           51.3
#> 10 14256075771 TRUE    2018-05-25 16:55:36            4.25           51.3
#> # ℹ 73,037 more rows
#> # ℹ 16 more variables: `bar:barometric-pressure` <dbl>,
#> #   `external-temperature` <dbl>, `gps:dop` <dbl>, `gps:satellite-count` <dbl>,
#> #   `gps-time-to-fix` <dbl>, `ground-speed` <dbl>, heading <dbl>,
#> #   `height-above-msl` <dbl>, `location-error-numerical` <dbl>,
#> #   `manually-marked-outlier` <lgl>, `vertical-error-numerical` <dbl>,
#> #   `sensor-type` <chr>, `individual-taxon-canonical-name` <chr>, …

You can also create your own Data Package, add data and write it to disk:

# Create a Data Package and add the "iris" data frame as a resource
my_package <-
  create_package() %>%
  add_resource(resource_name = "iris", data = iris)

my_package
#> A Data Package with 1 resource:
#> • iris
#> Use `unclass()` to print the Data Package as a list.

# Write the Data Package to disk
my_package %>%
  write_package("my_directory")

For more functionality, see the Get started vignette or the function reference.

frictionless vs datapackage.r

datapackage.r is an alternative R package to work with Data Packages. It has an object-oriented design and offers validation.

frictionless on the other hand allows you to quickly read and write Data Packages to and from data frames, getting out of the way for the rest of your analysis. It is designed to be lightweight, follows tidyverse principles and supports piping. Its validation functionality is limited to what is needed for reading and writing, see frictionless-py for extensive validation.

Meta

frictionless-r's People

Contributors

damianooldoni, hansvancalster, khusmann, mpadge, nepito, peterdesmet, pietrh, yihui


frictionless-r's Issues

Create write_package() function

write_package(pkg, "directory") or write_package(df1, df2, df3, "directory")
  • Wrap write_package() in backticks in create_schema.R to create a link (multiple occurrences)
  • Writes the whole package to a datapackage.json file in the directory
  • write_csv() any newly added resources and update their path
  • Leave existing resources (non-df) untouched
  • Add an example
  • Remove full_path, resource_names, directory
  • Use jsonlite::toJSON(schema, pretty = TRUE, auto_unbox = TRUE) for schemas
  • file.copy() any existing resources that are not http URLs to the directory and update their path; this:
    • avoids having ../ and / paths, which are forbidden
    • avoids loading and doing any data transformations: data are just copied as is
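
A rough sketch of the proposed behaviour (write_package_sketch() is an illustrative stand-in, not the final implementation):

# Sketch: copy local resource files and write the descriptor
write_package_sketch <- function(package, directory) {
  dir.create(directory, showWarnings = FALSE, recursive = TRUE)
  for (resource in package$resources) {
    # Copy file-based local resources as is; leave http(s) URLs untouched
    # (data-frame-backed resources are omitted in this sketch)
    local_paths <- grep("^https?://", resource$path, value = TRUE, invert = TRUE)
    if (length(local_paths) > 0) {
      file.copy(local_paths, file.path(directory, basename(local_paths)))
    }
  }
  # Write the full descriptor to datapackage.json
  json <- jsonlite::toJSON(unclass(package), pretty = TRUE, auto_unbox = TRUE)
  writeLines(json, file.path(directory, "datapackage.json"))
}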

Create read_descriptor() function

Create a function to read a descriptor file:

descriptor <- read_descriptor("datapackage.json")

Returns a descriptor object

  • Should check that at least a resources property is available
  • It can check for other required properties

The function could be called read_dp() or read_datapackage().

Add frictionless.Rmd vignette to get started

Read

  • Read package and data (local)
  • Read package and data (remote)
  • Read from data property

Manipulate

  • Create package
  • Add resource
  • Add resource + schema
  • Remove resource
  • create_package() %>% add_resource() %>% add_resource()
  • read_package() %>% add_resource("new") %>% remove_resource("existing")

Write

  • Write package: describe behaviour

Set type for empty CSV fields to `string`, not `boolean` in `create_schema()`

When adding a resource from a CSV, a schema will be created for the file. read_delim() will interpret empty fields in the CSV files as logical (see tidyverse/readr#839). For the translation to a Table Schema, it would be better if those were set to string, not boolean.

The only way I see to do that is to check whether all values in a column are NA and, if so, set that column's type to string.
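
A minimal sketch of that check, assuming the schema structure created by create_schema() (fix_empty_fields() is a hypothetical helper):

# Override the guessed type of all-NA columns to "string"
fix_empty_fields <- function(schema, df) {
  for (i in seq_along(schema$fields)) {
    column <- df[[schema$fields[[i]]$name]]
    if (all(is.na(column))) {
      schema$fields[[i]]$type <- "string"
    }
  }
  schema
}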

@damianooldoni thoughts?

One-digit hours cannot be parsed

With the introduction of readr 2.0.0 three time parsing tests fail:

── Failure (test-read_resource.R:319:3): read_resource() handles times ─────────
resource$tm_any (`actual`) not identical to `expected_value` (`expected`).

  `actual`:    NA
`expected`: 30600

── Failure (test-read_resource.R:320:3): read_resource() handles times ─────────
resource$tm_shortcut (`actual`) not identical to `expected_value` (`expected`).

  `actual`:    NA
`expected`: 30600

── Failure (test-read_resource.R:321:3): read_resource() handles times ─────────
resource$tm_1 (`actual`) not identical to `expected_value` (`expected`).

  `actual`:    NA
`expected`: 30600

I don't understand why yet, because trying the parsing functions directly works:

library(readr)

tm_any <- parse_time("8:30", "%AT")
tm_shortcut <- parse_time("8:30:00", "%X")
tm_1 <- parse_time("8AM30", "%I%p%M")

Create create_package() function

create_package()
  • Creates a minimal list object
  • Technically it should have resources already to be valid :-/
  • It would be nice to allow users to pass some information to start a new data package which is a little more than a dummy. Something like a vector of resource names? Will be implemented with add_resource()
  • Update documentation to point to this function wherever read_package() is mentioned

Warn on header vs fields mismatch

read_resource() will use the schema$fields as column names (ignoring headers). The internal read_delim() will warn if there are (a) more or fewer columns than expected or (b) data types that cannot be cast.

But it is possible that schema$fields silently differ from the headers (e.g. when the number of columns is the same and the types are all character). It would be nice to warn (not error) users when this is the case.
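
An illustrative check (not the package's internal code; check_headers() is a hypothetical helper):

# Compare the CSV header with the schema field names and warn on mismatch
check_headers <- function(path, schema) {
  header <- names(readr::read_csv(path, n_max = 0, show_col_types = FALSE))
  fields <- vapply(schema$fields, function(field) field$name, character(1))
  if (!identical(header, fields)) {
    warning("CSV header does not match schema fields")
  }
}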

Allow adding resources with a path

One might want to create a Data Package from CSV files on disk. In that case it should be possible to use write_package() to just write the datapackage.json to disk and not read/write all the CSV files.

On disk, in directory my_dir:

gps-2018.csv
gps-2019.csv

my_package <- create_package()

# Create resource
gps <- read_csv("my_dir/gps-2018.csv")
my_package <- add_resource(
  my_package,
  resource_name = "gps",
  df = gps, # Not used to write data, only to create the schema
  path = c("my_dir/gps-2018.csv", "my_dir/gps-2019.csv") # Define path
)
# Sets resource "path", does not add inline "data"

# Write package
write_package(my_package, directory = "my_dir")
# Detects that some of the paths contain "my_dir" and:
# - Does not write the files to disk
# - Shortens the path to just the file names

Caveats:

  • CSV dialect, delimiters, etc. not known.

Ideas for write functions

Ideas for write functions:

create_package() from scratch

See #43

create_schema() for a df

See #12

✅ get_schema() from a resource

get_schema("resource_name", pkg)
  • If we order arguments like this, then also do read_resource("gps", pkg)

add_resource() to a package

See #44

✅ remove_resource()

remove_resource("name", pkg)
  • Ask an "are you sure?" question before the removal? No: it blocks the flow, and the user has to save into a new variable anyway

write_package() to a directory

See #42

Support reading inline data with schema

read_resource() supports reading from inline data, but that feature is currently marked as experimental because it completely ignores the schema. It would be good to support the schema, but that likely involves a hefty rewrite of read_resource().
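
A very rough sketch of the direction, assuming inline data is a row-wise list of named values (read_inline_with_schema() is a hypothetical helper, and the type handling is simplified):

# Build a data frame from inline data and coerce columns per the schema
read_inline_with_schema <- function(data, schema) {
  df <- do.call(rbind, lapply(data, as.data.frame))
  for (field in schema$fields) {
    if (field$type %in% c("number", "integer")) {
      df[[field$name]] <- as.numeric(df[[field$name]])
    } else if (field$type == "boolean") {
      df[[field$name]] <- as.logical(df[[field$name]])
    }
  }
  df
}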

Create function `update_schema()` to edit field properties

A created schema will only have the field properties name, type and (sometimes) constraints. I see it as fairly common to add more properties, such as description or required. It is possible to do that with purrr, but it isn't very straightforward (see the example below). Maybe a specific function would be useful.

Create schema:

library(frictionless)
iris_schema <- create_schema(iris)
str(iris_schema)
#> List of 1
#>  $ fields:List of 5
#>   ..$ :List of 2
#>   .. ..$ name: chr "Sepal.Length"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 2
#>   .. ..$ name: chr "Sepal.Width"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 2
#>   .. ..$ name: chr "Petal.Length"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 2
#>   .. ..$ name: chr "Petal.Width"
#>   .. ..$ type: chr "number"
#>   ..$ :List of 3
#>   .. ..$ name       : chr "Species"
#>   .. ..$ type       : chr "string"
#>   .. ..$ constraints:List of 1
#>   .. .. ..$ enum: chr [1:3] "setosa" "versicolor" "virginica"
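
For comparison, editing a field property with purrr currently looks something like this (illustrative):

library(purrr)

# Set a description on one field by mapping over all fields
iris_schema$fields <- map(iris_schema$fields, function(field) {
  if (field$name == "Sepal.Width") {
    field$description <- "Sepal width in cm."
  }
  field
})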

Atomic function

iris_schema <- edit_field_property(iris_schema, "Sepal.Width", "description", "Sepal width in cm.")
# Same as: iris_schema$fields[[2]]$description <- "Sepal width in cm."

Not sure this is super useful, but it is very clear what field you are setting.

Loop function

iris_schema <- edit_fields(
  iris_schema,
  "description",
  c("Sepal length in cm.", "Sepal width in cm.", "Petal length in cm.", "Petal width in cm.", NA_character_)
)
# If value is NA or NULL, don't set property

Faster, but disconnect between field name and value you want to set.

Recode like function

iris_schema <- edit_fields(
  iris_schema,
  "description",
  "Sepal.Length" = "Sepal length in cm.",
  "Sepal.Width" = "Sepal width in cm.",
  "Species" = NA_character_
)
# If field is not listed, don't set property
# If field is listed but NA or NULL, remove it

Note, it should also work for nested properties:

iris_schema <- edit_fields(
  iris_schema,
  "constraints$required",
  "Sepal.Length" = TRUE
)

Create create_schema() function

create_schema() for a df

create_schema(df)
  • Returns a list
  • Lists fields
  • Each field has name = colname
  • Each field has type = translated coltype (see the sketch below)
  • A field could have format
  • Fields have no other properties (e.g. title, description, constraints)
  • Has missingValues: NA, which write_package() will write as ""
  • Link with any original Table Schema is lost (e.g. when df was read with read_resource(package, "resource"))
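
A minimal sketch of the type translation, assuming a simple class-based mapping (the actual implementation may differ):

# Map an R column to a Table Schema type
col_to_type <- function(x) {
  if (is.logical(x)) "boolean"
  else if (is.integer(x)) "integer"
  else if (is.numeric(x)) "number"
  else if (inherits(x, "Date")) "date"
  else if (inherits(x, "POSIXct")) "datetime"
  else "string"
}

# Sketch of create_schema(): one field per column, with name and type
create_schema_sketch <- function(df) {
  fields <- lapply(names(df), function(name) {
    list(name = name, type = col_to_type(df[[name]]))
  })
  list(fields = fields)
}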

Create add_resource() function

add_resource(df, "name", pkg, schema) # schema is optional
  • Adds a resource object to an existing package list object
  • profile: tabular-data-resource
  • name is name of df variable, unless specified
  • path: NULL? reference to df? df itself?
  • schema: hidden call to create_schema()
  • dialect: none, will be default
  • title: optionally set by user?
  • format: csv
  • mediatype: none?
  • encoding: utf-8
  • bytes, hash, sources, licenses: none
  • Do we also need remove_resource()
  • If schema is provided, check that it has same headers as df
  • Update references to function
  • Use the function in tests where data is attached (e.g. read_resource() and write_resource())

Cannot pass empty grouping_mark to locale()

Since the upgrade to readr 2.0.0 (or maybe other packages), the integer and number tests fail, specifically on the property bareNumber:

https://github.com/inbo/datapackage/blob/a195f686a86b46b6042966db69c3cc5ee2c7c237/tests/testthat/types.json#L70

The moment that property is added in the JSON file (whether true or false), it stalls any reading of the file. I haven't figured out why yet. bareNumber is handled here:

https://github.com/inbo/datapackage/blob/a195f686a86b46b6042966db69c3cc5ee2c7c237/R/read_resource.R#L337
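
A minimal illustration of the failing call, assuming readr >= 2.0.0 (bareNumber handling leads to an empty grouping_mark being passed to locale()):

library(readr)

# Per this issue, an empty grouping_mark stalls reading with readr 2.0.0
read_csv(I("x\n1000"), locale = locale(grouping_mark = ""))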

Write CSVs from remotely read package to disk

One can read a Data Package (read_package()) from a URL or download it first and then read it locally.

The current behaviour for writing a remotely read package is to not copy the files to disk, since they are already available online. However, that makes it more difficult to 1) download a whole Data Package using R and 2) update a Data Package (e.g. adding a resource), because the originally local paths are now URLs pointing to an online CSV, and that might not be the version you want to point to.

library(frictionless)
package <- read_package("https://zenodo.org/record/5070086/files/datapackage.json")
#> Please make sure you have the right to access data from this Data Package for your intended use.
#> Follow applicable norms or requirements to credit the dataset and its authors.
#> For more information, see https://doi.org/10.5281/zenodo.5070086
# Paths are local
package$resources[[2]]$path
#> [1] "O_WESTERSCHELDE-gps-2018.csv" "O_WESTERSCHELDE-gps-2019.csv"
#> [3] "O_WESTERSCHELDE-gps-2020.csv"
write_package(package, "my_directory")
list.files("my_directory")
#> [1] "datapackage.json"

written_package <- read_package("my_directory/datapackage.json")
#> Please make sure you have the right to access data from this Data Package for your intended use.
#> Follow applicable norms or requirements to credit the dataset and its authors.
#> For more information, see https://doi.org/10.5281/zenodo.5070086
# Paths are URLs now
written_package$resources[[2]]$path
#> [1] "https://zenodo.org/record/5070086/files/O_WESTERSCHELDE-gps-2018.csv"
#> [2] "https://zenodo.org/record/5070086/files/O_WESTERSCHELDE-gps-2019.csv"
#> [3] "https://zenodo.org/record/5070086/files/O_WESTERSCHELDE-gps-2020.csv"

Created on 2022-01-11 by the reprex package (v2.0.1)

"$ operator is invalid for atomic vectors" error when data package use external dialect property

In the section On dereferencing and descriptor validation the frictionless specs say that

Some properties in the Frictionless Data specifications allow a path (a URL or a POSIX path) that resolves to an object.

The most prominent example of this is the schema property on Tabular Data Resource descriptors.

Allowing such references has practical use for publishers, for example in allowing schema reuse. However, it does introduce difficulties in the validation of such properties. For example, validating a path pointing to a schema rather than the schema object itself will do little to guarantee the integrity of the schema definition. Therefore implementors MUST dereference such "referential" property values before attempting to validate a descriptor. At present, this requirement applies to the following properties in Tabular Data Package and Tabular Data Resource:

  • schema
  • dialect

frictionless-r doesn't dereference the dialect property, resulting in the error $ operator is invalid for atomic vectors in calls to dialect properties such as dialect$delimiter (e.g. https://github.com/frictionlessdata/frictionless-r/blob/main/R/read_resource.R#L312). Here is a reprex:

library(frictionless)

dp <- read_package("https://raw.githubusercontent.com/dados-mg/datapackage-reprex/external-dialect/datapackage.json")
#> Please make sure you have the right to access data from this Data Package for your proposed use.
#> Follow applicable norms or requirements to credit the dataset and its authors.

res <- read_resource("estados", dp)
#> Error: $ operator is invalid for atomic vectors

We make extensive use of this feature in the data packages of the Open Data Portal of Minas Gerais and I would be happy to submit a PR by next week if there is interest.

Support date formats

Date

See https://specs.frictionlessdata.io/table-schema/#date

  • default: YYYY-MM-DD
  • any: some attempts, e.g. 2020/01/01 will work
  • PATTERN

Time

See https://specs.frictionlessdata.io/table-schema/#time

  • default: hh:mm:ss: note that 12:04:03.943 becomes 12:04:03 (floored)
  • any: no real other formats, e.g. T12:03:04 becomes NA
  • PATTERN

Datetime

See https://specs.frictionlessdata.io/table-schema/#datetime

  • default: YYYY-MM-DDThh:mm:ssZ:
    • 2020-01-02T12:35:10Z
    • 2020-12-01T14:45:53+01:00 -> 2020-12-01 13:45:53
  • any: some attempts
    • 2019-05-04 14:26:33 -> 2019-05-04 14:26:33
    • 2019-05-04 14:26:33+01:00 -> 2019-05-04 13:26:33
  • PATTERN

Pattern implementation

Pattern support might be possible by passing col_datetime(format = ...):

cols(
  deployment_id = col_double(),
  longitude = col_double(),
  latitude = col_double(),
  start = col_datetime(format = ""),
  comments = col_character()
)
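
For example, a Table Schema pattern such as %d/%m/%Y %H:%M:%S (an illustrative pattern, not taken from the spec above) could be translated into a readr format string:

library(readr)

read_csv(
  I("start\n02/01/2020 12:35:10"),
  col_types = cols(start = col_datetime(format = "%d/%m/%Y %H:%M:%S"))
)
#> # A tibble: 1 × 1
#>   start
#>   <dttm>
#> 1 2020-01-02 12:35:10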

Allow descriptors represented in yaml

According to the specs, all descriptors (e.g. Data Package, Table Schema) MUST be a JSON object. However, there are some discussions to support a YAML representation, and frictionless-py already supports it.

This is a nice feature because it allows for more readable and diffable documentation, especially for the description property in Table Schema, which can use YAML multiline strings to write Markdown without the pain of JSON's newline handling.

Compare

fields:
- name: COD
  type: integer
  format: default
  description: |
    Código de dois dígitos que identificam o estado

    > Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
  title: Código do Estado
- name: NOME
  type: string
  format: default
  title: Nome do Estado
- name: SIGLA
  type: string
  format: default
  title: Sigla do Estado

with

{
  "fields": [
    {
      "name": "COD",
      "type": "integer",
      "format": "default",
      "description": "Código de dois dígitos que identificam o estado\n\n> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n",
      "title": "Código do Estado"
    },
    {
      "name": "NOME",
      "type": "string",
      "format": "default",
      "title": "Nome do Estado"
    },
    {
      "name": "SIGLA",
      "type": "string",
      "format": "default",
      "title": "Sigla do Estado"
    }
  ]
}

Because some data packages are going to mix both JSON and YAML, it would be nice to be able to write

library("frictionless")

dp <- read_package("https://raw.githubusercontent.com/dados-mg/datapackage-reprex/yaml-schema/datapackage.json")
df <- dp |> read_resource("estados")

which currently fails with

Error in parse_con(txt, bigint_as_char) : 
  lexical error: invalid string in json text.
                                       fields: - name: COD   type: int
                     (right here) ------^
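
A sketch of how support could look, assuming the yaml package as an extra dependency (read_descriptor_file() is a hypothetical helper):

# Parse a descriptor from JSON or YAML based on the file extension
read_descriptor_file <- function(path) {
  if (grepl("\\.ya?ml$", path)) {
    yaml::read_yaml(path)
  } else {
    jsonlite::fromJSON(path, simplifyDataFrame = FALSE)
  }
}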

Write factor to enum

Factor to enum (create_schema()):

If a df column is a factor, add the factor levels as an enum constraint.
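
A minimal sketch of the idea (field_from_factor() is an illustrative helper):

# Build a Table Schema field for a factor column, with levels as enum
field_from_factor <- function(name, x) {
  list(
    name = name,
    type = "string",
    constraints = list(enum = levels(x))
  )
}

str(field_from_factor("Species", iris$Species))
#> List of 3
#>  $ name       : chr "Species"
#>  $ type       : chr "string"
#>  $ constraints:List of 1
#>   ..$ enum: chr [1:3] "setosa" "versicolor" "virginica"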

Overflowing integers

> pkg <- read_package("https://zenodo.org/record/5056105/files/datapackage.json")
> gps <- read_resource(pkg, "gps")
Warning: 2690 parsing failures.
row      col   expected      actual                                                                file
  1 event-id an integer 19193113855 'https://zenodo.org/record/5056105/files/MH_ANTWERPEN-gps-2018.csv'
  2 event-id an integer 19193113856 'https://zenodo.org/record/5056105/files/MH_ANTWERPEN-gps-2018.csv'
  3 event-id an integer 19193113857 'https://zenodo.org/record/5056105/files/MH_ANTWERPEN-gps-2018.csv'
  4 event-id an integer 19193113858 'https://zenodo.org/record/5056105/files/MH_ANTWERPEN-gps-2018.csv'
  5 event-id an integer 19193113859 'https://zenodo.org/record/5056105/files/MH_ANTWERPEN-gps-2018.csv'

Integers (https://specs.frictionlessdata.io/table-schema/#integer) can be valid in Frictionless but too big for R's integer type. The data frame will still be loaded, but the column will contain NA for overflowing integers. It would be better to cast the whole column to numeric.

@damianooldoni Is there a way we can achieve that, but only if there are overflow values?
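
One possible approach, sketched here with a hypothetical helper: parse the column as integer first and fall back to numeric only when values overflow.

# Fall back to numeric when integer parsing overflows R's 32-bit range
parse_integer_safely <- function(x_chr) {
  x_int <- suppressWarnings(as.integer(x_chr))
  if (anyNA(x_int) && !anyNA(suppressWarnings(as.numeric(x_chr)))) {
    as.numeric(x_chr) # Overflowed: keep the values instead of NA
  } else {
    x_int
  }
}

parse_integer_safely(c("19193113855", "19193113856"))
#> [1] 19193113855 19193113856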

CR line endings no longer supported

readr 2.0.0 no longer supports CR-only line endings:

Normalizing newlines in files with just carriage returns \r is no longer supported. The last major OS to use only CR as the newline was ‘classic’ Mac OS, which had its final release in 2001.

This is not a huge issue for datapackage, since LF and CRLF are the expected defaults. Will update the function documentation and tests.

Create function `add_metadata()` or `add_property()`

Add function to add common properties to descriptor file, e.g.:

package <- add_metadata(package, id = "https://doi.org/10.5281/zenodo.5070086")
  • Would the function check if the parameter names are valid? If so, what about optional ones? Maybe add_property() is better
  • How would the function react if a property is already there? Overwrite with warning?
  • Can we use the add_property() function for adding properties to fields or schemas? If so, can the user give a vector of equal length to the number of properties?
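
A sketch of what add_property() could look like (the name and semantics are still open questions, see above):

# Set a property on a package, keeping or overwriting an existing value
add_property <- function(package, name, value, overwrite = TRUE) {
  if (!is.null(package[[name]]) && !overwrite) {
    warning("Property '", name, "' already set; keeping the existing value")
    return(package)
  }
  package[[name]] <- value
  package
}

# package <- add_property(package, "id", "https://doi.org/10.5281/zenodo.5070086")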

Should resource_names and directory be optional?

The current workflow is:

  1. Read datapackage.json file with package <- read_package(), which creates a list object and adds two convenience terms: resource_names and directory. Those terms are not part of the Frictionless spec.
  2. Read a resource with read_resource(package, "resource name"). It will make use of those two convenience terms.

If users want to start a package object from scratch or use another tool to make it, they need to do the following for read_resource() to work:

  • Need to add resource_names, otherwise they get an error message. Alternative: read_resource() can be written so that it doesn't need this term.
  • Need to add directory, otherwise they get an error that resources could not be found in the root (i.e. no directory). Alternative: offer a way to provide a directory: read_resource(package, "resource_name", directory)

@niconoe @damianooldoni what would be your take on this? Can we expect users to always use read_package() and, if not, to add the two convenience terms manually? Or should we allow read_resource() to work without those convenience terms?

Message users to make sure they can use data

When using read_package(), a generic short message could be printed to the console to remind users to follow data licenses and guidelines and to read the metadata. This would be on by default, but maybe we can add an option to silence it.

One add_resource() test causes segfault

The following test in test-add_resource():

test_that("add_resource() creates resource that can be passed to write_package()", {
pkg <- example_package
df <- data.frame(
"col_1" = c(1, 2),
"col_2" = factor(c("a", "b"), levels = c("a", "b", "c"))
)
pkg <- add_resource(pkg, "new", df)
temp_dir <- tempdir()
expect_invisible(write_package(pkg, temp_dir)) # Can write successfully
unlink(temp_dir, recursive = TRUE)
})

... runs fine, but causes a critical error in 2 tests for other functions:

test_that("check_schema() returns TRUE on valid Table Schema", {
pkg <- example_package
# Can't obtain df using read_resource(), because that function uses
# check_schema() (in get_schema()) internally, which is what we want to test
df <- suppressMessages(
readr::read_csv(file.path(pkg$directory, pkg$resources[[1]]$path))
)
schema_get <- get_schema(pkg, "deployments")
schema_create <- create_schema(df)
expect_true(check_schema(schema_get))
expect_true(check_schema(schema_create))
expect_true(check_schema(schema_get, df))
expect_true(check_schema(schema_create, df))
})

test_that("read_resource() returns a tibble", {
pkg <- example_package
df <- data.frame(
"col_1" = c(1, 2),
"col_2" = factor(c("a", "b"), levels = c("a", "b", "c"))
)
pkg <- add_resource(pkg, "new", df)
expect_s3_class(read_resource(pkg, "deployments"), "tbl") # via path
expect_s3_class(read_resource(pkg, "media"), "tbl") # via data
expect_s3_class(read_resource(pkg, "new"), "tbl") # via df
})

The error is:

 *** caught segfault ***
address 0x68, cause 'memory not mapped'
x | 1      13 | check_schema [0.2s]                       
──────────────────────────────────────────────────────────
Error (test-check_schema.R:5:3): check_schema() returns TRUE on valid Table Schema
Error in `vroom_(file, delim = delim %||% col_types$delim, col_names = col_names, 
    col_types = col_types, id = id, skip = skip, col_select = col_select, 
    name_repair = .name_repair, na = na, quote = quote, trim_ws = trim_ws, 
    escape_double = escape_double, escape_backslash = escape_backslash, 
    comment = comment, skip_empty_rows = skip_empty_rows, locale = locale, 
    guess_max = guess_max, n_max = n_max, altrep = vroom_altrep(altrep), 
    num_threads = num_threads, progress = progress)`: R_Reprotect: only 90 protected items, can't reprotect index 117
Backtrace:
 1. base::suppressMessages(...) test-check_schema.R:5:2
 3. readr::read_csv(file.path(pkg$directory, pkg$resources[[1]]$path))
 4. vroom::vroom(...)
 5. vroom:::vroom_(...)
──────────────────────────────────────────────────────────

I don't know what is causing the issue. @damianooldoni any ideas?

For now, I have disabled the add_resource() test that is causing this in cfe03b7. When fixed, revert that commit.

Create read_resource() function

Create a function to read a resource:

descriptor$resources
#> ["deployments", "multimedia", "observations"]
df <- read_resource(descriptor, "deployments")

Returns a data.frame

  • Should use the resource name (not path) as input
  • Will need a descriptor object to understand paths and fields
  • Adapts the internal read function to the provided CSV dialect
  • If multiple paths are provided, merges data in path order (see the sketch below)
  • If a schema is available, assigns df data types based on schema field types
  • Could provide a schema_sync option to allow syncing CSV columns with the schema based on field names
  • Returns a warning if a data type cannot be cast (sets to character)
  • Could potentially validate data against constraints
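
The multi-path merge could boil down to something like this sketch (dialect and schema handling omitted; read_paths() is illustrative):

library(readr)
library(dplyr)

# Read each path and stack the results in path order
read_paths <- function(paths, col_types = NULL) {
  bind_rows(lapply(paths, read_csv, col_types = col_types))
}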

Can we remove rlist dependency?

We currently use rlist for one step:

# Remove elements that are NULL or empty list
schema <- rlist::list.clean(
  schema,
  function(x) is.null(x) | length(x) == 0L,
  recursive = TRUE
)

It might be worth investigating whether this functionality can be replaced with base R or purrr code, e.g.:
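
A possible base R replacement (a sketch; edge cases such as attributes are not handled):

# Recursively drop elements that are NULL or have length zero
list_clean <- function(x) {
  if (!is.list(x)) return(x)
  x <- lapply(x, list_clean)
  x[!vapply(x, function(e) is.null(e) || length(e) == 0L, logical(1))]
}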
