Multi class data set

I've been working on the {themis} documentation, and I had a hard time finding a good example data set that

  1. Includes multiple classes (4 or more)
  2. Doesn't have any missing data in the response
  3. Has a response that is reasonable for the data set

For now, I'm using the credit_data data set with the Home variable, but I need to do some prep work:

library(dplyr)
data(credit_data, package = "modeldata")
credit_data0 <- credit_data %>%
  filter(Home != "ignore") %>%
  mutate(Home = as.character(Home))

For illustration, it works fine, but I worry about giving the wrong signal by balancing over a variable that is obviously not a response.

Have data sets with character variables

From a quick look through the data sets, I couldn't find any that had categorical variables encoded as character variables. It is good practice to have these variables encoded as factor variables, but it does make it harder to create good examples of turning character variables into factor variables.

I don't know the fix, since changing these variables could cause annoying downstream problems.
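
Until such a data set exists, one possible workaround for documentation is to build the character version on the fly and then show the conversion. This is only a sketch of that idea, not something the package provides; it uses base R and the built-in iris data so that no existing data set has to change:

```r
# Start from a built-in data set whose class column is a factor,
# and make a copy in which that column is a plain character vector.
iris_chr <- iris
iris_chr$Species <- as.character(iris_chr$Species)
class(iris_chr$Species)

# Turn the character column back into a factor. The levels are spelled out
# so that their order is explicit rather than left to alphabetical sorting.
iris_fct <- iris_chr
iris_fct$Species <- factor(
  iris_fct$Species,
  levels = c("setosa", "versicolor", "virginica")
)
levels(iris_fct$Species)
```

The downside is that every example has to repeat the as.character() step, which is the same prep work described for credit_data above.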

Maybe use a cache for larger data sets

To get around CRAN's package size limit, we could have URLs that point to data sets living on GitHub in this repo, and then cache the downloaded files on the user's machine.

I imagine it would look like:

data_ames <- function() {
  if (has_data_in_cache("ames")) {
  } else {
  }
}

We could follow the lead of pak, which uses the following function to determine where R's global permanent cache is:

and then we could save into <cache-path>/model-data/ames.rds

To be even faster, we would only load the data once per R session. Once we load it from the cache directory, we would store it in an environment internal to modeldata and pull it from there each time data_ames() is called. So it might look more like:

data_ames <- function() {
  if (has_data_in_internal_environment("ames")) {
  } else if (has_data_in_cache("ames")) {
  } else {
  }
}

The datasets themselves would actually live in a folder in this repo that would be .Rbuildignore-d. For example: inst/data/ames.rds and then ignore inst/data
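
To make the three-level lookup above concrete, here is a rough sketch of how it could fit together. Everything in it is an assumption for illustration: the helper names (cache_dir, cache_path, download_data, get_cached_data) are hypothetical, the base URL is a placeholder, and tools::R_user_dir() (available in R >= 4.0.0) stands in for the cache-location function, which may not be what pak uses.

```r
# Session-level store: lives as long as the R session does.
.data_env <- new.env(parent = emptyenv())

# Hypothetical helper: where cached files go. The "modeldata" folder name
# is an assumption; tools::R_user_dir() requires R >= 4.0.0.
cache_dir <- function() {
  tools::R_user_dir("modeldata", which = "cache")
}

cache_path <- function(name, dir = cache_dir()) {
  file.path(dir, paste0(name, ".rds"))
}

# Hypothetical downloader. `base_url` is a placeholder, not a real location.
download_data <- function(name, dest, base_url = "https://example.com/data") {
  utils::download.file(
    paste0(base_url, "/", name, ".rds"),
    destfile = dest, mode = "wb", quiet = TRUE
  )
}

get_cached_data <- function(name, dir = cache_dir(), fetch = download_data) {
  # 1. Already loaded in this session: return it straight away.
  if (exists(name, envir = .data_env, inherits = FALSE)) {
    return(get(name, envir = .data_env, inherits = FALSE))
  }
  # 2. Not on disk yet: fetch the file into the cache directory.
  path <- cache_path(name, dir)
  if (!file.exists(path)) {
    dir.create(dir, recursive = TRUE, showWarnings = FALSE)
    fetch(name, path)
  }
  # 3. Read from disk and remember the result for later calls.
  data <- readRDS(path)
  assign(name, data, envir = .data_env)
  data
}

data_ames <- function() get_cached_data("ames")
```

Passing the downloader in as the `fetch` argument is a design choice made for this sketch: it lets the lookup logic be exercised in tests without network access.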

sparse data set

Mainly to be used in examples and unit tests.

It would be nice if {Matrix} didn't need to be added as a Suggests/Imports dependency.

Add str(dataset) to all data sets

I preface this issue by acknowledging that this package should be kept as small as possible.

It would be nice if all the data sets had str(dataset) in examples like Chicago does. It makes it a little easier to quickly find a data set with the features you need.

Consider replacing okc_text dataset

There was some controversy with this dataset and OkCupid data in general, and in some ways okc_text feels a little icky to use.

In addition, the data doesn't lend itself to modeling tasks well.

It would be nice if we could find a clearly public dataset that doesn't have these problems.

As far as I can see, {textrecipes} is the only package using it, so the repercussions from this change will mostly be on my shoulders.

CC @juliasilge

call `str(dataset)` in examples

This wouldn't make much of a difference when viewing help files, but it would make the pkgdown site much more informative. For example, when looking for a dataset with a date column or with a certain number of rows, I could quickly click through links in the Reference page rather than calling str() on each dataset in the console:

example without str


example with str

Can't access data sets using `::`


Thanks for this package and all the work put in tmwr!

I noticed I can't access a data set using ::. Is there a particular reason why data sets cannot be accessed this way, as they can in other data packages?


#> Error: 'ames' is not an exported object from 'namespace:modeldata'
#> Error: 'credit_data' is not an exported object from 'namespace:modeldata'
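
One likely explanation, assuming this version of modeldata does not lazy-load its data, is that `::` only finds exported objects and lazy-loaded data sets. In that case utils::data() still works. The sketch below wraps it in a small helper so the data set does not land in the global environment; the helper name load_pkg_data is hypothetical, not part of any package.

```r
# Load one data set from a package into a private environment and return it.
load_pkg_data <- function(name, package) {
  env <- new.env(parent = emptyenv())
  utils::data(list = name, package = package, envir = env)
  get(name, envir = env, inherits = FALSE)
}

# Works with any installed package that ships data, for example:
cars_df <- load_pkg_data("mtcars", "datasets")

# Assuming modeldata is installed, the same call would be:
# ames <- load_pkg_data("ames", "modeldata")
```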

# example 1
#> # A tibble: 1,924,665 × 5
#>     year sex   name          n   prop
#>    <dbl> <chr> <chr>     <int>  <dbl>
#>  1  1880 F     Mary       7065 0.0724
#>  2  1880 F     Anna       2604 0.0267
#>  3  1880 F     Emma       2003 0.0205
#>  4  1880 F     Elizabeth  1939 0.0199
#>  5  1880 F     Minnie     1746 0.0179
#>  6  1880 F     Margaret   1578 0.0162
#>  7  1880 F     Ida        1472 0.0151
#>  8  1880 F     Alice      1414 0.0145
#>  9  1880 F     Bertha     1320 0.0135
#> 10  1880 F     Sarah      1288 0.0132
#> # … with 1,924,655 more rows

# example 2
#> # A tibble: 16 × 2
#>    carrier name                       
#>    <chr>   <chr>                      
#>  1 9E      Endeavor Air Inc.          
#>  2 AA      American Airlines Inc.     
#>  3 AS      Alaska Airlines Inc.       
#>  4 B6      JetBlue Airways            
#>  5 DL      Delta Air Lines Inc.       
#>  6 EV      ExpressJet Airlines Inc.   
#>  7 F9      Frontier Airlines Inc.     
#>  8 FL      AirTran Airways Corporation
#>  9 HA      Hawaiian Airlines Inc.     
#> 10 MQ      Envoy Air                  
#> 11 OO      SkyWest Airlines Inc.      
#> 12 UA      United Air Lines Inc.      
#> 13 US      US Airways Inc.            
#> 14 VX      Virgin America             
#> 15 WN      Southwest Airlines Co.     
#> 16 YV      Mesa Airlines Inc.

Created on 2022-05-10 by the reprex package (v2.0.1)

Session info
#> ─ Session info ────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22 ucrt)
#>  os       Windows 10 x64 (build 22000)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Spanish_Spain.utf8
#>  ctype    Spanish_Spain.utf8
#>  tz       America/Santiago
#>  date     2022-05-10
#>  pandoc @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
#> ─ Packages ────────────────────────────────────────────────────────────────────
#>  package      * version date (UTC) lib source
#>  babynames    * 1.0.1   2021-04-12 [1] CRAN (R 4.2.0)
#>  cli            3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
#>  crayon         1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
#>  digest         0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate       0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi          1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs             1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  glue           1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr          0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
#>  knitr          1.39    2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle      1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  modeldata    * 0.1.1   2021-07-14 [1] CRAN (R 4.2.0)
#>  nycflights13 * 1.0.2   2021-04-12 [1] CRAN (R 4.2.0)
#>  pillar         1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  reprex         2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang          1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
#>  rmarkdown      2.14    2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi     0.13    2020-11-12 [1] CRAN (R 4.2.0)
#>  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi        1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  tibble         3.1.6   2021-11-07 [1] CRAN (R 4.2.0)
#>  utf8           1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs          0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
#>  withr          2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun           0.30    2022-03-02 [1] CRAN (R 4.2.0)
#>  yaml           2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#>  [1] C:/Users/jbkun/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.0/library
#> ───────────────────────────────────────────────────────────────────────────────

Thanks so much in advance.
Kind regards,

URL for rcompanion is currently bad

In the docs for crickets, we have this:


Lines 13 to 14 in df553cb

#' @source Mangiafico, S. 2015. "An R Companion for the Handbook of Biological
#' Statistics." \url{}.

That URL currently gives:

Error: Could not resolve host:

I don't know at this point if it is down for good or not, but let's check it out before the next release.

Move `master` branch to `main`

The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.

That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.

The purpose of this issue is to:

  • Help us firm up the list of targeted repositories
  • Make sure all maintainers are aware of what's coming
  • Give us an issue to close when the job is done
  • Give us a place to put advice for collaborators re: how to adapt

message id: euphoric_snowdog

