forestgeo / bciex

Easy access to data for examples -- from Barro Colorado Island, Panama.

Home Page: https://forestgeo.github.io/bciex/

License: Other

R 100.00%

bciex's Issues

Prepare release

Continue work with Suzanne and other reviewers towards release.

Document dealing with elevation data

Show here how elevation data doesn't match census data in terms of its variable names.

census <- fgeo.tool::top(bciex::bci12s7mini, sp, 2)
census

elevation <- bciex::bci_elevation
head(elevation)

# Plot positions are `gx`, `gy` in `census` versus `x`, `y` in `elevation`.
# The names must match, so restructure the elevation data.
elevation <- fgeo.tool::restructure_elev(bciex::bci_elevation)
head(elevation)
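
For readers without `fgeo.tool`, the renaming step on its own can be sketched in base R. This is only an illustration on a made-up data frame (`elev_toy`, its values, and the helper `rename_xy()` are hypothetical). It does not describe what `restructure_elev()` does internally, and the real elevation data may be structured differently.

```r
# Hypothetical elevation-like table; the real data may be structured differently.
elev_toy <- data.frame(
  x = c(0, 5, 10),
  y = c(0, 0, 5),
  elev = c(120.5, 121.0, 123.2)
)

# Rename plot positions so they match the census names `gx` and `gy`.
rename_xy <- function(df) {
  names(df)[names(df) == "x"] <- "gx"
  names(df)[names(df) == "y"] <- "gy"
  df
}

elev_renamed <- rename_xy(elev_toy)
names(elev_renamed)
```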

For examples, subset 20k individuals in total from each tree and stem dataset (2012 release)

From https://github.com/forestgeo/forestr/issues/33

Not twenty individuals from each quadrat … subset to like 20K individuals … let them fall out spatially as they might … we don’t want to even out the spatial distribution.

--@seanmcmc

TASK

Subset the tags of a few trees at random. This will let the individuals fall out spatially as they might (we don’t want to even out the spatial distribution).

Sean proposed using 20k individuals, but for the most common uses (examples and tests) 20k seems too many. I'll start with 1000 individuals and may provide larger datasets separately if we really need them. In any case, the full datasets can be accessed via the bci package and subsetted as needed.
Stuart and Sean suggested using the BCI data released in 2016, but I'll start with the data released in 2012, which is more clearly public via https://repository.si.edu/handle/10088/20925. Also, aiming to use the latest data for examples comes at a high maintenance cost: if examples always had to use the latest available data, all the code that uses them would need updating after every new census. This seems unnecessary trouble.
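
A minimal sketch of the sampling idea, assuming a census-like table with a `tag` column. The toy data, the column name, and the helper `sample_individuals()` are hypothetical and not part of bciex. Tags are sampled rather than rows, so every row of a chosen individual is kept and the sample falls out spatially as it might.

```r
set.seed(1)  # make the random sample reproducible

# Hypothetical census-like table: 5000 trees, each with 1 to 3 rows (stems).
census_toy <- data.frame(
  tag = rep(sprintf("%06d", 1:5000), times = sample(1:3, 5000, replace = TRUE)),
  stringsAsFactors = FALSE
)
census_toy$gx <- runif(nrow(census_toy), 0, 1000)
census_toy$gy <- runif(nrow(census_toy), 0, 500)

# Sample tags (individuals), not rows, so all rows of a chosen tree are kept.
sample_individuals <- function(df, n, tag_col = "tag") {
  tags <- unique(df[[tag_col]])
  chosen <- sample(tags, size = min(n, length(tags)))
  df[df[[tag_col]] %in% chosen, , drop = FALSE]
}

census_mini <- sample_individuals(census_toy, n = 1000)
length(unique(census_mini$tag))
```

No spatial stratification is applied, which is deliberate given the request not to even out the spatial distribution.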

Fix CRAN NOTE

https://cran.r-project.org/web/checks/check_results_fgeo.x.html

checking dependencies in R code ... NOTE
Namespace in Imports field not imported from: ‘memoise’
All declared Imports should be used.

Sample 1 hectare

@seanmcm proposed that one useful way of sampling data for examples and tests is to sample 1 hectare of a plot. That should be particularly useful for spatial analyses.
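
One way this could look, as a sketch only: one hectare is a 100 m x 100 m square, so the subset reduces to a filter on `gx` and `gy`. The toy positions, the corner coordinates, and the helper `sample_hectare()` are assumptions made for illustration.

```r
set.seed(1)

# Hypothetical stem positions in metres on a 1000 m x 500 m plot.
plot_toy <- data.frame(
  gx = runif(2000, 0, 1000),
  gy = runif(2000, 0, 500)
)

# Keep rows inside a square of `side` metres with lower-left corner (x0, y0).
# The default side of 100 m gives one hectare.
sample_hectare <- function(df, x0 = 0, y0 = 0, side = 100) {
  inside <- df$gx >= x0 & df$gx < x0 + side &
    df$gy >= y0 & df$gy < y0 + side
  df[inside, , drop = FALSE]
}

hectare <- sample_hectare(plot_toy, x0 = 200, y0 = 100)
range(hectare$gx)
```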

Release on GitHub and drat (not CRAN)

Prepare for release:

Create pre-release branch.
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
Polish NEWS
- Use temporary header as pkg version.9000 (pre-release)
- Follow https://style.tidyverse.org/news.html
Merge.

Perform release:

Announce

Write blog post
Add link to blog post in pkgdown news menu
Tweet

Template at forestgeo/learn#182 (adapted from https://github.com/r-lib/usethis/issues/338).

Should we convert non ASCII characters? If so, how?

The datasets bci_species and bci_wood_density have non ASCII characters, which throw a warning during checks run prior to building an R package. In each dataset, below are the variables and values where non ASCII characters are detected, and a conversion to consider. Notice that the conversion is poor, so we may need to find a better way to convert.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(bciex)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     contains, order_by



detect_ascii <- function(x) {!grepl("ASCII", stringi::stri_enc_mark(x))}
show_non_ascii <- function(x) {unique(x[detect_ascii(x)])}

compare_non_ascii_to_converted <- function(x) {
  non_ascii <- x %>%
    select_if(is.character) %>%
    map(show_non_ascii) %>%
    discard(is.na(.)) %>%
    discard(map(., length) == 0)

  converted <- map(non_ascii, stringi::stri_trans_general, "latin-ascii")

  map2(non_ascii, converted, data.frame) %>%
  map(set_names, c("non_ascii", "converted"))
}


# bci_species -------------------------------------------------------------

compare_non_ascii_to_converted(bci_species)

#> $Latin
#>                  non_ascii                converted
#> 1 Inga sp.34(hoja_pequeña) Inga sp.34(hoja_pequena)


#> $Species
#>             non_ascii           converted
#> 1 sp.34(hoja_pequeña) sp.34(hoja_pequena)


#> $Authority
#>                                 non_ascii
#> 1                      (Müll.Arg.) Hemsl.
#> 2                   Moc. & Sessé ex Dunal
#> 3                               Müll.Arg.
#> 4  (Planch. & Linden) C. Ulloa & P. Jørg.
#> 5                                   Sessé
#> 6                     (Aubrév.) T.D.Penn.
#> 7                          (Kunth) Cortés
#> 8            (Willd. ex A.Juss.) Müll.Arg
#> 9                        (Tul.) Müll.Arg.
#> 10                            P.E.Sánchez
#> 11                         (Aubrév.) Pilz
#> 12                  Q.Jiménez & T.D.Penn.
#> 13                             MÃ¼ll.Arg.
#> 14                                L'HÃ©r.
#> 15                    Benth. ex Müll.Arg.
#> 16                                Allemão
#> 17          (Willd. ex Schult.) Müll.Arg.
#> 18          (Moc. & Sessé ex DC.) Standl.
#> 19                                   <NA>
#> 20                                 Trécul
#> 21                                J. León
#> 22                           Sessé & Moc.
#> 23             (Cav.) B.Ståhl & Källersjö
#> 24                 (Sw.) Gómez de la Maza

#>                                 converted
#> 1                      (Mull.Arg.) Hemsl.
#> 2                   Moc. & Sesse ex Dunal
#> 3                               Mull.Arg.
#> 4  (Planch. & Linden) C. Ulloa & P. Jorg.
#> 5                                   Sesse
#> 6                     (Aubrev.) T.D.Penn.
#> 7                          (Kunth) Cortes
#> 8            (Willd. ex A.Juss.) Mull.Arg
#> 9                        (Tul.) Mull.Arg.
#> 10                            P.E.Sanchez
#> 11                         (Aubrev.) Pilz
#> 12                  Q.Jimenez & T.D.Penn.
#> 13                          MA 1/4ll.Arg.
#> 14                              L'HA(C)r.
#> 15                    Benth. ex Mull.Arg.
#> 16                                Allemao
#> 17          (Willd. ex Schult.) Mull.Arg.
#> 18          (Moc. & Sesse ex DC.) Standl.
#> 19                                   <NA>
#> 20                                 Trecul
#> 21                                J. Leon
#> 22                           Sesse & Moc.
#> 23             (Cav.) B.Stahl & Kallersjo
#> 24                 (Sw.) Gomez de la Maza



# bci_wood_density --------------------------------------------------------

compare_non_ascii_to_converted(bci_wood_density)
#> $species
#>                        non_ascii                        converted
#> 1                           <NA>                             <NA>
#> 2                     bigll3Ã<U+0082>Â¡                       bigll3A,A¡
#> 3                     pequeÃ<U+0083>Â±a                       pequeAfA±a
#> 4       sp. Ã¢â<U+0082>¬Ë<U+009C>hairyÃ¢â<U+0082>¬â<U+0084>¢       sp. A¢a,¬EoehairyA¢a,¬a,,¢
#> 5                   Ã¢â<U+0082>¬Ë<U+009C>giant                    A¢a,¬Eoegiant
#> 6   dewevrei (De Wild.) J.LÃâ<U+0082>¬     dewevrei (De Wild.) J.LA-a,¬
#> 7 normandii AubrÃâ<U+0082>¬Å<U+0092>Â©v. & Pe normandii AubrA-a,¬A'A(C)v. & Pe
#> 8 pellegrinianum (J.LÃâ<U+0082>¬Å<U+0092>Â©on pellegrinianum (J.LA-a,¬A'A(C)on

Update package with data sent by Suzanne on 2018-03-20

Data added at this commit https://goo.gl/UntzyS

Document data as generically as possible

Follows https://github.com/forestgeo/forestr/issues/33, which may be moved here.

Submit to CRAN?

We may submit this package to CRAN easily. We just need to solve the issues with non-ASCII characters found in the species and wood density data, or exclude those datasets completely. The benefit of releasing to CRAN is that users can install this package directly with install.packages("bciex"), i.e. they don't need devtools to install from GitHub via devtools::install_github("forestgeo/bciex"). Also, if bciex lives on CRAN, it is very easy to import from any other package that needs it, allowing us to avoid duplicating data for examples. Finally, this allows using Travis for continuous integration, which otherwise fails if a package is not found.

Document unique identifiers

(via https://goo.gl/Fn4TDE)

Suzanne explained which variables identify uniquely each row of the following data sets:

ViewFullTable
tree tables
stem tables

A key point that cleared up my original confusion was this:

In the OLD version of the BCI R tables, StemID refers to the Stemnumber that is in ViewFullTable, not StemID. In all the NEW R tables, StemID is unique.

In conclusion

Table: unique identifier
-------------------------
R Tree tables:  TreeID
R Stem tables:  StemID
ViewFullTable:  DBHID

This explains why StemID is not a unique identifier of ViewFullTable:

StemID uniquely identifies all stems. DBHID uniquely identifies all dbh measurements. ... If you are looking for a unique identifier in ViewFullTable, DBHID is the unique identifier. That is because a StemID can be measured several times, once for each census that it is alive.
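
A quick way to check this kind of claim on any table is to test each candidate column for duplicates. The sketch below uses a made-up table (`vft_toy`) with ViewFullTable-style column names. The values and the helper `is_unique_id()` are hypothetical.

```r
# Hypothetical ViewFullTable-like rows: two stems, each measured in two censuses.
vft_toy <- data.frame(
  DBHID = 1:4,
  StemID = c(10, 10, 11, 11),
  TreeID = c(1, 1, 1, 1),
  CensusID = c(1, 2, 1, 2)
)

# TRUE if `col` has no repeated values, i.e. it identifies each row uniquely.
is_unique_id <- function(df, col) {
  anyDuplicated(df[[col]]) == 0L
}

vapply(
  c("DBHID", "StemID", "TreeID"),
  function(col) is_unique_id(vft_toy, col),
  logical(1)
)
```

In this toy table only `DBHID` passes, because each stem appears once per census.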

What I need to do

I need to update all packages that use OLD data, where Stem tables lack a unique identifier.

One cheap solution is to fix the variable StemID so that it is indeed a unique identifier, and to document this action well. This should work well for datasets whose purpose is only demonstration, not data accuracy.

Another more expensive solution is to add a NEW version of the BCI data, and maybe remove the OLD version (again, NEW and OLD here mean with or without a unique identifier for the Stem tables).

Push to bitbucket for Continuous Integration for free

Add new version of the BCI ViewFullTable

Why don’t you take the new version of the BCI ViewFullTable and just subset one hectare from it to use in your samples?

What column name should habitat data have, habitat or habitats?

Hi @laosuz,

Should habitat data have column name habitat or habitats?

I noticed that the habitat data you sent me (which is here in bciex) has column name habitat. But that differs from the habitat data I saw from the Pasoh plot, which has column name habitats (ends in s).

names(bciex::bci_habitat)
#> [1] "x"       "y"       "habitat"

names(pasoh::pasoh_hab_index20)
#> [1] "x"        "y"        "habitats" "index5"   "index20"

Luckily, this small detail broke some code so I could notice the issue.
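
Until the naming is settled, downstream code could guard against the difference by normalizing the column name on input. The sketch below is illustrative only: the toy tables and the helper `standardize_habitat_name()` are hypothetical, and choosing `habitats` as the target name is arbitrary, not a decision.

```r
# Hypothetical habitat tables with the two spellings seen above.
hab_bci <- data.frame(x = c(0, 20), y = c(0, 0), habitat = c(1L, 2L))
hab_pasoh <- data.frame(x = c(0, 20), y = c(0, 0), habitats = c(3L, 1L))

# Rename `habitat` to `habitats` when only the singular form is present.
standardize_habitat_name <- function(df) {
  if ("habitat" %in% names(df) && !("habitats" %in% names(df))) {
    names(df)[names(df) == "habitat"] <- "habitats"
  }
  df
}

names(standardize_habitat_name(hab_bci))
names(standardize_habitat_name(hab_pasoh))
```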