
bciex's People

Contributors: laosuz, maurolepore

Forkers: fdbesanto2

bciex's Issues

Prepare release

Continue work with Suzanne and other reviewers towards release.

Document dealing with elevation data

Show here how the variable names of the elevation data don't match those of the census data.

census <- fgeo.tool::top(bciex::bci12s7mini, sp, 2)
census

elevation <- bciex::bci_elevation
head(elevation)

# Plot positions are `gx`, `gy` in `census` but `x`, `y` in `elevation`.
# They must have the same names, so restructure the elevation data.
elevation <- fgeo.tool::restructure_elev(bciex::bci_elevation)
head(elevation)
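For readers without fgeo.tool, the renaming step itself can be sketched in base R. This is a minimal illustration, not how `restructure_elev()` is implemented. It assumes the elevation data is a data frame with columns `x`, `y` and `elev`, and the toy values below are made up.

```r
# Toy stand-ins (made-up values) for census and elevation data.
census <- data.frame(tag = c("001", "002"), gx = c(0, 20), gy = c(0, 0))
elevation <- data.frame(x = c(0, 20), y = c(0, 0), elev = c(120.5, 121.3))

# Rename the plot coordinates so they match the census names (`gx`, `gy`).
names(elevation)[names(elevation) == "x"] <- "gx"
names(elevation)[names(elevation) == "y"] <- "gy"

# Both tables now share the same coordinate columns.
intersect(names(census), names(elevation))
#> [1] "gx" "gy"
```

Renaming by matching the old name, rather than by position, leaves any other columns untouched and does nothing if a name is absent.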

For examples, subset 20k individuals in total from each tree and stem dataset (2012 release)

From https://github.com/forestgeo/forestr/issues/33

Not twenty individuals from each quadrat … subset to like 20K individuals … let them fall out spatially as they might … we don’t want to even out the spatial distribution.

--@seanmcmc

TASK

Subset the tags of trees sampled at random. This will let the individuals fall out spatially as they might (we don't want to even out the spatial distribution).

  • Sean proposed using 20k individuals, but for the most common use, in examples and tests, 20k seems too much. I'll start with 1000 individuals and may provide larger datasets (separately) if we really need them. In any case, the full datasets can be accessed via the bci package and subsetted as needed.

  • Stuart and Sean suggested using the BCI data released in 2016, but I'll start with the data released in 2012. The 2012 data is more clearly public via https://repository.si.edu/handle/10088/20925. Also, aiming to use the latest data for examples comes at a high maintenance cost: if examples always had to use the latest available data, all the code that uses them would need to be updated after every new census. This seems like unnecessary trouble.
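A minimal sketch of the subsetting, assuming the tree and stem tables share a `tag` column; the toy tables and their column names are hypothetical. Sampling tags rather than rows keeps every stem of each selected tree, and because tags are drawn without regard to position, the sample keeps whatever spatial pattern the full plot has.

```r
set.seed(1)  # fixed seed so the example subset is reproducible

# Toy tree table (one row per tree) and stem table (two stems per tree).
n_trees <- 5000
tree <- data.frame(
  tag = sprintf("%05d", seq_len(n_trees)),
  gx = runif(n_trees, 0, 1000),
  gy = runif(n_trees, 0, 500)
)
stem <- data.frame(
  tag = rep(tree$tag, each = 2),
  stem_id = seq_len(2 * n_trees)
)

# Draw 1000 distinct tags at random, ignoring position.
tags <- sample(unique(tree$tag), size = 1000)

# Keep the same individuals in both tables.
tree_sub <- tree[tree$tag %in% tags, ]
stem_sub <- stem[stem$tag %in% tags, ]

nrow(tree_sub)  # 1000 trees
nrow(stem_sub)  # 2000 stems: every stem of each sampled tree
```

Tags are character here, which also avoids the `sample()` pitfall where a single numeric value `n` is treated as `1:n`.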

Sample 1 hectare

@seanmcm proposed that one useful way of sampling data for examples and tests is to sample 1 hectare of a plot. That should be particularly useful for spatial analyses.
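One way to sketch this, assuming `gx` and `gy` are in metres (as in the BCI census tables) and that the window's lower-left corner is chosen by hand. One hectare is 10,000 m², i.e. a 100 m × 100 m square. The helper `sample_hectare()` and the toy data are hypothetical.

```r
# Hypothetical helper: keep rows inside a 100 m x 100 m (1 ha) window whose
# lower-left corner is (x0, y0). Intervals are half-open, so adjacent
# hectares don't share stems. Rows with missing coordinates are dropped.
sample_hectare <- function(census, x0 = 0, y0 = 0, side = 100) {
  inside <- census$gx >= x0 & census$gx < x0 + side &
    census$gy >= y0 & census$gy < y0 + side
  census[inside & !is.na(inside), ]
}

set.seed(1)
census <- data.frame(
  tag = sprintf("%05d", 1:2000),
  gx = runif(2000, 0, 1000),  # BCI plot: 1000 m x 500 m
  gy = runif(2000, 0, 500)
)

ha <- sample_hectare(census, x0 = 200, y0 = 100)
range(ha$gx)
range(ha$gy)
```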

Release on GitHub and drat (not CRAN)

Prepare for release:

  • Create pre-release branch.

  • devtools::check_win_devel()

  • rhub::check_for_cran()

  • revdepcheck::revdep_check(num_workers = 4)

  • Polish NEWS

  • Merge.

Perform release:

  • Create release branch

  • Bump version (in DESCRIPTION and NEWS)

  • Walk through devtools::release() (but don't submit).

    • Have you updated packages with (update.packages())?
    • Have you run R CMD check locally?
    • Have you checked for spelling errors (with spell_check())?
    • Were devtool's checks successful?
    • Have you checked on R-hub (with check_rhub())?
    • Have you checked on win-builder (with check_win_devel())?
    • Have you updated NEWS.md file?
    • Have you updated DESCRIPTION (with use_tidy_version() and use_tidy_description())?
    • Have you updated cran-comments.md?
  • Merge

  • Check that site built OK

  • Release on GitHub

  • Release on drat

  • Bump dev version

Announce

  • Write blog post

  • Add link to blog post in pkgdown news menu

  • Tweet

Template at forestgeo/learn#182 (adapted from https://github.com/r-lib/usethis/issues/338).

Should we convert non-ASCII characters? If so, how?

The datasets bci_species and bci_wood_density have non-ASCII characters, which throw a warning during the checks run before building an R package. Below, for each dataset, are the variables and values where non-ASCII characters are detected, next to a candidate conversion. Notice that the conversion is poor, so we may need to find a better way to convert.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(bciex)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     contains, order_by



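# TRUE for strings whose encoding mark is not "ASCII", i.e. non-ASCII strings.
# Missing values are not marked "ASCII" either, so NA is flagged too
# (hence the <NA> rows below).
is_non_ascii <- function(x) !grepl("ASCII", stringi::stri_enc_mark(x))
show_non_ascii <- function(x) unique(x[is_non_ascii(x)])

# For each character column, list its distinct non-ASCII values next to a
# Latin-to-ASCII transliteration of them.
compare_non_ascii_to_converted <- function(x) {
  non_ascii <- x %>%
    select_if(is.character) %>%
    map(show_non_ascii) %>%
    discard(is.na(.)) %>%          # drop columns whose only hit is NA
    discard(lengths(.) == 0)       # drop columns with no hits

  converted <- map(non_ascii, stringi::stri_trans_general, "latin-ascii")

  map2(non_ascii, converted, data.frame) %>%
    map(set_names, c("non_ascii", "converted"))
}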


# bci_species -------------------------------------------------------------

compare_non_ascii_to_converted(bci_species)

#> $Latin
#>                  non_ascii                converted
#> 1 Inga sp.34(hoja_pequeña) Inga sp.34(hoja_pequena)


#> $Species
#>             non_ascii           converted
#> 1 sp.34(hoja_pequeña) sp.34(hoja_pequena)


#> $Authority
#>                                 non_ascii
#> 1                      (Müll.Arg.) Hemsl.
#> 2                   Moc. & Sessé ex Dunal
#> 3                               Müll.Arg.
#> 4  (Planch. & Linden) C. Ulloa & P. Jørg.
#> 5                                   Sessé
#> 6                     (Aubrév.) T.D.Penn.
#> 7                          (Kunth) Cortés
#> 8            (Willd. ex A.Juss.) Müll.Arg
#> 9                        (Tul.) Müll.Arg.
#> 10                            P.E.Sánchez
#> 11                         (Aubrév.) Pilz
#> 12                  Q.Jiménez & T.D.Penn.
#> 13                             Müll.Arg.
#> 14                                L'Hér.
#> 15                    Benth. ex Müll.Arg.
#> 16                                Allemão
#> 17          (Willd. ex Schult.) Müll.Arg.
#> 18          (Moc. & Sessé ex DC.) Standl.
#> 19                                   <NA>
#> 20                                 Trécul
#> 21                                J. León
#> 22                           Sessé & Moc.
#> 23             (Cav.) B.Ståhl & Källersjö
#> 24                 (Sw.) Gómez de la Maza

#>                                 converted
#> 1                      (Mull.Arg.) Hemsl.
#> 2                   Moc. & Sesse ex Dunal
#> 3                               Mull.Arg.
#> 4  (Planch. & Linden) C. Ulloa & P. Jorg.
#> 5                                   Sesse
#> 6                     (Aubrev.) T.D.Penn.
#> 7                          (Kunth) Cortes
#> 8            (Willd. ex A.Juss.) Mull.Arg
#> 9                        (Tul.) Mull.Arg.
#> 10                            P.E.Sanchez
#> 11                         (Aubrev.) Pilz
#> 12                  Q.Jimenez & T.D.Penn.
#> 13                          MA 1/4ll.Arg.
#> 14                              L'HA(C)r.
#> 15                    Benth. ex Mull.Arg.
#> 16                                Allemao
#> 17          (Willd. ex Schult.) Mull.Arg.
#> 18          (Moc. & Sesse ex DC.) Standl.
#> 19                                   <NA>
#> 20                                 Trecul
#> 21                                J. Leon
#> 22                           Sesse & Moc.
#> 23             (Cav.) B.Stahl & Kallersjo
#> 24                 (Sw.) Gomez de la Maza



# bci_wood_density --------------------------------------------------------

compare_non_ascii_to_converted(bci_wood_density)
#> $species
#>                        non_ascii                        converted
#> 1                           <NA>                             <NA>
#> 2                     bigll3Ã<U+0082>¡                       bigll3A,A¡
#> 3                     pequeÃ<U+0083>±a                       pequeAfA±a
#> 4       sp. ââ<U+0082>¬Ë<U+009C>hairyââ<U+0082>¬â<U+0084>¢       sp. A¢a,¬EoehairyA¢a,¬a,,¢
#> 5                   ââ<U+0082>¬Ë<U+009C>giant                    A¢a,¬Eoegiant
#> 6   dewevrei (De Wild.) J.Líâ<U+0082>¬     dewevrei (De Wild.) J.LA-a,¬
#> 7 normandii Aubríâ<U+0082>¬Å<U+0092>©v. & Pe normandii AubrA-a,¬A'A(C)v. & Pe
#> 8 pellegrinianum (J.Líâ<U+0082>¬Å<U+0092>©on pellegrinianum (J.LA-a,¬A'A(C)on
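The `<NA>` rows above show that the helper treats missing values as non-ASCII. A base-R alternative that reports missing values as FALSE can be sketched with `iconv()`, which returns NA when a string cannot be converted to ASCII. The helper name is hypothetical, and it assumes the strings are UTF-8 or marked latin1 (which `enc2utf8()` converts first).

```r
# Hypothetical helper: TRUE for non-missing strings containing non-ASCII
# characters. A non-missing input that iconv() turns into NA could not be
# represented in ASCII.
is_non_ascii <- function(x) {
  x <- enc2utf8(x)
  !is.na(x) & is.na(iconv(x, from = "UTF-8", to = "ASCII"))
}

x <- c("Inga sp.34(hoja_peque\u00f1a)", "(Kunth) Cortes", NA)
is_non_ascii(x)
#> [1]  TRUE FALSE FALSE
```

This only detects the problem; it does not produce a better conversion than the one shown above.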

Submit to CRAN?

We could submit this package to CRAN easily. We just need to solve the issues with non-ASCII characters found in the species and wood density data, or exclude those datasets completely. The benefit of releasing to CRAN is that users can install this package directly with install.packages("bciex"), i.e. they don't need devtools to install it from GitHub via devtools::install_github("forestgeo/bciex"). Also, if bciex lives on CRAN, it is very easy to import from any other package that needs it, allowing us to avoid duplicating data for examples. Finally, this allows using Travis for continuous integration, which otherwise fails when a dependency is not found.

Document unique identifiers

(via https://goo.gl/Fn4TDE)

Suzanne explained which variables uniquely identify each row of the following datasets:

  • ViewFullTable
  • tree tables
  • stem tables

A key to clarifying my original confusion was this:

In the OLD version of the BCI R tables, StemID refers to the Stemnumber that is in ViewFullTable, not StemID. In all the NEW R tables, StemID is unique.

In conclusion:

Table            Unique identifier
--------------   -----------------
R Tree tables    TreeID
R Stem tables    StemID
ViewFullTable    DBHID

This explains why StemID is not a unique identifier of ViewFullTable:

StemID uniquely identifies all stems. DBHID uniquely identifies all dbh measurements. ... If you are looking for a unique identifier in ViewFullTable, DBHID is the unique identifier. That is because a StemID can be measured several times, once for each census that it is alive.
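A quick way to confirm which column is a unique identifier is to test it for duplicates. The ViewFullTable-like toy data below is made up: one stem measured in two censuses keeps its `StemID` but gets two `DBHID`s. The helper `is_unique_id()` is hypothetical.

```r
# Hypothetical helper: TRUE if `col` has no missing or duplicated values.
is_unique_id <- function(data, col) {
  ids <- data[[col]]
  !anyNA(ids) && anyDuplicated(ids) == 0
}

# Toy ViewFullTable-like data: stem 10 was measured in censuses 1 and 2.
vft <- data.frame(
  DBHID = c(101, 102, 103),
  StemID = c(10, 10, 11),
  CensusID = c(1, 2, 1)
)

is_unique_id(vft, "StemID")  # FALSE: a stem repeats once per census
is_unique_id(vft, "DBHID")   # TRUE: one row per dbh measurement
```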

What I need to do

I need to update all packages that use OLD data, where Stem tables lack a unique identifier.

One cheap solution is to fix the variable StemID so that it is indeed a unique identifier, and document this action well. This should work well for datasets whose purpose is only demonstration, not data accuracy.
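A minimal sketch of the cheap fix. It assumes an OLD-style stem table where `StemID` holds the stem number within each tree (so it repeats across trees) and `TreeID` identifies the tree. The toy data and the new column `stemnumber` are hypothetical. Combining tree and stem number gives a key that is unique per stem, and numbering the distinct keys yields a new, unique `StemID`.

```r
# Toy OLD-style stem table: StemID is the stem number within each tree.
stem <- data.frame(
  TreeID = c(1, 1, 2, 3, 3),
  StemID = c(1, 2, 1, 1, 2)
)
anyDuplicated(stem$StemID) > 0  # TRUE: StemID is not unique

# Keep the original value under a clearer name, then build a unique StemID
# from the (TreeID, stem number) pair, numbered by first appearance.
stem$stemnumber <- stem$StemID
key <- paste(stem$TreeID, stem$stemnumber, sep = "_")
stem$StemID <- match(key, unique(key))

stem$StemID
#> [1] 1 2 3 4 5
anyDuplicated(stem$StemID) == 0  # TRUE
```

Keeping the original stem number in its own column makes the change easy to document and to reverse.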

Another, more expensive solution is to add a NEW version of the BCI data, and maybe remove the OLD version (again, NEW and OLD here mean with or without a unique identifier in the stem tables).

What column name should habitat data have, habitat or habitats?

Hi @laosuz,

Should habitat data have column name habitat or habitats?

I noticed that the habitat data you sent me (which is here in bciex) has column name habitat. But that is different from the habitat data I saw from the Pasoh plot, which has column name habitats (ending in s).

names(bciex::bci_habitat)
#> [1] "x"       "y"       "habitat"

names(pasoh::pasoh_hab_index20)
#> [1] "x"        "y"        "habitats" "index5"   "index20"

Luckily, this small detail broke some code, so I noticed the issue.
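Until the naming is settled, code that reads either dataset could normalize the column name first. A minimal sketch: the helper `standardize_habitat()` is hypothetical and picks `habitat` (the bciex name) only for illustration, not as an answer to the question. The toy data mimics the column names of `pasoh::pasoh_hab_index20` with made-up values.

```r
# Hypothetical helper: rename a `habitats` column to `habitat`, if present
# and no `habitat` column already exists.
standardize_habitat <- function(data) {
  if ("habitats" %in% names(data) && !("habitat" %in% names(data))) {
    names(data)[names(data) == "habitats"] <- "habitat"
  }
  data
}

# Toy data shaped like pasoh::pasoh_hab_index20 (values made up).
pasoh_like <- data.frame(x = c(0, 20), y = c(0, 0), habitats = c(1, 2),
                         index5 = c(1, 1), index20 = c(1, 2))
names(standardize_habitat(pasoh_like))
#> [1] "x"       "y"       "habitat" "index5"  "index20"
```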
