forestgeo / bciex
Easy access to data for examples -- from Barro Colorado Island, Panama.
Home Page: https://forestgeo.github.io/bciex/
License: Other
Continue work with Suzanne and other reviewers towards release.
Show here how elevation data doesn't match census data in terms of its variable names.
census <- fgeo.tool::top(bciex::bci12s7mini, sp, 2)
census
elevation <- bciex::bci_elevation
head(elevation)
# Plot positions are `gx`, `gy` in `census` versus `x`, `y` in `elevation`.
# They must have the same name. Fixing elevation
elevation <- fgeo.tool::restructure_elev(bciex::bci_elevation)
head(elevation)
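The mismatch can also be checked and fixed by hand. The following is a minimal sketch in base R, assuming a made-up data frame (`elevation_toy`) that stands in for the elevation data and assuming that a plain rename of `x`, `y` to `gx`, `gy` is all that is needed. Whatever else `fgeo.tool::restructure_elev()` does is not reproduced here.

```r
# Toy stand-in for the elevation data (values are made up for illustration).
elevation_toy <- data.frame(
  x = c(0, 5, 10),
  y = c(0, 0, 0),
  elev = c(120.1, 120.6, 121.3)
)

# Names that the census data uses for plot positions.
census_names <- c("gx", "gy")

# Which of those names are absent from the elevation data?
missing_names <- setdiff(census_names, names(elevation_toy))
missing_names

# Rename x, y to gx, gy so both datasets agree.
names(elevation_toy)[names(elevation_toy) == "x"] <- "gx"
names(elevation_toy)[names(elevation_toy) == "y"] <- "gy"
names(elevation_toy)
```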
From https://github.com/forestgeo/forestr/issues/33
Not twenty individuals from each quadrat … subset to like 20K individuals … let them fall out spatially as they might … we don’t want to even out the spatial distribution.
--@seanmcmc
TASK
Subset the tags of a few trees at random. This will let the individuals fall out spatially as they might (we don’t want to even out the spatial distribution).
Sean proposed using 20k individuals, but for the most common use, in examples and tests, 20k seems too many. I'll start with 1000 individuals and may provide larger datasets (separately) if we really need them. In any case, the full datasets can be accessed via the bci package and subset as needed.
Stuart and Sean suggested using the BCI data released in 2016, but I'll start with the data released in 2012. The data released in 2012 is more clearly public via https://repository.si.edu/handle/10088/20925. And aiming to use the latest data for examples comes at a high maintenance cost. If we wanted examples to always use the latest available data, then every time there is a new census all the code that uses those examples would have to be updated. This seems unnecessary trouble.
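Sampling tags at random can be sketched as follows. This is only an illustration on made-up data: the data frame `census_toy`, its columns, and the plot dimensions used to generate positions are assumptions, not the actual BCI tables.

```r
set.seed(1)

# Toy census: 5000 trees, each with one or two stems (rows).
n_trees <- 5000
stems_per_tree <- sample(1:2, n_trees, replace = TRUE)
census_toy <- data.frame(
  tag = rep(seq_len(n_trees), times = stems_per_tree),
  gx = runif(sum(stems_per_tree), 0, 1000),
  gy = runif(sum(stems_per_tree), 0, 500)
)

# Sample 1000 tags at random, with no stratification by quadrat,
# so the individuals fall out spatially as they might.
sampled_tags <- sample(unique(census_toy$tag), size = 1000)

# Keep every row (stem) that belongs to a sampled tag.
census_mini <- census_toy[census_toy$tag %in% sampled_tags, ]
length(unique(census_mini$tag))
```

Sampling tags rather than rows keeps all stems of a sampled tree together.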
https://cran.r-project.org/web/checks/check_results_fgeo.x.html
checking dependencies in R code ... NOTE
Namespace in Imports field not imported from: ‘memoise’
All declared Imports should be used.
@seanmcm proposed that one useful way of sampling data for examples and tests is to sample 1 hectare of a plot. That should be particularly useful for spatial analyses.
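A one-hectare subset could look like the sketch below. It assumes positions are stored in metres in columns `gx` and `gy`, and uses a made-up data frame; the helper `pick_hectare()` is hypothetical, not a function from any of the packages mentioned here.

```r
set.seed(1)

# Toy census with positions in metres (made-up data).
census_toy <- data.frame(
  tag = seq_len(2000),
  gx = runif(2000, 0, 1000),
  gy = runif(2000, 0, 500)
)

# Hypothetical helper: keep the rows that fall inside the 100 m x 100 m
# square (one hectare) whose lower-left corner is at (x0, y0).
pick_hectare <- function(data, x0 = 0, y0 = 0) {
  inside <- data$gx >= x0 & data$gx < x0 + 100 &
    data$gy >= y0 & data$gy < y0 + 100
  data[inside, ]
}

one_ha <- pick_hectare(census_toy, x0 = 0, y0 = 0)
range(one_ha$gx)
range(one_ha$gy)
```

Unlike random sampling of tags, this keeps every neighbour within the square, which is what spatial analyses usually need.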
Prepare for release:
Create pre-release branch.
devtools::check_win_devel()
rhub::check_for_cran()
revdepcheck::revdep_check(num_workers = 4)
Merge.
Perform release:
Create release branch
Bump version (in DESCRIPTION and NEWS)
Walk through devtools::release() (but don't submit):
update.packages()?
R CMD check locally?
spell_check()?
check_rhub()?
check_win_devel()?
NEWS.md file?
DESCRIPTION (with use_tidy_version() and use_tidy_description())?
cran-comments.md?
Merge
Check that site built OK
Release on GitHub
Release on drat
Bump dev version
Announce
Write blog post
Add link to blog post in pkgdown news menu
Tweet
Template at forestgeo/learn#182 (adapted from https://github.com/r-lib/usethis/issues/338).
The datasets bci_species and bci_wood_density have non-ASCII characters, which throw a warning during the checks run before building an R package. For each dataset, the output below shows the variables and values where non-ASCII characters are detected, and a candidate conversion. Notice that the conversion is poor in places, so we may need to find a better way to convert.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(bciex)
library(purrr)
#>
#> Attaching package: 'purrr'
#> The following objects are masked from 'package:dplyr':
#>
#> contains, order_by
# Despite its name, this returns TRUE for strings NOT marked as ASCII.
detect_ascii <- function(x) {!grepl("ASCII", stringi::stri_enc_mark(x))}
# Unique values of `x` that are not marked as ASCII.
show_non_ascii <- function(x) {unique(x[detect_ascii(x)])}
# For each character column, pair the non-ASCII values with their conversion.
compare_non_ascii_to_converted <- function(x) {
  non_ascii <- x %>%
    select_if(is.character) %>%
    map(show_non_ascii) %>%
    discard(is.na(.)) %>%
    discard(map(., length) == 0)
  converted <- map(non_ascii, stringi::stri_trans_general, "latin-ascii")
  map2(non_ascii, converted, data.frame) %>%
    map(set_names, c("non_ascii", "converted"))
}
# bci_species -------------------------------------------------------------
compare_non_ascii_to_converted(bci_species)
#> $Latin
#> non_ascii converted
#> 1 Inga sp.34(hoja_pequeña) Inga sp.34(hoja_pequena)
#> $Species
#> non_ascii converted
#> 1 sp.34(hoja_pequeña) sp.34(hoja_pequena)
#> $Authority
#> non_ascii
#> 1 (Müll.Arg.) Hemsl.
#> 2 Moc. & Sessé ex Dunal
#> 3 Müll.Arg.
#> 4 (Planch. & Linden) C. Ulloa & P. Jørg.
#> 5 Sessé
#> 6 (Aubrév.) T.D.Penn.
#> 7 (Kunth) Cortés
#> 8 (Willd. ex A.Juss.) Müll.Arg
#> 9 (Tul.) Müll.Arg.
#> 10 P.E.Sánchez
#> 11 (Aubrév.) Pilz
#> 12 Q.Jiménez & T.D.Penn.
#> 13 Müll.Arg.
#> 14 L'Hér.
#> 15 Benth. ex Müll.Arg.
#> 16 Allemão
#> 17 (Willd. ex Schult.) Müll.Arg.
#> 18 (Moc. & Sessé ex DC.) Standl.
#> 19 <NA>
#> 20 Trécul
#> 21 J. León
#> 22 Sessé & Moc.
#> 23 (Cav.) B.Ståhl & Källersjö
#> 24 (Sw.) Gómez de la Maza
#> converted
#> 1 (Mull.Arg.) Hemsl.
#> 2 Moc. & Sesse ex Dunal
#> 3 Mull.Arg.
#> 4 (Planch. & Linden) C. Ulloa & P. Jorg.
#> 5 Sesse
#> 6 (Aubrev.) T.D.Penn.
#> 7 (Kunth) Cortes
#> 8 (Willd. ex A.Juss.) Mull.Arg
#> 9 (Tul.) Mull.Arg.
#> 10 P.E.Sanchez
#> 11 (Aubrev.) Pilz
#> 12 Q.Jimenez & T.D.Penn.
#> 13 MA 1/4ll.Arg.
#> 14 L'HA(C)r.
#> 15 Benth. ex Mull.Arg.
#> 16 Allemao
#> 17 (Willd. ex Schult.) Mull.Arg.
#> 18 (Moc. & Sesse ex DC.) Standl.
#> 19 <NA>
#> 20 Trecul
#> 21 J. Leon
#> 22 Sesse & Moc.
#> 23 (Cav.) B.Stahl & Kallersjo
#> 24 (Sw.) Gomez de la Maza
# bci_wood_density --------------------------------------------------------
compare_non_ascii_to_converted(bci_wood_density)
#> $species
#> non_ascii converted
#> 1 <NA> <NA>
#> 2 bigll3Ã<U+0082>¡ bigll3A,A¡
#> 3 pequeÃ<U+0083>±a pequeAfA±a
#> 4 sp. ââ<U+0082>¬Ë<U+009C>hairyââ<U+0082>¬â<U+0084>¢ sp. A¢a,¬EoehairyA¢a,¬a,,¢
#> 5 ââ<U+0082>¬Ë<U+009C>giant A¢a,¬Eoegiant
#> 6 dewevrei (De Wild.) J.LÃâ<U+0082>¬ dewevrei (De Wild.) J.LA-a,¬
#> 7 normandii AubrÃâ<U+0082>¬Å<U+0092>©v. & Pe normandii AubrA-a,¬A'A(C)v. & Pe
#> 8 pellegrinianum (J.LÃâ<U+0082>¬Å<U+0092>©on pellegrinianum (J.LA-a,¬A'A(C)on
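The detection and conversion steps can be tried in isolation on a few values. The sketch below assumes the stringi package is installed and uses a short made-up vector written with `\u` escapes, so the input is known to be UTF-8. It does not attempt to repair values that are already garbled, such as those in the wood density data.

```r
library(stringi)

# Small made-up sample; \u escapes guarantee the strings are UTF-8.
authority <- c("M\u00fcll.Arg.", "Sess\u00e9", "Kunth")

# TRUE where a string contains at least one non-ASCII character.
is_non_ascii <- !stri_enc_isascii(authority)
is_non_ascii

# Transliterate accented Latin characters to their closest ASCII form.
converted <- stri_trans_general(authority, "Latin-ASCII")
data.frame(non_ascii = authority, converted = converted)
```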
Data added at this commit https://goo.gl/UntzyS
Follows https://github.com/forestgeo/forestr/issues/33, which may be moved here.
We may submit this package to CRAN easily. We just need to solve the issues with non-ASCII characters found in the species and wood density data, or exclude those datasets completely. The benefit of releasing to CRAN is that users can install this package directly with install.packages("bciex"), i.e. they don't need devtools to install from GitHub via devtools::install_github("forestgeo/bciex"). Also, if bciex lives on CRAN, it is very easy to import from any other package that needs it, allowing us to avoid duplicating data for examples. Finally, this allows using Travis for continuous integration, which otherwise fails if a package is not found.
(via https://goo.gl/Fn4TDE)
Suzanne explained which variables identify uniquely each row of the following data sets:
A key to clarify my original confusion was this:
In the OLD version of the BCI R tables, StemID refers to the Stemnumber that is in ViewFullTable, not StemID. In all the NEW R tables, StemID is unique.
Table: unique identifier
-------------------------
R Tree tables: TreeID
R Stem tables: StemID
ViewFullTable: DBHID
This explains why StemID is not a unique identifier of ViewFullTable:
StemID uniquely identifies all stems. DBHID uniquely identifies all dbh measurements. ... If you are looking for a unique identifier in ViewFullTable, DBHID is the unique identifier. That is because a StemID can be measured several times, once for each census that it is alive.
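Whether a column uniquely identifies the rows of a table can be checked directly. The sketch below uses a made-up table (`vft_toy`) shaped like the description above, with each stem measured once per census; the helper `is_unique_id()` is hypothetical.

```r
# Toy table: three stems, each measured in two censuses (made-up data).
vft_toy <- data.frame(
  DBHID = 1:6,
  StemID = c(1, 1, 2, 2, 3, 3),
  CensusID = c(1, 2, 1, 2, 1, 2)
)

# Hypothetical helper: TRUE if no value of `x` is repeated.
is_unique_id <- function(x) {
  anyDuplicated(x) == 0
}

is_unique_id(vft_toy$DBHID)   # one row per dbh measurement
is_unique_id(vft_toy$StemID)  # repeated once per census
# In this toy table, StemID combined with CensusID is unique again.
is_unique_id(paste(vft_toy$StemID, vft_toy$CensusID))
```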
I need to update all packages that use OLD data, where Stem tables lack a unique identifier.
One cheap solution is to fix the variable StemID to make it a true unique identifier, and to document this action well. This should work well for datasets whose purpose is only demonstration, not accuracy of the data.
Another more expensive solution is to add a NEW version of the BCI data, and maybe remove the OLD version (again, NEW and OLD here mean with or without a unique identifier for the Stem tables).
Why don’t you take the new version of the BCI ViewFullTable and just subset one hectare from it to use in your samples?
Hi @laosuz,
Should habitat data have column name habitat or habitats?
I noticed that the habitat data you sent me (which is here in bciex) has the column name habitat. That differs from the habitat data I saw from the Pasoh plot, which has the column name habitats (ending in s).
names(bciex::bci_habitat)
#> [1] "x" "y" "habitat"
names(pasoh::pasoh_hab_index20)
#> [1] "x" "y" "habitats" "index5" "index20"
Luckily, this small detail broke some code so I could notice the issue.
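A name mismatch like this can be caught early with an explicit check. The sketch below only uses the column names printed above; the helper `missing_columns()` is hypothetical.

```r
# Column names as printed above for each habitat dataset.
bci_names <- c("x", "y", "habitat")
pasoh_names <- c("x", "y", "habitats", "index5", "index20")

# Hypothetical helper: which expected columns are absent?
missing_columns <- function(actual, expected) {
  setdiff(expected, actual)
}

# Code written for the Pasoh data expects a column called "habitats".
expected <- c("x", "y", "habitats")
missing_in_bci <- missing_columns(bci_names, expected)
missing_in_pasoh <- missing_columns(pasoh_names, expected)
missing_in_bci
missing_in_pasoh
```

Failing with a clear message when the result is not empty is easier to debug than an error deeper in the code.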