
soilreports's Introduction


soilReports

Reports are a handy way to summarize large volumes of data, particularly with figures and tables. soilReports is an R package "container" designed to accommodate the maintenance, documentation, and distribution of R-based reporting tools. Inside the package are report templates, setup files, documentation, and example configuration files.

The soilReports package provides a couple of important helper functions that do most of the work:

  • listReports(): print a listing of the available reports, version numbers, and basic metadata
  • reportSetup(...): download any R packages required by the named report, e.g. "region2/mu-comparison"
  • reportInit(...) | reportCopy(...): copy a named report template into a specific directory
  • reportUpdate(...): update a named report in a specific directory, replacing report.Rmd only

Each report contains several files:

  • report.Rmd: an R Markdown file that is "knit" into a final HTML or DOC report
  • README.md: report-specific instructions
  • custom.R: report-specific functions
  • categorical_definitions.R: report-specific color mapping and metadata for categorical raster data (user-editable)
  • config.R: configuration file to set report parameters (user-editable)
  • changes.txt: notes on changes and associated version numbers

R Profile Setup

NOTE: The following instructions are rarely, if ever, needed with R 4.2+

On many of our machines, the $HOME directory points to a network share. This can cause all kinds of problems when installing R packages, especially if you connect to the network by VPN. The following code is a one-time solution and will cause R packages to be installed on a local disk by adding an .Rprofile file to your $HOME directory. This file will instruct R to use C:/Users/FirstName.LastName/Documents/R/ for installing R packages. Again, you only have to do this once.

# determine your current $HOME directory
path.expand('~')

# install .Rprofile
source('https://raw.githubusercontent.com/ncss-tech/soilReports/master/R/installRprofile.R')
installRprofile(overwrite=TRUE)

soilReports Installation - First time or after R upgrade

Run this code if you don't yet have the soilReports package or after a new version of R has been installed on your machine.

# need the remotes package to install packages from GitHub
install.packages('remotes', dep = TRUE)

# get the latest version of the 'soilReports' package
remotes::install_github("ncss-tech/soilReports", dependencies = FALSE, upgrade_dependencies = FALSE) 

Choose an Available Report

Example Output

Reports for Raster Summary by MU or MLRA

Reports for DMU QC/QA

Reports for Pedon Data

Run a Report - Example: Map Unit Comparison report

# load this library
library(soilReports)

# list reports in the package
listReports()

# install required packages for a named report
reportSetup(reportName='region2/mu-comparison')

# copy report file 'MU-comparison' to your current working directory
reportInit(reportName='region2/mu-comparison', outputDir='MU-comparison')

Updating Existing Reports - Example: Map Unit Comparison report

Updates to report templates, documentation, and custom functions are available after installing the latest soilReports package from GitHub. Use the following examples to update an existing copy of the "region2/mu-comparison" report. Note that your existing configuration files will not be modified.

# get latest version of package + report templates
remotes::install_github("ncss-tech/soilReports", dependencies=FALSE, upgrade_dependencies=FALSE)

# load this library
library(soilReports)

# get any new packages that may be required by the latest version
reportSetup(reportName='region2/mu-comparison')

# overwrite report files in an existing report instance (does NOT overwrite config)
reportUpdate(reportName='region2/mu-comparison', outputDir='MU-comparison')

Suggested Background Material

Troubleshooting

  • If you haven't run R in a while, consider updating all packages with: update.packages(ask=FALSE, checkBuilt=TRUE).
  • Make sure that all raster data sources are GDAL-compatible formats: GeoTIFF, ERDAS IMG, ArcGRID, etc. (not ESRI FGDB)
  • Make sure that the map unit polygon data source is an OGR-compatible format: ESRI SHP, ESRI FGDB, etc.
  • Make sure that the extent of raster data includes the full extent of map unit polygon data.
  • If there is a problem installing packages with reportSetup(), consider adding the upgrade=TRUE argument.
  • If you are encountering errors with "Knit HTML" in RStudio, try: update.packages(ask=FALSE, checkBuilt=TRUE).

TODO

See issue tracker for TODO items.

Related Packages

soilreports's People

Contributors

alenars, brownag, dylanbeaudette, hammerly, jennifer-wood, kant, smroecker


soilreports's Issues

explore other ordination methods

  1. tried other MDS methods:
    • MASS::sammon() fails when there are duplicates in the initial configuration
    • vegan::monoMDS() gives similar results to MASS::isoMDS()
    • MASS::isoMDS() is the fastest, most stable algorithm tried so far (see the sketch below)
    • tried the t-SNE algorithm (https://lvdmaaten.github.io/tsne/); it is slow and runs out of memory on large datasets
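A minimal sketch of the preferred approach, MASS::isoMDS() on a distance matrix; mtcars stands in for whatever dissimilarity matrix the report actually computes:

library(MASS)

# d: dissimilarity matrix of class 'dist' (stand-in for the report's distance metric)
d <- dist(scale(mtcars))

# non-metric MDS in 2 dimensions; fails if d contains zero distances (duplicates)
mds <- isoMDS(d, k = 2)

# ordination coordinates
plot(mds$points, xlab = 'MDS 1', ylab = 'MDS 2')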

bootstrap package installation when .Rprofile isn't in place

There is no clean way to get devtools and soilReports into the "correct" library paths when:

  1. an .Rprofile file is missing
  2. HOME is set to H:/

Possible solution: source installRprofile() from GH and run before installing anything.
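A minimal sketch of that bootstrap, run in a fresh R session before anything else is installed (uses the installRprofile.R script referenced in the README above):

# 1. install an .Rprofile that points the package library at a local disk
source('https://raw.githubusercontent.com/ncss-tech/soilReports/master/R/installRprofile.R')
installRprofile(overwrite = TRUE)

# 2. restart R so the new .Rprofile takes effect, then install as usual
install.packages('remotes', dep = TRUE)
remotes::install_github('ncss-tech/soilReports', dependencies = FALSE, upgrade = FALSE)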

TODO: test!

add slope class breakdown

  1. slope class definitions in config.R
  2. classify the slope map (if there is one) and present the result as proportions (see the sketch below)
  3. sanity checks
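A hedged sketch of steps 1 and 2; the slope.classes object in config.R and the sample vector are hypothetical:

# config.R (hypothetical): slope class breaks (%) and labels
slope.classes <- list(
  breaks = c(0, 3, 8, 15, 30, 50, 75, Inf),
  labels = c('0-3', '3-8', '8-15', '15-30', '30-50', '50-75', '75+')
)

# s.slope: slope samples (%) extracted from the slope map within a map unit
s.slope <- runif(1000, min = 0, max = 90)  # stand-in for real samples

# classify and present as proportions
slope.class <- cut(s.slope, breaks = slope.classes$breaks,
                   labels = slope.classes$labels, right = FALSE)
prop.table(table(slope.class))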

Testing of new effective sampling size calculations and box-whisker plots

New process:

  1. load rasters into memory if possible
  2. perform cursory grid-based sampling to determine Moran's I of each raster
  3. set sampling intensity based on I of each raster

This is all implemented in:

  • sharpshootR::sampleRasterStackByMU(..., estimateEffectiveSampleSize=TRUE)
  • sharpshootR::Moran_I_ByRaster()
  • sharpshootR::ESS_by_Moran_I()

These functions have not been extensively documented or tested.
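A minimal usage sketch, mirroring the call made by the map unit comparison report; mu, mu.set, mu.col, raster.list, and pts.per.acre come from the report's config.R:

library(sharpshootR)

# sample the raster stack within map unit polygons, estimating the
# effective sample size of each raster via Moran's I
sampling.res <- sampleRasterStackByMU(
  mu, mu.set, mu.col, raster.list, pts.per.acre,
  estimateEffectiveSampleSize = TRUE
)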

Interactive (Shiny) reports - pedon summary report example

In the 'region2' folder under branch 'AGB' you will find a skeleton of a pedon summary report that employs Shiny and flexdashboard for user interaction/visualization of pedon data. A static report can be knitted after tweaking parameters of interest.

There are a few different moving parts here that extend beyond the typical case we have for soilReports.

The main workhorse is shiny.Rmd. This is the file that would be run from within RStudio.

shiny.Rmd sources config.R, utility_functions.R, and main.R.

Upon running shiny.Rmd, an interface showing data from the user's NASIS selected set will appear. config.R provides a few settings (linework data source, caching, generalized horizons, and loading of raster data sources) that are prerequisites for creating the Shiny interface and its contents.

From the Shiny interface, the user can use regular expressions to filter pedon data by MUSYM, taxon name, or pedon ID, or can explicitly specify a list of pedons. They can also select a 'modal' pedon from their subset for comparison.

The tabs to the right of the input panel correspond to sections of the pedon summary report with some new additions that display modal pedon data, horizon generalization, etc.

Once a user is happy with the selected set of pedons and is ready to create a permanent record of their work, they hit the "Export" button, which knits a static HTML file containing the same information displayed in the Shiny interface. The R objects built in the Shiny interface are passed to the knit session via an R environment containing the dependencies needed for the template report.Rmd to run. "Export" also writes out copies of the R objects (associated component table, soil profile collections) and tabular output that uniquely identifies the pedons used.

The issue here is: how can this style of report be integrated into soilReports?

Let me know what you think if you get a chance to try this out. You'll need pedons and some DMU components in your selected set; update the config to point to your linework, and once the dashboard loads, adjust the default MUSYM pattern for parsing the NASIS selected set.

add option to cache samples

Caching samples between report runs would speed report development and tweaking of parameters. However, it could cause confusion when a report is re-run after changes have been made to the linework.

Consider setting this flag to FALSE as a default. Also, this may be related to debugging output such as chunk timing.
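A hedged sketch of the pattern, defaulting to FALSE as suggested; the cache.samples flag and cache file name are hypothetical:

# config.R (hypothetical flag)
cache.samples <- FALSE

# report.Rmd: re-use cached samples only when explicitly enabled
cache.file <- 'cached-samples.rda'
if (cache.samples && file.exists(cache.file)) {
  # note: a stale cache will not reflect edits to the linework
  load(cache.file)
} else {
  sampling.res <- sampleRasterStackByMU(mu, mu.set, mu.col, raster.list, pts.per.acre)
  save(sampling.res, file = cache.file)
}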

estimate effective DF from spatial samples

Graphical and formal comparisons are difficult due to the VERY high spatial autocorrelation. How can we down-weight the DF accordingly?

Ideas

1. http://www.inside-r.org/packages/cran/SpatialPack/docs/modified.ttest
2. use `clhs` output for bwplots and distributional tests... realistic?
3. weighting via local Moran's I
4. https://github.com/jebyrnes/spatial_correction_lavaan
5. [comparison of methods](http://www.petrkeil.com/?p=1050)
6. [faster Moran's I calculation via ape::Moran.I and a distance matrix](http://www.ats.ucla.edu/stat/r/faq/morans_i.htm)

Once we figure out an accurate and scalable approach, we still need to figure out how to get these adjusted sample sizes into custom.bwplot() via bwplot(). This may require a custom panel function that performs a look-up against the spatial stats summary table.
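A minimal sketch of ideas 3 and 6, computing Moran's I with ape::Moran.I() and inverse-distance weights; s.xy and s.value are stand-ins for sample coordinates and raster values:

library(ape)

# stand-ins: sample coordinates and raster values at those samples
s.xy <- cbind(runif(100), runif(100))
s.value <- rnorm(100)

# inverse-distance spatial weights, zero on the diagonal
w <- 1 / as.matrix(dist(s.xy))
diag(w) <- 0

# Moran's I: high values imply strong autocorrelation, hence a smaller effective n
Moran.I(s.value, w)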

report configuration

Ideally, most reports should be configurable via:

  • config.R
  • interactive report parameters
  • parameters specified in a call to render() or in the report YAML header

There are cases where each has benefits. Converting the "region 2" reports to use rmarkdown report parameter names shouldn't take all that long.
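A minimal sketch of the third option, using standard R Markdown parameters (parameter names are hypothetical):

# report.Rmd YAML header (hypothetical parameter names):
# ---
# params:
#   mu.col: MUSYM
#   pts.per.acre: 1
# ---

# override parameters in a call to render()
rmarkdown::render('report.Rmd', params = list(mu.col = 'MUSYM', pts.per.acre = 3))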

More expressive raster list in config.R

Start with something like this:

raster.list <- list(
  continuous=list(
    `Mean Annual Air Temperature (degrees C)`='E:/gis_data/prism/final_MAAT_800m.tif',
    `Mean Annual Precipitation (mm)`='E:/gis_data/prism/final_MAP_mm_800m.tif',
    `Effective Precipitation (mm)`='E:/gis_data/prism/effective_precipitation_800m.tif',
    `Frost-Free Days`='E:/gis_data/prism/ffd_mean_800m.tif',
    `Growing Degree Days (degrees C)`='E:/gis_data/prism/gdd_mean_800m.tif',
    `Elevation (m)`='E:/gis_data/region-2-mu-analysis/elev_30.tif',
    `Slope Gradient (%)`='E:/gis_data/region-2-mu-analysis/slope_30.tif',
    `Annual Beam Radiance (MJ/sq.m)`='E:/gis_data/ca630/beam_rad_sum_mj_30m.tif',
    `(Estimated) MAST (degrees C)`='E:/gis_data/ca630/mast-model.tif',
    `Compound Topographic Index`='E:/gis_data/ca630/tci30.tif',
    `MRVBF`='E:/gis_data/ca630/mrvbf_10.tif',
    `SAGA TWI`='E:/gis_data/ca630/saga_twi_10.tif'
  ),
  categorical=list(
    `Geomorphon Landforms`='L:/Geodata/DEM_derived/forms10.tif',
    `Curvature Classes`='E:/gis_data/ca630/curvature_classes_15.tif'
  ),
  circular=list(`Slope Aspect (degrees)`='E:/gis_data/region-2-mu-analysis/aspect_30.tif')
)

"to check" polygon output

Add an additional SHP file to the output that includes information on polygons with an above-threshold proportion of sample values outside the 5th-95th percentile range for the MU.

Currently, the statistics shapefile output contains median values for each raster and the "toCheck" flag, a ranking based on the number of samples outside the range. This layer is useful for symbolizing where problematic polygons occur, and also the distribution of abiotic factors (vis-à-vis the median) across a MU extent, but it does not indicate why a particular polygon is flagged. This can rarely be determined from the median values unless it is a very extreme case.

The new shapefile will follow the same format as the stats shapefile (1 column per raster), but instead of medians it will contain the "proportion of samples outside range" for each raster data source.

This will reduce the iterative process of looking up polygon IDs to see why they were flagged. Currently that information is only available in the tabular output at the end of the report HTML file.

In addition to reducing iteration between the report and the shapefile display in e.g. ArcMap, this will allow symbolizing MUs based on the proportion outside range for INDIVIDUAL data sources rather than an aggregate of all data sources.

add KSSL depth-slice summaries to MLRA report


This would require a pre-made slab-style database of aggregate data for all MLRA. New configuration options in config.R could allow for specification of 4-5 soil properties from 10-20 possible properties.

Sampling / Aggregating:

# slab() is provided by aqp
library(aqp)
library(soilDB)

# iterate over MLRA in current shapefile or local list of codes
x <- fetchKSSL(mlra = '136')

# aggregate; property selection is important
a <- slab(x, mlra ~ clay + estimated_ph_h2o + caco3 + bs82)

# save to file (illustrative file name)
saveRDS(a, file = 'slab-mlra-136.rds')

Combine into database:

# assemble samples into single file and distribute
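A hedged sketch of this step, assuming one .rds file per MLRA as saved above:

# assemble per-MLRA slab output into a single data.frame and distribute
rds.files <- list.files(pattern = 'slab-mlra-.*\\.rds$')
slab.all <- do.call(rbind, lapply(rds.files, readRDS))
saveRDS(slab.all, file = 'slab-all-mlra.rds')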

Symbolize in report

# panel.depth_function() / prepanel.depth_function() are provided by aqp
library(aqp)
library(RColorBrewer)
library(lattice)

# adjust factor labels for MLRA to include number of pedons
pedons.per.mlra <- tapply(site(x)$mlra, site(x)$mlra, length)
a$mlra <- factor(a$mlra, levels=names(pedons.per.mlra), labels=paste(names(pedons.per.mlra), ' (', pedons.per.mlra, ' profiles)', sep=''))

# re-name variables
a$variable <- factor(a$variable, labels=c('Clay %', 'pH 1:1 Water', 'CaCO3 Equiv. (%)', 'Base Sat. pH 8.2'))

# make some nice colors
cols <- brewer.pal(n = 7, name = 'Set1')

xyplot(
  top ~ p.q50 | variable, groups=mlra, data=a, lower=a$p.q25, upper=a$p.q75, 
  ylim=c(170,-5), alpha=0.25, scales=list(y=list(tick.num=7, alternating=3), x=list(relation='free',alternating=1)),
  panel=panel.depth_function, prepanel=prepanel.depth_function, sync.colors=TRUE, asp=1.5,
  ylab='Depth (cm)', xlab='median bounded by 25th and 75th percentiles', strip=strip.custom(bg=grey(0.85)),
  par.settings=list(superpose.line=list(col=cols, lty=c(1,2,3), lwd=2)),
  auto.key=list(columns=3, title='MLRA', points=FALSE, lines=TRUE),
  sub=paste(length(x), 'profiles')
)

abbreviateNames() does not always return unique names

This is typically only a problem when someone has created their input features (mu) with many tables joined to them. Some options:

  • filter out all columns except mu.col (see the sketch after the code excerpt below)
  • smarter abbreviateNames()

The context of this error is:

# dcast() is provided by reshape2
poly.check.wide <- dcast(polygons.to.check, pID ~ variable, value.var = 'prop.outside.range')
# replace NAs with zero (no samples outside the 5th-95th percentile range)
poly.check.wide[is.na(poly.check.wide)] <- 0

mu.check <- merge(mu, poly.check.wide, by='pID', all.x=TRUE)
names(mu.check)[-1] <- abbreviateNames(mu.check)

# fix names for printing
names(polygons.to.check)[1] <- mu.col

# print table (removed from report now that shapefile with proportions outside range is generated)
#kable(polygons.to.check[display.idx,], row.names = FALSE) #only shows polys with p.crit > 0.15 in report tabular output

#save a SHP file with prop.outside.range for each polygon and raster data source combination
if(nrow(polygons.to.check) > 0) {
  shp.fname <- paste0('poly-qc-', paste(mu.set, collapse='_'))
  writeOGR(mu.check, dsn='output', layer=shp.fname, driver='ESRI Shapefile', overwrite_layer=TRUE)
  write.csv(mu.check,file=paste0("output\\",shp.fname,".csv")) 
}
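A minimal sketch of the first option, dropping everything except the polygon ID and map unit symbol before the merge; assumes mu already carries a pID column:

# keep only the ID and map unit symbol columns; all other attributes
# came from joined tables and only inflate the field names
mu <- mu[, c('pID', mu.col)]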

keep track of changes

A NEWS file, or some other record of changes per report. Perhaps a new list or vector in the report-metadata chunk (see the sketch below).
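One possible form, a named vector in the report-metadata chunk (variable name and entries are hypothetical):

# report-metadata chunk (hypothetical)
.report.changes <- c(
  '0.2' = 'short description of change',
  '0.1' = 'initial version'
)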

Extending reportInit/reportSetup to handle a "report manifest"

Currently, we assume that every report consists of report.Rmd, config.R, and setup.R files, but sometimes these are not the only documents that need to be copied by reportInit().

Examples of this case are:

  1. region11 lab summary reports (using custom.R)
  2. Shiny interactive pedon summary report (currently under region2 in branch 'AGB')

A few questions for your consideration:

  • Should setup.R be extended to include a variable (e.g. files_to_copy; vector of directories and files to copy) that lists additional items to be copied when creating a new report instance? (see the sketch below)
  • Should we distinguish reports that have an interactive/dashboard component (i.e. 'has_dashboard' boolean that indicates presence of shiny.Rmd or equivalent)?
  • Should the report metadata (currently stored at top of report.Rmd) be moved to a set of metadata variables in setup.R?
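A sketch of the first two ideas as additions to setup.R, using the variable names proposed above:

# setup.R (proposed additions)

# additional files/directories copied by reportInit()
files_to_copy <- c('custom.R', 'utility_functions.R', 'shiny.Rmd')

# flag reports with an interactive/dashboard component
has_dashboard <- TRUE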

Demonstration of sampling intensity

This is related to #11.

Some Region 2 folks are not convinced that the sampling approach is appropriate: they want to use all pixels within a suite of polygons.

Demonstrate appropriate sampling density via Moran's I and stability of the median as a function of sample size.
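A minimal sketch of the median-stability half of that demonstration; the sample raster ships with the raster package, so this runs as-is:

library(raster)

# stand-in for one of the report's raster data sources
r <- raster(system.file('external/test.grd', package = 'raster'))

# estimate the median at increasing sample sizes
n.seq <- c(10, 50, 100, 500, 1000, 2000)
med <- sapply(n.seq, function(n) median(sampleRandom(r, size = n)))

# stability of the median as a function of sample size
plot(n.seq, med, type = 'b', log = 'x',
     xlab = 'sample size', ylab = 'estimated median')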

specify categorical raster details in definitions file

Building off of #28...

categorical = list(
  `Geomorphon Landforms` = list(
    data = 'L:/Geodata/DEM_derived/forms10.tif',
    legend = list(
      levels = 1:10,
      labels = c('flat', 'summit', 'ridge', 'shoulder', 'spur', 'slope', 'hollow', 'footslope', 'valley', 'depression')
    )
  ),
  ...
)

Error: Failure during raster IO

This is likely associated with sharpshootR::sampleRasterStackByMU(), and lower-level raster access as specified in the traceback:

rgdal::getRasterData(con, offset = offs, region.dim = c(1, nc), band = layers)
13 .readCellsGDAL(x, uniquecells, layers)
12 .readCells(x, cells, 1)
11 .cellValues(object, cells, layer = layer, nl = nl)
10 .xyValues(x, coordinates(y), ..., df = df)
9 .local(x, y, ...)
8 raster::extract(r, s)
7 raster::extract(r, s)
6 data.frame(value = raster::extract(r, s), pID = s$pID, sid = s$sid)
5 (function (r) {
    res <- data.frame(value = raster::extract(r, s), pID = s$pID, sid = s$sid) ...
4 rapply(raster.list, how = "replace", f = function(r) {
    res <- data.frame(value = raster::extract(r, s), pID = s$pID, sid = s$sid)
    return(res) ...
3 sampleRasterStackByMU(mu, mu.set, mu.col, raster.list, pts.per.acre, estimateEffectiveSampleSize = correct.sample.size)
2 withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
1 suppressWarnings(sampleRasterStackByMU(mu, mu.set, mu.col, raster.list, pts.per.acre, estimateEffectiveSampleSize = correct.sample.siz

Some ideas:

link points in ordination to points on the ground

The ordination figure would be a lot more useful if there were some way to link points in the figure with points on the ground. This could be as simple as preserving coordinates / IDs and outputting sampling locations (cLHS sub-sample). Some degree of interaction is needed when plots are cluttered:

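A minimal sketch of the simple version, writing out the cLHS sub-sample locations and IDs used in the ordination; assumes s.sub is a SpatialPointsDataFrame of the sub-sampled points carrying a pID column:

library(rgdal)

# write ordination sample locations so they can be inspected in a GIS
writeOGR(s.sub, dsn = 'output', layer = 'ordination-samples',
         driver = 'ESRI Shapefile', overwrite_layer = TRUE)

# plain table of polygon IDs + coordinates for linking figure and ground
write.csv(data.frame(pID = s.sub$pID, coordinates(s.sub)),
          file = 'output/ordination-samples.csv', row.names = FALSE)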
