stats_for_soil_survey

Contributors

aaronachen, brownag, dave-white2, dylanbeaudette, hammerly, jskovlin, kianspeck-nrcs, phytoclast, probertswsu, skyew, smroecker, tdavello


stats_for_soil_survey's Issues

{spcosa} package {rJava} dependency causes RStudio to crash

spcosa examples from the sampling chapter will crash RStudio if you try to build the book (or run the example code) from the IDE.

We have been seeing this problem for a while (roughly since R 4.2 was approved), and it affects all packages that load rJava.
It is not a problem with R itself, because building the book from the command line works.

Bookdown Conversion Notes and TODO

Andrew's notes, originally from #23:

I redid the S4SS Part 1 book in bookdown.

http://ncss-tech.github.io/stats_for_soil_survey/book/

I am not an expert on bookdown at all -- but I have done several small projects. This is my most complex yet.

This is a great resource, from the master himself: https://bookdown.org/yihui/bookdown/

Many aspects of the gitbook/bookdown workflow can be enhanced using the RStudio addin for {bookdown}, so you should get familiar with it if you want to build the book yourself. I would also suggest trying the RStudio bookdown Project template. Both the addin and the template are available once the package is installed.

## get bookdown
install.packages('bookdown')


Non-CRAN Dependency

remotes::install_github('r-spatial/mapview')

Because of a missing html_dependency (CSS files), you need mapview > 2.9.0 or the interactive plots will not render in bookdown. Note that they work fine in regular sessions; this is only required if you want to build the book and have the interactive maps work.

Build the book

To build the book, this is the basic sequence of commands.

options(bookdown.clean_book = TRUE)

# remove ../book and other intermediate build files
bookdown::clean_book()

# create ../book where . is the _source_ directory of the book (i.e. /newbook, not /book)
bookdown::render_book(".")

Fancy stuff

In RStudio, there is an addin for bookdown::serve_book(), which is really nice and can incorporate many of your changes dynamically, on the fly. It gets hairy if you edit multiple chapters at once, or start editing the live code chunks. If things get hung up, restart your RStudio session; if it crashes or gives really strange errors not related to your code, run clean_book() as above.

Data requirements

Currently the examples do not require any specific selected set contents or setup on the user's computer, aside from having the datasets from the Chapter 4 spatial examples (formerly Chapter 2b) downloaded; that chunk is eval=FALSE to make the book build faster. So run that chunk independently to download the ZIP files before you try to build the book. There are several ways I could possibly "optimize" book build time, but I have struck a balance between reproducibility, complexity, and build time.

Open to edits and suggestions

Anyone can edit the course book .Rmd files. You do not have to re-build the book, or commit the changes your edits make to the "published" HTML book. That said, I think that building the whole book, regularly, is essential for multiple collaborators and for reproducibility of the book contents.

Caution/encouragement on building the book

I would gladly take on the role of building the book and fixing errors that arise for anyone who does not feel confident building it themselves. It is not the most trivial thing, but I think it is the way forward. I sort of sprung it on you all as a way of wrangling the diverse and complex existing content. Please contact me and we can discuss the fastest, most effective way to incorporate your suggestions.

You want to be careful when it comes to adding code chunks. All code chunks are eval=FALSE unless explicitly set to TRUE. With eval=FALSE it is safe: chunks will show up as code blocks and will not be executed by the R interpreter unless you "flip the switch." With eval=TRUE, you could in theory "break" the whole book until the error in your code is resolved.
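For example, a chunk that is displayed in the book but never run during the build looks like this in the .Rmd source (the chunk label is illustrative):

````markdown
```{r demo-chunk, eval=FALSE}
# rendered as a code block in the book, but not executed during the build
install.packages('bookdown')
```
````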

It might be best to work in a branch if you have significant changes -- that way I or others can help. The ultimate saving grace is that an un-built book will not overwrite the HTML -- and bad HTML won't overwrite the book if you do not commit it -- so the live book remains intact.

Add simpler percentile/RIC prelude example to Ch2

Developing RIC for Loafercreek (2nd exercise in current chapter 2) is a bit involved, and may be too much to do before introducing EDA concepts, summary statistics, generalized horizons etc.

I am going to suggest adding, in its place or prior to it, some more discussion of basic profile-level summaries (and calculating them with user-defined functions via profileApply(), or with existing aqp functions).
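A sketch of what such a profile-level summary could look like, using aqp's built-in sp4 example dataset rather than Loafercreek (illustrative only):

```r
library(aqp)

# promote the built-in sp4 example data to a SoilProfileCollection
data(sp4, package = "aqp")
depths(sp4) <- id ~ top + bottom

# user-defined function applied to each profile: maximum described depth
profileApply(sp4, function(p) max(p$bottom))
```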

Also, to foreshadow RIC development in Ch 2 without going down the generalized horizon rabbit hole, it might be better to use a simpler dataset, such as the example O horizon thickness data from Swanson (1993) in Soil Survey Horizons, "Determining Ranges of Map Unit Characteristics: A Simple Method with a Statistical Basis". We could do an exercise using the Swanson data, or similar, then talk about how to calculate similar site-level properties by pedon/across horizons for use in these types of summaries, in modeling, etc.

Developing the method in the way Swanson does in Ch 2 would allow the basic concept to be explored without getting into statistical method details--which is only briefly covered in the Ch 3 material at this time anyway.
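A minimal sketch of the percentile-based RIC concept, using made-up O horizon thickness values (not Swanson's actual data):

```r
# hypothetical O horizon thickness observations (cm) -- illustrative only
o.thick <- c(2, 3, 3, 4, 5, 5, 6, 7, 8, 10)

# RIC as the 5th / 50th / 95th percentiles: low, representative value, high
quantile(o.thick, probs = c(0.05, 0.5, 0.95))
```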

It may be good to more formally build the Generalized Horizon Label concept into Ch 2, in particular making the direct connection to the NASIS dspcomplayerid column (alias "genhz" in fetchNASIS()) and how it can be used for aggregation of pedon horizon data. That way it is introduced formally before doing an exercise, and before going into Ch 3, which pretty much presumes the student understands what generalized horizons are and why they are used in the various EDA exercises.
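For instance, generalized horizon labels can be assigned with aqp::generalize.hz(), matching field designations against regular expression patterns (the labels and patterns below are illustrative):

```r
library(aqp)

# field horizon designations from a handful of pedons
hz <- c('A', 'AB', 'Bt1', 'Bt2', 'BCt', 'Cr')

# assign generalized horizon labels via regular expression patterns
generalize.hz(hz, new = c('A', 'Bt', 'Cr'), pat = c('^A', 'Bt', 'Cr'))
```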

test pre-course assignment

# run this in the R console
source('https://raw.githubusercontent.com/ncss-tech/soilReports/master/R/installRprofile.R')
installRprofile()

Working towards 508 Compliance

The HTML format of the bookdown book is reasonably good from a Section 508 Compliance perspective. It is easy to navigate with the keyboard (up/down arrows to scroll, left/right arrows to change chapters, tab jumps to links/headers), and it works with screen readers.

This issue is to identify tasks that we can perform to enhance/incrementally improve accessibility of the document.

  • Add "alt" text to all images and figures that do not already have it (via markdown image syntax ![alttext](path/to/image.jpg), the HTML alt attribute, or the knitr chunk option fig.cap)
  • Provide link to Rmd source of each HTML page and use knitr::purl() to provide an R script with code executed on each page
  • Knit book to PDF via GitHub actions so link to download PDF (http://ncss-tech.github.io/stats_for_soil_survey/book/s4ssbook.pdf) works
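Extracting the R code from each chapter could be sketched with knitr::purl() (the chapter filename below is a placeholder):

```r
library(knitr)

# write an R script containing only the code chunks from one chapter;
# documentation = 1 keeps chunk headers as comments
purl('01-intro.Rmd', output = '01-intro.R', documentation = 1)
```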

integrate "reports"

Each user should be able to get, set up, run, and interpret a couple of reports. This would cover a couple of main topics:

  • R usage
  • summary stats / percentiles
  • interpretation of central tendency / spread
  • similarity measures
  • multivariate displays

fast raster / velox example in Spatial Data lecture

The {velox} package (https://github.com/hunzikp/velox) has unmet dependencies and is no longer on CRAN as of R 4.0.2.

We will need to update/remove "2.2.3 FAST raster sampling: velox" section.

I suggest using {fasterize} and {exactextractr} for a new demo on fast raster manipulation -- both of these packages are very useful for quickly manipulating raster data for arbitrary features.
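A possible shape for the new demo, as a self-contained sketch (the raster and polygon here are toy data, not from the course):

```r
library(raster)
library(sf)
library(exactextractr)

# small demo raster: 10x10 grid of values 1..100
r <- raster(matrix(1:100, nrow = 10), xmn = 0, xmx = 10, ymn = 0, ymx = 10)

# one square polygon covering part of the raster
p <- st_sf(geometry = st_sfc(st_polygon(list(
  rbind(c(1, 1), c(6, 1), c(6, 6), c(1, 6), c(1, 1))
))))

# mean cell value within the polygon, weighting partially covered
# cells by their fraction of coverage
exact_extract(r, p, 'mean')
```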

2023 Feedback on STATS 2020

Ideas / commentary after working with mentees and reviewing lecture material.

  • many namespace collisions → cut down on the number of packages used / loaded at any given time, this is esp. a problem with library(tidyverse) approach to loading everything
  • aqp::allocate() can be very noisy; add a verbose argument
  • include examples / interpretation of plot(Predict(model.rms)) and plot(summary(model.rms))
  • explain / link to additional information on odds ratio, interpret all examples in the book
  • label probability axes on all figures
  • re-think / simplify glm examples: predictor variables too complex / hard to interpret
  • more explanation of rms::validate()
  • CA790 regression examples need more context / explanation
  • num. tax. examples: explain type = 'n' when making plots
  • link to / integrate evaluation of ordination, new exercise / examples
  • tree methods: more expressive use of corrplot() → colors, shading, ordering, etc.
  • ordered factor syntax / interpretation / importance
  • convert everything to terra
  • categorical data modeling, EDA, etc. → link to Michael Friendly's work

General Suggestions from Aaron

  • The style-sheet based formatting in this chapter may lead to conflicts between the editing and the desired html output. I have no way to determine how the RMD formatting will regenerate in HTML. Please review carefully. Similarly, I am uncertain how to use the BIB file to generate references that match the citation style of the Soil Science Division.

  • Editor's note: Citing or linking to Wikipedia has been considered questionable scholarship. The Wiki pages are subject to rapid change. The dendrogram page, for example, was modified between the date this chapter was written and the date I'm editing it. The decision to cite or link is ultimately yours, but I recommend against it. Please consider an alternate link, such as https://www.google.com/#q=dendrogram. Such a link makes it the reader's responsibility to judge the quality of the current information.

  • Editor's note: This site is behind a pay-wall and may not be available to all students. You should consider switching to an open-source file (e.g., the pdf from r-forge.r-project.org).

Cheatsheets and revising the R reference card

I think we should focus some effort on updating the "reference card" from 2016, taking its contents and putting them into an even more compact format.

@smroecker @jskovlin @dylanbeaudette @hammerly @phytoclast @skyew

http://ncss-tech.github.io/stats_for_soil_survey/reference_card/reference_card.html

It is a great idea -- and has well-written content -- but it is not a "card."

The traditional "cheatsheet" concept is something I have wanted to emulate for aqp and soilDB, but I have not done that, and it is not an issue for this stats class repository. We barely scratch the surface of either of those packages in the class, so I would be tentative about throwing a full cheatsheet at a brand-new R user, given the current variety in both packages. I had wanted to make something before SSSA, but I never got around to it, partially because of this [perceived] problem.

Here are some examples: https://github.com/rstudio/cheatsheets

I think we could make a cheatsheet specific to the stats classes (one each for Parts 1 and 2), targeting an actual single-page, nice-looking, compact format.

S4SS - Statistics for Soil Survey Part 1 Revisions 2021

2020/01/26 TODOs and suggestions from @smroecker

Core R data manipulation functional themes:

  • loading/fetching data
  • filtering
  • transforming
  • aggregating
  • iterating

TODO

Top priorities:

  • 1. Move mapview and sf examples to the Spatial chapter
  • 2. simple example comparing the diagnostic slot to information parsed from the horizon slot

Demo reproducible examples for functions:

  • "filter" (subset)
  • "transform" (mutate, slice, segment, spc2mpspline)
  • "aggregate" (slab)
  • "iterate" (profileApply)
  • soilDB
    • get_extended_data_from_NASIS_db, get_vegplot_from_NASIS_db
    • fetch functions: fetchOSD, fetchSDA, fetchNASISWebReport, fetchHenry
  • New taxonomic functions
    • (maybe; these interfaces are a bit fluid right now, but students are typically very interested in taxonomic information...)


The changes described below are from the original PR: #21

Re-organization of data chapter, and Part 1 chapters, into bookdown format. Most of the changes pertain to separation of portions of the data chapter out and moving them elsewhere, or a little bit vice versa.

This is the proposed order of sections:

Precourse, Intro, Data, EDA, Spatial, Sampling

Intro

Mostly same content.

There is currently no exposition on the materials in the "appendix" to the Data chapter, but we need to spend some time focusing on that type of material. I think it could be part of Chapter 1, as more of a "basic R syntax and concepts" section.

Data

"Data" contains the same essential elements/content; really motivating the discussion around pedon data specifically. I would like to see some more references to ecosite data in this chapter eventually.

The example code still includes exercises like simple plots of point locations, but no detailed exposition of spatial data types; that now comes after EDA. I leave some stubs to allude to future Spatial sections (and also EDA with dplyr?), but my thought is we should not get too prescriptive -- just say that there are many ways of doing these things once you have access to the data in a data.frame.

Data now features Soil Reports at the end (previously the end of EDA). My thought is these are, or can be, fun exercises that motivate the need for understanding distributions, descriptive statistics, etc., and may get people "excited" about what they can do with existing R tools.

EDA

EDA is mostly unchanged except for moving the Soil Reports stuff into Data.

I think we can have them running reports and looking at the output before we really get into the details of the stats. It is nice to have a hard example of something in front of you when learning something new -- that way they can really get a jump start thinking about how they can apply it to their own data / final project etc.

Spatial

Spatial data comes after that. Emphasize the data.frame skills covered in the Precourse/external AgLearn courses/Intro/Data by focusing on sf data.frame objects first, then cover their interoperability with, and conversion to, sp objects. This will better prepare the students for the sf code they will encounter in the wild, as sf is the only interface to many packages. They still need essential sp context, links, and demos for examples and existing code, but sf is a subclass of data.frame, so it sticks with some central themes of the class.
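The sf-first-then-sp interop could be demonstrated with something like this (coordinates are made up):

```r
library(sf)
library(sp)

# sf objects are data.frames with a geometry column
d <- data.frame(id = 1:3, x = c(-120.1, -120.2, -120.3), y = c(38.1, 38.2, 38.3))
p.sf <- st_as_sf(d, coords = c('x', 'y'), crs = 4326)

# convert to sp for packages that still require it, and back again
p.sp <- as(p.sf, 'Spatial')
p.sf2 <- st_as_sf(p.sp)
```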

Interactive maps, along with the new exactextractr example, are our showcase R spatial examples, both featuring sf interfaces. The Spatial chapter now really tries to draw parallels to data.frames, and between the methods used for reading/writing, setting the CRS, etc. across sf, sp, and raster objects.

Sampling

Finally, the sampling chapter has examples of using sp objects for spatial sampling. I think this section could be enhanced significantly as a resource, not so much as something covered in detail in class. I would like to provide subsections with identical sf::st_sample() and sp::spsample() examples to draw parallels. We have the sampling presentation and other materials, so we can spend as much time applying the code examples as interests the group -- but the thought is this chapter should mostly be a self-contained set of reference examples of different sampling strategies applied to simple, but realistic, data. I consider the specific details in this chapter to be more like end matter for Part 1, something that fits well after discussing the details of data, describing data, and describing data in space.
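The parallel sampling subsections could be as direct as the following sketch (a toy square study area, simple random sampling):

```r
library(sf)
library(sp)

# a simple square study area
a.sf <- st_sfc(st_polygon(list(
  rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))
)))

# simple random sample of 10 points: sf and sp equivalents
s.sf <- st_sample(a.sf, size = 10, type = 'random')
s.sp <- spsample(as(a.sf, 'Spatial'), n = 10, type = 'random')
```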
