ncss-tech / aqp
Algorithms for Quantitative Pedology
Home Page: http://ncss-tech.github.io/aqp/
Missing data are very common in soil datasets. Do we have enough data to use methods like MICE to fill the gaps? What about error propagation?
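For context, a minimal sketch of what MICE-style gap filling might look like on horizon-level data (column names and values are hypothetical; this is not an endorsement of the approach):
library(mice)
# hypothetical horizon-level data with gaps
h <- data.frame(
  clay = c(22, NA, 31, 28, NA, 35, 40, 18),
  sand = c(40, 38, NA, 30, 28, 25, 20, 45),
  ph   = c(6.1, 6.3, 6.0, NA, 5.8, 5.9, 5.5, 6.4)
)
# multiple imputation via predictive mean matching
imp <- mice(h, m = 5, method = 'pmm', printFlag = FALSE)
# one completed copy; variability across the m copies gives a rough
# handle on the error introduced by imputation
h.complete <- complete(imp, 1)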
Recent applications / packages:
Consider moving to the roxygen2 documentation approach?
Something like this:
# toggle default conversion
opt.original <- options(stringsAsFactors = FALSE)
# ... do something
# reset options
options(opt.original)
ASAP: convert all functions using data.frame[idx, ] to data.frame[idx, , drop=FALSE]. This will return single-column data.frame objects as data.frames, instead of coercing them to vectors.
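A minimal illustration of the difference:
# single-column data.frame: row-indexing drops to a vector
d <- data.frame(a = 1:3)
class(d[1:2, ])               # "integer"
# drop = FALSE preserves the data.frame structure
class(d[1:2, , drop = FALSE]) # "data.frame"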
2D vs. 1D horizonation, useful in situations where cryoturbation is a dominant pedogenic driver.
See Sampling Protocols for Permafrost-Affected Soils for some examples.
Recent papers:
Borrowed from Siewert et al., 2016:
groupedProfilePlot(p, groups = 'taxonname', label = 'pedon_id', break.style = "line",
                   print.id = TRUE, id.style = 'side', cex.id = 1.2,
                   cex.names = 1, cex.depth.axis = 1.25, y.offset = 7,
                   axis.line.offset = -3.0, group.name.cex = 1, color = input$thematic_field,
                   width = 0.1, shrink = TRUE, shrink.cutoff = 3)
guessing horizon designations are stored in `hzname`
Error in segments(y0 = upper.position, y1 = lower.position, x0 = boundary.positions, :
cannot mix zero-length and non-zero-length coordinates
Sometimes you need to plot just the central tendency and there are no upper / lower values.
Example:
library(aqp)
library(colorspace)
# scanning 10YR 6/4 chip with NIX sensor
# reported CIE LAB values: 61.87, 7.18, 22.54
rgb2munsell(as(LAB(cbind(61.87, 7.18, 22.54)), 'sRGB')@coords)
# results:
# hue value chroma sigma
# 7.5YR 6 4 0.01259235
# alternate method:
rgb2munsell(convertColor(cbind(61.87, 7.18, 22.54), from='Lab', to='sRGB', from.ref.white = 'D65', to.ref.white = 'D65'))
# results:
# hue value chroma sigma
# 7.5YR 6 4 0.01340875
These new functions are meant to operate on an SPC with a single profile, e.g. functions to be used with profileApply():

find_horizons_by_points(spc, z): return indices to horizons within spc according to a vector of depths z

find_horizons_by_interval(spc, interval): return indices to horizons within spc that overlap interval
Need to finish coding, testing, and documentation.
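A minimal sketch of the intended behavior, assuming a single-profile SPC and the standard horizon depth accessors (names and details are not final):
library(aqp)
# indices of horizons containing each depth in z
find_horizons_by_points <- function(spc, z) {
  h <- horizons(spc)
  hzd <- horizonDepths(spc)
  sapply(z, function(z.i) which(h[[hzd[1]]] <= z.i & h[[hzd[2]]] > z.i))
}
# indices of horizons that overlap a depth interval
find_horizons_by_interval <- function(spc, interval) {
  h <- horizons(spc)
  hzd <- horizonDepths(spc)
  which(h[[hzd[1]]] < max(interval) & h[[hzd[2]]] > min(interval))
}
# example usage with a single profile
data(sp4)
depths(sp4) <- id ~ top + bottom
find_horizons_by_points(sp4[1, ], z = c(5, 25))
find_horizons_by_interval(sp4[1, ], interval = c(5, 25))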
plotSPC does not generate enough colors in the legend (?) when coloring horizons with categorical data containing more than 8 unique values.
https://github.com/edzer/units
https://cran.r-project.org/web/packages/units/index.html
https://cran.r-project.org/web/packages/units/vignettes/units.html
fetch* functions could automatically encode units at load time.
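For example, something like this could happen at load time (a sketch only, not current fetch* behavior):
library(aqp)
library(units)
data(sp4)
depths(sp4) <- id ~ top + bottom
# tag horizon depths with the units recorded in the SPC metadata
u <- depth_units(sp4)   # "cm"
top_cm <- set_units(sp4$top, u, mode = 'standard')
# unit-aware conversion
set_units(top_cm, 'm', mode = 'standard')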
These functions don't quite work when applied in cases where the plotting order != internal SPC ordering.
After fixing a bug (e498780) this morning, I seem to have created more problems (5af5f6a).
These functions should be able to automatically determine the location of brackets. The current implementation requires pre-sorting and an index--except that it doesn't work!
A total re-write is in order, given the importance of these functions and the large variety of ways in which they are used.
Need to post / archive / update the old code used to generate munsell.rda. That file was generated using a combination of raw data and manually added neutral (N) hues. The munsell package has a nice script for using the raw data and integrating greys.

Ideas:
The basic premise:
raster::extract method
sp::over method
rgdal::spTransform method

Thanks to @brownag for noticing this. This is caused by the horizons replacement method in setters.R:
# replacement: order by IDs, then top horizon boundary
hz_top_depths <- horizonDepths(object)[1]
object@horizons <- value[order(value[[idname(object)]], value[[hz_top_depths]]), ]
Why are we re-ordering the new horizon data?
Re-create the problem like this. Note that the error presents itself after rbind-ing two SPC objects with IDs that are interspersed.
library(soilDB)
# simple check for correct ordering
isValid <- function(object) {
test <- profile_id(object) == site(object)[[idname(object)]]
res <- all(test)
if(! res)
return(test)
return(res)
}
x <- fetchKSSL(mlra = '18')
y <- fetchKSSL(mlra = '22A')
# OK
isValid(x)
isValid(y)
# OK
x$new <- rep(NA, times=nrow(x))
y$new <- rep(NA, times=nrow(y))
isValid(x)
isValid(y)
# OK
s <- rbind(x, y)
isValid(s)
# causes SPC corruption
s <- rbind(x, y)
s$sand <- 0
isValid(s)
# causes SPC corruption
s <- rbind(x, y)
s$new <- rep(NA, times=nrow(s))
isValid(s)
This would make rgb2munsell() and all functions that use it much faster.
Encountered this error when trying to calculate slab means on the control section.
Need to add checks in slab to verify:
Reproducible example (until the data are fixed in NASIS):
slabbing function for control section
f.pcs.prop <- function(x, prop) {
sv <- c(x$pscs_top, x$pscs_bottom)
if(any(is.na(sv)))
return(NA)
fm <- as.formula(paste('~', prop))
s <- slab(x, fm, slab.structure=sv, slab.fun=mean, na.rm=TRUE)
return(s$value)
}
Case 1
library(soilDB)
testdata <- fetchKSSL(pedlabsampnum = 'UCD08JER010')
profileApply(testdata, FUN = f.pcs.prop, prop = 'clay')
Case 2
testdata <- fetchKSSL(pedlabsampnum = 'UCD08JER010')
testdata@horizons[5, ]$pscs_top <- testdata@horizons[5, ]$pscs_bot
profileApply(testdata, FUN = f.pcs.prop, prop = 'clay')
both result in:
Error in `$<-.data.frame`(`*tmp*`, "seg.label", value = integer(0)) :
replacement has 0 rows, data has 2
Case 3
testdata <- fetchKSSL(pedlabsampnum = 'UCD08JER010')
testdata@horizons[5, ]$hzn_bot <- 82
profileApply(testdata, FUN = f.pcs.prop, prop = 'clay')
results in:
Error in rep(NA, times = max.d - slab.structure[2]) : invalid 'times' argument
https://github.com/dgrtwo/fuzzyjoin
There are several functions that could simplify tasks such as:
... ?
... when required columns are missing.
This function was originally designed to convert the results of a cross-tabulation into an adjacency matrix, suitable for plotting via igraph.
## demonstrate genhzTableToAdjMat usage
data(loafercreek, package='soilDB')
# convert contingency table -> adj matrix / TP matrix
tab <- table(loafercreek$hzname, loafercreek$genhz)
m <- genhzTableToAdjMat(tab)
# plot
par(mar=c(0,0,0,0), mfcol=c(1,2))
plotSoilRelationGraph(m, graph.mode = 'directed', edge.arrow.size=0.5)
plotSoilRelationGraph(m, graph.mode = 'directed', edge.arrow.size=0.5, spanning.tree = 'max')
Given that there are other cases where the same conversion would be useful, I think that the function should be re-named and possibly moved to sharpshootR.
At the moment, munsell2rgb requires the_value and the_chroma to be integers, otherwise NA is returned:
> aqp::munsell2rgb(the_hue = '10YR', the_value = 3.9, the_chroma = 5.4)
[1] NA
> aqp::munsell2rgb(the_hue = '10YR', the_value = 3, the_chroma = 5 )
[1] "#614218FF"
I think this should be made clear(er) to users. Different options:

munsell2rgb to automatically convert the_value and the_chroma to integers, with a warning

munsell2rgb to throw an error if the_value and/or the_chroma are not integers

It would be nice if this function could use interpolation between chips to convert non-standard Munsell notation (e.g. 7.9YR 2.7/2.0) into proper sRGB coordinates.
getClosestMunsellChip('7.9YR 2.7/2.0', convertColors = FALSE)
# returns the sRGB coordinates for "7.5YR 3/2"
This looks about right.
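A sketch of the first option above (round non-integer value / chroma to the nearest chip, with a warning); the wrapper name is hypothetical:
library(aqp)
# hypothetical wrapper: round non-integer value / chroma, warn, then convert
munsell2rgb_rounded <- function(the_hue, the_value, the_chroma, ...) {
  if (any(the_value %% 1 != 0, na.rm = TRUE) || any(the_chroma %% 1 != 0, na.rm = TRUE)) {
    warning('non-integer value / chroma rounded to the nearest chip', call. = FALSE)
    the_value <- round(the_value)
    the_chroma <- round(the_chroma)
  }
  munsell2rgb(the_hue, the_value, the_chroma, ...)
}
munsell2rgb_rounded('10YR', 3.9, 5.4)
# same result as munsell2rgb('10YR', 4, 5)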
library(aqp)
library(colorspace)
data("munsell")
lab <- as(sRGB(as.matrix(munsell[, 4:6])), 'LAB')@coords
munsell <- cbind(munsell, lab)
It is also possible to convert sRGB -> LAB like this, results are very close to colorspace functions.
col <- munsell2rgb('10YR', 4, 4, return_triplets = TRUE)
convertColor(col, 'sRGB', 'Lab', from.ref.white = 'D65', to.ref.white = 'D65')
When working with a version of aqp installed from CRAN, shannonEntropy is not exported. Dylan, you're probably already aware that this isn't in the namespace. The MUComparison report used to have a local definition of shannonEntropy, and it is now in aqp.
> aqp::shannonEntropy()
Error: 'shannonEntropy' is not an exported object from 'namespace:aqp'
Appears to be working correctly in the latest commit on GitHub.
The S3 rbind method for SoilProfileCollection objects isn't properly registered and the S4-style approach doesn't work:
setMethod("rbind", "SoilProfileCollection", .rbind.SoilProfileCollection)
Ideas welcome... I suspect it has something to do with the additional arguments to rbind in base R that don't have any relevance to SoilProfileCollection objects. Open to a new function with a more meaningful name; possibly combine().
Unfortunately, this is a show-stopper for CRAN submission.
It would be very useful to have some functions that would map the PedonPC "export" format or the new URL-based AnalysisPC export format into a SoilProfileCollection, or more likely, a new data structure.
Example URL-based export for a single pedon
There are times when it would be nice to subset horizons within an SPC, without having to manually extract / subset / replace data.
Current approach, not ideal:
h <- horizons(spc)
# do something to h
horizons(spc) <- h
Better approach:
spc <- subsetHorizons(spc, expression, ...)
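A sketch of how this could work, evaluating the expression within the horizon data (the function name and details are hypothetical, and the edge case of a profile losing all of its horizons is ignored):
library(aqp)
subsetHorizons <- function(spc, e) {
  h <- horizons(spc)
  # evaluate the filtering expression using horizon-level columns
  idx <- eval(substitute(e), envir = h, enclos = parent.frame())
  horizons(spc) <- h[which(idx), ]
  spc
}
# example: keep only horizons with clay > 20
data(sp4)
depths(sp4) <- id ~ top + bottom
x <- subsetHorizons(sp4, clay > 20)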
Building on subsetHorizons(), the selectHorizonsByDepth(spc, z=c(5, 15)) function would return a SoilProfileCollection including only those horizons that intersect depths of 5 and 15 cm.
Thinking this through some more, an even better approach would use the [] notation for extraction of horizon data, possibly with new operators. Like this:
# return an SPC with just those horizons that *intersect* the depth interval of 5-10 cm.
x <- spc[, %intersect% c(5,10)]
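The exact spc[, %intersect% c(5,10)] syntax would require a custom extraction method; as a stand-alone sketch, the overlap test itself could be a simple infix helper (hypothetical, not part of aqp):
library(aqp)
# TRUE where a horizon (top, bottom) overlaps the depth interval z
`%intersect%` <- function(hz, z) {
  hz[[1]] < max(z) & hz[[2]] > min(z)
}
# usage: logical index into the horizons of an SPC
data(sp4)
depths(sp4) <- id ~ top + bottom
h <- horizons(sp4)
idx <- h[horizonDepths(sp4)] %intersect% c(5, 10)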
It is possible to have multiple "densic" horizons in a profile, with only the bottom-most horizon meeting the criteria for a "contact". One solution is to integrate diagnostic features into the calculation.
See the attached data from TX for reference.
library(aqp)
library(soilDB)
library(plyr)
library(lattice)
library(reshape2)
# load Rich's data
load('pedons.rda')
# get selected set of pedons, no removal of pedons with inconsistent horizonation
# x <- fetchNASIS(rmHzErrors = FALSE)
# compute soil depth and depth class for all pedons, using pattern matching of horizon names
sdc <- getSoilDepthClass(x, p = 'Cr|R|Cd')
head(sdc)
# join depth and depth class data to pedons
site(x) <- sdc
# normalize taxonname
soils <- c('Houston Black', 'Heiden', 'Ferris', 'Vertel')
for(i in soils)
x$taxonname[grep(i, x$taxonname, ignore.case = TRUE)] <- i
table(x$taxonname)
# access diagnostic HZ data, 1:many per pedon
d <- diagnostic_hz(x)
# keep just records of "densic" anything
idx <- grep('densic', d$diag_kind, ignore.case = TRUE)
d <- d[idx, ]
table(d$diag_kind)
# convert long -> wide format: top depths
d.wide <- dcast(d, peiid ~ diag_kind, value.var='featdept')
# fix names
names(d.wide) <- c('peiid', 'densic.contact.top', 'densic.materials.top')
# join to SPC
site(x) <- d.wide
# convert long -> wide format: bottom depths
d.wide <- dcast(d, peiid ~ diag_kind, value.var='featdepb')
# fix names
names(d.wide) <- c('peiid', 'densic.contact.bottom', 'densic.materials.bottom')
# join to SPC
site(x) <- d.wide
# check depth via hz name pattern matching vs. diagnostic features
head(site(x)[, c('pedon_id', 'depth', 'densic.contact.top', 'densic.materials.top', 'densic.materials.bottom')], 10)
# rules for refinement of "soil depth"
x$depth_revised <- rep(NA, times=length(x))
# 1. "densic" diagnostic features not present or populated: use depth via hzname
idx <- which(is.na(x$densic.materials.top) & is.na(x$densic.contact.top))
x$depth_revised[idx] <- x$depth[idx]
# 2. densic contact present: use top depth of densic contact
idx <- which(!is.na(x$densic.contact.top))
x$depth_revised[idx] <- x$densic.contact.top[idx]
# 3. densic contact not present, but densic materials present: use bottom of densic materials
idx <- which(is.na(x$densic.contact.top) & !is.na(x$densic.materials.bottom))
x$depth_revised[idx] <- x$densic.materials.bottom[idx]
# what is left?
# errors in data population: missing "densic materials" bottom depth
# TODO: fix these in NASIS
site(x)[is.na(x$depth_revised), c('pedon_id', 'depth', 'depth_revised', 'densic.contact.top', 'densic.materials.top', 'densic.materials.bottom')]
## Note: use depth_revised for subsequent calculations
## Note: previously estimated depth classes are no longer correct
# what fraction of pedons had the "wrong" soil depth?
# about 33% ! wow !
prop.table(table(x$depth_revised > x$depth))
The slice() function no longer works. I have tried the example from the introduction to SoilProfileCollection objects (https://ncss-tech.github.io/AQP/aqp/aqp-intro.html) and it does not work with that example either. It seems there is a bug, probably related to the relaxing of the slice() function two weeks ago. The slab() function works OK though.
sliced <- slice(sp4, fm= 0:15 ~ sand + silt + clay + name + ex_Ca_to_Mg)
Error in UseMethod("slice_") :
no applicable method for 'slice_' applied to an object of class "SoilProfileCollection"
?slice
slab(sp4, fm= ~ sand + silt + clay, slab.structure=c(0,10), slab.fun=mean, na.rm=TRUE)
variable all.profiles value top bottom contributing_fraction
1 sand 1 47.63 0 10 1
2 silt 1 31.15 0 10 1
3 clay 1 21.11 0 10 1
There is a bug affecting the addVolumeFraction function, which affects profiles that have floating-point depths:
sandSiltClay <- new("SoilProfileCollection"
, idcol = "id"
, depthcols = c("top", "bottom")
, metadata = structure(list(depth_units = "cm"), .Names = "depth_units", row.names = c(NA,
-1L), class = "data.frame")
, horizons = structure(list(id = c("Clay", "Clay", "Clay", "Clay", "Clay",
"Sand", "Sand", "Sand", "Sand", "Sand", "Silt", "Silt", "Silt",
"Silt", "Silt"), name = structure(c(2L, 1L, 4L, 5L, 3L, 2L, 1L,
4L, 5L, 3L, 2L, 1L, 4L, 5L, 3L), .Label = c("SLFs", "tSLw", "VAl",
"VLc", "VLl"), class = "factor"), top = c(0, 18.5, 28.5, 46,
53.5, 0, 18.5, 28.5, 46, 53.5, 0, 18.5, 28.5, 46, 53.5), bottom = c(18.5,
28.5, 46, 53.5, 101, 18.5, 28.5, 46, 53.5, 101, 18.5, 28.5, 46,
53.5, 101), percent = c(21.5, 21.5, 17.5, 10, 2.5, 22.5, 22.5,
30, 55, 90, 56.9, 56, 52.5, 35, 7.5), stones = c(23L, 28L, 48L,
60L, 63L, 23L, 28L, 48L, 60L, 63L, 23L, 28L, 48L, 60L, 63L)), .Names = c("id",
"name", "top", "bottom", "percent", "stones"), row.names = c(6L,
7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 11L, 12L, 13L, 14L, 15L), class = "data.frame")
, site = structure(list(id = c("Clay", "Sand", "Silt")), .Names = "id", row.names = c(NA,
-3L), class = "data.frame")
, sp = new("SpatialPoints"
, coords = structure(c(0, 0), .Dim = 1:2, .Dimnames = list(NULL, c("coords.x1",
"coords.x2")))
, bbox = structure(c(0, 0, 0, 0), .Dim = c(2L, 2L), .Dimnames = list(c("coords.x1",
"coords.x2"), c("min", "max")))
, proj4string = new("CRS"
, projargs = NA_character_
)
)
, diagnostic = structure(list(), .Names = character(0), row.names = integer(0), class = "data.frame")
)
plot(sandSiltClay)
addVolumeFraction(sandSiltClay, 'stones')
The error message is:
Error in `$<-.data.frame`(`*tmp*`, "val", value = c(NA, NA, NA, NA, 1, :
replacement has 185 rows, data has 180
This bug is due to the rounding of floating-point horizon depths in .volume2df, specifically:
d$val <- m[1:cells]
This is due to the rounding of depths in d (line 23):
d <- expand.grid(x=(1:res), y=(1:depth))
where
Browse[1]> 1:depth
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
for depth = 18.5, so nrow(d) = 180, while length(m) = 185.
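The truncation can be demonstrated directly:
# ':' drops the fractional part of the depth ...
length(1:18.5)                           # 18
nrow(expand.grid(x = 1:10, y = 1:18.5))  # 180, assuming res = 10
# ... while m was sized from the un-truncated depth: 10 * 18.5 = 185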
The latest version of the soiltexture package now uses the standard class labels. This is a good time to switch from the plotrix::soil.texture() approach.
It would be nice to have functions that could extract sequences of profiles / horizons based on arbitrary filtering criteria. This would be similar to the V() and E() functions from the igraph package. For example, aqp is missing this kind of functionality:

profileApply() results and SPC

Why? Sometimes it is important to extract data by genetic horizon according to a vector of depths, usually specific to each profile in a collection. For example:
Consider the case of multiple profiles, returned by fetchKSSL(), and a table of sensor data associated with specific horizons. There is no simple way to "connect" sensor data collected at various depths to an SPC object. Linking VG parameters (from SPC, via ID / depth) to sensor data is not as simple as it sounds.
This will get you close; however, the result is filled with NA when data are missing:
# depths at which to extract data
sensor_depths <- c(10,40,75)
# new SPC with data at select depths, all site / horizons / spatial data preserved
res <- slice(spc, sensor_depths ~ . )
However, the original horizon depths are lost. There may be cases where keeping the horizon depths in the results from slice() would be useful.
See the fetchSCAN tutorial for an example.
Bugs:

profile_compare expects input data to be sorted by ID/top before use; this is not always the case after editing IDs via profile_id(SPC) <- new_IDs

tapply, split, and functions that call these two will implicitly re-order data via conversion to factor

unroll should be replaced with slice

union is used to combine data

Misc:

D = (D_hz / max(D_hz) * w_hz + D_site / max(D_site) * w_site) / (w_hz + w_site)
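A quick numeric check of that weighted combination, using toy distances and equal weights:
# toy pair-wise distances and weights
D_hz <- c(0.2, 0.8, 0.5)
D_site <- c(0.1, 0.9, 0.4)
w_hz <- 1
w_site <- 1
# rescale each component, apply weights, normalize by the total weight
D <- (D_hz / max(D_hz) * w_hz + D_site / max(D_site) * w_site) / (w_hz + w_site)
D
# 0.1805556 1.0000000 0.5347222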
Which is best for converting sRGB -> LAB? Dropping the colorspace package as a requirement would be nice.
library(aqp)
library(colorspace)
# this one?
as(sRGB(as.matrix(munsell2rgb('10YR', 4, 4, return_triplets = TRUE))), 'LAB')
# L A B
# 41.23715 7.282579 25.96353
col <- as(sRGB(as.matrix(munsell2rgb('10YR', 4, 4, return_triplets = TRUE))), 'LAB')
rgb2munsell(as(col, 'sRGB')@coords)
# hue value chroma sigma
# 10YR 4 4 2.139961e-07
# or this one?
convertColor(munsell2rgb('10YR', 4, 4, return_triplets = TRUE), from='sRGB', to='Lab', from.ref.white = 'D65', to.ref.white = 'D65')
# L a.x b
# 41.27066 7.377786 25.99946
col <- convertColor(munsell2rgb('10YR', 4, 4, return_triplets = TRUE), from='sRGB', to='Lab', from.ref.white = 'D65', to.ref.white = 'D65')
rgb2munsell(convertColor(col, from='Lab', to='sRGB', from.ref.white = 'D65', to.ref.white = 'D65'))
# hue value chroma sigma
# 10YR 4 4 3.044709e-06
When symbolizing on an attribute that contains one or fewer non-NA values (all others NA), plotSPC fails with a missing legend or missing values for the RGB color elements. This can be reproduced by fetching an SPC from NASIS, setting all but one value in horizons(spc)$clay to NA, then trying to create a grouped profile plot with the thematic colors determined by clay (color='clay').
f <- fetchNASIS()
f$clay[2:length(f$clay)] <- NA
groupedProfilePlot(f, groups='taxonname', color='clay')
Case 1: all values of thematic var are NA
Error in legend("bottom", legend = color.legend.data$legend, col = color.legend.data$col, :
'legend' is of length 0
Case 2: all but one value NA
Error in rgb(c.rgb[cc, ], maxColorValue = 255) :
argument "green" is missing, with no default
Case 3: two or more values non-NA
Works as expected.
I would like to do this:
d <- diff(SPC1, SPC2)
# inspect diff
Packages:
... Looks like daff is the way to go.
library(aqp)
library(daff)
# example data
data(sp1)
depths(sp1) <- id ~ top + bottom
site(sp1) <- ~ group
# copies
a <- sp1
b <- sp1
# modify some data
b$bottom[1] <- 4
b$top[2] <- 4
# remove a profile
b <- b[c(1:4, 6:9), ]
# compute difference
delta <- diff_data(horizons(a), horizons(b))
render_diff(delta)
TODO: write S4 methods for daff package functions that are applicable to SPC objects. Iteration over slots and saving to a list would be the simplest approach.
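A sketch of that idea, re-using the a and b objects from above (the diffSPC name is hypothetical):
library(daff)
# hypothetical helper: diff the main SPC slots, return a named list of daff objects
diffSPC <- function(spc1, spc2) {
  list(
    site = diff_data(site(spc1), site(spc2)),
    horizons = diff_data(horizons(spc1), horizons(spc2))
    # additional slots (diagnostic features, spatial data) could be added the same way
  )
}
d <- diffSPC(a, b)
render_diff(d$horizons)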
The check if(! all(profile_id(res) == site(res)[[idname(res)]])) is a problem. There are situations in NASIS where multiple sites are linked to one pedon.
For some reason it's failing when I try to replace the horizons in an SPC object: horizons(l) <- lh
Well, I think I figured it out: they used merge() instead of join(), which changes the order of the column names, and horizons(l) <- lh must not like that.
Make slab() faster / more efficient (data.table?). Moved to #229.
I've noticed issues with aqp installation failing for both CRAN and devtools installs due to the stringi package not being installed.
See the new data associated with converted sRGB and CIE LAB from the "real" colors file.
Details: http://www.rit.edu/cos/colorscience/rc_munsell_renotation.php
A simple test of 5 colors resulted in identical sRGB triplets.