hypertidy / vapour Goto Github PK
View Code? Open in Web Editor NEWGDAL API package for R
Home Page: https://hypertidy.github.io/vapour/
GDAL API package for R
Home Page: https://hypertidy.github.io/vapour/
clarify the old sql behaviour, we can remove the warning when FID gets inserted
Using this file NSS.GHRR.ND.D94182.S1520.E1650.B1625960.GC
I get RasterXSize of 51, which is the standard pixel count of the GCPs, not the XSize which is 409.
Use poDataSet->GetRasterXSize()
rather than the size of the first band ...
See here link to example from Even https://trac.osgeo.org/gdal/ticket/7008
I think we've learnt enough to put out a bare bones release. The key C++ functionality to wrap is these, either as slightly overlapping implementations or as building blocks:
Focus on just these features as a minimal core since we can build higher-level collections and descriptions of multiple data sources from these components
These functions
vapour_read_attributes
- a list of column vectorsvapour_read_geometry
- a list of native blobsvapour_read_geometry_text
- a vector of JSON, WKT, KML, or GMLvapour_read_extent
- a list of extent vectorsEach function is standalone, opening and closing the data source, defaulting to the 0-th layer and only accepting a layer index, optionally executing a SQL string, and ignoring the layer index if sql is given.
Example here:
https://gist.github.com/mdsumner/ca460e7a9063dbf8592826e8a17cc419
We could do that directly for any raster, have a set of tiles in R that trigger the right window-read when needed, hook that up to a leaflet window for streaming.
Radian can be used like this directly for tiled data too
I see that vapour
currently passes NULL
to the SpatialFilter argument of the ExecuteSQL function:
Line 309 in 465b2a6
I realize this may be a big ask, but how difficult would it be to enable passing a bounding box instead?
Read n attribute rows, or n geometries in sequence and add helper for "has geometry".
## this can't work yet because of the fid_select sleight of hand,
## so needs to catch that issue earlier and return infor
vapour_has_geometry <- function(dsource, layer = 0L) {
## convert from character layer to integer if required
if (!is.numeric(layer)) layer <- index_layer(dsource, layer)
sql <- sprintf("SELECT Geom FROM %s LIMIT %i",
vapour_layer_names(dsource)[layer + 1],
1L)
err <- try(vapour_read_geometry(dsource, layer = layer, sql = sql), silent = TRUE)
!inherits(err, "try-error")
}
#' Read n features from a layer.
#'
#' Uses OGRSQL, which could be fragile depending on the layer name quoting required.
#' @inheritParams vapour_read_attributes
vapour_read_limit_n <- function(dsource, layer = 0L, limit = 1L) {
## convert from character layer to integer if required
if (!is.numeric(layer)) layer <- index_layer(dsource, layer)
sql <- sprintf("SELECT * FROM %s LIMIT %i",
vapour_layer_names(dsource)[layer + 1],
as.integer(limit))
as.data.frame(vapour_read_attributes(mvfile,
sql = sql), stringsAsFactors = FALSE)
}
library(vapour)
file <- "list_locality_postcode_meander_valley.tab"
mvfile <- system.file(file.path("extdata/tab", file), package="vapour")
vapour_read_limit_n(mvfile)
The complement to limit_n
for chunking through a source.
Something like the statewide cadastre could be read as a single data frame in seconds with a bounding geometry rather than the geometry. Use this as an sf object to drill down to features of interest.
Fooling around with this got me thinking:
conn <- dplyr::src_sqlite(system.file("gpkg/nc.gpkg", package = "sf"))$con
st_read(conn, query = "SELECT geom AS geometry FROM 'nc.gpkg'")
## otherwise
dplyr::src_sqlite(conn, "nc.gpkg")
Because, why not try executing the layer argument as SQL if it's not an existing layer in the DSN?
All the functions have layer , sql
arguments except for vapour_layer_names
which only has sql
, so I think this can work. Logic is
Even though message is "vapour_read_feature_what is deprecated. Use 'vapour_read_geometry_cpp' instead", my previous concerns still abide that enabling direct calls to _cpp
fns ought not be done. I'm not sure which should be deprecated here, but something, right?
Add to the vapour_raster_info
output, for use in targetting scenes in L1B.
Should accept sp style, raster style, sf style.
Consider refactoring to use an "extents" package.
To replace the geometry part of #35
https://stat.ethz.ch/pipermail/r-sig-geo/2018-June/026657.html
It seems like we can just copy this as a general resource?
This is a raw example using raster_io to read just the level-of-detail required for a requested plot.
#' Plot GDAL matrix.
#'
#' Plot a matrix directly read via GDAL to match the current device size.
#'
#' The data is read in at the resolution required by the current device, assuming it's a single
#' panel. TODO: read off the space to plot into from the device, and
#' read only that window from the source at the right resolution.
#' @param filename source
#' @param add add to the plot
#' @importFrom vapour raster_io raster_info
#' @export
#' @examples
#' ## this file takes 5s to read in total from NetCDF, 24 seconds via GDAL
#' filename <- "/rdsi/PRIVATE/raad/data_local/www.bodc.ac.uk/gebco/GRIDONE_2D.nc"
#' #filename <- raadtools::topofile()
#' #filename <- "D:/data/topography/etopo2/subset.tif"
#' library(vapour)
#' ## read in to match the device and it's quicker
#' plot_raster(filename)
#' #zoom(wrld_simpl, new = FALSE)
#' #plot_raster(filename, ext = spex::spex(), add = T, col = rainbow(100))
#' #maps::map(add = TRUE)
#' ## library(sf); plot(st_transform(sst_c, 4326), add = T)
#' ##for (i in sample(seq_len(nrow(wrld_simpl)))) {plot_raster(filename, ext = raster::extent(wrld_simpl[i, ])); plot(wrld_simpl[i, ], add = TRUE)}
plot_raster <- function(filename, add = FALSE, ext = NULL, cols = NULL) {
ri <- vapour::raster_info(filename)
xlim <- ri$geotransform[1]
xlim <- c(xlim[1], xlim[1] + ri$dimXY[1] * ri$geotransform[2])
ylim <- ri$geotransform[4]
ylim <- c(ylim[1] + ri$dimXY[2] * ri$geotransform[6], ylim)
flip <- function(x) x[,ncol(x):1]
brks <- seq(ri$minmax[1], ri$minmax[2], length = 100)
cols <- if (is.null(cols)) viridis::viridis(99) else colorRampPalette(cols)(length(brks)-1)
dv <- dev.size("px")
rlocal <- raster::raster(raster::extent(xlim, ylim), nrow= ri$dimXY[2], ncol = ri$dimXY[1])
window <- c(0, 0, ri$dimXY)
if (!is.null(ext)) {
## this index is Y-up
index <- spex:::as_double(tabularaster::index_extent(rlocal, ext))
indexYdown <- c(index[1:2], nrow(rlocal) - index[4:3])
window <- c(indexYdown[1], indexYdown[3],
indexYdown[2] - indexYdown[1], indexYdown[4] - indexYdown[3])
rlocal <- raster::crop(rlocal, ext, snap = "out")
}
z <- flip(matrix(raster_io(filename, c(window, dv[1], dv[2])), dv[1]))
dim(rlocal) <- rev(dim(z))
l <- list(x = raster::xFromCol(rlocal),
y = rev(raster::yFromRow(rlocal)),
z = z)
if(!add && dev.cur() == 1) {
print(add)
add <- TRUE
par(mar = rep(0, 4))
plot(xlim, ylim, type = "n", xaxs = "i", yaxs = "i")
}
image(l, useRaster = TRUE,
col = cols, breaks = brks, add = add)
}
This works great for L1B, which has in-built geolocation arrays that present as GCPs:
## local stere projection grid overlapping near Mawson station
## GCdata/NSS.GHRR.ND.D94182.S1520.E1650.B1625960.GC
## template %s are input and output
gdalwarp %s %s -geoloc -r bilinear \
-te -1157398.178974 -1276821.259254 1158383.248024 \ 1277541.239696 -tr 1000 1000 \
-t_srs '+proj=stere +lon_0=64.000000 +lat_0=-66.000000 +lat_ts=-71 +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0'
Wrap the logic up to call GDAL directly on the file, extract the geoloc and apply the warping based on a raster input grid.
Bonus points:
Use rgdal2 as a basic model, though there's no need for the rgdal2.h templating now with XPtr.
https://github.com/thk686/rgdal2
Also these:
http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2014-December/008254.html
// http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2015-June/008818.html
// HELP: I can't see how to XPtr<T> this function so that I can re-use it in another
// function, is the XPtr<GDALDataset> somehow special or is it that I have to declare
// that template more broadly (how?)
// [[Rcpp::export]]
SEXP vapour_GDAL_open(CharacterVector filename,
LogicalVector readonly) {
GDALAllRegister();
Rprintf(filename[0]);
GDALAccess access = readonly ? GA_ReadOnly : GA_Update;
//poDataset = (GDALDataset *) GDALOpen( filename[0], GA_ReadOnly );
XPtr<GDALDataset> gdaldataset((GDALDataset *)GDALOpen(filename[0], access));
return gdaldataset;
}
can we hack use of psShape here to avoid conversion to polygon?
The read_geometry_text examples include a direct path to inst/filename.something, and I think this causes a total lock up in RStudio - I've had to soft reboot each time with check, and just running examples.
Don't do that in examples ...
Various things to explore, discuss ...
mvfile <- system.file(file.path("extdata/tab", file), package="vapour")
layer <- gsub(".tab$", "", basename(mvfile))
## get the number of features by FID (DISTINCT should be redundant here)
vapour_read_attributes(mvfile, sql = sprintf("SELECT COUNT(DISTINCT FID) AS nfeatures FROM %s", layer))
## note how TAB is 1-based
vapour_read_attributes(mvfile, sql = sprintf("SELECT COUNT(*) AS n FROM %s WHERE FID < 2", layer))
#$n
#[1] 1
## but SHP is 0-based
shp <- system.file("shape/nc.shp", package="sf")
vapour_read_attributes(shp, sql = sprintf("SELECT COUNT(*) AS n FROM %s WHERE FID < 2", "nc"))
#$n
#[1] 2
do min versioning for travis GDAL https://github.com/mpadge/sftest/blob/master/.travis.yml
thanks @mpadge !
Current just passes exposed to C++. Need to write R code to first check whether file exists, or connection is open.
https://github.com/hypertidy/vapour/blob/master/src/rasterio.cpp#L48
This will be too slow for really big collections (though it does perform surprisingly well). Allow ability to turn it off. Also, this return info is not documented at the R level
Process the input resample arg to be lowercase for comparison, so that we can have "Gauss" or "gauss", and "Cubicspline" or "CubicSpline"
This is bit dangerous, but the SQL mech can be used to get min and max fid, so this can be added to the sql query, and then interrogated to pass a filter into the geom reader.
hooray
Implemente vapour_read_names in C++, with limit_n applied (SQL is too slow for very large GDB for example).
Do you have any idea why vapour
(and gdalUtils
) fails to read from a symlink but system("ogrinfo")
and sf::st_layers
works?
This isn't a vapour issue, but it'd be good to know what the solution is - some obscure escape syntax?
vrtemplate <- function() {
'<OGRVRTDataSource>
<OGRVRTLayer name="nc.gpkg">
<SrcDataSource>%s</SrcDataSource>
<SrcSQL>%s</SrcSQL>
</OGRVRTLayer>
</OGRVRTDataSource>'
}
updateVRT <- function(dsn, sql) {
sprintf(vrtemplate(), dsn, sql)
}
write_vrt_tmp <- function(x) {
tmp <- sprintf("%s.vrt", tempfile())
writeLines(x, tmp)
tmp
}
library(sf)
f <- system.file("gpkg/nc.gpkg", package = "sf")
## st_layers(f)
## must quote a layer name like "nc.gpkg"
tmp <- write_vrt_tmp(updateVRT(f, "SELECT * FROM 'nc.gpkg' WHERE AREA > 0.15"))
read_sf(tmp)
## this works
vapour_read_attributes(f, sql = "SELECT NAME FROM 'nc.gpkg' WHERE AREA < 0.15")
But, try < in VRT and fail
tmp <- write_vrt_tmp(updateVRT(f, "SELECT * FROM 'nc.gpkg' WHERE AREA < 0.15"))
read_sf(tmp)
#Cannot open data source /tmp/RtmpY5ksju/file614e257431d5.vrt
#Error in CPL_read_ogr(dsn, layer, as.character(options), quiet, type, :
# Open failed.
#In addition: Warning message:
#In CPL_read_ogr(dsn, layer, as.character(options), quiet, type, :
# GDAL Error 1: Line 3: Didn't find expected '=' for value of attribute '<'.
for a missing geometry, we get c(0, 0, 0, 0) but if all are missing we get a list of NULL
Can use this for summarizing the geometry-state, but the 0s should be NAs, or a NULL.
A VSIReadDirEx example thanks to @obrl-soil
remotes::install_github("hypertidy/vapour@VSI_list")
u <- "/vsizip//vsicurl/http://dapds00.nci.org.au/thredds/fileServer/rr2/national_geophysical_compilations/http/radmap_v3_2015_filtered_dose/radmap_v3_2015_filtered_dose.ers.zip"
vapour:::VSI_list(u)
[1] "radmap_v3_2015_filtered_dose" "radmap_v3_2015_filtered_dose.ers"
[3] "radmap_v3_2015_filtered_dose.isi" "radmap_v3_2015_filtered_dose.txt"
Doc is here: https://www.gdal.org/gdal_virtual_file_systems.html
Discussed on twitter: https://twitter.com/obrl_soil/status/1087674550198820864
would be good to flesh this out a bit more, and get some functionality of gdal_ls.py
in R.
first is a sp object with the spex bbox of every polygon
library(vapour)
file <- (system.file("shape/nc.shp", package="sf"))
library(dplyr)
file <- raadfiles:::get_raw_raad_filenames() %>%
dplyr::filter(grepl("TASVEG", file, ignore.case = TRUE))
file <- ".../tas.gov.au/TASVEG/GDB/TASVEG3.gdb"
read_extent <- function(x, layer = 1L, ...) {
layers <- vapour::vapour_layer_names(x)
if (is.numeric(layer)) one_layer <- layers[layer]
if (is.character(layer)) one_layer <- layer
if (!one_layer %in% layers) warning("layer not found!")
out <- do.call(rbind, purrr::map(vapour::vapour_read_extent(x),
~spex::spex(raster::extent(.x), crs = "+init=epsg:4326")))
out$dsn <- x
fids <- as.integer(seq_len(dim(out)[1L]))
out$sql <- sprintf("SELECT * FROM %s WHERE FID = %i", one_layer,
fids)
out$fid <- fids
out$layer <- one_layer
out
}
system.time(x <- read_extent(file))
# user system elapsed
# 2283.033 29.665 2312.259
# > pryr::object_size(x)
# 922 MB
# > length(x)
# [1] 463954
This is simplistic, but nice since we can use directly in mapedit and call back to the source to read in those geometries:
ind <- mapedit::selectFeatures(x)
y <- purrr::map(ind$sql, ~vapour::vapour_read_geometry(file, sql = .x))
The second is super fast, we get an sf POLYGON of every bounding box within seconds.
## vs 6seconds
system.time(e <- vapour_read_extent(file))
dummy <- sf::st_as_sfc(spex::spex(raster::extent(e[[1]]), crs = NA_character_))
f <- function(x) {out <- dummy;
out[[1]][[1]] <- matrix(x[c(1, 1, 2, 2, 1,
3, 4, 4, 3, 3)], ncol = 2);
attr(out, "bbox") <- structure(setNames(x[c(1, 3,2, 4)], c("xmin", "ymin", "xmax", "ymax")), class = "bbox")
out}
## very fast to build the sf index to this huge GDB
bblist <- do.call(c, purrr::map(e, f))
Should raster_io build the right default window?
Options:
native = FALSE
arg, if set to TRUE it reads the window (does it override the input window?)Currently window is not optional, it must be passed in. Probably best to write more functions that wrap raster_io, though it's likely most of them won't get used much. The extent, resolution, native, and max-size are the likely candidates
I'm looking for assistance to see that install on MacOS will work.
If you have a chance to try installing vapour on Apple machines that would be really helpful, let me know how you go!
The sf package has the following prerequisites (and these would be more or less the same requirement for vapour):
brew unlink gdal
brew tap osgeo/osgeo4mac && brew tap --repair
brew install proj
brew install geos
brew install gdal2 --with-armadillo --with-complete --with-libkml --with-unsupported
brew link --force gdal2
If that works, please try the following to install vapour
:
devtools::install_github("hypertidy/vapour")
The line brew install geos
can probably be ignored.
manifold
vapour_layer_names("ODBC:manifold8")
#[1] "Drawing Table" "Table"
## this CRASHES R presumably because of "Geom (I)"?
#vapour_read_attributes("ODBC:manifold8", sql = "SELECT * FROM [Drawing Table]")
## ok
vapour_read_attributes("ODBC:manifold8", sql = "SELECT * FROM [Table]")
#$`Column`
#[1] 1 4
#$`Column 2`
#[1] 2 3
now sf
st_layers("ODBC:manifold8")
Driver: ODBC
Available layers:
layer_name geometry_type features fields
1 Drawing Table 0 15
2 Table 0 2
Warning messages:
1: In CPL_get_layers(dsn, options, do_count) :
GDAL Error 1: GetFeatureCount() failed on query SELECT COUNT(*) FROM Drawing Table.
Unexpected token 'Drawing' at line 1.
2: In CPL_get_layers(dsn, options, do_count) :
GDAL Error 1: GetFeatureCount() failed on query SELECT COUNT(*) FROM Table.
Unexpected token 'Table' at line 1.
>
read_sf("ODBC:manifold8", "Table")
Simple feature collection with 0 features and 2 fields
bbox: xmin: NA ymin: NA xmax: NA ymax: NA
epsg (SRID): NA
proj4string: NA
# A tibble: 0 x 3
# ... with 3 variables: Column <int>, `Column 2` <int>, geometry <GEOMETRY>
Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
GDAL Error 1: GetFeatureCount() failed on query SELECT COUNT(*) FROM Table.
Unexpected token 'Table' at line 1.
This one refers to very old interface (read_gdal_table, to_format):
https://github.com/hypertidy/vapour/blob/master/R/vapour-package.R#L61
There's a few with sf calls commented out that also need careful checking.
I'm having difficulty running install_github
on master. During compilation I see:
...
** tests
** byte-compile and prepare package for lazy loading
Error in structure(list(source = x, info = raster_info(x)), class = "ssraster") :
object 'f' not found
...
We need to be able to loop over features and only keep those that match an input set of FIDS from vapour_read_names, alternatively a sequential integer set
Execution haltedchecking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'vapour_gdal_version'
‘dsource’
Also OSM test fails on skip_n, spurious message needs more care
We need these to work
DAP/Thredds
u <- "http://rs-data1-mel.csiro.au/thredds/dodsC/imos-srs/sst/ghrsst/L3S-3d/ngt/2018/20180404152000-ABOM-L3S_GHRSST-SSTskin-AVHRR_D-3d_night-v02.0-fv01.0.nc"
u <- "http://coastwatch.pfeg.noaa.gov/erddap/griddap/erdQSwind3day"
I think this name is a problem because vapour::raster_io
is readonly (RasterIO is obvsly rw), and possible confusion with the rasterio python package.
vapour_read_*
and raster_*
is also a bit annoying, it's only because vapour started as being only about vector sources.
maybe we can do
vapour_sds_names
(replace _sds_info, analogous to _layer_names)vapour_read_raster
(replace raster_io, analogous to _read_geometry) - or _read_bandvapour_raster_info
(replace raster_info)And then we also need a vapour_layer_info
to complete the set, and this is how we'd get the CRS for the vector layer, the attribute names and types (good for DBI, and whatever other md is there).
Then we still have this annoying vapour_ prefix, but at least it's consistent and we can change simply.
This all has to be revisited
vapour_read_names is currently borked, I tried putitng long long into double but that brings up valgrind errors ( I think).
Try using bit64:
https://stackoverflow.com/questions/49015493/integer64-and-rcpp-compatibility
and let the user worry about it.
Please fix from CRAN
We see (gcc 8)
sds_io.cpp: In function ‘Rcpp::CharacterVector sds_info_cpp(const char*)’:
sds_io.cpp:48:28: warning: comparison of integer expressions of
different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘int’
[-Wsign-compare]
for (size_t ii = 0; ii < dscount; ii++) {
~~~^~~~~~~~~
I don't see why you picked a 64-bit unsigned type with a 32-bit signed
bound but if int does not suffice to enumerate them, this needs revision.
Follow-up Q: is this code copied from GDAL itself?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.