Comments (7)
Yeah, I can set up a run, and then send you the code, log file, and run times.
from wdpar.
I've just done a run of the global database using the example R script distributed with the package (see https://github.com/prioritizr/wdpar/blob/master/inst/scripts/global-example-script.R). I've copied the log file in below and included the session information too. Since this was run on a server with 60 GB RAM, it's relatively fast because all the processing can be done in RAM without resorting to swap space. Let me know if you need any further details.
Log file
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> # System command to execute:
> # R CMD BATCH --no-restore --no-save global-example-script.R
>
> # Initialization
> ## define countries for processing data
> country_names <- "global"
>
> ## define file path to save data
> path <- paste0(
+ "~/wdpa-data/global-", format(Sys.time(), "%Y-%m-%d"), ".gpkg"
+ )
>
> ## load packages
> library(sf)
Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
> library(wdpar)
>
> # Preliminary processing
> ## prepare folder if needed
> export_dir <- suppressWarnings(normalizePath(dirname(path)))
> if (!file.exists(export_dir)) {
+ dir.create(export_dir, showWarnings = FALSE, recursive = TRUE)
+ }
>
> ## prepare user data directory
> data_dir <- rappdirs::user_data_dir("wdpar")
> if (!file.exists(data_dir)) {
+ dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
+ }
>
> # Main processing
> ## download data
> raw_data <- wdpa_fetch(
+ country_names, wait = TRUE, download_dir = data_dir, verbose = TRUE
+ )
[100%] Downloaded 194 bytes...
[100%] Downloaded 1537392988 bytes...
Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
GDAL Message 1: organizePolygons() received a polygon with more than 100 parts. The processing may be really slow. You can skip the processing by setting METHOD=SKIP, or only make it analyze counter-clock wise parts by setting METHOD=ONLY_CCW if you can assume that the outline of holes is counter-clock wise defined
>
> ## clean data
> result_data <- wdpa_clean(raw_data, erase_overlaps = FALSE, verbose = TRUE)
ℹ initializing
✔ initializing [36ms]
ℹ retaining only areas with specified statuses
✔ retaining only areas with specified statuses [17.9s]
ℹ removing UNESCO Biosphere Reserves
✔ removing UNESCO Biosphere Reserves [18.8s]
ℹ removing points with no reported area
✔ removing points with no reported area [18.1s]
ℹ wrapping dateline
✔ wrapping dateline [4m 48.2s]
ℹ repairing geometry
✔ repairing geometry [31m 16.9s]
ℹ reprojecting data
✔ reprojecting data [29.2s]
ℹ repairing geometry
✔ repairing geometry [10m 21s]
ℹ further geometry fixes (i.e. buffering by zero)
✔ further geometry fixes (i.e. buffering by zero) [6m 19.2s]
ℹ buffering points to reported area
✔ buffering points to reported area [48.8s]
ℹ repairing geometry
✔ repairing geometry [8m 8.8s]
ℹ snapping geometry to tolerance
✔ snapping geometry to tolerance [15s]
ℹ repairing geometry
✔ repairing geometry [11m 57s]
ℹ formatting attribute data
✔ formatting attribute data [50ms]
ℹ removing slivers
✔ removing slivers [13.1s]
ℹ calculating spatial statistics
✔ calculating spatial statistics [6.6s]
>
> # Exports
> ## save result
> sf::write_sf(result_data, path, overwrite = TRUE)
>
> proc.time()
user system elapsed
4583.412 201.009 4876.970
Session information
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] wdpar_1.3.3 sf_1.0-8
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 magrittr_2.0.3 units_0.8-0 tidyselect_1.1.1
[5] R6_2.5.1 rlang_0.4.12 fansi_0.5.0 dplyr_1.0.7
[9] tools_4.1.2 grid_4.1.2 KernSmooth_2.23-20 utf8_1.2.2
[13] e1071_1.7-11 DBI_1.1.3 ellipsis_0.3.2 class_7.3-19
[17] assertthat_0.2.1 tibble_3.1.6 lifecycle_1.0.1 crayon_1.4.2
[21] purrr_0.3.4 vctrs_0.3.8 glue_1.5.1 proxy_0.4-27
[25] compiler_4.1.2 pillar_1.6.4 generics_0.1.1 classInt_0.4-7
[29] pkgconfig_2.0.3
Sorry, maybe I overlooked it... but how many polygons did you process? This was not the global WDPA, right?
Ah okay, got it. It is the global data, but without unionizing. Hmm, I thought your section on big data and processing overnight was also referring to unionizing, i.e., erasing overlaps. Did you ever do that for the global data?
If so, could there be a method for more or less benchmarking it? I know it gets more complex now...
Yeah, that's right. The resulting dataset contains 272,466 protected areas. I have tried running the global data with erase_overlaps = TRUE and it doesn't work: the geometry processing dies due to (extremely) invalid geometries, and I couldn't find a workaround. To address this, the package documentation recommends using erase_overlaps = FALSE for large datasets (e.g., https://prioritizr.github.io/wdpar/articles/wdpar.html#recommended-practices-for-large-datasets, https://prioritizr.github.io/wdpar/reference/wdpa_clean.html#recommended-practices-for-large-datasets-1) and provides advice for post-processing (e.g., using wdpa_dissolve() to take care of overlaps).
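For reference, a minimal sketch of the recommended workflow for large datasets (clean without erasing overlaps, then dissolve afterwards). This assumes a wdpar version that provides wdpa_dissolve(), a working internet connection for the download, and illustrative default paths:

```r
# load packages
library(sf)
library(wdpar)

# download the global database (cached in the user data directory)
raw_data <- wdpa_fetch(
  "global", wait = TRUE,
  download_dir = rappdirs::user_data_dir("wdpar")
)

# clean the data without erasing overlaps
# (recommended practice for large datasets)
clean_data <- wdpa_clean(raw_data, erase_overlaps = FALSE)

# dissolve the cleaned geometries into a single layer afterwards,
# so overlapping areas are not double-counted in coverage statistics
dissolved_data <- wdpa_dissolve(clean_data)
```

Since wdpa_dissolve() unions everything into one geometry, statistics such as total protected area coverage can then be computed without double-counting overlapping designations.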
Seems fine to me. Thanks for the proof.
Brilliant - thanks!