
Comments (7)

jeffreyhanson commented on May 29, 2024

Yeah, I can set up a run, and then send you the code, log file, and run times.

from wdpar.

jeffreyhanson commented on May 29, 2024

I've just done a run of the global database using the example R script distributed with the package (see https://github.com/prioritizr/wdpar/blob/master/inst/scripts/global-example-script.R). I've copied the log file in below and included the session information too. Since this was run on a server with 60 GB RAM, it's relatively fast because all the processing can be done in RAM without resorting to swap space. Let me know if you need any further details.


Log file

R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # System command to execute:
> # R CMD BATCH --no-restore --no-save global-example-script.R
> 
> # Initialization
> ## define countries for processing data
> country_names <- "global"
> 
> ## define file path to save data
> path <- paste0(
+   "~/wdpa-data/global-", format(Sys.time(), "%Y-%m-%d"), ".gpkg"
+ )
> 
> ## load packages
> library(sf)
Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE
> library(wdpar)
> 
> # Preliminary processing
> ## prepare folder if needed
> export_dir <- suppressWarnings(normalizePath(dirname(path)))
> if (!file.exists(export_dir)) {
+   dir.create(export_dir, showWarnings = FALSE, recursive = TRUE)
+ }
> 
> ## prepare user data directory
> data_dir <- rappdirs::user_data_dir("wdpar")
> if (!file.exists(data_dir)) {
+   dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
+ }
> 
> # Main processing
> ## download data
> raw_data <- wdpa_fetch(
+   country_names, wait = TRUE, download_dir = data_dir, verbose = TRUE
+ )
 [100%] Downloaded 194 bytes...
 [100%] Downloaded 1537392988 bytes...


Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Message 1: organizePolygons() received a polygon with more than 100 parts. The processing may be really slow.  You can skip the processing by setting METHOD=SKIP, or only make it analyze counter-clock wise parts by setting METHOD=ONLY_CCW if you can assume that the outline of holes is counter-clock wise defined
> 
> ## clean data
> result_data <- wdpa_clean(raw_data, erase_overlaps = FALSE, verbose = TRUE)
ℹ initializing
✔ initializing [36ms]

ℹ retaining only areas with specified statuses
✔ retaining only areas with specified statuses [17.9s]

ℹ removing UNESCO Biosphere Reserves
✔ removing UNESCO Biosphere Reserves [18.8s]

ℹ removing points with no reported area
✔ removing points with no reported area [18.1s]

ℹ wrapping dateline
✔ wrapping dateline [4m 48.2s]

ℹ repairing geometry
✔ repairing geometry [31m 16.9s]

ℹ reprojecting data
✔ reprojecting data [29.2s]

ℹ repairing geometry
✔ repairing geometry [10m 21s]

ℹ further geometry fixes (i.e. buffering by zero)
✔ further geometry fixes (i.e. buffering by zero) [6m 19.2s]

ℹ buffering points to reported area
✔ buffering points to reported area [48.8s]

ℹ repairing geometry
✔ repairing geometry [8m 8.8s]

ℹ snapping geometry to tolerance
✔ snapping geometry to tolerance [15s]

ℹ repairing geometry
✔ repairing geometry [11m 57s]

ℹ formatting attribute data
✔ formatting attribute data [50ms]

ℹ removing slivers
✔ removing slivers [13.1s]

ℹ calculating spatial statistics
✔ calculating spatial statistics [6.6s]

> 
> # Exports
> ## save result
> sf::write_sf(result_data, path, overwrite = TRUE)
> 
> proc.time()
    user   system  elapsed 
4583.412  201.009 4876.970 

Session information

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] wdpar_1.3.3 sf_1.0-8   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9         magrittr_2.0.3     units_0.8-0        tidyselect_1.1.1  
 [5] R6_2.5.1           rlang_0.4.12       fansi_0.5.0        dplyr_1.0.7       
 [9] tools_4.1.2        grid_4.1.2         KernSmooth_2.23-20 utf8_1.2.2        
[13] e1071_1.7-11       DBI_1.1.3          ellipsis_0.3.2     class_7.3-19      
[17] assertthat_0.2.1   tibble_3.1.6       lifecycle_1.0.1    crayon_1.4.2      
[21] purrr_0.3.4        vctrs_0.3.8        glue_1.5.1         proxy_0.4-27      
[25] compiler_4.1.2     pillar_1.6.4       generics_0.1.1     classInt_0.4-7    
[29] pkgconfig_2.0.3 


Jo-Schie commented on May 29, 2024

Sorry, maybe I overlooked it... but how many polygons did you process? This was not the global WDPA, right?


Jo-Schie commented on May 29, 2024

Ah okay, got it. It is the global data but without unioning. Hmm. I thought your section on big data and processing overnight was also referring to unioning, i.e., erasing overlaps. Did you ever do that for the global data?

If so, could there be a method for more or less benchmarking it? I know it gets more complex now...


jeffreyhanson commented on May 29, 2024

Yeah, that's right. The resulting dataset contains 272,466 protected areas. I have tried running the global data with erase_overlaps = TRUE and it doesn't work: the geometry processing dies due to (extremely) invalid geometries, and I couldn't find a workaround. To address this, the package documentation recommends using erase_overlaps = FALSE for large datasets (e.g., https://prioritizr.github.io/wdpar/articles/wdpar.html#recommended-practices-for-large-datasets, https://prioritizr.github.io/wdpar/reference/wdpa_clean.html#recommended-practices-for-large-datasets-1) and provides advice for post-processing (e.g., using wdpa_dissolve() to take care of overlaps).
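For reference, the recommended workflow for large extracts could be sketched roughly as follows (a minimal sketch, assuming a wdpar version that exports wdpa_dissolve(); the dissolve step is shown as a separate post-processing call rather than part of wdpa_clean()):

```r
## load packages
library(wdpar)

## download data into the user data directory
raw_data <- wdpa_fetch(
  "global", wait = TRUE,
  download_dir = rappdirs::user_data_dir("wdpar")
)

## clean data, skipping the expensive (and fragile) overlap erasure
clean_data <- wdpa_clean(raw_data, erase_overlaps = FALSE)

## afterwards, dissolve geometries to remove overlapping boundaries
dissolved_data <- wdpa_dissolve(clean_data)
```

Note that the dissolve step collapses the attribute data, so it is best applied after any per-area attribute processing is finished.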


Jo-Schie commented on May 29, 2024

Seems fine to me. Thanks for the proof.


jeffreyhanson commented on May 29, 2024

Brilliant - thanks!

