Giter Club home page Giter Club logo

Comments (29)

JosiahParry avatar JosiahParry commented on September 22, 2024 1

Related georust/geo#1055

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024 1

Great to see the upstream-of-the-upstream fix in there!

from stplanr.

JosiahParry avatar JosiahParry commented on September 22, 2024 1

I will now work to get an initial version up into CRAN in the next week or so. You will see a huge performance increase if you use rsgeo for line segmentization. Line segmentization using rsgeo is done in parallel in addition to being blazingly fast.
Note there is no facility for using specified line length but thats easy to calculate n from line length.

library(stplanr)

l <- routes_fast_sf[2, "geometry"] |> 
  sf::st_as_sf()

library(rsgeo)

lrs <- as_rsgeo(l$geometry)

bench::mark(
  line_segment(l, 25),
  line_segmentize(lrs, 25),
  check = F
)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
#> # A tibble: 2 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 line_segment(l, 25)       31.65ms  31.82ms      31.2    6.63MB    114. 
#> 2 line_segmentize(lrs, 25)   3.81µs   4.71µs  170137.    10.86KB     17.0

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024 1

You don't often find a 4 order of magnitude speed-up in computing these days but that seems to be what this. Amazing work Josiah!

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024 1

Suggested solution: add {rsgeo} as a Suggests and if it's installed use that version in place of stplanr::line_segment() here:

line_segment <- function(

That way we reduce dependencies while getting the speed-up when needed. Sound like a plan @JosiahParry ? I may do a pair programming session with @wangzhao0217 on this later today.

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024 1

@JosiahParry I'm on the case, simpler reprex incoming.

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024 1

Visually appealing networks in need of merging, great example Zhao!

image

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024 1

Aha. Yes. That will save a LOT of time, and IOU an apology: you already implemented it in a vectorised way, sorry!

n = runif(nrow(rnet_x), 1, 10) |> round()
rnet_xr = rsgeo::as_rsgeo(sf::st_cast(rnet_x, "LINESTRING"))
a = rsgeo::line_segmentize(rnet_xr, n = 2)
b = rsgeo::line_segmentize(rnet_xr, n = n)
a_sf = sf::st_as_sfc(a)
b_sf = sf::st_as_sfc(b)
length(a_sf |> sf::st_cast("LINESTRING"))
length(b_sf |> sf::st_cast("LINESTRING"))

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024 1

For completeness, here's the same reprex with latest version showing, indeed ~50x speed-up:

remotes::install_dev("stplanr")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'stplanr' from a github remote, the SHA1 (f72e6aef) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(stplanr)
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.4, PROJ 9.1.1; sf_use_s2() is TRUE
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")

name_list = names(rnet_y)
funs = list()

# Loop through each name and assign it a function based on specific conditions
for (name in name_list) {
  if (name == "geometry") {
    next  # Skip the current iteration
  } else if (name %in% c("Gradient", "Quietness")) {
    funs[[name]] = mean
  } else {
    funs[[name]] = sum
  }
}

nrow(rnet_x)
#> [1] 913
#> [1] 913
nrow(rnet_y)
#> [1] 639
#> [1] 639

runtime = system.time({
  rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)
})
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, were retired in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> Warning: st_centroid assumes attributes are constant over geometries
#> Joining with `by = join_by(identifier)`

nrow(rnet_x) / runtime[3]
#>  elapsed 
#> 316.9039

Created on 2023-10-06 with reprex v2.0.2

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

Let us know how you get on either way...

from stplanr.

JosiahParry avatar JosiahParry commented on September 22, 2024

Just wanted to provide an update: rsgeo is now stable on CRAN with binaries built for mac and windows. Ubuntu 22 binaries can be accessed via r-universe for those without cargo. The upstream feature has been merged into geo via georust/geo#1055 as well. So it will continue to live on there and not be independently implemented by rsgeo.

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

Wow!

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

Re-opening because it's still a bit slow. Heads-up @wangzhao0217, who provided reproducible examples in #538.

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

@wangzhao0217 can you let us know what versions of stplanr and rsgeo you have installed? You can do that with:

pkgs = devtools::package_info("stplanr")
dplyr::filter(pkgs, package == "stplanr")
#>  package * version    date (UTC) lib source
#>  stplanr   1.1.2.9000 2023-10-02 [1] Github (ropensci/stplanr@b88dbc8)
#> 
#>  [1] /home/robin/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
packageVersion("rsgeo")
#> [1] '0.1.6.9000'

Created on 2023-10-04 with reprex v2.0.2

from stplanr.

JosiahParry avatar JosiahParry commented on September 22, 2024

@wangzhao0217 are there any simpler reproducible examples you can provide? I'm trying to run your code up until assigning rnet_xp_clip and its quite slow to get to there. I will note that I see you removed the transformation to a projected CRS. The rsgeo implementation wont kick in unless you're working with planar coordinates.

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

And one showing one of the attributes to merge, this is cool.

image

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

Confirmed: it is still pretty slow...

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

Got a result tho! Check this out: https://rpubs.com/RobinLovelace/1093048

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

@JosiahParry reprex, ~ 1 minute for only 1k lines : (

Imagine my code on the {stplanr} side, not {rsgeo} is to blame, note the pblapply burried in there with slow progress bar:

library(stplanr)
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")

name_list = names(rnet_y)
funs = list()

# Loop through each name and assign it a function based on specific conditions
for (name in name_list) {
  if (name == "geometry") {
    next  # Skip the current iteration
  } else if (name %in% c("Gradient", "Quietness")) {
    funs[[name]] = mean
  } else {
    funs[[name]] = sum
  }
}

rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)

from stplanr.

JosiahParry avatar JosiahParry commented on September 22, 2024

I see! I'll give it a perusal soon. It took 28 secs on my M1.
Which isn't terrible i guess but if this is only 1k lines :/

image

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

I get only around 10 segments per second. If you speak to @dabreegster and others that is v. slow and I tend to agree.

runtime = system.time({
  rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)
})
nrow(rnet_x) / runtime[3]
#>  elapsed 
#> 9.585302

Full reprex with time calculation below.

library(stplanr)
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")

name_list = names(rnet_y)
funs = list()

# Loop through each name and assign it a function based on specific conditions
for (name in name_list) {
  if (name == "geometry") {
    next  # Skip the current iteration
  } else if (name %in% c("Gradient", "Quietness")) {
    funs[[name]] = mean
  } else {
    funs[[name]] = sum
  }
}

nrow(rnet_x)
#> [1] 913
nrow(rnet_y)
#> [1] 639

runtime = system.time({
  rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)
})
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
#> Warning in st_cast.sf(sf::st_cast(x, "MULTILINESTRING"), "LINESTRING"):
#> repeating attributes for all sub-geometries for which they may not be constant
#> Joining with `by = join_by(identifier)`
#> Linking to GEOS 3.11.1, GDAL 3.6.4, PROJ 9.1.1; sf_use_s2() is TRUE
#> Warning: st_centroid assumes attributes are constant over geometries
#> Joining with `by = join_by(identifier)`
# Segments per second:
nrow(rnet_x) / runtime[3]
#>  elapsed 
#> 9.585302

Created on 2023-10-04 with reprex v2.0.2

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

Worth parallelising? There may well be a bit of code that could be heavily refactored before going down that route...

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

This is the iterator that takes up most of the time. Problem with segmentize on the Rust side: it doesn't take max length so there's lots of calculation. For that reason I think it's worth thinking about going back to the GRASS v.split function.

stplanr/R/linefuns.R

Lines 192 to 207 in a85cfd4

res_list = pbapply::pblapply(seq(n_row_l), function(i) {
if (debug_mode) {
message(paste0("Processing row ", i, " of ", n_row_l))
}
# if( i == 108) {
# browser()
# }
l_segmented = line_segment1(l[i, ], n_segments = NA, segment_length = segment_length, use_rsgeo)
res_names <- names(sf::st_drop_geometry(l_segmented))
# Work-around for https://github.com/ropensci/stplanr/issues/531
if (i == 1) {
res_names <<- names(sf::st_drop_geometry(l_segmented))
}
l_segmented = l_segmented[res_names]
l_segmented
})

from stplanr.

JosiahParry avatar JosiahParry commented on September 22, 2024

Have you looked at a flame graph to see where the time is actually spent? I'm not familiar with these functions yet so I can't provide good guidance. I'll review tonight and tomorrow

from stplanr.

JosiahParry avatar JosiahParry commented on September 22, 2024

Line 199. Looks like each line is segmented per iteration. This means that the vectorization isn't being used. And a lot of overhead is probably being spent converting between geometry types.

Sent from the gym....will test the hypothesis later 🤪

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

This means that the vectorization isn't being used.

True but does the the Rust implementation of segmentize allow a vector of n? Will find out!

from stplanr.

JosiahParry avatar JosiahParry commented on September 22, 2024

Yup! Feed it n linestrings you'll get n multilinestrings!

from stplanr.

Robinlovelace avatar Robinlovelace commented on September 22, 2024

Going to give it a go...

from stplanr.

JosiahParry avatar JosiahParry commented on September 22, 2024

Using the line_funs as written in this commit from my former branch this is the behavior I get:
https://github.com/JosiahParry/stplanr/blob/98f065aa77652c23c5597ddd6cbbae3864c4d2d2/R/linefuns.R

There appears to be an issue with rnet_subset() at some place

library(stplanr)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
#> Linking to GEOS 3.11.1, GDAL 3.6.4, PROJ 9.1.1; sf_use_s2() is TRUE
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")

# Convert to projected geometry:
rnet_xp = st_transform(rnet_x, "EPSG:27700")
rnet_yp = st_transform(rnet_y, "EPSG:27700")

bench::mark(
  rsgeo = rnet_join(rnet_xp, rnet_yp, subset_x = FALSE),
  sf = rnet_join(rnet_x, rnet_y, subset_x = FALSE),
  check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 rsgeo        29.7ms   32.4ms      30.4    2.47MB     13.3
#> 2 sf           85.5ms   90.9ms      11.0    5.69MB     14.7

# theres a bug in the sf implementation for projected
rnet_join(rnet_x, rnet_y, subset_x = TRUE)
#> Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented, : Loop 104 is not valid: Edge 25 is degenerate (duplicate vertex)

# but not for projected
rnet_join(rnet_xp, rnet_yp, subset_x = TRUE)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
#> Warning in st_cast.sf(sf::st_cast(x, "MULTILINESTRING"), "LINESTRING"):
#> repeating attributes for all sub-geometries for which they may not be constant
#> Joining with `by = join_by(identifier)`
#> Simple feature collection with 514 features and 28 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 307453.9 ymin: 958801.6 xmax: 318252.3 ymax: 970009.1
#> Projected CRS: OSGB36 / British National Grid
#> # A tibble: 514 × 29
#>    identifier                                    geometry commute_fastest_bicy…¹
#>  * <chr>                                    <POLYGON [m]>                  <dbl>
#>  1 11098166-9EEC-4E90-A7A2-FD8… ((311850.8 959961.9, 311…                     NA
#>  2 EBFC94A3-C88B-410C-ACF5-B84… ((315281 960575.1, 31528…                     NA
#>  3 1EE3C11B-A6CE-4899-94A6-DD3… ((315039.1 960553.7, 315…                     NA
#>  4 DC997373-9C9D-4FE6-93F8-4A1… ((315225.1 960686.2, 315…                      3
#>  5 33250B29-BBE4-402B-A32E-F63… ((313554.8 959510.2, 313…                     NA
#>  6 FCAD7998-A10F-4198-AFBA-902… ((313476.4 959589.7, 313…                     NA
#>  7 C5422867-9F42-4031-AB68-990… ((313707.7 959722.8, 313…                     NA
#>  8 E5B4ED0E-B796-4DE5-B7FE-CCB… ((313880.8 959848.4, 314…                     NA
#>  9 D649FC4E-DB02-4E70-BEFB-4A1… ((314154.1 960017.8, 314…                     NA
#> 10 B1602061-942D-4FF3-9C88-047… ((313541.7 960192, 31354…                      0
#> # ℹ 504 more rows
#> # ℹ abbreviated name: ¹​commute_fastest_bicycle
#> # ℹ 26 more variables: commute_fastest_bicycle_go_dutch <dbl>,
#> #   commute_balanced_bicycle <dbl>, commute_balanced_bicycle_go_dutch <dbl>,
#> #   commute_quietest_bicycle <dbl>, commute_quietest_bicycle_go_dutch <dbl>,
#> #   commute_ebike_bicycle <dbl>, commute_ebike_bicycle_go_dutch <dbl>,
#> #   school_fastest_bicycle <dbl>, school_fastest_bicycle_go_dutch <dbl>, …

Created on 2023-10-04 with reprex v2.0.2

from stplanr.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.