-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed-up rnet_merge()
#522
Comments
Let us know how you get on either way... |
Related georust/geo#1055 |
Great to see the upstream-of-the-upstream fix in there! |
I will now work to get an initial version up into CRAN in the next week or so. You will see a huge performance increase if you use rsgeo for line segmentization. Line segmentization using rsgeo is done in parallel in addition to being blazingly fast. library(stplanr)
l <- routes_fast_sf[2, "geometry"] |>
sf::st_as_sf()
library(rsgeo)
lrs <- as_rsgeo(l$geometry)
bench::mark(
line_segment(l, 25),
line_segmentize(lrs, 25),
check = F
)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 line_segment(l, 25) 31.65ms 31.82ms 31.2 6.63MB 114.
#> 2 line_segmentize(lrs, 25) 3.81µs 4.71µs 170137. 10.86KB 17.0 |
You don't often find a 4 order of magnitude speed-up in computing these days but that seems to be what this. Amazing work Josiah! |
Just wanted to provide an update: rsgeo is now stable on CRAN with binaries built for mac and windows. Ubuntu 22 binaries can be accessed via r-universe for those without cargo. The upstream feature has been merged into geo via georust/geo#1055 as well. So it will continue to live on there and not be independently implemented by rsgeo. |
Wow! |
Suggested solution: add Line 170 in af01a8c
That way we reduce dependencies while getting the speed-up when needed. Sound like a plan @JosiahParry ? I may do a pair programming session with @wangzhao0217 on this later today. |
Re-opening because it's still a bit slow. Heads-up @wangzhao0217, who provided reproducible examples in #538. |
@wangzhao0217 can you let us know what versions of pkgs = devtools::package_info("stplanr")
dplyr::filter(pkgs, package == "stplanr")
#> package * version date (UTC) lib source
#> stplanr 1.1.2.9000 2023-10-02 [1] Github (ropensci/stplanr@b88dbc8)
#>
#> [1] /home/robin/R/x86_64-pc-linux-gnu-library/4.3
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
packageVersion("rsgeo")
#> [1] '0.1.6.9000' Created on 2023-10-04 with reprex v2.0.2 |
@wangzhao0217 are there any simpler reproducible examples you can provide? I'm trying to run your code up until assigning |
@JosiahParry I'm on the case, simpler reprex incoming. |
Confirmed: it is still pretty slow... |
Got a result tho! Check this out: https://rpubs.com/RobinLovelace/1093048 |
@JosiahParry reprex, ~ 1 minute for only 1k lines : ( Imagine my code on the library(stplanr)
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")
name_list = names(rnet_y)
funs = list()
# Loop through each name and assign it a function based on specific conditions
for (name in name_list) {
if (name == "geometry") {
next # Skip the current iteration
} else if (name %in% c("Gradient", "Quietness")) {
funs[[name]] = mean
} else {
funs[[name]] = sum
}
}
rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20) |
I get only around 10 segments per second. If you speak to @dabreegster and others that is v. slow and I tend to agree. runtime = system.time({
rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)
})
nrow(rnet_x) / runtime[3]
#> elapsed
#> 9.585302 Full reprex with time calculation below. library(stplanr)
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")
name_list = names(rnet_y)
funs = list()
# Loop through each name and assign it a function based on specific conditions
for (name in name_list) {
if (name == "geometry") {
next # Skip the current iteration
} else if (name %in% c("Gradient", "Quietness")) {
funs[[name]] = mean
} else {
funs[[name]] = sum
}
}
nrow(rnet_x)
#> [1] 913
nrow(rnet_y)
#> [1] 639
runtime = system.time({
rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)
})
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
#> Warning in st_cast.sf(sf::st_cast(x, "MULTILINESTRING"), "LINESTRING"):
#> repeating attributes for all sub-geometries for which they may not be constant
#> Joining with `by = join_by(identifier)`
#> Linking to GEOS 3.11.1, GDAL 3.6.4, PROJ 9.1.1; sf_use_s2() is TRUE
#> Warning: st_centroid assumes attributes are constant over geometries
#> Joining with `by = join_by(identifier)`
# Segments per second:
nrow(rnet_x) / runtime[3]
#> elapsed
#> 9.585302 Created on 2023-10-04 with reprex v2.0.2 |
Worth parallelising? There may well be a bit of code that could be heavily refactored before going down that route... |
This is the iterator that takes up most of the time. Problem with segmentize on the Rust side: it doesn't take max length so there's lots of calculation. For that reason I think it's worth thinking about going back to the GRASS Lines 192 to 207 in a85cfd4
|
Have you looked at a flame graph to see where the time is actually spent? I'm not familiar with these functions yet so I can't provide good guidance. I'll review tonight and tomorrow |
Line 199. Looks like each line is segmented per iteration. This means that the vectorization isn't being used. And a lot of overhead is probably being spent converting between geometry types. Sent from the gym....will test the hypothesis later 🤪 |
True but does the the Rust implementation of segmentize allow a vector of n? Will find out! |
Yup! Feed it n linestrings you'll get n multilinestrings! |
Aha. Yes. That will save a LOT of time, and IOU an apology: you already implemented it in a vectorised way, sorry! n = runif(nrow(rnet_x), 1, 10) |> round()
rnet_xr = rsgeo::as_rsgeo(sf::st_cast(rnet_x, "LINESTRING"))
a = rsgeo::line_segmentize(rnet_xr, n = 2)
b = rsgeo::line_segmentize(rnet_xr, n = n)
a_sf = sf::st_as_sfc(a)
b_sf = sf::st_as_sfc(b)
length(a_sf |> sf::st_cast("LINESTRING"))
length(b_sf |> sf::st_cast("LINESTRING")) |
Going to give it a go... |
Using the line_funs as written in this commit from my former branch this is the behavior I get: There appears to be an issue with library(stplanr)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
#> Linking to GEOS 3.11.1, GDAL 3.6.4, PROJ 9.1.1; sf_use_s2() is TRUE
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")
# Convert to projected geometry:
rnet_xp = st_transform(rnet_x, "EPSG:27700")
rnet_yp = st_transform(rnet_y, "EPSG:27700")
bench::mark(
rsgeo = rnet_join(rnet_xp, rnet_yp, subset_x = FALSE),
sf = rnet_join(rnet_x, rnet_y, subset_x = FALSE),
check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 rsgeo 29.7ms 32.4ms 30.4 2.47MB 13.3
#> 2 sf 85.5ms 90.9ms 11.0 5.69MB 14.7
# theres a bug in the sf implementation for projected
rnet_join(rnet_x, rnet_y, subset_x = TRUE)
#> Error in wk_handle.wk_wkb(wkb, s2_geography_writer(oriented = oriented, : Loop 104 is not valid: Edge 25 is degenerate (duplicate vertex)
# but not for projected
rnet_join(rnet_xp, rnet_yp, subset_x = TRUE)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries
#> Warning in st_cast.sf(sf::st_cast(x, "MULTILINESTRING"), "LINESTRING"):
#> repeating attributes for all sub-geometries for which they may not be constant
#> Joining with `by = join_by(identifier)`
#> Simple feature collection with 514 features and 28 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 307453.9 ymin: 958801.6 xmax: 318252.3 ymax: 970009.1
#> Projected CRS: OSGB36 / British National Grid
#> # A tibble: 514 × 29
#> identifier geometry commute_fastest_bicy…¹
#> * <chr> <POLYGON [m]> <dbl>
#> 1 11098166-9EEC-4E90-A7A2-FD8… ((311850.8 959961.9, 311… NA
#> 2 EBFC94A3-C88B-410C-ACF5-B84… ((315281 960575.1, 31528… NA
#> 3 1EE3C11B-A6CE-4899-94A6-DD3… ((315039.1 960553.7, 315… NA
#> 4 DC997373-9C9D-4FE6-93F8-4A1… ((315225.1 960686.2, 315… 3
#> 5 33250B29-BBE4-402B-A32E-F63… ((313554.8 959510.2, 313… NA
#> 6 FCAD7998-A10F-4198-AFBA-902… ((313476.4 959589.7, 313… NA
#> 7 C5422867-9F42-4031-AB68-990… ((313707.7 959722.8, 313… NA
#> 8 E5B4ED0E-B796-4DE5-B7FE-CCB… ((313880.8 959848.4, 314… NA
#> 9 D649FC4E-DB02-4E70-BEFB-4A1… ((314154.1 960017.8, 314… NA
#> 10 B1602061-942D-4FF3-9C88-047… ((313541.7 960192, 31354… 0
#> # ℹ 504 more rows
#> # ℹ abbreviated name: ¹commute_fastest_bicycle
#> # ℹ 26 more variables: commute_fastest_bicycle_go_dutch <dbl>,
#> # commute_balanced_bicycle <dbl>, commute_balanced_bicycle_go_dutch <dbl>,
#> # commute_quietest_bicycle <dbl>, commute_quietest_bicycle_go_dutch <dbl>,
#> # commute_ebike_bicycle <dbl>, commute_ebike_bicycle_go_dutch <dbl>,
#> # school_fastest_bicycle <dbl>, school_fastest_bicycle_go_dutch <dbl>, … Created on 2023-10-04 with reprex v2.0.2 |
* Remove comments * Format * Simplify use_rsgeo Heads-up @JosiahParry, I think it works fine on projected data, as long as not too far to the poles? * Speed-up by vectorisation, close #522 * Tidy up * Fixes to line_segment() * Debug edge case error * Fix edge case issue, re-test * Add browser() for debugging * Debugging for #551 * Use geosphere, fix #551 * Document * Make line_bearing work for projected data * Format rnet_join() * Document
For completeness, here's the same reprex with latest version showing, indeed ~50x speed-up: remotes::install_dev("stplanr")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'stplanr' from a github remote, the SHA1 (f72e6aef) has not changed since last install.
#> Use `force = TRUE` to force installation
library(stplanr)
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.4, PROJ 9.1.1; sf_use_s2() is TRUE
rnet_x = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_x_thurso.geojson")
rnet_y = sf::read_sf("https://github.com/nptscot/npt/releases/download/v2/rnet_y_thurso.geojson")
name_list = names(rnet_y)
funs = list()
# Loop through each name and assign it a function based on specific conditions
for (name in name_list) {
if (name == "geometry") {
next # Skip the current iteration
} else if (name %in% c("Gradient", "Quietness")) {
funs[[name]] = mean
} else {
funs[[name]] = sum
}
}
nrow(rnet_x)
#> [1] 913
#> [1] 913
nrow(rnet_y)
#> [1] 639
#> [1] 639
runtime = system.time({
rnet_merged = rnet_merge(rnet_x, rnet_y, dist = 20, segment_length = 10, funs = funs, max_angle_diff = 20)
})
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, were retired in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> Warning: st_centroid assumes attributes are constant over geometries
#> Joining with `by = join_by(identifier)`
nrow(rnet_x) / runtime[3]
#> elapsed
#> 316.9039 Created on 2023-10-06 with reprex v2.0.2 |
Two options come to mind for doing this:
qgisprocess
forv.split
, orgeos
package (although no direct v.split function)rsgeo
function from @JosiahParry (if ready ; )Challenge for you if you want it @wangzhao0217: install the
qgisprocess
package + https://trac.osgeo.org/osgeo4w/ and try running the code in the readme ofqgisprocess
pkg.The text was updated successfully, but these errors were encountered: