st_join() compatible function #35

Open
JosiahParry opened this issue Feb 25, 2024 · 3 comments

@JosiahParry
Collaborator

I think there could be a use case for providing a function that can be passed to st_join(). This could be handy for folks who only want to join two datasets and do the remaining processing themselves.

Here's a simple reprex. It definitely doesn't have the performance I'd like, but I think it illustrates the point. This could allow users to filter matches themselves, or do whatever post-processing of the matches they'd like, and then use the result as join keys.

Doing this makes me think it would be helpful if rnet_match() returned the weights, e.g. shared length / length(source) and shared length / length(target).
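To make the weights idea concrete, here is a minimal sketch in base R on toy data. It assumes the match table has columns i, j and shared_len (an assumption about the rnet_match() output, not confirmed in this thread), and the segment lengths are made-up values. Which of x and y counts as "source" or "target" is also left open here.

```r
# Toy stand-in for rnet_match() output. The column names i, j and shared_len
# are assumptions about its structure, not confirmed by this thread.
matches <- data.frame(
  i = c(1L, 1L, 2L),
  j = c(1L, 2L, 2L),
  shared_len = c(40, 10, 25)
)

# Hypothetical segment lengths; with real data these could come from
# as.numeric(sf::st_length(rnet_x)) and as.numeric(sf::st_length(rnet_y)).
len_x <- c(50, 100)
len_y <- c(80, 35)

# Share of each x segment and each y segment covered by the match
matches$weight_x <- matches$shared_len / len_x[matches$i]
matches$weight_y <- matches$shared_len / len_y[matches$j]

matches
```

With weights like these in the match table, users could filter out weak matches (say, a small weight_x) before using the remaining pairs as join keys.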

library(sf)
#> Warning: package 'sf' was built under R version 4.3.1
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(rnetmatch)
# box to crop geometry to
crop_box <- st_bbox(c(xmin = 427200, xmax = 427500, ymin = 433550, ymax = 433700))

rnet_y <- "https://raw.githubusercontent.com/nptscot/networkmerge/main/data/rnet_armley.geojson" |> 
  read_sf() |>
  # st_geometry() |> 
  st_transform(27700) |> 
  st_crop(crop_box)
#> Warning: attribute variables are assumed to be spatially constant throughout
#> all geometries

rnet_x <- "https://raw.githubusercontent.com/nptscot/networkmerge/main/data/rnet_armley_line.geojson" |>
  read_sf() |> 
  st_crop(crop_box)


# create matches
matches <- rnetmatch::rnet_match(rnet_x, rnet_y, 10, 10)

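# function with the (x, y, ...) signature that st_join() expects from `join`
match_keys <- function(.x, .y, matches) {
  # one row per feature of .x, so unmatched features still get an (empty) entry
  dplyr::left_join(tibble::tibble(i = seq_len(nrow(.x))), matches) |> 
    dplyr::group_by(i) |> 
    dplyr::summarise(j = list(c(j))) |> 
    tibble::deframe() |> 
    # left_join() leaves NA for unmatched features; return integer(0) instead
    lapply(\(js) {
      if (all(is.na(js))) {
        integer()
      } else {
        js
      }
    })
}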

st_join(
  rnet_x,
  rnet_y,
  join = match_keys,
  # pass matches via dots 
  matches = matches
)
#> Joining with `by = join_by(i)`
#> Simple feature collection with 29 features and 7 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 427200 ymin: 433550 xmax: 427500 ymax: 433700
#> Projected CRS: OSGB36 / British National Grid
#> # A tibble: 29 × 8
#>                           geometry local_id bicycle govtarget_slc govnearmkt_slc
#>  *                <LINESTRING [m]>    <int>   <dbl>         <dbl>          <dbl>
#>  1 (427200 433651.8, 427220.3 433…    20808      22            30             28
#>  2 (427200 433651.8, 427220.3 433…    20799      22            26             25
#>  3 (427234.6 433571.2, 427230.8 4…    22076      43            71             63
#>  4 (427220.3 433645.4, 427232.3 4…    20799      22            26             25
#>  5 (427220.3 433645.4, 427232.3 4…    20518      20            27             25
#>  6 (427220.3 433645.4, 427232.3 4…     6724       0             1              1
#>  7 (427220.3 433645.4, 427232.3 4…    22658     133           222            192
#>  8 (427220.3 433645.4, 427232.3 4…    20808      22            30             28
#>  9 (427220.3 433645.4, 427232.3 4…    10212       2             2              2
#> 10 (427220.3 433645.4, 427226.7 4…    14447       5             7              7
#> # ℹ 19 more rows
#> # ℹ 3 more variables: gendereq_slc <dbl>, dutch_slc <dbl>, ebike_slc <dbl>

Created on 2024-02-25 with reprex v2.0.2

@Robinlovelace
Collaborator

This is a great idea. Will run the reprex now and have a play to check I understand what's going on.

@Robinlovelace
Collaborator

Reprex submitted and in #38

Results look really good! The next step would be user-defined group_by(id) |> summarise(...) steps giving ultimate flexibility, right?


@Robinlovelace
Collaborator

Follow-up question for you @JosiahParry: do we even need st_join() in this example? The group_by() and aggregate steps can be done without the st_join() step, avoiding the costly duplication of rnet_x$geometry.

Gradually getting back up to speed with this!
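A rough sketch of that keys-only route, in base R on toy data: aggregate over the match table first, then attach the result to the source features once, so geometry is never duplicated. The columns i, j and shared_len, the attribute values, and the shared-length weighted mean are all illustrative assumptions rather than anything the package is confirmed to do.

```r
# Toy stand-ins: `matches` mimics rnet_match() output (column names assumed),
# `y_attrs` mimics sf::st_drop_geometry(rnet_y) with a row index j added.
matches <- data.frame(
  i = c(1L, 1L, 2L),
  j = c(1L, 2L, 2L),
  shared_len = c(40, 10, 25)
)
y_attrs <- data.frame(j = 1:3, bicycle = c(22, 43, 133))

# Attach target attributes to the match table by key; no geometry involved
keyed <- merge(matches, y_attrs, by = "j")

# User-defined aggregation per source feature i: here a mean of `bicycle`
# weighted by shared length (one possible choice, not the package's method)
groups <- split(keyed, keyed$i)
agg <- data.frame(
  i = as.integer(names(groups)),
  bicycle = vapply(
    groups,
    function(g) weighted.mean(g$bicycle, g$shared_len),
    numeric(1)
  )
)

# Source features (geometry omitted here); unmatched rows keep NA
x_attrs <- data.frame(i = 1:3)
result <- merge(x_attrs, agg, by = "i", all.x = TRUE)
result
```

With real data, x_attrs would be rnet_x with a row index column, so each geometry appears exactly once in the result.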
