refactor: single bracket querying of a graph (#1465) #1658

schochastics · 2025-01-17T12:37:31Z

This PR refactors single bracket querying of a graph (`g[1:3,4:6]`) (#1465).

`[.igraph`

In the old version, the complete adjacency matrix was computed and then subset. The refactored function now builds the submatrix directly. In the benchmarks below, this cuts the median run time from roughly 60 to 120 ms down to about 2 ms, and memory allocation from about 200 MB to under 5 MB.

set.seed(411)
g <- sample_gnp(5000,0.1)
bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,1:100,1:100),
  old = as_adjacency_matrix(g)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.84ms   1.94ms    452.      4.26MB     24.0
#> 2 old         61.36ms 118.39ms      9.11   210.4MB     38.3

bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,i = 1:100,j = 1:100,sparse = FALSE),
  old = as_adjacency_matrix(g,sparse = FALSE)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.69ms   1.81ms     462.     3.31MB     5.99
#> 2 old         59.95ms   61.1ms      16.0  190.81MB    16.0

E(g)$weight <- runif(ecount(g))
bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,1:100,1:100),
  old = as_adjacency_matrix(g)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.88ms   1.99ms     451.      3.2MB     7.98
#> 2 old         55.34ms  62.09ms      14.5   209.7MB    23.6

bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,i = 1:100,j = 1:100,sparse = FALSE),
  old = as_adjacency_matrix(g,sparse = FALSE)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new           1.7ms    1.8ms     432.     3.31MB     8.00
#> 2 old          56.7ms   57.8ms      13.8  190.81MB    13.8

Created on 2025-01-18 with reprex v2.1.1
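The idea of building the submatrix directly, rather than subsetting the full matrix, can be illustrated with a minimal base R sketch. It is not the code from this PR: the function name `adjacency_submatrix_sketch` is hypothetical, the graph is assumed to be given as a two-column edge list (in igraph this could presumably come from `as_edgelist(g, names = FALSE)`), and `i` and `j` are assumed to contain no duplicates.

```r
# Sketch only, not the PR's get_adjacency_submatrix():
# build the adjacency submatrix for rows `i` and columns `j` directly
# from an edge list, without forming the full n x n matrix.
# `el` is a two-column matrix (from, to) of a directed graph.
# Assumes `i` and `j` contain no duplicated vertex ids.
adjacency_submatrix_sketch <- function(el, i, j) {
  # keep only edges that start in `i` and end in `j`
  keep <- el[, 1] %in% i & el[, 2] %in% j
  sub_el <- el[keep, , drop = FALSE]

  # translate vertex ids into row/column positions of the submatrix
  rows <- match(sub_el[, 1], i)
  cols <- match(sub_el[, 2], j)

  # count edges per cell (column-major linear index), so multi-edges add up
  idx <- rows + (cols - 1) * length(i)
  counts <- tabulate(idx, nbins = length(i) * length(j))
  matrix(counts, nrow = length(i), ncol = length(j))
}

# Small check against the "compute everything, then subset" approach
el <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 1), c(4, 2))
full <- matrix(0, 4, 4)
full[el] <- 1
sub <- adjacency_submatrix_sketch(el, 1:2, 2:3)
stopifnot(all(sub == full[1:2, 2:3]))
```

The work here scales with the number of edges rather than with the square of the number of vertices, which is the kind of saving the benchmarks above point to.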

krlmlr

Thanks. I see how duplicate `i` and `j` indexes add complexity to the new `get_adjacency_submatrix()` routine. How about the following logic:

* we don't compute `unique()`
* instead, we compute `adj_out <- adjacent_vertices(x, i, mode = "out")` if `i` is given, and `adj_in <- adjacent_vertices(x, j, mode = "in")` if `j` is given
* if none are given, we forward to a different existing routine
* if only one of `i` or `j` is given, we're done
* if both are given, we compute `vctrs::vec_set_intersect(adj_in, adj_out)`

How is the test coverage for this code?

I'd appreciate it if all changes that do not rely on `get_adjacency_submatrix()` came in one or several separate PRs. I'd like to do a few more iterations here.
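One possible reading of the proposed logic, sketched in base R under stated assumptions: neighbour lists stand in for the result of `adjacent_vertices()`, edges are compared as `"from->to"` keys, and base `intersect()` stands in for `vctrs::vec_set_intersect()`. The helper name `edges_between_sketch` and both list arguments are hypothetical and not part of igraph.

```r
# Sketch only. `out_nb[[v]]` holds the out-neighbours of vertex v and
# `in_nb[[v]]` its in-neighbours (stand-ins for adjacent_vertices()).
edges_between_sketch <- function(out_nb, in_nb, i = NULL, j = NULL) {
  if (is.null(i) && is.null(j)) {
    stop("no index given: forward to a full-matrix routine instead")
  }
  out_edges <- NULL
  in_edges <- NULL
  if (!is.null(i)) {
    # all edges leaving the vertices in `i`
    out_edges <- data.frame(
      from = rep(i, lengths(out_nb[i])),
      to = as.integer(unlist(out_nb[i]))
    )
  }
  if (!is.null(j)) {
    # all edges entering the vertices in `j`
    in_edges <- data.frame(
      from = as.integer(unlist(in_nb[j])),
      to = rep(j, lengths(in_nb[j]))
    )
  }
  # only one side given: that side is already the answer
  if (is.null(j)) return(out_edges)
  if (is.null(i)) return(in_edges)

  # both given: keep edges present on both sides (set intersection)
  key_out <- paste(out_edges$from, out_edges$to, sep = "->")
  key_in <- paste(in_edges$from, in_edges$to, sep = "->")
  common <- intersect(key_out, key_in)
  out_edges[match(common, key_out), , drop = FALSE]
}

# Toy directed graph with edges 1->2, 1->3, 2->3, 3->1, 4->2
out_nb <- list(c(2L, 3L), 3L, 1L, 2L)
in_nb <- list(3L, c(1L, 4L), c(1L, 2L), integer(0))
edges_between_sketch(out_nb, in_nb, i = 1:2, j = 2:3)
```

Because a set intersection returns each edge at most once, repeated entries in `i` or `j` are collapsed at that step, so duplicated indices would need separate treatment in a sketch like this.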

R/indexing.R

schochastics · 2025-01-19T06:16:19Z

Manipulating a graph via this logic was moved to #1661

schochastics · 2025-01-22T13:13:15Z

> Thanks. I see how duplicate i and j indexes add complexity to the new get_adjacency_submatrix() routine. How about the following logic:
>
> * we don't compute `unique()`
> * instead, we compute `adj_out <- adjacent_vertices(x, i, mode = "out")` if `i` is given, and `adj_in <- adjacent_vertices(x, j, mode = "in")` if `j` is given
> * if none are given, we forward to a different existing routine
> * if only one of `i` or `j` is given, we're done
> * if both are given, we compute `vctrs::vec_set_intersect(adj_in, adj_out)`
>
> How is the test coverage for this code?
>
> I'd appreciate it if all changes that do not rely on get_adjacency_submatrix() came in one or several separate PRs. I'd like to do a few more iterations here.

I have tried this logic but always ran into issues with non-unique indices.
I tried to simplify a few things in the current solution. If you still think it is too complex, I am happy to give this logic another try.

maelle · 2025-01-23T10:15:28Z

devtools::test_coverage_active_file()

krlmlr

get.adjacency.sparse() with a directed graph should involve only one or two copies. We could use that result with regular matrix subsetting and then tweak for the directed case. While this is not ideal, it may well be faster than anything we can come up with in R land. Further optimizations are then possible by adding a from or to argument to as_edgelist() (which is called by get.adjacency.sparse()).

schochastics · 2025-01-23T21:10:56Z

I am surprised myself, but the difference between the submatrix routine and get.adjacency.sparse() is quite big. I will try a bit more though to simplify/split the submatrix routine to make it more readable (without too much loss of performance).

pkgload::load_all("~/git/R_packages/rigraph/")
#> ℹ Loading igraph
g <- sample_gnp(5000,0.05, directed = FALSE)

bench::mark(check = FALSE,
  sub_sparse = get_adjacency_submatrix(g,1:100,sparse = TRUE),
  sub_dense = get_adjacency_submatrix(g,1:100,sparse = FALSE),
  full_sparse = as_adjacency_matrix(g,sparse = TRUE)[1:100,]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sub_sparse    2.01ms   2.13ms     401.     5.05MB     53.9
#> 2 sub_dense     2.22ms   2.44ms     299.     8.43MB     53.8
#> 3 full_sparse  31.91ms  33.76ms      22.9  105.47MB     66.8

g <- sample_gnp(5000,0.05, directed = TRUE)

bench::mark(check = FALSE,
  sub_sparse = get_adjacency_submatrix(g,1:100,sparse = TRUE),
  sub_dense = get_adjacency_submatrix(g,1:100,sparse = FALSE),
  full_sparse = as_adjacency_matrix(g, sparse = TRUE)[1:100,]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sub_sparse    2.03ms   2.15ms     415.     3.97MB     24.0
#> 2 sub_dense     2.21ms   2.39ms     340.     7.79MB     45.8
#> 3 full_sparse  42.55ms   49.4ms      15.1   129.2MB     35.3

Created on 2025-01-23 with reprex v2.1.1

schochastics · 2025-01-27T20:02:37Z

I am now convinced that this is as good as it gets. Any other approach seems to lose too much performance.

krlmlr

Great progress, we're getting there!

tests/testthat/test-indexing.R

R/indexing.R

schochastics · 2025-01-30T20:56:22Z

Non-unique i/j are now handled outside of get_adjacency_submatrix(). I think this is indeed much cleaner.
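A minimal sketch of what handling duplicates outside the core routine could look like, assuming the core routine only ever sees unique indices. The wrapper name `submatrix_with_duplicates` and the `core` argument are hypothetical; the actual code in R/indexing.R may differ.

```r
# Sketch only: `core` is any function that returns the submatrix
# for *unique* row ids and column ids.
submatrix_with_duplicates <- function(core, i, j) {
  ui <- unique(i)
  uj <- unique(j)
  # compute each requested row/column only once
  sub <- core(ui, uj)
  # expand back to the requested (possibly repeated) rows and columns
  sub[match(i, ui), match(j, uj), drop = FALSE]
}

# Toy "core" that subsets a known matrix, used only to check the wrapper
full <- matrix(1:16, 4, 4)
core <- function(i, j) full[i, j, drop = FALSE]
res <- submatrix_with_duplicates(core, c(1, 1, 3), c(2, 4, 4))
stopifnot(identical(res, full[c(1, 1, 3), c(2, 4, 4)]))
```

Keeping the expansion step in a wrapper means the core routine does not need any logic for repeated indices.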

schochastics marked this pull request as draft January 17, 2025 12:37

schochastics changed the title ~~refactor: speedup single bracket querying of a graph (#1465)~~ refactor: single bracket querying/manipulating of a graph (#1465) Jan 17, 2025

schochastics marked this pull request as ready for review January 18, 2025 18:26

schochastics requested review from maelle and krlmlr January 18, 2025 18:26

krlmlr reviewed Jan 18, 2025

schochastics added 3 commits January 19, 2025 07:08

faster adjacency matrix quering

e48d50f

replaced clean_indices with as_igraph_vs

e5e828b

unify helper function for querying

44561c5

schochastics force-pushed the indexing branch from d64b7b4 to 44561c5 Compare January 19, 2025 06:10

schochastics changed the title ~~refactor: single bracket querying/manipulating of a graph (#1465)~~ refactor: single bracket querying of a graph (#1465) Jan 19, 2025

schochastics added 3 commits January 22, 2025 11:39

refactor handling of unique and edge_list creation

ff45667

added tests

f6912f8

remove purrr

56ba6e9

schochastics force-pushed the indexing branch from f4795c7 to 56ba6e9 Compare January 22, 2025 14:18

added tests for duplicated i/j

ece42d0

krlmlr reviewed Jan 23, 2025

schochastics marked this pull request as draft January 23, 2025 19:52

schochastics mentioned this pull request Jan 23, 2025

feat: get_edge_ids() accepts data frames and matrices #1663

Merged

schochastics and others added 2 commits January 23, 2025 21:45

make dense case as.matrix(sparse)

7227626

Merge branch 'main' into indexing

ff0508d

Merge branch 'main' into indexing

13dd05b

schochastics marked this pull request as ready for review January 27, 2025 19:53

krlmlr reviewed Jan 30, 2025

schochastics and others added 3 commits January 30, 2025 21:39

pulled duplication handling out of sumatrix function

e5baaa5

Merge branch 'main' into indexing

a94ea07

adjusted for new get_edge_ids()

39348a1
