Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: single bracket querying of a graph (#1465) #1658

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

schochastics
Copy link
Contributor

@schochastics schochastics commented Jan 17, 2025

This PR refactors single bracket querying of a graph (g[1:3,4:6]) ( #1465).

[.igraph

In the old version, the complete adjacency matrix was computed and then a subset created. The refactored function now builds the submatrix directly. This leads to a little speedup and a lower memory footprint.

set.seed(411)
g <- sample_gnp(5000,0.1)
bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,1:100,1:100),
  old = as_adjacency_matrix(g)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.84ms   1.94ms    452.      4.26MB     24.0
#> 2 old         61.36ms 118.39ms      9.11   210.4MB     38.3

bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,i = 1:100,j = 1:100,sparse = FALSE),
  old = as_adjacency_matrix(g,sparse = FALSE)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.69ms   1.81ms     462.     3.31MB     5.99
#> 2 old         59.95ms   61.1ms      16.0  190.81MB    16.0

E(g)$weight <- runif(ecount(g))
bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,1:100,1:100),
  old = as_adjacency_matrix(g)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          1.88ms   1.99ms     451.      3.2MB     7.98
#> 2 old         55.34ms  62.09ms      14.5   209.7MB    23.6

bench::mark(
  check = FALSE,
  new = igraph:::get_adjacency_submatrix(g,i = 1:100,j = 1:100,sparse = FALSE),
  old = as_adjacency_matrix(g,sparse = FALSE)[1:100,1:100]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new           1.7ms    1.8ms     432.     3.31MB     8.00
#> 2 old          56.7ms   57.8ms      13.8  190.81MB    13.8

Created on 2025-01-18 with reprex v2.1.1

Copy link
Contributor

aviator-app bot commented Jan 17, 2025

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This pull request is currently open (not queued).

How to merge

To merge this PR, comment /aviator merge or add the mergequeue label.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

@schochastics schochastics marked this pull request as draft January 17, 2025 12:37
@schochastics schochastics changed the title refactor: speedup single bracket querying of a graph (#1465) refactor: single bracket querying/manipulating of a graph (#1465) Jan 17, 2025
@schochastics schochastics marked this pull request as ready for review January 18, 2025 18:26
@schochastics schochastics requested review from maelle and krlmlr January 18, 2025 18:26
Copy link
Contributor

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I see how duplicate i and j indexes add complexity to the new get_adjacency_submatrix() routine. How about the following logic:

  • we don't compute unique()
  • instead, we compute adj_out <- adjacent_vertices(x, i, mode = "out") if i is given, and adj_in <- adjacent_vertices(x, j, mode = "in") if j is given
  • if none are given, we forward to a different existing routine
  • if only one of i or j is given, we're done
  • if both are given, we compute vctrs::vec_set_intersect(adj_in, adj_out)

How is the test coverage for this code?

I'd appreciate it if all changes that do not rely on get_adjacency_submatrix() came in one or several separate PRs. I'd like to do a few more iterations here.

R/indexing.R Outdated Show resolved Hide resolved
R/indexing.R Outdated Show resolved Hide resolved
R/indexing.R Outdated Show resolved Hide resolved
@schochastics schochastics changed the title refactor: single bracket querying/manipulating of a graph (#1465) refactor: single bracket querying of a graph (#1465) Jan 19, 2025
@schochastics
Copy link
Contributor Author

Manipulating a graph via this logic was moved to #1661

@schochastics
Copy link
Contributor Author

Thanks. I see how duplicate i and j indexes add complexity to the new get_adjacency_submatrix() routine. How about the following logic:

* we don't compute `unique()`

* instead, we compute `adj_out <- adjacent_vertices(x, i, mode = "out")` if `i` is given, and `adj_in <- adjacent_vertices(x, j, mode = "in")` if `j` is given

* if none are given, we forward to a different existing routine

* if only one of `i` or `j` is given, we're done

* if both are given, we compute `vctrs::vec_set_intersect(adj_in, adj_out)`

How is the test coverage for this code?

I'd appreciate it if all changes that do not rely on get_adjacency_submatrix() came in one or several separate PRs. I'd like to do a few more iterations here.

I have tried this logic but always ran into issues for the case of non unique indices.
I tried to simplify a few things in the current solution. If you still think it is too complex, I am happy to give this logic another try

@maelle
Copy link
Contributor

maelle commented Jan 23, 2025

Copy link
Contributor

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get.adjacency.sparse() with a directed graph should involve only one or two copies. We could use that result with regular matrix subsetting and then tweak for the directed case. While this is not ideal, it may well be faster than anything we can come up in R land. Further optimizations are then possible by adding a from or to argument to as_edgelist() (which is called by get.adjacency.sparse()).

@schochastics
Copy link
Contributor Author

I am surprised myself, but the difference between the submatrix routine and get.adjacency.sparse() is quite big. I will try a bit more though to simplify/split the submatrix routine to make it more readable (without too much loss of performance).

pkgload::load_all("~/git/R_packages/rigraph/")
#> ℹ Loading igraph
g <- sample_gnp(5000,0.05, directed = FALSE)

bench::mark(check = FALSE,
  sub_sparse = get_adjacency_submatrix(g,1:100,sparse = TRUE),
  sub_dense = get_adjacency_submatrix(g,1:100,sparse = FALSE),
  full_sparse = as_adjacency_matrix(g,sparse = TRUE)[1:100,]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sub_sparse    2.01ms   2.13ms     401.     5.05MB     53.9
#> 2 sub_dense     2.22ms   2.44ms     299.     8.43MB     53.8
#> 3 full_sparse  31.91ms  33.76ms      22.9  105.47MB     66.8

g <- sample_gnp(5000,0.05, directed = TRUE)

bench::mark(check = FALSE,
  sub_sparse = get_adjacency_submatrix(g,1:100,sparse = TRUE),
  sub_dense = get_adjacency_submatrix(g,1:100,sparse = FALSE),
  full_sparse = as_adjacency_matrix(g, sparse = TRUE)[1:100,]
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sub_sparse    2.03ms   2.15ms     415.     3.97MB     24.0
#> 2 sub_dense     2.21ms   2.39ms     340.     7.79MB     45.8
#> 3 full_sparse  42.55ms   49.4ms      15.1   129.2MB     35.3

Created on 2025-01-23 with reprex v2.1.1

@schochastics schochastics marked this pull request as ready for review January 27, 2025 19:53
@schochastics
Copy link
Contributor Author

I am now convinced that this is as good as it gets. Any other approach seems to loose too mach performance

Copy link
Contributor

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress, we're getting there!

tests/testthat/test-indexing.R Show resolved Hide resolved
R/indexing.R Outdated Show resolved Hide resolved
R/indexing.R Outdated Show resolved Hide resolved
@schochastics
Copy link
Contributor Author

Non unique i/j are now handled outside of get_adjacency_submatrix(). I think this is indeed very much cleaner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants