This package contains a data structure that wraps a matrix of matrices or factorizations and acts like the matrix resulting from concatenating the input matrices (see below).
Notably, this allows the use of canonical linear algebra routines that just need to access `mul!` or `*` without special consideration for the block structure.
The structure contained herein differentiates itself from BlockArrays.jl in that it allows the blocks to be `AbstractMatrix` or `Factorization` types, thereby allowing the exploitation of more general matrix structure for efficient matrix multiplications.
See the [Wikipedia article](https://en.wikipedia.org/wiki/Block_matrix) for more information on block matrices.
Say we need to work with a block matrix `A` like the following:

```julia
using LinearAlgebra
d = 512
n, m = 3, 4
A = [randn(d, d) for i in 1:n, j in 1:m]  # n × m matrix of d × d dense blocks
```
Julia already supports convenient syntax for the multiplication of a block matrix with a vector of vectors:

```julia
using BenchmarkTools
x = [randn(d) for _ in 1:m]
y = [zeros(d) for _ in 1:n]
@btime A*x;
  12.652 ms (321 allocations: 1.29 MiB)
@btime mul!(y, A, x);
  12.700 ms (320 allocations: 1.29 MiB)
```
However, we see that even `mul!` allocates significant memory, since `mul!` is not applied recursively; instead, `*` is used to multiply the vector elements.
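For intuition, here is a minimal sketch of what such a recursive multiplication looks like when written by hand (not necessarily the package's implementation): the five-argument `mul!` accumulates each block product into the output block without allocating temporaries.

```julia
using LinearAlgebra

# y[i] = sum over j of A[i, j] * x[j], computed in place block by block
function blocked_mul!(y::AbstractVector{<:AbstractVector},
                      A::AbstractMatrix{<:AbstractMatrix},
                      x::AbstractVector{<:AbstractVector})
    n, m = size(A)
    for i in 1:n
        fill!(y[i], 0)
        for j in 1:m
            mul!(y[i], A[i, j], x[j], 1, 1)  # y[i] += A[i, j] * x[j]
        end
    end
    return y
end
```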
With this package, we can wrap `A` to act like a regular matrix for multiplication:

```julia
using BlockFactorizations
B = BlockFactorization(A)  # acts like the (n*d) × (m*d) concatenation of the blocks
x = randn(d*m)
y = randn(d*n)
@btime B*x;
  11.589 ms (30 allocations: 67.12 KiB)
@btime mul!(y, B, x);
  11.691 ms (28 allocations: 3.05 KiB)
```
This allows us to pass `B` to canonical linear algebra routines that only need to access `mul!` or `*`, without any special consideration for the block structure.
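For instance, a generic Krylov solver should work on `B` out of the box. Here is a sketch, assuming IterativeSolvers.jl is installed as the (interchangeable) solver package; the diagonal shift only serves to make the system well-conditioned:

```julia
using LinearAlgebra, BlockFactorizations
using IterativeSolvers

d, n = 512, 3
# square, diagonally dominant block matrix so that gmres converges quickly
A = [i == j ? randn(d, d) + 10d * I : randn(d, d) for i in 1:n, j in 1:n]
B = BlockFactorization(A)
b = randn(d * n)
xs = gmres(B, b)   # gmres only calls size, * and mul! on B
norm(B * xs - b)   # small residual
```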
Instead of dense blocks, we can use more general block types, which can allow for more efficient matrix multiplications:

```julia
A = [Diagonal(randn(d)) for i in 1:n, j in 1:m]  # n × m matrix of Diagonal blocks
B = BlockFactorization(A)
# vector-of-vectors multiplication without the wrapper
x = [randn(d) for _ in 1:m]
y = [zeros(d) for _ in 1:n]
@btime A*x;
  117.001 μs (321 allocations: 1.29 MiB)
@btime mul!(y, A, x);
  127.356 μs (320 allocations: 1.29 MiB)
# flat vectors with the BlockFactorization wrapper
x = randn(d*m)
y = zeros(d*n)
@btime B*x;
  58.703 μs (158 allocations: 81.12 KiB)
@btime mul!(y, B, x);
  45.743 μs (156 allocations: 17.05 KiB)
```
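The same applies to other structured block types. For example, a brief sketch with sparse blocks, assuming SparseArrays is available:

```julia
using SparseArrays

A = [sprandn(d, d, 1e-2) for i in 1:n, j in 1:m]  # blocks with ~1% non-zeros
B = BlockFactorization(A)
x = randn(d * m)
y = zeros(d * n)
mul!(y, B, x)  # each block product dispatches to the sparse mul! method
```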
Note that as long as `mul!` and `size` are defined for the element types, `BlockFactorization` will be applicable. Right above, we saw that BlockFactorizations supports `Diagonal` elements.
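To illustrate, here is a sketch of a hypothetical custom block type (`ScaledIdentity` is illustrative, not part of the package) that only implements `size`, `getindex`, and a five-argument `mul!`:

```julia
using LinearAlgebra, BlockFactorizations

# a lazily scaled identity matrix of dimension d
struct ScaledIdentity <: AbstractMatrix{Float64}
    c::Float64
    d::Int
end
Base.size(S::ScaledIdentity) = (S.d, S.d)
Base.getindex(S::ScaledIdentity, i::Int, j::Int) = i == j ? S.c : 0.0
function LinearAlgebra.mul!(y::AbstractVector, S::ScaledIdentity, x::AbstractVector,
                            α::Number, β::Number)
    c = S.c
    y .= α .* c .* x .+ β .* y  # y = α*S*x + β*y without materializing S
    return y
end

A = [ScaledIdentity(randn(), 16) for i in 1:2, j in 1:3]
B = BlockFactorization(A)
B * randn(16 * 3)  # works because the blocks support size and mul!
```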
Further, it also specializes matrix multiplication in case the wrapped matrix of blocks is block diagonal:
```julia
d = 512
n = 16
# a block-diagonal BlockFactorization: a Diagonal matrix of dense blocks
A = Diagonal([randn(d, d) for i in 1:n])
B = BlockFactorization(A)
# multiplying A with a vector of vectors doesn't work in Julia 1.6.2
# x = [zeros(d) for _ in 1:n]
# @btime A*x;
x = randn(d*n)
y = zeros(d*n)
@btime B*x;
  1.005 ms (30 allocations: 67.45 KiB)
@btime mul!(y, B, x);
  989.163 μs (28 allocations: 3.38 KiB)
```
Note that the matrix multiplication function is parallelized with `@threads`.
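To benefit from the parallelism, Julia has to be started with multiple threads, for example via the `--threads` flag or the `JULIA_NUM_THREADS` environment variable:

```julia
# start Julia with e.g. `julia --threads=4`
Threads.nthreads()  # number of threads available to @threads
```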
Also, a `BlockFactorization` `B` can be converted to a regular dense matrix using `Matrix(B)`.
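For example, we can sanity-check that `B` behaves like the dense concatenation of its blocks (a small sketch):

```julia
using LinearAlgebra, BlockFactorizations

A = [randn(8, 8) for i in 1:2, j in 1:3]
B = BlockFactorization(A)
M = Matrix(B)           # the 16 × 24 dense matrix of concatenated blocks
x = randn(size(M, 2))
B * x ≈ M * x           # true: B acts like the concatenated matrix
```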
Reported timings were computed on a 2017 MacBook Pro with a dual-core processor and 16GB of RAM.