Matrix multiply algorithm
Myasuka edited this page Oct 19, 2014
We take three approaches to multiplying two large distributed matrices.
- The first one follows HAMA 0.1, where you can find more about the algorithm.
Basically, the algorithm splits a large matrix into smaller submatrices, multiplies these submatrices, and then combines the results.
Its MapReduce interpretation can be seen below:
First, we collect the blocks into a 'collectionTable' using map/reduce. Rows are named c(i, j) with the sequential number (N^2 * i) + (j * N) + k to avoid duplicate records. CollectionTable:
                          matrix A          matrix B
--------------------------+-----------------+-----------------
block(0, 0)-0             | block(0, 0)     | block(0, 0)
block(0, 0)-1             | block(0, 1)     | block(1, 0)
block(0, 0)-2             | block(0, 2)     | block(2, 0)
...                       | ...             | ...
block(N-1, N-1)-(N^3-1)   | block(N-1, N-1) | block(N-1, N-1)
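The sequential numbering above can be sketched in plain Java (a hypothetical illustration, not the project's code; `blockSeq` is an assumed helper name). It shows that every (i, j, k) triple gets a distinct id in [0, N^3), so no two rows of the collectionTable collide:

```java
public class BlockIds {
    // Sequential id for the pair (a(i, k), b(k, j)) contributing to c(i, j).
    static int blockSeq(int n, int i, int j, int k) {
        return n * n * i + j * n + k;
    }

    public static void main(String[] args) {
        int n = 3;
        boolean[] seen = new boolean[n * n * n];
        // Check that all N^3 ids are distinct for a 3 x 3 blocking.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++) {
                    int id = blockSeq(n, i, j, k);
                    if (seen[id]) throw new AssertionError("duplicate id " + id);
                    seen[id] = true;
                }
        System.out.println("ids 0.." + (n * n * n - 1) + " are all distinct");
    }
}
```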
Each row holds the two submatrices a(i, k) and b(k, j), which minimizes data movement and network cost. Blocking job:
Collect the blocks into 'collectionTable' from A and B.
- A map task receives a row index n as its key and the corresponding row vector as its value
- Emit (blockID, sub-vector) pairs
- A reduce task merges the block structures based on the blockID
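The blocking steps above can be simulated in a single process (a rough sketch, not the project's code; `BlockingJob` and `groupRows` are hypothetical names, and a `TreeMap` stands in for the distributed shuffle):

```java
import java.util.*;

public class BlockingJob {
    // "Map" phase: emit (blockID, row-vector) for each row.
    // "Reduce" phase: rows sharing a blockID are merged into one block.
    static Map<Integer, List<double[]>> groupRows(double[][] matrix, int blockSize) {
        Map<Integer, List<double[]>> grouped = new TreeMap<>();
        for (int row = 0; row < matrix.length; row++) {
            int blockId = row / blockSize;            // emit (blockID, sub-vector)
            grouped.computeIfAbsent(blockId, id -> new ArrayList<>())
                   .add(matrix[row]);
        }
        return grouped;                               // each entry is one merged block
    }

    public static void main(String[] args) {
        double[][] matrix = {{1, 2}, {3, 4}, {5, 6}, {7, 8}};
        Map<Integer, List<double[]>> blocks = groupRows(matrix, 2);
        System.out.println(blocks.size() + " blocks"); // prints "2 blocks"
    }
}
```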
Multiplication job:
- A map task receives a blockID n as its key and the two sub-matrices of A and B as its value
- Multiply the two sub-matrices: a[i][j] * b[j][k]
- A reduce task computes the sum of the blocks
- c[i][k] += multiplied blocks
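The multiply-then-sum steps can be sketched as follows (hypothetical single-machine Java, not the project's code; `MultiplyJob`, `multiply`, and `accumulate` are assumed names):

```java
public class MultiplyJob {
    // Map step: multiply one pair of sub-matrices, a[i][j] * b[j][k].
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, inner = b.length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < inner; j++)
                for (int k = 0; k < m; k++)
                    c[i][k] += a[i][j] * b[j][k];
        return c;
    }

    // Reduce step: c[i][k] += multiplied blocks.
    static void accumulate(double[][] c, double[][] partial) {
        for (int i = 0; i < c.length; i++)
            for (int k = 0; k < c[i].length; k++)
                c[i][k] += partial[i][k];
    }

    public static void main(String[] args) {
        double[][] a0 = {{1, 0}, {0, 1}}, b0 = {{5, 6}, {7, 8}};
        double[][] a1 = {{1, 1}, {1, 1}}, b1 = {{1, 0}, {0, 1}};
        double[][] c = new double[2][2];
        accumulate(c, multiply(a0, b0));   // partial product for k = 0
        accumulate(c, multiply(a1, b1));   // partial product for k = 1
        System.out.println(c[0][0] + " " + c[0][1]);  // prints "6.0 7.0"
    }
}
```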
We then translate these two steps into Spark.
- The second one refers to the CARMA algorithm, on which we base a redesigned algorithm.
- The last one is based on the idea of broadcasting one of the matrices and splitting the other matrix into blocks.
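The broadcast approach can be illustrated with a single-machine sketch (hypothetical, not the project's actual Spark code; `BroadcastMultiply` and `localMultiply` are assumed names). The small matrix B plays the role of the broadcast variable, and each loop iteration stands in for a worker multiplying its local row-block of A by the full B:

```java
public class BroadcastMultiply {
    // Each "worker" multiplies its local row-block of A by the broadcast B.
    static double[][] localMultiply(double[][] aBlock, double[][] b) {
        double[][] c = new double[aBlock.length][b[0].length];
        for (int i = 0; i < aBlock.length; i++)
            for (int j = 0; j < b.length; j++)
                for (int k = 0; k < b[0].length; k++)
                    c[i][k] += aBlock[i][j] * b[j][k];
        return c;
    }

    public static void main(String[] args) {
        double[][] b = {{1, 2}, {3, 4}};              // "broadcast" to all workers
        double[][][] aBlocks = {{{1, 0}}, {{0, 1}}};  // A split into row-blocks
        for (double[][] aBlock : aBlocks) {
            double[] row = localMultiply(aBlock, b)[0];
            System.out.println(row[0] + " " + row[1]); // 1.0 2.0, then 3.0 4.0
        }
    }
}
```

No shuffle of B is needed in this scheme: each result row-block depends only on the local A block and the broadcast copy of B.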