florapril/Multiplying-Sparse-Matrices-with-MapReduce

Multiplying-Sparse-Matrices-with-MapReduce

Given two large sparse matrices M and N (each 100k * 10k), compute their product using MapReduce. The matrices are stored in the following format:
<i><TAB><j><TAB><mij>
<j><TAB><k><TAB><njk>
If you want to install a Hadoop cluster yourself, you can refer to my tutorial (https://zhuanlan.zhihu.com/p/58968191).
I use Hadoop Streaming with Python to solve this problem. There are three MapReduce jobs in total.

Stage 1

mapper: tag the records of M and N to tell them apart, and swap i and j in M so that every record is keyed by j. Output is
<j><TAB><m><TAB><i><TAB><mij>
<j><TAB><n><TAB><k><TAB><njk>
reducer: no reducer in this stage
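
The Stage 1 mapper can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function name `map_stage1` is hypothetical, and the original does not say how the mapper learns which matrix a line belongs to, so here the tag is simply passed in as an argument.

```python
def map_stage1(lines, tag):
    """Yield tagged, j-keyed records for one matrix.

    tag is "m" for matrix M (rows: i, j, mij) or "n" for matrix N (rows: j, k, njk).
    Passing the tag explicitly is an assumption made for this sketch.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        first, second, value = line.split("\t")
        if tag == "m":
            # M is stored as i, j, mij: swap i and j so that j comes first.
            yield "\t".join((second, "m", first, value))
        else:
            # N is stored as j, k, njk: it is already keyed by j.
            yield "\t".join((first, "n", second, value))
```

In a Hadoop Streaming script, the same function would be fed `sys.stdin` and its results written to `sys.stdout`, one record per line.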

Stage 2

mapper: identity mapper
reducer: Cartesian product of the records from M and N that share the same j. Output is
<i,k><TAB><mij * njk>
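
A sketch of the Stage 2 reducer, under a few assumptions: the name `reduce_stage2` is hypothetical, matrix values are parsed as floats, and the input arrives sorted by j (as it does after the shuffle phase) but with no guaranteed order between the M and N records of one j. For that reason, both sides are buffered per key before the product is taken.

```python
from itertools import groupby, product


def reduce_stage2(lines):
    """Yield "i,k<TAB>mij*njk" for every pair of M and N records sharing j.

    Expects lines of the form j<TAB>tag<TAB>index<TAB>value, sorted by j.
    """
    records = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for _j, group in groupby(records, key=lambda r: r[0]):
        m_entries, n_entries = [], []
        for _, tag, index, value in group:
            # Separate the two matrices; values are assumed to be numeric.
            target = m_entries if tag == "m" else n_entries
            target.append((index, float(value)))
        # Cartesian product of the M and N records that share this j.
        for (i, mij), (k, njk) in product(m_entries, n_entries):
            yield f"{i},{k}\t{mij * njk}"
```

Buffering both lists keeps the sketch simple, but it means that all records for a single j must fit in the reducer's memory.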

Stage 3

mapper: identity mapper
reducer: sum the records with the same key (i,k)
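
The final summation might look like the sketch below. The name `reduce_stage3` is hypothetical, and the code assumes that the input is sorted by the `i,k` key and that the partial products are floats.

```python
from itertools import groupby


def reduce_stage3(lines):
    """Yield "i,k<TAB>sum" for each run of lines sharing the same i,k key.

    Expects lines of the form i,k<TAB>partial_product, sorted by key.
    """
    pairs = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda p: p[0]):
        # Add up all partial products mij * njk that belong to entry (i, k).
        total = sum(float(value) for _, value in group)
        yield f"{key}\t{total}"
```

Each output line is one non-zero candidate entry of the product matrix, in the form `i,k<TAB>value`.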
