-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathnotes.txt
145 lines (104 loc) · 8.61 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
This project contains code performing matrix factorizations in a parallel and distributed fashion.
1. Related publications (pdf files can be found in folder publications):
R. Gemulla, P. J. Haas, Y. Sismanis, C. Teflioudi, F. Makari
Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent.
In NIPS 2011 Biglearn workshop, 2011.
C. Teflioudi, F. Makari, R. Gemulla
Distributed Matrix Completion.
In ICDM, 2012.
F. Makari, C. Teflioudi, R. Gemulla, P. J. Haas, Y. Sismanis
Shared-Memory and Shared-Nothing Stochastic Gradient Descent Algorithms for Matrix Completion.
In KAIS, 2013
2. Installation
Follow the instructions in file INSTALL
3. How to create toy synthetic data
cd to build/tools)
./generateSyntheticData
The script generateSyntheticData will generate in /tmp:
- a matrix with training data: train.mmc
- a matrix with test data: test.mmc
- 2 initial factor matrices: W.mma and H.mma
4. Examples
You can find examples of scripts using different methods in the folder examples/mf
5. Tools
In the folder build/tools you can find tools that can run factorization from the command line. The usual parameters are:
--input-file: the matrix with the training data. E.g., /tmp/train.mmc
--input-test-file: the matrix with the test data. E.g., /tmp/test.mmc
--input-row-file: the matrix with the initial row factors. E.g., /tmp/W.mma
--input-col-file: the matrix with the initial column factors. E.g., /tmp/H.mma
--tasks-per-rank: the number of threads to work per machine. E.g., 2
--epochs: the number of iterations (passes over all the training data points). E.g.: 20
--update: the update that needs to be performed. "Nzsl_Nzl2(1)" means Non-Zero-Squared-Loss with weighted L2 regularization and regularization parameter=1 as objective function.
--loss: the loss function to report after each epoch. E.g., "Nzsl_Nzl2(1)"
--regularize: This is experimental. Please always keep it ="None"
--rank: the rank of the factorization. E.g, 10, 50, 100
--decay: the step size selection mechanism. E.g. "BoldDriver(0.01)" will use BoldDriver (plese refer to our publication for a reference on how BoldDriver works) with initial step size = 0.01
Example invocations:
DSGD++
To run DSGD++ locally with 2 threads
./mfdsgdpp --input-file=/tmp/train.mmc --input-test-file=/tmp/test.mmc --input-row-file=/tmp/W.mma --input-col-file=/tmp/H.mma --tasks-per-rank=2 --rank=10 --update="Nzsl_Nzl2(1)" --regularize="None" --loss="Nzsl_Nzl2(1)" --truncate="(-100,100)" --decay="BoldDriver(0.01)" --epochs=3
To use MPI to distribute on many machines (substitute localhost with the machine names)
mpirun --hosts localhost,localhost ./mfdsgdpp --input-file=/tmp/train.mmc --input-test-file=/tmp/test.mmc --input-row-file=/tmp/W.mma --input-col-file=/tmp/H.mma --tasks-per-rank=2 --rank=10 --update="Nzsl_Nzl2(1)" --regularize="None" --loss="Nzsl_Nzl2(1)" --truncate="(-100,100)" --decay="BoldDriver(0.01)"
PSGD (HOGWILD) with weighted L2 regularization (can run parallel but not distributed)
./psgdNZL2NoLock --input-file=/tmp/train.mmc --input-test-file=/tmp/test.mmc --input-row-file=/tmp/W.mma --input-col-file=/tmp/H.mma --tasks-per-rank=2
ALS (can run both parallel and distributed)
mpirun --hosts localhost,localhost ./mfdap --input-file=/tmp/train.mmc --input-test-file=/tmp/test.mmc --input-row-file=/tmp/W.mma --input-col-file=/tmp/H.mma --tasks-per-rank=1 --loss="Nzsl_Nzl2(1)" --epochs=3
ASGD (can run distributed. Its parallel version is equal to PSGD)
mpirun --hosts localhost,localhost ./mfasgd --input-file=/tmp/train.mmc --input-test-file=/tmp/test.mmc --input-row-file=/tmp/W.mma --input-col-file=/tmp/H.mma --tasks-per-rank=1 --update="Nzsl_Nzl2(1)" --regularize="None" --loss="Nzsl_Nzl2(1)" --truncate="(-100,100)" --decay="BoldDriver(0.01)" --rank=10
CSGD (can run parallel but not distributed)
./stratified-psgd --input-file=/tmp/train.mmc --input-test-file=/tmp/test.mmc --input-row-file=/tmp/W.mma --input-col-file=/tmp/H.mma --tasks-per-rank=2 --eps0=0.001 --cache=256
6. Generation of very large data matrices on the fly.
Use the tool mfcreateRandomMatrixFile. This tool creates a file descriptor for generating synthetic matrices on the fly in a parallel or distributed manner.
For a given experiment you will need 2 such files: (i) a file containing the original factors + the data matrix + the test matrix (optionally)
(ii) a file containing the starting points (initial factors).
E.g.:
./mfcreateRandomMatrixFile --size1=1000000 --size2=1000000 --nnz=10000000 --nnzTest=1000000 --rank=10 --values="Normal(0,10)" --noise="Normal(0,1)" --blocks1=10 --blocks2=10 --output-file=/tmp/synthetic.rm
Will create (i) and
./mfcreateRandomMatrixFile --size1=1000000 --size2=1000000 --rank=10 --values="Uniform(-1,1)" --blocks1=10 --blocks2=10 --output-file=/tmp/syntheticFactors.rm
will create (ii).
We can then call DSGDpp as:
mpirun --hosts localhost,localhost ./mfdsgdpp --input-file=/tmp/synthetic.rm --input-test-file=/tmp/synthetic.rm --input-row-file=/tmp/syntheticFactors.rm --input-col-file=/tmp/syntheticFactors.rm --tasks-per-rank=2 --rank=10 --update="Nzsl_Nzl2(1)" --regularize="None" --loss="Nzsl_Nzl2(1)" --truncate="(-100,100)" --decay="BoldDriver(0.01)"
Before creating a file take into consideration the following:
(1) parallelization happens in row-chunk manner. Please make sure that the blocks1 dimension is enough fine-grained, so that the threads of each node will have enough row-blocks to work on.
(2) make sure that the blocks1, blocks2 values are at least as much as the most fine-blocked matrix that you want your file to be able to create.
(3) blocks_in_file mod blocks_of_the_matrix = 0.
(4) rows/columns_of_the_matrix mod blocks_in_file = 0.
7. Experiments with Syn1B-sq.
7a. Generation of the matrices:
./mfcreateRandomMatrixFile --size1=3379200 --size2=3072000 --nnz=1000000000 --nnzTest=1000000 --rank=50 --values="Normal(0,3.1622)" --noise="Normal(0,1)" --blocks1=256 --blocks2=256 --output-file=/tmp/synthetic.rm
notice that the blocks1 and blocks2 should support the largest stratification that you will use (see paragraph 6)
./mfcreateRandomMatrixFile --size1=3379200 --size2=3072000 --rank=50 --values="Uniform(-0.5,0.5)" --blocks1=256 --blocks2=256 --output-file=/tmp/syntheticFactors.rm
7b. Experiments:
e.g., DSGD++ 4 machines with 8 threads each
mpirun --hosts host1,host2,host3,host4 \
./mfdsgdpp --input-file=/tmp/synthetic.rm --input-row-file=/tmp/syntheticFactors.rm --input-col-file=/tmp/syntheticFactors.rm \
--tasks-per-rank=8 --rank=50 --update="Nzsl" --regularize="None" --loss="Nzsl" --truncate="(-100,100)" --decay="BoldDriver(0.000625)" --epochs=60 --stratum-order="WOR" \
--trace /tmp/synthetic.dsgdpp4x8.trace.R --trace-var synthetic.dsgdpp4x8
e.g., ALS 4 machines with 8 threads each
mpirun --hosts host1,host2,host3,host4 \
./mfdap --input-file=/tmp/synthetic.rm --input-row-file=/tmp/syntheticFactors.rm --input-col-file=/tmp/syntheticFactors.rm \
--tasks-per-rank=8 --loss="Nzsl" --epochs=60 \
--trace /tmp/synthetic.dals4x8.trace.R --trace-var synthetic.dals4x8
e.g., ASGD 4 machines with 8 threads each
mpirun --hosts host1,host2,host3,host4 \
./mfasgd --input-file=/tmp/synthetic.rm --input-row-file=/tmp/syntheticFactors.rm --input-col-file=/tmp/syntheticFactors.rm \
--tasks-per-rank=8 --update="Nzsl" --regularize="None" --loss="Nzsl" --truncate="(-100,100)" --decay="BoldDriver(0.000625)" --rank=50 --epochs=60 --average-deltas=1 \
--trace /tmp/synthetic.asgd4x8.trace.R --trace-var synthetic.asgd4x8
e.g., DSGD MapReduce-style 4 machines with 8 threads each
mpirun --hosts host1,host2,host3,host4 \
./mfdsgd --input-file=/tmp/synthetic.rm --input-row-file=/tmp/syntheticFactors.rm --input-col-file=/tmp/syntheticFactors.rm \
--tasks-per-rank=8 --rank=50 --update="Nzsl" --regularize="None" --loss="Nzsl" --truncate="(-100,100)" --decay="BoldDriver(0.000625)" --epochs=60 --map-reduce=1 --stratum-order="WOR" \
--trace /tmp/synthetic.dsgdMR4x8.trace.R --trace-var synthetic.dsgdMR4x8
e.g. PSGD with 8 threads
./psgdNZL2NoLock --input-file=/tmp/synthetic.rm --input-row-file=/tmp/syntheticFactors.rm --input-col-file=/tmp/syntheticFactors.rm \
--tasks-per-rank=8 --lambda=0 --eps0=0.000625 --epochs=60 --shuffle="parAdd"
e.g. PSGD with locking with 8 threads
./psgdNZL2Lock --input-file=/tmp/synthetic.rm --input-row-file=/tmp/syntheticFactors.rm --input-col-file=/tmp/syntheticFactors.rm \
--tasks-per-rank=8 --lambda=0 --eps0=0.000625 --epochs=60 --shuffle="parAdd"
e.g. CSGD with 8 threads
./stratified-psgd --input-file=/tmp/synthetic.rm --input-row-file=/tmp/syntheticFactors.rm --input-col-file=/tmp/syntheticFactors.rm \
--tasks-per-rank=8 --eps0=0.000625 --cache=256 --epochs=60 --lambda=0
8. Contributors
Rainer Gemulla
Faraz Makari
Christina Teflioudi