Code for paper "Actively Purchasing Data for Learning"
Jacob Abernethy
Yiling Chen
Chien-Ju Ho
Bo Waggoner <[email protected]>
-----------------------------------------
-- Overview
This code uses the MNIST handwritten digit dataset.
Pairs (datapoint, cost) arrive one by one.
The mechanism posts a price for each datapoint;
if the price exceeds the cost, it pays the cost and
obtains the datapoint.
There is a budget constraint B.
After all the training data has arrived, the mechanism
outputs a hypothesis h, whose error rate is measured on
the test set.
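Concretely, one run of this protocol can be pictured as the toy loop
below (a self-contained sketch: the even-split pricing rule and the
random data are illustrative assumptions, not this repo's mechanisms):

  import random

  # Toy simulation of the posted-price protocol described above.
  B = 5.0
  budget = B
  stream = [(random.random(), random.random()) for _ in range(20)]  # (datapoint, cost) pairs
  bought = 0
  for t, (x, cost) in enumerate(stream):
      price = budget / max(len(stream) - t, 1)  # posted before seeing the cost
      if price > cost and budget >= cost:
          budget -= cost    # pay the cost, not the posted price
          bought += 1       # a learner would train on the purchased point here
  print("bought %d points, %.2f budget left" % (bought, budget))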
------------------------------------------
-- Prerequisites
- python 2.7
- numpy
- scipy and scikit-learn (for downloading the data set conveniently)
- matplotlib for plotting results of simulations
-----------------------------------------
-- Running
$ python run.py
Manually edit the file "run.py" to control the number of trials, the parameters, etc.
The first time you run it, it downloads the MNIST training dataset to ./mldata/mnist-original.mat.
After running, it writes the results to the file "plot.py".
$ python plot.py
visualizes the results.
-----------------------------------------
-- Overview of the files
run.py: Runs some number of trials for each mechanism, each method of generating costs, and each budget. It prints the data to plot.py, which can also be run as a python file to visualize the data.
run.py interacts with three types of classes: Algorithms (currently only gradient descent), Mechanisms (which can be initialized with access to any algorithm), and Cost Generators (which draw costs for a given data point according to some distribution).
Algorithms:
-- implement methods: reset(eta), test_error(X,Y), norm_grad_loss(x,y), data_update(x,y,importance_wt), null_update().
grad_desc.py: implementation of the gradient descent algorithm.
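For reference, a minimal class conforming to this interface might look
like the sketch below (the linear model, squared loss, and +/-1 labels
are illustrative assumptions; grad_desc.py's internals may differ):

  import numpy as np

  # Sketch of the Algorithm interface; model and loss are illustrative.
  class SketchGradDesc(object):
      def reset(self, eta):
          self.eta = eta              # learning rate
          self.w = None               # weight vector, initialized lazily

      def _ensure(self, dim):
          if self.w is None:
              self.w = np.zeros(dim)

      def norm_grad_loss(self, x, y):
          # norm of the gradient of the squared loss at (x, y)
          self._ensure(len(x))
          return np.linalg.norm((np.dot(self.w, x) - y) * x)

      def data_update(self, x, y, importance_wt):
          # importance-weighted gradient step on a purchased point
          self._ensure(len(x))
          self.w = self.w - self.eta * importance_wt * (np.dot(self.w, x) - y) * x

      def null_update(self):
          # called on rounds with no purchase; nothing to do in this sketch
          pass

      def test_error(self, X, Y):
          # fraction misclassified by sign(w . x), assuming +/-1 labels
          self._ensure(X.shape[1])
          return np.mean(np.sign(np.dot(X, self.w)) != Y)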
Mechanisms:
-- implement methods: reset(eta, T, B, cmax=1.0), train_and_get_err(costs,Xtrain,Ytrain,Xtest,Ytest)
baseline.py: has an unlimited budget, so it buys every data point
naive.py: offers a fixed price for each data point until the budget is exhausted
rs12.py: implementation of a generalization of RothSchoenebeck12, assuming costs are uniform on [0,cmax]
ours.py: our posted-price and learning mechanism
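As an illustration of this interface, a fixed-price mechanism in the
style of naive.py could be sketched as follows (the B/T pricing rule
is an assumption; the actual file may choose its price differently):

  # Sketch of the Mechanism interface around a fixed posted price.
  class SketchNaive(object):
      def __init__(self, alg):
          self.alg = alg              # any object with the Algorithm interface

      def reset(self, eta, T, B, cmax=1.0):
          self.alg.reset(eta)
          self.T = T                  # number of arriving points
          self.B = B                  # total budget
          self.cmax = cmax            # upper bound on costs
          self.price = float(B) / T   # assumed rule: spread the budget evenly

      def train_and_get_err(self, costs, Xtrain, Ytrain, Xtest, Ytest):
          budget = self.B
          for i in range(self.T):
              if self.price > costs[i] and budget >= costs[i]:
                  budget -= costs[i]  # pay the cost of the point
                  self.alg.data_update(Xtrain[i], Ytrain[i], 1.0)
              else:
                  self.alg.null_update()  # no purchase this round
          return self.alg.test_error(Xtest, Ytest)

Paired with an Algorithm such as the SketchGradDesc above, reset()
followed by train_and_get_err() runs one full trial end to end.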
Cost-generating:
-- implement methods: normalize(num_examples[]), draw_cost(label)
unif_costgen.py: costs have a uniform [0,1] marginal distribution. You can specify some labels as "expensive" and others as "cheap"; then expensive-label costs are uniform on [a,1] and cheap-label costs uniform on [0,a] for some a.
bernoulli_costgen.py: costs are 0 with probability 1-p and 1 with probability p. You can specify some labels as expensive and others as cheap; then expensive labels cost 1 whenever possible, subject to the overall (p,1-p) distribution.
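A generator matching this interface and the unif_costgen.py
description might look like the sketch below (the constructor
arguments and the no-op normalize() are assumptions; the real file
may do more when normalizing):

  import random

  # Sketch of the cost-generator interface with expensive/cheap labels.
  class SketchUnifCostGen(object):
      def __init__(self, expensive_labels=(), a=0.5):
          self.expensive = set(expensive_labels)
          self.a = a                  # split point between the two cost ranges

      def normalize(self, num_examples):
          # hook for adjusting to class frequencies; a no-op in this sketch
          pass

      def draw_cost(self, label):
          if label in self.expensive:
              return random.uniform(self.a, 1.0)  # expensive labels: uniform [a,1]
          return random.uniform(0.0, self.a)      # cheap labels: uniform [0,a]

A mechanism would then draw per-point costs with something like
costs = [gen.draw_cost(y) for y in Ytrain].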