HarvardEconCS/active-data-buying
Code for paper "Actively Purchasing Data for Learning"
Jacob Abernethy, Yiling Chen, Chien-Ju Ho, Bo Waggoner <[email protected]>

-----------------------------------------
-- Overview

Uses the MNIST handwritten digit dataset. Pairs (datapoint, cost) arrive one
by one. The mechanism posts a price for that datapoint; if the posted price is
higher than the cost, it pays the cost and obtains the datapoint. There is a
budget constraint B. After all the training data has arrived, the mechanism
outputs a hypothesis h, whose error rate is measured on the test set.

------------------------------------------
-- Prerequisites

- python 2.7
- numpy
- scipy and scikit-learn (for downloading the data set conveniently)
- matplotlib (for plotting results of simulations)

-----------------------------------------
-- Running

  $ python run.py

Manually edit the file "run.py" to control how many trials to run, the
parameters, etc. The first time you run it, it downloads the MNIST training
dataset to ./mldata/mnist-original.mat. After running, it writes the results
to the file "plot.py".

  $ python plot.py

visualizes the results.

-----------------------------------------
-- Overview of the files

run.py: runs some number of trials for each mechanism, each method of
generating costs, and each budget. Prints the data to plot.py, which can also
be run as a python file to visualize the data.

run.py interacts with three types of classes: Algorithms (currently only
gradient descent), Mechanisms (can be initialized with access to any
algorithm), and Cost Generators (draw costs for a given data point according
to some distribution).

Algorithms:
-- implement methods: reset(eta), test_error(X,Y), norm_grad_loss(x,y),
   data_update(x,y,importance_wt), null_update()

grad_desc.py: implementation of the gradient descent algorithm.
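To make the Algorithm interface concrete, here is a minimal sketch of a class implementing the methods listed above, using importance-weighted SGD on the logistic loss. The internals (logistic loss, the `dim` constructor argument, the `_grad_loss` helper) are illustrative assumptions, not the repository's actual implementation in grad_desc.py.

```python
import numpy as np

class GradDescAlgorithm:
    """Illustrative sketch of the Algorithm interface; not the code
    in grad_desc.py. Binary labels are assumed to be in {-1, +1}."""

    def __init__(self, dim):
        self.dim = dim
        self.reset(eta=0.1)

    def reset(self, eta):
        # restore initial state before a new trial
        self.eta = eta
        self.w = np.zeros(self.dim)
        self.t = 0

    def _grad_loss(self, x, y):
        # gradient of the logistic loss at (x, y) -- an assumed choice
        z = y * np.dot(self.w, x)
        return -y * x / (1.0 + np.exp(z))

    def norm_grad_loss(self, x, y):
        # norm of the loss gradient; mechanisms can use this to set prices
        return np.linalg.norm(self._grad_loss(x, y))

    def data_update(self, x, y, importance_wt):
        # importance-weighted gradient step on a purchased point
        self.t += 1
        self.w -= self.eta * importance_wt * self._grad_loss(x, y)

    def null_update(self):
        # a round with no purchase still advances the step counter
        self.t += 1

    def test_error(self, X, Y):
        # fraction of misclassified test points
        preds = np.sign(X.dot(self.w))
        return np.mean(preds != Y)
```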
Mechanisms:
-- implement methods: reset(eta, T, B, cmax=1.0),
   train_and_get_err(costs,Xtrain,Ytrain,Xtest,Ytest)

baseline.py: has an unlimited budget, so buys every data point.
naive.py: offers a fixed price for each data point until the budget is
   exhausted.
rs12.py: implementation of a generalization of RothSchoenebeck12, assuming
   that costs are uniform [0, cmax].
ours.py: our posted-price and learning mechanism.

Cost generators:
-- implement methods: normalize(num_examples[]), draw_cost(label)

unif_costgen.py: costs have a uniform [0,1] marginal. Can specify some labels
   as "expensive" and others as "cheap"; then expensive-label costs are
   uniform [a,1] and cheap-label costs are uniform [0,a] for some a.
bernoulli_costgen.py: costs are free with probability 1-p and cost 1 with
   probability p. Can specify some labels as expensive and others as cheap;
   then expensive labels will cost 1 when possible, subject to the (p, 1-p)
   distribution.
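The overall buying loop described in the Overview can be sketched as follows. This is a hypothetical simulation skeleton, not code from the repository: `bernoulli_cost` mimics the Bernoulli cost generator, and `price_fn` is an assumed stand-in for whatever pricing rule a mechanism implements.

```python
import random

def bernoulli_cost(p):
    # cost is 1 with probability p, free otherwise (Bernoulli cost model)
    return 1.0 if random.random() < p else 0.0

def run_posted_price(data, B, price_fn, p=0.3, seed=0):
    """Sketch of one trial of posted-price data buying.

    price_fn(t) is a hypothetical pricing rule giving the price posted
    in round t. A point is purchased iff its (hidden) cost is at most
    the posted price; the buyer pays the cost, drawing down budget B.
    """
    random.seed(seed)
    budget = B
    purchased = []
    for t, point in enumerate(data):
        cost = bernoulli_cost(p)
        price = price_fn(t)
        if cost <= price and cost <= budget:
            budget -= cost          # pay the seller's cost, as in the README
            purchased.append(point)
    return purchased, budget
```

In a full run, each purchased point would also be fed to the learning algorithm (with an importance weight), and the hypothesis would be evaluated on the test set once the stream ends.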