Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds

This repository contains the source code used to generate the numerical results in the paper Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds. The simulation scenario is inspired by The AI Economist. There are $M+1$ players: one acts as a socio-economic planner and the remaining $M$ act as workers. The game is played over $T$ rounds, and each round proceeds as follows:

The socio-economic planner decides on a taxation policy
The workers observe the taxation policy and pick actions consecutively
Each worker action is mapped to an income and a labor cost
The net income of each worker is obtained by subtracting the tax collected by the socio-economic planner from the worker's income
Each worker's utility is determined by the net income and the labor cost
The bandit reward is a weighted average of all worker utilities and the collected tax (all normalized to [0,1])
All the players observe the bandit reward and update their respective policies.
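
The round structure above can be sketched roughly as follows. This is a minimal illustration and not the code in main.py: the function name play_round, the flat tax rate, the linear income, the quadratic labor cost, the normalization constants, the weight parameter, and the uniformly random choices standing in for the players' policies are all assumptions made for the example. The final policy-update step is left out.

```python
import random


def play_round(tax_rates, num_workers, num_actions, weight=0.5, rng=random):
    """Simulate one round with placeholder (uniformly random) policies.

    All modelling choices here are hypothetical; num_actions must be >= 2.
    """
    # Step 1: the planner picks a taxation policy (assumed: a flat tax rate).
    tax_rate = rng.choice(tax_rates)

    utilities = []
    total_tax = 0.0
    # Step 2: workers observe the tax rate and pick actions one after another.
    for _ in range(num_workers):
        action = rng.randrange(num_actions)

        # Step 3: map the action to an income and a labor cost
        # (assumed: effort in [0, 1], linear income, quadratic cost).
        effort = action / (num_actions - 1)
        income = effort
        labor_cost = 0.5 * effort ** 2

        # Step 4: net income is income minus the tax collected by the planner.
        tax = tax_rate * income
        net_income = income - tax
        total_tax += tax

        # Step 5: utility from net income and labor cost.
        # net_income - labor_cost lies in [-0.5, 1], so shift and scale to [0, 1].
        utilities.append((net_income - labor_cost + 0.5) / 1.5)

    # Step 6: bandit reward as a weighted average of the mean worker utility
    # and the mean collected tax, both already in [0, 1].
    mean_utility = sum(utilities) / num_workers
    mean_tax = total_tax / num_workers
    reward = weight * mean_utility + (1.0 - weight) * mean_tax
    return tax_rate, reward


if __name__ == "__main__":
    random.seed(0)
    print(play_round([0.0, 0.25, 0.5], num_workers=5, num_actions=100))
```

In the actual experiment each player would replace the random choice with its own bandit policy and update it from the observed reward.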

To generate Fig. 3 in the paper, run:

python3 main.py --T 1000000 --K 100

This command will store the result in output.csv and generate the figure below.

Fig. 3
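
To inspect the stored results without regenerating the figure, output.csv can be read with the Python standard library. The sketch below assumes nothing about the column layout, which is not documented here: it simply returns every row as a list of strings. The helper name load_rows is hypothetical.

```python
import csv


def load_rows(path="output.csv"):
    """Return all rows of a CSV file as lists of strings (no header assumed)."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle)]


if __name__ == "__main__":
    rows = load_rows()
    print(f"read {len(rows)} rows")
    if rows:
        print("first row:", rows[0])
```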
