Providing near-human level accuracy on grading interactive student programs with meta-reinforcmeent learning.
Evan Zheran Liu*, Moritz Stephan*, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn
Advances in Neural Information Processing Systems (NeurIPS), 2022. Selected as an Oral.
Also see our project web page and paper.
This code requires Python3.6+.
The Python3 requirements are specified in requirements.txt
.
We recommend creating a virtualenv
, e.g.:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
To train a DreamGrader policy for a particular error type, invoke the following command:
python3 main.py {exp_name} -b instruction_agent.policy.type=\"classifier\" -c configs/default.json -c configs/bounce.json -c configs/bounce_bigger_train.json -b environment.error_type=\"{error_type}\" -s {seed}
This will create a directory experiments/exp_name
, which will contain:
- A tensorboard subdirectory at
experiments/exp_name/tensorboard
, which logs statistics, such as accumulated returns vs. number of training episodes, and also vs. number of training steps. - A visualization subdirectory at
experiments/exp_name/visualize
, which will contain videos of the learned agent. The videos prefixed withexploration
show the learned exploration behavior, while theinstruction
videos show the predicted bug label, but are not particularly informative. - A checkpoints subdirectory at
experiments/exp_name/checkpoints
, which will periodically save model checkpoints. - Metadata about the run, such as the configs used.
The error_type
argument specifies which error type to use. The values used in
the main experiments from the paper are:
whenGoal-noBallLaunch
: When the ball enters the goal, no ball is created.whenMiss-noOpponentScore
: When the ball hits the ground, the opponent's score does not increment.whenGoal-illegal-incrementOpponentScore
: When the ball enters the goal, the opponent's score increments.whenMove-error
: Either the left or right action do not appropriately move the paddle.whenPaddle-illegal-incrementPlayerScore
: When the paddle moves, the player's score increments.whenWall-illegal-incrementOpponentScore
: When the ball hits the wall, the opponent's score increments.whenGoal-illegal-bounceBall
: The ball bounces off the goal.whenRun-noBallLaunch
: No ball is launced at the beginning of the episode.
The seed
argument takes an integer value. The results in the paper used seeds
0, 1, and 2.
This example command will train a DreamGrader agent on the bug type where no ball is created after the ball enters the goal.
python3 main.py dream_grader_exp -b instruction_agent.policy.type=\"classifier\" -c configs/default.json -c configs/bounce.json -c configs/bounce_bigger_train.json -b environment.error_type=\"whenGoal-noBallLaunch\" -s 0
This code also releases Bounce as a new meta-RL task.
We recommend using the BounceMetaEnv
class from envs/bounce.py
.
This class creates Gym environments, which correspond to different (student)
programs. The class can either be configured to directly read from a csv file
of real student programs, or can create its own programs programmatically.
This is achieved by passing the appropriate config file to the load_config
method.
By default, the example command provided above configures reading and using
real student programs with the train / test split used in the paper.
After configuring the BounceMetaEnv
class, Gym environments corresponding to
different programs can be created with the create_env
method.
Here is an example usage:
import config as cfg
from envs import bounce
# Uses the default config of real student programs
config_filenames = ["configs/default.json", "configs/bounce.json",
"configs/bounce_bigger_train.json"]
config = cfg.Config.from_files_and_bindings(config_filenames, [])
# Configure the class
bounce.BounceMetaEnv.load_config(config.get("environment"))
# Creates Gym environments:
# - The provided random seed and test flag randomly select a student program
# from either the train or test split
seed = 0
test = False
env = bounce.BounceMetaEnv.create_env(seed, test=test)
If you use this code, please cite our paper.
@article{liu2022giving,
title={Giving Feedback on Interactive Student Programs with Meta-Exploration},
author={Liu, Evan Zheran and Stephan, Moritz and Nie, Allen and Piech, Chris and Brunskill, Emma and Finn, Chelsea},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year={2022}
}