inlining-Oz-v1.1
This release changes two things about the model:
- The training corpus: instead of training on the modules from a single build target, we trained on roughly half of all C++ modules in the Google monorepo. The "half" qualifier is due to training time: we trained this model for over 3 days on nearly a thousand (virtual) machines, and training on all of the monorepo would have at least doubled the required compute.
- The training method: instead of PPO, we used ES [*] (which is unfortunately not currently open sourced; see the sketch after this list for the core idea). This does not change the format of the saved model, but we found this training method easier to scale to the larger corpus.
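For the curious, here is a minimal sketch of the antithetic-sampling gradient estimator at the core of ES as described in the cited paper [*]. It is illustrative only, not our trainer: every name and hyperparameter (`es_step`, `reward_fn`, `sigma`, `lr`, `num_pairs`) is made up for the example.

```python
import numpy as np

def es_step(theta, reward_fn, sigma=0.02, lr=0.01, num_pairs=16, rng=None):
    """One ES update with antithetic (mirrored) sampling.

    theta: flat parameter vector of the policy.
    reward_fn: maps a parameter vector to a scalar reward
               (e.g. negative total binary size over a corpus of modules).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    grad_est = np.zeros_like(theta)
    for _ in range(num_pairs):
        eps = rng.standard_normal(theta.shape)
        # Each mirrored pair of rollouts is independent, which is what
        # makes ES easy to fan out across many machines.
        r_plus = reward_fn(theta + sigma * eps)
        r_minus = reward_fn(theta - sigma * eps)
        grad_est += (r_plus - r_minus) * eps
    grad_est /= 2.0 * sigma * num_pairs
    return theta + lr * grad_est

# Toy usage: maximize -||theta||^2, whose optimum is theta = 0.
theta = np.ones(8)
for _ in range(200):
    theta = es_step(theta, lambda p: -np.sum(p * p))
```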
We have found that this model generalizes significantly better than previous model releases. If you try the model on your own internal build targets, let us know how it does!
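As a quick sanity check that the saved-model format is indeed unchanged, the released artifact should load like any TensorFlow SavedModel. A minimal sketch, assuming the release unpacks to a SavedModel directory (the path below is hypothetical):

```python
import tensorflow as tf

# Hypothetical path to the unpacked release artifact.
policy = tf.saved_model.load("inlining-Oz-v1.1/saved_policy")
# Inspect the serving signatures the compiler-side inliner would call.
print(policy.signatures)
```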
[*] Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, Adrian Weller: "Structured Evolution with Compact Architectures for Scalable Policy Optimization", https://arxiv.org/abs/1804.02395