This repo contains code for the first- and second-order minimax optimization methods studied in *Newton-type Methods for Minimax Optimization*: gradient descent ascent (GDA), total gradient descent ascent (TGDA), follow-the-ridge (FR), gradient descent Newton (GDN), and complete Newton (CN). We run these algorithms on several tasks: estimating the mean and covariance of a single Gaussian, learning a mixture of Gaussians, and generating digits from the MNIST 0/1 subset. A GPU with CUDA is required to run all of our experiments.
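As a quick illustration of the simplest of these updates, the sketch below runs simultaneous GDA on a toy saddle problem. This is only a sketch of the idea, not the implementation used in `run.py`.

```python
# Minimal sketch of simultaneous gradient descent ascent (GDA) on
# min_x max_y f(x, y). Illustrative only; not the implementation in run.py.
import torch

def gda_step(f, x, y, lr=0.1):
    """One GDA step: gradient descent on x, gradient ascent on y."""
    gx, gy = torch.autograd.grad(f(x, y), [x, y])
    with torch.no_grad():
        x -= lr * gx  # min player descends
        y += lr * gy  # max player ascends
    return x, y

# Toy saddle problem f(x, y) = x^2 - y^2, whose saddle point is (0, 0).
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(1.0, requires_grad=True)
for _ in range(200):
    x, y = gda_step(lambda u, v: u**2 - v**2, x, y)
print(x.item(), y.item())  # both approach 0
```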
- `model.py` contains the neural net architectures for the various tasks.
- `data.py` generates the various datasets.
- `run.py` is the main Python script for comparing the different algorithms.
- `utils.py` implements various helper functions, including the Hessian-vector product and conjugate gradient.
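For background on those last two helpers, the sketch below shows one common way to compute a Hessian-vector product with double backpropagation and to solve a linear system with conjugate gradient. It is only an illustration under standard PyTorch assumptions, not the exact code in `utils.py`.

```python
# Illustrative Hessian-vector product (double backprop) and conjugate
# gradient solver; a sketch of the idea, not the exact code in utils.py.
import torch

def hvp(loss, params, vec):
    """Hessian-vector product H @ vec, where H = d^2 loss / d params^2."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    prod = torch.autograd.grad(flat @ vec, params, retain_graph=True)
    return torch.cat([p.reshape(-1) for p in prod])

def conjugate_gradient(mvp, b, iters=10, tol=1e-10):
    """Solve A x = b given only the matrix-vector product mvp(v) = A v."""
    x = torch.zeros_like(b)
    r = b.clone()              # residual b - A x (x starts at zero)
    p = r.clone()
    rs_old = r @ r
    for _ in range(iters):
        Ap = mvp(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

Second-order updates of the kind listed above typically combine the two to apply an (approximate) inverse Hessian to a vector, e.g. `conjugate_gradient(lambda v: hvp(loss, params, v), g)`.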
The following bash scripts contain the configurations for the various experiments. Uncomment the relevant parts of each script to run the different algorithms (GDA/TGDA/FR/GDN/CN).
- `bash_gaussian_mean.sh`: estimation of the mean of a Gaussian
- `bash_gaussian_covariance.sh`: estimation of the covariance of a Gaussian
- `bash_gmm.sh`: learning a mixture of Gaussians
- `bash_gmm.sh`: learning to generate digits from the MNIST 0/1 dataset
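For example, `bash bash_gaussian_mean.sh` launches the Gaussian mean estimation experiment with whichever algorithm blocks are left uncommented in the script.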
The folder `./checkpoints` contains two pretrained models, one for GMM and one for MNIST. We used the model trained by GDA as initialization.
- `plot_gmm.py`: plotting script for visualizing the Gaussian mixtures generated by different algorithms
- `plot_mnist.py`: plotting script for visualizing the digits generated by different algorithms
If you use our code, please cite the following reference:
```
@inproceedings{zhang2020newton,
  title={Newton-type methods for minimax optimization},
  author={Zhang, Guojun and Wu, Kaiwen and Poupart, Pascal and Yu, Yaoliang},
  booktitle={ICML Workshop on Beyond First-Order Methods in ML Systems},
  year={2021},
  url={https://arxiv.org/abs/2006.14592}
}
```