Controls described in the paper:
eta: affects saturation
rescale threshold: ... highly variable depending on model and experiment
momentum: between -0.75 and -0.25 recommended
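
For reference, here is a rough sketch of where the three controls enter a single guidance step. It is based on my reading of the paper's algorithm, not on any particular implementation. The function name `apg_step`, its argument names, and the defaults are made up for illustration, and it works on flat lists of floats instead of image tensors.

```python
import math


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def apg_step(cond, uncond, scale, running,
             eta=0.0, rescale_threshold=0.0, momentum=-0.5):
    """One APG-style guidance step (hypothetical helper, illustrative defaults).

    cond, uncond: conditional / unconditional model predictions
    scale:        guidance scale, as in CFG
    running:      running average of the guidance direction from earlier steps
    Returns (guided_prediction, new_running_average).
    """
    diff = [c - u for c, u in zip(cond, uncond)]

    # momentum: fold the previous running average into the current direction;
    # a negative value subtracts part of it.
    running = [d + momentum * r for d, r in zip(diff, running)]
    diff = running

    # rescale threshold: cap the L2 norm of the update (0 disables the cap).
    norm = math.sqrt(dot(diff, diff))
    if rescale_threshold > 0 and norm > rescale_threshold:
        diff = [d * rescale_threshold / norm for d in diff]

    # Split the update into parts parallel and orthogonal to `cond`.
    cond_sq = dot(cond, cond)
    coef = dot(diff, cond) / cond_sq if cond_sq > 0 else 0.0
    parallel = [coef * c for c in cond]
    orthogonal = [d - p for d, p in zip(diff, parallel)]

    # eta: how much of the parallel part is kept (1.0 keeps all of it).
    update = [o + eta * p for o, p in zip(orthogonal, parallel)]

    guided = [c + (scale - 1.0) * u for c, u in zip(cond, update)]
    return guided, running
```

With eta=1.0, momentum=0.0 and the threshold disabled, this reduces to ordinary CFG, which makes for a handy sanity check when comparing the two at the same scale.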
The paper mostly compares APG against CFG set too high, which I'm not sure is all that useful. We already know high CFG burns badly, so a better comparison would be APG against CFG at a reasonable scale, to see whether we really do get better quality, better prompt adherence, etc. Anyway, it does something.