# Awesome-Optimizer

A collection of optimizer-related papers and code.

In the last column (Type), GD stands for gradient descent, S for second-order (quasi-Newton) methods, E for evolutionary, GF for gradient-free, and VR for variance-reduced methods.
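
Most GD-type entries ship PyTorch implementations that follow the standard `torch.optim` interface. As a rough, illustrative sketch (using `torch.optim.Adam`, one of the optimizers listed below, with a toy model and random data as placeholders, not tied to any specific paper), such an optimizer plugs into a training loop like this:

```python
# Minimal sketch: dropping a listed GD-family optimizer into a standard
# PyTorch training loop. Model, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                    # toy model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # swap in any torch.optim-compatible optimizer
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)                                     # dummy batch
y = torch.randn(32, 1)

for step in range(100):
    optimizer.zero_grad()          # clear accumulated gradients
    loss = loss_fn(model(x), y)    # forward pass
    loss.backward()                # backpropagate
    optimizer.step()               # apply the optimizer's update rule
```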

| Title | Year | Optimizer | Published | Code | Type |
| --- | --- | --- | --- | --- | --- |
| AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix | 2023 | AGD | neurips'23 | pytorch | GD,S |
| AdaLomo: Low-memory Optimization with Adaptive Learning Rate | 2023 | AdaLOMO | arxiv | pytorch | GD |
| Large Language Models as Optimizers | 2023 | OPRO | arxiv | python | llm |
| Promoting Exploration in Memory-Augmented Adam using Critical Momenta | 2023 | Adam+CM | arxiv | pytorch | GD |
| CAME: Confidence-guided Adaptive Memory Efficient Optimization | 2023 | CAME | acl'23 | pytorch | GD |
| Full Parameter Fine-tuning for Large Language Models with Limited Resources | 2023 | LOMO | arxiv | pytorch | GD |
| Prodigy: An Expeditiously Adaptive Parameter-Free Learner | 2023 | Prodigy | arxiv | pytorch | GD |
| DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method | 2023 | DoWG | neurips'23 | | GD |
| Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training | 2023 | Sophia | arxiv | pytorch | GD |
| UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization | 2023 | UAdam | arxiv | | GD |
| Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term | 2023 | WSAM | kdd'23 | pytorch | GD |
| DP-Adam: Correcting DP Bias in Adam's Second Moment Estimation | 2023 | DP-Adam | iclr-W'23 | | GD |
| An Adam-enhanced Particle Swarm Optimizer for Latent Factor Analysis | 2023 | ADHPL | arxiv | | E |
| DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule | 2023 | DoG | icml'23 | pytorch | GD |
| FOSI: Hybrid First and Second Order Optimization | 2023 | FOSI | HPI'23 | jax | GD,S |
| Symbolic Discovery of Optimization Algorithms | 2023 | Lion | neurips'23 | jax, tf, pytorch | GD |
| Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale | 2022 | Amos | arxiv | jax | GD |
| VeLO: Training Versatile Learned Optimizers by Scaling Up | 2022 | VeLO | arxiv | jax | GD |
| Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method | 2022 | GradaGrad | arxiv | | GD |
| CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU | 2022 | CowClip | aaai'23 | tf | GD |
| Smooth momentum: improving lipschitzness in gradient descent | 2022 | Smooth Momentum | APIN | | GD |
| Towards Better Generalization of Adaptive Gradient Methods | 2020 | SAGD | neurips'20 | | GD |
| An Improved Adaptive Optimization Technique for Image Classification | 2020 | Mean-ADAM | ICIEV | | GD |
| SCW-SGD: Stochastically Confidence-Weighted SGD | 2020 | SCWSGD | ICIP | | GD |
| Slime mould algorithm: A new method for stochastic optimization | 2020 | SMA | FGCS | code | E |
| Ranger-Deep-Learning-Optimizer | 2020 | Ranger | github | pytorch | GD |
| pbSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization | 2020 | pbSGD | ijcai'20 | pytorch | GD |
| A Variant of Gradient Descent Algorithm Based on Gradient Averaging | 2020 | Grad-Avg | arxiv | | GD |
| Stochastic Gradient Descent with Nonlinear Conjugate Gradient-Style Adaptive Momentum | 2020 | FRSGD | arxiv | | GD |
| CADA: Communication-Adaptive Distributed Adam | 2020 | CADA | arxiv | pytorch, matlab | GD |
| Eigenvalue-corrected Natural Gradient Based on a New Approximation | 2020 | TEKFAC | arxiv | | GD |
| SMG: A Shuffling Gradient-Based Method with Momentum | 2020 | SMG | icml'21 | | GD |
| SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization | 2020 | SALR | TNNLS | | GD |
| Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering | 2020 | MEKA | neurips-W'21 | | GD |
| Mixing ADAM and SGD: a Combined Optimization Method | 2020 | MAS | arxiv | pytorch | GD |
| EAdam Optimizer: How ε Impact Adam | 2020 | EAdam | arxiv | pytorch | GD |
| Adam+: A Stochastic Method with Adaptive Variance Reduction | 2020 | Adam+ | arxiv | | GD |
| Sharpness-aware Minimization for Efficiently Improving Generalization | 2020 | SAM | iclr'21 | jax | GD |
| Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties | 2020 | Expectigrad | arxiv | tf | GD |
| AEGD: Adaptive Gradient Descent with Energy | 2020 | AEGD | AIMS | pytorch | GD |
| Adam with Bandit Sampling for Deep Learning | 2020 | Adambs | arxiv | | GD |
| AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients | 2020 | AdaBelief | neurips'20 | pytorch | GD |
| Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization | 2020 | Apollo[W] | arxiv | pytorch | GD,S |
| S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima | 2020 | S-SGD | arxiv | | GD |
| Gravilon: Applications of a New Gradient Descent Method to Machine Learning | 2020 | Gravilon | arxiv | | GD |
| PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization | 2020 | PAGE | icml'21 | | GD |
| Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities | 2020 | Ada{ACSA,AGD+} | aaai'21 | | GD |
| Stochastic Normalized Gradient Descent with Momentum for Large Batch Training | 2020 | SNGM | arxiv | | GD |
| AdaScale SGD: A User-Friendly Algorithm for Distributed Training | 2020 | AdaScale | icml'21 | | GD |
| Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization | 2020 | PSTorm | JOTA | | GD |
| MTAdam: Automatic Balancing of Multiple Training Loss Terms | 2020 | MTAdam | acl'21 | pytorch | GD |
| AdaSGD: Bridging the gap between SGD and Adam | 2020 | AdaSGD | arxiv | | GD |
| AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | 2020 | AdamP | iclr'21 | pytorch | GD |
| Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes | 2020 | LANS | arxiv | pytorch | GD |
| AdaSwarm: Augmenting Gradient-Based optimizers in Deep Learning with Swarm Intelligence | 2020 | AdaSwarm | TETC | pytorch | E |
| Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods | 2020 | SKQN, S4QN | cvpr'21 | | GD |
| Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs | 2020 | SHAdaGrad | arxiv | | GD |
| A New Accelerated Stochastic Gradient Method with Momentum | 2020 | SGDM | arxiv | | GD |
| Practical Quasi-Newton Methods for Training Deep Neural Networks | 2020 | K-BFGS[(L)] | neurips'20 | pytorch | GD |
| AdaS: Adaptive Scheduling of Stochastic Gradients | 2020 | AdaS | cvpr'22 | pytorch | GD |
| Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia | 2020 | Adai | icml'22 | pytorch | GD |
| ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning | 2020 | ADAHESSIAN | aaai'21 | pytorch | GD |
| Momentum with Variance Reduction for Nonconvex Composition Optimization | 2020 | MVRC-[1,2] | arxiv | | GD |
| CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing | 2020 | CoolMomentum | arxiv | tf, pytorch | GD |
| Gradient Centralization: A New Optimization Technique for Deep Neural Networks | 2020 | GC | eccv'20 | pytorch, tf | GD |
| AdaX: Adaptive Gradient Descent with Exponential Long Term Memory | 2020 | AdaX[-W] | arxiv | pytorch | GD |
| Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale | 2020 | RM3 | arxiv | tf | GD |
| TAdam: A Robust Stochastic Gradient Optimizer | 2020 | TAdam | arxiv | pytorch | GD |
| Iterative Averaging in the Quest for Best Test Error | 2020 | Gadam | arxiv | | GD |
| On the distance between two neural networks and the stability of learning | 2020 | Fromage | neurips'20 | pytorch | GD |
| Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent | 2020 | SRSGD | arxiv | pytorch | GD |
| Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent | 2020 | SGD-G2 | arxiv | | GD |
| LaProp: Separating Momentum and Adaptivity in Adam | 2020 | LaProp | arxiv | pytorch | GD |
| Compositional ADAM: An Adaptive Compositional Solver | 2020 | C-ADAM | arxiv | | GD |
| Biased Stochastic Gradient Descent for Conditional Stochastic Optimization | 2020 | BSGD | arxiv | | GD |
| On the Trend-corrected Variant of Adaptive Stochastic Optimization Methods | 2020 | AdamT | ijcnn'20 | pytorch | GD |
| Efficient Learning Rate Adaptation for Convolutional Neural Network Training | 2019 | e-AdLR | ijcnn'19 | | GD |
| ProxSGD: Training Structured Neural Networks under Regularization and Constraints | 2019 | ProxSGD | iclr'20 | tf | GD |
| An Adaptive Optimization Algorithm Based on Hybrid Power and Multidimensional Update Strategy | 2019 | AdaHMG | ieee | | GD |
| signSGD via Zeroth-Order Oracle | 2019 | ZO-signSGD | iclr'19 | | GF |
| Fast DENSER: Efficient Deep NeuroEvolution | 2019 | F-DENSER | arxiv | tf | E |
| Adathm: Adaptive Gradient Method Based on Estimates of Third-Order Moments | 2019 | Adathm | DSC | | GD |
| A new perspective in understanding of Adam-Type algorithms and beyond | 2019 | AdamAL | arxiv | pytorch | GD |
| CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity | 2019 | CProp | arxiv | pytorch | GD |
| Domain-independent Dominance of Adaptive Methods | 2019 | AvaGrad, Delayed Adam | cvpr'21 | pytorch | GD |
| Second-order Information in First-order Optimization Methods | 2019 | AdaSqrt | arxiv | tf | GD |
| Does Adam optimizer keep close to the optimal point? | 2019 | AdaFix | arxiv | | GD |
| Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates | 2019 | AdaAlter | arxiv | mxnet | GD |
| UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization | 2019 | UniXGrad | neurips'19 | | GD |
| Demon: Improved Neural Network Training with Momentum Decay | 2019 | Demon {SGDM,Adam} | icassp'22 | tf | GD |
| ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization | 2019 | ZO-AdaMM | neurips'19 | tf | GF |
| On Empirical Comparisons of Optimizers for Deep Learning | 2019 | RMSterov | arxiv | | GD |
| An Adaptive and Momental Bound Method for Stochastic Learning | 2019 | AdaMod | arxiv | pytorch | GD |
| On Higher-order Moments in Adam | 2019 | HAdam | arxiv | | GD |
| diffGrad: An Optimization Method for Convolutional Neural Networks | 2019 | diffGrad | TNNLS | pytorch | GD |
| Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM | 2019 | SAMSGrad | arxiv | pytorch | GD |
| On the Variance of the Adaptive Learning Rate and Beyond | 2019 | RAdam | iclr'20 | pytorch, tf | GD |
| BGADAM: Boosting based Genetic-Evolutionary ADAM for Neural Network Optimization | 2019 | BGADAM | arxiv | | GD |
| Adaloss: Adaptive Loss Function for Landmark Localization | 2019 | Adaloss | arxiv | | GD |
| signADAM: Learning Confidences for Deep Neural Networks | 2019 | signADAM[++] | icdmw'19 | pytorch | GD |
| The Role of Memory in Stochastic Optimization | 2019 | PolyAdam | UAI'20 | | GD |
| Lookahead Optimizer: k steps forward, 1 step back | 2019 | Lookahead | neurips'19 | tf, pytorch | GD |
| Momentum-Based Variance Reduction in Non-Convex SGD | 2019 | STORM | neurips'19 | pytorch | GD |
| SAdam: A Variant of Adam for Strongly Convex Functions | 2019 | SAdam | iclr'20 | code | GD |
| Matrix-Free Preconditioning in Online Learning | 2019 | RecursiveOptimizer | icml'19 | tf | GD |
| PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization | 2019 | PowerSGD[M] | neurips'19 | pytorch | GD |
| Fast-DENSER++: Evolving Fully-Trained Deep Artificial Neural Networks | 2019 | F-DENSER++ | arxiv | tf | E |
| Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks | 2019 | Novograd | neurips'19 | pytorch | GD |
| An Adaptive Remote Stochastic Gradient Method for Training Neural Networks | 2019 | NAMS{G,B}, ARSG | arxiv | pytorch, mxnet | GD |
| Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates | 2019 | ArmijoLS | neurips'19 | pytorch | GD |
| Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | 2019 | LAMB | iclr'20 | tf, pytorch | GD |
| On the Convergence Proof of AMSGrad and a New Version | 2019 | AdamX | arxiv | | GD |
| An Optimistic Acceleration of AMSGrad for Nonconvex Optimization | 2019 | OPT-AMSGrad | acml'21 | | GD |
| Parabolic Approximation Line Search for DNNs | 2019 | PAL | neurips'20 | pytorch | GD |
| Gradient-only line searches: An Alternative to Probabilistic Line Searches | 2019 | GOLS-I | arxiv | | GD |
| Adaptive Gradient Methods with Dynamic Bound of Learning Rate | 2019 | AdaBound | iclr'19 | pytorch | GD |
| Memory-Efficient Adaptive Optimization | 2019 | SM3 | neurips'19 | tf | GD |
| DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization | 2019 | DADAM | arxiv | matlab | GD |
| On the Convergence of AdaGrad with Momentum for Training Deep Neural Networks | 2018 | Ada{NAG,HB} | arxiv | | GD |
| SADAGRAD: Strongly Adaptive Stochastic Gradient Methods | 2018 | SADAGRAD | icml'18 | | GD |
| PSA-CMA-ES: CMA-ES with population size adaptation | 2018 | PSA-CMA-ES | gecco'18 | | E |
| Adaptive Methods for Nonconvex Optimization | 2018 | Yogi | neurips'18 | tf | GD |
| Deep Frank-Wolfe For Neural Network Optimization | 2018 | DFW | iclr'19 | pytorch | GD |
| HyperAdam: A Learnable Task-Adaptive Adam for Network Training | 2018 | HyperAdam | aaai'19 | tf, pytorch | GD |
| Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | 2018 | BADAM | icml'20 | tf | GD |
| Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization | 2018 | KGD | arxiv | tf | GD |
| Quasi-hyperbolic momentum and Adam for deep learning | 2018 | QHM, QHAdam | iclr'19 | pytorch, tf | GD |
| AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods | 2018 | AdaShift | iclr'19 | pytorch | GD |
| Optimal Adaptive and Accelerated Stochastic Gradient Descent | 2018 | A2Grad{Exp,Inc,Uni} | arxiv | pytorch | GD |
| Accelerating SGD with momentum for over-parameterized learning | 2018 | MaSS | arxiv | tf | GD |
| Online Adaptive Methods, Universality and Acceleration | 2018 | AcceleGrad | neurips'18 | | GD |
| On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization | 2018 | AdaFom | iclr'19 | | GD |
| AdaGrad Stepsizes: Sharp Convergence Over Nonconvex Landscapes | 2018 | AdaGrad-Norm | icml'19 | pytorch | GD |
| Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam | 2018 | VAdam | icml'18 | pytorch, tf | GD |
| Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks | 2018 | Padam | ijcai'20 | pytorch | GD |
| Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis | 2018 | EKFAC | neurips'18 | pytorch | GD |
| Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods | 2018 | AdaBayes[FP] | neurips'18 | pytorch | GD |
| Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate | 2018 | NosAdam | ijcai'19 | pytorch | GD |
| Small steps and giant leaps: Minimal Newton solvers for Deep Learning | 2018 | Curveball | iccv'19 | matlab | GD |
| GADAM: Genetic-Evolutionary ADAM for Deep Neural Network Optimization | 2018 | GADAM | arxiv | | GD |
| Adafactor: Adaptive Learning Rates with Sublinear Memory Cost | 2018 | Adafactor | icml'18 | pytorch | GD |
| Aggregated Momentum: Stability Through Passive Damping | 2018 | AggMo | iclr'19 | pytorch, tf | GD |
| Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization | 2018 | Katyusha X | icml'18 | | VR |
| WNGrad: Learn the Learning Rate in Gradient Descent | 2018 | WNGrad | arxiv | C++ | GD |
| VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning | 2018 | VR-SGD | TKDE | C++ | GD |
| signSGD: Compressed Optimisation for Non-Convex Problems | 2018 | signSGD | icml'18 | mxnet | GD |
| Shampoo: Preconditioned Stochastic Tensor Optimization | 2018 | Shampoo | icml'18 | tf | GD |
| L4: Practical loss-based stepsize adaptation for deep learning | 2018 | L4{Adam,Momentum} | neurips'18 | pytorch, tf | GD |
| On the Convergence of Adam and Beyond | 2018 | AMSGrad, AdamNC | iclr'18 | pytorch | GD |
| SW-SGD: The Sliding Window Stochastic Gradient Descent Algorithm | 2017 | SW-SGD | PCS | | GD |
| Improving Generalization Performance by Switching from Adam to SGD | 2017 | SWATS | iclr'18 | pytorch | GD |
| Noisy Natural Gradient as Variational Inference | 2017 | Noisy {Adam,K-FAC} | icml'18 | tf | GD |
| AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training | 2017 | AdaComp | aaai'18 | | GD |
| AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks | 2017 | AdaBatch | iclr-W'18 | pytorch | GD |
| First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time | 2017 | NEON | neurips'18 | | GD |
| BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning | 2017 | BPGrad | cvpr'18 | matlab | GD |
| Decoupled Weight Decay Regularization | 2017 | AdamW, SGDW | iclr'19 | lua | GD |
| Evolving Deep Convolutional Neural Networks for Image Classification | 2017 | EvoCNN | ITEC | python | E |
| Normalized Direction-preserving Adam | 2017 | ND-Adam | arxiv | pytorch, tf | GD |
| Regularizing and Optimizing LSTM Language Models | 2017 | NT-ASGD | iclr'18 | pytorch | GD |
| Natasha 2: Faster Non-Convex Optimization Than SGD | 2017 | Natasha{1.5,2} | neurips'18 | | GD |
| Large Batch Training of Convolutional Networks | 2017 | LARS | arxiv | pytorch | GD |
| Practical Gauss-Newton Optimisation for Deep Learning | 2017 | KFRA, KFLR | icml'17 | | GD |
| YellowFin and the Art of Momentum Tuning | 2017 | YellowFin | arxiv | tf | GD |
| Variants of RMSProp and Adagrad with Logarithmic Regret Bounds | 2017 | SC-{Adagrad,RMSProp} | icml'17 | pytorch | GD |
| Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients | 2017 | M-SVAG | icml'18 | tf | GD |
| Training Deep Networks without Learning Rates Through Coin Betting | 2017 | COCOB | neurips'17 | tf | GD |
| Sub-sampled Cubic Regularization for Non-convex Optimization | 2017 | SCR | icml'17 | numpy | S |
| Online Convex Optimization with Unconstrained Domains and Losses | 2017 | RescaledExp | neurips'16 | | GD |
| Evolving Deep Neural Networks | 2017 | CoDeepNEAT | arxiv | tf | E |
| SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient | 2017 | SARAH | icml'17 | | VR |
| IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate | 2017 | IQN | icassp'17 | C++ | GD,S |
| NMODE --- Neuro-MODule Evolution | 2017 | NMODE | arxiv | C++ | E |
| The Whale Optimization Algorithm | 2016 | WOA | AES | numpy | E |
| Incorporating Nesterov Momentum into Adam | 2016 | Nadam | arxiv | pytorch | GD |
| Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates | 2016 | Eve | arxiv | pytorch | GD |
| Direct Feedback Alignment Provides Learning in Deep Neural Networks | 2016 | DFA | neurips'16 | numpy | GD |
| SGDR: Stochastic Gradient Descent with Warm Restarts | 2016 | SGDR | iclr'17 | theano | GD |
| Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization | 2016 | Damp-oBFGS-Inf | SIAM | pytorch | GD,S |
| A Comprehensive Linear Speedup Analysis for Asynchronous Stochastic Parallel Optimization from Zeroth-Order to First-Order | 2016 | ZO-SCD | neurips'16 | | GF |
| Barzilai-Borwein Step Size for Stochastic Gradient Descent | 2016 | {SGD,SVRG}-BB | neurips'16 | numpy | GD |
| Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks | 2016 | SDProp | ijcai'17 | | GD |
| Katyusha: The First Direct Acceleration of Stochastic Gradient Methods | 2016 | Katyusha | stoc'17 | | VR |
| Accelerating SVRG via second-order information | 2015 | SVRG+{I,II} | arxiv | | GD,S |
| adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs | 2015 | adaQN | ecml'16 | numpy | GD,S |
| A Linearly-Convergent Stochastic L-BFGS Algorithm | 2015 | SVRG-SQN | aistats | julia | GD,S |
| Optimizing Neural Networks with Kronecker-factored Approximate Curvature | 2015 | K-FAC | icml'15 | tf | GD |
| Probabilistic Line Searches for Stochastic Optimization | 2015 | ProbLS | JMLR | | GD |
| Scale-Free Algorithms for Online Linear Optimization | 2015 | AdaFTRL | alt'15 | | GD |
| Adam: A Method for Stochastic Optimization | 2014 | Adam, AdaMax | iclr'15 | pytorch | GD |
| Random feedback weights support learning in deep neural networks | 2014 | FA | arxiv | pytorch | GD |
| A Computationally Efficient Limited Memory CMA-ES for Large Scale Optimization | 2014 | LM-CMA-ES | gecco'14 | | E |
| A Proximal Stochastic Gradient Method with Progressive Variance Reduction | 2014 | Prox-SVRG | SIAM | tf, numpy | VR |
| RES: Regularized Stochastic BFGS Algorithm | 2014 | Reg-oBFGS-Inf | arxiv | | GD,S |
| A Stochastic Quasi-Newton Method for Large-Scale Optimization | 2014 | SQN | SIAM | matlab | GD,S |
| SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives | 2014 | SAGA | neurips'14 | numpy | VR |
| Accelerating stochastic gradient descent using predictive variance reduction | 2013 | SVRG | neurips'13 | pytorch | VR |
| Ad Click Prediction: a View from the Trenches | 2013 | FTRL | kdd'13 | pytorch | GD |
| Semi-Stochastic Gradient Descent Methods | 2013 | S2GD | arxiv | | VR |
| Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming | 2013 | ZO-SGD | SIAM | | GF |
| Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization | 2013 | ZO-{ProxSGD,PSGD} | arxiv | | GF |
| Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients | 2013 | vSGD-fd | arxiv | | GD |
| Neural Networks for Machine Learning | 2012 | RMSProp | coursera | tf | GD |
| An Enhanced Hypercube-Based Encoding for Evolving the Placement, Density, and Connectivity of Neurons | 2012 | ES-HyperNEAT | AL | go | E |
| CMA-TWEANN: efficient optimization of neural networks via self-adaptation and seamless augmentation | 2012 | CMA-TWEANN | gecco'12 | | E |
| ADADELTA: An Adaptive Learning Rate Method | 2012 | ADADELTA | arxiv | pytorch | GD |
| No More Pesky Learning Rates | 2012 | vSGD-{b,g,l} | icml'13 | lua | VR |
| A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets | 2012 | SAG | neurips'12 | | VR |
| CMA-ES: evolution strategies and covariance matrix adaptation | 2011 | CMA-ES | gecco'12 | tf | E |
| Adaptive Subgradient Methods for Online Learning and Stochastic Optimization | 2011 | AdaGrad | JMLR | pytorch, C++ | GD |
| AdaDiff: Adaptive Gradient Descent with the Differential of Gradient | 2010 | AdaDiff | iopscience | | GD |
| A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks | 2009 | HyperNEAT | AL | | E |
| Scalable training of L1-regularized log-linear models | 2007 | OWL-QN | acm | javascript | GD,S |
| A Stochastic Quasi-Newton Method for Online Convex Optimization | 2007 | O-LBFGS | icml'07 | | GD,S |
| Online convex programming and generalized infinitesimal gradient ascent | 2003 | OGD | icml'03 | | GD |
| A Limited Memory Algorithm for Bound Constrained Optimization | 2003 | L-BFGS-B | SIAM | fortran, matlab | GD,S |
| Evolving Neural Networks through Augmenting Topologies | 2002 | NEAT | EC | numpy | E |
| Trust region methods | 2000 | Sub-sampled TR | SIAM | | S |
| Particle swarm optimization | 1995 | PSO | icnn'95 | | E |
| A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm | 1993 | RPROP | icnn'93 | pytorch | GD |
| Acceleration of Stochastic Approximation by Averaging | 1992 | ASGD | SIAM | pytorch | GD |
| On the limited memory BFGS method for large scale optimization | 1989 | L-BFGS | MP | | GD,S |
| Large-scale linearly constrained optimization | 1978 | MINOS | MP | pytorch | GD,S |
| Some methods of speeding up the convergence of iteration methods | 1964 | Polyak (momentum) | paper | | GD |
| A Stochastic Approximation Method | 1951 | SGD | paper | pytorch | GD |