
Visualizing why SGD with momentum converges faster than Vanilla SGD


Hawar-Dzaee/SGD-with-momentum


Vanilla SGD vs. SGD with momentum


🛑 Please open the notebook in Google Colab so that the Plotly graphs render.
🛑 I use SGD, vanilla SGD, and plain SGD interchangeably; all three are different from SGD with momentum.
⚡ Visualizing the path each optimization algorithm (SGD, SGD with momentum) takes, and how fast it moves along that path, when trained for 30 epochs.

Plotting the trajectories of plain SGD and SGD with momentum on the loss function landscape gives us intuition for why SGD with momentum converges faster.
We also aim to show why overshooting with SGD with momentum is not a problem (in fact, it can help us escape local minima).
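For reference, here is one common way to write the two update rules. The notation is ours and is an assumption: the notebook may use an equivalent variant, for example one where the learning rate is folded into the velocity. θ are the parameters, η is the learning rate, β is the momentum coefficient, ∇L stands for the (mini-batch) gradient, and the velocity v starts at zero. Unrolling the velocity shows that it is an exponentially weighted sum of past gradients:

```latex
\begin{aligned}
\text{Vanilla SGD:}\quad & \theta_{t+1} = \theta_t - \eta\,\nabla L(\theta_t) \\
\text{SGD with momentum:}\quad & v_{t+1} = \beta\,v_t + \nabla L(\theta_t), \qquad
  \theta_{t+1} = \theta_t - \eta\,v_{t+1}, \qquad v_0 = 0 \\
\text{Unrolled velocity:}\quad & v_{t+1} = \sum_{k=0}^{t} \beta^{\,t-k}\,\nabla L(\theta_k)
\end{aligned}
```

If the gradient stays roughly constant at g, the sum approaches g/(1 − β), so each step grows toward η/(1 − β) · g, which is ten times the vanilla step for β = 0.9. This accumulated velocity is what carries momentum across flat regions faster, and it is also why the parameters keep moving past the minimum after the gradient changes sign, until the opposing gradients shrink and reverse v. Setting β = 0 recovers vanilla SGD.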

1- Loss (cost) function landscape with the initial parameters and the global minimum. Our task is to descend from the red point to the green point.
[Figure: landscape]

2- Vanilla SGD. After 30 epochs, the loss is not yet zero, but it is clearly decreasing toward it.
[Figure: vanilla_SGD]

3- SGD with momentum. With the same number of epochs (30), it passes (overshoots) the global minimum but is able to return, and it converges faster than vanilla SGD.
[Figure: SGD_with_momentum]

4- Vanilla SGD and SGD with momentum on the same loss function landscape.
[Figure: both_SGD]
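To make the comparison concrete without opening the notebook, here is a minimal, dependency-free sketch. Everything in it is an assumption for illustration, not the notebook's code. The loss is a hypothetical separable quadratic with its minimum at (w, b) = (3, 1), not the notebook's landscape. The start point (-2, -3) stands in for the red point, and lr=0.01 and beta=0.9 are illustrative values. The updates use the full gradient, so there is no sampling noise (strictly, this is gradient descent rather than stochastic gradient descent). The momentum variant is the heavy-ball form v ← βv + g, θ ← θ − ηv.

```python
# Hypothetical separable quadratic loss with its global minimum at (w, b) = (3, 1).
# The w direction is flatter than the b direction, so vanilla progress along w is slow.
W_STAR, B_STAR = 3.0, 1.0


def loss(w: float, b: float) -> float:
    return (w - W_STAR) ** 2 + 2.5 * (b - B_STAR) ** 2


def grad(w: float, b: float) -> tuple[float, float]:
    # Partial derivatives of loss() with respect to w and b.
    return 2.0 * (w - W_STAR), 5.0 * (b - B_STAR)


def run(start, lr=0.01, beta=0.0, epochs=30):
    """Full-gradient descent; beta=0.0 is vanilla, beta>0 adds heavy-ball momentum.

    Returns the list of (w, b) points visited, including the start point.
    """
    w, b = start
    vw, vb = 0.0, 0.0  # velocity, initialised to zero
    path = [(w, b)]
    for _ in range(epochs):
        gw, gb = grad(w, b)
        # With beta=0.0 the velocity is just the current gradient (vanilla update).
        vw = beta * vw + gw
        vb = beta * vb + gb
        w -= lr * vw
        b -= lr * vb
        path.append((w, b))
    return path


start = (-2.0, -3.0)  # hypothetical "red point"
vanilla = run(start, beta=0.0)
momentum = run(start, beta=0.9)

print(f"vanilla  final loss: {loss(*vanilla[-1]):.3f}")
print(f"momentum final loss: {loss(*momentum[-1]):.3f}")
# Overshoot check: does the w coordinate ever pass its minimum W_STAR?
print("momentum overshoots w*:", any(w > W_STAR for w, _ in momentum))
print("vanilla overshoots w*:", any(w > W_STAR for w, _ in vanilla))
```

With these settings, momentum ends the same 30 epochs at a noticeably lower loss, and its w coordinate crosses to the far side of the minimum, while vanilla SGD approaches from one side only and still has a clearly nonzero loss. Plotting both paths over a contour of loss() would give a picture similar in spirit to the figures above.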
