Visualizing why SGD with momentum converges faster than Vanilla SGD
🛑 Please make sure to open the notebook in Google Colab for Plotly's graphs to render.
🛑 I will use the terms SGD, vanilla SGD, and plain SGD interchangeably; all of them are distinct from SGD with momentum.
⚡ Visualizing the path and speed of each optimization algorithm (SGD, SGD with momentum) when trained for 30 epochs.
Plotting (plain SGD) and (SGD with momentum) on the loss function landscape gives us an intuition for why SGD with momentum converges faster.
We also aim to show why overshooting with SGD with momentum is not a problem (in fact, it helps us escape local minima).
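Before looking at the plots, it helps to see the two update rules side by side. The sketch below is a toy 1-D example: the quadratic loss, learning rate, momentum coefficient, and starting point are my own illustrative choices, not the landscape or hyperparameters used in the notebook's figures.

```python
# Toy 1-D sketch (not the notebook's landscape): loss L(w) = w^2,
# gradient dL/dw = 2w, global minimum at w = 0.
# lr, beta, and w0 are assumed, illustrative values.

def grad(w):
    return 2.0 * w

def vanilla_sgd(w0=2.0, lr=0.01, epochs=30):
    w = w0
    for _ in range(epochs):
        w -= lr * grad(w)          # step straight down the current gradient
    return w

def sgd_momentum(w0=2.0, lr=0.01, beta=0.9, epochs=30):
    w, v = w0, 0.0
    for _ in range(epochs):
        v = beta * v + grad(w)     # velocity: decaying sum of past gradients
        w -= lr * v                # step along the accumulated velocity
    return w

print(vanilla_sgd())    # still far from the minimum after 30 epochs
print(sgd_momentum())   # much closer to 0 (it even overshoots past 0 and comes back)
```

Because the velocity keeps accumulating gradients that all point the same way, momentum covers the same distance in far fewer epochs; the price is the overshoot past the minimum, which the plots below show is temporary.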
1- Loss (cost) function landscape with the initial parameters and the global minimum. Our task is to descend from the red point to the green point.
2- Vanilla SGD. After 30 epochs the loss is not yet zero, but we can see it is getting there.
3- SGD with momentum. Given the same number of epochs (30), it passes (overshoots) the global minimum but is able to return; it converges faster than vanilla SGD.
4- Vanilla SGD and SGD with momentum on the same loss function landscape.
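The overshoot-and-return behaviour seen above is also what lets momentum escape shallow local minima. Here is a toy 1-D sketch of that claim: the polynomial loss, start point, and hyperparameters are assumed for illustration only, not the landscape plotted in this notebook.

```python
# Toy 1-D loss with a shallow local minimum at w = 1 and a deeper
# global minimum at w = -2, separated by a barrier at w = 0:
#   L(w) = w**4/4 + w**3/3 - w**2,  dL/dw = w**3 + w**2 - 2w
# lr, beta, and w0 are assumed, illustrative values.

def grad(w):
    return w**3 + w**2 - 2.0 * w

def run(use_momentum, w0=2.0, lr=0.05, beta=0.9, epochs=30):
    w, v = w0, 0.0
    for _ in range(epochs):
        if use_momentum:
            v = beta * v + grad(w)   # accumulated velocity carries w past flat spots
            w -= lr * v
        else:
            w -= lr * grad(w)        # plain SGD stops wherever the gradient vanishes
    return w

print(run(use_momentum=False))  # stuck near the local minimum at w = 1
print(run(use_momentum=True))   # carried over the barrier, into the basin of w = -2
```

Vanilla SGD rolls into the first valley it meets and the gradient there goes to zero, so it stays put; momentum arrives with enough velocity to overshoot the shallow valley, climb the barrier, and settle in the deeper one.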