[QUESTION] How to enable ZeRO 2/3 stages ? #1156

Open

polisettyvarma opened this issue Sep 24, 2024 · 9 comments
@polisettyvarma commented Sep 24, 2024

How can the ZeRO 2/3 stages be enabled?
Similar to #589.

@lmcafee-nvidia (Contributor)

I responded to this on #589.

@polisettyvarma (Author) commented Sep 25, 2024

Please convert this issue into a feature request for ZeRO 2/3.
Thank you.

@carolove

I think this article is useful: https://www.deepspeed.ai/tutorials/megatron/.
DeepSpeed ZeRO 1/2 works with the latest Megatron-LM code.
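
For reference, a minimal sketch of that path (plain DeepSpeed as in the linked tutorial, not upstream Megatron-LM). The model, batch size, and learning rate are placeholders, and the script assumes it is started with the `deepspeed` launcher so the distributed environment exists:

```python
# Hypothetical example: enabling ZeRO stage 2 through a DeepSpeed config.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # ZeRO-2: shard optimizer states and gradients
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

model = torch.nn.Linear(1024, 1024)    # stand-in for the real network

# deepspeed.initialize wraps the model in an engine that applies ZeRO-2
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

This is the DeepSpeed/Megatron-DeepSpeed route; upstream Megatron-LM does not expose this config today, which is what the rest of the thread is about.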

@polisettyvarma (Author) commented Sep 26, 2024

@carolove Thanks for the input. I am familiar with the DeepSpeed framework and how it enables all ZeRO stages; my question is about enabling ZeRO natively in this repo.
Can you please share the commits that added ZeRO 2 support to the latest code of this repo?
Thank you.

@carolove

I am also looking for such an example.

@SeunghyunSEO

Megatron-LM now has its own ZeRO-1 (called the distributed optimizer in this project), but if you are more familiar with DeepSpeed, how about using Megatron-DeepSpeed, @polisettyvarma?
And to the best of my knowledge, ZeRO-3 is not compatible with the model parallelism (TP or PP) of Megatron-LM.
ZeRO-3 reduces VRAM usage and improves throughput by partitioning model parameters and broadcasting them when needed, whereas TP and PP partition the model in their own way and instead communicate activations (all-reducing activations in the forward and backward passes).
So TP or PP leaves no room for communicating model parameters the way ZeRO-3 does.
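
As a rough single-process illustration of that distinction (an assumed sketch, not Megatron-LM or DeepSpeed code): ZeRO-3 shards whole parameters across data-parallel ranks and must re-gather them before compute, while TP keeps each weight permanently split and gathers activations instead.

```python
import torch

world_size = 4                      # pretend group size
weight = torch.randn(8, 8)          # one full weight matrix
x = torch.randn(2, 8)               # a batch of activations

# ZeRO-3 style: each data-parallel rank holds only a flat shard of the parameter
# and must all-gather the shards to rebuild the full weight before every use.
shards = weight.flatten().chunk(world_size)
rebuilt = torch.cat(shards).view_as(weight)
assert torch.equal(rebuilt, weight)

# TP style: the weight stays split by columns; each rank computes a slice of the
# output, and it is the activations that get gathered, not the parameters.
tp_shards = weight.chunk(world_size, dim=1)
y = torch.cat([x @ w for w in tp_shards], dim=1)
assert torch.allclose(y, x @ weight)
```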

@polisettyvarma (Author)

Thank you @SeunghyunSEO for your inputs. Yes, the Megatron-DeepSpeed repo can be used, but it is not up to date with Megatron-LM. I agree that ZeRO > 1 is not compatible with PP.
My request here is for a similar ZeRO-like feature in Megatron-LM itself.

@deepakn94 (Collaborator)

We should have PyTorch FSDP support compatible with TP in the next couple of weeks.
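
For anyone who wants to experiment in the meantime, a minimal sketch of stock PyTorch FSDP (not the Megatron-LM integration referenced above; the model sizes are placeholders and the script assumes a `torchrun` launch):

```python
# Run with: torchrun --nproc_per_node=<num_gpus> fsdp_sketch.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()

    # Default FSDP sharding (FULL_SHARD) partitions parameters, gradients, and
    # optimizer state across ranks, roughly the ZeRO-3 memory behavior
    # discussed in this thread.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    model(x).sum().backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```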

polisettyvarma changed the title from "[QUESTION] How to enable ZeRO 1/2/3 stages ?" to "[QUESTION] How to enable ZeRO 2/3 stages ?" on Sep 30, 2024
@polisettyvarma (Author) commented Sep 30, 2024

Thank you @deepakn94 for sharing this information.
