FSDP+PP bug where reshard_after_forward must be true #1105
Comments
This seems like an important / higher-priority issue, since FSDP + PP generally wants reshard_after_forward=False.
I believe in old FSDP, the FSDP API was called on the whole model. I don't know whether the new FSDP still allows the API to be called on the whole model; if it does, can that be investigated so this burden is not on the user? That said, following @awgu's comment, should we just do something like the following?
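A minimal sketch of that idea, assuming the FSDP2 `fully_shard` API (the import path varies by PyTorch version) and a hypothetical toy block stack; this is illustrative, not the actual torchtitan code:

```python
import torch.nn as nn
from torch.distributed.fsdp import fully_shard  # FSDP2 per-module API

# Hypothetical toy stack standing in for a transformer's block list.
# (torch.distributed must be initialized before sharding.)
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(4)])

# Reshard after forward for every block except the last one: the last
# block's backward starts right after its forward, so resharding it only
# buys an extra all-gather in backward.
blocks = list(model)
for i, block in enumerate(blocks):
    fully_shard(block, reshard_after_forward=(i < len(blocks) - 1))
fully_shard(model)  # root wrap; FSDP2 keeps the root unsharded after forward
```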
It is still the same (it cannot be automatically figured out -- only the root module automatically changes to reshard_after_forward=False).
By "corner" case, I refer to this line:
Compared to actively choosing ZeRO-2 or ZeRO-3, I think the user is really saying: I want to use FSDP, but I also want slightly more perf; since the last layer's backward will start immediately after its forward, please don't reshard it.
Said in a different way: if we already know that the last layer's backward starts immediately after its forward, could FSDP set reshard_after_forward=False for that layer automatically?
I see. Since we do not know the execution order in general, we cannot do this easily in the FSDP API itself, which is a building block. Maybe a higher-level API that knows how to call FSDP for some class of models could do it.
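As a sketch of what such a higher-level API might look like (everything here is hypothetical; the assumption is that the caller supplies the blocks in execution order, which FSDP itself cannot know):

```python
from typing import Sequence
import torch.nn as nn
from torch.distributed.fsdp import fully_shard  # FSDP2 per-module API

def shard_in_execution_order(blocks: Sequence[nn.Module]) -> None:
    # Hypothetical helper: the caller promises `blocks` run in this order,
    # so only the final block skips resharding after forward.
    last = len(blocks) - 1
    for i, block in enumerate(blocks):
        fully_shard(block, reshard_after_forward=(i != last))
```

The ordering knowledge lives in the caller rather than in FSDP itself, which is what would make this viable as a layer above the building block.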
https://github.com/pytorch/torchtitan/pull/161/files#diff-80b04fce2b861d9470c6160853441793678ca13904dae2a9b8b7145f29cd017aR269
IIRC @awgu mentioned there was an issue requiring this setting for the time being. Not sure why, or whether it has been fixed yet.