Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Callbacks] Remove EventLifecycle and on_start event #1170

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

kylesayrs
Copy link
Collaborator

@kylesayrs kylesayrs commented Feb 19, 2025

Purpose

  • Simplify code related to callbacks
    • Remove event lifecycle
      • Callback event lifecycle
      • Optimizer event lifecycle
    • Remove the concept of a "start_event", which was originally used because initialization sometimes requires triggering on_start, and on_start requires an event in order to get an index and index-related info
      • For now, we create a dummy event on initialization which has an index of zero
      • In the future, all start events will be triggered by events (never initialize) and all event events will be triggered by events or finalization

Prerequisites

Changes

  • Instead of using event lifecycle as a proxy, pass events to modifiers directly
- self._check_setup_event_lifecycle(event_type)
+ event = Event(event_type=event_type)
- for event in self.event_lifecycle.events_from_type(event_type):
+ for mod in self.modifiers:
     data = mod.update_event(state=self.state, event=event, **kwargs)
  • This was originally needed for optimizer event lifecycle, that is now removed
  • Make silent noops into errors
  if not self.initialized_:
-     return
+     raise RuntimeError("Cannot update an uninitialized modifier")
  • Remove check_initialized, which was used to allow for checking in situations which required double initialization

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
@dsikka
Copy link
Collaborator

dsikka commented Feb 19, 2025

can you combine some of these removal PRs?

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs added the ready When a PR is ready for review label Feb 26, 2025
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs marked this pull request as ready for review February 26, 2025 16:32
@kylesayrs kylesayrs added ready When a PR is ready for review and removed ready When a PR is ready for review labels Feb 26, 2025
Signed-off-by: Kyle Sayers <[email protected]>
Copy link
Collaborator

@markurtz markurtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk through this one in detail as well. As it is, I'm against landing this and removing this functionality.

For example, check_initialized was important so that we surfaced the state of initialization from the modifiers and it was on the modifiers to report whether they had the required information to run rather than the session owning that check. This allows better flexibility for future modifiers in the event we don't plan out everything that needs to be passed in. It additionally makes it so that it's easier for integration so users can skip on passing through things that don't matter for their use case based on the modifiers they want to use.

Further, for the event lifecycles, those are core and do need to remain intact. The goal with those is to simplify the integration of LLM Compressor into external pathways by offering optionality for how LLM Compressor could be integrated based on the desired functionality and enabling anywhere from simple to advanced integrations based on what's needed. These were further setup to then guarantee that with that, the user is running the proper and expected lifecycle and are not missing any calls, integration points, or have a bug in their pipeline code.

From an incomplete external doc, the expected pathways I want to ensure we enable are the following, where we are removing two of these here:

  • Integration Types
    • Callbacks
      • Advantages
      • Disadvantages
      • Uses
        • None
      • Checks
        • Add model loading support
          • Required: check compressed tensors support
          • Required if distillation: Teacher loading
          • Required if staging or checkpointing: Previous recipe loading, if any
        • Add session creation and intialization calls
          • Required: recipe and supporting args, model
          • Required if training: Optimizer
          • Required if distillation: Teacher
          • Required if checkpointing: start and supporting args
          • Required if staging or checkpointing: previous model recipe
        • Add event lifecycle callbacks
          • Required: batch start, batch end
          • Required if training: optim step start, optim step end
          • Required if training with loss update algorithms: loss calculated
          • Required if post training accumulation: micro_batch_start, micro_batch_end (with targeted layers passed in as well)
        • Add session finalization call
          • Returns the finalized modified model for use in the rest of the system
        • Add saving/checkpointing
          • Required: modified model with compressed tensors
          • Required if checkpointing or staging: serialized recipe as recipe.yaml or recipe.md
      • Sample Code
        • Minimal
        • Full
    • Wrapped Model (post training only)
      • Advantages
      • Disadvantages
      • Uses
        • Callbacks
      • Checks
        • Add model loading support
          • Required: check compressed tensors support
          • Required if distillation: Teacher loading
          • Required if staging or checkpointing: Previous recipe loading, if any
        • Add session creation and initialization calls
          • Required: recipe and supporting args, model
          • Required if checkpointing: start and supporting args
          • Required if staging: previous model recipe
        • Add session finalization call
          • Returns the finalized modified model for use in the rest of the system
        • Add saving / checkpointing
          • Required: modified model with compressed tensors
          • Required if checkpointing or staging: serialized recipe as recipe.yaml or recipe.md
      • Sample Code
        • Minimal
        • Full
    • Wrapped Optimizer (training only)
      • Advantages
      • Disadvantages
      • Uses
        • Callbacks
      • Checks
        • Add model loading support
          • Required: check compressed tensors support
          • Required if distillation: Teacher loading
          • Required if staging or checkpointing: Previous recipe loading, if any
        • Add session creation and intialization:
          • Required: recipe and supporting args, model, optimizer
          • Required if distillation: teacher
          • Required if checkpointing: start and supporting args
          • Required if staging or checkpointing: previous model recipe
        • Add event lifecycle callbacks
          • Required if loss update algorithms: loss calculated
        • Add session finalization call
          • Returns the finalized modified model for use in the rest of the system
        • Add saving/checkpointing
          • Required: modified model with compressed tensors
          • Required if checkpointing or staging: serialized recipe as recipe.yaml or recipe.md
      • Sample Code
        • Minimal
        • Full
    • Transformers Trainer Callback
      • Advantages
      • Disadvantages
      • Uses
        • Callbacks
      • Checks
        • Add model loading support
          • Required: check compressed tensors support
          • Required if distillation: Teacher loading
          • Required if staging or checkpointing: Previous recipe loading, if any
        • Add session creation and intialization:
          • Required: recipe and supporting args, model, optimizer
          • Required if distillation: teacher
          • Required if checkpointing: start and supporting args
          • Required if staging or checkpointing: previous model recipe
        • Add CompressionTrainerCallback registration
        • Add event lifecycle callbacks
          • Required if loss update algorithms: loss calculated
        • Add saving/checkpointing
          • Required: modified model with compressed tensors
          • Required if checkpointing or staging: serialized recipe as recipe.yaml or recipe.md
      • Sample Code
        • Minimal
        • Full
    • Transformers Trainer Mixin
      • Advantages
      • Disadvantages
      • Uses
        • Callbacks
        • Transformers Trainer Callback
      • Checks
        • Add model loading support
          • Required: check compressed tensors support
          • Required if distillation: Teacher loading
          • Required if staging or checkpointing: Previous recipe loading, if any
        • Add Transformers Trainer Mixin
          • Required: recipe and supporting args
          • Required if distillation: teacher
          • Required if checkpointing: start and supporting args
          • Required if staging or checkpointing: Previous recipe loading, if any
      • Sample Code
        • Minimal
        • Full

Base automatically changed from kylesayrs/remove-preinitialize-structure to main February 26, 2025 20:37
Copy link
Collaborator

@brian-dellabetta brian-dellabetta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2025-02-27 at 10 51 02 AM

Love it! :D

Copy link
Collaborator

@markurtz markurtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly flagging this as scheduled for rework since it will break some flows based on current implementation so it isn't accidentally landed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants