
Checkpoint Conversion to HuggingFace (GPT2) #305


Merged
merged 31 commits into from
Mar 19, 2025

Conversation

flxst
Member

@flxst flxst commented Feb 21, 2025

What does this PR do?

This PR implements a true conversion of the modalities gpt2 model into a Llama-style Hugging Face model. Note that the tokenizer is not converted.

In addition, the getting started tutorial was refactored and updated. Checkpoint conversion is tested end to end using the checkpoint produced by the getting started tutorial.

Follow-up issues were created: #308 and #309.

General Changes

  • The configs for the getting started tutorial were updated (no bias, layer norm, swiglu, pytorch_flash attn)
  • Code was added under modalities/conversion/gpt2 that converts a modalities gpt2 checkpoint into a format that can be loaded via AutoModelForCausalLM.from_pretrained().
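
For illustration, loading a converted checkpoint could look roughly like the sketch below. It is not taken from the PR: the helper name `load_converted_checkpoint` is hypothetical, and passing `trust_remote_code=True` is an assumption that only applies if the converted checkpoint ships its own modeling code. The `config.json` check relies on the standard layout of a Hugging Face model directory.

```python
from pathlib import Path


def load_converted_checkpoint(checkpoint_dir: str, trust_remote_code: bool = True):
    """Load a converted checkpoint directory with Hugging Face transformers.

    Hypothetical helper for illustration; it is not part of modalities.
    """
    path = Path(checkpoint_dir)
    # A Hugging Face model directory is expected to contain a config.json.
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"No config.json found in {path}")

    # Imported lazily so the check above works without transformers installed.
    from transformers import AutoModelForCausalLM

    # Assumption: trust_remote_code is only needed if the converted
    # checkpoint relies on custom modeling code stored next to the weights.
    return AutoModelForCausalLM.from_pretrained(
        str(path), trust_remote_code=trust_remote_code
    )
```

Since the tokenizer is not part of the conversion, it would have to be loaded separately before running text generation with the returned model.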

Breaking Changes

  • None

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py)
  • I have updated the internal changelog (CHANGELOG_DEV.md)

…dalities gpt2 model into a llama style huggingface model.
@flxst flxst marked this pull request as draft February 21, 2025 07:38
flxst and others added 22 commits February 21, 2025 10:47
TODO: test_converting_gpt2_does_not_change_outputs currently fails.
@flxst flxst self-assigned this Feb 25, 2025
@BlueCrescent BlueCrescent marked this pull request as ready for review February 25, 2025 15:05
@fromm-m fromm-m self-requested a review March 3, 2025 11:07
…have the same setting.

Having different settings is currently not supported by our hf model.
Also, some additional minor fixes and refactorings.
@ajude2s
Collaborator

ajude2s commented Mar 5, 2025

LGTM

Member

@fromm-m fromm-m left a comment


LGTM

@fromm-m fromm-m merged commit adb23a3 into main Mar 19, 2025
3 checks passed
@BlueCrescent BlueCrescent deleted the conversion_modalities_to_huggingface branch June 12, 2025 08:38

4 participants