Add HF integration, better discoverability #469

Merged: 1 commit into state-spaces:main (Jul 16, 2024)

Conversation

NielsRogge (Contributor) commented Jul 15, 2024

Hi @tridao and team,

I wrote a quick PoC to showcase how easy it is to integrate with the 🤗 hub, so that you can automatically load the various Mamba models using from_pretrained (and push them using push_to_hub), track download numbers for your models (similar to models in the Transformers library), and have nice model cards on a per-model basis. It leverages the PyTorchModelHubMixin class, which makes the model inherit these methods.

Yes, this works for any custom PyTorch model; it's not limited to Transformers/Diffusers :)
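
For reference, here is a minimal sketch of how the mixin attaches these methods to any custom nn.Module (the class name and hyperparameters below are purely illustrative and not part of mamba_ssm; this assumes a recent huggingface_hub version that serializes the __init__ kwargs to a config alongside the weights):

import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class MyModel(nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_dim: int = 256, num_layers: int = 2):
        super().__init__()
        # the __init__ kwargs are what gets stored in the config
        self.layers = nn.Sequential(
            *[nn.Linear(hidden_dim, hidden_dim) for _ in range(num_layers)]
        )

    def forward(self, x):
        return self.layers(x)

model = MyModel(hidden_dim=128)
model.save_pretrained("my-model")               # writes weights + config locally
reloaded = MyModel.from_pretrained("my-model")  # rebuilds the model with the saved hyperparameters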

Usage is as follows:

import torch
from mamba_ssm import Mamba2

# illustrative input: (batch, sequence length, model dimension)
batch, length, dim = 2, 64, 256
x = torch.randn(batch, length, dim).to("cuda")

model = Mamba2(
    # This module uses roughly 3 * expand * d_model^2 parameters
    d_model=dim, # Model dimension d_model
    d_state=64,  # SSM state expansion factor, typically 64 or 128
    d_conv=4,    # Local convolution width
    expand=2,    # Block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape

# optionally, one can push a trained model to the hub
model.push_to_hub("state-spaces/mamba2-demo")

# reload
model = Mamba2.from_pretrained("state-spaces/mamba2-demo")

This means people don't need to manually download a checkpoint to their local environment first; it just loads automatically from the hub. All checkpoints could be hosted under the state-spaces organization on the hub, or under a personal user account, if you're interested.

Would you be interested in this integration?

Kind regards,

Niels
ML @ HF

tridao merged commit 7fb78a5 into state-spaces:main on Jul 16, 2024

tridao (Collaborator) commented Jul 16, 2024

This looks very convenient, thanks!

NielsRogge (Contributor, Author) commented:

Thanks for quickly merging my PR! Would you be interested in trying it out?

tridao (Collaborator) commented Jul 16, 2024

Actually, I just realized that maybe this mixin should be part of MambaLMHeadModel (which is the model) instead of Mamba2 (which is a layer within a model)?

class MambaLMHeadModel(nn.Module, GenerationMixin):
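
A rough sketch of what that could look like (just adding the mixin to the class bases; the actual change lives in the follow-up PR referenced below):

from huggingface_hub import PyTorchModelHubMixin

class MambaLMHeadModel(nn.Module, GenerationMixin, PyTorchModelHubMixin):
    ...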

NielsRogge (Contributor, Author) commented:

Thanks, that's right; it will be addressed in #471.
