With the current setup, the same learning rate is applied to the non-gain/bias params of the text and image encoders. It would be nice to have the flexibility to set these separately. For instance, the SigLIP paper gets peak performance with pretrained image encoders by disabling weight decay on the image encoder (though I'm not sure whether that means the trunk, the head, or both). Here's the figure from the paper for reference:

I'm not sure what the best mechanism to accommodate the various use cases would be. One more useful fine-tuning setup I can imagine is setting differential learning rates for different parts of the network.
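For what it's worth, here's a minimal sketch of how the optimizer param groups could be split. It assumes the image tower's parameters live under a `visual.` prefix (everything else is treated as the text side) and uses a simple ndim/name heuristic for gain/bias params; the function name, arguments, and hyperparameter values are hypothetical, just to illustrate the shape of the API:

```python
import torch


def build_param_groups(model, lr_image, lr_text, wd_image, wd_text):
    """Split params into image/text groups, with gain/bias params excluded from weight decay."""

    def is_gain_or_bias(name, param):
        # Heuristic: 1-D params (biases, norm scales) are treated as gain/bias.
        return param.ndim < 2 or name.endswith(".bias")

    groups = {
        "image_decay": [], "image_no_decay": [],
        "text_decay": [], "text_no_decay": [],
    }
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Assumption: the image encoder's params are prefixed with "visual."
        tower = "image" if name.startswith("visual.") else "text"
        suffix = "_no_decay" if is_gain_or_bias(name, param) else "_decay"
        groups[tower + suffix].append(param)

    return [
        {"params": groups["image_decay"], "lr": lr_image, "weight_decay": wd_image},
        {"params": groups["image_no_decay"], "lr": lr_image, "weight_decay": 0.0},
        {"params": groups["text_decay"], "lr": lr_text, "weight_decay": wd_text},
        {"params": groups["text_no_decay"], "lr": lr_text, "weight_decay": 0.0},
    ]


# Hypothetical usage, e.g. frozen-ish image tower (lower LR, no weight decay):
# optimizer = torch.optim.AdamW(
#     build_param_groups(model, lr_image=1e-5, lr_text=1e-4, wd_image=0.0, wd_text=0.2),
#     betas=(0.9, 0.98), eps=1e-6,
# )
```

Whether this is best exposed as separate CLI flags per tower or a more general per-group config is the open question above.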