
Should SigLIP memory consumption increase as we scale the number of GPUs? #132

Open
khalidsaifullaah opened this issue Sep 30, 2024 · 0 comments


From the SigLIP paper, my understanding is that the loss doesn't require any all_gather and always performs a local b x b computation iteratively, where b is the micro_batch_size (see this section from the paper).
[image: screenshot of the referenced section of the SigLIP paper]

So if I can fit, say, a micro_batch_size of 10 on 8 GPUs, and then increase the number of GPUs to 16, 32, 64, 128, ..., my per-GPU memory consumption should remain more or less the same (just like normal DDP). Put simply, we should in theory be able to scale world_batch_size or the number of nodes while keeping micro_batch_size constant, right?
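To make that expectation concrete, here is a rough back-of-the-envelope sketch. It counts only the entries of the largest image-text logits block a single device holds at one time, and compares a chunked b x b scheme with a variant that gathers all text features first. The function name is hypothetical, and the count deliberately ignores model parameters, encoder activations, and communication buffers, so it is an illustration of the scaling argument and not a memory profile of any real implementation.

```python
def logits_elements_per_device(micro_batch_size: int, world_size: int, chunked: bool) -> int:
    """Entries in the largest logits block one device holds at a time (hypothetical helper)."""
    b = micro_batch_size
    if chunked:
        # Chunked scheme: local images vs. one text chunk at a time -> b x b.
        return b * b
    # Gathered scheme (assumption): local images vs. all texts -> b x (b * world_size).
    return b * (b * world_size)


for world_size in (8, 16, 32, 64, 128):
    chunked = logits_elements_per_device(10, world_size, chunked=True)
    gathered = logits_elements_per_device(10, world_size, chunked=False)
    print(f"world_size={world_size}: chunked={chunked}, gathered={gathered}")
```

Under these assumptions the chunked count is independent of the number of devices, while the gathered count grows linearly with it.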

But what I've observed is that memory consumption spikes as I increase world_batch_size (the number of nodes), and I need to lower my micro_batch_size (to as low as 2 for 128 devices).

  1. Is my understanding of SigLIP correct, i.e. that keeping micro_batch_size constant lets you scale world_batch_size or the number of devices?
  2. Could it be that the SigLIP GPU implementation I'm using doesn't work quite like the official TPU one? For example, it might not free memory when swapping chunks with neighbors, which would explain why the consumption accumulates.
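As a reference point for the two questions above, here is a minimal single-process NumPy simulation of the chunked scheme as I understand it: each simulated device keeps its image chunk, computes a b x b loss block against the text chunk it currently holds, then passes that text chunk around a ring. All function names are hypothetical, the normalization by global batch size is an assumption, and this is neither the GPU implementation in question nor the official TPU one. It only checks that the chunked sum matches the full-batch sigmoid loss while never building more than a b x b logits block per device.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)


def block_loss(img, txt, t, bias, positives_on_diagonal):
    # Only a b x b logits block is materialized here.
    logits = t * img @ txt.T + bias
    labels = -np.ones_like(logits)
    if positives_on_diagonal:
        np.fill_diagonal(labels, 1.0)
    return -log_sigmoid(labels * logits).sum()


def chunked_loss(img_shards, txt_shards, t, bias):
    world_size = len(img_shards)
    txt_held = list(txt_shards)  # txt_held[d] is the text chunk device d holds now
    total = 0.0
    for step in range(world_size):
        for d in range(world_size):
            # Matching pairs exist only while a device holds its own text chunk.
            total += block_loss(img_shards[d], txt_held[d], t, bias,
                                positives_on_diagonal=(step == 0))
        # Ring swap: device d receives the chunk from device d + 1;
        # the reference to the previously held chunk is dropped.
        txt_held = txt_held[1:] + txt_held[:1]
    n = sum(len(s) for s in img_shards)
    return total / n


def full_loss(img, txt, t, bias):
    # Reference: the full global-batch logits matrix built at once.
    logits = t * img @ txt.T + bias
    labels = 2.0 * np.eye(len(img)) - 1.0
    return -log_sigmoid(labels * logits).sum() / len(img)


rng = np.random.default_rng(0)
world_size, b, dim = 4, 3, 5
img = rng.normal(size=(world_size * b, dim))
txt = rng.normal(size=(world_size * b, dim))
chunked = chunked_loss(np.split(img, world_size), np.split(txt, world_size), 10.0, -10.0)
print(np.isclose(chunked, full_loss(img, txt, 10.0, -10.0)))
```

The simulation says nothing about what an autograd framework or a communication library keeps alive between swaps, which is exactly the part of question 2 that would need profiling on the real implementation.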

I could be totally wrong on both of these assumptions, so I'd be glad if the authors could provide some insight so I can validate my hypothesis about horizontal scaling of SigLIP.
