
Is it possible to run CV-VAE on multiple GPUs, e.g. using something like accelerate to do device_map? #9

Open
radna0 opened this issue Jul 12, 2024 · 6 comments

Comments


radna0 commented Jul 12, 2024

Are there ways to reduce the amount of VRAM consumption? For example, the Open-Sora-Plan team reduced the number of CausalConv3D layers in the encoder. From the paper, it seems like batch processing isn't possible because the video is encoded all at once. Is your team working on ways to mitigate this problem?

[Image: excerpt from the Open-Sora-Plan Technical Report v1.1]


radna0 commented Jul 12, 2024

Is it possible to quantize a VAE?


radna0 commented Jul 12, 2024

If your team is training the z=16 channel VAE, how are you solving memory problems? @sijeh


sijeh commented Jul 15, 2024

Are there ways to reduce the amount of VRAM consumption? For example, the Open-Sora-Plan team reduced the number of CausalConv3D layers in the encoder. From the paper, it seems like batch processing isn't possible because the video is encoded all at once. Is your team working on ways to mitigate this problem?

[Image: excerpt from the Open-Sora-Plan Technical Report v1.1]

You can save GPU memory by modifying en_de_n_frames_a_time and tile_spatial_size. During encoding, the video is split into blocks of approximately tile_spatial_size x tile_spatial_size x en_de_n_frames_a_time for inference, and the results are then merged. Adjusting these parameters allows you to process videos of any resolution and length within a limited GPU memory budget: the smaller the block, the less GPU memory is required. There is no need to encode/decode video with more than one GPU.
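To illustrate the idea, here is a minimal sketch of spatio-temporal tiled encoding with a generic PyTorch VAE whose `encode` returns a latent tensor. The function and its defaults are hypothetical and only mimic what `en_de_n_frames_a_time` and `tile_spatial_size` control inside CV-VAE; real tiled VAE implementations typically also overlap and blend adjacent tiles to avoid seams, which this sketch omits.

```python
# Hypothetical sketch of tiled 3D encoding: split the video into blocks of
# roughly tile_spatial_size x tile_spatial_size x n_frames_a_time, encode
# each block separately, then merge. Smaller blocks -> lower peak VRAM.
import torch

def tiled_encode(vae, video, tile_spatial_size=256, n_frames_a_time=16):
    """Encode a (B, C, T, H, W) video tensor block by block to bound peak memory."""
    _, _, T, H, W = video.shape
    time_chunks = []
    for t0 in range(0, T, n_frames_a_time):                # temporal chunks
        rows = []
        for h0 in range(0, H, tile_spatial_size):          # spatial tiles (rows)
            cols = []
            for w0 in range(0, W, tile_spatial_size):      # spatial tiles (cols)
                block = video[:, :, t0:t0 + n_frames_a_time,
                              h0:h0 + tile_spatial_size,
                              w0:w0 + tile_spatial_size]
                with torch.no_grad():
                    cols.append(vae.encode(block))         # assumed to return a latent tensor
            rows.append(torch.cat(cols, dim=-1))           # merge along latent width
        time_chunks.append(torch.cat(rows, dim=-2))        # merge along latent height
    return torch.cat(time_chunks, dim=2)                   # merge along latent time
```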


sijeh commented Jul 15, 2024

Is it possible to quantize a VAE?

We did not quantize the CV-VAE because tiled encoding and decoding, combined with fp16 inference, is sufficient to keep GPU memory usage within limits.
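For reference, a minimal sketch of half-precision inference with a generic PyTorch module; the `vae.encode` call is a placeholder and not necessarily CV-VAE's exact API.

```python
import torch

def encode_fp16(vae: torch.nn.Module, video: torch.Tensor) -> torch.Tensor:
    """Encode in fp16 to roughly halve weight and activation memory vs. fp32."""
    vae = vae.half().to("cuda").eval()   # cast weights to float16
    video = video.half().to("cuda")      # inputs must match the weight dtype
    with torch.no_grad():                # no autograd buffers during inference
        return vae.encode(video)         # placeholder encode call
```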


sijeh commented Jul 15, 2024

If your team is training the z=16 channel VAE, how are you solving memory problems? @sijeh

The number of parameters in the CV-VAE with z=16 is roughly the same as in the model with z=4, so we don't need to solve any additional memory problem.


radna0 commented Jul 15, 2024

Are there ways to reduce the amount of VRAM consumption? For example, the Open-Sora-Plan team reduced the number of CausalConv3D layers in the encoder. From the paper, it seems like batch processing isn't possible because the video is encoded all at once. Is your team working on ways to mitigate this problem?
[Image: excerpt from the Open-Sora-Plan Technical Report v1.1]

You can save GPU memory by modifying en_de_n_frames_a_time and tile_spatial_size. During encoding, the video is split into blocks of approximately tile_spatial_size x tile_spatial_size x en_de_n_frames_a_time for inference, and the results are then merged. Adjusting these parameters allows you to process videos of any resolution and length within a limited GPU memory budget: the smaller the block, the less GPU memory is required. There is no need to encode/decode video with more than one GPU.

So because it processes the video in blocks, videos of any resolution and length can still be handled, as long as each block fits within the GPU memory limit?
