Using a batch size of the default value of 2048 may use an excessive amount of GPU memory; may I change the batch size? #2244
Unanswered · cooleasyhan asked this question in Q&A
Replies: 1 comment
-
2048 is the logical batch size, while the default physical batch size is 512, which is the actual number affecting VRAM usage. Starting from version 0.12, we will include llama-server (from llama.cpp) in the Tabby distribution package. Consequently, you will be able to launch llama-server with any configuration that suits your needs and connect Tabby to it via the HTTP interface.
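As a rough sketch of what that workflow could look like once llama-server ships in the distribution: llama.cpp's llama-server exposes `-b/--batch-size` (logical) and `-ub/--ubatch-size` (physical) flags, so both values from the reply can be tuned directly. The model path, port, and chosen values below are illustrative, not taken from the thread:

```shell
# Launch llama-server with a smaller physical batch to cap VRAM usage
# (model path and numbers are placeholders; adjust to your setup).
llama-server \
  -m ./models/my-model.gguf \
  --port 8080 \
  -c 4096 \            # context length
  -b 2048 \            # logical batch size
  -ub 128              # physical batch size (the value that drives VRAM)
```

Tabby would then be pointed at `http://localhost:8080` through its HTTP model configuration rather than loading the model itself.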
-
Using the default batch size of 2048 may use an excessive amount of GPU memory, because:

GPU memory (KV cache) size = 2 * 2 * head_dim * n_heads * n_layers * max_context_length * batch_size
Moreover, the concurrency is not very high; for example, the concurrency requirement may be only 30.
May I change the batch size?
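To make the concern concrete, here is a minimal sketch that evaluates the formula from the question. The two leading factors are the K and V tensors and 2 bytes per fp16 value; the model dimensions below are hypothetical 7B-class numbers chosen for illustration, not figures from this thread:

```python
def kv_cache_bytes(head_dim, n_heads, n_layers, max_context_length,
                   batch_size, bytes_per_value=2):
    """KV-cache size per the question's formula:
    2 (K and V) * bytes_per_value (fp16) * head_dim * n_heads
    * n_layers * max_context_length * batch_size."""
    return (2 * bytes_per_value * head_dim * n_heads * n_layers
            * max_context_length * batch_size)

# Hypothetical 7B-class model: 32 layers, 32 heads of dim 128, 4096 context.
per_seq = kv_cache_bytes(head_dim=128, n_heads=32, n_layers=32,
                         max_context_length=4096, batch_size=1)
print(f"{per_seq / 2**30:.0f} GiB per sequence")  # 2 GiB
```

Scaling `batch_size` to 30 concurrent sequences multiplies that figure accordingly, which is why a batch sized for the actual concurrency requirement, rather than the default, can save substantial VRAM.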