I'm using llama3 with a single NVIDIA V100 GPU (32 GiB memory). When I increase the batch size from 1 to 8, the inference throughput does not increase; it decreases. However, when I set the batch size to 16, throughput increases.
The `max_seq_len` in my config is 512, and I have not modified any other model code.
Is this related to the CUDA core or Tensor Core configuration of the NVIDIA GPU?
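To rule out measurement artifacts before blaming the hardware, it helps to time each batch size the same way. Below is a minimal, framework-agnostic throughput sketch; `run_batch` is a hypothetical placeholder for one forward/decode step (the issue doesn't show the actual inference call, so substitute your real llama3 invocation, and call `torch.cuda.synchronize()` inside it when timing GPU work):

```python
import time

def measure_throughput(run_batch, batch_size, max_seq_len=512, iters=10):
    """Return tokens/second for a given batch size.

    run_batch(batch_size, max_seq_len) is a placeholder for one
    inference step of the model (hypothetical -- substitute your
    actual llama3 call, synchronizing the GPU before/after timing).
    """
    # Warm-up run to exclude one-time costs (allocator, kernel autotuning).
    run_batch(batch_size, max_seq_len)
    start = time.perf_counter()
    for _ in range(iters):
        run_batch(batch_size, max_seq_len)
    elapsed = time.perf_counter() - start
    tokens = iters * batch_size * max_seq_len
    return tokens / elapsed

def dummy_run(batch_size, seq_len):
    # Dummy workload standing in for the model; replace with real inference.
    _ = [0] * (batch_size * seq_len)

for bs in (1, 8, 16):
    print(f"batch_size={bs}: {measure_throughput(dummy_run, bs):.0f} tokens/s")
```

If throughput measured this way still dips at batch size 8, the cause is more likely kernel/shape effects (e.g., how well the resulting GEMM dimensions map onto the V100's Tensor Cores) than a config error.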