I'm using llama3 with a single NVIDIA V100 GPU (32 GiB memory). When I increase the batch size from 1 to 8, the inference throughput does not increase; it decreases. However, when I set the batch size to 16, throughput increases.
The `max_seq_len` in my config is 512, and I have not modified any other model code.
Is this related to the CUDA core or Tensor Core configuration of the NVIDIA GPU?
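To rule out measurement artifacts before blaming the hardware, it helps to time each batch size the same way. Below is a minimal, framework-agnostic throughput sketch; `run_batch` is a hypothetical placeholder for one forward/decode step (the issue doesn't show the actual inference call, so substitute your real llama3 invocation, and call `torch.cuda.synchronize()` inside it when timing GPU work):

```python
import time

def measure_throughput(run_batch, batch_size, max_seq_len=512, iters=10):
    """Return tokens/second for a given batch size.

    run_batch(batch_size, max_seq_len) is a placeholder for one
    inference step of the model (hypothetical -- substitute your
    actual llama3 call, synchronizing the GPU before/after timing).
    """
    # Warm-up run to exclude one-time costs (allocator, kernel autotuning).
    run_batch(batch_size, max_seq_len)
    start = time.perf_counter()
    for _ in range(iters):
        run_batch(batch_size, max_seq_len)
    elapsed = time.perf_counter() - start
    tokens = iters * batch_size * max_seq_len
    return tokens / elapsed

def dummy_run(batch_size, seq_len):
    # Dummy workload standing in for the model; replace with real inference.
    _ = [0] * (batch_size * seq_len)

for bs in (1, 8, 16):
    print(f"batch_size={bs}: {measure_throughput(dummy_run, bs):.0f} tokens/s")
```

If throughput measured this way still dips at batch size 8, the cause is more likely kernel/shape effects (e.g., how well the resulting GEMM dimensions map onto the V100's Tensor Cores) than a config error.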