llama3_3_70b_instruct_q40
2 x Mac Mini M4 Pro via Thunderbolt 5
sudo nice -n -20 ./dllama inference --prompt "Hello world" --model models/llama3_3_70b_instruct_q40/dllama_model_llama3_3_70b_instruct_q40.m --tokenizer models/llama3_3_70b_instruct_q40/dllama_tokenizer_llama3_3_70b_instruct_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --steps 64 --workers 192.168.0.121:9999
...
🔶 G 272 ms I 260 ms T 10 ms S 1392 kB R 1610 kB elements
🔶 G 271 ms I 257 ms T 13 ms S 1392 kB R 1610 kB ,
🔶 G 272 ms I 255 ms T 16 ms S 1392 kB R 1610 kB and
🔶 G 272 ms I 253 ms T 19 ms S 1392 kB R 1610 kB they
🔶 G 274 ms I 260 ms T 12 ms S 1392 kB R 1610 kB can
🔶 G 272 ms I 259 ms T 13 ms S 1392 kB R 1610 kB also
🔶 G 274 ms I 260 ms T 13 ms S 1392 kB R 1610 kB contain
🔶 G 272 ms I 258 ms T 14 ms S 1392 kB R 1610 kB text
🔶 G 277 ms I 263 ms T 14 ms S 1392 kB R 1610 kB content
Generated tokens: 64
Avg tokens / second: 3.31
Avg generation time: 302.30 ms
Avg inference time: 285.05 ms
Avg transfer time: 16.28 ms
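The reported averages are internally consistent: throughput is just the reciprocal of the average per-token generation time, and generation time (G) is roughly inference (I) plus transfer (T). A quick sanity check, with the values copied from the run above:

```python
# Values from the 2 x Mac Mini M4 Pro (Thunderbolt 5) run above.
avg_generation_ms = 302.30
avg_inference_ms = 285.05
avg_transfer_ms = 16.28

# Throughput is the reciprocal of the per-token generation time.
tok_per_s = 1000.0 / avg_generation_ms
print(f"{tok_per_s:.2f} tok/s")  # 3.31, matching the reported average

# G is slightly larger than I + T; the small remainder is presumably
# per-token work outside inference and transfer (sampling, detokenization).
print(f"{avg_generation_ms - (avg_inference_ms + avg_transfer_ms):.2f} ms")
```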
2 x Mac Mini M4 Pro via 10G Ethernet
sudo nice -n -20 ./dllama inference --prompt "Hello world" --model models/llama3_3_70b_instruct_q40/dllama_model_llama3_3_70b_instruct_q40.m --tokenizer models/llama3_3_70b_instruct_q40/dllama_tokenizer_llama3_3_70b_instruct_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --steps 64 --workers <deleted>:9999
...
🔶 G 303 ms I 279 ms T 24 ms S 1392 kB R 1610 kB font
🔶 G 305 ms I 275 ms T 28 ms S 1392 kB R 1610 kB -family
🔶 G 304 ms I 282 ms T 21 ms S 1392 kB R 1610 kB :
🔶 G 307 ms I 279 ms T 28 ms S 1392 kB R 1610 kB Arial
🔶 G 305 ms I 281 ms T 23 ms S 1392 kB R 1610 kB ,
🔶 G 306 ms I 281 ms T 24 ms S 1392 kB R 1610 kB sans
🔶 G 305 ms I 278 ms T 26 ms S 1392 kB R 1610 kB -serif
🔶 G 306 ms I 281 ms T 24 ms S 1392 kB R 1610 kB ;
Generated tokens: 64
Avg tokens / second: 3.25
Avg generation time: 307.81 ms
Avg inference time: 280.92 ms
Avg transfer time: 26.12 ms
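A rough way to read the S/R/T columns: assuming S and R are the kilobytes the root node sends and receives per generated token, and T is the per-token transfer time, the effective link throughput during transfer is (S + R) / T. A back-of-envelope sketch for the two 2-node 70B runs above:

```python
# Back-of-envelope effective throughput at the root node, assuming the
# S/R columns are kB sent/received per token and T is per-token transfer time.
s_kb, r_kb = 1392, 1610  # per-token traffic for the 2-node 70B runs above

for link, t_ms in [("Thunderbolt 5", 16.28), ("10G Ethernet", 26.12)]:
    mb_per_s = (s_kb + r_kb) / t_ms  # kB/ms is numerically MB/s
    print(f"{link}: ~{mb_per_s:.0f} MB/s (~{mb_per_s * 8 / 1000:.2f} Gbit/s)")
```

Both figures are far below the raw capacity of either link, which suggests the transfer time is dominated by per-layer synchronization latency (many small messages) rather than bandwidth alone.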
4 x Mac Mini M4 Pro via Thunderbolt 5
sudo nice -n -20 ./dllama inference --prompt "Hello world" --model models/llama3_3_70b_instruct_q40/dllama_model_llama3_3_70b_instruct_q40.m --tokenizer models/llama3_3_70b_instruct_q40/dllama_tokenizer_llama3_3_70b_instruct_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --steps 64 --workers 192.168.0.121:9999 192.168.0.122:9999 192.168.0.141:9999
...
🔶 G 151 ms I 119 ms T 32 ms S 4176 kB R 4455 kB The
🔶 G 152 ms I 113 ms T 39 ms S 4176 kB R 4455 kB order
🔶 G 153 ms I 121 ms T 32 ms S 4176 kB R 4455 kB of
🔶 G 153 ms I 115 ms T 38 ms S 4176 kB R 4455 kB the
Generated tokens: 64
Avg tokens / second: 6.51
Avg generation time: 153.56 ms
Avg inference time: 116.64 ms
Avg transfer time: 36.62 ms
4 x Mac Mini M4 Pro via 10G Ethernet
sudo nice -n -20 ./dllama inference --prompt "Hello world" --model models/llama3_3_70b_instruct_q40/dllama_model_llama3_3_70b_instruct_q40.m --tokenizer models/llama3_3_70b_instruct_q40/dllama_tokenizer_llama3_3_70b_instruct_q40.t --buffer-float-type q80 --nthreads 8 --max-seq-len 4096 --steps 64 --workers <redacted>:9999 <redacted>:9999 <redacted>:9999
...
🔶 G 164 ms I 125 ms T 39 ms S 4176 kB R 4455 kB of
🔶 G 165 ms I 140 ms T 24 ms S 4176 kB R 4455 kB a
🔶 G 165 ms I 129 ms T 36 ms S 4176 kB R 4455 kB place
🔶 G 165 ms I 129 ms T 35 ms S 4176 kB R 4455 kB you
🔶 G 164 ms I 134 ms T 29 ms S 4176 kB R 4455 kB 've
🔶 G 170 ms I 120 ms T 50 ms S 4176 kB R 4455 kB visited
Generated tokens: 64
Avg tokens / second: 6.04
Avg generation time: 165.55 ms
Avg inference time: 130.47 ms
Avg transfer time: 34.41 ms
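Putting the four 70B runs together: going from 2 to 4 nodes nearly doubles throughput on both interconnects, with Thunderbolt 5 scaling slightly better. The speedups follow directly from the averages above:

```python
# Reported averages for llama3_3_70b_instruct_q40 (tok/s), copied from above.
results = {
    ("Thunderbolt 5", 2): 3.31,
    ("10G Ethernet", 2): 3.25,
    ("Thunderbolt 5", 4): 6.51,
    ("10G Ethernet", 4): 6.04,
}

for link in ("Thunderbolt 5", "10G Ethernet"):
    speedup = results[(link, 4)] / results[(link, 2)]
    print(f"{link}: 2 -> 4 nodes = {speedup:.2f}x")  # 1.97x and 1.86x
```

Near-linear scaling here makes sense: the 70B model's per-token compute is large relative to the communication cost, so adding nodes mostly adds useful work.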
llama3_1_8b_instruct_q40
| Network | 1 x Mac Mini M4 Pro 24GB RAM | 2 x Mac Mini M4 Pro 24GB RAM | 4 x Mac Mini M4 Pro 24GB RAM |
|---|---|---|---|
| Thunderbolt 5 | 16.91 tok/s | 30.27 tok/s (1.79x faster 🔥) | 38.83 tok/s |
| 10G Ethernet | 16.91 tok/s | 24.54 tok/s | 34.61 tok/s |
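The table's speedups relative to a single machine can be derived directly from the tok/s figures (the 2-node Thunderbolt 5 result reproduces the "1.79x faster" noted above):

```python
# tok/s figures for llama3_1_8b_instruct_q40 from the table above.
single = 16.91
clusters = {
    "2 x via Thunderbolt 5": 30.27,
    "2 x via 10G Ethernet": 24.54,
    "4 x via Thunderbolt 5": 38.83,
    "4 x via 10G Ethernet": 34.61,
}

for config, tok_s in clusters.items():
    print(f"{config}: {tok_s / single:.2f}x faster than 1 machine")
```

Unlike the 70B model, the 8B model scales sublinearly (2.30x on 4 nodes over Thunderbolt 5, 2.05x over 10G Ethernet): per-token compute is small, so communication takes up a larger share of each token.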
1 x Mac Mini M4 Pro
./dllama inference --prompt "Hello world" --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t --buffer-float-type q80 --steps 64
...
🔶 G 52 ms I 51 ms T 0 ms S 0 kB R 0 kB amazing
🔶 G 51 ms I 51 ms T 0 ms S 0 kB R 0 kB clients
🔶 G 52 ms I 52 ms T 0 ms S 0 kB R 0 kB and
Generated tokens: 64
Avg tokens / second: 16.91
Avg generation time: 59.14 ms
Avg inference time: 58.70 ms
Avg transfer time: 0.06 ms
2 x Mac Mini M4 Pro via Thunderbolt 5
./dllama inference --prompt "Hello world" --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t --buffer-float-type q80 --steps 64 --nthreads 8 --workers 192.168.0.121:9999
...
🔶 G 30 ms I 24 ms T 6 ms S 288 kB R 522 kB in
🔶 G 32 ms I 25 ms T 6 ms S 288 kB R 522 kB the
🔶 G 31 ms I 22 ms T 9 ms S 288 kB R 522 kB kitchen
Generated tokens: 64
Avg tokens / second: 30.27
Avg generation time: 33.03 ms
Avg inference time: 25.81 ms
Avg transfer time: 6.94 ms
2 x Mac Mini M4 Pro via 10G Ethernet
./dllama inference --prompt "Hello world" --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t --buffer-float-type q80 --steps 64 --nthreads 8 --workers <redacted>:9999
...
🔶 G 38 ms I 28 ms T 10 ms S 288 kB R 522 kB with
🔶 G 40 ms I 26 ms T 14 ms S 288 kB R 522 kB food
🔶 G 40 ms I 28 ms T 11 ms S 288 kB R 522 kB ,
🔶 G 39 ms I 28 ms T 11 ms S 288 kB R 522 kB family
Generated tokens: 64
Avg tokens / second: 24.54
Avg generation time: 40.75 ms
Avg inference time: 28.00 ms
Avg transfer time: 12.50 ms
4 x Mac Mini M4 Pro via Thunderbolt 5
./dllama inference --prompt "Hello world" --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t --buffer-float-type q80 --steps 64 --nthreads 8 --workers 192.168.0.121:9999 192.168.0.122:9999 192.168.0.141:9999
...
🔶 G 26 ms I 23 ms T 3 ms S 864 kB R 1191 kB Pacific
🔶 G 25 ms I 19 ms T 6 ms S 864 kB R 1191 kB Northwest
🔶 G 26 ms I 20 ms T 6 ms S 864 kB R 1191 kB .
🔶 G 26 ms I 23 ms T 2 ms S 864 kB R 1191 kB I
Generated tokens: 64
Avg tokens / second: 38.83
Avg generation time: 25.75 ms
Avg inference time: 20.73 ms
Avg transfer time: 4.61 ms
4 x Mac Mini M4 Pro via 10G Ethernet
./dllama inference --prompt "Hello world" --model models/llama3_1_8b_instruct_q40/dllama_model_llama3_1_8b_instruct_q40.m --tokenizer models/llama3_1_8b_instruct_q40/dllama_tokenizer_llama3_1_8b_instruct_q40.t --buffer-float-type q80 --steps 64 --nthreads 8 --workers <redacted>:9999 <redacted>:9999 <redacted>:9999
...
🔶 G 29 ms I 14 ms T 15 ms S 864 kB R 1191 kB and
🔶 G 28 ms I 14 ms T 14 ms S 864 kB R 1191 kB data
🔶 G 29 ms I 15 ms T 14 ms S 864 kB R 1191 kB visualization
🔶 G 29 ms I 15 ms T 13 ms S 864 kB R 1191 kB .
🔶 G 29 ms I 14 ms T 15 ms S 864 kB R 1191 kB I
🔶 G 29 ms I 13 ms T 16 ms S 864 kB R 1191 kB have
Generated tokens: 64
Avg tokens / second: 34.61
Avg generation time: 28.89 ms
Avg inference time: 14.34 ms
Avg transfer time: 14.25 ms
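This is why the interconnect matters more for the 8B model: transfer time is a much larger share of each token's time budget than it was for 70B. Computing that share from the per-run averages above:

```python
# (avg transfer ms, avg generation ms) for the llama3_1_8b_instruct_q40 runs above.
runs = {
    "2 x Thunderbolt 5": (6.94, 33.03),
    "2 x 10G Ethernet": (12.50, 40.75),
    "4 x Thunderbolt 5": (4.61, 25.75),
    "4 x 10G Ethernet": (14.25, 28.89),
}

for config, (t_ms, g_ms) in runs.items():
    print(f"{config}: transfer is {100 * t_ms / g_ms:.0f}% of per-token time")
```

On 4 nodes over 10G Ethernet, nearly half of every token's wall time is transfer (49%, versus 18% over Thunderbolt 5), which explains why the Thunderbolt 5 advantage grows with cluster size on the small model.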
This performance test was made possible thanks to MacWeb.com ❤️, which offers on-demand access to Macs in the cloud.
Distributed Llama version: 0.11.1 (CPU only)