a bit slow on my mbp 16 m1 #3
I believe it's because this is the unquantized version; if you compress it you will get better perf
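To put a rough number on that: a back-of-the-envelope sketch (the 7B parameter count comes from the model name; real files also carry embeddings, metadata, and runtime buffers, so treat these as lower bounds) of why unquantized float16 weights alone nearly fill a 16 GB machine, while 4-bit weights would not:

```python
# Rough memory-footprint estimate for a 7B-parameter model.
PARAMS = 7_000_000_000

def weight_gb(bits_per_param: int) -> float:
    """Size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"float16 weights: {weight_gb(16):.1f} GB")  # 14.0 GB
print(f"4-bit weights:   {weight_gb(4):.1f} GB")   # 3.5 GB
```

14 GB of weights on a 16 GB machine leaves almost nothing for the OS and other apps, which would be consistent with swapping under memory pressure.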
Hi @cpietsch! It sounds to me as if the model was running on CPU only. Could you maybe try to run it again with the "GPU History" window of Activity Monitor open at the same time? It should show very clear GPU activity if it's in use. Also, what computer are you using?
Hi @pcuenca, I am running it on an Apple M1 Pro with 16 GB and macOS 13.4.1
Interesting! Remember that Activity Monitor does not show Neural Engine activity; perhaps https://github.com/tlkh/asitop could provide more insight.
My suspicion is that the computer is swapping because of memory pressure.
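For reference, both points can be checked from a terminal on macOS (the `sysctl`/`vm_stat` commands ship with the OS; `asitop` is the third-party tool linked above and its install/run commands follow its README):

```shell
# Check current swap usage:
sysctl vm.swapusage      # total / used / free swap
vm_stat | grep -i swap   # Swapins / Swapouts counters

# Per-unit (CPU / GPU / Neural Engine) utilization with asitop;
# it reads powermetrics, so it will prompt for your sudo password:
pip install asitop
asitop
```

Nonzero and growing swap usage while the model runs would support the memory-pressure hypothesis.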
maybe we need to convert the model ourselves. but 0.39 t/s is not that bad...
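For completeness, a hedged sketch of what converting/compressing the model ourselves could look like using coremltools' weight-compression utilities (API names are from coremltools 7.x; the file paths and the choice of 8-bit linear quantization are illustrative assumptions, not what the repo actually ships):

```python
# Sketch: 8-bit linear weight quantization of a Core ML model
# with coremltools 7.x. Paths are placeholders.
import coremltools as ct
import coremltools.optimize.coreml as cto

mlmodel = ct.models.MLModel("Llama-2-7b-chat.mlpackage")

config = cto.OptimizationConfig(
    global_config=cto.OpLinearQuantizerConfig(mode="linear_symmetric")
)
compressed = cto.linear_quantize_weights(mlmodel, config)
compressed.save("Llama-2-7b-chat-quantized.mlpackage")
```

Smaller weights reduce memory pressure, which matters more here than raw compute.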
Whelp, just closing all other apps, restarting, and running the SwiftChat build without Xcode has resulted in 4.96 tokens/s. Woohoo!
so @pcuenca was right about the memory pressure
Here are some profiling images which show a low workload:
I have the same problem. One thing I don't understand is that I was able to get a fast response using ollama.
Nice, ollama worked for me too right out of the box.
I am experiencing the same issue on MBP 16 inch. Do you have any updates?
I am encountering a similar issue while utilizing a MacBook M2 with 32GB of RAM. It appears that the system may be engaging in swapping due to elevated memory pressure. I would greatly appreciate any insights or recommendations you might have for optimizing and mitigating the memory footprint in this context.
hi guys, any update on this question?
I downloaded the https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml model and compiled the chat app with Xcode. When running the example prompt it takes around 15 minutes to complete. I am not sure what I did wrong, but the performance should be better, right?
2023-08-09 12:01:55.346753+0200 SwiftChat[27414:583595] Metal API Validation Enabled