a bit slow on my mbp 16 m1 #3
I believe it's because this is the unquantized version; if you compress it you will get better perf
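To put a rough number on that: a back-of-the-envelope sketch (the 7B parameter count comes from the model name; real files also carry embeddings, metadata, and runtime buffers, so treat these as lower bounds) of why unquantized float16 weights alone nearly fill a 16 GB machine, while 4-bit weights would not:

```python
# Rough memory-footprint estimate for a 7B-parameter model.
PARAMS = 7_000_000_000

def weight_gb(bits_per_param: int) -> float:
    """Size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"float16 weights: {weight_gb(16):.1f} GB")  # 14.0 GB
print(f"4-bit weights:   {weight_gb(4):.1f} GB")   # 3.5 GB
```

14 GB of weights on a 16 GB machine leaves almost nothing for the OS and other apps, which would be consistent with swapping under memory pressure.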
Hi @cpietsch! It sounds to me as if the model was running on CPU only. Could you maybe try to run it again with the "GPU History" window of Activity Monitor open at the same time? It should show very clear GPU activity if it's in use. Also, what computer are you using?
Hi @pcuenca, I am running it on an Apple M1 Pro with 16 GB and macOS 13.4.1
Interesting! Remember that Activity Monitor does not show Neural Engine activity; perhaps https://github.com/tlkh/asitop could provide more insight.
My suspicion is that the computer is swapping because of memory pressure.
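For reference, both points can be checked from a terminal on macOS (the `sysctl`/`vm_stat` commands ship with the OS; `asitop` is the third-party tool linked above and its install/run commands follow its README):

```shell
# Check current swap usage:
sysctl vm.swapusage      # total / used / free swap
vm_stat | grep -i swap   # Swapins / Swapouts counters

# Per-unit (CPU / GPU / Neural Engine) utilization with asitop;
# it reads powermetrics, so it will prompt for your sudo password:
pip install asitop
asitop
```

Nonzero and growing swap usage while the model runs would support the memory-pressure hypothesis.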
maybe we need to convert the model ourselves. but 0.39 t/s is not that bad...
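For completeness, a hedged sketch of what converting/compressing the model ourselves could look like using coremltools' weight-compression utilities (API names are from coremltools 7.x; the file paths and the choice of 8-bit linear quantization are illustrative assumptions, not what the repo actually ships):

```python
# Sketch: 8-bit linear weight quantization of a Core ML model
# with coremltools 7.x. Paths are placeholders.
import coremltools as ct
import coremltools.optimize.coreml as cto

mlmodel = ct.models.MLModel("Llama-2-7b-chat.mlpackage")

config = cto.OptimizationConfig(
    global_config=cto.OpLinearQuantizerConfig(mode="linear_symmetric")
)
compressed = cto.linear_quantize_weights(mlmodel, config)
compressed.save("Llama-2-7b-chat-quantized.mlpackage")
```

Smaller weights reduce memory pressure, which matters more here than raw compute.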
Whelp, just closing all other apps, restarting, and running the SwiftChat build without Xcode has resulted in 4.96 tokens/s. Woohoo!
so @pcuenca was right about the memory pressure
Here are some profiling images which show a low workload:
I have the same problem. One thing I don't understand is that I was able to get a fast response using ollama.
Nice, ollama worked for me too right out of the box.
I am experiencing the same issue on MBP 16 inch. Do you have any updates?
I am encountering a similar issue while utilizing a MacBook M2 with 32GB of RAM. It appears that the system may be engaging in swapping due to elevated memory pressure. I would greatly appreciate any insights or recommendations you might have for optimizing and mitigating the memory footprint in this context.
hi guys, any update on this question?
I downloaded the https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml model and compiled the chat app with Xcode. When running the example prompt it takes around 15 minutes to complete. I am not sure what I did wrong, but the performance should be better, right?
2023-08-09 12:01:55.346753+0200 SwiftChat[27414:583595] Metal API Validation Enabled