
Slow Response #328

Open
sneh20122001 opened this issue Sep 4, 2024 · 2 comments

Comments

@sneh20122001

Llama official website: https://llama.meta.com/docs/llama-everywhere/running-meta-llama-on-mac/

Describe the bug

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {
      "role": "user",
      "content": "who wrote the book godfather?"
    }
  ],
  "stream": false
}'

I ran this command on my system, which has 16 GB of RAM, a 1 TB HDD, a 512 GB SSD, and an NVIDIA GeForce GTX 1060 GPU, but the model still does not respond quickly: it takes around 40-45 seconds for a single-line prompt.

If anyone has a suggestion, please let me know; it would be helpful.
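A minimal sketch for narrowing this down, assuming a default Ollama install listening on localhost:11434: confirm which model variant is installed, check whether it is actually running on the GPU (recent Ollama versions report this via ollama ps), and time the bare request.

# list installed models and their on-disk sizes
ollama list

# on recent Ollama versions: shows whether the loaded model
# is running on GPU, CPU, or split between the two
ollama ps

# time the same request; with "stream": false the measured time
# covers the entire generation, so longer answers inflate it
time curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "who wrote the book godfather?" }
  ],
  "stream": false
}'

Note that a GTX 1060 has 6 GB of VRAM; if the loaded weights plus KV cache do not fully fit, Ollama offloads layers to the CPU, which could account for generation times in this range.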

@ADITYA1720

Which Llama 3 model version are you using (number of parameters)?

What device are you using?
I'm using an 8 GB MacBook M2 Pro with a 512 GB SSD and was able to get an instant response from both the chat query and the API call.
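For reference, the bare llama3 tag usually resolves to the 8B variant; a sketch of pinning the variant explicitly (exact tag names can vary across Ollama releases):

# pull and run the 8B variant explicitly; this is the lightest official Llama 3 tag
ollama pull llama3:8b
ollama run llama3:8b "who wrote the book godfather?"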

@sneh20122001
Author

sneh20122001 commented Sep 5, 2024 via email
