Llama 3.1 Error: Device was lost during reload. This can happen due to insufficient memory or other GPU constraints. Detailed error: [object GPUDeviceLostInfo]. Please try to reload WebLLM with a less resource-intensive model. #517

djaffer opened this issue Jul 26, 2024 · 4 comments


@djaffer

djaffer commented Jul 26, 2024

Getting an error with Llama 3.1, while all other models are working fine:
Device was lost during reload. This can happen due to insufficient memory or other GPU constraints. Detailed error: [object GPUDeviceLostInfo]. Please try to reload WebLLM with a less resource-intensive model.

@CharlieFRuan
Contributor

Are you seeing this on chat.webllm.ai? Perhaps try the variant with the -1k suffix, which has a smaller KV cache and therefore a lower memory requirement. Also try q4f16_1 instead of q4f32_1.
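For reference, here is a minimal sketch of loading a lower-memory variant through the WebLLM API. The exact model ID below is an assumption; check `webllm.prebuiltAppConfig.model_list` for the IDs shipped with your WebLLM version.

```ts
import * as webllm from "@mlc-ai/web-llm";

async function loadSmallerLlama() {
  // q4f16_1 keeps activations in f16 instead of f32, and the "-1k" variant
  // uses a smaller KV cache, so both cut GPU memory use.
  // The model ID is assumed; verify it against webllm.prebuiltAppConfig.
  const engine = await webllm.CreateMLCEngine(
    "Llama-3.1-8B-Instruct-q4f16_1-MLC-1k",
    { initProgressCallback: (report) => console.log(report.text) }
  );
  return engine;
}
```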

@djaffer
Author

djaffer commented Jul 26, 2024

OK, thanks. Something definitely seems off here, though; all the other models work fine except this one.

Getting this error.
"Error while parsing WGSL: :4:8 error: extension 'f16' is not allowed in the current environment\nenable f16;\n ^^^\n\n\n - While validating [ShaderModuleDescriptor]\n - While calling [Device].CreateShaderModule([ShaderModuleDescriptor]).\n"

@CharlieFRuan
Contributor

The f16 error suggests that WebGPU on your browser/device does not support f16 computation. You can check manually at https://webgpureport.org/; if it is supported, you should see shader-f16 listed under features:
[screenshot: webgpureport.org features list showing shader-f16]
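You can also check this programmatically with the standard WebGPU adapter API. A small sketch (assumes WebGPU type definitions such as @webgpu/types are available):

```ts
// Returns true if the current browser's WebGPU adapter exposes shader-f16.
async function hasShaderF16(): Promise<boolean> {
  if (!("gpu" in navigator)) {
    console.log("WebGPU is not available in this browser.");
    return false;
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    console.log("No WebGPU adapter found.");
    return false;
  }
  const supported = adapter.features.has("shader-f16");
  console.log(`shader-f16 supported: ${supported}`);
  return supported;
}
```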

The f16 error and the device-lost error are separate issues. Seeing device lost with Llama3.1-q4f32_1 suggests you do not have enough GPU memory (it requires ~5 GB according to our config.ts); seeing the f16-not-supported error with q4f16_1 means your WebGPU implementation does not support f16 computation. As a side note, q4f32 models require more memory than their q4f16 counterparts. See config.ts for details.
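If it helps, here is a rough sketch of how to compare the advertised VRAM requirements of the Llama 3.1 variants from the bundled model list. The `vram_required_MB` field name follows WebLLM's config.ts; treat it as an assumption if your version differs.

```ts
import * as webllm from "@mlc-ai/web-llm";

// Print the approximate VRAM each Llama 3.1 variant expects, so you can pick
// one that fits an 8 GB GPU (browsers may not expose all of that to WebGPU).
for (const model of webllm.prebuiltAppConfig.model_list) {
  if (model.model_id.includes("Llama-3.1")) {
    console.log(`${model.model_id}: ~${model.vram_required_MB} MB`);
  }
}
```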

@djaffer
Author

djaffer commented Aug 3, 2024

Got it, thanks! I'm not sure why that model was recommended to me. Something still seems off with Llama 3.1, since it is the only one giving errors. The GPU has 8 GB of RAM. Is there a specific reason its size was not reduced like the other models?
