Initial version of aarch64 container with Vulkan #270
base: main
Conversation
@sroecker please sign your commit, this is failing the DCO build. See the DCO check output for how to sign an old commit.
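For reference, a common way to add the missing Signed-off-by trailer to an already-pushed commit; the exact commands and rebase depth depend on the branch history, so treat this as a sketch:

```
git commit --amend --signoff --no-edit   # add Signed-off-by to the most recent commit
# or, if older commits in the branch also need the trailer (depth assumed here):
git rebase --signoff HEAD~1
git push --force-with-lease
```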
Signed-off-by: Steffen Roecker <[email protected]>
Force-pushed from 40672af to 3e41c4e.
I would prefer these all to be based on a base image with all of the Python tools required to run ramalama, and then ROCm, Vulkan, ... can all share the lower layer.
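A rough sketch of that layering; the image names, base distribution, and package lists below are assumptions for illustration, not the project's actual files:

```
# Sketch of the suggested layering (names and packages are assumptions).
#
# Containerfile.base -- shared lower layer with the Python tooling:
#   FROM registry.fedoraproject.org/fedora:40
#   RUN dnf install -y python3 python3-pip git-core && dnf clean all
#
# Containerfile.vulkan -- backend layer reusing that base:
FROM localhost/ramalama-base:latest
RUN dnf install -y vulkan-loader-devel vulkan-headers glslc && \
    dnf clean all
```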
@sroecker is llama.cpp working properly for you with a container generated from this Containerfile? Which models have you tested? I'm asking because the Vulkan backend hasn't worked for me since March, which is the reason why I started favoring the Kompute backend (which also uses Vulkan).
I had to test a smaller model due to machine constraints:
I can check with the Kompute backend tomorrow.
Tested with Mistral-7B and Wizard-Vicuna-13B and got random answers with both of them. Sadly, the Vulkan backend is still broken for Apple Silicon GPUs upstream. I think we're going to need to stick with the Kompute backend for a while, as implemented in #235.
RUN git clone https://github.com/ggerganov/whisper.cpp.git && \
    cd whisper.cpp && \
    git reset --hard ${WHISPER_CPP_SHA} && \
Is it possible to build whisper.cpp with GGML_VULKAN also?
I just tried setting the environment variable GGML_VULKAN=ON, but it failed because ggml-vulkan-shaders.hpp was missing.
It seems additional steps, like running ggml_vk_generate_shaders.py beforehand, are required.
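A sketch of what such a build attempt might look like; the shader-generation script name and variable are taken from the comment above and may not match the whisper.cpp build at the pinned SHA:

```
# Sketch only: generate the Vulkan shaders before building, then build with
# the Vulkan flag set. Not a verified recipe for this repository state.
cd whisper.cpp && \
    python3 ggml_vk_generate_shaders.py && \
    GGML_VULKAN=ON make -j
```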
RUN git clone https://github.com/ggerganov/llama.cpp && \
    cd llama.cpp && \
    git reset --hard ${LLAMA_CPP_SHA} && \
    #cmake -B build -DCMAKE_INSTALL_PREFIX:PATH=/usr -DGGML_CCACHE=0 && \
Might as well remove this commented-out line.
I think this image should inherit from the Kompute one like @rhatdan said; if anything, it makes the Containerfiles easier to maintain, with less duplication. I notice this container image was put in an aarch64 directory. This GGML_VULKAN image and the GGML_KOMPUTE one it inherits from should be x86_64/aarch64 multi-arch images. @slp do you think the GGML_VULKAN backend might work on other non-Apple GPUs? I was kinda thinking we merge this anyway, with Kompute being the primary Vulkan backend, but one can switch to this one via some command-line option in ramalama if they wish.
Pretty much all images that can be x86_64/aarch64 should be. I think the ROCm one will be x86_64 only, but that's because some of the required components aren't built for aarch64.
Yes, all images should be available in as many arches as makes sense.
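One way to produce such a multi-arch image with podman; the manifest name, registry, and Containerfile path are placeholders, and cross-building for the non-native arch additionally requires qemu-user-static:

```
# Build both architectures into one manifest list, then push it.
podman build --platform linux/amd64,linux/arm64 \
    --manifest ramalama-vulkan \
    -f container-images/vulkan/Containerfile .
podman manifest push ramalama-vulkan docker://quay.io/example/ramalama-vulkan:latest
```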
@slp @ericcurtin we need to get this in, so we can run on Mac with Containers. |
@pufferffish this might interest you. |
Initial version of an aarch64 container with Vulkan support that runs in libkrun containers on macOS.