Nvidia GH200 / ARM64: SIGSEGV in XPU and Vector, but not Scalar modes #180
Comments
Quick update; attached lldb in the container.
I'm not much of a C++ guy, so let me know if this is helpful :)
I see what I did wrong in collecting this with lldb. I've updated my environment to not use
and with vector
Hi Team,
I'm successfully using OpenMoonRay 1.7 in Gentoo on an AMD EPYC 9654 workstation (Ebuild here, patches to OMR here). I'm working on building out a render farm, and I hope to use the well-priced NVIDIA GH200 platform on VULTR as an on-demand Arras compute node.
I've built an OMR docker image with OptiX and CUDA for ARM64 Neoverse V2 (-march=armv9-a -mcpu=neoverse-v2 -mtune=neoverse-v2), the CPU cores used in the NVIDIA GH200. The patches I've made against OMR's source are here and the ebuild, slightly modified from the previous example, is here.
I'm launching my docker container like so:
docker run -it -v /root:/root --runtime=nvidia --gpus=all -e NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility openmoonray-arm64
My bash environment looks like this:
The processor looks like this in /proc/cpuinfo
I'm running the test render on the country kitchen scene with
moonray -debug -exec_mode xpu -in scene.rdla -in scene.rdlb -out arm64.exr
The output is in the attached file kitchen.log.
I'd love to contribute a coherent patch once I get this working. Most of the changes I've made so far replace __APPLE__ with __ARM_NEON__ in the appropriate places and separate the concerns of ARM on Darwin from ARM on Linux. It's been a whirlwind trying to get this far, and compiling under qemu has made the process slower than usual :)
Where is a good place to start with debugging this? Since scalar works, I imagine I made some mistakes in my patching as it relates to vector and XPU. After reading the code, I also assume that Apple hasn't been tested with OptiX at all and that we're in uncharted waters.
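To make the guard separation described above concrete, here is a minimal sketch of the idea. The EXAMPLE_* macros and the function names are hypothetical and are not taken from the OMR source. The only point is that the CPU-architecture check and the OS check are tested independently instead of both being folded into __APPLE__.

```cpp
#include <string>

// Hypothetical macro names for illustration only (not from the OMR source).

// Architecture check: true on any 64-bit ARM / NEON target, regardless of OS.
#if defined(__aarch64__) || defined(__ARM_NEON) || defined(__ARM_NEON__)
#  define EXAMPLE_ARCH_ARM 1
#else
#  define EXAMPLE_ARCH_ARM 0
#endif

// OS checks: independent of the CPU architecture.
#if defined(__APPLE__)
#  define EXAMPLE_OS_DARWIN 1
#else
#  define EXAMPLE_OS_DARWIN 0
#endif

#if defined(__linux__)
#  define EXAMPLE_OS_LINUX 1
#else
#  define EXAMPLE_OS_LINUX 0
#endif

// Code that only cares about the instruction set keys off the arch macro.
inline std::string simdPath() {
    return EXAMPLE_ARCH_ARM ? "neon" : "generic";
}

// Code that only cares about the operating system keys off the OS macros.
inline std::string osPath() {
    if (EXAMPLE_OS_DARWIN) return "darwin";
    if (EXAMPLE_OS_LINUX)  return "linux";
    return "other";
}

// Combined label, e.g. "neon/linux" on a GH200 or "neon/darwin" on Apple silicon.
inline std::string platformPath() {
    return simdPath() + "/" + osPath();
}
```

With guards split this way, a block that was previously under __APPLE__ can be reviewed case by case: if it is really about NEON it moves under the architecture check, and if it is really about Darwin it stays under the OS check.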
Looking forward to working with everyone! Thanks!