Closed
Description
I'm not sure whether this is specific to my configuration, but I'm seeing it everywhere, both in CI and on my local machine: the compiler flags for the Release variant are missing /O2 (and /DNDEBUG). I don't think this has anything to do with this project per se, as I suspect the bug lies in the cmake crate, but I wanted to flag it. As a result, CPU inference is more than 2x slower on my AMD CPU, with performance equivalent to a debug build of llama-cli. Switching to RelWithDebInfo speeds it up quite a bit. The relevant entries from the generated CMakeCache.txt:
//Flags used by the CXX compiler during RELEASE builds.
CMAKE_CXX_FLAGS_RELEASE:STRING= -nologo -MD -Brepro
//Flags used by the CXX compiler during RELWITHDEBINFO builds.
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG
//Flags used by the C compiler during RELEASE builds.
CMAKE_C_FLAGS_RELEASE:STRING= -nologo -MD -Brepro
//Flags used by the C compiler during RELWITHDEBINFO builds.
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG
It's also worth noting that stock CMake configures /Ob2 for Release as well, but I didn't observe any meaningful performance difference from that.
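For anyone who wants to check their own build, here is a minimal sketch of a checker for the cache entries quoted above. It uses only the Rust standard library and assumes the cache text follows the `NAME:TYPE=VALUE` layout shown in this issue. The helper names `parse_cache` and `missing_flags` are hypothetical and not part of the cmake crate or of this project. Flags are compared without their `/` or `-` prefix, since both spellings appear in the excerpt.

```rust
use std::collections::HashMap;

/// Parse `NAME:TYPE=VALUE` entries from CMakeCache.txt text, skipping
/// blank lines and comment lines (those starting with `//` or `#`).
fn parse_cache(text: &str) -> HashMap<String, String> {
    let mut entries = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with("//") || line.starts_with('#') {
            continue;
        }
        // Split at the first '=' so values that contain '=' stay intact.
        if let Some((key, value)) = line.split_once('=') {
            // Drop the `:TYPE` suffix from the key.
            let name = key.split(':').next().unwrap_or(key);
            entries.insert(name.to_string(), value.trim().to_string());
        }
    }
    entries
}

/// Return the entries of `required` that do not appear in `flags`.
/// Flags are compared with any leading '/' or '-' removed.
fn missing_flags<'a>(flags: &str, required: &[&'a str]) -> Vec<&'a str> {
    let present: Vec<&str> = flags
        .split_whitespace()
        .map(|f| f.trim_start_matches(|c: char| c == '/' || c == '-'))
        .collect();
    required
        .iter()
        .copied()
        .filter(|r| !present.iter().any(|p| p == r))
        .collect()
}

fn main() {
    // Sample input copied from the cache excerpt in this issue.
    let cache = "\
//Flags used by the CXX compiler during RELEASE builds.
CMAKE_CXX_FLAGS_RELEASE:STRING= -nologo -MD -Brepro
//Flags used by the CXX compiler during RELWITHDEBINFO builds.
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG
";
    let entries = parse_cache(cache);
    for key in ["CMAKE_CXX_FLAGS_RELEASE", "CMAKE_CXX_FLAGS_RELWITHDEBINFO"] {
        let flags = entries.get(key).map(String::as_str).unwrap_or("");
        let missing = missing_flags(flags, &["O2", "DNDEBUG"]);
        println!("{}: missing {:?}", key, missing);
    }
}
```

To run it against a real build, replace the embedded sample with the contents of the CMakeCache.txt from the build directory, for example via `std::fs::read_to_string`. The location of that file depends on the build setup and is not assumed here.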