
Windows performance in Release profile seems crippled when building dev Cargo profile (RelWithDebInfo is faster) #649

Closed
vlovich opened this issue Feb 13, 2025 · 14 comments · Fixed by #651
Labels
🏗 build · 🚀 performance (there is something that should be faster) · 🪟 windows (windows os issues)

Comments

@vlovich
Contributor

vlovich commented Feb 13, 2025

I'm not sure if this is something specific to my configuration, but I'm seeing it across the board in CI and on my local machine: the Release-variant compiler flags are missing /O2 (and /DNDEBUG). I don't think this has anything to do with this project per se, as I suspect the bug lies in the cmake crate, but I wanted to flag it. The result is that CPU inference is more than 2x slower on my AMD machine, with performance equivalent to a debug llama-cli. Switching to RelWithDebInfo speeds it up quite a bit.

//Flags used by the CXX compiler during RELEASE builds.
CMAKE_CXX_FLAGS_RELEASE:STRING= -nologo -MD -Brepro

//Flags used by the CXX compiler during RELWITHDEBINFO builds.
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG

//Flags used by the C compiler during RELEASE builds.
CMAKE_C_FLAGS_RELEASE:STRING= -nologo -MD -Brepro

//Flags used by the C compiler during RELWITHDEBINFO builds.
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG

It's also worth noting that stock cmake configures /Ob2 as well for Release, but I didn't observe any meaningful performance difference from that.
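For reference, one way to guard against this from a build script is to push the MSVC optimization flags through the cmake crate explicitly. The following is a hedged sketch of such a workaround, not this crate's actual build.rs; the "llama.cpp" source path and the exact flag set are assumptions for illustration:

// Sketch of a build.rs workaround: append the MSVC optimization flags
// ourselves so they survive even if cmake-rs drops them from
// CMAKE_*_FLAGS_RELEASE when the host cargo profile is debug.
fn main() {
    let mut cfg = cmake::Config::new("llama.cpp");
    cfg.profile("Release");

    // Only MSVC needs this; GNU-style toolchains keep their -O flags.
    let target = std::env::var("TARGET").unwrap_or_default();
    if target.contains("msvc") {
        for flag in ["/O2", "/Ob2", "/DNDEBUG"] {
            cfg.cflag(flag);
            cfg.cxxflag(flag);
        }
    }

    let dst = cfg.build();
    println!("cargo:rustc-link-search=native={}", dst.join("lib").display());
}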

vlovich changed the title from "Windows performance in Release profile seems crippled (RelWithDebInfo has faster CPU inference)" to "Windows performance in Release profile seems crippled when building Debug (RelWithDebInfo is faster)" on Feb 13, 2025
vlovich changed the title from "Windows performance in Release profile seems crippled when building Debug (RelWithDebInfo is faster)" to "Windows performance in Release profile seems crippled when building dev Cargo profile (RelWithDebInfo is faster)" on Feb 13, 2025
@babichjacob
Contributor

babichjacob commented Feb 13, 2025

I've observed slower-than-expected performance using this crate in comparison to the llama.cpp CLI (taking the avx2 binary from their GitHub releases) even when using --release.

MarcusDunn added the 🚀 performance and 🏗 build labels on Feb 13, 2025
@vlovich
Contributor Author

vlovich commented Feb 13, 2025

@babichjacob please share. I'm also observing a ~30% slowdown vs stock llama.cpp CLI. Is that ballpark what you're seeing as well? Would love to track that down together.

@babichjacob
Contributor

babichjacob commented Feb 14, 2025

Seems even slower for me.

I did a crude comparison with `hyperfine` and saw this crate run at approximately 1/5th the speed for the same task:
> hyperfine --warmup 2 --runs 4 '.../llama.cpp/build/bin/Release/llama-simple.exe -m ".../Qwen2.5-3B-Instruct-Q6_K_L.gguf" -n 64 "<|im_start|>user\nHello! how are you?<|im_end|>\n<|im_start|>assistant\n"' 'cargo run --release --features native --example usage ".../Qwen2.5-3B-Instruct-Q6_K_L.gguf"'  

Benchmark 1: .../llama.cpp/build/bin/Release/llama-simple.exe -m ".../Qwen2.5-3B-Instruct-Q6_K_L.gguf" -n 64 "<|im_start|>user\nHello! how are you?<|im_end|>\n<|im_start|>assistant\n"
  Time (mean ± σ):      3.850 s ±  0.139 s    [User: 9.374 s, System: 1.404 s]
  Range (min … max):    3.767 s …  4.058 s    4 runs

Benchmark 2: cargo run --release --features native --example usage ".../Qwen2.5-3B-Instruct-Q6_K_L.gguf"
  Time (mean ± σ):     20.843 s ±  0.389 s    [User: 46.550 s, System: 2.572 s]
  Range (min … max):   20.550 s … 21.392 s    4 runs

Summary
  .../llama.cpp/build/bin/Release/llama-simple.exe -m ".../Qwen2.5-3B-Instruct-Q6_K_L.gguf" -n 64 "<|im_start|>user\nHello! how are you?<|im_end|>\n<|im_start|>assistant\n" ran
    5.41 ± 0.22 times faster than cargo run --release --features native --example usage ".../Qwen2.5-3B-Instruct-Q6_K_L.gguf"

(I based the llama-simple command on the usage example so that each program would be doing the same thing. The warmup runs were there to make sure program compilation, shader compilation, and other cache-related work was done before measurement started.)

Then, I took some code from the `simple` example and put it in `usage` to get fairer measurements, and saw about 1/2.6th the generation speed:
> cargo run --release --features native --example usage ".../Qwen2.5-3B-Instruct-Q6_K_L.gguf"

[omitted irrelevant parts. the prompt and generated text is the same as in llama-simple below]
decoded 128 tokens in 33.42 s, speed 3.83 t/s
> .../llama.cpp/build/bin/Release/llama-simple.exe -m ".../Qwen2.5-3B-Instruct-Q6_K_L.gguf" -n 128 "<|im_start|>user\nWrite a 1000 word paragraph about llama-cpp-rs, a way to use llama.cpp from Rust.<|im_end|>\n<|im_start|>assistant\n"

[omitted irrelevant parts. the prompt and generated text is the same as in llama-cpp-rs/usage above]
decoded 128 tokens in 12.77 s, speed: 10.02 t/s

llama_perf_sampler_print:    sampling time =      20.72 ms /   128 runs   (    0.16 ms per token,  6178.20 tokens per second)
llama_perf_context_print:        load time =    1816.34 ms
llama_perf_context_print: prompt eval time =    1395.03 ms /    33 tokens (   42.27 ms per token,    23.66 tokens per second)
llama_perf_context_print:        eval time =   11260.76 ms /   127 runs   (   88.67 ms per token,    11.28 tokens per second)
llama_perf_context_print:       total time =   13190.89 ms /   160 tokens

So 3.83 / 10.02 ≈ 0.38: this crate is running at about 38% of (roughly 1/2.6th) the speed of llama.cpp.

I'll report back with increasingly sophisticated measurements as I learn how to get them (I'm very new to it), like a flamegraph maybe? Not sure if that'll point to any places time is lost that originate from this codebase rather than from llama.cpp itself.

@AsbjornOlling
Contributor

We noticed the same thing in NobodyWho last week. Not sure if it's a new thing, or if we're only just noticing it.

We mostly work on Linux, where we get something like 98% of the performance of llama-bench, but a Windows user of our project noticed that it is significantly slower than llama.cpp.

FWIW, we're seeing it run at something like 23% of the t/s that pure llama.cpp gets (only on Windows).

I just finished setting up a windows VM to investigate 😅 Will post results if I find anything noteworthy.

@AsbjornOlling
Contributor

I'm not convinced that the windows performance reduction was fixed by #651 (or I don't understand it)

I'm slowly getting better at profiling and performance analysis, and haven't found anything super scary in the flamegraphs yet.

But a weird thing I'm noticing is that the numbers I get from binaries built with cargo build --release are significantly slower than the numbers I get from binaries built with just cargo build.

My gut feeling is that something is off in build.rs. I'll have a closer look at it when I get the time.

@vlovich
Contributor Author

vlovich commented Feb 20, 2025

@AsbjornOlling even in Rust debug builds, build.rs tries to build llama.cpp itself optimized. However, on Windows only, the underlying cmake crate has a bug where it explicitly turns off the optimization flags for the MSVC Release configuration. This ONLY impacts debug builds (i.e. cargo build) and not release (cargo build --release).

If cargo build is faster than cargo build --release, then my patch was somehow incomplete or you're on an old release without that fix.

After my patch, I saw identical performance for debug and release, but that number was still ~30% off from the llama.cpp CLI. Please let me know if you have any insights.
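For anyone trying to pin down which side of that split a given build lands on, cargo exposes its own profile to build scripts through standard environment variables. A minimal sketch (nothing here is specific to this crate):

// Minimal build.rs sketch: report which cargo profile is driving the build.
// PROFILE and OPT_LEVEL are standard env vars cargo sets for build scripts.
fn main() {
    let profile = std::env::var("PROFILE").unwrap_or_default(); // "debug" or "release"
    let opt_level = std::env::var("OPT_LEVEL").unwrap_or_default();
    println!("cargo:warning=cargo PROFILE={profile}, OPT_LEVEL={opt_level}");
}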

@babichjacob
Contributor

babichjacob commented Feb 20, 2025

(sorry about the comment right before this that I just deleted; I forgot to pull to actually test with the PR merged in 🤦)

I can corroborate what Asbjørn has found. Running commit f4b9657 on Windows 11 using MSVC with a Ryzen 5 5600 (so AVX2 is available):

| cargo profile | LLAMA_LIB_PROFILE | generation speed in my crude one-shot test is close to llama.cpp native |
| --- | --- | --- |
| release | RelWithDebInfo | |
| release | MinSizeRel | |
| release | Release | |
| release | (unspecified) | |
| debug | RelWithDebInfo | |
| debug | MinSizeRel | |
| debug | Release | |
| debug | (unspecified) | |

Where the ✅s are around 11 t/s and the ❌s are around 4 t/s for me.

The command I took measurements from was `cargo run --features native --example usage .../Qwen2.5-3B-Instruct-Q6_K_L.gguf` (with modifications to the command as reflected in the table). I ran `cargo clean` in between to prevent weirdness.

I modified usage.rs to get measurements like this:
//! # Usage
//!
//! This is just about the smallest possible way to do inference. To fetch a model from hugging face:
//!
//! ```console
//! git clone --recursive https://github.com/utilityai/llama-cpp-rs
//! cd llama-cpp-rs/examples/usage
//! wget https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GGUF/resolve/main/qwen2-1_5b-instruct-q4_0.gguf
//! cargo run --example usage -- qwen2-1_5b-instruct-q4_0.gguf
//! ```
use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::llama_batch::LlamaBatch;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;
use llama_cpp_2::model::{AddBos, Special};
use llama_cpp_2::sampling::LlamaSampler;
use llama_cpp_2::{ggml_time_us, send_logs_to_tracing, LogOptions};
use std::io::Write;
use std::time::Duration;

#[allow(clippy::cast_possible_wrap, clippy::cast_possible_truncation)]
fn main() {
    let model_path = std::env::args().nth(1).expect("Please specify model path");
    tracing_subscriber::fmt::init();
    send_logs_to_tracing(LogOptions::default().with_logs_enabled(true));

    let backend = LlamaBackend::init().unwrap();
    let params = LlamaModelParams::default();

    let prompt =
        "<|im_start|>user\nWrite a 1000 word paragraph about llama-cpp-rs, a way to use llama.cpp from Rust.<|im_end|>\n<|im_start|>assistant\n".to_string();
    LlamaContextParams::default();
    let model =
        LlamaModel::load_from_file(&backend, model_path, &params).expect("unable to load model");
    let ctx_params = LlamaContextParams::default();
    let mut ctx = model
        .new_context(&backend, ctx_params)
        .expect("unable to create the llama_context");
    let tokens_list = model
        .str_to_token(&prompt, AddBos::Always)
        .unwrap_or_else(|_| panic!("failed to tokenize {prompt}"));
    let n_len = 128;

    // create a llama_batch with size 512
    // we use this object to submit token data for decoding
    let mut batch = LlamaBatch::new(512, 1);

    let last_index = tokens_list.len() as i32 - 1;
    for (i, token) in (0_i32..).zip(tokens_list.into_iter()) {
        // llama_decode will output logits only for the last token of the prompt
        let is_last = i == last_index;
        batch.add(token, i, &[0], is_last).unwrap();
    }
    ctx.decode(&mut batch).expect("llama_decode() failed");

    let mut n_cur = batch.n_tokens();
    let mut n_decode = 0;

    let t_main_start = ggml_time_us();

    // The `Decoder`
    let mut decoder = encoding_rs::UTF_8.new_decoder();
    let mut sampler = LlamaSampler::greedy();

    while n_decode < n_len {
        // sample the next token
        {
            let token = sampler.sample(&ctx, batch.n_tokens() - 1);

            sampler.accept(token);

            // is it an end of stream?
            if token == model.token_eos() {
                eprintln!();
                break;
            }

            let output_bytes = model.token_to_bytes(token, Special::Tokenize).unwrap();
            // use `Decoder.decode_to_string()` to avoid the intermediate buffer
            let mut output_string = String::with_capacity(32);
            let _decode_result = decoder.decode_to_string(&output_bytes, &mut output_string, false);
            print!("{output_string}");
            std::io::stdout().flush().unwrap();

            batch.clear();
            batch.add(token, n_cur, &[0], true).unwrap();
        }

        n_cur += 1;

        ctx.decode(&mut batch).expect("failed to eval");

        n_decode += 1;
    }
    eprintln!();

    let t_main_end = ggml_time_us();

    let duration = Duration::from_micros((t_main_end - t_main_start) as u64);

    eprintln!(
        "decoded {} tokens in {:.2} s, speed {:.2} t/s\n",
        n_decode,
        duration.as_secs_f32(),
        n_decode as f32 / duration.as_secs_f32()
    );
}

@babichjacob
Contributor

babichjacob commented Feb 20, 2025

@vlovich I see the bug workaround was restricted to cargo's debug profile:

if (cfg!(debug_assertions)
    || std::env::var("PROFILE").as_ref().map(String::as_str) == Ok("debug"))

I'm going to test reducing that conditional to


    if matches!(target_os, TargetOs::Windows(WindowsVariant::Msvc))
        && matches!(
            profile.as_str(),
            "Release" | "RelWithDebInfo" | "MinSizeRel"
        )
    {

to cover all variations of (llama.cpp's) release mode and to lift the restriction to (cargo's) debug profile, and post a new table with my findings.

Do you know, or have a guess, why it's somehow not slower to run in release mode with the current commit? And could you check on your end whether my proposed change regresses that?
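For reference, putting the two quoted fragments together, the relaxed check would look roughly like the sketch below. It is self-contained for illustration only; the enum and variable names mirror the quoted build.rs snippets and are assumptions about that file's real structure:

// Sketch of the relaxed workaround condition: key off the llama.cpp CMake
// profile alone, rather than also requiring cargo's own profile to be debug.
#[allow(dead_code)]
enum WindowsVariant {
    Msvc,
    Gnu,
}

#[allow(dead_code)]
enum TargetOs {
    Windows(WindowsVariant),
    Other,
}

fn needs_msvc_opt_workaround(target_os: &TargetOs, llama_profile: &str) -> bool {
    matches!(target_os, TargetOs::Windows(WindowsVariant::Msvc))
        && matches!(llama_profile, "Release" | "RelWithDebInfo" | "MinSizeRel")
}

fn main() {
    // Applies for every release-style llama.cpp profile on MSVC...
    assert!(needs_msvc_opt_workaround(
        &TargetOs::Windows(WindowsVariant::Msvc),
        "Release"
    ));
    // ...and never on non-Windows targets.
    assert!(!needs_msvc_opt_workaround(&TargetOs::Other, "Release"));
}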

@vlovich
Contributor Author

vlovich commented Feb 20, 2025

@babichjacob look at the generated CMakeCache.txt and the CXXFLAGS_RELEASE and CFLAGS_RELEASE variables. The /O2 is only missing when Rust is built debug. In --release the /O2 is there. I'm not sure what you're seeing, but I linked the upstream bug report where I highlight the relevant code in cmake-rs, which erroneously applies its opt-flag filtering only when building Rust debug while configuring cmake as release.
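If it helps anyone else reproduce this, here is a throwaway Rust helper for dumping those lines from a generated CMakeCache.txt; it is purely illustrative and not part of this repo:

// Print the C/C++ flag lines from a CMakeCache.txt passed on the command
// line, so the debug and release target dirs can be compared side by side.
use std::{env, fs};

fn main() {
    let path = env::args()
        .nth(1)
        .expect("usage: dump-flags <path to CMakeCache.txt>");
    let cache = fs::read_to_string(&path).expect("could not read CMakeCache.txt");
    for line in cache.lines() {
        if line.starts_with("CMAKE_C_FLAGS") || line.starts_with("CMAKE_CXX_FLAGS") {
            println!("{line}");
        }
    }
}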

@babichjacob
Contributor

Thanks for the hint. Really tricky situation.

look at the generated CMakeCache.txt and the CXXFLAGS_RELEASE and CFLAGS_RELEASE variables. The /O2 is only missing when Rust is built debug. In --release the /O2 is there.

Regardless of whether your patch is enabled, and regardless of the cargo/Rust profile (where you're saying it should be there when compiled for release), /O2 is missing for me in the RELEASE flags:

CMAKE_CXX_FLAGS:STRING= -nologo -MD -Brepro
CMAKE_CXX_FLAGS_DEBUG:STRING=/MDd /Zi /Ob0 /Od /RTC1
CMAKE_CXX_FLAGS_MINSIZEREL:STRING=/MD /O1 /Ob1 /DNDEBUG
vvvv line of interest (optimization flags are missing, especially in comparison to MINSIZEREL and RELWITHDEBINFO)
CMAKE_CXX_FLAGS_RELEASE:STRING= -nologo -MD -Brepro
^^^^ line of interest
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG

CMAKE_C_FLAGS:STRING= -nologo -MD -Brepro
CMAKE_C_FLAGS_DEBUG:STRING=/MDd /Zi /Ob0 /Od /RTC1
CMAKE_C_FLAGS_MINSIZEREL:STRING=/MD /O1 /Ob1 /DNDEBUG
vvvv line of interest (optimization flags are missing, especially in comparison to MINSIZEREL and RELWITHDEBINFO)
CMAKE_C_FLAGS_RELEASE:STRING= -nologo -MD -Brepro
^^^^ line of interest
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG

I suspect this is how it is for Asbjørn too. I'll also try wrangling a GitHub Actions worker and see what its flags look like.

With your patch enabled, I can see that /O2 was indeed added (in the general flags section), but only in cargo's debug profile (because of the conditional mentioned above):

vvvv line of interest (optimization flags are the ones manually specified in build.rs)
CMAKE_CXX_FLAGS:STRING= /O2 /DNDEBUG /Ob2 -nologo -MD -Brepro 
^^^^ line of interest
CMAKE_CXX_FLAGS_DEBUG:STRING=/MDd /Zi /Ob0 /Od /RTC1
CMAKE_CXX_FLAGS_MINSIZEREL:STRING=/MD /O1
CMAKE_CXX_FLAGS_RELEASE:STRING=
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG                                       

vvvv line of interest (optimization flags are the ones manually specified in build.rs)
CMAKE_C_FLAGS:STRING= /O2 /DNDEBUG /Ob2 -nologo -MD -Brepro
^^^^ line of interest
CMAKE_C_FLAGS_DEBUG:STRING=/MDd /Zi /Ob0 /Od /RTC1                   
CMAKE_C_FLAGS_MINSIZEREL:STRING=/MD /O1 /Ob1 /DNDEBUG            
CMAKE_C_FLAGS_RELEASE:STRING= /O2 /DNDEBUG /Ob2 -nologo
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG

(You can also see here that cmake-rs gave /O2 /DNDEBUG /Ob2 to the C RELEASE flags but not C++. Weird!)

So, if I understand you correctly, this is what you see in release mode (regardless of whether your patch is applied, because it's conditional on debug):

CMAKE_CXX_FLAGS:STRING= -nologo -MD -Brepro
CMAKE_CXX_FLAGS_DEBUG:STRING=/MDd /Zi /Ob0 /Od /RTC1
CMAKE_CXX_FLAGS_MINSIZEREL:STRING=/MD /O1 /Ob1 /DNDEBUG
vvvv line of interest (the expected optimization flags are present)
CMAKE_CXX_FLAGS_RELEASE:STRING=/O2 /DNDEBUG /Ob2 -nologo -MD -Brepro
^^^^ line of interest
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG

CMAKE_C_FLAGS:STRING= -nologo -MD -Brepro
CMAKE_C_FLAGS_DEBUG:STRING=/MDd /Zi /Ob0 /Od /RTC1
CMAKE_C_FLAGS_MINSIZEREL:STRING=/MD /O1 /Ob1 /DNDEBUG
vvvv line of interest (the expected optimization flags are present)
CMAKE_C_FLAGS_RELEASE:STRING=/O2 /DNDEBUG /Ob2 -nologo -MD -Brepro
^^^^ line of interest
CMAKE_C_FLAGS_RELWITHDEBINFO:STRING=/MD /Zi /O2 /Ob1 /DNDEBUG

That is also the arrangement you'd expect if the bug were resolved and the workaround weren't needed (and it would hold regardless of whether cargo uses the release or debug profile).

I looked at the linked cmake-rs issue and couldn't figure out how the optimization flag filtering was restricted to cargo's debug profile (that doesn't mean it isn't; I just couldn't see how). I've cloned the dependency and patched it in locally so I can try to see how any changes will affect the flags and ultimately resolve this...

Thanks again for pointing me in the right direction!

@vlovich
Contributor Author

vlovich commented Feb 23, 2025

Look at the flag filtering logic I pointed to in cmake-rs and how the build type is part of the variable they apply the filtering to. It's really convoluted and weird.

@AsbjornOlling
Contributor

Regardless of if your patch is enabled, and regardless of the cargo / Rust profile (where you're saying it should be there when compiled for release), /O2 is missing for me in the RELEASE flags [...] I suspect this is how it is for Asbjørn too.

look at the generated CMakeCache.txt and the CXXFLAGS_RELEASE and CFLAGS_RELEASE variables. The /O2 is only missing when Rust is built debug. In --release the /O2 is there.

From what I can tell, I am only getting the /O2 flags when using the default cargo profile (debug?).

The /O2, /DNDEBUG, and /Ob2 flags are only applied when I don't build with --release.

Running in this repo at commit 8b11c5c, so with @vlovich's patch.

With --release:

$ cargo clean
$ cargo build --release
$ cat ./target/release/build/llama-cpp-sys-2-9f60549f3cef90cc/out/build/CMakeCache.txt | Select-String CMAKE_CXX_FLAGS_RELEASE:STRING 
CMAKE_CXX_FLAGS_RELEASE:STRING= -nologo -MD -Brepro
$ cat ./target/release/build/llama-cpp-sys-2-9f60549f3cef90cc/out/build/CMakeCache.txt | Select-String CMAKE_C_FLAGS_RELEASE:STRING
CMAKE_C_FLAGS_RELEASE:STRING= -nologo -MD -Brepro

Without --release

$ cargo clean
$ cargo build
$ cat ./target/release/build/llama-cpp-sys-2-6ebad67f70f390a6/out/build/CMakeCache.txt | Select-String CMAKE_CXX_FLAGS_RELEASE:STRING 
CMAKE_CXX_FLAGS_RELEASE:STRING= /O2 /DNDEBUG /Ob2 -nologo -MD -Brepro
$ cat ./target/release/build/llama-cpp-sys-2-6ebad67f70f390a6/out/build/CMakeCache.txt | Select-String CMAKE_C_FLAGS_RELEASE:STRING
CMAKE_C_FLAGS_RELEASE:STRING=  /O2 /DNDEBUG /Ob2 -nologo -MD -Brepro

If I apply the suggested changes to examples/usage.rs and test with and without --release, I get this:

usage.rs benchmarks

Note that I am using --features vulkan.

This is all running on a Windows 11 virtual machine, hosted by a proxmox hypervisor with an Nvidia A6000 GPU and the Nvidia 572.16 driver.

With --release:

$ cargo run --features vulkan --example usage --release -- ./Meta-Llama-3-8B-Instruct-Q5_K_M.gguf
decode 64 tokens in 1.07 s, speed 59.55 t/s

Without --release:

$ cargo run --features vulkan --example usage -- ./Meta-Llama-3-8B-Instruct-Q5_K_M.gguf
decode 64 tokens in 0.89 s, speed 72.26 t/s

For comparison, llama-bench.exe from the pre-built Vulkan binaries of llama.cpp GitHub release b4745 gives 83.41 t/s.

The drop to 60 t/s is significant, but not as bad as what I've seen in other tests...

Changing the conditional in build.rs as @babichjacob suggested removes the difference in benchmarking.
Could we do that? Is there some reason that this improvement should only apply to debug profiles?

I also tried building for x86_64-pc-windows-gnu, to try to work around the MSVC weirdness by using mingw/gcc instead. I haven't been able to make it work yet.

@vlovich
Contributor Author

vlovich commented Feb 24, 2025

I have no objection. Accelerators (Vulkan/CUDA) generally experience less of a slowdown, but I'm curious why there's still a slowdown. I wonder if the UTF-8 validation at the FFI boundary (converting a CStr into a Rust String) is really adding that much overhead.

Do keep in mind that you want to check CXX_FLAGS as well - when building with --release the O2 ends up there if I recall correctly.

@AsbjornOlling
Contributor

I have no objection.

Great. I made a PR.

Accelerators (Vulkan/CUDA) generally experience less of a slowdown, but I'm curious why there's still a slowdown. I wonder if the UTF-8 validation at the FFI boundary (converting a CStr into a Rust String) is really adding that much overhead.

I am very curious too! I have a hard time believing that the string conversions are that expensive; after all, it would have to be several milliseconds.

I tried an experiment where I removed detokenization from my program, and instead just yielded a constant string each time, while still doing the full decode/sample dance. It didn't give me a measurably faster t/s speed.

But I am observing a bigger slowdown in NobodyWho, which does Rust/C++ string conversion at both ends of the program (actually godot::Gstring, but whatever), so that might lend a little bit of credence to the string conversion theory.
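One cheap way to probe the string-conversion theory in the usage.rs example above is to skip the intermediate String entirely and write the raw token bytes to stdout. This sketch only rearranges calls the example already makes (and, per the experiment above, it may well not move the t/s number):

// Inside the generation loop from usage.rs, replace the decode-to-String step
// with a direct byte write: no UTF-8 validation, no per-token String.
let output_bytes = model.token_to_bytes(token, Special::Tokenize).unwrap();
let mut stdout = std::io::stdout().lock();
stdout.write_all(&output_bytes).unwrap();
stdout.flush().unwrap();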

Do keep in mind that you want to check CXX_FLAGS as well - when building with --release the O2 ends up there if I recall correctly.

If you look at the first foldable section in my previous comment, it does include the CMAKE_CXX_FLAGS_RELEASE too.
