Cache-line aligned allocation for OpenMP #920

Open
2 tasks
Slaedr opened this issue Nov 8, 2021 · 5 comments
Labels
is:enhancement An improvement of an existing feature. mod:openmp This is related to the OpenMP module.

Comments

@Slaedr
Contributor

Slaedr commented Nov 8, 2021

Currently, the OpenMP raw_alloc just uses malloc. We would usually want allocated memory blocks to be aligned to cache-line boundaries, typically 64 bytes. Two things would be needed:

  • Query the cache line length during configuration. I believe most modern CPUs use 64 bytes, but this would be good to have.
  • A simple (non-interface) class to implement the alignment logic (or simply use std::align), hold the pointers involved, and free the memory properly at the end. Adding alignment to OmpExecutor::raw_alloc instead is not a nice option because it's protected, and Executor::alloc has logging that is not thread-safe. Another, easier option that covers both the alignment logic and proper deallocation is the C11 aligned_alloc.

In particular, the semantics we need are the allocation of a group of memory blocks, each of which begins on a 64-byte boundary. Typically, this can be done by aligning the beginning of the large block and making the stride a multiple of 64 bytes.
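As a rough sketch of the block-group idea above: round the per-block size up to a multiple of the cache line to get the stride, then make one aligned allocation for all blocks. The names `cache_line`, `round_up`, `block_group` and `allocate_blocks` are hypothetical and not part of Ginkgo. The sketch assumes a 64-byte line and a standard library that provides `std::aligned_alloc` (C++17; not available on every platform).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Assumed cache-line length; ideally queried at configure time.
constexpr std::size_t cache_line = 64;

// Round n up to the next multiple of the cache-line length.
constexpr std::size_t round_up(std::size_t n)
{
    return (n + cache_line - 1) / cache_line * cache_line;
}

struct free_deleter {
    void operator()(void* p) const { std::free(p); }
};

// Hypothetical helper type: owns one large allocation that is split into
// equally sized sub-blocks, each starting on a cache-line boundary.
struct block_group {
    std::unique_ptr<char, free_deleter> data;
    std::size_t stride;      // bytes between the starts of adjacent blocks
    std::size_t num_blocks;

    char* block(std::size_t i) const
    {
        assert(i < num_blocks);
        return data.get() + i * stride;
    }
};

inline block_group allocate_blocks(std::size_t num_blocks,
                                   std::size_t block_bytes)
{
    const std::size_t stride = round_up(block_bytes);
    // The total size is a multiple of the alignment, as aligned_alloc
    // expects; request at least one line so the size is never zero.
    const std::size_t total = std::max(num_blocks * stride, cache_line);
    void* p = std::aligned_alloc(cache_line, total);
    if (!p) {
        throw std::bad_alloc{};
    }
    return {std::unique_ptr<char, free_deleter>(static_cast<char*>(p)),
            stride, num_blocks};
}
```

Because the base pointer is aligned and the stride is a multiple of 64, every `block(i)` is aligned as well, and memory obtained from `aligned_alloc` can be released with a plain `free`.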

The use case I currently have in mind is dynamic "shared memory" for batched OpenMP solvers.

@Slaedr Slaedr added mod:openmp This is related to the OpenMP module. is:enhancement An improvement of an existing feature. labels Nov 8, 2021
@upsj
Member

upsj commented Nov 8, 2021

Our system is not able to do that portably (and may not until C++ supports it natively) since you can't reconstruct the original pointer returned from malloc after it has been aligned.

@Slaedr
Contributor Author

Slaedr commented Nov 9, 2021

Yes, that's why I propose to keep the original pointer in a struct/class and delete using it once we are done. For now, I'm thinking of a static scope-based object that can return the aligned pointer when needed, and it would have a destructor that would use the cached original pointer to delete the entire memory block. Maybe this is best done following the allocator interface, and the allocator can be passed to std::vector, for example, too. Boost already has the exact thing we need, and so does C++17, but I guess we don't want to use either of them, at least not for now.
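A minimal sketch of such a scope-based holder, assuming over-allocation with `malloc` plus `std::align` (both usable with C++14). The class name `aligned_buffer` and its interface are hypothetical, not an existing Ginkgo type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical RAII holder: over-allocates with malloc, aligns the pointer
// with std::align, and frees through the cached original pointer.
class aligned_buffer {
public:
    explicit aligned_buffer(std::size_t bytes, std::size_t alignment = 64)
    {
        // std::align requires a power-of-two alignment.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // Reserve enough slack to shift the start to the next boundary.
        std::size_t space = bytes + alignment;
        raw_ = std::malloc(space);
        if (!raw_) {
            throw std::bad_alloc{};
        }
        void* p = raw_;
        // Returns the first suitably aligned address inside [p, p + space).
        aligned_ = std::align(alignment, bytes, p, space);
    }

    // The original pointer, not the aligned one, goes back to free.
    ~aligned_buffer() { std::free(raw_); }

    aligned_buffer(const aligned_buffer&) = delete;
    aligned_buffer& operator=(const aligned_buffer&) = delete;

    void* get() const { return aligned_; }

private:
    void* raw_ = nullptr;      // pointer returned by malloc
    void* aligned_ = nullptr;  // aligned pointer handed to the user
};
```

Usage would look like `aligned_buffer buf(1024); auto* ws = static_cast<double*>(buf.get());`, with the memory released when `buf` goes out of scope. The cost compared to `aligned_alloc` is the extra `alignment` bytes of slack per allocation and the second stored pointer.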

@Slaedr
Contributor Author

Slaedr commented Nov 9, 2021

Wait, actually, aligned_alloc is part of the C11 standard, and C++17 just "inherits" it from there. Maybe we can just use the C11 version while still compiling with C++14. I think this covers both the alignment logic and properly freeing the memory, and it could be used in OmpExecutor::raw_alloc. aligned_alloc has a slightly greater overhead than plain malloc, though I'm positive the benefits outweigh that. In addition, I would add a small function for allocating a large block of memory containing many sub-blocks, each aligned to a cache line.

@upsj
Member

upsj commented Jul 10, 2023

With C++17, we could extend #1315 to also enable aligned allocation.

@upsj
Member

upsj commented Jul 10, 2023

We can query the L1 cache-line size (more precisely, the minimum offset needed to avoid false sharing) using the "appropriately" named C++17 constant std::hardware_destructive_interference_size
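A small sketch of how that constant could be used, assuming C++17 and falling back to 64 bytes when the standard library does not provide it. The names `cache_line_size` and `padded_counter` are hypothetical.

```cpp
#include <cstddef>
#include <new>

// Use the library-provided value when available; otherwise fall back to
// the 64 bytes assumed elsewhere in this issue.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t cache_line_size =
    std::hardware_destructive_interference_size;
#else
constexpr std::size_t cache_line_size = 64;
#endif

// Example: per-thread data padded to a full cache line, so that adjacent
// array elements written by different threads do not share a line.
struct alignas(cache_line_size) padded_counter {
    long value = 0;
};

static_assert(alignof(padded_counter) == cache_line_size,
              "counter must start on a cache-line boundary");
static_assert(sizeof(padded_counter) % cache_line_size == 0,
              "counter must occupy whole cache lines");
```

Note that this is a compile-time constant chosen by the compiler for the target, so it does not replace a configure-time or run-time query on the actual machine.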
