Providing hints for near-memory (DRAM) cache in Memory Mode? #161
Hello Intel Developers,

Is there a way to provide hints about what memory is about to be accessed soon (i.e., prefetching) and what is not going to be accessed soon and should be evicted? Do existing prefetch hints have the same effect on the near-memory cache as they do on the CPU cache? I'm asking because this would be useful in irregular applications where you know ahead of time exactly which other parts of memory will need to be accessed soon (e.g., the neighborhood of the active vertices in a graph). Does anyone know if what I want to do is possible in Memory Mode? Thanks in advance!
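To make the kind of hint I mean concrete, here is a minimal sketch using GCC/Clang's `__builtin_prefetch` over a CSR graph traversal (the CSR layout and all names here are illustrative, not from any Intel API). As I understand it, this intrinsic only targets the CPU caches; my question is whether anything similar can steer the DRAM cache in Memory Mode:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative CSR graph layout; the field names are assumptions
 * for this example, not part of any library. */
typedef struct {
    const uint64_t *row_ptr;  /* per-vertex offsets into col_idx */
    const uint32_t *col_idx;  /* flattened neighbor lists */
    const double   *values;   /* per-vertex data being gathered */
} csr_graph;

/* Sum a vertex's neighbor values, hinting a few gather targets ahead.
 * __builtin_prefetch is a CPU-cache hint only; nothing here is
 * documented to influence the Memory Mode DRAM cache. */
double sum_neighbors(const csr_graph *g, uint32_t v)
{
    const uint64_t begin = g->row_ptr[v];
    const uint64_t end   = g->row_ptr[v + 1];
    const uint64_t dist  = 8;  /* prefetch distance, tuned per machine */
    double sum = 0.0;

    for (uint64_t i = begin; i < end; i++) {
        if (i + dist < end)
            /* args: address, 0 = read, 1 = low temporal locality */
            __builtin_prefetch(&g->values[g->col_idx[i + dist]], 0, 1);
        sum += g->values[g->col_idx[i]];
    }
    return sum;
}
```

On the eviction side, the closest existing hints I know of are Linux's `madvise(MADV_COLD)` / `MADV_PAGEOUT`, but those advise the kernel's page reclaim, not the memory controller.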
Not at the hardware level. Prefetching would need to be implemented within the application or memory allocator. If you need predictability, then configuring the host in AppDirect and letting the app decide when and where the data is located is always the optimal solution. However, it requires some development unless you use a memory-allocator solution such as the one developed by MemVerge, or the open-source Memory Tiering solution currently being developed by Intel for the Linux virtual memory subsystem. Both solutions allow unmodified applications to use both DRAM and PMem. Data movement and placement in both solutions depends on access patterns (page temperature/colour) rather than prefetching, which is very hard to get correct for random accesses. (A sketch of explicit tier placement with the memkind allocator follows at the end of this comment.)

Memory Mode is an implementation of a direct-mapped cache within the Integrated Memory Controller. No prefetching occurs in the memory controller; it is on-demand. Either the page (data) is in DRAM (cache hit), in PMem (cache miss), or it is not yet loaded into memory, which causes a paging event to pull it from storage. On a cache miss, the data is read from PMem and returned to the CPU. The existing page in the cache (map entry) is evicted from DRAM to PMem, and the page just requested is copied into DRAM.

The potential number of conflicts is directly related to the ratio of DRAM to PMem. For example, a 1:8 ratio indicates that one page-map entry in DRAM maps to 8 pages/locations in PMem (the second sketch below illustrates the conflict arithmetic). If your dataset fits within DRAM, then we don't use PMem at all. If your dataset is larger than DRAM, then we should expect some cache misses. Even if your dataset is smaller than DRAM, you may still experience cache misses, because Address Space Layout Randomization (ASLR) and other factors mean we may not fully use DRAM. It just depends on what pages the application is assigned by the kernel - virtual-to-physical mappings. Tools such as PCM and VTune Platform Profiler can show the cache hit/miss ratios along with bandwidth metrics.

If your workload is predictable, then have a look at the FLEXMALLOC project/paper. By profiling the application during an initial run and processing the memory allocations and accesses, subsequent runs can be optimized by placing data in the optimal tier (DRAM or PMem) for improved performance.
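As a rough illustration of the AppDirect approach: with PMem exposed to the kernel as system-RAM NUMA nodes (KMEM DAX), the open-source memkind allocator lets the application choose the tier per allocation. `MEMKIND_DEFAULT` and `MEMKIND_DAX_KMEM` are real memkind kinds, but the hot/cold split and sizes below are only an assumed example:

```c
#include <stdio.h>
#include <memkind.h>

/* Minimal sketch: keep frequently accessed ("hot") data in DRAM and
 * rarely accessed ("cold") data in PMem, assuming the PMem namespace
 * is configured in AppDirect and onlined as a KMEM DAX NUMA node.
 * Build with: cc tier.c -lmemkind */
int main(void)
{
    size_t hot_sz  = 64 * 1024;        /* example sizes only */
    size_t cold_sz = 16 * 1024 * 1024;

    double *hot  = memkind_malloc(MEMKIND_DEFAULT,  hot_sz);   /* DRAM */
    double *cold = memkind_malloc(MEMKIND_DAX_KMEM, cold_sz);  /* PMem */
    if (!hot || !cold) {
        fprintf(stderr, "allocation failed (is a KMEM DAX node online?)\n");
        return 1;
    }

    /* ... the application decides which structures live in which tier ... */

    memkind_free(MEMKIND_DEFAULT,  hot);
    memkind_free(MEMKIND_DAX_KMEM, cold);
    return 0;
}
```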
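And the conflict arithmetic mentioned above, as a back-of-the-envelope model. The actual iMC mapping function and caching granularity are not public, so the modulo mapping, block size, and capacities here are all assumptions, chosen only to show why a 1:8 DRAM:PMem ratio means 8 far-memory blocks contend for each DRAM slot:

```c
#include <stdio.h>
#include <stdint.h>

/* Toy model of a direct-mapped near-memory cache: with a simple modulo
 * mapping, a PMem address maps to DRAM slot (addr / block) % num_slots.
 * Two PMem addresses exactly one DRAM-size apart land in the same slot,
 * so touching them alternately evicts each in turn. */
int main(void)
{
    const uint64_t dram  = 128ULL << 30;   /* 128 GiB DRAM (example) */
    const uint64_t pmem  = 1024ULL << 30;  /* 1 TiB PMem   (example) */
    const uint64_t block = 4096;           /* assumed block size     */
    const uint64_t slots = dram / block;

    printf("DRAM:PMem ratio = 1:%llu\n",
           (unsigned long long)(pmem / dram));

    uint64_t a = 0x12345000ULL;
    uint64_t b = a + dram;                 /* one DRAM-size away */
    printf("slot(a)=%llu slot(b)=%llu -> %s\n",
           (unsigned long long)((a / block) % slots),
           (unsigned long long)((b / block) % slots),
           ((a / block) % slots) == ((b / block) % slots)
               ? "conflict" : "no conflict");
    return 0;
}
```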
Thanks for commenting, @sscargal. That is a much more thorough response than the one I was writing.
Yes, thank you @sscargal, I truly appreciate the very detailed and thorough response. MemVerge's Memory Machine caught my eye for a bit there. I'll do some more reading on the above and let you know if I have any more questions. Thanks again!