Providing hints for near-memory (DRAM) cache in Memory Mode? #161

LouisJenkinsCS opened this issue Feb 2, 2021 · 3 comments

@LouisJenkinsCS

Hello Intel Developers,

Is there a way to provide hints about what memory is about to be accessed soon, i.e. prefetching, and what is not going to be accessed soon and should be evicted? Do existing prefetch hints have the same effect on the near-memory cache as they would on the CPU cache? I'm wondering because it would be useful in, say, irregular applications where you know ahead of time exactly which other parts of memory will need to be accessed soon (e.g. the neighborhood of active vertices in a graph). Does anyone know if what I want to do is possible in Memory Mode? Thanks in advance!

@sscargal (Contributor) commented Feb 2, 2021

@LouisJenkinsCS

Is there a way to provide hints about what memory is about to be accessed soon, i.e. prefetching

Not at the hardware level. Prefetching would need to be implemented within the application or memory allocator.
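To make the distinction concrete, here is a minimal sketch of what application-level prefetching looks like, using the GCC/Clang `__builtin_prefetch` builtin for a graph-style traversal similar to the one described in the question. The function name, data layout, and lookahead distance are illustrative assumptions; note that these hints target the CPU cache hierarchy, not the Memory Mode near-memory cache, which remains demand-driven.

```c
/*
 * Illustrative sketch only: software prefetch hints for a graph traversal.
 * __builtin_prefetch (GCC/Clang) emits CPU prefetch instructions; as noted
 * above, these affect the CPU caches and do not steer the Memory Mode
 * near-memory (DRAM) cache, which is demand-driven.
 */
#include <stddef.h>

void visit_neighbors(const size_t *neighbors, size_t degree,
                     const double *vertex_data, double *out)
{
    const size_t lookahead = 8;   /* assumed/tunable prefetch distance */
    double sum = 0.0;

    for (size_t i = 0; i < degree; i++) {
        if (i + lookahead < degree) {
            /* Hint that the vertex we will touch shortly is read-only (0)
             * and has moderate temporal locality (level 2). */
            __builtin_prefetch(&vertex_data[neighbors[i + lookahead]], 0, 2);
        }
        sum += vertex_data[neighbors[i]];
    }
    *out = sum;
}
```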

If you need predictability, then configuring the host in AppDirect and letting the application decide when and where the data is located is the optimal solution. However, it requires some development unless you use a memory allocator solution such as the one developed by MemVerge, or the open-source memory tiering solution currently being developed by Intel for the Linux virtual memory subsystem. Both solutions allow unmodified applications to use both DRAM and PMem. Data movement and placement in both solutions depends on access patterns (page temperature/colour) rather than prefetching, which is very hard to get right for random accesses.
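As a minimal sketch of what explicit AppDirect placement can look like, the open-source memkind library lets an application allocate from DRAM and from a PMem-backed pool side by side. The mount point `/mnt/pmem0` and the allocation sizes below are assumptions for a fsdax namespace on the host; adjust them for your configuration.

```c
/*
 * Minimal sketch (not a drop-in implementation): explicit DRAM vs. PMem
 * placement in AppDirect mode using the memkind library.
 * "/mnt/pmem0" is an assumed DAX-enabled filesystem mount point.
 */
#include <memkind.h>
#include <stdio.h>

int main(void)
{
    memkind_t pmem_kind = NULL;

    /* Create a PMem kind backed by the DAX filesystem; max_size of 0 lets
     * the kind grow up to the size of the underlying filesystem. */
    if (memkind_create_pmem("/mnt/pmem0", 0, &pmem_kind) != 0) {
        fprintf(stderr, "memkind_create_pmem failed\n");
        return 1;
    }

    /* Hot, latency-sensitive data is placed in DRAM ... */
    double *hot = memkind_malloc(MEMKIND_DEFAULT, 1024 * sizeof(double));
    /* ... while large, colder data is placed in PMem explicitly. */
    double *cold = memkind_malloc(pmem_kind, (64UL << 20) * sizeof(double));

    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    hot[0]  = 1.0;
    cold[0] = 2.0;
    printf("hot[0]=%f cold[0]=%f\n", hot[0], cold[0]);

    memkind_free(MEMKIND_DEFAULT, hot);
    memkind_free(pmem_kind, cold);
    memkind_destroy_kind(pmem_kind);
    return 0;
}
```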

Memory Mode is an implementation of a Direct Mapped Cache within the Integrated Memory Controller. No prefetching occurs in the memory controller; it operates on demand. Either the page (data) is in DRAM (cache hit), in PMem (cache miss), or it is not yet loaded into memory, which causes a paging event to pull it from storage. On a cache miss, the data is read from PMem and returned to the CPU. The existing page in the cache (map entry) is evicted from DRAM to PMem, and the page just read/requested is copied into DRAM.

The potential number of conflicts is directly related to the ratio of DRAM to PMem. For example, a 1:8 ratio indicates one page map entry in DRAM maps to 8 pages/locations in PMem. If your dataset fits within DRAM, then we don't use PMem at all. If your dataset is larger than DRAM, then you should expect some cache hits and misses. Even if your dataset is smaller than DRAM, you may still experience cache misses, because Address Space Layout Randomization (ASLR) and other factors mean DRAM may not be fully used. It just depends on which physical pages the kernel assigns to the application, i.e. the virtual-to-physical mappings.
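A toy model may help illustrate the conflict behaviour described above. The real index function used by the integrated memory controller is not documented here; this sketch simply assumes a textbook direct-mapped cache (address modulo cache size) to show why, at a 1:8 DRAM:PMem ratio, multiple PMem locations compete for the same DRAM slot. The 128 GiB / 1 TiB capacities are made-up example values.

```c
/*
 * Illustrative only: a toy model of a direct-mapped near-memory cache.
 * Not Intel's actual hardware mapping; it just shows that addresses which
 * are exactly a cache-size apart land on the same DRAM slot and evict
 * each other when accessed alternately (conflict misses).
 */
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE  64ULL                /* bytes per cache line        */
#define DRAM_BYTES  (128ULL << 30)       /* 128 GiB DRAM acting as cache */
#define PMEM_BYTES  (1024ULL << 30)      /* 1 TiB PMem backing capacity  */

static uint64_t dram_slot(uint64_t pmem_addr)
{
    /* Direct-mapped: the slot is the address modulo the cache size. */
    return (pmem_addr % DRAM_BYTES) / CACHE_LINE;
}

int main(void)
{
    uint64_t a = 0x1000;
    uint64_t b = a + DRAM_BYTES;   /* same slot as 'a' in this toy model */

    printf("ratio DRAM:PMem = 1:%llu\n",
           (unsigned long long)(PMEM_BYTES / DRAM_BYTES));
    printf("slot(a) = %llu, slot(b) = %llu -> %s\n",
           (unsigned long long)dram_slot(a),
           (unsigned long long)dram_slot(b),
           dram_slot(a) == dram_slot(b) ? "conflict" : "no conflict");
    return 0;
}
```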

Tools such as PCM and VTune Platform Profiler can show the cache hit/miss ratios along with bandwidth metrics.

If your workload is predictable, then have a look at the FLEXMALLOC project/paper. By profiling the application during an initial run and processing the memory allocations and accesses, subsequent runs can be optimized by placing data in the optimal tier (DRAM or PMem) for improved performance.

@StevenPontsler (Contributor) commented

Thanks for commenting, sscargal. That is a much more thorough response than I was writing.

@LouisJenkinsCS (Author) commented

Yes, thank you @sscargal, I truly appreciate the very detailed and thorough response. MemVerge's Memory Machine caught my eye for a bit there. I'll do some more reading on the above and let you know if I have any more questions. Thanks again!
