A small utility allowing programmatic access to information about the memory pages backing a given region of memory on Linux. For example, with page-info you can answer questions like:
- How many of the pages associated with memory I just got from malloc are physically present in RAM?
- How much more information can I put on the stack before suffering a page fault?
- What fraction of this allocation is backed by huge pages?
- Have any pages in this range been swapped out to the swapfile?
Basically this parses the /proc/$PID/pagemap file for the current process, which returns basic per-page information (one 64-bit entry per virtual page), and then if possible it looks up more interesting flags on a per-page basis in /proc/kpageflags. The available flags are documented here and more briefly on the proc manpage.
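To make the pagemap step concrete, here is a minimal sketch of reading and decoding one entry for the current process. It is not the code in page-info.c: the names pm_entry, decode_pagemap_entry and read_pagemap_entry are made up for illustration, and the bit positions are the ones given in the kernel's pagemap documentation.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Illustrative (hypothetical) decoded view of one 64-bit pagemap entry. */
typedef struct {
    bool present;    /* bit 63: a physical page is allocated */
    bool swapped;    /* bit 62: page is swapped out */
    bool file;       /* bit 61: file-mapped or shared-anonymous page */
    bool exclusive;  /* bit 56: page is exclusively mapped */
    bool softdirty;  /* bit 55: soft-dirty */
    uint64_t pfn;    /* bits 0-54, meaningful only if present;
                        reads as zero without CAP_SYS_ADMIN on newer kernels */
} pm_entry;

/* Split a raw pagemap entry into its flag bits and PFN. */
pm_entry decode_pagemap_entry(uint64_t raw) {
    const uint64_t pfn_mask = (UINT64_C(1) << 55) - 1;
    pm_entry e;
    e.present   = (raw >> 63) & 1;
    e.swapped   = (raw >> 62) & 1;
    e.file      = (raw >> 61) & 1;
    e.exclusive = (raw >> 56) & 1;
    e.softdirty = (raw >> 55) & 1;
    e.pfn       = e.present ? (raw & pfn_mask) : 0;
    return e;
}

/* Read the raw entry for the page containing addr.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int read_pagemap_entry(const void *addr, uint64_t *raw_out) {
    assert(raw_out != NULL);
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return -1;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) return -1;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size)
                   * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, raw_out, sizeof *raw_out, offset);
    close(fd);
    return n == (ssize_t)sizeof *raw_out ? 0 : -1;
}
```

Calling read_pagemap_entry(ptr, &raw) followed by decode_pagemap_entry(raw) gives the same handful of flags a non-root user can see. Opening the file once and reading many entries in one pread would be the obvious optimization for large ranges.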
As a simple example, here's a snippet which prints to stdout the percentage of pages that have been allocated with huge pages.
char *array = malloc(size);
memset(array, 1, size); // commit the pages
page_info_array pinfo = get_info_for_range(array, array + size);
flag_count thp_count = get_flag_count(pinfo, KPF_THP);
if (thp_count.pages_available) {
    printf("Source pages allocated with transparent hugepages: %4.1f%% (%lu total pages, %4.1f%% flagged)\n",
           100.0 * thp_count.pages_set / thp_count.pages_total,
           thp_count.pages_total,
           100.0 * thp_count.pages_available / thp_count.pages_total);
} else {
    printf("Couldn't determine hugepage info (you are probably not running as root)\n");
}
A slightly more complete version of this example is available as the standalone program malloc-demo. On my system it reports the following (this depends heavily on the value in /sys/kernel/mm/transparent_hugepage/enabled):
Allocating an array of size 7168 KiB using malloc
Source pages allocated with transparent hugepages: 85.7% (1793 total pages, 100.0% flagged)
Unfortunately (from the perspective of those wanting to use this library to its maximum capability), most of the juicy information about backing pages lives in the /proc/kpageflags file, and this file is only accessible as root. You can still use this utility as a regular user, but only a handful of flags that are encoded directly in /proc/$PID/pagemap are available. They are those directly named in the page_info structure in page-info.h:
/* soft-dirty set */
bool softdirty;
/* exclusively mapped, see e.g., https://patchwork.kernel.org/patch/6787921/ */
bool exclusive;
/* is a file mapping */
bool file;
/* page is swapped out */
bool swapped;
/* page is present, i.e., a physical page is allocated */
bool present;
So you can determine if a page is present, swapped out, its soft-dirty status, whether it is exclusive and whether it is a file mapping, but not much more. On older kernels, you can also get the physical frame number (the pfn field), which is essentially the physical address of the page shifted right by 12 (for 4 KiB pages).
So if you want the full info about a mapped region, you have to run this as root. This could probably be fixed as described in issue #3, but that work hasn't been done yet.
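As a small worked example of the relationship between the pfn field mentioned above and a physical address, the sketch below converts in both directions. The function names are hypothetical, and the shift of 12 assumes 4 KiB base pages.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, so the page offset is the low 12 bits. */
enum { EXAMPLE_PAGE_SHIFT = 12 };

/* Physical address of the first byte of the frame. */
uint64_t pfn_to_phys(uint64_t pfn, unsigned page_shift) {
    assert(page_shift < 64);
    return pfn << page_shift;
}

/* Frame number containing a given physical address. */
uint64_t phys_to_pfn(uint64_t phys, unsigned page_shift) {
    assert(page_shift < 64);
    return phys >> page_shift;
}

/* Physical address for a virtual address whose page maps to pfn:
   the frame base plus the offset of vaddr within its page. */
uint64_t virt_to_phys(uint64_t pfn, uint64_t vaddr, unsigned page_shift) {
    assert(page_shift < 64);
    uint64_t offset_mask = (UINT64_C(1) << page_shift) - 1;
    return (pfn << page_shift) | (vaddr & offset_mask);
}
```

For instance, a pfn of 0x1234 with a 12-bit shift corresponds to the physical range starting at 0x1234000.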
Just run make, which builds the page-info-test binary.
You can run the page-info-test binary to see the information obtained by getting page info on a series of allocations via malloc, starting at 256 KiB and running through 4 GiB. Information is presented both before and after touching each page in the allocation via memset. The two differ because, for larger allocation sizes, most pages in the allocation are not present until you touch them, so only limited information is available beforehand (indeed, there are no pages backing them, so questions about the nature of the backing pages have no answer).
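The before/after effect can also be observed without this library. The sketch below is a rough cross-check using mincore() rather than /proc/$PID/pagemap, so it is a different technique from the one page-info uses. It maps anonymous memory with mmap (not malloc) because mincore needs a page-aligned address; the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in [addr, addr + len); addr must be page-aligned.
   Returns -1 on error. */
long count_resident_pages(void *addr, size_t len) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return -1;
    size_t npages = (len + (size_t)page_size - 1) / (size_t)page_size;
    unsigned char *vec = malloc(npages);
    if (!vec) return -1;
    long resident = -1;
    if (mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;  /* low bit set means resident */
    }
    free(vec);
    return resident;
}

/* Map len bytes, record residency before and after touching every page.
   Returns 0 on success, -1 on failure. */
int residency_before_after(size_t len, long *before, long *after) {
    assert(before != NULL && after != NULL);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    *before = count_resident_pages(p, len);
    memset(p, 1, len);  /* touch (commit) every page */
    *after = count_resident_pages(p, len);
    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical Linux system the "before" count for a fresh mapping is expected to be small and the "after" count to equal the number of pages, though the exact "before" value depends on the kernel and on transparent hugepage settings.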
Here's a portion of the page-info-test output on my system:
PFN sdirty excl file swappd presnt LOCK ACTI SLAB BUDD MMAP ANON SWAP SWAP COMP COMP HUGE UNEV HWPO NOPA KSM THP BALL ZERO IDLE
MADV_HUGEPAGE 2.00 MiB BEFORE memset: ---------- 1.0000 0.0019 0.0000 0.0000 0.0019 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_HUGEPAGE 2.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NORMAL 2.00 MiB BEFORE memset: ---------- 1.0000 0.5029 0.0000 0.0000 0.5029 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NORMAL 2.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NOHUGEPAGE 2.00 MiB BEFORE memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NOHUGEPAGE 2.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_HUGEPAGE 4.00 MiB BEFORE memset: ---------- 1.0000 0.0010 0.0000 0.0000 0.0010 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_HUGEPAGE 4.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.50 0.00 0.00 1.00 1.00 0.00 0.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00
MADV_NORMAL 4.00 MiB BEFORE memset: ---------- 1.0000 0.5015 0.0000 0.0000 0.5015 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NORMAL 4.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NOHUGEPAGE 4.00 MiB BEFORE memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NOHUGEPAGE 4.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_HUGEPAGE 8.00 MiB BEFORE memset: ---------- 1.0000 0.0005 0.0000 0.0000 0.0005 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_HUGEPAGE 8.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.25 0.00 0.00 1.00 1.00 0.00 0.25 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00
MADV_NORMAL 8.00 MiB BEFORE memset: ---------- 1.0000 0.0010 0.0000 0.0000 0.0010 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NORMAL 8.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.25 0.00 0.00 1.00 1.00 0.00 0.25 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00
MADV_NOHUGEPAGE 8.00 MiB BEFORE memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.25 0.00 0.00 1.00 1.00 0.00 0.25 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00
MADV_NOHUGEPAGE 8.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.25 0.00 0.00 1.00 1.00 0.00 0.25 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00
MADV_HUGEPAGE 16.00 MiB BEFORE memset: ---------- 1.0000 0.0002 0.0000 0.0000 0.0002 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_HUGEPAGE 16.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.13 0.00 0.00 1.00 1.00 0.00 0.13 0.00 0.87 0.00 0.00 0.00 0.00 0.00 0.87 0.00 0.00 0.00
MADV_NORMAL 16.00 MiB BEFORE memset: ---------- 1.0000 0.5004 0.0000 0.0000 0.5004 0.00 0.25 0.00 0.00 1.00 1.00 0.00 0.25 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00
MADV_NORMAL 16.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.25 0.00 0.00 1.00 1.00 0.00 0.25 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00
MADV_NOHUGEPAGE 16.00 MiB BEFORE memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.25 0.00 0.00 1.00 1.00 0.00 0.25 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00
MADV_NOHUGEPAGE 16.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.25 0.00 0.00 1.00 1.00 0.00 0.25 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.00 0.00 0.00
MADV_HUGEPAGE 32.00 MiB BEFORE memset: ---------- 1.0000 0.0001 0.0000 0.0000 0.0001 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_HUGEPAGE 32.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.06 0.00 0.00 1.00 1.00 0.00 0.06 0.00 0.94 0.00 0.00 0.00 0.00 0.00 0.94 0.00 0.00 0.00
MADV_NORMAL 32.00 MiB BEFORE memset: ---------- 1.0000 0.0001 0.0000 0.0000 0.0001 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NORMAL 32.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 0.06 0.00 0.00 1.00 1.00 0.00 0.06 0.00 0.94 0.00 0.00 0.00 0.00 0.00 0.94 0.00 0.00 0.00
MADV_NOHUGEPAGE 32.00 MiB BEFORE memset: ---------- 1.0000 0.0001 0.0000 0.0000 0.0001 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MADV_NOHUGEPAGE 32.00 MiB AFTER memset: ---------- 1.0000 1.0000 0.0000 0.0000 1.0000 0.00 1.00 0.00 0.00 1.00 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
You can see, for example, by looking at the presnt column, what fraction of pages are "present in RAM": as the allocations become larger, only a small fraction (usually the first page) of an allocation is present after allocation, but all are present following the memset. You can also look at the THP column to see that some fraction of the larger allocations are usually backed by huge pages, depending on the advice passed in the madvise() call.
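For readers unfamiliar with madvise(), the following sketch shows one way to request (or forbid) transparent huge pages for an allocation. It is not taken from page-info-test: alloc_with_advice is a hypothetical helper, and the 2 MiB alignment assumes the usual x86-64 huge page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumption: 2 MiB transparent huge pages (typical on x86-64). */
#define EXAMPLE_HUGE_PAGE_SIZE ((size_t)2 << 20)

/* Allocate size bytes aligned to the huge page size and apply advice
   (e.g. MADV_HUGEPAGE, MADV_NOHUGEPAGE or MADV_NORMAL).
   Returns NULL only if the allocation itself fails; the madvise result
   (0 or -1) is stored in *advice_rc when advice_rc is non-NULL.
   Release the memory with free(). */
void *alloc_with_advice(size_t size, int advice, int *advice_rc) {
    assert(size > 0);
    void *p = NULL;
    if (posix_memalign(&p, EXAMPLE_HUGE_PAGE_SIZE, size) != 0)
        return NULL;
    int rc = madvise(p, size, advice);  /* advisory: failure is not fatal */
    if (advice_rc)
        *advice_rc = rc;
    return p;
}
```

Aligning to the huge page size gives the kernel the chance to back the region with whole huge pages once it is touched. Whether it actually does so still depends on /sys/kernel/mm/transparent_hugepage/enabled and on available memory, and madvise can fail with EINVAL for MADV_HUGEPAGE on kernels built without THP support.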
There are many other columns which have more or less interesting information depending on your scenario. The first few columns in lowercase (sdirty excl file swappd presnt) are available without special permissions since they come from /proc/$PID/pagemap, but the following uppercase columns require /proc/kpageflags access and so are generally only available to processes running as root (more precisely, those with the CAP_SYS_ADMIN capability).
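The privileged lookup can be sketched as follows: given a PFN, read the corresponding 64-bit word from /proc/kpageflags and test a bit. This is an illustration under stated assumptions rather than the library's implementation; the helper names are hypothetical, and the bit index 22 for the THP flag is taken from the kernel's kernel-page-flags.h.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Bit index of the THP flag in a kpageflags word
   (per include/uapi/linux/kernel-page-flags.h). */
enum { EXAMPLE_KPF_THP_BIT = 22 };

/* Test a single flag bit in a kpageflags word. */
bool kpf_test(uint64_t flags, unsigned bit) {
    assert(bit < 64);
    return (flags >> bit) & 1;
}

/* Read the flags word for one physical frame.
   Returns 0 on success, -1 on failure (typically a permission error
   when not running with CAP_SYS_ADMIN). */
int read_kpageflags(uint64_t pfn, uint64_t *flags_out) {
    assert(flags_out != NULL);
    int fd = open("/proc/kpageflags", O_RDONLY);
    if (fd < 0) return -1;
    /* One 8-byte word per physical frame, indexed by PFN. */
    off_t offset = (off_t)(pfn * sizeof(uint64_t));
    ssize_t n = pread(fd, flags_out, sizeof *flags_out, offset);
    close(fd);
    return n == (ssize_t)sizeof *flags_out ? 0 : -1;
}
```

A return value of -1 from read_kpageflags is the case in which only the lowercase columns can be filled in.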
Just copy page-info.c and page-info.h into your project and include page-info.h in any file where you want to access the exposed functions.