Using the library with much lighter overhead #602
Replies: 3 comments
-
The instruction count (156.010) is not even the same order of magnitude as the iteration count (10.000.000). Perhaps the thread was migrated to another core? Could you try pinning the thread with pthread_setaffinity_np: https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
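For reference, a minimal sketch of the pinning suggested above, assuming Linux with glibc. The helper names `pin_to_cpu` and `pin_to_first_allowed_cpu` are made up for illustration and are not part of PCM; only the `pthread_*affinity_np` calls and the `CPU_*` macros come from the man page linked above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes cpu_set_t macros and pthread_*affinity_np */
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success or the error number from pthread_setaffinity_np. */
int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}

/* Hypothetical helper: pin the calling thread to the lowest-numbered CPU
 * it is currently allowed to run on. Returns that CPU index, or -1 on error. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed))
            return pin_to_cpu(cpu) == 0 ? cpu : -1;
    }
    return -1; /* empty affinity mask, should not happen in practice */
}
```

Call `pin_to_first_allowed_cpu()` at the start of the measured thread, before the counters are programmed or read, so that every reading comes from the same core. Older glibc versions need `-pthread` when compiling and linking.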
-
Thank you for taking the time to respond. Intel PMUs are a great feature, by the way. My error: pinning the thread is required, as you rightly point out.
Now contrast counters 2 and 3 (the LLC hit and miss metrics) with this alternative run on the same machine, running on a pinned lcore.
I'm guessing the guidance for PCM is to run a no-op, measure the overhead PCM introduces, and then run the code under test a bunch of times and subtract out (or average out) the overhead?
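To make that guess concrete, here is a rough sketch of the bookkeeping, assuming the counter deltas have already been collected into arrays: one set from an empty (no-op) region and one from the code under test. The names `median_u64` and `net_count` are made up for illustration and nothing here calls PCM. It uses the median rather than the mean, which is my own choice so that a single outlier run (an interrupt, for example) does not skew the baseline.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison for qsort over uint64_t values. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (upper median when n is even).
 * Works on a copy so the caller's array is not reordered. */
uint64_t median_u64(const uint64_t *samples, size_t n)
{
    assert(n > 0);
    uint64_t *tmp = malloc(n * sizeof *tmp);
    assert(tmp != NULL);
    memcpy(tmp, samples, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_u64);
    uint64_t m = tmp[n / 2];
    free(tmp);
    return m;
}

/* Estimated count attributable to the code under test:
 * median(measured) - median(empty), clamped at zero. */
uint64_t net_count(const uint64_t *empty, size_t n_empty,
                   const uint64_t *measured, size_t n_measured)
{
    uint64_t base = median_u64(empty, n_empty);
    uint64_t raw = median_u64(measured, n_measured);
    return raw > base ? raw - base : 0;
}
```

Fill `empty` with counter deltas taken around an empty region and `measured` with deltas taken around the real code, both on the same pinned core, then call `net_count` once per counter. If the result is close to zero or jumps around between runs, the measured region is probably too short relative to the measurement overhead.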
-
I think the difference here is at noise level. For instructions retired, for example, it is 0.06%.
-
Consider this example I added to a fork of this repository, which is basically a redo of the supplied c_example.c: https://github.com/rodgarrison/pcm/blob/master/examples1/example1.cpp
the nub of the code:
run with:
Giving this output:
While there will be sporadic LLC hits/misses, the counts (counter2: 16334, counter3: 720) are insanely high. My interpretation is that all of that, probably leaking into the instruction and cycle counts as well, is the overhead of the API, which fetches loads and loads of data. All that noise makes the reported stats hard to understand. Is there an example using this library that is way, way lighter in overhead? Usually for this kind of micro benchmarking we're looking to profile with PMU,