The execution time of micro benchmark is not consistent #868
Comments
@helloguo can it be affected by AVX frequency throttling?
This might be a problem with the in-process toolchain. Try the following config, which creates a netcoreapp3.0 toolchain with fairly recent daily bits:

```csharp
private static IConfig CreateClrVsCoreConfig()
{
    var config = DefaultConfig.Instance.With(
        Job.Default
            .With(CustomCoreClrToolchainBuilder.Create()
                .UseCoreClrNuGet("3.0.0-preview1-26905-04")
                .UseCoreFxNuGet("3.0.0-preview1-26904-01")
                .TargetFrameworkMoniker("netcoreapp3.0")
                .ToToolchain())
            .WithLaunchCount(1));
    return config;
}
```

Edit: With your repro I saw the same behaviour a few times (not always) with the in-process toolchain, but never with the custom one.
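For completeness, a hedged sketch of how a config like the one above is typically passed to the runner; `YourBenchmarks` is a placeholder for the actual benchmark class:

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main()
    {
        // CreateClrVsCoreConfig is the method defined above; substitute
        // the class that contains your [Benchmark] methods.
        BenchmarkRunner.Run<YourBenchmarks>(CreateClrVsCoreConfig());
    }
}
```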
@EgorBo Thank you for your input. I did not see the frequency change that much when I profiled the benchmark. @Tornhoof Thank you for your suggestion. Yes, CustomCoreClrToolchainBuilder makes the results more consistent, so I guess I should use CustomCoreClrToolchainBuilder instead of InProcessToolchain. It would still be interesting to know why the variance happens with InProcessToolchain. Any idea how to root-cause it?
@helloguo As you use netcoreapp3.0, I would probably first rule out the tiered JIT by setting `SET COMPlus_TieredCompilation=0` and running it again. I think I saw an issue over at coreclr where the HW intrinsics behave slightly differently with tiered compilation. If that does not change it, I have no idea. Maybe @adamsitnik knows a good way to debug that.
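To keep that experiment inside the benchmark itself, a hedged sketch, assuming a BenchmarkDotNet version with per-job environment variable support:

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

static class TieringConfigs
{
    // Config that runs every benchmark with tiered JIT compilation
    // disabled, so its effect can be separated from other causes.
    public static IConfig WithoutTiering() =>
        DefaultConfig.Instance.With(
            Job.Default.With(new[]
            {
                new EnvironmentVariable("COMPlus_TieredCompilation", "0")
            }));
}
```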
Thank you for your suggestion. It seems alignment (cache-line splits) makes the difference, according to that blog post. But if we use CustomCoreClrToolchainBuilder, we do not see much variance; in that case we probably just get lucky and the arrays are 32-byte aligned most of the time? The ideal case would be to control whether the array is 32-byte aligned or not, so I could test the function against both an aligned and an unaligned array. Unfortunately, I'm not aware of any way to do that. Is there a way to reduce the impact of the alignment of the tested array? Maybe measure it multiple times and take the median?
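One manual workaround I can think of, a rough sketch rather than a BenchmarkDotNet feature (the helper name and offsets are made up): over-allocate a pinned array and choose the start offset by hand, so the same kernel can be measured at a controlled distance from a 32-byte boundary.

```csharp
using System;
using System.Runtime.InteropServices;

static class AlignmentHelper
{
    // Returns a span of `count` floats whose first element sits
    // `misalignment` bytes past a 32-byte boundary. The backing array
    // must be pinned so the GC cannot move it while raw pointers are
    // in use, and over-allocated (an extra 16 floats is plenty).
    public static unsafe Span<float> OffsetSpan(
        GCHandle pinnedBacking, int count, int misalignment)
    {
        byte* p = (byte*)pinnedBacking.AddrOfPinnedObject();
        byte* start = (byte*)(((ulong)p + 31) & ~31UL) + misalignment;
        return new Span<float>(start, count);
    }
}

// Usage idea: run the same kernel over both spans and compare.
//   var backing = new float[count + 16];
//   var pin = GCHandle.Alloc(backing, GCHandleType.Pinned);
//   var aligned = AlignmentHelper.OffsetSpan(pin, count, 0);
//   var shifted = AlignmentHelper.OffsetSpan(pin, count, 4);
//   ...
//   pin.Free();
```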
I guess so. We had a very long discussion about this in #756. What I think you could do:
Thank you. I will close this issue if there are no more concerns.
@helloguo I am glad I could help! Please let me know if it helps or not.
I have a micro benchmark that looks like this.
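For illustration, a minimal sketch of its shape; `ScaleU`, its body, and the array size are hypothetical stand-ins, and the full source is in the repo linked below.

```csharp
using BenchmarkDotNet.Attributes;

public class ScaleBenchmark
{
    private float[] _data;

    [GlobalSetup]
    public void Setup() => _data = new float[4096]; // size is illustrative

    // Three benchmarks deliberately calling the exact same code path.
    [Benchmark] public void ScaleUPerf0() => ScaleU(_data, 2.0f);
    [Benchmark] public void ScaleUPerf1() => ScaleU(_data, 2.0f);
    [Benchmark] public void ScaleUPerf2() => ScaleU(_data, 2.0f);

    private static void ScaleU(float[] a, float s)
    {
        for (int i = 0; i < a.Length; i++)
            a[i] *= s;
    }
}
```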
ScaleUPerf0, ScaleUPerf1, and ScaleUPerf2 actually test exactly the same function, so I expected the execution times of the three benchmarks to be similar. However, when I run them, the perf data is quite different.
I was wondering what the reasons might be. The whole micro benchmark can be found here: https://github.com/helloguo/tmp-code/tree/master/bench