Should gem5 results be portable across software environments? #1517

aperais · 2024-08-26T21:55:18Z

Hello all,

My student and I ran into an unexpected issue running the same gem5 code, FS image and checkpoints on two machines with different environments:

Debian 11, kernel 5.10.0-26-amd64, gcc 10.2.1, Python 3.9.2
Debian 11, kernel 6.6.13+bpo-amd64, clang 9.0.0, Python 3.6.15

In both cases, I am running an old gem5 (4ef1f17e0f9c6f15b8ad63eff7a2d07025c29709 from 2020), but after looking at current stable and develop, I think the problem is still there. The problem is that when running a single-core (O3CPU) simulation in FS mode with the classic memory model, restoring from a simpoint, I get different stats (including IPC) on the two machines.

The reason, it appears, is in src/base/random.hh. A number of components (notably the branch predictors, some replacement policies, and the fetch stage of O3CPU) use the random_mt object of type Random to obtain random values. Under the hood, random_mt relies on a member of type std::mt19937_64 called gen to provide numbers. That part is portable, because std::mt19937_64 is not left to the implementation: it follows a specific algorithm with specific initial values.
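The portability of the raw engine can be checked directly. The C++ standard requires that the 10000th consecutive invocation of a default-constructed std::mt19937_64 produces 9981545732273789042, so a conforming standard library must pass the check below regardless of compiler. This is only a minimal sketch; the function name tenThousandthOutput is made up for illustration and is not part of gem5.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper (not gem5 code): returns the 10000th output of a
// default-constructed std::mt19937_64.
std::uint64_t tenThousandthOutput()
{
    std::mt19937_64 gen;  // default seed is fixed by the standard
    gen.discard(9999);    // skip the first 9999 outputs
    return gen();         // the 10000th output
}
```

Comparing the returned value against 9981545732273789042 on each host is a quick way to confirm that any divergence comes from the layer above the engine and not from the engine itself.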

However, instead of providing a random number by calling operator() on gen, we go through a std::uniform_int_distribution whenever we want a number whose range is smaller than the 64 bits provided by gen(), as the fetch stage of O3CPU does, for instance (src/cpu/o3/fetch.cc:883 on stable):

std::advance(tid_itr,
             random_mt.random<uint8_t>(0, activeThreads->size() - 1));

It turns out that std::uniform_int_distribution is implementation-dependent (https://www.ida.liu.se/~TDDD38/ISOCPP/rand.dist.general.html or point 29.6.8.1.3 of the standard). If I change the code in src/base/random.hh from (what is currently in stable and develop):

template <typename T>
typename std::enable_if_t<std::is_integral_v<T>, T>
random()
{
    // [0, max_value] for integer types
    std::uniform_int_distribution<T> dist;
    return dist(gen);
}

to:

template <typename T>
typename std::enable_if_t<std::is_integral_v<T>, T>
random()
{
    // [0, max_value] for integer types
    return gen() % std::numeric_limits<T>::max();
}

Then I get the same result on both machines, because the RNG is deterministic and the max value is the same on gcc and clang for uint8_t. I'm not entirely sure how to deal with the FP version (I haven't thought about it very hard), and the quality of the randomness may not be the same with this technique as with std::uniform_int_distribution.
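On the quality concern: as written, gen() % max can only return values in [0, max - 1], and a plain modulo carries a small bias whenever the modulus does not divide 2^64. One possible way to keep the result portable while avoiding both issues is rejection sampling on the raw engine output, using only integer arithmetic. The following is a sketch under that assumption, not the fix adopted by gem5; portableRandom and portableRandomDouble are hypothetical names. The FP variant takes the top 53 bits of the engine output and scales them into [0, 1), which assumes an IEEE 754 double.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical helper (not gem5 code): uniform value in [min, max], built
// only from the engine output so that it does not depend on the standard
// library's distribution implementation.
std::uint64_t portableRandom(std::mt19937_64 &gen,
                             std::uint64_t min, std::uint64_t max)
{
    assert(min <= max);
    const std::uint64_t span = max - min + 1;  // wraps to 0 for full range
    if (span == 0)
        return gen();                          // full 64-bit range requested

    // rem = 2^64 mod span, computed without overflowing 64 bits.
    const std::uint64_t rem = (UINT64_MAX % span + 1) % span;
    // Accept only outputs below the largest multiple of span.
    const std::uint64_t limit = UINT64_MAX - rem;

    std::uint64_t x;
    do {
        x = gen();
    } while (x > limit);                       // reject the biased tail
    return min + x % span;
}

// Hypothetical FP counterpart: uniform double in [0, 1) from the top 53
// bits of the engine output (assumes a 53-bit double mantissa).
double portableRandomDouble(std::mt19937_64 &gen)
{
    return std::ldexp(static_cast<double>(gen() >> 11), -53);
}
```

The rejection loop may consume a variable number of engine outputs per call, but that number depends only on the engine sequence and the requested range, so two hosts that start from the same seed still stay in lockstep.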

So, the general question is: is this problem supposed to be handled by using Docker, or do we want to strive for two runs of the same code and inputs providing identical results on different machines? I think the latter is good to have, to avoid surprises for people who have heterogeneous infrastructures and don't use Docker (or similar) by default. Moreover, it's an easy fix in this case, assuming the fix provides "random enough" numbers for whatever the components use them for.

I have no idea if that would still hold in multithreaded runs (I feel like it should?), and I suspect it may not work with KVMCPU, but I know I would like to have this feature. I agree it may be painful to make sure there is no regression, because you would essentially need to set up CI with two compilers...

Any thoughts?

aperais · 2024-09-11T14:29:35Z

Author

I'm bumping this and marking it as a gem5-dev discussion, which might be a more relevant category.

powerjg · 2024-09-11T15:22:54Z

Maintainer

Thanks for bumping this, I think I missed it originally. I assume this is somewhat related: #1534

My view is that it's the intention of gem5 to be deterministic, and we shouldn't depend on docker (etc.) to make this guarantee. (KVMCPU is an outlier and is non-deterministic because it depends on the host hardware. I don't see any way around that.)

I am in favor of your suggested fix.

giactra · 2024-09-12T14:12:57Z

Maintainer

Looks good to me. An additional step would be to stop using random<T> with T being a host-dependent data type. Even with your solution (which solves the compiler compatibility issue), we would have different random sequences on 32-bit vs. 64-bit machines if we use random<int>.

I am not sure it is technically possible in C++ to forbid a user from doing that, as uint32_t and uint64_t are simple aliases for unsigned or unsigned long. However, we can start amending existing code and be vigilant on future PRs.
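Because the fixed-width names are only aliases, the type system cannot tell random<unsigned> apart from random<uint32_t>. One conceivable workaround, sketched here purely as an illustration, is to key the interface on an explicit bit width instead of a type, so the requested size cannot vary with the host. The name randomBits is hypothetical and does not exist in gem5.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper (not gem5 code): returns the top `Bits` bits of one
// 64-bit engine output. The width is an explicit template argument, so it
// does not change with the host's definition of int or long.
template <unsigned Bits>
std::uint64_t randomBits(std::mt19937_64 &gen)
{
    static_assert(Bits >= 1 && Bits <= 64, "width must be in [1, 64]");
    return gen() >> (64 - Bits);  // shift amount is always in [0, 63]
}
```

A call such as randomBits<8>(gen) then consumes exactly one engine output and yields a value in [0, 255] on every host.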

1 reply

powerjg Sep 12, 2024
Maintainer

Relatedly, we could probably update the random API to return exactly the size the user wants. I noticed in #1534 that many uses of random take the value and then apply a modulo to it. We could introduce an API to do that for the users. I think with that API, we would be able to get rid of random<T> in the public-facing API. This seems like a relatively big change, though.
