Proposal for bit-reproducible numerical operations #12
@marshallward, can you elaborate on exactly what you mean by bit-reproducible numerical operations and why you need them? Typically, even the most accurate libraries, such as https://openlibm.org/, do not actually guarantee correct rounding. I assume you do not need things to be correctly rounded, but you want to get the exact same bits on all platforms? For my codes, I do not require bit reproducibility, as I don't think it's even possible --- with MPI and BLAS and LAPACK, the bits are not guaranteed, only the overall accuracy.
That's right, we are primarily focused on reproducing the bits over a range of configurations, and at least for now we're not too concerned with the overall accuracy of the lowest bits. As an example, if we increase our optimization level, our results can change. Ideally, we would like an implementation that we can control, which won't be implicitly chosen by the compiler.

You're right about the issues with other libraries. It's a bit off-topic and I won't go into detail, but for our MPI reduction operations we convert our floating-point values to an extended fixed-precision format to ensure that the result is invariant to ordering. We also aggressively apply parentheses in all of our expressions, and only use compilers which honor the parenthesis ordering.

Bit reproducibility is extremely challenging, but we have managed to achieve it in our climate models for many years. At the very least, it has vastly improved the quality of our test suites. The issue around transcendentals appears to be a "final frontier" for us, but so far we have been reluctant to abandon the Fortran intrinsics.

I am not familiar with OpenLibm, but I will look into it more. I'm also not sure if something like this belongs in a standard library, so I'm hoping for more feedback on this issue.
I am just trying to understand your use case better, so that we can later discuss where this belongs. For MPI, you take a double precision number, convert it to quadruple precision, accumulate in parallel in a random order, and convert back to double? I wasn't aware you could even get bit reproducibility for a complicated parallel code.

How do you handle cases where the hardware itself gives different results for the same binary? That has happened to me once (some nodes on the HPC cluster gave slightly different results than other nodes with the same binary and operating system). My solution was to ensure the code is robust against such issues.
The MPI summation method uses an extended fixed-precision method, which basically converts the floats to tuples of integers, each value denoting a fixed precision over a selected range. MPI operations are then applied to the integer data, which is reproducible. It is described in this paper and this source code, but I would guess that it's very similar to other extended-precision implementations.

We have the luxury of solving hydrostatic fluid dynamics equations - hyperbolic PDEs, with no elliptic pressure solvers - so summations and other collectives are very rare and are primarily used for diagnostics like the total mass and energy of the system, which are computed infrequently, so we can overlook issues like the performance of collectives. Others may not be so lucky.

It's hard to summarize everything required to achieve bit reproducibility, but careful ordering of arithmetic using parentheses is a big part. There are other rules, such as restricting the number of divisions in an expression (usually to only one). We also do very aggressive testing of initialization and memory accesses, so we never see stray values in our memory (or so we hope). We also never commit any changes to our codebase which modify answers, unless it is done intentionally (e.g. a bug fix).

The ability to reproduce such runs is also a part of the tender process, so that may prevent some of the issues you have seen. But there have also been more exotic problems. In one case - before my time with this group - there was a tridiagonal solver which failed to reproduce, and the cause was tracked down to anomalous voltage noise in the CPU, which the vendor acknowledged and had to fix.

I guess the short answer to your question is that due to a combination of software practices and the hardware tender process, we do not see variability across nodes.
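The core of the fixed-precision idea can be sketched in a few lines. This is an illustrative Python sketch, not the actual GFDL/MOM implementation; the function names and the single 40-bit precision window are my own choices:

```python
def to_fixed(x, frac_bits=40):
    # Represent a double as an integer with frac_bits fractional bits.
    # Bits below 2**-frac_bits are discarded, but from here on the
    # representation is exact.
    return round(x * (1 << frac_bits))

def from_fixed(n, frac_bits=40):
    return n / (1 << frac_bits)

def reproducible_sum(values, frac_bits=40):
    # Integer addition is exact and associative, so any summation order
    # (e.g. across MPI ranks) yields bit-identical results after the
    # conversion back to float.
    return from_fixed(sum(to_fixed(v, frac_bits) for v in values), frac_bits)
```

A real implementation, like the one described in the linked paper, uses a tuple of integers covering several exponent ranges so that no precision is lost; this sketch uses a single fixed window for brevity.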
Thanks for the details. I had no idea this was possible. I suspect there might be performance hits in some cases, so perhaps your method might not work in all cases, even if it works for you. Regarding bit-reproducible functions, let me know what you find.
I think there are a few different problems conflated here:

1. Bit-for-bit identical results across platforms and compilers.
2. Bit-for-bit identical results across optimization levels of the same compiler.
I agree that 1) is crucial and stdlib should have it as a requirement. I think 2) is out the door. Higher optimization flags mean "generate slightly less accurate results faster". The compiler generates different code. Am I misunderstanding this?
So back to the original post, it seems to me it asks for transcendental function implementations that are guaranteed to be cross-platform bit-reproducible (item 1). Am I correct, or still not understanding? If yes, I agree this would be useful (though specialized), but a major technical challenge -- I guess it would need multi-platform assembly implementations. If so, then I'd say it's more fitting for a specialized library rather than a stdlib. It's also quite possible I'm not understanding this well, so please correct me :).
The underlying issue is that if we invoke a transcendental intrinsic, then we really just don't know where it is coming from. For example, we once had our system math library change without our knowledge (possibly libm, but we cannot say for sure), after which we could no longer reproduce the original input fields for selected runs. Currently, the Fortran intrinsics are not a safe option.

The optimization issue is secondary, but also of concern for us. If I compile a Fortran application with -O2 and with -O3, then my results can differ. I am less concerned about a particular implementation changing answers if the math library itself is compiled with different settings, which is what I think you are describing in 2).

While we could just force ourselves to use certain math libraries, such as OpenLibm, I would really like for this to be more controllable within the language itself. I'm still not sure if this is the place to be having this discussion, but it would be nice if there were a safe and "standard" way to address it.

BTW, I wanted to say that it was #10 which prompted this discussion for me. Mostly I was just wondering if there was any aspiration to provide bit reproducibility for users in certain controlled situations. Transcendental functions are a particular issue for us, but I think that bit reproducibility is a broader question that is worth considering.
@marshallward the reason the intrinsics are not bit-reproducible is that each implementation is free to trade the last bits of accuracy for speed. As such, it seems the possible bit-reproducible versions of every function would need to be separate functions, say a dedicated bit-reproducible counterpart alongside each intrinsic.
@certik True, though it's not necessarily accuracy that matters in our case. I just want the same numbers!
Okay, thanks, I understand better now. I've had the same issue and I agree it's unsettling. Specific to this issue, I think this is the scenario: I have a Fortran program executable that is dynamically linked to the libraries on the system.
This executable works well and produces the same output given the same input. However, there may come a system update that upgrades the system math library, after which the output changes. The proposal here is to add a mirrored set of transcendental functions that would be expressed in stdlib using only intrinsic arithmetic operators and parentheses to enforce order of operations. Then you always know what function you're calling. A few questions come up for me now, though.
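As a rough illustration of what such a mirrored function could look like, here is a hedged Python sketch (not an actual stdlib proposal; the polynomial order and the range-reduction scheme are my own choices). Every operation is pinned down with explicit parentheses, so the result depends only on IEEE-754 arithmetic rather than on whichever libm happens to be linked:

```python
LN2 = 0.6931471805599453  # nearest double to ln(2)

def repro_exp(x):
    # Range reduction: x = k*ln2 + r with |r| <= ln2/2.
    k = round(x / LN2)
    r = x - (k * LN2)
    # Fixed-order Taylor polynomial in fully parenthesized Horner form,
    # so every conforming compiler must evaluate it in the same order.
    p = 1.0 + (r * (1.0 + (r * (0.5 + (r * ((1.0 / 6.0) + (r * ((1.0 / 24.0) + (r * ((1.0 / 120.0) + (r * (1.0 / 720.0))))))))))))
    return p * (2.0 ** k)
```

The price of this control is accuracy: the truncated polynomial gives roughly 1e-7 relative error, far worse than a well-tuned libm, but the bits are the same everywhere the arithmetic is IEEE-754.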
As an aside, a well-known problem in numerical weather prediction is that if you recompile the same WRF code (or another weather model) with a new compiler version (say, ifort-18 -> ifort-19), you can get the simulated hurricane to make landfall in a completely different US state.
It's an old debate regarding rearrangement. Theoretically, unless you opt into unsafe floating-point optimizations, the compiler should honor your parentheses and not rearrange operations. That's in theory. In practice, there are all kinds of issues, but apparently @marshallward was able to resolve most of them, which is amazing.

@marshallward, do you agree that if you want bit reproducibility, you might not always get the best performance that you could if you were willing to sacrifice it?
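The rearrangement point is easy to demonstrate: IEEE-754 addition is not associative, which is exactly why honoring parentheses matters. A small Python illustration:

```python
a, b, c = 1e16, -1e16, 1.0

# The spacing between adjacent doubles near 1e16 is 2.0, so adding 1.0
# to a value of that magnitude can be absorbed entirely by rounding.
left = (a + b) + c   # 0.0 + 1.0 -> 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the total is 0.0

print(left, right)   # 1.0 0.0
```

A compiler that freely regroups this sum can therefore change the computed bits, even though both groupings are mathematically equivalent.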
@milancurcic You've captured the main issue, which is an implicit link to some external math library. FWIW, I am less worried about the other points. The MOM team does not yet have custom transcendentals, but we have started to prototype them in Python preprocessing scripts, and may find ourselves using them at some point. I suppose that I am starting to send out feelers to see if something like stdlib is a place to hold such functions.

@certik I don't want to take credit for this work; it is really due to my colleagues at GFDL - Robert Hallberg in particular - who have refined these methods. As a relatively new member of the group, I am more of a student than a practitioner. The methods do rely on rigid enforcement of parentheses, as you correctly point out. I haven't yet perused the language standard to see what it says about the matter.

As for your question: no, we would not expect such functions or methods to be highly performant. I think everyone understands that there is some computational cost associated with reproducibility.
I see, thanks. In that case, it seems that bit-reproducible capabilities should be implemented in addition to high-performance (but not strictly bit-reproducible) capabilities, not instead of them.
I thought I'd emerge from the shadows to voice my support for this proposal. Our specific use case is running the same code on different architectures with different compilers and still obtaining identical results. This requires a variety of compiler flags and coding-style rules, but we get correctly-rounded (towards zero, IIRC) mathematical functions from the math library we link against.
@warrickball That is encouraging news. I also supported the idea of testing with a prescribed libm. However, there is the deeper problem that the compiler is under no obligation to use libm and is free to use its own code. For example, I think Intel strongly defers to libimf.a, although I believe this can be overridden. Your own experience seems to confirm that it is possible.

We are moving towards implementing our own transcendentals, with a similar goal of platform-independent results, but we still have some ways to go. (The transcendentals are working, but there are some newly discovered problems...)
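On the testing side, bit-for-bit checks are stricter than ordinary floating-point equality. A small Python sketch of the kind of comparison such a test suite needs (the function names here are my own, for illustration):

```python
import struct

def double_bits(x):
    # The exact 64-bit pattern of a double, as an unsigned integer.
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def bitwise_equal(a, b):
    # Stricter than a == b: it distinguishes 0.0 from -0.0 and treats
    # a NaN as equal to an identical NaN, which == does not.
    return double_bits(a) == double_bits(b)
```

For example, `0.0 == -0.0` is true, yet the two values have different bit patterns; a reproducibility test cares about the latter.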
Your remark about transcendentals reminded me of an article by Anton Shterenlikht on the subject of transcendentals not too long ago. It focuses on the behaviour at branch cuts and the like. See https://dl.acm.org/doi/10.1145/3301318.
@marshallward wrote in #1 (comment):
I would like to see greater support for bit-reproducible numerical operations. This is a very high priority for us since our models are used in weather and climate forecasting, and much of our time is devoted to bit reproducibility of our numerical calculations.
A common problem is intrinsic Fortran reduction operations like `sum()`, where the order is ambiguous (deliberately, one might say), and therefore not reproducible. A more serious problem for us is transcendental functions, like `exp()` or `cos()`, which will give different results for different optimizations, and we typically cannot say where it was invoked (libm? Vendor library? etc.).

A standard library may be a place to provide bit-reproducible implementations.