-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How should we model the "side effects" of modulo and division operations? #750
Comments
Division by 0 is UB, and UB is really only "UB" if it is reached in a program (i.e. if no execution path leads to, or an UB expression is not evaluated then it is irrelevant -- cf. also forced lazy evaluation for && and ||). Termination is an observable side effect of a program, so needs to be modelled and affects reachability. Reordering may obviously not introduce UB into a UB-free program (where UB-free as above conditioned on runtime control flow). Therefore yes, I think you would need to mark all operations that are conditionally-UB as sequentialized wrt to any effects concerning control flow, and if premature program exit is modelled using I/O state then conditionally-UB needs to be marked the same as well. That may also affect pointer dereferences. Yes that gets rather nasty in practice, consider example 1:
vs example 2:
Assuming that compiler would "know" that call_something does not read *p, then it would be allowed to reorder the assignment This actually came practically biting at some point like this:
The condition |
LLVM IR has a bunch of function / function call attributes like In our case, routing the I/O state through division and modulo is probably quite cheap. If a region contains multiple, the I/O state can be split up across them, like we do with loads that share memory state edges. For memory operations, I am more inclined to use memory state edges to maintain the conservative ordering, rather than sending I/O states through all loads and stores. Throwing out some more examples: void foo();
static int test1(int* p) {
*p = 7; // Can not be moved, but this case is already handled by memory state encoding
foo();
*p = 10; // Can not be moved, but already handled
....
} int test2() {
int* p = malloc(4); // This malloc is private to this function (even stronger guarantee than not escaping module)
*p = 7; // Can be moved below, If p is NULL UB is triggered here already
foo();
*p = 10; // Can be moved above since we know p is not NULL
..
} PROBLEM: int test3() {
int* p = malloc(4);
foo();
*p = 10; // Can not be moved above, as p might be NULL, which is only UB if foo terminates
...
} int test4() {
int* p = alloca(4);
foo();
*p = 10; // Can be moved above, since alloca can never be NULL
...
} PROBLEM: int test5(int x) {
int* p = NULL;
if (x < 10)
p = &x;
foo(x);
*p = 10; // Can not be moved, as p might still be NULL
} So from what I can tell the only examples that are not already correctly handled by (non-buggy) memory state encoding is the case where a function has its own private malloc, or when a pointer has multiple possible targets, one of which is In all cases where a pointer has unknown origin, the memory state encoding will already conservatively assume that it points to something that the call is clobbering.
static int* q;
int test6() {
q = malloc(4);
*q = 8; // Can not be moved, as *q may be mod/ref, or no longer exist after foo
foo();
*q = 10; // Can not be moved either, as *q may be mod/ref by foo()
}
// Exported functions that foo() may call
void callback() {
free(q);
}
void callback2() {
*q = 20;
} As far as I can tell, there are only two situations where NULL values appear, that are not already handled because the pointer is marked as "PointingToExternal" and thus handled conservatively.
In the Alias Analysis, there is no concept of a NULL pointer, or any other "invalid" pointee for that matter. I would definitely not want to add a "memory location" that is called NULL, as that would create several edge cases (aliasing because two pointers both may target NULL, trying to extend the points-to-set of NULL during solving). At the very most there would be a flag, I think, added to pointers that may target NULL. I think it might be better if the memory state encoding handles the NULL cases, as there appears to be very few of them. This would also allow the memory state encoding to "remove" the "might be NULL" flag as soon as a pointer is actually used for anything. What do you think, @phate, @caleridas? |
I will reply in more detail later. I am not sure that I fully agree with all the examples you have drawn up (or I misunderstand something...) @haved It seems we agree that the best way to solve the side-effect of the division/module operators is to implement another division/module operator that also takes an IO state. Those are used conservatively by the front-end when an RVSDG is created. Once we can assert that a modulo/division operator can not have a zero as second operand, we can replace them with the side-effect free equivalent. Does this make sense @haved @caleridas? Any concerns? |
The issues with #526 are caused by the following situation:
Using a build of
jlm-opt
from and executing the following commands:The program terminates with SIGFPE (Floating Point Exception), as it attempts to do modulo by 0.
For reference, the definition of
main
inexample-m2r.ll
isWe need to
exit
via thefoo
function to prevent LLVM from addingunreachable
right below the call, but we could just as well imaginefoo
being an external function that somehow callsexit
.The RVSDG produced by jlm-opt ends up looking like
As you can see, the modulo operation
BitSMod32
ends up neither higher nor lower in the DAG than thegamma
containing the call. When the program is converted back into LLVM IR, it (more or less randomly) ends up like so:This means the modulo by 0, which normally would never occur, happens before the terminating call to
foo()
.We somehow need to model the possible side effect of modulo and division by 0 sending a signal.
This raises a few questions:
@caleridas @phate @sjalander
The text was updated successfully, but these errors were encountered: