Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add andersen region-aware CI job [AndersenRegionAware] #526

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

phate
Copy link
Owner

@phate phate commented Jun 26, 2024

No description provided.

@phate phate requested review from sjalander and haved June 26, 2024 05:21
@phate phate enabled auto-merge (squash) June 26, 2024 05:22
@phate
Copy link
Owner Author

phate commented Jun 26, 2024

2024-06-26T05:41:37.8621716Z **********
2024-06-26T05:41:37.9042968Z ********************
2024-06-26T05:41:37.9045056Z Failed Tests (1):
2024-06-26T05:41:37.9046072Z   test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-pr38819.test
2024-06-26T05:41:37.9046624Z 
2024-06-26T05:41:37.9046629Z 
2024-06-26T05:41:37.9046735Z Testing Time: 662.54s
2024-06-26T05:41:37.9047027Z   Passed: 1775
2024-06-26T05:41:37.9047283Z   Failed:    1

This is the same single source test that also failed for Steensgaard region-aware. Moreover, we did not have any problem in the run when we added the target to the llvm-test-suite. Ultimately, it means we should invest in some bigger/better/heavier guns for debugging.

The problem arises due to the alias analysis in combination with the region-aware encoder relaxing the sequentialization order, and depending on which sequence the back-end chooses, it can happen that we get a valid one according to program semantics, or maybe a faulty one, because of bugs in the alias analysis/provisioning/encoder.

@phate phate disabled auto-merge June 26, 2024 06:06
Copy link
Collaborator

@haved haved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI change itself looks good. I definitely agree that some hardening of tests is needed to avoid flakey CI jobs

@phate
Copy link
Owner Author

phate commented Jun 26, 2024

@haved I believe that has nothing to do with flaky CI jobs. My suspicion is that this is a bug in the alias analysis and/or the encoding. The region-aware encoding relaxes the sequential order of the operations with side-effects. This means that the jlm back-end could emit a wrong order of operations if there is a bug in any of those passes. This would not happen with the agnostic encoding (at least not without further transformations such as invariant value redirection + dead node elimination).

@phate
Copy link
Owner Author

phate commented Jun 26, 2024

Let's not merge this before I have not found out what the root cause is.

@haved
Copy link
Collaborator

haved commented Jun 26, 2024

@phate, I agree with your description of the problem. Calling it "flakey CI" was not a good description on my part.

In general there seems to be multiple places in the compiler where non-determinism is introduced due to iterating over sets and maps keyed by addresses. As long as there are no bugs this is not an issue, but when there are, it can make it more difficult to reproduce.

This non-determinism combined with the current bug you describe is why I referred to the potential CI job as flaky. Even if the CI job fails consistently, it is no fun when you run a failing job locally and no issues occur.

To make bugs more reproducible, I see a few options, but there are probably many more:

  • serialize the program in multiple ways to detect missing state dependencies. Either do N random topological orderings, or do the one that lexographically minimizes the ordering, and one that lexographically maximizes it, in order to make as many operations as possible switch places. If there are bugs, they are hopefully detected consistently (but no guarantee).

  • change the implementation of util::HashSet and add a new util::HashMap that in debug builds preserve insertion order when iterating, or deterministically randomizes iteration order.

  • somehow make malloc fully deterministic in debug builds. I'm not sure if this is possible?

@phate
Copy link
Owner Author

phate commented Jun 26, 2024

@haved Right, I see your point with flaky CI now. Yes, having the compiler behave differently/non-deterministically is not great for reproducibility. Regarding your mentioned options:

  1. The option of producing multiple program serializations is high on my TODO list. The basic idea is that we simply have a tool that takes an LLVM file and some optimizations as input. It applies the optimizations on the input LLVM file and then just writes several output LLVM files to a folder. Each of these output files has a different permutation of operations with side-effects (everything that consumes/produces state edges). If that single input file is a complete program, then we can compile each of the output files to an executable, run each executable, and compare the outputs between all of them. If the output diverges of a single executable from the other executables, we clearly have a bug in one of the input optimization passes.

  2. This option makes sense as well, but it should not be in debug build I think. We currently use release builds in the CI for performance reasons. We should have another flag that indicates whether the deterministic or non-deterministic versions should be used.

  3. No idea either. I guess it is maybe worth finding out.

@caleridas might have something to say here as well.

@caleridas
Copy link
Collaborator

Long-term, any indeterminism should be rooted out from the compiler, and not just in debug builds. (I confess to being guilty to introducing indeterministic behavior to begin with, and it was a bad idea). iterating HashSet/HashMap should mostly be "forbidden" unless the iteration has no effect on output (e.g. just deleting from the map, or copying things somewhere else), or the result is brought into a deterministic order afterwards. The strategy that I have observed to work best in various projects to get rid of this problem is to "intentionally" introduce into hash maps and add validation tests that fail if multiple runs fail to produce bit-identical output.
Within limits, malloc can be made deterministic on the same target/machine (and libc configuration, and various other factors) for single threaded programs when disabling ASLR for a process, but that's not an overall strategy -- better to accept that anything that hashes on address "must not" be iterated over assuming any stability as an implementation rule. I think it ought to be possible to do a clang-tidy rule to find & flag places in the code that do this for manual inspection.
The other question is of increasing test coverage by fuzzing and intentionally producing a larger range of "conceptually valid output" subject to given graph ordering constraints. That should however be controlled (pseudo-)randomness: Set up a test using mersenne twister with a fixed seed and have any variation be determined by this.

@haved
Copy link
Collaborator

haved commented Jun 29, 2024

Aiming to remove non-determinism would be very nice, and making it a part of automatic testing sounds like a good plan long-term. I agree that it is a better solution to not rely on any ordering of HashSets keyed with addresses.

I have created discussion #529 to not discuss all of this in this PR.

Regarding non-determinism from malloc, I have for some reason gotten very consistent results even though ASLR is enabled. I have had logic bugs that caused every execution to give the wrong result when executed normally, but never give the wrong result when executed in gdb. This was not a memory bug, so the only possible difference was iteration order in sets of addresses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants