8346836: C2: Verify CastII/CastLL bounds at runtime #22880

merykitty · 2024-12-25T14:54:02Z

Hi,

This patch adds a develop flag, VerifyConstraintCasts, which verifies CastIINodes and CastLLNodes at runtime and crashes the VM if the dynamic value lies outside the range of the node's type.

Please take a look, thanks a lot.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8346836: C2: Verify CastII/CastLL bounds at runtime (Enhancement - P4)

Reviewers

Emanuel Peter (@eme64 - Reviewer) Review applies to 8d140fd9
Vladimir Ivanov (@iwanowww - Reviewer)

Contributors

Vladimir Ivanov <[email protected]>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22880/head:pull/22880
$ git checkout pull/22880

Update a local copy of the PR:
$ git checkout pull/22880
$ git pull https://git.openjdk.org/jdk.git pull/22880/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22880

View PR using the GUI difftool:
$ git pr show -t 22880

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22880.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2024-12-25T14:54:39Z

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-12-25T14:55:29Z

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8346836: C2: Verify CastII/CastLL bounds at runtime

Co-authored-by: Vladimir Ivanov <[email protected]>
Reviewed-by: vlivanov, epeter

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 122 new commits pushed to the master branch:

e01e33d: 8354424: java/util/logging/LoggingDeadlock5.java fails intermittently in tier6
370e611: 8355221: Get rid of unnecessary override of JDIBase.breakpointForCommunication in nsk/jdi tests
29f1070: 8355211: nsk/jdi/EventRequest/disable/disable001.java should use JDIBase superclass
... and 119 more: https://git.openjdk.org/jdk/compare/0995b9409d910d816276673b5c06fdf7826bfac7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2024-12-25T14:55:57Z

@merykitty The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2024-12-25T14:58:44Z

Webrevs

merykitty · 2024-12-25T15:02:27Z

Running tier 1 tests with this flag and -XX:+StressGCM reveals several failures. One example is compiler.arraycopy.TestArrayCopyConjoint. In the method testByte(int, int, int), we have the final graph before matching:

228 CastII is a cast on P0 with type int[0, 32]. It depends on 2 dominating ifs:

100 If, which corresponds to (P0 < 0) == false
200 If, which corresponds to (ConvI2L(P0) u<= 32L) == true

As a result, 228 CastII should have the IfTrueNode projection of 200 If as its control input. However, it is incorrectly wired to the IfTrueNode projection of 223 If ((P0 != 0) == true), which leads to the verification failure.
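To see why the control input matters here, the following Java sketch models the three conditions as plain predicates. It is not C2 code; the class and method names are made up for illustration. Under 100 If and 200 If together, every accepted value lies in int[0, 32]. Under 223 If alone, values such as -1 or 33 are accepted, and those are the values a runtime check on the cast would trap on.

```java
public class CastControlSketch {
    // Models 100 If: (P0 < 0) == false
    static boolean if100(int p0) {
        return !(p0 < 0);
    }

    // Models 200 If: (ConvI2L(P0) u<= 32L) == true
    static boolean if200(int p0) {
        return Long.compareUnsigned((long) p0, 32L) <= 0;
    }

    // Models 223 If: (P0 != 0) == true
    static boolean if223(int p0) {
        return p0 != 0;
    }

    // The type carried by 228 CastII: int[0, 32]
    static boolean inCastType(int p0) {
        return 0 <= p0 && p0 <= 32;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, 32, 33, Integer.MAX_VALUE};
        for (int p0 : samples) {
            boolean correctControl = if100(p0) && if200(p0);
            boolean wrongControl = if223(p0);
            System.out.printf("P0=%d correct=%b wrong=%b inType=%b%n",
                    p0, correctControl, wrongControl, inCastType(p0));
        }
    }
}
```

Any row where the wrong-control column is true and the in-type column is false corresponds to a value that could reach the cast and violate its type.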

eme64

Looks interesting :)

Would we have caught any bug with this, do you have an example?

If I remember correctly, we relax/widen the Cast ranges somewhere later in optimizations, so that different CastII etc can common. Probably happens after loop-opts. So the ranges usually go from [1..10] -> [0, max] or [-1 .. 1] -> int. So this verification would then not be super effective, right? Things might have gone wrong much earlier with bad assumptions. I mean it could still catch issues, but I'm not sure how likely that is?

TLDR: I'd like some more context / motivation for this patch ;)

And: you should have at least one plain test where you enable the flag, and it compiles everything required to run an empty main function.

eme64 · 2025-01-08T07:24:18Z

Ah, I actually see that you have some examples. So you plan on introducing this flag first, and only then fixing the issues? But does it fail with a simple java --version? Or an empty main method, maybe with -Xcomp?

merykitty · 2025-01-08T07:47:38Z

@eme64 Thanks for looking at this

The context is that while reviewing #22666 I came to the conclusion that our handling of depends_only_on_test is broken. I have added a comment there explaining my understanding and concerns. In principle, before the execution, a DivINode is the same as a CastIINode that limits the value range of the divisor to != 0. As a result, there should be no difference in the way we handle the movement of these nodes. This leads me to the conclusion that CastIINodes may also be wired to the wrong control input. The reason we have not caught them is that, unlike a division, which complains loudly, a CastIINode silently accepts incorrect input values. This motivated me to write this patch.
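The contrast drawn above can be illustrated with a small Java sketch. It is an analogy only: `uncheckedCast` and `verifiedCast` are hypothetical helpers, not HotSpot functions, and the real flag crashes the VM rather than throwing an exception. A division by zero fails loudly, a cast that only records a range lets a bad value through, and a verified cast turns the silent case into a loud one.

```java
public class SilentCastSketch {
    // Stands in for a CastII with type int[lo, hi]: it only records an assumption.
    static int uncheckedCast(int value, int lo, int hi) {
        return value; // an out-of-range value flows on unnoticed
    }

    // Stands in for the same cast with runtime verification enabled.
    static int verifiedCast(int value, int lo, int hi) {
        if (value < lo || value > hi) {
            throw new IllegalStateException(value + " is outside [" + lo + ", " + hi + "]");
        }
        return value;
    }

    public static void main(String[] args) {
        int divisor = 0; // the value a misplaced node might observe

        try {
            System.out.println(10 / divisor);
        } catch (ArithmeticException e) {
            System.out.println("division complains: " + e.getMessage());
        }

        // The unchecked cast claims the range [1, max] but passes 0 through.
        System.out.println("unchecked cast returns: " + uncheckedCast(divisor, 1, Integer.MAX_VALUE));

        try {
            verifiedCast(divisor, 1, Integer.MAX_VALUE);
        } catch (IllegalStateException e) {
            System.out.println("verified cast complains: " + e.getMessage());
        }
    }
}
```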

> If I remember correctly, we relax/widen the Cast ranges somewhere later in optimizations, so that different CastII etc can common. Probably happens after loop-opts. So the ranges usually go from [1..10] -> [0, max] or [-1 .. 1] -> int.

You are right, it is in ConstraintCastNode::widen_type. I will disable that widening when VerifyConstraintCasts is enabled.
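The effect of widening on the check can be sketched in Java. The widening rule below is made up to match the two examples quoted above ([1..10] -> [0, max] and [-1 .. 1] -> int); it does not reproduce the actual logic of ConstraintCastNode::widen_type, and the `Range` class is a hypothetical stand-in for a C2 integer type.

```java
public class WideningSketch {
    // Hypothetical stand-in for an integer type with inclusive bounds.
    static final class Range {
        final int lo;
        final int hi;

        Range(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        boolean contains(int v) {
            return lo <= v && v <= hi;
        }
    }

    // Assumed rule for illustration: a non-negative range becomes [0, max],
    // anything else becomes the full int range.
    static Range widen(Range r) {
        if (r.lo >= 0) {
            return new Range(0, Integer.MAX_VALUE);
        }
        return new Range(Integer.MIN_VALUE, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        Range narrow = new Range(1, 10);
        Range wide = widen(narrow);
        int bad = 11; // violates the original type [1, 10]

        System.out.println("narrow check flags the value: " + !narrow.contains(bad));
        System.out.println("widened check flags the value: " + !wide.contains(bad));
    }
}
```

A value that violates the original type but fits the widened one is no longer detectable, which is why verifying against the unwidened type can catch more.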

> So you plan on introducing this flag first, and only then fixing the issues?

There are several failures in tier 1 alone, and this flag is not enabled by default or in the pipeline, so I think incorporating it first would be preferable. After fixing all the issues, we can add it to the stress options.

> But does it fail with a simple java --version? Or an empty main method, maybe with -Xcomp?

No, it does not fail with --version or with an empty main method, with or without -Xcomp.

eme64

I like this idea a lot. I'm personally on the fence if it is ok to integrate a flag that does not yet pass in the tests. The risk is that nobody fixes it.

What do you think about that @TobiHartmann @vnkozlov ?

Did you run testing with and without the flag? Cannot see the link posted on JIRA.

src/hotspot/share/opto/castnode.cpp

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

eme64 · 2025-01-17T07:48:02Z

Talked with @TobiHartmann, he thinks it is ok to integrate the flag, even if it has failures. Let's go ahead with it then.

It would still be nice to have 2 modes: one where we do the widening, and one without. Both may catch different things, right? I imagine it like this: VerifyConstraintCasts=1 -> do widening. VerifyConstraintCasts=2 -> disable widening.

Plus, can you do this?

> And: you should have at least one plain test where you enable the flag, and it compiles everything required to run an empty main function.

Or does that already trigger failures? It would be nice just so we have some basic testing, and see that the flag is not completely broken.
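A minimal sketch of such a smoke test, assuming the jtreg conventions used elsewhere in the JDK test suite. The class name is hypothetical and this is not the test that was eventually added in this PR. The `@requires` line assumes the flag is only usable in debug builds with C2, and the `=1` value assumes the numeric form of the flag proposed above.

```java
/*
 * @test
 * @summary Sketch of a smoke test: compile everything needed to run an empty main with the flag on.
 * @requires vm.debug & vm.compiler2.enabled
 * @run main/othervm -Xcomp -XX:VerifyConstraintCasts=1 VerifyCastsSmokeSketch
 */
public class VerifyCastsSmokeSketch {
    public static void main(String[] args) {
        // Intentionally empty: with -Xcomp, VM startup alone compiles many methods,
        // so a broken verification path would likely show up before main returns.
    }
}
```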

eme64 · 2025-01-17T07:48:49Z

What about implementing the same for aarch64? That would increase our coverage eventually.

vnkozlov · 2025-01-17T16:41:00Z

We can add this flag to our stress testing sets of flags to make sure we run with it during our regular testing.

merykitty · 2025-01-19T15:11:57Z

@eme64 Thanks for your reviews. I have added 2 test cases for TestIterativeGVN that set VerifyConstraintCasts. The name of the test may need to change, but I have not been able to come up with anything better.

For aarch64, I don't have an aarch64 machine around, so it would not be trivial.

> I like this idea a lot. I'm personally on the fence if it is ok to integrate a flag that does not yet pass in the tests. The risk is that nobody fixes it.

I am working on JDK-8347365 which can hopefully solve some of the issues we are having here.

eme64

Just a few more comments :)

src/hotspot/share/opto/c2_globals.hpp

src/hotspot/share/opto/castnode.cpp

test/hotspot/jtreg/compiler/c2/TestVerifyConstraintCasts.java

eme64

Sorry for the delay.

Looks great, especially with the better comments! 👏

vnkozlov

Can we add AArch64 implementation too to cover our platforms?

merykitty · 2025-02-06T19:11:28Z

@vnkozlov I don't have an AArch64 machine so I feel less confident writing one. We can add an AArch64 implementation later, though. What do you think?

iwanowww

Very nice!

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

src/hotspot/share/opto/c2_globals.hpp

iwanowww · 2025-02-07T17:11:22Z

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+  movl(rax, dst);
+  movl(rcx, type->_lo);
+  movl(rdx, type->_hi);
+  hlt(); // hlt so we have the stack trace


That's interesting. Sounds like a problem in NativeStackPrinter::print_stack().

Speaking of debugging output, a call into a local helper function (encapsulating pretty printing logic) followed by a hlt call will do the job. But, considering the usages are in-line and quite common I suggest to make it conditional (guarded by a flag). It is possible to recover all 3 values from generated code if needed and turn on error reporting (specify the diagnostic flag) when reproducing failures.

I have made a helper function that will print the error message together with the parameters. The stack printing is still problematic, though.

src/hotspot/cpu/x86/x86_64.ad

vnkozlov · 2025-02-07T22:06:07Z

> @vnkozlov I don't have an AArch64 machine so I feel less confident writing one. We can add an AArch64 implementation later, though. What do you think?

Okay, later is fine.

iwanowww

Looks good.

iwanowww · 2025-04-08T21:37:40Z

Speaking of bug synopsis, can you make it a bit more concrete and succinct?

How about "C2: Verify CastII/CastLL bounds at runtime"?

merykitty · 2025-04-09T18:06:40Z

@iwanowww Thanks a lot for the reviews, I have updated according to your suggestions.

iwanowww · 2025-04-10T00:37:56Z

FTR here's AArch64 support:
7ed34d0

Feel free to incorporate it in this PR or I'll upstream it separately.

merykitty · 2025-04-22T17:07:29Z

@iwanowww Thanks a lot for your help, I have incorporated your patches.

/contributor add vlivanov

openjdk · 2025-04-22T17:07:44Z

@merykitty
Contributor Vladimir Ivanov <[email protected]> successfully added.

iwanowww

Looks good. Thanks.

iwanowww · 2025-04-23T18:21:47Z

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp

+  jint hi = t->_hi;
+
+  if (lo != min_jint && hi != max_jint) {
+    subsw(rtmp, rval, lo);


It turns out it's equivalent to cmpw(rval, lo) which is clearer IMO.

I don't think it is. cmpw(rval, lo) is equivalent to subsw(zr, rval, lo). However, if lo does not fit into an immediate operand, MacroAssembler::subsw, which calls into wrap_adds_subs_imm_insn, will use Rd as a temporary register to hold lo, and that is invalid if Rd is zr. Am I understanding it right?

Yes, you are right. Completely forgot that there's only 12 bits available for the immediate (Assembler::operand_valid_for_add_sub_immediate()).
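For reference, here is a simplified Java model of the encoding constraint mentioned here. It assumes the AArch64 ADD/SUB (immediate) form: a 12-bit unsigned value, optionally shifted left by 12 bits. It is a sketch, not a copy of Assembler::operand_valid_for_add_sub_immediate; treating a negative immediate as encodable by flipping the operation is an assumption of this sketch, and both method names are made up.

```java
public class AddSubImmediateSketch {
    // True if imm is encodable as imm12 or as imm12 << 12.
    static boolean fitsUnsigned(long imm) {
        return (imm & ~0xfffL) == 0 || (imm & ~0xfff000L) == 0;
    }

    // Assumption: a negative immediate is handled by emitting the opposite
    // operation (add instead of sub, or vice versa) with its magnitude.
    static boolean fitsAddSub(long imm) {
        return imm != Long.MIN_VALUE && fitsUnsigned(Math.abs(imm));
    }

    public static void main(String[] args) {
        long[] bounds = {32, 4095, 4096, 4097, 0xfff000L, 0x1000000L, -1};
        for (long lo : bounds) {
            System.out.println("lo=" + lo + " fits immediate: " + fitsAddSub(lo));
        }
    }
}
```

When a bound does not fit, the constant has to be materialized in a register first, which is why the destination register of subsw needs to be a real temporary rather than zr.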

Completely forgot what I was thinking about when writing that code :-)

eme64

Wow, this looks even much better with the improved printing on failure now!

Just out of curiosity: Is the whole reconstruct_frame_pointer mechanism general enough so that we could use it in other places as well? It is not super important to me any more, but I've wanted to have something like this for VerifyAlignVector already :)

test/hotspot/jtreg/compiler/c2/TestVerifyConstraintCasts.java

iwanowww · 2025-04-24T23:18:08Z

> Just out of curiosity: Is the whole reconstruct_frame_pointer mechanism general enough so that we could use it in other places as well?

IMO it is general enough, but I haven't found a good place to put it. Ideally, all runtime/debug calls should keep frame pointer valid for diagnostic purposes.

merykitty · 2025-04-25T02:09:06Z

@iwanowww @eme64 Thanks a lot for your reviews!
/integrate

openjdk · 2025-04-25T02:09:59Z

Going to push as commit ed60403.
Since your change was applied there have been 123 commits pushed to the master branch:

8a39f07: 8354431: gc/logging/TestGCId fails on Shenandoah
e01e33d: 8354424: java/util/logging/LoggingDeadlock5.java fails intermittently in tier6
370e611: 8355221: Get rid of unnecessary override of JDIBase.breakpointForCommunication in nsk/jdi tests
... and 120 more: https://git.openjdk.org/jdk/compare/0995b9409d910d816276673b5c06fdf7826bfac7...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-04-25T02:10:11Z

@merykitty Pushed as commit ed60403.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

merykitty · 2025-04-25T02:13:15Z

> IMO it is general enough, but I haven't found a good place to put it. Ideally, all runtime/debug calls should keep frame pointer valid for diagnostic purposes.

Tbh I don't understand ... How does the VM normally walk the stack when we crash in a compiled method? I thought that it has to know the frame size of a compiled method? Why do we need to manually reconstruct the frame pointer?

iwanowww · 2025-04-25T05:19:16Z

The JVM knows how to unwind the stack when a crash happens in compiled code (compiled frame on top). When a native frame is on top, it relies on the platform ABI, so it fails to unwind the stack at the border between native and compiled frames, because compiled code doesn't follow the platform ABI conventions.

Introduce VerifyConstraintCasts

de9aea4

openjdk bot added the rfr Pull request is ready for review label Dec 25, 2024

openjdk bot added the hotspot-compiler [email protected] label Dec 25, 2024

merykitty mentioned this pull request Dec 26, 2024

8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check #22666

Closed

3 tasks

eme64 suggested changes Jan 8, 2025

Merge branch 'master' into verifycast

6010c82

eme64 reviewed Jan 17, 2025

src/hotspot/share/opto/castnode.cpp (outdated, resolved)

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp (outdated, resolved)

merykitty added 2 commits January 19, 2025 21:57

make VerifyConstraintCast uint, better debug info

25cda2f

add tests

f18ca9d

move test to a new file, add block_comment

b3826a5

eme64 suggested changes Jan 22, 2025

src/hotspot/share/opto/c2_globals.hpp (outdated, resolved)

src/hotspot/share/opto/castnode.cpp (outdated, resolved)

test/hotspot/jtreg/compiler/c2/TestVerifyConstraintCasts.java (resolved)

better comments

7f2af65

eme64 approved these changes Feb 4, 2025

openjdk bot added the ready Pull request is ready to be integrated label Feb 4, 2025

vnkozlov reviewed Feb 4, 2025

Merge branch 'master' into verifycast

da854c1

iwanowww reviewed Feb 7, 2025

iwanowww approved these changes Apr 8, 2025

openjdk bot added the ready Pull request is ready to be integrated label Apr 8, 2025

assert CastLL

45b4549

openjdk bot removed the ready Pull request is ready to be integrated label Apr 9, 2025

merykitty changed the title ~~8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime~~ 8346836: C2: Verify CastII/CastLL bounds at runtime Apr 9, 2025

merykitty and others added 3 commits April 23, 2025 00:00

Merge branch 'master' into verifycast

2c89363

aarch64 support

cb107fd

Reconstruct FP

8d140fd

iwanowww approved these changes Apr 23, 2025

openjdk bot added the ready Pull request is ready to be integrated label Apr 23, 2025

eme64 approved these changes Apr 24, 2025

test/hotspot/jtreg/compiler/c2/TestVerifyConstraintCasts.java (outdated, resolved)

Emanuel's suggestion

8238894

openjdk bot removed the ready Pull request is ready to be integrated label Apr 24, 2025

iwanowww approved these changes Apr 24, 2025

openjdk bot added the ready Pull request is ready to be integrated label Apr 24, 2025

openjdk bot added the integrated Pull request has been integrated label Apr 25, 2025

openjdk bot closed this Apr 25, 2025

openjdk bot removed the ready and rfr labels Apr 25, 2025
