
8315066: Add unsigned bounds and known bits to TypeInt/Long #17508


Closed
wants to merge 83 commits

Conversation

merykitty
Member

@merykitty merykitty commented Jan 20, 2024

Hi,

This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens up more transformation opportunities in an elegant manner and helps avoid some ad-hoc rules in HotSpot.

In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones. These constraints are not independent; e.g., an int that lies in [0, 3] in the signed domain must also lie in [0, 3] in the unsigned domain and have all bits but the last 2 unset. As a result, we must canonicalize the constraints (tighten them so that they are optimal) before constructing a TypeInt/Long instance.
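As a concrete reading of the constraint set, a membership test might look like the following sketch (illustrative names only, not HotSpot's actual code):

```cpp
#include <cassert>
#include <cstdint>

// Sketch: a value v belongs to the type iff it satisfies all three
// constraint pairs. Signed bounds use int32_t comparisons, unsigned
// bounds use uint32_t comparisons, and the bit masks apply to the raw bits.
bool contains(int32_t v,
              int32_t lo, int32_t hi,        // signed bounds
              uint32_t ulo, uint32_t uhi,    // unsigned bounds
              uint32_t zeros, uint32_t ones) // known-0 and known-1 masks
{
  uint32_t u = static_cast<uint32_t>(v);
  return v >= lo && v <= hi &&
         u >= ulo && u <= uhi &&
         (u & zeros) == 0 &&   // every bit in 'zeros' must be unset
         (u & ones) == ones;   // every bit in 'ones' must be set
}
```

For the [0, 3] example above, the canonical constraints would be lo = ulo = 0, hi = uhi = 3, zeros = ~3, ones = 0.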

This is extracted from #15440; node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.

Please kindly review, thanks a lot.

Testing

  • GHA
  • Linux x64, tier 1-4

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8315066: Add unsigned bounds and known bits to TypeInt/Long (Enhancement - P4)

Reviewers

Contributors

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508
$ git checkout pull/17508

Update a local copy of the PR:
$ git checkout pull/17508
$ git pull https://git.openjdk.org/jdk.git pull/17508/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17508

View PR using the GUI difftool:
$ git pr show -t 17508

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17508.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jan 20, 2024

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr (Pull request is ready for review) label Jan 20, 2024
@openjdk

openjdk bot commented Jan 20, 2024

@merykitty The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented Jan 20, 2024

Webrevs

@merykitty
Member Author

@TobiHartmann @eme64 I have extracted a part of #15440, could you take a look when you have time, please? Thanks a lot for your help.

Contributor

@eme64 eme64 left a comment


@merykitty I just had a quick look. Thanks for splitting out parts and making it more reviewable that way! Since John Rose is generally excited (#15440 (comment)), I'll now put a bit more effort into reviewing this.

Thanks for adding some gtests.
I would really like to see some IR tests, where we can see that this folds cases, and folds them correctly.
And just some general java-code correctness tests, which test your optimizations in an end-to-end way.

I have a general concern with the math functions. They have quite a few arguments, often 5-10, and on average half of them are passed by reference. It is sometimes hard to immediately see which arguments will not be mutated, which are return values, and which are both arguments and return values that are simply further constrained/narrowed, etc.

I wonder if it might be better to have types like:

SRange<int/long> {lo, hi}
URange<int/long> {lo, hi}
KnownBits<int/long> {ones, zeros}

Make them immutable, i.e. the fields are constant.
Then as function parameters, you always pass in these as const, and return the new values (possibly in some combined type, or a pair or tuple or whatever).

I think it would make the code cleaner, have fewer arguments, and a bit easier to reason about when things are immutable.

Plus, then you can put the range-inference methods inside those classes; you can directly ask such an object if it is empty, etc. You could for example have something like:
SRange::constrained_with(KnownBits) -> returns SRange. Basically I'm asking for the code to be a little more object-oriented, and less C-style ;)

@merykitty
Member Author

@eme64 Thanks a lot for your reviews, I hope that I have addressed your concerns.

Regarding IR tests, I don't think I can come up with any as there is no node taking advantage of the additional information yet.

Member

@jaskarth jaskarth left a comment


I took a quick look through the patch, this is really impressive :)

Early last year I had an attempt at the same idea (extremely rough patch, sorry), where I went with the approach of using a 2-bit value for each bit position to represent 0, 1, BOTTOM, and TOP. My general idea was to create a boolean lattice so that the meet() and dual() operations were easier to implement, before I realized how difficult reasoning about multiple constraints in the meet and dual operations was. I think your idea of marking the dual makes more sense and is cleaner, especially with how the constraints interact.
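The per-bit lattice described here could, for instance, be encoded as two possibility masks, which makes meet a pair of ORs (a sketch of the alternative approach being described, not this patch's code):

```cpp
#include <cassert>
#include <cstdint>

// Each bit position carries one of four states: TOP (no value possible),
// known 0, known 1, or BOTTOM (either value possible). Encoding the state
// as two "possibility" masks keeps meet trivially monotone.
struct BitLattice {
  uint32_t can_be_zero; // bit i set => bit i may be 0
  uint32_t can_be_one;  // bit i set => bit i may be 1
};

// meet = union of possibilities: TOP is the identity, BOTTOM absorbs.
BitLattice meet(BitLattice a, BitLattice b) {
  return BitLattice{a.can_be_zero | b.can_be_zero,
                    a.can_be_one  | b.can_be_one};
}

// A constant c can be 1 exactly where c has a 1 bit, and 0 elsewhere.
BitLattice make_con(uint32_t c) {
  return BitLattice{~c, c};
}
```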

@merykitty
Member Author

@jaskarth Thanks for looking into this patch. I tried not having an explicit _dual field, but in the end it was too hard and cumbersome without any benefit, so I ended up with this approach. I will address your suggestions in the next iteration. Regarding contains vs higher_equal, it is mainly because contains is a much cheaper operation, while higher_equal does a meet followed by a hash table lookup.

@jaskarth
Member

Regarding contains vs higher_equal, it is mainly because contains is a much cheaper operation, while higher_equal does a meet followed by a hash table lookup.

I forgot that higher_equal requires hash consing; that makes sense to me!

@bridgekeeper

bridgekeeper bot commented Feb 24, 2024

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@merykitty
Member Author

Keep alive.

@openjdk

openjdk bot commented Mar 13, 2024

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8315066: Add unsigned bounds and known bits to TypeInt/Long

Co-authored-by: Emanuel Peter <[email protected]>
Reviewed-by: epeter, kvn, jbhateja

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 54 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@bridgekeeper

bridgekeeper bot commented Apr 12, 2024

@merykitty This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper

bridgekeeper bot commented May 10, 2024

@merykitty This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@merykitty
Member Author

/open

@merykitty
Member Author

@eme64 Thanks for not forgetting me :) I have answered your reviews.

Contributor

@eme64 eme64 left a comment


@merykitty Ok then, thanks for working through all this with me!
I'm really happy with the result now, and I feel like I learned a lot from you 😊

@vnkozlov You might want to have another pass over it, as we reworked quite a bit.

@vnkozlov
Contributor

This looks fine to me, but to be on the safe side let's push it into JDK 26 when it is forked.
Also, I don't see a link in the RFE to recent testing of this. It needs to be tested in all tiers including tier10, xcomp, and stress.

@merykitty
Member Author

@vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please?

@vnkozlov
Contributor

vnkozlov commented Jun 9, 2025

@vnkozlov I have merged this branch with master, can you run your tests and approve the changes, please?

I submitted testing.

Contributor

@vnkozlov vnkozlov left a comment


My testing passed.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Jun 10, 2025
@jatin-bhateja
Member

jatin-bhateja commented Jun 10, 2025

Hi @merykitty,
TypeIntPrototype currently accepts any random unsigned and signed value ranges and then tries to canonicalize them. This is a flexible approach, but do you see any practical use case for this relaxation?

I guess the reason is that the existing TypeInt lattice only captures signed value ranges. All non-top integral types must ensure that type._lo is never greater than type._hi; an unsigned range, however, can wrap around after MAX_INT, which would break the value-range invariant, since in such a case type._hi < type._lo.

The point of associating multiple lattice values i.e. signed, unsigned, and known bits, with a given integral type is to have a generic implementation where at any given point all three lattices are in sync, and canonicalization ensures this.

But in practical scenarios, KnownBits and the signed lattice points will be used the most, since the Java type system inherently supports only signed types.

@jatin-bhateja
Member

jatin-bhateja commented Jun 10, 2025

@merykitty, your comments on the following understanding will be helpful.

Q. Is it ok to keep a bool flag in a value which signifies that bounds hold unsigned values?

A. Java primitive types are inherently signed, and the C2 compiler represents all the integral types, i.e. byte, short, and int, through TypeInt by simply constraining the value range, and long through TypeLong. For float the C2 type system creates Type::FLOAT and for double Type::DOUBLE; unlike the integral types, these two do not record actual value-range bounds. For floating-point constants C2 creates different types, TypeF and TypeD respectively.

Currently, the scope of the unsigned type is limited to comparison, multiplication, and division operations. Since the signed and unsigned value ranges overlap, keeping a flag along with _lo and _hi should suffice. The new scheme instead accepts bounds for both the signed and unsigned value ranges and then finds the effective value range; this allows the user to feed arbitrary signed and unsigned ranges to a TypeInt and lets the compiler find the effective value range by canonicalization. A TypeInt is only useful after canonicalization. This mimics the job of a constructor, where a newly allocated object is usable only after it has been pushed through the constructor; likewise, a TypeInt accepts different signed and unsigned bounds but is only usable after normalization, which computes the effective value range. After normalization, the signed bounds, unsigned bounds, and known bits are in sync.

During dataflow analysis, the flow functions associated with different operators may modify the value ranges (signed or unsigned), which triggers re-normalization. In other cases, flow analysis may transfer only the known bits, which are then used to prune the value ranges. At any given point the signed/unsigned ranges and known bits should be in sync; otherwise the type is inconsistent and not usable. Iterative canonicalization ensures this.

Thus, to be flexible in the implementation, keeping a separate value range for unsigned bounds is justified, but it may not add huge value, as all Java primitive types are inherently signed and mixing signed and unsigned operations in the type flow is not possible. The whole idea of keeping the implementation flexible with unsigned bounds rests on the assumption that during dataflow any of the lattices associated with an integral type TypeInt or TypeLong, i.e. unsigned bounds, signed bounds, or known bits, may change. In practice, only known bits (bit-level dataflow) and signed bounds may be used, so a flag signifying that a bound is unsigned may suffice; associating 3 lattice points with TypeInt is done for flexibility, as it may facilitate optimization passes injecting opaque IR with unsigned bounds into the SoN and then letting type canonicalization and iterative dataflow analysis do the magic.

@merykitty
Member Author

@jatin-bhateja Thanks a lot for your suggestion. I will address your concerns below:

In addition to providing extra information, unsigned bounds make canonicalization easier. Because bits are inherently unsigned, canonicalizing bits together with unsigned bounds is an easier task than canonicalizing bits with signed bounds. I think it is also beneficial to be consistent: keeping a boolean to signify unsigned bounds splits the set of all TypeInts into 2, which makes it hard to reason about and verify the results of different operations. For example, consider subtracting 2 TypeInts: it becomes significantly more complex if we have to consider 4 cases: signed - signed, signed - unsigned, unsigned - signed, unsigned - unsigned.

I don't think it suffices to think of Java integral types as inherently signed. Bitwise and, or, xor, not are inherently unsigned. Left shift is both signed and unsigned; right shift has both a signed variant and an unsigned variant. Add, sub, mul are both signed and unsigned. Only cmp, div, mod, toString, and conversions are signed, but we have methods for the unsigned variants of all of them: Integer::compareUnsigned, Integer::divideUnsigned/remainderUnsigned, Integer::toUnsignedString, and Integer::toUnsignedLong. Mul-hi does not have a native operation for either variant, and j.l.Math provides both the signed and unsigned variants as utility methods. As a result, I think it is better to think of an int as a 32-bit integral value with unspecified signedness, where the operations applied to it decide whether it is signed or unsigned. And as you can see, all operations have both signed and unsigned variants.
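The sign-agnostic point can be illustrated in C++ (unsigned arithmetic is used here to avoid signed-overflow UB): add/sub/mul produce the same bit pattern under either interpretation, while comparison does not.

```cpp
#include <cassert>
#include <cstdint>

// Two's-complement addition is the same circuit for signed and unsigned:
// only the interpretation of the resulting bits differs.
uint32_t add_bits(uint32_t a, uint32_t b) { return a + b; }

// Comparison, by contrast, depends on the interpretation of the bits.
bool signed_lt(uint32_t a, uint32_t b)   { return (int32_t)a < (int32_t)b; }
bool unsigned_lt(uint32_t a, uint32_t b) { return a < b; }
```

The pattern 0xFFFFFFFF is -1 under signed_lt (less than 1) but 4294967295 under unsigned_lt (greater than 1), which is exactly why cmp needs the separate Integer::compareUnsigned.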

@jatin-bhateja
Member

jatin-bhateja commented Jun 10, 2025


"As a result, I think it is better to think of an int as a 32-bit integral value with unspecified signedness, where the operations applied to it decide whether it is signed or unsigned."

Sounds good. Since integral types now have multiple lattice points associated with them, and all our existing Value transforms are based on signed value ranges, we will also need to extend the existing Value transforms to make use of KnownBits. For that, we will need to extend the newly added KnownBits class to support other operations that accept KnownBits inputs, operate on them per the operation's semantics, and return a new KnownBits which, post-canonicalization, also adjusts the other lattice points, i.e. the signed and unsigned ranges.

All in all, IIUC, going forward we plan to perform bit-level dataflow analysis using KnownBits and then apply its result to the signed/unsigned ranges. This ensures all existing Value transforms, which are based on value ranges, continue to be useful, while the KnownBits analysis sharpens the ranges.
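As one example of such an extension, the standard known-bits transfer functions for bitwise and/or could look like this sketch (illustrative only, not the patch's actual KnownBits API):

```cpp
#include <cassert>
#include <cstdint>

struct KnownBits {
  uint32_t zeros; // bits known to be 0
  uint32_t ones;  // bits known to be 1
};

// Transfer for x & y: a result bit is known 1 only if it is known 1 in
// both inputs, and known 0 if it is known 0 in either input.
KnownBits and_bits(KnownBits x, KnownBits y) {
  return KnownBits{x.zeros | y.zeros, x.ones & y.ones};
}

// Transfer for x | y: known 1 if known 1 in either input, known 0 only
// if known 0 in both.
KnownBits or_bits(KnownBits x, KnownBits y) {
  return KnownBits{x.zeros & y.zeros, x.ones | y.ones};
}
```

After such a transfer, canonicalization would propagate the resulting bits back into the signed and unsigned ranges, as described above.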

Copy link
Member

@jatin-bhateja jatin-bhateja left a comment


Great work @merykitty , it was a pleasure to study your patch and refresh some of low level details of bit level data flow analysis.

LGTM,

Best Regards

@eme64
Contributor

eme64 commented Jun 10, 2025

@jatin-bhateja
FYI: #17508 (comment)

The goal is to make it possible to run gtests with any size integral types (e.g. 4 bit), so that we can efficiently test the corresponding value optimizations. @merykitty Did you already file a JBS issue for that?

So when we refactor, we should try to create methods that take in types, and not nodes. That would allow us to generate types in the gtest, and get a type back. We can do all sorts of enhanced verification that way. For example, we can feed in wider and narrower types as inputs, and then expect that narrower types lead to narrower outputs, and wider types to wider outputs. If constants are fed in, then constants should come out. etc. This would really allow us to exhaustively verify for all sorts of ranges and bit patterns - at least for the smaller types (e.g. 4 bits).
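The exhaustive-verification idea can be sketched for a toy 4-bit range type and a single operation (hypothetical names throughout; the naive transfer function simply widens to the full range on wrap-around):

```cpp
#include <cassert>
#include <cstdint>

const int W = 16; // 2^4 values in a 4-bit domain

struct Range4 { int lo, hi; }; // 4-bit unsigned range, inclusive

bool contains(Range4 t, int v) { return t.lo <= v && v <= t.hi; }

// Toy transfer function for (x + 1) & 0xF: precise unless the range
// wraps, in which case it gives up and returns the full range.
Range4 add1(Range4 t) {
  if (t.hi + 1 < W) return Range4{t.lo + 1, t.hi + 1};
  return Range4{0, W - 1};
}

// Soundness check: for every input range and every concrete value in it,
// the concrete result must lie in the output range. At 4 bits this is
// cheap enough to enumerate exhaustively, which is the point.
bool verify_add1() {
  for (int lo = 0; lo < W; lo++)
    for (int hi = lo; hi < W; hi++)
      for (int v = lo; v <= hi; v++)
        if (!contains(add1(Range4{lo, hi}), (v + 1) % W))
          return false;
  return true;
}
```

Feeding in types and getting types back, as suggested above, is what makes this kind of exhaustive gtest possible; the same loop structure extends to constant inputs (expect constant outputs) and to narrower-in/narrower-out monotonicity checks.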

@eme64
Contributor

eme64 commented Jun 10, 2025

So there is indeed a lot of extension work to do, like @jatin-bhateja said. But we can use that to also refactor the code for testability.

@jatin-bhateja
Member

jatin-bhateja commented Jun 10, 2025


Thanks @eme64 for clarifying; that is what I assumed. Currently all value transforms are based on signed ranges, and the role of KnownBits is restricted to canonicalization, which syncs the value ranges with the known bits. It will add much more value if we do the reverse, where dataflow analysis is performed over KnownBits and canonicalization then adjusts the value ranges accordingly. So adding a new gtest for each new operation is the right next step.

@merykitty
Member Author

@eme64 I have created https://bugs.openjdk.org/browse/JDK-8359149

@merykitty
Member Author

I think all the tests have passed, so I will integrate this PR.

Thank you all for reviewing this PR. In particular, thanks @eme64 for your rigorous reviews and valuable suggestions; they helped a lot in shaping this PR.

/contributor add @eme64

@openjdk

openjdk bot commented Jun 12, 2025

@merykitty
Contributor Emanuel Peter <[email protected]> successfully added.

@merykitty
Member Author

/integrate

@openjdk

openjdk bot commented Jun 13, 2025

Going to push as commit 991097b.
Since your change was applied there have been 54 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Jun 13, 2025
@openjdk openjdk bot closed this Jun 13, 2025
@openjdk openjdk bot removed the ready and rfr labels Jun 13, 2025
@openjdk

openjdk bot commented Jun 13, 2025

@merykitty Pushed as commit 991097b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@eme64
Contributor

eme64 commented Jun 13, 2025

@merykitty Thanks for the attribution 😊 Congrats on shipping it 🚀 Looking forward to the improvements that come from this!

Labels: hotspot-compiler, integrated

9 participants