8353230: Emoji rendering regression after JDK-8208377 #24412

gredler · 2025-04-03T11:23:42Z

It looks like this regression actually fits into a longer series of fixes / regressions in this area:

JDK-4517298 fixed metrics for zero-width characters, but broke some ligatures / glyph substitutions
JDK-7017058 fixed the ligatures / glyph substitutions, but broke some zero-width metrics
JDK-8208377 fixed some metrics and rendering for zero-width characters, but broke some ligatures / glyph substitutions
Now, with this PR, we aim to fix the ligatures without re-breaking zero-width metrics and display

We have two different types of use cases pulling CharToGlyphMapper in two different directions: the users who need raw, untransformed glyph info, and the users who need normalized / transformed glyph info.

It looks to me like, in the current code base, the only CharToGlyphMapper user which requires raw font data is HarfBuzz (explicitly confirmed with the HarfBuzz team here: harfbuzz/harfbuzz#5234).

The regression mechanism at play here is that the HarfBuzz font callbacks are currently providing HarfBuzz with transformed glyph info (e.g. ZWJ -> INVISIBLE_GLYPH_ID), which prevents HarfBuzz from recognizing and applying the correct font GSUB substitutions (which involve ZWJ).

In order to fix this without (yet again) breaking metrics and display behavior elsewhere, I've added two methods to CharToGlyphMapper which provide access to raw glyph info, to be used by the HarfBuzz font callbacks: charToGlyphRaw(int) and charToVariationGlyphRaw(int).
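
To make the mechanism concrete, here is a minimal sketch of why a transformed ZWJ glyph defeats a ligature lookup while a raw one does not. Everything in it is hypothetical: the class, the toy cmap and glyph IDs, the single ligature rule, and the `0xffff` sentinel are illustration-only assumptions, not the actual `sun.font` or HarfBuzz code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; not the real CharToGlyphMapper or HarfBuzz callbacks.
final class ZwjLigatureSketch {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value
    static final int ZWJ = 0x200D;

    // Toy cmap: code point -> glyph ID (IDs are invented for this sketch).
    static final Map<Integer, Integer> CMAP =
            Map.of(0x1F469, 10, ZWJ, 11, 0x1F4BB, 12);

    // Toy GSUB-style rule: woman + ZWJ + laptop -> one ligature glyph.
    static final Map<List<Integer>, Integer> LIGATURES =
            Map.of(List.of(10, 11, 12), 99);

    // Raw view: exactly what the font's cmap says.
    static int charToGlyphRaw(int cp) {
        return CMAP.getOrDefault(cp, 0);
    }

    // Transformed view: ZWJ is hidden behind the invisible glyph.
    static int charToGlyph(int cp) {
        return cp == ZWJ ? INVISIBLE_GLYPH_ID : charToGlyphRaw(cp);
    }

    // Stand-in for a shaper: map code points, then try the ligature rule.
    static List<Integer> shape(int[] cps, boolean raw) {
        List<Integer> glyphs = new ArrayList<>();
        for (int cp : cps) {
            glyphs.add(raw ? charToGlyphRaw(cp) : charToGlyph(cp));
        }
        Integer ligature = LIGATURES.get(glyphs);
        return ligature != null ? List.of(ligature) : glyphs;
    }
}
```

With the raw mapping the three code points collapse into the single ligature glyph; with the transformed mapping the rule never matches, because the shaper sees the invisible glyph where the font's rule expects the ZWJ glyph.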

Note two intricacies related to CompositeGlyphMapper:

We need to be careful to only cache raw (untransformed) values, to avoid conflicts between requests for a raw version of a glyph and a transformed version of the same glyph. Another option would have been two separate caches, but I don't think that's necessary.
Consumers who are using CompositeGlyphMapper.SLOTMASK to check glyph slots (e.g. FontRunIterator and CTextPipe) will "see" invisible glyphs as having come from slot 0. This isn't new, and I think it's OK, but something to be aware of.
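
The first point can be sketched as follows, assuming a single cache that only ever stores raw values and applies the transformation outside of it. The class name, the `isIgnorable` helper, the lookup counter, and the `0xffff` sentinel are hypothetical and much simpler than the real CompositeGlyphMapper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch of a "raw values only" glyph cache.
final class RawOnlyGlyphCache {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value

    private final Map<Integer, Integer> cache = new HashMap<>();
    private final IntUnaryOperator rawLookup; // stands in for the font lookup
    int lookups = 0;                          // counts cache misses

    RawOnlyGlyphCache(IntUnaryOperator rawLookup) {
        this.rawLookup = rawLookup;
    }

    // Simplified: the real default-ignorable set is much larger.
    static boolean isIgnorable(int cp) {
        return cp == 0x200C || cp == 0x200D;
    }

    int getGlyph(int cp, boolean raw) {
        // The transformation is decided before touching the cache, so a
        // transformed value can never be stored and later served as raw.
        if (!raw && isIgnorable(cp)) {
            return INVISIBLE_GLYPH_ID;
        }
        Integer cached = cache.get(cp);
        if (cached == null) {
            lookups++;
            cached = rawLookup.applyAsInt(cp);
            cache.put(cp, cached);
        }
        return cached;
    }
}
```

Because the cache key is only the code point, raw and transformed requests for the same character can arrive in any order without contaminating each other.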

The glyph cache handling in CCharToGlyphMapper (for macOS) also requires care to avoid mixing value types.

Please also note that I'm not sure if the tweak to sunFont.c is being tested, since FFM is being used by default for HarfBuzz integration. (Is there a plan to remove the JNI version soon?)

This PR includes a self-contained regression test. It includes a small font created just for this test, which exercises the ligature / glyph substitution infrastructure. The font tests, including the new regression test, all pass locally on Linux, Windows and macOS (make test TEST="jtreg:test/jdk/java/awt/font").

Interestingly, the changes for JDK-7017058 (mentioned above) included a test (ZWJLigatureTest) which I think would have caught this last regression, but it depends on optional Windows fonts which I guess do not exist on any commonly-used test infrastructure. This should not be an issue with the new test, since it does not depend on any external fonts.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8353230: Emoji rendering regression after JDK-8208377 (Bug - P3)

Reviewers

Phil Race (@prrace - Reviewer)
Harshitha Onkar (@honkar-jdk - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24412/head:pull/24412
$ git checkout pull/24412

Update a local copy of the PR:
$ git checkout pull/24412
$ git pull https://git.openjdk.org/jdk.git pull/24412/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24412

View PR using the GUI difftool:
$ git pr show -t 24412

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24412.diff

Using Webrev

Link to Webrev Comment


bridgekeeper · 2025-04-03T11:24:16Z

👋 Welcome back dgredler! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-04-03T11:24:31Z

@gredler This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8353230: Emoji rendering regression after JDK-8208377

Reviewed-by: prr, honkar

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 94 new commits pushed to the master branch:

b7ca672: 8357047: [ubsan] AdapterFingerPrint::AdapterFingerPrint runtime error: index 3 out of bounds
82807d4: 8357184: Test vmTestbase/nsk/jdi/ExceptionEvent/itself/exevent008/TestDescription.java fails with unreported exception
3cc6309: 8353955: nsk/jdi tests should be fixed to not always require includevirtualthreads=y
... and 91 more: https://git.openjdk.org/jdk/compare/bbceab072555d5e2f5d3e99ae07a5ca5e909d7dc...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@prrace, @honkar-jdk) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk · 2025-04-03T11:25:38Z

@gredler The following label will be automatically applied to this pull request:

client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-04-03T11:28:22Z

Webrevs

YaaZ · 2025-04-15T17:28:12Z

We had similar emoji-related regressions at JetBrains. Although our font-related code diverged from OpenJDK a bit, porting this patch seems to resolve them too. I am not an OpenJDK reviewer, but LGTM nevertheless.

gredler · 2025-04-21T17:43:19Z

@YaaZ Thanks for the information!

@prrace Have you had a chance to look at this PR?

prrace · 2025-04-24T03:04:36Z

@YaaZ Thanks for the information!

@prrace Have you had a chance to look at this PR?

It passed all the testing I did. I still need to look hard at the changes.

YaaZ · 2025-04-29T09:55:56Z

By the way, I see that in each implementation, both charToGlyph and charToGlyphRaw call a common method, like getGlyph(int unicode, boolean raw). At first there was just charToGlyph, then charToVariationGlyph was added, and now you have added a "raw" version of each. I can see that we will need other variants in the future, and this is already starting to look like an exponential explosion. Overriding all of those methods in each implementation brings quite a bit of boilerplate, and it becomes easier to miss something. Maybe take a step back and refactor this into a single charToGlyph(int unicode, int variationSelector, boolean raw) version?
Also, this raw parameter only really controls the isDefaultIgnorable check at the end of each method. Maybe we could factor this out without bringing it separately into each mapper implementation?

gredler · 2025-05-01T20:02:45Z

@YaaZ: Thanks for the additional feedback, please see my thoughts below:

By the way, I see that in each implementation, both charToGlyph and charToGlyphRaw call a common method, like getGlyph(int unicode, boolean raw). At first there was just charToGlyph, then charToVariationGlyph was added, and now you have added a "raw" version of each. I can see that we will need other variants in the future, and this is already starting to look like an exponential explosion.

I don't know if I would call two changes to CharToGlyphMapper in 20 years an exponential explosion, but I get your point :-)

Overriding all of those methods in each implementation brings quite a bit of boilerplate, and it becomes easier to miss something.

True, but again keep in mind that there are only 5 implementations, only one of which (the macOS CCharToGlyphMapper) has been added in the last 20 years.

Maybe take a step back and refactor this into a single charToGlyph(int unicode, int variationSelector, boolean raw) version?

We'd still need separate methods for int vs. char, but I think this might reduce 5 methods down to 3? The changeset would be a bit more intrusive (lots of callers would need to change to reflect the new method signature). I'd be interested to hear thoughts from some of the reviewers on this one.

Also, this raw parameter only really controls the isDefaultIgnorable check at the end of each method. Maybe we could factor this out without bringing it separately into each mapper implementation?

I prefer to think of it as controlling whether or not any transformations to INVISIBLE_GLYPH_ID happen (right now it's just for default-ignorable characters, but there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere).

Any ideas for what this refactoring might look like?

YaaZ · 2025-05-01T21:55:00Z

I was talking about the explosion because there is a scenario in my mind, which I didn't make clear for everybody else. There is a change which I didn't have time to contribute, but would like to: it's related to composite fonts and variation selectors. We may need 2 variants for retrieving a glyph with a variation selector - one strictly matching a variation selector and another with a fallback to the base glyph, multiplied by raw/transformed versions, which adds 2 more methods. Not like it's a big problem, but given that they all end up calling a single method anyway... You get the point.

there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere).

Are those scenarios specific to a particular mapper/font type? I was thinking that those transformations are generic.

Any ideas for what this refactoring might look like?

I was thinking about moving this default-ignorable check, or any other potential generic transformation, into the base CharToGlyphMapper or even Font2D. For example, make the default implementation of CharToGlyphMapper.charToGlyph check for ignorable characters and then call charToGlyphRaw. Other implementations would then only need to override charToGlyphRaw.
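
A rough sketch of what that suggestion could look like, under the assumption that the transformation lives in a base class and subclasses supply only the raw lookup. The class names, the simplified `isDefaultIgnorable` set, and the `0xffff` sentinel are hypothetical, not the real CharToGlyphMapper hierarchy.

```java
// Hypothetical base class: the generic transformation lives here once.
abstract class BaseMapperSketch {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value

    // Simplified: the real default-ignorable set is much larger.
    static boolean isDefaultIgnorable(int cp) {
        return cp == 0x200C || cp == 0x200D;
    }

    // The only method a concrete mapper would have to implement.
    abstract int charToGlyphRaw(int cp);

    // Transformed lookup, implemented once in terms of the raw lookup.
    final int charToGlyph(int cp) {
        return isDefaultIgnorable(cp) ? INVISIBLE_GLYPH_ID : charToGlyphRaw(cp);
    }
}

// Hypothetical concrete mapper with an invented, trivial "cmap".
final class LowByteMapperSketch extends BaseMapperSketch {
    @Override
    int charToGlyphRaw(int cp) {
        return cp & 0xff;
    }
}
```

This covers the single-character methods only; array-based lookups would still need to route through the same check.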

prrace

Sorry, this one fell off the radar. I see my testing - a long time ago now - passed.
The code changes make sense, so I'm ready to approve.
But it's been so long I think I should re-test with the latest version on the latest repo.

gredler · 2025-05-26T21:35:41Z

@prrace Please don't approve yet. I was able to fix the code conflicts caused by PR #23665, but it looks like the logic does not layer nicely as there is now a test failure on macOS after the merge. I'm having a look and will let you know what I find.

prrace · 2025-05-26T23:44:34Z

@prrace Please don't approve yet. I was able to fix the code conflicts caused by PR #23665, but it looks like the logic does not layer nicely as there is now a test failure on macOS after the merge. I'm having a look and will let you know what I find.

And indeed the tests just finished, and a test (not the new one, but a previous one: IgnoredWhitespaceTest.java) failed on macOS x64 and ARM:

java.lang.RuntimeException: for text '\t\t\t\t\tXXXXX' with font java.awt.Font[family=Dialog,name=Dialog,style=plain,size=40]: java.awt.Rectangle[x=300,y=271,width=123,height=28] != java.awt.Rectangle[x=365,y=271,width=123,height=28]
at IgnoredWhitespaceTest.assertEqual(IgnoredWhitespaceTest.java:127)
at IgnoredWhitespaceTest.test(IgnoredWhitespaceTest.java:103)
at IgnoredWhitespaceTest.test(IgnoredWhitespaceTest.java:69)
at IgnoredWhitespaceTest.main(IgnoredWhitespaceTest.java:49)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:565)
at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
at java.base/java.lang.Thread.run(Thread.java:1447)

JavaTest Message: Test threw exception: java.lang.RuntimeException: for text '\t\t\t\t\tXXXXX' with font java.awt.Font[family=Dialog,name=Dialog,style=plain,size=40]: java.awt.Rectangle[x=300,y=271,width=123,height=28] != java.awt.Rectangle[x=365,y=271,width=123,height=28]
JavaTest Message: shutting down test

gredler · 2025-05-27T00:37:38Z

Yep, that's the test added in PR #23665 (with which this PR had a conflict).

gredler · 2025-05-28T11:19:03Z

@YaaZ I had a look at your suggestion to push up the raw checks into the CharToGlyphMapper superclass, and the complicating factor is the three charsToGlyphs methods which take arrays. These are handled differently in each subclass, and even if we were able to refactor all subclass implementations to ensure that they all call into the public single-char superclass methods (where we would add the raw checks), it would be relatively brittle (not obvious to future maintainers that the call to the superclass method cannot be removed). The current approach does require each subclass to handle the raw boolean, but it also allows each subclass to internally do that in a unified, simplified way. Please feel free to prototype something on your end -- I'm open to helping with future improvements in this area, if feasible.
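
The brittleness concern can be illustrated with a small sketch, assuming a superclass that owns the check and a subclass that overrides the array method with its own bulk path. All names, the ZWJ-only check, and the `0xffff` sentinel are hypothetical; the real subclasses differ considerably.

```java
// Hypothetical superclass that owns the invisible-glyph check.
abstract class PushedUpMapperSketch {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value
    static final int ZWJ = 0x200D;

    abstract int charToGlyphRaw(int cp);

    int charToGlyph(int cp) {
        return cp == ZWJ ? INVISIBLE_GLYPH_ID : charToGlyphRaw(cp);
    }

    // Default array path: safe, because it goes through charToGlyph.
    void charsToGlyphs(int count, int[] cps, int[] glyphs) {
        for (int i = 0; i < count; i++) {
            glyphs[i] = charToGlyph(cps[i]);
        }
    }
}

// Hypothetical subclass with its own bulk path, as a maintainer might write.
final class BulkMapperSketch extends PushedUpMapperSketch {
    @Override
    int charToGlyphRaw(int cp) {
        return cp & 0xff; // invented, trivial "cmap"
    }

    @Override
    void charsToGlyphs(int count, int[] cps, int[] glyphs) {
        for (int i = 0; i < count; i++) {
            // Calls the raw lookup directly, silently skipping the
            // superclass check; nothing in the types prevents this.
            glyphs[i] = charToGlyphRaw(cps[i]);
        }
    }
}
```

In this sketch the single-character path still hides ZWJ, but the overridden array path returns the raw glyph, which is the kind of divergence that is easy to introduce and hard to notice.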

gredler · 2025-05-28T20:25:23Z

@prrace This is ready for review again. I've updated the code to combine nicely with the recent whitespace fixes. The test/jdk/java/awt/font, test/jdk/java/awt/print and test/jdk/java/awt/Graphics2D/DrawString tests all pass for me locally on Windows, Linux and macOS.

Once these changes are integrated it should be relatively simple to address JDK-8356803 ("Test TextLayout/TestControls fails on windows & linux: line and paragraph separator show non-zero advance") and JDK-8356812 ("Create an automated version of TextLayout/TestControls").

prrace

Took me a while, but I think the code looks OK.
Automated testing passed, and I did some manual playing around in Font2DTest.

prrace · 2025-05-29T21:55:41Z

src/java.desktop/share/classes/sun/print/RasterPrinterJob.java

@@ -2482,7 +2484,7 @@ protected String removeControlChars(String s) {

        for (int i = 0; i < len; i++) {
            char c = in_chars[i];
-            if (c > '\r' || c < '\t' || c == '\u000b' || c == '\u000c')  {


hmm. looks like you are fixing a bug here. I think those == were supposed to be !=

Hah, I've had to check it a few times but I do think the code was correct -- just formulated in a confusing way, probably as a micro-optimization.

Although I do think we will end up wanting to add Vertical Tab and Form Feed to the list of ignored whitespace chars later, as part of JDK-8356803.
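
For readers who also had to check it a few times: the removed condition quoted above is true for every char except tab, line feed and carriage return. Vertical Tab and Form Feed fall between tab and carriage return, so they are explicitly let through. The sketch below compares the quoted condition with a more explicit formulation over all char values; the class and method names are hypothetical.

```java
// Hypothetical helper comparing two formulations of the same check.
final class ControlCharCheckSketch {
    // The condition as quoted in the diff above.
    static boolean original(char c) {
        return c > '\r' || c < '\t' || c == '\u000b' || c == '\u000c';
    }

    // A more explicit formulation: anything except tab, LF and CR.
    static boolean explicit(char c) {
        return c != '\t' && c != '\n' && c != '\r';
    }

    // Exhaustively compares both predicates over the whole char range.
    static boolean equivalentForAllChars() {
        for (int i = 0; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (original(c) != explicit(c)) {
                return false;
            }
        }
        return true;
    }
}
```

The original form needs at most four comparisons and short-circuits on the first one for most text, which fits the guess that it was written as a micro-optimization.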

honkar-jdk · 2025-05-30T16:49:46Z

@gredler In the process of reviewing this PR.

Just a heads-up: JDK 25 RDP1 is next week (https://openjdk.org/projects/jdk/25/), in case you are planning to get the following fixes into 25:
JDK-8356803 - Test TextLayout/TestControls fails on windows & linux: line and paragraph separator show non-zero advance
JDK-8356812 - Create an automated version of TextLayout/TestControls

gredler · 2025-05-30T17:27:10Z

@honkar-jdk Thanks! It shouldn't take me more than a day to address those two issues once this PR has been integrated, so I think it will just depend on how long the reviews take. I'll keep a close eye out to keep things quick on my end, though.

honkar-jdk

LGTM

gredler · 2025-05-30T18:51:34Z

/integrate

openjdk · 2025-05-30T18:52:00Z

@gredler
Your change (at version 95652a1) is now ready to be sponsored by a Committer.

honkar-jdk · 2025-05-30T19:15:29Z

/sponsor

openjdk · 2025-05-30T19:16:22Z

Going to push as commit 94039e2.
Since your change was applied there have been 94 commits pushed to the master branch:

b7ca672: 8357047: [ubsan] AdapterFingerPrint::AdapterFingerPrint runtime error: index 3 out of bounds
82807d4: 8357184: Test vmTestbase/nsk/jdi/ExceptionEvent/itself/exevent008/TestDescription.java fails with unreported exception
3cc6309: 8353955: nsk/jdi tests should be fixed to not always require includevirtualthreads=y
... and 91 more: https://git.openjdk.org/jdk/compare/bbceab072555d5e2f5d3e99ae07a5ca5e909d7dc...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-05-30T19:16:29Z

@honkar-jdk @gredler Pushed as commit 94039e2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

gredler added 2 commits April 2, 2025 22:38

Differentiate CharToGlyphMapper users who want raw vs normalized glyph info

a8e5905

Finish macOS implementation

3fdc790

openjdk bot added the rfr Pull request is ready for review label Apr 3, 2025

openjdk bot added the client label Apr 3, 2025

gredler mentioned this pull request May 6, 2025

8350203: [macos] Newlines and tabs are not ignored when drawing text to a Graphics2D object #23665

Closed

3 tasks

Merge branch 'master' into JDK-8353230

fb27236

prrace reviewed May 26, 2025


Fix conflict with whitespace logic handling updates

c68240b

Remove one last now-unnecessary whitespace check inside CMap

95652a1

prrace approved these changes May 29, 2025


openjdk bot added the ready Pull request is ready to be integrated label May 29, 2025

honkar-jdk approved these changes May 30, 2025


openjdk bot added the sponsor Pull request is ready to be sponsored label May 30, 2025

openjdk bot added the integrated Pull request has been integrated label May 30, 2025

openjdk bot closed this May 30, 2025

openjdk bot removed the ready, rfr and sponsor labels May 30, 2025

mrserb mentioned this pull request Jun 1, 2025

8357696: Enhance code consistency: java.desktop/unix #25439

Closed

3 tasks
