8353230: Emoji rendering regression after JDK-8208377 #24412

Closed
gredler wants to merge 5 commits into master

Conversation

Contributor

@gredler gredler commented Apr 3, 2025

It looks like this regression actually fits into a longer series of fixes / regressions in this area:

  • JDK-4517298 fixed metrics for zero-width characters, but broke some ligatures / glyph substitutions
  • JDK-7017058 fixed the ligatures / glyph substitutions, but broke some zero-width metrics
  • JDK-8208377 fixed some metrics and rendering for zero-width characters, but broke some ligatures / glyph substitutions
  • Now, with this PR, we aim to fix the ligatures without re-breaking zero-width metrics and display

We have two different types of use cases pulling CharToGlyphMapper in two different directions: the users who need raw, untransformed glyph info, and the users who need normalized / transformed glyph info.

It looks to me like, in the current code base, the only CharToGlyphMapper user which requires raw font data is HarfBuzz (explicitly confirmed with the HarfBuzz team here: harfbuzz/harfbuzz#5234).

The regression mechanism at play here is that the HarfBuzz font callbacks are currently providing HarfBuzz with transformed glyph info (e.g. ZWJ -> INVISIBLE_GLYPH_ID), which prevents HarfBuzz from recognizing and applying the correct font GSUB substitutions (which involve ZWJ).

In order to fix this without (yet again) breaking metrics and display behavior elsewhere, I've added two methods to CharToGlyphMapper which provide access to raw glyph info, to be used by the HarfBuzz font callbacks: charToGlyphRaw(int) and charToVariationGlyphRaw(int).
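To make the raw / transformed distinction concrete, here is a minimal, hypothetical sketch (not the actual JDK code). The toy cmap, the glyph IDs, the simplified default-ignorable check and the 0xFFFF sentinel value are assumptions made for illustration; only the names charToGlyph, charToGlyphRaw and INVISIBLE_GLYPH_ID come from the description above.

```java
import java.util.Map;

/** Hypothetical sketch of raw vs. transformed glyph lookup; not the JDK's CharToGlyphMapper. */
final class RawMappingSketch {
    // Assumed sentinel value for this sketch only.
    static final int INVISIBLE_GLYPH_ID = 0xFFFF;
    static final int ZWJ = 0x200D;

    // Toy cmap (assumption): code point -> glyph ID as stored in the font.
    static final Map<Integer, Integer> CMAP = Map.of(
            0x1F468, 10,   // MAN
            ZWJ, 11,       // ZERO WIDTH JOINER
            0x1F469, 12);  // WOMAN

    // Simplified stand-in for the default-ignorable check (only ZWNJ and ZWJ).
    static boolean isDefaultIgnorable(int codePoint) {
        return codePoint == 0x200C || codePoint == ZWJ;
    }

    /** Raw lookup: exactly what the font's cmap says, as a shaping engine needs it. */
    static int charToGlyphRaw(int codePoint) {
        return CMAP.getOrDefault(codePoint, 0); // 0 = missing glyph
    }

    /** Transformed lookup: hides default-ignorable characters for metrics and display. */
    static int charToGlyph(int codePoint) {
        return isDefaultIgnorable(codePoint) ? INVISIBLE_GLYPH_ID : charToGlyphRaw(codePoint);
    }

    public static void main(String[] args) {
        System.out.println("raw ZWJ glyph:         " + charToGlyphRaw(ZWJ));
        System.out.println("transformed ZWJ glyph: " + charToGlyph(ZWJ));
    }
}
```

In this sketch a shaper that only ever sees the transformed value for ZWJ has no way to match a substitution rule keyed on the font's real ZWJ glyph, which is the regression mechanism described above.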

Note two intricacies related to CompositeGlyphMapper:

  1. We need to be careful to only cache raw (untransformed) values, to avoid conflicts between requests for a raw version of a glyph and a transformed version of the same glyph. Another option would have been two separate caches, but I don't think that's necessary.
  2. Consumers who are using CompositeGlyphMapper.SLOTMASK to check glyph slots (e.g. FontRunIterator and CTextPipe) will "see" invisible glyphs as having come from slot 0. This isn't new, and I think it's OK, but something to be aware of.
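The first point can be sketched as follows. This is a hypothetical illustration, not the CompositeGlyphMapper code: the class name, the HashMap-based cache, the lookup counter and the 0xFFFF sentinel are assumptions. The idea is that the cache stores only raw glyph IDs and the transformation is applied on the way out, so raw and transformed requests for the same character never conflict.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch: one cache that holds raw glyph IDs only. */
final class RawCacheSketch {
    static final int INVISIBLE_GLYPH_ID = 0xFFFF; // assumed sentinel value

    private final Map<Integer, Integer> rawCache = new HashMap<>();
    private final IntUnaryOperator fontLookup; // stand-in for the expensive font query
    int fontLookups;                           // counts cache misses, for demonstration

    RawCacheSketch(IntUnaryOperator fontLookup) {
        this.fontLookup = fontLookup;
    }

    // Simplified stand-in for the default-ignorable check (only ZWNJ and ZWJ).
    static boolean isDefaultIgnorable(int codePoint) {
        return codePoint == 0x200C || codePoint == 0x200D;
    }

    int getGlyph(int codePoint, boolean raw) {
        // Only the raw value is ever stored in the cache.
        int rawGlyph = rawCache.computeIfAbsent(codePoint, cp -> {
            fontLookups++;
            return fontLookup.applyAsInt(cp);
        });
        // The transformation happens after the cache, never inside it.
        return (!raw && isDefaultIgnorable(codePoint)) ? INVISIBLE_GLYPH_ID : rawGlyph;
    }

    public static void main(String[] args) {
        RawCacheSketch mapper = new RawCacheSketch(cp -> cp == 0x200D ? 11 : 1);
        System.out.println(mapper.getGlyph(0x200D, false)); // transformed request
        System.out.println(mapper.getGlyph(0x200D, true));  // raw request, served from cache
        System.out.println("font lookups: " + mapper.fontLookups);
    }
}
```

Had the transformed value been cached instead, the later raw request would have received INVISIBLE_GLYPH_ID, which is the kind of conflict the first point warns about.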

The glyph cache handling in CCharToGlyphMapper (for macOS) also requires care to avoid mixing value types.

Please also note that I'm not sure if the tweak to sunFont.c is being tested, since FFM is being used by default for HarfBuzz integration. (Is there a plan to remove the JNI version soon?)

This PR includes a self-contained regression test. It includes a small font created just for this test, which exercises the ligature / glyph substitution infrastructure. The font tests, including the new regression test, all pass locally on Linux, Windows and macOS (make test TEST="jtreg:test/jdk/java/awt/font").

Interestingly, the changes for JDK-7017058 (mentioned above) included a test (ZWJLigatureTest) which I think would have caught this last regression, but it depends on optional Windows fonts which I guess do not exist on any commonly-used test infrastructure. This should not be an issue with the new test, since it does not depend on any external fonts.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8353230: Emoji rendering regression after JDK-8208377 (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24412/head:pull/24412
$ git checkout pull/24412

Update a local copy of the PR:
$ git checkout pull/24412
$ git pull https://git.openjdk.org/jdk.git pull/24412/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24412

View PR using the GUI difftool:
$ git pr show -t 24412

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24412.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Apr 3, 2025

👋 Welcome back dgredler! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Apr 3, 2025

@gredler This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8353230: Emoji rendering regression after JDK-8208377

Reviewed-by: prr, honkar

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 94 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@prrace, @honkar-jdk) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 3, 2025

openjdk bot commented Apr 3, 2025

@gredler The following label will be automatically applied to this pull request:

  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge bot commented Apr 3, 2025

Webrevs

Member

YaaZ commented Apr 15, 2025

We had similar emoji-related regressions at JetBrains. Although our font-related code diverged from OpenJDK a bit, porting this patch seems to resolve them too. I am not an OpenJDK reviewer, but LGTM nevertheless.

Contributor Author

gredler commented Apr 21, 2025

@YaaZ Thanks for the information!

@prrace Have you had a chance to look at this PR?

Contributor

prrace commented Apr 24, 2025

> @YaaZ Thanks for the information!
>
> @prrace Have you had a chance to look at this PR?

It passed all the testing I did. I still need to look hard at the changes.

Member

YaaZ commented Apr 29, 2025

By the way, I see that in each implementation, both charToGlyph and charToGlyphRaw call a common method, like getGlyph(int unicode, boolean raw). At first there was just charToGlyph, then charToVariationGlyph was added, and now you have added a "raw" version of each. I can see that we will need other variants in the future, and this is already starting to look like an exponential explosion. Overriding all of those methods in each implementation brings quite a bit of boilerplate, and it becomes easier to miss something. Maybe take a step back and refactor this into a single charToGlyph(int unicode, int variationSelector, boolean raw) version?
Also, this raw parameter only really controls the isDefaultIgnorable check at the end of each method. Maybe we could factor this out without bringing it separately into each mapper implementation?

Contributor Author

gredler commented May 1, 2025

@YaaZ: Thanks for the additional feedback, please see my thoughts below:

> By the way, I see that in each implementation, both charToGlyph and charToGlyphRaw call a common method, like getGlyph(int unicode, boolean raw). At first there was just charToGlyph, then charToVariationGlyph was added, and now you have added a "raw" version of each. I can see that we will need other variants in the future, and this is already starting to look like an exponential explosion.

I don't know if I would call two changes to CharToGlyphMapper in 20 years an exponential explosion, but I get your point :-)

> Overriding all of those methods in each implementation brings quite a bit of boilerplate, and it becomes easier to miss something.

True, but again keep in mind that there are only 5 implementations, only one of which (the macOS CCharToGlyphMapper) has been added in the last 20 years.

> Maybe take a step back and refactor this into a single charToGlyph(int unicode, int variationSelector, boolean raw) version?

We'd still need separate methods for int vs. char, but I think this might reduce 5 methods down to 3? The changeset would be a bit more intrusive (lots of callers would need to change to reflect the new method signature). I'd be interested to hear thoughts from some of the reviewers on this one.

> Also, this raw parameter only really controls the isDefaultIgnorable check at the end of each method. Maybe we could factor this out without bringing it separately into each mapper implementation?

I prefer to think of it as controlling whether or not any transformations to INVISIBLE_GLYPH_ID happen (right now it's just for default-ignorable characters, but there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere).

Any ideas for what this refactoring might look like?

Member

YaaZ commented May 1, 2025

I was talking about the explosion because there is a scenario in my mind, which I didn't make clear for everybody else. There is a change which I didn't have time to contribute, but would like to: it's related to composite fonts and variation selectors. We may need 2 variants for retrieving a glyph with a variation selector - one strictly matching a variation selector and another with a fallback to the base glyph, multiplied by raw/transformed versions, which adds 2 more methods. Not like it's a big problem, but given that they all end up calling a single method anyway... You get the point.

> there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere

Are those scenarios specific to a particular mapper/font type? I was thinking that those transformations are generic.

> Any ideas for what this refactoring might look like?

I was thinking about moving the default-ignorable check, or any other generic transformation, into the base CharToGlyphMapper or even Font2D. For example, make the default implementation of CharToGlyphMapper.charToGlyph check for ignorable characters and then call charToGlyphRaw; then other implementations would only need to override charToGlyphRaw.
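A rough sketch of what this suggestion could look like, under stated assumptions: the class names BaseMapper and ToyFontMapper, the toy glyph IDs, the simplified default-ignorable check and the 0xFFFF sentinel are all hypothetical, and this is not code from the PR.

```java
/** Hypothetical sketch of the "check in the base class" idea. */
final class TemplateMapperSketch {
    static final int INVISIBLE_GLYPH_ID = 0xFFFF; // assumed sentinel value

    abstract static class BaseMapper {
        /** The only method a concrete mapper would have to implement. */
        abstract int charToGlyphRaw(int codePoint);

        /** Generic transformation lives here, once, for every mapper. */
        int charToGlyph(int codePoint) {
            return isDefaultIgnorable(codePoint) ? INVISIBLE_GLYPH_ID : charToGlyphRaw(codePoint);
        }

        // Simplified stand-in for the default-ignorable check (only ZWNJ and ZWJ).
        static boolean isDefaultIgnorable(int codePoint) {
            return codePoint == 0x200C || codePoint == 0x200D;
        }
    }

    /** Toy mapper (assumption): ZWJ maps to glyph 11, everything else to glyph 1. */
    static final class ToyFontMapper extends BaseMapper {
        @Override
        int charToGlyphRaw(int codePoint) {
            return codePoint == 0x200D ? 11 : 1;
        }
    }

    public static void main(String[] args) {
        BaseMapper mapper = new ToyFontMapper();
        System.out.println("raw:         " + mapper.charToGlyphRaw(0x200D));
        System.out.println("transformed: " + mapper.charToGlyph(0x200D));
    }
}
```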

Contributor

@prrace prrace left a comment

Sorry, this one fell off the radar. I see my testing - a long time ago now - passed.
The code changes make sense, so I'm ready to approve.
But it's been so long that I think I should re-test with the latest version on the latest repo.

Contributor Author

gredler commented May 26, 2025

@prrace Please don't approve yet. I was able to fix the code conflicts caused by PR #23665, but it looks like the logic does not layer nicely as there is now a test failure on macOS after the merge. I'm having a look and will let you know what I find.

Contributor

prrace commented May 26, 2025

> @prrace Please don't approve yet. I was able to fix the code conflicts caused by PR #23665, but it looks like the logic does not layer nicely as there is now a test failure on macOS after the merge. I'm having a look and will let you know what I find.

And indeed the tests just finished, and a test (not the new one, but an existing one: IgnoredWhitespaceTest.java) failed on macOS x64 and ARM:

java.lang.RuntimeException: for text '\t\t\t\t\tXXXXX' with font java.awt.Font[family=Dialog,name=Dialog,style=plain,size=40]: java.awt.Rectangle[x=300,y=271,width=123,height=28] != java.awt.Rectangle[x=365,y=271,width=123,height=28]
at IgnoredWhitespaceTest.assertEqual(IgnoredWhitespaceTest.java:127)
at IgnoredWhitespaceTest.test(IgnoredWhitespaceTest.java:103)
at IgnoredWhitespaceTest.test(IgnoredWhitespaceTest.java:69)
at IgnoredWhitespaceTest.main(IgnoredWhitespaceTest.java:49)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:565)
at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
at java.base/java.lang.Thread.run(Thread.java:1447)

JavaTest Message: Test threw exception: java.lang.RuntimeException: for text '\t\t\t\t\tXXXXX' with font java.awt.Font[family=Dialog,name=Dialog,style=plain,size=40]: java.awt.Rectangle[x=300,y=271,width=123,height=28] != java.awt.Rectangle[x=365,y=271,width=123,height=28]
JavaTest Message: shutting down test

Contributor Author

gredler commented May 27, 2025

Yep, that's the test added in PR #23665 (with which this PR had a conflict).

Contributor Author

gredler commented May 28, 2025

@YaaZ I had a look at your suggestion to push the raw checks up into the CharToGlyphMapper superclass. The complicating factor is the three charsToGlyphs methods which take arrays. These are handled differently in each subclass, and even if we were able to refactor all subclass implementations to ensure that they all call into the public single-char superclass methods (where we would add the raw checks), it would be relatively brittle (not obvious to future maintainers that the call to the superclass method cannot be removed). The current approach does require each subclass to handle the raw boolean, but it also allows each subclass to internally do that in a unified, simplified way. Please feel free to prototype something on your end -- I'm open to helping with future improvements in this area, if feasible.
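The brittleness concern can be illustrated with a small hypothetical sketch (not code from the PR; the class names, the toy `cp & 0xFF` glyph mapping, the simplified default-ignorable check and the 0xFFFF sentinel are assumptions). A subclass with its own bulk array path compiles and runs fine, but silently skips the check that lives in the superclass single-char method.

```java
import java.util.Arrays;

/** Hypothetical sketch: an array method that bypasses the superclass check. */
final class ArrayPathSketch {
    static final int INVISIBLE_GLYPH_ID = 0xFFFF; // assumed sentinel value

    // Simplified stand-in for the default-ignorable check (only ZWNJ and ZWJ).
    static boolean isDefaultIgnorable(int codePoint) {
        return codePoint == 0x200C || codePoint == 0x200D;
    }

    abstract static class BaseMapper {
        abstract int charToGlyphRaw(int codePoint);

        final int charToGlyph(int codePoint) {
            return isDefaultIgnorable(codePoint) ? INVISIBLE_GLYPH_ID : charToGlyphRaw(codePoint);
        }

        // Default array path routes every element through the checked method.
        void charsToGlyphs(int count, int[] unicodes, int[] glyphs) {
            for (int i = 0; i < count; i++) {
                glyphs[i] = charToGlyph(unicodes[i]);
            }
        }
    }

    static final class SimpleMapper extends BaseMapper {
        @Override
        int charToGlyphRaw(int codePoint) {
            return codePoint & 0xFF; // toy mapping
        }
    }

    static final class BulkMapper extends BaseMapper {
        @Override
        int charToGlyphRaw(int codePoint) {
            return codePoint & 0xFF; // toy mapping
        }

        // A subclass-specific bulk path: nothing forces it to call charToGlyph.
        @Override
        void charsToGlyphs(int count, int[] unicodes, int[] glyphs) {
            for (int i = 0; i < count; i++) {
                glyphs[i] = charToGlyphRaw(unicodes[i]); // superclass check skipped
            }
        }
    }

    public static void main(String[] args) {
        int[] text = {'a', 0x200D};
        int[] viaBase = new int[2];
        int[] viaBulk = new int[2];
        new SimpleMapper().charsToGlyphs(2, text, viaBase);
        new BulkMapper().charsToGlyphs(2, text, viaBulk);
        System.out.println(Arrays.toString(viaBase));
        System.out.println(Arrays.toString(viaBulk));
    }
}
```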

Contributor Author

gredler commented May 28, 2025

@prrace This is ready for review again. I've updated the code to combine nicely with the recent whitespace fixes. The test/jdk/java/awt/font, test/jdk/java/awt/print and test/jdk/java/awt/Graphics2D/DrawString tests all pass for me locally on Windows, Linux and macOS.

Once these changes are integrated it should be relatively simple to address JDK-8356803 ("Test TextLayout/TestControls fails on windows & linux: line and paragraph separator show non-zero advance") and JDK-8356812 ("Create an automated version of TextLayout/TestControls").

Contributor

@prrace prrace left a comment

Took me a while, but I think the code looks OK.
Automated testing passed, and I did some manual playing around in Font2DTest.

@@ -2482,7 +2484,7 @@ protected String removeControlChars(String s) {

for (int i = 0; i < len; i++) {
char c = in_chars[i];
if (c > '\r' || c < '\t' || c == '\u000b' || c == '\u000c') {
Contributor

hmm. looks like you are fixing a bug here. I think those == were supposed to be !=

Contributor Author

Hah, I've had to check it a few times but I do think the code was correct -- just formulated in a confusing way, probably as a micro-optimization.

Contributor Author

Although I do think we will end up wanting to add Vertical Tab and Form Feed to the list of ignored whitespace chars later, as part of JDK-8356803.
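For anyone else who has to read that condition twice: a brute-force check over every char value, assuming (the hunk shown here does not include the loop body) that a true result means the character is kept. The class and method names are made up for this check. It compares the condition from the hunk with the plainer reading "anything other than \t, \n and \r".

```java
/** Hypothetical check of the condition quoted in the review thread. */
final class ControlCharCheck {
    // The condition exactly as it appears in the quoted hunk.
    static boolean keptByOriginal(char c) {
        return c > '\r' || c < '\t' || c == '\u000b' || c == '\u000c';
    }

    // Plainer restatement: keep everything except tab, line feed and carriage return.
    static boolean keptByRestatement(char c) {
        return c != '\t' && c != '\n' && c != '\r';
    }

    public static void main(String[] args) {
        for (int i = 0; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (keptByOriginal(c) != keptByRestatement(c)) {
                throw new AssertionError("mismatch at U+" + Integer.toHexString(i));
            }
        }
        System.out.println("both forms agree for every char value");
    }
}
```

Vertical Tab (U+000B) and Form Feed (U+000C) sit between \n and \r, which is why the range test needs the two == exceptions to keep them.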

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 29, 2025
@honkar-jdk
Contributor

@gredler In the process of reviewing this PR.

Just a heads-up: JDK 25 RDP1 is next week (https://openjdk.org/projects/jdk/25/), in case you are planning to get the following fixes into 25:
JDK-8356803 - Test TextLayout/TestControls fails on windows & linux: line and paragraph separator show non-zero advance
JDK-8356812 - Create an automated version of TextLayout/TestControls

Contributor Author

gredler commented May 30, 2025

@honkar-jdk Thanks! It shouldn't take me more than a day to address those two issues once this PR has been integrated, so I think it will just depend on how long the reviews take. I'll keep a close eye out to keep things quick on my end, though.

Contributor

@honkar-jdk honkar-jdk left a comment

LGTM

Contributor Author

gredler commented May 30, 2025

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 30, 2025

openjdk bot commented May 30, 2025

@gredler
Your change (at version 95652a1) is now ready to be sponsored by a Committer.

@honkar-jdk
Contributor

/sponsor

openjdk bot commented May 30, 2025

Going to push as commit 94039e2.
Since your change was applied there have been 94 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 30, 2025
@openjdk openjdk bot closed this May 30, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels May 30, 2025

openjdk bot commented May 30, 2025

@honkar-jdk @gredler Pushed as commit 94039e2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
client, integrated (Pull request has been integrated)
4 participants