Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: mb_convert_encoding PHP 8.2 deprecation #52

Merged
merged 10 commits into from
Nov 28, 2024

Conversation

shvlv
Copy link
Contributor

@shvlv shvlv commented Feb 27, 2024

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added/updated (for bug fixes/features)

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Fix.

What is the current behavior? (You can also link to an open issue here)
I added minor improvements on the top of @nlemoine's PR #49. It was inspired by very good SO answer - https://stackoverflow.com/a/8218649/13962131 and we tested same change for several projects already where we need HTML processing.

So the problem is (https://www.php.net/manual/en/function.mb-convert-encoding.php):

from 8.2.0 mb_convert_encoding() will no longer return the following non text encodings: "Base64", "QPrint", "UUencode", "HTML entities", "7 bit" and "8 bit".

What is the new behavior (if this is a feature change)?

The right way to do things is mb_encode_numericentity.

DOMDocument::loadHTML loads ISO-8859-1 by default. So for map argument, we choose:

  • start_code - 128 (0x80). Anything above ISO-8859-1 that supports 0-127.
  • end_code - I think using the maximum Unicode value U+10FFFF is better.
  • offset - 0 as we don't need to change code
  • mask - we don't need to change anything so that we can repeat the end_code value. But I like the idea of ~0 that produces the maximal byte for your system. i.e. 1111111111111111111111111111111111111111111111111111111111111111 for a 64bit system. It guarantees the mask never changes its actual value.

Regarding html_entity_decode, I think it's an important addition because we can leave HTML entities for HTML content, and the browser handles them as expected. But we don't want to break Unicode characters inside JS. So if the user has var title = 'Lösungen', changing to var = 'Lösungen' is not good.

Also, tests are added, including the character from the Unicode top plate - https://util.unicode.org/UnicodeJsps/character.jsp?a=2000B. I.e. 𠀋

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No.

Other information:

@shvlv shvlv requested a review from Chrico February 27, 2024 15:05
@nlemoine
Copy link
Contributor

Thanks you @shvlv, this solves PHP 8.2 deprecations.

However, I stumbled upon another bug today. Here's the context.

I'm using Vite to build my assets (with a custom Vite loader). In dev mode, I have to include vite client to handle hot reload, etc.

The default URL for vite server is http://[::1]:5173/path/to/build/@vite/client. When this goes through DomDocument, the URL ends up encoded like so:

"http://%5B::1%5D:5173/path/to/build/@vite/client

html_entity_decode doesn't take effect here. I couldn't find a solution to this issue.

I was wondering if instead of DomDocument (which has a lot of quirks and can be heavy on memory) and since the only manipulations achieved here targets attributes, you could leverage the new Tag processor API introduced in 6.2?

Of course, that would mean a WP 6.2 requirement on the other hand.

6.3 also introduced native support for defer and async.

Copy link

codecov bot commented Mar 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Project coverage is 88.42%. Comparing base (391e140) to head (1180455).
Report is 13 commits behind head on master.

Files with missing lines Patch % Lines
src/OutputFilter/AttributesOutputFilter.php 0.00% 9 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master      #52      +/-   ##
============================================
- Coverage     91.02%   88.42%   -2.60%     
+ Complexity      282      281       -1     
============================================
  Files            21       21              
  Lines           836      821      -15     
============================================
- Hits            761      726      -35     
- Misses           75       95      +20     
Flag Coverage Δ
unittests 88.42% <0.00%> (-2.60%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shvlv
Copy link
Contributor Author

shvlv commented Mar 18, 2024

After internal discussion, the new approach with WordPress HTML API was implemented. The following points should be discussed additionally:

  1. Do we consider the current change as a breaking change, or will triggering deprecation be enough?
  2. We need integration tests for this change. I added database-less tests as PoC. Maybe we can consider a different approach here.

@nlemoine
Copy link
Contributor

nlemoine commented Apr 8, 2024

Quick note: from 6.5, there is now a wp_enqueue_script_module function.

@Chrico Chrico linked an issue May 10, 2024 that may be closed by this pull request
1 task
@nlemoine
Copy link
Contributor

@shvlv Friendly bump, any progress on this?

@shvlv
Copy link
Contributor Author

shvlv commented Aug 8, 2024

@nlemoine Sorry for the delay, we have a vacation time 😄 will proceed as soon as possible

@nlemoine
Copy link
Contributor

nlemoine commented Nov 13, 2024

Hello @shvlv, would you mind having a look at this PR? 🙂

Copy link
Member

@tyrann0us tyrann0us left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I have two comments but they don't stop me from approving.

  1. I'm not sure how much we should care about WordPress < 6.2. It might be enough simply returning $html without triggering an error message (and have a test for it).
  2. If the QA issues in the unchanged files are quick wins, it would be great to have them fixed according to the Boy Scout rules.

src/OutputFilter/AttributesOutputFilter.php Outdated Show resolved Hide resolved
src/OutputFilter/AttributesOutputFilter.php Outdated Show resolved Hide resolved
@shvlv
Copy link
Contributor Author

shvlv commented Nov 28, 2024

  • I'm not sure how much we should care about WordPress < 6.2. It might be enough simply returning $html without triggering an error message (and have a test for it).

Done partially. We decided to postpone integration testing for now.

  • If the QA issues in the unchanged files are quick wins, it would be great to have them fixed according to the Boy Scout rules.

It is better to handle it in the standalone PR.

@tyrann0us tyrann0us merged commit 2b2d2ac into master Nov 28, 2024
11 of 14 checks passed
@tyrann0us tyrann0us deleted the fix-php82-deprecation-improvement branch November 28, 2024 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: remove deprecation warning
3 participants