-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use more robust HTML Tag Processor for auto sizes injection #1471
Use more robust HTML Tag Processor for auto sizes injection #1471
Conversation
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.
To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
When i try to add unit test for the add_filter(
'get_image_tag',
static function ( $html ) {
return str_replace(
'" />',
'" loading="lazy" Sizes="(max-width: 1000px) 100vw, 1000px" />',
$html
);
}
);
$image_tag = $this->get_image_tag( self::$image_id );
$image_content = wp_filter_content_tags( $image_tag );
var_dump($image_content); Result: <img
decoding="async"
src="http://localhost:8889/wp-content/uploads/2024/08/leaves-303-1024x683.jpg"
alt=""
width="1024"
height="683"
class="align size-large wp-image-4"
loading="lazy"
sizes="auto, (max-width: 1000px) 100vw, 1000px"
srcset="
http://localhost:8889/wp-content/uploads/2024/08/leaves-303-1024x683.jpg 1024w,
http://localhost:8889/wp-content/uploads/2024/08/leaves-303-300x200.jpg 300w,
http://localhost:8889/wp-content/uploads/2024/08/leaves-303-768x512.jpg 768w,
http://localhost:8889/wp-content/uploads/2024/08/leaves-303.jpg 1080w
"
sizes="(max-width: 1024px) 100vw, 1024px"
/> Possible bug in WP_HTML_Tag_ProcessorThe HTML tag process does not check the text case when Given this code: $img_markup = '<p><img SIZES="(max-width: 100px) 100vw, 100px" Sizes="(max-width: 200px) 100vw, 200px" sizes="(max-width: 300px) 100vw, 300px" /></p>';
$processor = new WP_HTML_Tag_Processor( $img_markup );
$processor->next_tag( array( 'tag_name' => 'IMG' ) );
$sizes = $processor->get_attribute( 'sizes' );
var_dump( $sizes );
// Result: (max-width: 100px) 100vw, 100px
$processor->set_attribute( 'sizes', '(max-width: 400px) 100vw, 400px' );
var_dump( $processor->get_updated_html() );
// Result: <p><img sizes="(max-width: 400px) 100vw, 400px" Sizes="(max-width: 200px) 100vw, 200px" sizes="(max-width: 300px) 100vw, 300px" /></p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mukeshpanchal27 Code changes look solid to me, and from a code perspective this is definitely more robust. However, we need to consider performance too. Could you run a benchmark comparison with vs without this change though? Maybe for posts with 1, 5, 20 images to have a few data points?
We need to verify that this is not causing a performance regression. A few months back when @kt-12 tried using WP_HTML_Tag_Processor
for something similar in core it notably affected performance, which is why that exploration was discarded. I believe WP_HTML_Tag_Processor
has become a lot faster since, but we need to still verify that this is not causing a performance regression.
Marking this with changes requested, only because I think it is critical we verify that before merging.
Nonetheless, we're already using |
Also, even if not using |
Were any performance benchmarks done for those? I guess for new code, it's a slightly different story because you would need to write both versions of the code to compare. Either way, my concern applies to those usages equally. We don't need to benchmark all of them, but I'd like to at least see that we do this exercise in one place and verify that Again, going back to a point I've made before: Rather than implementing this here, if we think this is how it should work, we should change the WP core implementation to use
Of course, but that is more likely when it comes to more complex operations/mutations. But I don't think we have established in any of the regexes in question that they produce broken HTML. |
The broken HTML is demonstrated in the test cases listed in #1446. |
@mukeshpanchal27 Why are you adding multiple sizes attributes to the same tag. AFAIK, the HTML spec leaves this behavior undefined. The test cases should only have one |
All of those would be problems in WP core too. They're edge-cases though. But I get it's fair feedback that this should be possible. Yet still, I think we're focusing these efforts in the wrong place. Why do we want to fix something in this plugin that doesn't work in core in the first place? |
Yeah, we should fix them in core too. But that will take time when we can get this out the door now as part of the feature plugin (and to dogfood the new API for eventual use in core). The scope here is just to add the
|
I'm supportive of this landing after we've conducted a benchmark comparison that this doesn't have adverse performance effects. Otherwise, I think there's not enough justification to fix this right here, and we should rather see if we can fix it at the root (in WP core) in a performant way. |
IMO, there is justification for the fix since it prevents corrupting HTML. For example, if an image has a In any case, I did some benchmarking. I created a vanilla site in Local with the latest build of the Enhanced Responsive Images plugin from I created test posts consisting of image blocks which begin with 3 unique images, followed by many duplicate images of a 4th image. All images starting with the 4th are lazy-loaded and thus also get I used I performed these tests: 1,000 requests to post with 100 imagesGiven this file:
I ran: npm run research -- benchmark-server-timing -f one-hundred-images-urls.txt -n 1000 --output=csv Results:
The Here are the results of another run:
Again, the And here the results of a third run:
Yet again, the 100 requests to post with 10,000 imagesThis is a laughable number of images so it is probably not a good test case. In any case, given this file:
I ran: npm run research -- benchmark-server-timing -f ten-thousand-images-urls.txt -n 1000 --output=csv Results:
Here HTML Tag Parser is slower by 0.18%. But I don't see this as being significant, especially given the insane number of images. These results seem to indicate that using HTML Tag Processor does not have adverse performance effects. |
I did some benchmark for script execution over 1,000 iterations. This comparison shows that while both functions( Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 9.1886520385742E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.8219947814941E-6 Change: 98.3%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 4.7612190246582E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 9.7560882568359E-7 Change: 104.9%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.1116943359375E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.0497570037842E-6 Change: 105.4%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.3501129150391E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.014232635498E-6 Change: 89.6%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 6.6995620727539E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.2216567993164E-6 Change: 82.3%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.7649612426758E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.1348724365234E-6 Change: 96.9%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.4311752319336E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.1184215545654E-6 Change: 105.9%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 6.1535835266113E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.1684894561768E-6 Change: 89.9%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.7220458984375E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.0898113250732E-6 Change: 90.5%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 4.6300888061523E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 9.6726417541504E-7 Change: 108.9% |
@westonruter @mukeshpanchal27 Those benchmarks show very different results. Not sure what to make of it. 🤔 On the one hand, the I guess we can probably move forward with this, given overall impact seems negligible. But it also is fair to conclude that it is slower. Just maybe not slower enough for us to care. :) In any case, I think we should definitely explore replacing the regexes with |
@mukeshpanchal27 did you know you can pass Also, I think these tests are good to examine, but also not draw major conclusions from a percentage. If I'm reading the code properly, it's suggesting that on a post with 10,000 images, WordPress spends 0.903 µs more to protect against common sources of corruption and security issues. This is an extreme case, and "double" the time amounts to less than a microsecond. Are we willing to spend 100 ps per But I think there are some notes about this benchmark and the one @westonruter did:
My attempts:
So I'll leave this here for now. Benchmarking is hard. We can hold long discussions about numbers that aren't measuring what we think they are, if we're not careful. The main question I ask is whether something is fast enough and worth the cost. It's hard for me to tell if either of these synthetic benchmarks means anything because both represent cases that are unlike typical WordPress requests. It would be interesting to examine a full page render for a block theme'd website with a typical post containing lots of text, lots of markup, and a few images. It's likely that any impact on performance would be so far under the noise floor of how the code runs that it'd be undetectable. My modified synthetic unrealistic benchmark code<?php
// Bootstrapping.
define('WP_USE_THEMES', true);
/** Loads the WordPress Environment and Template */
require ('wp-load.php');
// Setup.
$images =[
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ]
];
$repetitions = 1000;
$iterations = 1;
$stringcount = count( $images );
function auto_sizes_update_content_img_tag( $html ): string {
if ( ! is_string( $html ) ) {
$html = '';
}
if ( 1 !== preg_match( '/<img[^>]+>/', $html, $img_match, PREG_OFFSET_CAPTURE ) ) {
return $html;
}
// Bail early if the image is not lazy-loaded.
if ( false === strpos( $img_match[0][0], 'loading="lazy"' ) ) {
return $html;
}
// Bail early if the image is not responsive.
if ( 1 !== preg_match( '/sizes="([^"]+)"/', $img_match[0][0], $match, PREG_OFFSET_CAPTURE ) ) {
return $html;
}
// Don't add 'auto' to the sizes attribute if it already exists.
if ( auto_sizes_attribute_includes_valid_auto( $match[1][0] ) ) {
return $html;
}
$start_of_sizes = $img_match[0][1] + $match[1][1];
return substr( $html, 0, $start_of_sizes ) . 'auto, ' . substr( $html, $start_of_sizes );
}
function auto_sizes_update_content_img_tag_with_html_tag_processor( $html ): string {
if ( ! is_string( $html ) ) {
$html = '';
}
$processor = new WP_HTML_Tag_Processor( $html );
// Bail if there is no IMG tag.
if ( ! $processor->next_tag( array( 'tag_name' => 'IMG' ) ) ) {
return $html;
}
// Bail early if the image is not lazy-loaded.
if ( 'lazy' !== $processor->get_attribute( 'loading' ) ) {
return $html;
}
$sizes = $processor->get_attribute( 'sizes' );
// Bail early if the image is not responsive.
if ( ! is_string( $sizes ) ) {
return $html;
}
// Don't add 'auto' to the sizes attribute if it already exists.
if ( auto_sizes_attribute_includes_valid_auto( $sizes ) ) {
return $html;
}
$processor->set_attribute( 'sizes', "AUTO, $sizes" ); // Upper-case AUTO just to make it clear when this version is running.
return $processor->get_updated_html();
}
function auto_sizes_attribute_includes_valid_auto( string $sizes_attr ): bool {
$token = strtok( strtolower( $sizes_attr ), ',' );
return false !== $token && 'auto' === trim( $token, " \t\f\r\n" );
}
// Profiling.
$groups = array( 'naive' => 0, 'secure' => 0 );
for ( $repetition = 0; $repetition <= $repetitions; $repetition++ ) {
$group = array_rand( $groups );
$method = 'naive' === $group
? 'auto_sizes_update_content_img_tag'
: 'auto_sizes_update_content_img_tag_with_html_tag_processor';
// Pre-run
for ( $index = 0; $index < $iterations; $index ++ ) {
$image = $images[ $index % $stringcount ];
call_user_func( $method, $image[0] );
}
$time_start = hrtime(true);
for ( $index = 0; $index < $iterations; $index ++ ) {
$image = $images[ $index % $stringcount ];
$result = call_user_func( $method, $image[0] );
}
$time_end = hrtime(true);
$groups[ $group ] += $time_end - $time_start;
}
$percentage = $groups['secure'] / $groups['naive'] * 100;
$n = new NumberFormatter( 'en-US', \NumberFormatter::DEFAULT_STYLE );
echo "Average time spent for {$repetitions} x {$iterations} iterations:\n";
echo "auto_sizes_update_content_img_tag() - {$n->format($groups['naive'] / $iterations)} ns\n";
echo "auto_sizes_update_content_img_tag_with_html_tag_processor() - {$n->format($groups['secure'] / $iterations)} ns\n\n";
echo "Change: {$n->format($percentage)}%\n"; |
I also forked the Gist from @mukeshpanchal27 to do synthetic benchmarking, including additional test case for the various logic paths in the In WordPress 6.6.1, I found that using Tag Processor is 10.53x slower. In WordPress 6.7-alpha, I found that using Tag Processor is 8.48x slower, due to ongoing performance improvements. And yet:
And:
Yes! 😄 |
Running some of my own benchmarks on a single post in Twenty Twenty-four with six images and posted the results below of the median runs for 50 requests measured using server timing. TL;DR: I'm not seeing any major performance concerns with this change. Given the number of identified bugs this guards against, I think we should land this PR. Before
ControlDoing a control run allows us to see general variability between benchmarks to consider whether any changes we see in the test case are due to normal variation.
This PR
|
Refactor test cases to use snapshots
I discovered two more cases that weren't considered for the
The keyword |
Thanks everyone. |
Summary
Fixes #1446
Based on the test cases mentioned in the issue description, in 1859613, I have added unit tests that clearly demonstrate the lack of robustness in the current codebase, resulting in failing tests.
Failed unit tests:
In 32b9abc, after updating the codebase using WP_HTML_Tag_Processor, the unit tests are now passing ✅.