Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use more robust HTML Tag Processor for auto sizes injection #1471

Merged
merged 4 commits into from
Aug 16, 2024

Conversation

mukeshpanchal27
Copy link
Member

@mukeshpanchal27 mukeshpanchal27 commented Aug 14, 2024

Summary

Fixes #1446

Based on the test cases mentioned in the issue description, in 1859613, I have added unit tests that clearly demonstrate the lack of robustness in the current codebase, resulting in failing tests.

Failed unit tests:
1) Test_AutoSizes::test_content_image_with_single_quote_replacement_does_not_have_auto_sizes
The sizes attribute does not include "auto" as the first item in the list.
Failed asserting that false is true.

/var/www/html/wp-content/plugins/performance/plugins/auto-sizes/tests/test-auto-sizes.php:2[40](https://github.com/WordPress/performance/actions/runs/10386066002/job/28756529008?pr=1471#step:9:41)
phpvfscomposer:///var/www/html/wp-content/plugins/performance/vendor/phpunit/phpunit/phpunit:97

2) Test_AutoSizes::test_content_image_with_custom_attribute_name_with_sizes_at_the_end_does_not_have_auto_sizes
The data-tshirt-sizes attribute should not contain "auto".
Failed asserting that 'auto, S M L' does not contain "auto".

/var/www/html/wp-content/plugins/performance/plugins/auto-sizes/tests/test-auto-sizes.php:262
phpvfscomposer:///var/www/html/wp-content/plugins/performance/vendor/phpunit/phpunit/phpunit:97

3) Test_AutoSizes::test_content_image_with_lazy_load_text_in_alt_does_not_have_auto_sizes
The sizes attribute does not include "auto" as the first item in the list.
Failed asserting that true is false.

/var/www/html/wp-content/plugins/performance/plugins/auto-sizes/tests/test-auto-sizes.php:285
phpvfscomposer:///var/www/html/wp-content/plugins/performance/vendor/phpunit/phpunit/phpunit:97

4) Test_AutoSizes::test_content_image_with_custom_attribute_name_with_loading_lazy_at_the_end_does_not_have_auto_sizes
The sizes attribute does not include "auto" as the first item in the list.
Failed asserting that true is false.

/var/www/html/wp-content/plugins/performance/plugins/auto-sizes/tests/test-auto-sizes.php:311
phpvfscomposer:///var/www/html/wp-content/plugins/performance/vendor/phpunit/phpunit/phpunit:97

In 32b9abc, after updating the codebase using WP_HTML_Tag_Processor, the unit tests are now passing ✅.

@mukeshpanchal27 mukeshpanchal27 added [Type] Enhancement A suggestion for improvement of an existing feature [Plugin] Enhanced Responsive Images Issues for the Enhanced Responsive Images plugin (formerly Auto Sizes) labels Aug 14, 2024
@mukeshpanchal27 mukeshpanchal27 added this to the auto-sizes 1.2.0 milestone Aug 14, 2024
@mukeshpanchal27 mukeshpanchal27 self-assigned this Aug 14, 2024
@mukeshpanchal27 mukeshpanchal27 marked this pull request as ready for review August 14, 2024 11:00
Copy link

github-actions bot commented Aug 14, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: mukeshpanchal27 <[email protected]>
Co-authored-by: westonruter <[email protected]>
Co-authored-by: felixarntz <[email protected]>
Co-authored-by: joemcgill <[email protected]>
Co-authored-by: dmsnell <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@mukeshpanchal27
Copy link
Member Author

@westonruter @joemcgill

When i try to add unit test for the sizes attribute for any case it return second sizes attributes, one from the core and one that passed through filter. It updates the first sizes attribute as it didn't check case for sizes check below Possible bug in WP_HTML_Tag_Processor section for more details.

add_filter(
	'get_image_tag',
	static function ( $html ) {
		return str_replace(
			'" />',
			'" loading="lazy" Sizes="(max-width: 1000px) 100vw, 1000px" />',
			$html
		);
	}
);

$image_tag = $this->get_image_tag( self::$image_id );
$image_content = wp_filter_content_tags( $image_tag );

var_dump($image_content);

Result:

<img 
  decoding="async" 
  src="http://localhost:8889/wp-content/uploads/2024/08/leaves-303-1024x683.jpg" 
  alt="" 
  width="1024" 
  height="683" 
  class="align size-large wp-image-4" 
  loading="lazy" 
  sizes="auto, (max-width: 1000px) 100vw, 1000px" 
  srcset="
    http://localhost:8889/wp-content/uploads/2024/08/leaves-303-1024x683.jpg 1024w, 
    http://localhost:8889/wp-content/uploads/2024/08/leaves-303-300x200.jpg 300w, 
    http://localhost:8889/wp-content/uploads/2024/08/leaves-303-768x512.jpg 768w, 
    http://localhost:8889/wp-content/uploads/2024/08/leaves-303.jpg 1080w
  " 
  sizes="(max-width: 1024px) 100vw, 1024px"
/>

Possible bug in WP_HTML_Tag_Processor

The HTML tag process does not check the text case when get_attribute or set_attribute it will get or set first element and mark them lowercase.

Given this code:

$img_markup = '<p><img SIZES="(max-width: 100px) 100vw, 100px" Sizes="(max-width: 200px) 100vw, 200px" sizes="(max-width: 300px) 100vw, 300px" /></p>';
$processor = new WP_HTML_Tag_Processor( $img_markup );
$processor->next_tag( array( 'tag_name' => 'IMG' ) );
$sizes = $processor->get_attribute( 'sizes' );
var_dump( $sizes );

// Result: (max-width: 100px) 100vw, 100px

$processor->set_attribute( 'sizes', '(max-width: 400px) 100vw, 400px' );
var_dump( $processor->get_updated_html() );

// Result: <p><img sizes="(max-width: 400px) 100vw, 400px" Sizes="(max-width: 200px) 100vw, 200px" sizes="(max-width: 300px) 100vw, 300px" /></p>

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mukeshpanchal27 Code changes look solid to me, and from a code perspective this is definitely more robust. However, we need to consider performance too. Could you run a benchmark comparison with vs without this change though? Maybe for posts with 1, 5, 20 images to have a few data points?

We need to verify that this is not causing a performance regression. A few months back when @kt-12 tried using WP_HTML_Tag_Processor for something similar in core it notably affected performance, which is why that exploration was discarded. I believe WP_HTML_Tag_Processor has become a lot faster since, but we need to still verify that this is not causing a performance regression.

Marking this with changes requested, only because I think it is critical we verify that before merging.

@westonruter
Copy link
Member

Marking this with changes requested, only because I think it is critical we verify that before merging.

Nonetheless, we're already using WP_HTML_Tag_Processor in several other places exactly like this, as mentioned in #1446.

@westonruter
Copy link
Member

Also, even if not using WP_HTML_Tag_Processor is faster, when the result may be broken HTML then it doesn't seem faster is always better.

@felixarntz
Copy link
Member

felixarntz commented Aug 14, 2024

Nonetheless, we're already using WP_HTML_Tag_Processor in several other places exactly like this, as mentioned in #1446.

Were any performance benchmarks done for those? I guess for new code, it's a slightly different story because you would need to write both versions of the code to compare. Either way, my concern applies to those usages equally.

We don't need to benchmark all of them, but I'd like to at least see that we do this exercise in one place and verify that WP_HTML_Tag_Processor isn't notably worse than the regexes from a performance perspective.

Again, going back to a point I've made before: Rather than implementing this here, if we think this is how it should work, we should change the WP core implementation to use WP_HTML_Tag_Processor on the whole content and then introduce new/alternative filters that receive that instance. One of my main concerns with what we're doing here is recreating new instances lots of times (for every image), which just seems inefficient.

Also, even if not using WP_HTML_Tag_Processor is faster, when the result may be broken HTML then it doesn't seem faster is always better.

Of course, but that is more likely when it comes to more complex operations/mutations. But I don't think we have established in any of the regexes in question that they produce broken HTML.

@westonruter
Copy link
Member

Of course. But I don't think we have established in any of the regexes in question that they produce broken HTML.

The broken HTML is demonstrated in the test cases listed in #1446.

@westonruter
Copy link
Member

@mukeshpanchal27 Why are you adding multiple sizes attributes to the same tag. AFAIK, the HTML spec leaves this behavior undefined. The test cases should only have one sizes attribute per tag. Also, I suggest not using wp_filter_content_tags() or $this->get_image_tag() and instead to use HTML string literals (e.g. test cases listed in #1446) for the entire <img> tag as input in the tests.

@felixarntz
Copy link
Member

The broken HTML is demonstrated in the test cases listed in #1446.

All of those would be problems in WP core too. They're edge-cases though. But I get it's fair feedback that this should be possible. Yet still, I think we're focusing these efforts in the wrong place. Why do we want to fix something in this plugin that doesn't work in core in the first place?

@westonruter
Copy link
Member

Yeah, we should fix them in core too. But that will take time when we can get this out the door now as part of the feature plugin (and to dogfood the new API for eventual use in core). The scope here is just to add the auto value for the sizes attribute in a way that doesn't result in broken HTML. Core isn't doing replacements of the sizes attribute in this way, so the breakages here are unique. As part of core merge, we should leverage WP_HTML_Tag_Processor. In the meantime, as @joemcgill recommended in #1445 (comment):

Extra performance overhead was true of the first iteration of the HTML Tag Processor, in WP 6.4 (I believe). However, a lot has been improved since then. I don't think we've conducted a thorough analysis of whether using it in cases like this still has potential negative performance implications. Given how useful this is and how quickly I've seen adoption throughout Core, I would be in support of us doing some additional analysis. In the mean time, I think we should use it where we can so we're dogfooding our own WP Core APIs.

@felixarntz
Copy link
Member

felixarntz commented Aug 14, 2024

I'm supportive of this landing after we've conducted a benchmark comparison that this doesn't have adverse performance effects.

Otherwise, I think there's not enough justification to fix this right here, and we should rather see if we can fix it at the root (in WP core) in a performant way.

@westonruter
Copy link
Member

IMO, there is justification for the fix since it prevents corrupting HTML. For example, if an image has a data-sizes attribute, then this will cause a breakage. There's also the other cases listed in #1446.

In any case, I did some benchmarking.

I created a vanilla site in Local with the latest build of the Enhanced Responsive Images plugin from trunk. I also installed the Performance Lab plugin and I enabled output buffering in order to obtain the Server-Timing header. Lastly, I also added an Auto-Sizes via HTML Tag Processor plugin which replaces the regex-based logic with HTML Tag Processor. This plugin includes a auto_sizes_via_html_tag_processor_disabled query parameter which preserves the original regex-based logic.

I created test posts consisting of image blocks which begin with 3 unique images, followed by many duplicate images of a 4th image. All images starting with the 4th are lazy-loaded and thus also get sizes=auto.

I used benchmark-server-timing to obtain the metrics for these two posts.

I performed these tests:

1,000 requests to post with 100 images

Given this file:

one-hundred-images-urls.txt:

http://localhost:10003/one-hundred-images/?auto_sizes_via_html_tag_processor_disabled
http://localhost:10003/one-hundred-images/

I ran:

npm run research -- benchmark-server-timing -f one-hundred-images-urls.txt -n 1000 --output=csv

Results:

Metric  Regex TagParser Diff
Response Time (median) 418.23 376.82 -41.41
wp-before-template (median) 14.62 14.74 0.12
wp-template (median) 55.66 55.27 -0.39
wp-total (median) 70.89 70.35 -0.54

The wp-template metric with HTML Tag Processor is 0.7% faster.

Here are the results of another run:

Metric Regex TagParser Diff
Response Time (median) 446.29 418.55 -27.74
wp-before-template (median) 16.38 14.61 -1.77
wp-template (median) 60.18 56.11 -4.07
wp-total (median) 76.83 71.27 -5.56

Again, the wp-template metric with HTML Tag Processor is faster, here by 6.76%.

And here the results of a third run:

Metric Regex TagParser Diff
Response Time (median) 441.15 389.09 -52.06
wp-before-template (median) 16.35 15.58 -0.77
wp-template (median) 60.34 57.77 -2.57
wp-total (median) 76.69 73.57 -3.12

Yet again, the wp-template metric with HTML Tag Processor is faster by 4.07%.

100 requests to post with 10,000 images

This is a laughable number of images so it is probably not a good test case. In any case, given this file:

ten-thousand-images-urls.txt:

http://localhost:10003/ten-thousand-images/?auto_sizes_via_html_tag_processor_disabled
http://localhost:10003/ten-thousand-images/

I ran:

npm run research -- benchmark-server-timing -f ten-thousand-images-urls.txt -n 1000 --output=csv

Results:

Metric  Regex TagParser Diff
Response Time (median) 1322.17 1324.17 2
wp-before-template (median) 34.11 34.61 0.5
wp-template (median) 1269.57 1270.87 1.3
wp-total (median) 1302.9 1305.26 2.36

Here HTML Tag Parser is slower by 0.18%. But I don't see this as being significant, especially given the insane number of images.


These results seem to indicate that using HTML Tag Processor does not have adverse performance effects.

@mukeshpanchal27
Copy link
Member Author

I did some benchmark for script execution over 1,000 iterations. This comparison shows that while both functions( auto_sizes_update_content_img_tag and new used in this PR that uses WP_HTML_Tag_Processor) are very fast, the new function (Use with WP_HTML_Tag_Processor) still takes almost twice the time to execute compared to the first function, as reflected in the ~97% increase in time.

Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 9.1886520385742E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.8219947814941E-6 Change: 98.3%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 4.7612190246582E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 9.7560882568359E-7 Change: 104.9%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.1116943359375E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.0497570037842E-6 Change: 105.4%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.3501129150391E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.014232635498E-6 Change: 89.6%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 6.6995620727539E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.2216567993164E-6 Change: 82.3%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.7649612426758E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.1348724365234E-6 Change: 96.9%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.4311752319336E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.1184215545654E-6 Change: 105.9%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 6.1535835266113E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.1684894561768E-6 Change: 89.9%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 5.7220458984375E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 1.0898113250732E-6 Change: 90.5%
Average time spent for 1000 x 1 iterations: auto_sizes_update_content_img_tag() - 4.6300888061523E-7 auto_sizes_update_content_img_tag_with_html_tag_processor() - 9.6726417541504E-7 Change: 108.9%

@felixarntz
Copy link
Member

@westonruter @mukeshpanchal27 Those benchmarks show very different results. Not sure what to make of it. 🤔

On the one hand, the npm run research results from @westonruter indicate that at the grand scheme of things, the performance impact is irrelevant. On the other hand, the cost analysis of just the specific algorithms from @mukeshpanchal27 indicate that the performance cost of WP_HTML_Tag_Processor is double.

I guess we can probably move forward with this, given overall impact seems negligible. But it also is fair to conclude that it is slower. Just maybe not slower enough for us to care. :)

In any case, I think we should definitely explore replacing the regexes with WP_HTML_Tag_Processor in WP core. There may even be a Trac ticket already, but I don't fully recall. If there is none, let's open one.

@dmsnell
Copy link
Member

dmsnell commented Aug 15, 2024

@mukeshpanchal27 did you know you can pass true to microtime(true) and get a float result? also, in benchmarks hrtime(true): nanoseconds can be handy because it returns a monotonic timer whereas microtime() returns system time, which fluctuates regularly. when we're measuring the impact of nanoseconds and picoseconds, system clock skew and jitter can have a measurable impact.

Also, I think these tests are good to examine, but also not draw major conclusions from a percentage. If I'm reading the code properly, it's suggesting that on a post with 10,000 images, WordPress spends 0.903 µs more to protect against common sources of corruption and security issues. This is an extreme case, and "double" the time amounts to less than a microsecond.

Are we willing to spend 100 ps per IMG tag to avoid introducing XSS attacks and break peoples' pages?

But I think there are some notes about this benchmark and the one @westonruter did:

  • it would be nice to randomly pick between running the test group and the control group.
  • the test code already has a bug in it where it's replacing all sizes=" substrings. I only point this out to note how easy it is for parsing bugs to sneak in. I guess it's worth noting too that the test code would replace the text content of a post writing use the sizes="(max-width: 900px) 50em" attribute to hint at image size. this is not about corrupting page renders though, so I'll stop here.
  • this benchmark compares 10 posts with 1 image each and his a single post with 1000 images. the different results are likely partially explainable by the fact that the more tags a snippet of code visits, the more its performance approaches and then falls behind the tag processor. for example, wp_html_split() is a good example, where Core is attempting to "visit" or split HTML in every token. the reason the Tag Processor is slower than naive parsers like strpos( 'sizes="' ) is because it visits every token in the document. by the time we're calling Core's functions like wptexturize() or wp_strip_all_tags() or wp_kses() we've already reverted to mechanisms which are not only broken, but slower and more memory heavy than the Tag Processor. that is, @westonruter's test results are going to favor the HTML API because the document is larger and it visits more <img> tags.
  • the absolute value of the difference in runtime here is so trivial it's hardly worth discussion, as @felixarntz noted - it's not significant.

My attempts:

  • replace microtime_float() with hrtime(true)
    • Tag Processor is reporting around 5x slower than the broken parser, with the difference per <img> is reporting around 30 ns (that's nanoseconds).
  • randomized the test groups
    • no change in results
  • call_user_func() for easier iteration randomization
    • no change in results
  • wait a second - @mukeshpanchal27 your benchmark is sending an array to the test functions. none of the replacement code ever runs. it's effectively comparing an early-abort in line 43 against an empty string to instantiating the Tag Processor on an empty string. benchmarking is hard. I found this because I was trying to ensure that the test group only modifies content inside the <img>
    • I've fixed the benchmark by passing $image[0] instead of $image
    • New results show Tag Processor being 10x slower
    • Fixing the bug in my modified benchmark shows a 17x slowdown.
    • Absolute difference is now higher at 450 ns / <img>, so much worse, but still less than a microsecond.
  • Added a preg_match( '/<img[^]+>/', $html ) statement based on existing Core code to find IMG and ensure that this replacement doesn't completely replace different content.
    • Brought the speed comparison back down to around 10x, again still less than half a microsecond per <img>.
  • As soon as I start adding other logic in the main loop the disparity between the two approaches diminishes. This goes for things like if ( strlen( $result ) > 50 ) { $count++; }. This should be obvious, but I tested to confirm anyway.

So I'll leave this here for now. Benchmarking is hard. We can hold long discussions about numbers that aren't measuring what we think they are, if we're not careful. The main question I ask is whether something is fast enough and worth the cost. It's hard for me to tell if either of these synthetic benchmarks means anything because both represent cases that are unlike typical WordPress requests. It would be interesting to examine a full page render for a block theme'd website with a typical post containing lots of text, lots of markup, and a few images. It's likely that any impact on performance would be so far under the noise floor of how the code runs that it'd be undetectable.

My modified synthetic unrealistic benchmark code
<?php

// Bootstrapping.

define('WP_USE_THEMES', true);

/** Loads the WordPress Environment and Template */
require ('wp-load.php');

// Setup.

$images =[
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ],
	[ '<img src="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg" srcset="https://pd.w.org/2024/08/36566abfcb2952e20.99062056-300x225.jpg 300w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1024x768.jpg 1024w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-768x576.jpg 768w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-1536x1152.jpg 1536w, https://pd.w.org/2024/08/36566abfcb2952e20.99062056-2048x1536.jpg 2048w" sizes="(max-width: 650px) 100vw, 650px" loading="lazy">' ]
];

$repetitions = 1000;
$iterations  = 1;

$stringcount = count( $images );

function auto_sizes_update_content_img_tag( $html ): string {
	if ( ! is_string( $html ) ) {
		$html = '';
	}

	if ( 1 !== preg_match( '/<img[^>]+>/', $html, $img_match, PREG_OFFSET_CAPTURE ) ) {
		return $html;
	}

	// Bail early if the image is not lazy-loaded.
	if ( false === strpos( $img_match[0][0], 'loading="lazy"' ) ) {
		return $html;
	}

	// Bail early if the image is not responsive.
	if ( 1 !== preg_match( '/sizes="([^"]+)"/', $img_match[0][0], $match, PREG_OFFSET_CAPTURE ) ) {
		return $html;
	}

	// Don't add 'auto' to the sizes attribute if it already exists.
	if ( auto_sizes_attribute_includes_valid_auto( $match[1][0] ) ) {
		return $html;
	}

	$start_of_sizes = $img_match[0][1] + $match[1][1];
	return substr( $html, 0, $start_of_sizes ) . 'auto, ' . substr( $html, $start_of_sizes );
}

function auto_sizes_update_content_img_tag_with_html_tag_processor( $html ): string {
	if ( ! is_string( $html ) ) {
		$html = '';
	}

	$processor = new WP_HTML_Tag_Processor( $html );

	// Bail if there is no IMG tag.
	if ( ! $processor->next_tag( array( 'tag_name' => 'IMG' ) ) ) {
		return $html;
	}

	// Bail early if the image is not lazy-loaded.
	if ( 'lazy' !== $processor->get_attribute( 'loading' ) ) {
		return $html;
	}

	$sizes = $processor->get_attribute( 'sizes' );

	// Bail early if the image is not responsive.
	if ( ! is_string( $sizes ) ) {
		return $html;
	}

	// Don't add 'auto' to the sizes attribute if it already exists.
	if ( auto_sizes_attribute_includes_valid_auto( $sizes ) ) {
		return $html;
	}

	$processor->set_attribute( 'sizes', "AUTO, $sizes" ); // Upper-case AUTO just to make it clear when this version is running.

	return $processor->get_updated_html();
}

function auto_sizes_attribute_includes_valid_auto( string $sizes_attr ): bool {
	$token = strtok( strtolower( $sizes_attr ), ',' );
	return false !== $token && 'auto' === trim( $token, " \t\f\r\n" );
}

// Profiling.

$groups = array( 'naive' => 0, 'secure' => 0 );
for ( $repetition = 0; $repetition <= $repetitions; $repetition++ ) {
	$group  = array_rand( $groups );
	$method = 'naive' === $group
		? 'auto_sizes_update_content_img_tag'
		: 'auto_sizes_update_content_img_tag_with_html_tag_processor';

	// Pre-run
	for ( $index = 0; $index < $iterations; $index ++ ) {
		$image  = $images[ $index % $stringcount ];
		call_user_func( $method, $image[0] );
	}

	$time_start = hrtime(true);
	for ( $index = 0; $index < $iterations; $index ++ ) {
		$image  = $images[ $index % $stringcount ];
		$result = call_user_func( $method, $image[0] );
	}
	$time_end          = hrtime(true);
	$groups[ $group ] += $time_end - $time_start;
}

$percentage = $groups['secure'] / $groups['naive'] * 100;
$n          = new NumberFormatter( 'en-US', \NumberFormatter::DEFAULT_STYLE );

echo "Average time spent for {$repetitions} x {$iterations} iterations:\n";
echo "auto_sizes_update_content_img_tag() - {$n->format($groups['naive'] / $iterations)} ns\n";
echo "auto_sizes_update_content_img_tag_with_html_tag_processor() - {$n->format($groups['secure'] / $iterations)} ns\n\n";
echo "Change: {$n->format($percentage)}%\n";

@westonruter
Copy link
Member

I also forked the Gist from @mukeshpanchal27 to do synthetic benchmarking, including additional test case for the various logic paths in the auto_sizes_update_content_img_tag() function: https://gist.github.com/westonruter/840c12249c8a44a4fc32053d845bcb83

In WordPress 6.6.1, I found that using Tag Processor is 10.53x slower.

In WordPress 6.7-alpha, I found that using Tag Processor is 8.48x slower, due to ongoing performance improvements.

And yet:

  • the absolute value of the difference in runtime here is so trivial it's hardly worth discussion

And:

Are we willing to spend 100 ps per IMG tag to avoid introducing XSS attacks and break peoples' pages?

Yes! 😄

@joemcgill
Copy link
Member

joemcgill commented Aug 15, 2024

Running some of my own benchmarks on a single post in Twenty Twenty-four with six images and posted the results below of the median runs for 50 requests measured using server timing.

TL;DR: I'm not seeing any major performance concerns with this change. Given the number of identified bugs this guards against, I think we should land this PR.

Before
╔═══════════════════════════════════════╤════════════════════════════════════════════╗
║ URL                                   │ http://localhost:8011/2024/07/image-tests/ ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ Success Rate                          │ 100%                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ Response Time (median)                │ 40.34                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-before-template (median)           │ 12.83                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-template (median)                  │ 25.28                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-filter-wp_content_img_tag (median) │ 0.35                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-optimization-detective (median)    │ 1.91                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-total (median)                     │ 38.43                                      ║
╚═══════════════════════════════════════╧════════════════════════════════════════════╝
Control

Doing a control run allows us to see general variability between benchmarks to consider whether any changes we see in the test case are due to normal variation.

╔═══════════════════════════════════════╤════════════════════════════════════════════╗
║ URL                                   │ http://localhost:8011/2024/07/image-tests/ ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ Success Rate                          │ 100%                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ Response Time (median)                │ 41.81                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-before-template (median)           │ 13.36                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-template (median)                  │ 26.35                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-filter-wp_content_img_tag (median) │ 0.35                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-optimization-detective (median)    │ 1.93                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-total (median)                     │ 39.72                                      ║
╚═══════════════════════════════════════╧════════════════════════════════════════════╝
This PR
╔═══════════════════════════════════════╤════════════════════════════════════════════╗
║ URL                                   │ http://localhost:8011/2024/07/image-tests/ ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ Success Rate                          │ 100%                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ Response Time (median)                │ 39.95                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-before-template (median)           │ 12.87                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-template (median)                  │ 25.35                                      ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-filter-wp_content_img_tag (median) │ 0.36                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-optimization-detective (median)    │ 1.88                                       ║
╟───────────────────────────────────────┼────────────────────────────────────────────╢
║ wp-total (median)                     │ 38.21                                      ║
╚═══════════════════════════════════════╧════════════════════════════════════════════╝

@westonruter
Copy link
Member

I discovered two more cases that weren't considered for the loading attribute:

  • LOADING=LAZY
  • loading=" lazy "

The keyword lazy is case-insensitive in HTML, and the whitespace is also trimmed. These are now accounted for in c5de91c.

@mukeshpanchal27
Copy link
Member Author

Thanks everyone.

@mukeshpanchal27 mukeshpanchal27 merged commit 4d22998 into trunk Aug 16, 2024
15 checks passed
@mukeshpanchal27 mukeshpanchal27 deleted the fix/1446-use-html-tag-processor-auto-sizes branch August 16, 2024 05:49
@westonruter westonruter changed the title Enhanced Responsive Images: Use more robust HTML Tag Processor for auto sizes injection Use more robust HTML Tag Processor for auto sizes injection Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Plugin] Enhanced Responsive Images Issues for the Enhanced Responsive Images plugin (formerly Auto Sizes) [Type] Enhancement A suggestion for improvement of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Injection of sizes=auto into images can be more robust with HTML Tag Processor
5 participants