Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Track spans of text with (offset, length) instead of (start, end) #5721

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Nov 30, 2023

Trac ticket: Core-59993

This patch follows-up with earlier design questions around how to represent spans of strings inside the class. It's relevant now as preparation for #5683.

The mixture of (offset, length) and (start, end) coordinates becomes confusing at times and all final string operations are performed with the (offset, length) pair, since these feed into strlen().

In preparation for exposing all tokens within an HTML document this change:

  • Unifies the representation throughout the class.
  • It creates token_starts_at to track the start of the current token.
  • It replaces tag_ends_at with token_length for re-use with other token types.

There should be no functional or behavioral changes in this patch.

For the internal helper classes this patch introduces breaking changes, but those classes are marked private and should not be used outside of the HTML API itself.

Trac ticket:


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

* @var int
*/
public $start;

/**
* The string offset after the attribute value or its name.
* Byte length of the entire attribute name or name and value pair expression.
Copy link
Contributor

@adamziel adamziel Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability nitpick:

Suggested change
* Byte length of the entire attribute name or name and value pair expression.
* A byte length of either:
* * The entire attribute name, or
* * A name and value pair expression

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I updated the wording in f2c715d to include different wording and now seeing this I've added f6d3a64

Comment on lines 36 to 37
* @since 6.2.0
* @since {WP_VERSION}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not remove 6.2.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually a defect of the diff view. I'm not actually removing the old comment, but removing the old property and its attached docblock. this is the new @since tag for the new $length property, and the @since tag for the now-removed $end property is gone.

at first I added it back in but I found that more confusing, and the class itself has a @since tag noting the removal of the attribute. thoughts?

* @param int $start Byte offset into document where replacement span begins.
* @param int $end Byte offset into document where replacement span ends.
* @param int $start Byte offset into document where replacement span begins.
* @param int $length Byte length of span.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param int $length Byte length of span.
* @param int $length Byte length of this span.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 045b5c0

Comment on lines 1652 to 1653
$bookmark_start = $bookmark->start;
$bookmark_end = $bookmark_start + $bookmark->length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these allocations be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$bookmark_end is hard to remove because we have to compute it, and I left $bookmark_start in for symmetry. it wasn't offering much otherwise so I took it out in 449ae5a

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these aren't really allocations either, as they will be created locally on the stack.

Copy link
Contributor

@adamziel adamziel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some notes, but it looks good to me otherwise!

@@ -15,6 +15,7 @@
*
* @access private
* @since 6.2.0
* @since {WP_VERSION} Replaced `end` with `length` to more closely match `substr()`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note to remember to update it before merging

@adamziel
Copy link
Contributor

@dmsnell The unit tests fail now. Once they're fixed and the @since annotations are adjusted, we should be ready to merge.


1) Tests_HtmlApi_WpHtmlProcessorBreadcrumbs::test_remains_stable_when_editing_attributes
Undefined variable $bookmark_start

/var/www/src/wp-includes/html-api/class-wp-html-tag-processor.php:1654
/var/www/src/wp-includes/html-api/class-wp-html-tag-processor.php:2331
/var/www/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php:449
/var/www/vendor/bin/phpunit:122

@dmsnell dmsnell force-pushed the html-api/use-start-length-instead-of-start-end branch from b502273 to 279e43d Compare December 10, 2023 10:42
@dmsnell
Copy link
Member Author

dmsnell commented Dec 10, 2023

@adamziel everything should be buttoned up and working properly now

…, end)

This patch follows-up with earlier design questions around how to represent
spans of strings inside the class. It's relevant now as preparation for WordPress#5683.

The mixture of (offset, length) and (start, end) coordinates becomes confusing
at times and all final string operations are performed with the (offset, length)
pair, since these feed into `strlen()`.

In preparation for exposing all tokens within an HTML document this change:
 - Unifies the representation throughout the class.
 - It creates `token_starts_at` to track the start of the current token.
 - It replaces `tag_ends_at` with `token_length` for re-use with other token types.

There should be no functional or behavioral changes in this patch.

For the internal helper classes this patch introduces breaking changes, but those
classes are marked private and should not be used outside of the HTML API itself.
@dmsnell dmsnell force-pushed the html-api/use-start-length-instead-of-start-end branch from 279e43d to 0346ceb Compare December 10, 2023 11:51
@adamziel
Copy link
Contributor

Merged in https://core.trac.wordpress.org/changeset/57179.

@adamziel adamziel closed this Dec 10, 2023
@dmsnell dmsnell deleted the html-api/use-start-length-instead-of-start-end branch December 10, 2023 13:33
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Dec 10, 2023
…d Crash

Previously the HTML API was using a mixture of approaches in its helper classes
for describing spans of text. One was (start, end) and the other was (start, length).
In WordPress/wordpress-develop#5721 these were all normalized to (start, length) but
the Interactivity API was not updated in concert with that change.

In this patch the `inner_html` methods are updated to use the appropriate formats.

A following PR is necessary to update the compat layer so that crashes don't appear
on sites running older versions of WordPress with this updated code.
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Dec 10, 2023
Fix: Update Interactivity API Use of Private HTML API Classes to avoid Crash

Previously the HTML API was using a mixture of approaches in its helper classes
for describing spans of text. One was (start, end) and the other was (start, length).
In WordPress/wordpress-develop#5721 these were all normalized to (start, length) but
the Interactivity API was not updated in concert with that change.

In this patch the `inner_html` methods are updated to use the appropriate formats.

this PR also includes the necessary back port of the HTML API to provide the new methods so that older WP installs don’t crash
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants