scan all tokens
Squashed commit of the following:

commit 4959837
Author: Dennis Snell <[email protected]>
Date:   Tue Jan 16 17:08:27 2024 -0600

    Update to call `$this->next_token()` in `seek()` and fix tests

commit 3d8b20e
Author: Dennis Snell <[email protected]>
Date:   Tue Jan 16 09:58:13 2024 -0600

    RAWTEXT and SCRIPT elements do not decode character references

commit f502153
Author: Dennis Snell <[email protected]>
Date:   Mon Jan 15 15:56:54 2024 -0600

    WPCS

commit 543a0b8
Author: Dennis Snell <[email protected]>
Date:   Mon Jan 15 15:54:41 2024 -0600

    Fix span-of-dashes comment modifiable text

commit b0aae8a
Author: Jon Surrell <[email protected]>
Date:   Mon Jan 15 22:07:27 2024 +0100

    Add failing test for `<!----->`

commit 283df46
Author: Dennis Snell <[email protected]>
Date:   Mon Jan 15 11:55:30 2024 -0600

    Expand comment introducing modifiable text.

commit ede20ca
Author: Dennis Snell <[email protected]>
Date:   Mon Jan 15 11:38:59 2024 -0600

    Rename INCOMPLETE state to INCOMPLETE_INPUT

commit 7de4cc2
Author: Dennis Snell <[email protected]>
Date:   Fri Jan 12 13:49:12 2024 -0500

    PR Feedback

    Co-authored-by: Jon Surrell <[email protected]>

commit 094176e
Author: Dennis Snell <[email protected]>
Date:   Fri Jan 12 13:23:12 2024 -0500

    Remove early bailout of special elements. It's duplicated.

commit 7fa58c8
Author: Dennis Snell <[email protected]>
Date:   Fri Jan 12 13:10:53 2024 -0500

    Feedback updates.

    Co-authored-by: David Herrera <[email protected]>
    Co-authored-by: Jon Surrell <[email protected]>

commit 28fc54d
Author: Dennis Snell <[email protected]>
Date:   Fri Jan 12 12:54:22 2024 -0500

    Expand docblocks for CDATA/PINodes and re-add removed tests

commit 1194d6f
Author: Dennis Snell <[email protected]>
Date:   Fri Jan 12 07:53:28 2024 -0500

    Provisionally: add back CDATA and PI nodes

commit b6d4300
Author: Dennis Snell <[email protected]>
Date:   Wed Jan 10 12:05:45 2024 -0500

    Fix + WPCS

commit 7d1c2e8
Author: Dennis Snell <[email protected]>
Date:   Thu Jan 11 21:22:51 2024 -0500

    Remove support for CDATA sections.

commit e91a33b
Author: Dennis Snell <[email protected]>
Date:   Wed Jan 10 11:51:17 2024 -0500

    Remove support for Processing Instructions

    Attempting to parse processing instructions conflicts with parsing bogus
    comments when a document may be incomplete, which might create a
    divergence in the HTML API from browser behavior.

commit 3d68e28
Author: Dennis Snell <[email protected]>
Date:   Wed Jan 10 11:17:57 2024 -0500

    Fix non-PI-node tests

commit d596176
Author: Dennis Snell <[email protected]>
Date:   Wed Jan 10 11:09:20 2024 -0500

    Add basic conformance tests

commit 2199e86
Author: Dennis Snell <[email protected]>
Date:   Sun Dec 10 15:17:01 2023 +0100

    HTML API: Avoid processing incomplete syntax elements.

    The HTML Tag Processor is able to know if it starts parsing a syntax element
    and reaches the end of the document before it reaches the end of the element.
    In these cases, after this patch, the processor will indicate this condition.

    For example, when processing `<div><input type="te` there is an incomplete INPUT
    element. The processor will fail to find the INPUT, it will pause right after
    the DIV, and `paused_at_incomplete_token()` will return `true`.

    This patch doesn't change any existing behaviors, but it adds the new method
    to report on the final failure condition. It provides a mechanism for later
    use to add chunked parsing to the class, wherein it will be possible to process
    a document without having the entire document loaded in memory, for example
    when processing unbuffered output.

    This is also a necessary change for adding the ability to scan every token in
    the document. Currently the Tag Processor only exposes tags as tokens, but it
    will need to process `#text` nodes, HTML comments, and other markup in order
    to enable behaviors in the HTML Processor and in refactors of existing HTML
    processing in Core.
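
The incomplete-input condition described above can be illustrated with a small standalone sketch. This is not the Tag Processor's implementation: `ends_with_incomplete_tag()` is a hypothetical helper invented for this example, it looks only at tags (comments, DOCTYPE, and other markup are treated as plain text), and it assumes a quoted attribute value may contain `>`.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: reports whether $html ends
 * in the middle of a tag, i.e. a tag opener was seen but the input ran
 * out before its closing ">".
 */
function ends_with_incomplete_tag( string $html ): bool {
	$length = strlen( $html );
	$at     = 0;

	while ( $at < $length ) {
		// Find the next "<", which may open a tag.
		$at = strpos( $html, '<', $at );
		if ( false === $at ) {
			return false; // Only text remains; nothing is incomplete.
		}

		// A "<" as the very last byte could be the start of a tag.
		if ( $at + 1 >= $length ) {
			return true;
		}

		// Only "<" followed by a letter or "/" is treated as a tag here.
		$next = $html[ $at + 1 ];
		if ( '/' !== $next && ! ctype_alpha( $next ) ) {
			$at++;
			continue;
		}

		// Walk to the closing ">", skipping over quoted attribute values.
		$quote = null;
		for ( $i = $at + 1; $i < $length; $i++ ) {
			$char = $html[ $i ];
			if ( null !== $quote ) {
				if ( $char === $quote ) {
					$quote = null;
				}
			} elseif ( '"' === $char || "'" === $char ) {
				$quote = $char;
			} elseif ( '>' === $char ) {
				break;
			}
		}

		if ( $i >= $length ) {
			return true; // Input ended before the tag closed.
		}

		$at = $i + 1;
	}

	return false;
}

// The example from the commit message: DIV is complete, INPUT is cut off.
var_dump( ends_with_incomplete_tag( '<div><input type="te' ) ); // bool(true)
var_dump( ends_with_incomplete_tag( '<div>hello</div>' ) );     // bool(false)
```

A chunked parser could use a signal like this to decide whether to wait for more input before continuing, which is the later use the message anticipates.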
sirreal committed Jan 17, 2024
1 parent c399654 commit 2d5ffd8
Showing 4 changed files with 1,459 additions and 142 deletions.
32 changes: 20 additions & 12 deletions src/wp-includes/html-api/class-wp-html-processor.php
@@ -149,17 +149,6 @@ class WP_HTML_Processor extends WP_HTML_Tag_Processor {
 	 */
 	const MAX_BOOKMARKS = 100;

-	/**
-	 * Static query for instructing the Tag Processor to visit every token.
-	 *
-	 * @access private
-	 *
-	 * @since 6.4.0
-	 *
-	 * @var array
-	 */
-	const VISIT_EVERYTHING = array( 'tag_closers' => 'visit' );
-
 	/**
 	 * Holds the working state of the parser, including the stack of
 	 * open elements and the stack of active formatting elements.
@@ -424,6 +413,23 @@ public function next_tag( $query = null ) {
 		return false;
 	}

+	/**
+	 * Steps through the HTML document and stops at the next token, if any.
+	 *
+	 * Currently only supports stepping through tags.
+	 *
+	 * @return bool
+	 */
+	public function next_token() {
+		$found_a_token = parent::next_token();
+
+		if ( '#tag' === $this->get_token_type() ) {
+			$this->step( self::REPROCESS_CURRENT_NODE );
+		}
+
+		return $found_a_token;
+	}
+
 	/**
 	 * Indicates if the currently-matched tag matches the given breadcrumbs.
 	 *
@@ -520,7 +526,9 @@ public function step( $node_to_process = self::PROCESS_NEXT_NODE ) {
 				$this->state->stack_of_open_elements->pop();
 			}

-			parent::next_tag( self::VISIT_EVERYTHING );
+			while ( parent::next_token() && '#tag' !== $this->get_token_type() ) {
+				continue;
+			}
 		}

 		// Finish stepping when there are no more tokens in the document.
