Member

@sirreal sirreal commented Dec 10, 2025

Bookmark exhaustion, typically from deep nesting, can cause the HTML Processor to throw an Exception.

Rather than throwing an exception, return false when bookmark exhaustion is detected. An error is set on the processor and processing stops.

Trac ticket: https://core.trac.wordpress.org/ticket/64394


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

Comment on lines 6307 to 6308
* @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
*
Member Author

bookmark_token() actually throws, but it's not handled here. The annotation may not be appropriate.

* otherwise might involve messier calling and return conventions.
*/
return false;
} catch ( Exception $e ) {
Member Author

Exhausted bookmarks throw a generic Exception.

This block catches the exceptions thrown by insert_virtual_node().

@github-actions

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

$bookmark_name = $this->bookmark_token();
} catch ( Exception $e ) {
if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
return false;
Member

Will this not perhaps lead a developer to think that they reached the end of the document, when in reality the nesting is too large (or the max bookmarks exceeded)? In that way, I think an exception is more helpful. Otherwise, wouldn't every loop over tokens in a doc need to do something like:

while ( $p->next_tag() ) {
    // ...
}
if ( WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS === $p->get_last_error() ) {
    // Handle max bookmark error.
}

This would put the exception case in the regular code that always runs. Since exceeding the max bookmarks should be exceptional, I would think an exception is preferred:

try {
    while ( $p->next_tag() ) {
        // ...
    }
} catch ( Exception $e ) {
    if ( WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS === $p->get_last_error() ) {
        // Handle max bookmark error.
    }
}

But since this is the only exception that WP_HTML_Tag_Processor throws (currently), then it could be just:

try {
    while ( $p->next_tag() ) {
        // ...
    }
} catch ( Exception $e ) {
    // Handle max bookmark error.
}

Member Author

> But since this is the only exception that WP_HTML_Tag_Processor throws

This is only in the WP_HTML_Processor. WP_HTML_Tag_Processor should not throw any errors.

> Will this not perhaps lead a developer to think that they reached the end of the document

That's already the case: the HTML Processor avoids throwing errors and instead exposes problems through getters. Primarily, ::get_last_error() should be used:

<?php
require '/wordpress/wp-load.php';
echo '<plaintext>';
echo 'WordPress ' . wp_get_wp_version() . "\n";

$p = WP_HTML_Processor::create_fragment( '<table><tbody>unsupported' );
while ( $p->next_token() ) {
    var_dump( $p->get_tag() );
}
// Need to check error status.
var_dump( $p->get_last_error() );
var_dump( $p->get_unsupported_exception()->getMessage() );

When these APIs throw errors that callers are supposed to handle, it's just too easy to bring down users' sites with errors that aren't actionable for them. It's true that superficially "end of document" is the same as "error." It seems preferable that a document silently fail to fully parse instead of crashing and bringing down a site.

In either case, the developer should do another thing that's not obvious (add exception handling with try/catch or check error status after iteration with the available method). Relying on error status is the least impactful for a site if a developer overlooks this.
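
A minimal sketch of the "check error status after iteration" approach may help here. It does not use WordPress: Sketch_Processor, its token list, its bookmark limit, and sketch_collect_tokens() are hypothetical stand-ins, and the error string is an assumed value. It only illustrates how a caller can tell "end of document" apart from "stopped early" without try/catch.

```php
<?php
/**
 * Hypothetical stand-in for a processor that returns false both at the
 * end of the document and when it has to stop early because of an error.
 */
final class Sketch_Processor {
    // Assumed value, used only for this illustration.
    const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks';

    private $tokens;
    private $limit;
    private $index      = 0;
    private $last_error = null;

    public function __construct( array $tokens, int $limit ) {
        $this->tokens = $tokens;
        $this->limit  = $limit;
    }

    public function next_token(): bool {
        // Once an error is set, or the input is used up, stop for good.
        if ( null !== $this->last_error || $this->index >= count( $this->tokens ) ) {
            return false;
        }
        // Simulate running out of bookmarks: record the error, do not throw.
        if ( $this->index >= $this->limit ) {
            $this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
            return false;
        }
        ++$this->index;
        return true;
    }

    public function get_token(): string {
        return $this->tokens[ $this->index - 1 ];
    }

    public function get_last_error(): ?string {
        return $this->last_error;
    }
}

/**
 * Iterates all tokens, then reports whether the document was fully parsed.
 */
function sketch_collect_tokens( Sketch_Processor $p ): array {
    $seen = array();
    while ( $p->next_token() ) {
        $seen[] = $p->get_token();
    }
    return array(
        'tokens'   => $seen,
        'complete' => null === $p->get_last_error(),
        'error'    => $p->get_last_error(),
    );
}
```

If the caller forgets to read the 'complete' flag, the worst case in this sketch is a partially processed document rather than an uncaught exception.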

Member

> This is only in the WP_HTML_Processor. WP_HTML_Tag_Processor should not throw any errors.

Yes, sorry, I meant WP_HTML_Processor.

> In either case, the developer should do another thing that's not obvious (add exception handling with try/catch or check error status after iteration with the available method). Relying on error status is the least impactful for a site if a developer overlooks this.

OK, makes sense to me.

Co-authored-by: Weston Ruter <westonruter@gmail.com>
@sirreal sirreal marked this pull request as ready for review January 29, 2026 18:27

github-actions bot commented Jan 29, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, westonruter, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Copilot AI left a comment

Pull request overview

This PR fixes an issue where bookmark exhaustion (typically from deep HTML nesting) causes the HTML Processor to throw an Exception. Instead of throwing exceptions, the processor now returns false when bookmark exhaustion is detected, sets an error state, and stops processing gracefully.

Changes:

  • Modified bookmark_token() to return false instead of throwing an exception when MAX_BOOKMARKS is exceeded
  • Updated all callers of bookmark_token() and insert_virtual_node() to check for and handle false return values
  • Removed outdated @throws documentation from methods that no longer throw exceptions
  • Added comprehensive tests to verify the processor handles extreme nesting without throwing exceptions

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/wp-includes/html-api/class-wp-html-processor.php | Core implementation changes: modified bookmark_token() and insert_virtual_node() to return false on failure instead of throwing; added error checks in step(), step_before_html(), step_before_head(), step_after_head(), step_in_table(), and step_in_table_body() methods; updated PHPDoc to remove exception references |
| tests/phpunit/tests/html-api/wpHtmlProcessor.php | Added two comprehensive tests verifying the processor handles bookmark exhaustion gracefully for both regular and virtual tokens; fixed annotation from @group to @ticket |


Member

@westonruter westonruter left a comment

Gemini's review

The following is a review of the changes comparing the current branch to trunk.

General

The changes correctly address the issue of unhandled exceptions when the WP_HTML_Processor exceeds its bookmark limit (e.g., during deep nesting). By converting the exception into a graceful failure (returning false), the processor can abort safely without crashing the runtime.

Code Analysis

  • PHP Compatibility: The removal of the WP_HTML_Token return type hint in insert_virtual_node() is correct and necessary for PHP 7.4 compatibility: the method now returns WP_HTML_Token|false, and native union return types require PHP 8.0.
  • Logic:
    • bookmark_token() correctly returns false instead of throwing an Exception when parent::set_bookmark() fails.
    • insert_virtual_node() correctly propagates this false return value.
    • The various insertion modes (e.g., before_html_anything_else, in_table, etc.) correctly check for the false return from insert_virtual_node() and abort processing by returning false.
  • Conventions: The code adheres to WordPress coding standards.
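
The propagation described under "Logic" can be sketched roughly as follows. This is not the WordPress implementation: Sketch_HTML_Processor, its tiny MAX_BOOKMARKS limit, the array used in place of a token object, and the error string are all assumptions made for illustration. The method names only mirror the ones named in the review.

```php
<?php
/**
 * Hypothetical model of a failure propagating from bookmark_token()
 * through insert_virtual_node() up to step() as a false return value.
 */
final class Sketch_HTML_Processor {
    const MAX_BOOKMARKS                = 3; // Deliberately tiny for the sketch.
    const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks'; // Assumed value.

    private $bookmarks  = array();
    private $counter    = 0;
    private $last_error = null;

    /**
     * @return string|false Bookmark name, or false when none can be allocated.
     */
    private function bookmark_token() {
        if ( count( $this->bookmarks ) >= self::MAX_BOOKMARKS ) {
            $this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
            return false;
        }
        $name                     = 'bookmark-' . ( ++$this->counter );
        $this->bookmarks[ $name ] = true;
        return $name;
    }

    /**
     * No native return type here: a "Type|false" union needs PHP 8.0.
     *
     * @return array|false Virtual node data, or false on bookmark exhaustion.
     */
    private function insert_virtual_node( string $token_name ) {
        $bookmark = $this->bookmark_token();
        if ( false === $bookmark ) {
            return false; // Propagate the failure unchanged.
        }
        return array(
            'name'     => $token_name,
            'bookmark' => $bookmark,
        );
    }

    public function step( string $token_name ): bool {
        // Once an error has been recorded, processing stays stopped.
        if ( null !== $this->last_error ) {
            return false;
        }
        return false !== $this->insert_virtual_node( $token_name );
    }

    public function get_last_error(): ?string {
        return $this->last_error;
    }
}
```

In this model the fourth call to step() returns false and get_last_error() reports the exhaustion, with no exception crossing the public API.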

Tests

  • Coverage: The new tests test_deep_nesting_fails_process_without_error and test_deep_nesting_fails_processing_virtual_tokens_without_error effectively cover the scenarios where bookmark limits are hit.
  • Annotations:
    • The use of @ticket 64394 is appropriate.
    • The use of @expectedIncorrectUsage WP_HTML_Tag_Processor::set_bookmark is correct, as WP_HTML_Tag_Processor::set_bookmark() triggers _doing_it_wrong() when the limit is reached.
    • The cleanup of @group to @ticket in test_next_tag_lowercase_tag_name is a valid housekeeping change.

Recommendation

The changes are approved.

No blocking issues found.

Member

@dmsnell dmsnell left a comment

I definitely don’t love how pervasive and deep our error-checking has become here; did you attempt to use ->bail()? the existing code is throwing because that part of step() isn’t wrapped in a try/catch, right? what if we continue to throw inside bookmark_token() but trap that and handle aborting the same way we handle all deep control flow issues?

Member Author

sirreal commented Jan 30, 2026

> I definitely don’t love how pervasive and deep our error-checking has become here.

Me neither 😕

> did you attempt to use ->bail()?

I did try using bail(). That transforms a bookmark exhaustion error into an unsupported error, which seemed confusing: when bookmarks are exhausted, an error is already set, and bail() overwrites it with the unsupported error code.

We could consider having bail leave the last error in case it's already been set.
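
As a rough sketch of that idea, bail() could set the unsupported code only when no error has been recorded yet. Everything below is hypothetical: Sketch_Bailing_Processor, Sketch_Unsupported_Exception, the scenario argument, and the error strings are stand-ins for illustration, not the actual WP_HTML_Processor internals.

```php
<?php
// Hypothetical exception type used only to unwind the call stack.
final class Sketch_Unsupported_Exception extends Exception {}

final class Sketch_Bailing_Processor {
    // Assumed values, used only for this illustration.
    const ERROR_UNSUPPORTED            = 'unsupported';
    const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks';

    private $last_error = null;

    /**
     * Stops processing. Keeps an existing, more specific error instead of
     * overwriting it with the generic "unsupported" code.
     */
    private function bail( string $message ): void {
        if ( null === $this->last_error ) {
            $this->last_error = self::ERROR_UNSUPPORTED;
        }
        throw new Sketch_Unsupported_Exception( $message );
    }

    /**
     * @param string $scenario One of 'ok', 'unsupported', or 'exhausted'.
     */
    public function step( string $scenario ): bool {
        try {
            if ( 'exhausted' === $scenario ) {
                // The bookmark failure records its own error before bailing.
                $this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
                $this->bail( 'Cannot allocate a bookmark.' );
            }
            if ( 'unsupported' === $scenario ) {
                $this->bail( 'Cannot process this markup.' );
            }
            return true;
        } catch ( Sketch_Unsupported_Exception $e ) {
            // Single place where deep control flow is turned into false.
            return false;
        }
    }

    public function get_last_error(): ?string {
        return $this->last_error;
    }
}
```

With this shape, the 'exhausted' scenario ends with the bookmark error still in place, while the 'unsupported' scenario ends with the unsupported code, and both leave step() through the same catch block.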

> what if we continue to throw inside bookmark_token() but trap that and handle aborting the same way we handle all deep control flow issues?

I have that in an earlier commit. It wasn't as pervasive (at b02d679):

try {
    $bookmark_name = $this->bookmark_token();
} catch ( Exception $e ) {
    if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
        return false;
    }
    throw $e;
}

} catch ( WP_HTML_Unsupported_Exception $e ) {
    /*
     * Exceptions are used in this class to escape deep call stacks that
     * otherwise might involve messier calling and return conventions.
     */
    return false;
} catch ( Exception $e ) {
    if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
        return false;
    }
    // Rethrow any other exceptions for higher-level handling.
    throw $e;
}

Co-authored-by: Weston Ruter <westonruter@gmail.com>
Member

dmsnell commented Jan 30, 2026

> I have that in an earlier commit. It wasn't as pervasive

Why did you change it @sirreal? did I ask you to? 🙃 🤦‍♂️

Member Author

sirreal commented Jan 30, 2026

No, I was just iterating and exploring. The method actually already had a …|false return type annotation, suggesting it may have intended to return false on failure.

I can't say I like the general Exception catching either, but it works. I don't have a strong preference. Would you prefer that other version, @dmsnell?
