HTML API: Ensure bookmark exhaustion does not error #10616
base: trunk
```php
 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
 *
```
`bookmark_token()` actually throws, but the exception isn't handled here. The `@throws` annotation may not be appropriate.
```php
 * otherwise might involve messier calling and return conventions.
 */
return false;
} catch ( Exception $e ) {
```
Exhausted bookmarks throw a generic `Exception`. This block catches the exceptions thrown by `insert_virtual_node()`.
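A minimal, self-contained model of this control flow in plain PHP, assuming nothing from WordPress: a helper throws a generic `Exception` when it cannot allocate a bookmark, and `step()` consults the recorded last error so that only this failure becomes a quiet `false`. `Sketch_Processor`, its tiny bookmark limit, and the error string are hypothetical, and rethrowing any other exception is a choice made for the sketch, not necessarily what the actual processor does.

```php
<?php
// Hypothetical model; Sketch_Processor is not a WordPress class.
final class Sketch_Processor {
	// Assumed error value and an intentionally tiny limit for illustration.
	const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks';
	const MAX_BOOKMARKS                = 3;

	/** @var string|null Last recorded error, if any. */
	public $last_error = null;

	/** @var int Number of bookmarks allocated so far. */
	private $bookmarks = 0;

	// Stand-in for bookmark_token(): records the error, then throws.
	private function bookmark_token(): string {
		if ( $this->bookmarks >= self::MAX_BOOKMARKS ) {
			$this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
			throw new Exception( 'Could not allocate a bookmark.' );
		}
		++$this->bookmarks;
		return "bookmark-{$this->bookmarks}";
	}

	// Stand-in for step(): only bookmark exhaustion becomes a quiet `false`.
	public function step(): bool {
		try {
			$this->bookmark_token();
		} catch ( Exception $e ) {
			if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
				return false;
			}
			throw $e; // Any other failure still surfaces to the caller.
		}
		return true;
	}
}

$p     = new Sketch_Processor();
$steps = 0;
while ( $p->step() ) {
	++$steps;
}
echo "steps: {$steps}, error: {$p->last_error}\n";
```

Checking `last_error` inside the `catch` is what lets a broad `catch ( Exception $e )` distinguish bookmark exhaustion from other failures.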
```php
$bookmark_name = $this->bookmark_token();
} catch ( Exception $e ) {
	if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
		return false;
```
Will this not perhaps lead a developer to think that they reached the end of the document, when in reality the nesting is too deep (or the max bookmarks were exceeded)? In that way, I think an exception is more helpful. Otherwise, wouldn't every loop over tokens in a doc need to do something like:

```php
while ( $p->next_tag() ) {
	// ...
}
if ( WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS === $p->get_last_error() ) {
	// Handle max bookmark error.
}
```

This would put the exception case in the regular code that always runs. Since exceeding the max bookmarks should be exceptional, I would think an exception is preferred:

```php
try {
	while ( $p->next_tag() ) {
		// ...
	}
} catch ( Exception $e ) {
	if ( WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS === $p->get_last_error() ) {
		// Handle max bookmark error.
	}
}
```

But since this is the only exception that `WP_HTML_Tag_Processor` throws (currently), then it could be just:

```php
try {
	while ( $p->next_tag() ) {
		// ...
	}
} catch ( Exception $e ) {
	// Handle max bookmark error.
}
```
> But since this is the only exception that `WP_HTML_Tag_Processor` throws

This is only in the `WP_HTML_Processor`. `WP_HTML_Tag_Processor` should not throw any errors.
> Will this not perhaps lead a developer to think that they reached the end of the document

That's already the case: the HTML Processor has avoided throwing errors and exposes problems through some getters. Primarily, `::get_last_error()` should be used:

```php
<?php
require '/wordpress/wp-load.php';
echo '<plaintext>';
echo "WordPress " . wp_get_wp_version() . "\n";
$p = WP_HTML_Processor::create_fragment( '<table><tbody>unsupported' );
while ( $p->next_token() ) {
	var_dump( $p->get_tag() );
}
// Need to check error status.
var_dump( $p->get_last_error() );
var_dump( $p->get_unsupported_exception()->getMessage() );
```

When these APIs throw errors that callers are supposed to handle, it's just too easy to bring down users' sites with errors that aren't actionable for them. It's true that, superficially, "end of document" looks the same as "error." It seems preferable for a document to silently fail to fully parse instead of crashing and bringing down a site.
In either case, the developer should do another thing that's not obvious (add exception handling with try/catch or check error status after iteration with the available method). Relying on error status is the least impactful for a site if a developer overlooks this.
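To make this trade-off concrete, here is a self-contained sketch in plain PHP (no WordPress) of a hypothetical walker in which running out of bookmarks ends iteration exactly like reaching the end of input. If a caller forgets the follow-up check, nothing crashes; the only cost is an incomplete parse. `Sketch_Walker`, `sketch_count_tokens()`, the limit of 10 open elements, and the error string are illustrative assumptions, not the real API.

```php
<?php
// Hypothetical walker; not WP_HTML_Processor.
final class Sketch_Walker {
	const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks'; // Assumed value.

	/** @var string[] */
	private $tokens;
	/** @var int Maximum number of simultaneously open elements (one bookmark each). */
	private $max_open;
	private $open       = 0;
	private $index      = 0;
	private $last_error = null;

	public function __construct( array $tokens, int $max_open ) {
		$this->tokens   = $tokens;
		$this->max_open = $max_open;
	}

	// Returns false both at end of input and when bookmarks run out.
	public function next_token(): bool {
		if ( null !== $this->last_error || $this->index >= count( $this->tokens ) ) {
			return false;
		}
		$token = $this->tokens[ $this->index ];
		if ( '<div>' === $token && $this->open >= $this->max_open ) {
			$this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
			return false;
		}
		if ( '<div>' === $token ) {
			++$this->open;
		}
		++$this->index;
		return true;
	}

	public function get_last_error(): ?string {
		return $this->last_error;
	}
}

// Count how many tokens a caller sees before iteration ends.
function sketch_count_tokens( Sketch_Walker $walker ): int {
	$seen = 0;
	while ( $walker->next_token() ) {
		++$seen;
	}
	return $seen;
}

$shallow = new Sketch_Walker( array_fill( 0, 5, '<div>' ), 10 );
$deep    = new Sketch_Walker( array_fill( 0, 50, '<div>' ), 10 );

$shallow_seen = sketch_count_tokens( $shallow );
$deep_seen    = sketch_count_tokens( $deep );

// The loops look identical; only the error check tells them apart.
foreach ( array( 'shallow' => $shallow, 'deep' => $deep ) as $label => $walker ) {
	$error = $walker->get_last_error();
	echo $label . ': ' . ( null === $error ? 'complete' : "stopped early ({$error})" ) . "\n";
}
```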
> This is only in the `WP_HTML_Processor`. `WP_HTML_Tag_Processor` should not throw any errors.

Yes, sorry, I meant `WP_HTML_Processor`.

> In either case, the developer should do another thing that's not obvious (add exception handling with try/catch or check error status after iteration with the available method). Relying on error status is the least impactful for a site if a developer overlooks this.

OK, makes sense to me.
Co-authored-by: Weston Ruter <westonruter@gmail.com>
Pull request overview
This PR fixes an issue where bookmark exhaustion (typically from deep HTML nesting) causes the HTML Processor to throw an Exception. Instead of throwing exceptions, the processor now returns false when bookmark exhaustion is detected, sets an error state, and stops processing gracefully.
Changes:
- Modified `bookmark_token()` to return `false` instead of throwing an exception when `MAX_BOOKMARKS` is exceeded
- Updated all callers of `bookmark_token()` and `insert_virtual_node()` to check for and handle `false` return values
- Removed outdated `@throws` documentation from methods that no longer throw exceptions
- Added comprehensive tests to verify the processor handles extreme nesting without throwing exceptions
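The propagation described in this list can be sketched in plain PHP. This is a simplified, hypothetical model loosely patterned on the method names in this PR, not the actual `WP_HTML_Processor` code: `bookmark_token()` returns `false` and records an error, `insert_virtual_node()` passes that `false` along, and an insertion-mode handler stops as soon as it sees it. Because `Sketch_Token|false` would be a union type (PHP 8.0+), the return type lives only in the docblock so the sketch still runs on PHP 7.4, mirroring the compatibility concern noted in this PR. The class names, the TBODY/TR sequence, and the error string are assumptions.

```php
<?php
// Hypothetical classes for illustration only.
final class Sketch_Token {
	/** @var string */
	public $bookmark_name;
	/** @var string */
	public $node_name;

	public function __construct( string $bookmark_name, string $node_name ) {
		$this->bookmark_name = $bookmark_name;
		$this->node_name     = $node_name;
	}
}

final class Sketch_Tree_Builder {
	const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks'; // Assumed value.

	/** @var string|null */
	public $last_error = null;
	/** @var Sketch_Token[] Stand-in for the stack of open elements. */
	public $stack = array();

	private $max_bookmarks;
	private $used = 0;

	public function __construct( int $max_bookmarks ) {
		$this->max_bookmarks = $max_bookmarks;
	}

	/**
	 * @return string|false Bookmark name, or false once the limit is reached.
	 */
	private function bookmark_token() {
		if ( $this->used >= $this->max_bookmarks ) {
			$this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
			return false;
		}
		++$this->used;
		return "bookmark-{$this->used}";
	}

	/**
	 * No native return type: `Sketch_Token|false` would require PHP 8.0.
	 *
	 * @return Sketch_Token|false
	 */
	private function insert_virtual_node( string $node_name ) {
		$bookmark_name = $this->bookmark_token();
		if ( false === $bookmark_name ) {
			return false; // Propagate the failure instead of throwing.
		}
		$token         = new Sketch_Token( $bookmark_name, $node_name );
		$this->stack[] = $token;
		return $token;
	}

	// Loosely modelled on an insertion mode that must insert implied elements.
	public function step_in_table(): bool {
		foreach ( array( 'TBODY', 'TR' ) as $node_name ) {
			if ( false === $this->insert_virtual_node( $node_name ) ) {
				return false; // Abort processing; the error is already recorded.
			}
		}
		return true;
	}
}

$roomy    = new Sketch_Tree_Builder( 5 );
$tight    = new Sketch_Tree_Builder( 1 );
$roomy_ok = $roomy->step_in_table();
$tight_ok = $tight->step_in_table();

echo 'roomy: ' . ( $roomy_ok ? 'ok' : 'stopped' ) . ', tight: '
	. ( $tight_ok ? 'ok' : "stopped ({$tight->last_error})" ) . "\n";
```

Every layer between the allocation and the insertion mode needs its own `false` check, which is the pervasiveness questioned later in the thread.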
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/wp-includes/html-api/class-wp-html-processor.php` | Core implementation changes: modified `bookmark_token()` and `insert_virtual_node()` to return `false` on failure instead of throwing; added error checks in `step()`, `step_before_html()`, `step_before_head()`, `step_after_head()`, `step_in_table()`, and `step_in_table_body()` methods; updated PHPDoc to remove exception references |
| `tests/phpunit/tests/html-api/wpHtmlProcessor.php` | Added two comprehensive tests verifying the processor handles bookmark exhaustion gracefully for both regular and virtual tokens; fixed annotation from `@group` to `@ticket` |
westonruter left a comment
Gemini's review
The following is a review of the changes comparing the current branch to trunk.
General
The changes correctly address the issue of unhandled exceptions when the WP_HTML_Processor exceeds its bookmark limit (e.g., during deep nesting). By converting the exception into a graceful failure (returning false), the processor can abort safely without crashing the runtime.
Code Analysis
- PHP Compatibility: The removal of the `WP_HTML_Token` return type hint in `insert_virtual_node()` is correct and necessary for PHP 7.4 compatibility, as the method now returns `WP_HTML_Token|false`.
- Logic:
  - `bookmark_token()` correctly returns `false` instead of throwing an `Exception` when `parent::set_bookmark()` fails.
  - `insert_virtual_node()` correctly propagates this `false` return value.
  - The various insertion modes (e.g., `before_html_anything_else`, `in_table`, etc.) correctly check for the `false` return from `insert_virtual_node()` and abort processing by returning `false`.
- Conventions: The code adheres to WordPress coding standards.
Tests
- Coverage: The new tests `test_deep_nesting_fails_process_without_error` and `test_deep_nesting_fails_processing_virtual_tokens_without_error` effectively cover the scenarios where bookmark limits are hit.
- Annotations:
  - The use of `@ticket 64394` is appropriate.
  - The use of `@expectedIncorrectUsage WP_HTML_Tag_Processor::set_bookmark` is correct, as `WP_HTML_Tag_Processor::set_bookmark()` triggers `_doing_it_wrong()` when the limit is reached.
  - The cleanup of `@group` to `@ticket` in `test_next_tag_lowercase_tag_name` is a valid housekeeping change.
Recommendation
The changes are approved.
No blocking issues found.
dmsnell left a comment
I definitely don’t love how pervasive and deep our error-checking has become here; did you attempt to use `->bail()`? The existing code is throwing because that part of `step()` isn’t wrapped in a try/catch, right? What if we continue to throw inside `bookmark_token()` but trap that and handle aborting the same way we handle all deep control flow issues?
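One possible reading of this suggestion, sketched in plain PHP under stated assumptions: the deep helpers keep throwing, but they record the error first and use a dedicated exception type, and `step()` traps that one type in a single place, in the spirit of how `bail()` aborts processing. Callers then need no `false` checks, and nothing has to catch a generic `Exception`. `Sketch_Bookmarks_Exhausted`, `Sketch_Trapping_Processor`, and the error string are hypothetical; whether the real processor should reuse an existing exception type or add a new one is left open.

```php
<?php
// Hypothetical exception type used only to escape deep call stacks.
final class Sketch_Bookmarks_Exhausted extends Exception {}

final class Sketch_Trapping_Processor {
	const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks'; // Assumed value.

	/** @var string|null */
	public $last_error = null;

	private $max_bookmarks;
	private $used = 0;

	public function __construct( int $max_bookmarks ) {
		$this->max_bookmarks = $max_bookmarks;
	}

	// Still throws, but records the error first and uses a dedicated type.
	private function bookmark_token(): string {
		if ( $this->used >= $this->max_bookmarks ) {
			$this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
			throw new Sketch_Bookmarks_Exhausted( 'Could not allocate a bookmark.' );
		}
		++$this->used;
		return "bookmark-{$this->used}";
	}

	// Callers need no `false` checks; the exception unwinds through them.
	private function insert_virtual_node( string $node_name ): string {
		return $node_name . '@' . $this->bookmark_token();
	}

	private function step_in_table(): bool {
		$this->insert_virtual_node( 'TBODY' );
		$this->insert_virtual_node( 'TR' );
		return true;
	}

	// Single trap: only bookmark exhaustion is converted into a clean stop.
	public function step(): bool {
		if ( null !== $this->last_error ) {
			return false;
		}
		try {
			return $this->step_in_table();
		} catch ( Sketch_Bookmarks_Exhausted $e ) {
			return false;
		}
	}
}

$p     = new Sketch_Trapping_Processor( 3 );
$steps = 0;
while ( $p->step() ) {
	++$steps;
}
echo "completed steps: {$steps}, error: {$p->last_error}\n";
```

The trade-off is the one raised in the thread: fewer explicit checks, at the cost of using exceptions for control flow inside the class.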
Me neither 😕
I did try using We could consider having
I have that in an earlier commit. It wasn't as pervasive (at b02d679):
- wordpress-develop/src/wp-includes/html-api/class-wp-html-processor.php, lines 1043 to 1050 in b02d679
- wordpress-develop/src/wp-includes/html-api/class-wp-html-processor.php, lines 1157 to 1169 in b02d679
Co-authored-by: Weston Ruter <westonruter@gmail.com>
Why did you change it @sirreal? Did I ask you to? 🙃 🤦♂️
No, I was just iterating and exploring. The method actually already had a I can't say I like the general `Exception` catching either, but it works. I don't have a strong preference; would you prefer that other version @dmsnell?
Bookmark exhaustion, typically from deep nesting, can cause the HTML Processor to throw an `Exception`.

Rather than throwing an exception, return `false` when bookmark exhaustion is detected. An error is set on the processor and processing stops.

Trac ticket: https://core.trac.wordpress.org/ticket/64394
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.