HTML API: Rely on HTML API for wp_html_split()
#6651
base: trunk
Trac Ticket Missing: This pull request is missing a link to a Trac ticket. For a contribution to be considered, there must be a corresponding ticket in Trac. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description. More information about contributing to WordPress on GitHub can be found in the Core Handbook.
It seems like all the remaining test failures are around CDATA sections, and their assertions appear to be wrong under HTML5 parsing rules. @westonruter shared some stats on XHTML usage, and there seems to be very little of it: GoogleChromeLabs/wpp-research#74. Should these tests be removed or updated, since they're wrong in the majority of cases? It would be great to see https://core.trac.wordpress.org/ticket/59883 (drop support for pre-HTML5) move forward.
There is one test failure I see that's not CDATA-related, but it also looks like it may be a fix in behavior rather than a regression:
westonruter left a comment:
Should wp_get_internal_tag_processor() be marked as private, or should its logic be moved into a closure?
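The closure option can be sketched in plain PHP. This is a minimal, hypothetical illustration, not code from this PR: Demo_Tag_Processor stands in for WP_HTML_Tag_Processor and demo_html_split() stands in for wp_html_split(), so the sketch runs without WordPress. It shows a static closure that returns an anonymous subclass, which keeps the factory local to the calling function instead of declaring a new global helper.

```php
<?php
// Hypothetical stand-in for WP_HTML_Tag_Processor so the sketch runs without WordPress.
class Demo_Tag_Processor {
	protected $html;

	public function __construct( $html ) {
		$this->html = $html;
	}
}

// Hypothetical stand-in for wp_html_split(): the factory is a local static closure,
// so it is never added to the global function namespace.
function demo_html_split( $input_html ) {
	$get_internal_tag_processor = static function ( $html ) {
		return new class( $html ) extends Demo_Tag_Processor {
			// Reads the protected input, much as get_raw_token() reads $this->html.
			public function get_input_length() {
				return strlen( $this->html );
			}
		};
	};

	$processor = $get_internal_tag_processor( $input_html );

	return array(
		'length'    => $processor->get_input_length(),
		'anonymous' => ( new ReflectionClass( $processor ) )->isAnonymous(),
		'is_base'   => $processor instanceof Demo_Tag_Processor,
	);
}

var_dump( demo_html_split( '<p>Hi</p>' ) );
```

A named function documented with `@access private` would also work; the closure mainly avoids adding a public-looking global function that other code could start calling.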
 */
-function wp_html_split( $input ) {
-	return preg_split( get_html_split_regex(), $input, -1, PREG_SPLIT_DELIM_CAPTURE );
+function wp_html_split( $input_html ) {
Suggested change:
 function wp_html_split( $input_html ) {
+	$get_internal_tag_processor = static function ( $html ) {
+		return new class( $html ) extends WP_HTML_Tag_Processor {
+			/**
+			 * Returns the raw token from the input string at the
+			 * current location, if paused at a location.
+			 *
+			 * @return false|string
+			 */
+			public function get_raw_token() {
+				if (
+					WP_HTML_Tag_Processor::STATE_READY === $this->parser_state ||
+					WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state ||
+					WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state
+				) {
+					return false;
+				}
+				$this->set_bookmark( 'here' );
+				$here = $this->bookmarks['here'];
+				return substr( $this->html, $here->start, $here->length );
+			}
+		};
+	};
-	return preg_split( get_html_split_regex(), $input, -1, PREG_SPLIT_DELIM_CAPTURE );
+function wp_html_split( $input_html ) {
+	$chunks = array();
+	$processor = wp_get_internal_tag_processor( $input_html );
Suggested change:
-	$processor = wp_get_internal_tag_processor( $input_html );
+	$processor = $get_internal_tag_processor( $input_html );
+	$raw_html = $processor->get_raw_token();
+	$first_char = $raw_html[1];
+	$raw_html[1] = 'X';
+	$special = wp_get_internal_tag_processor( $raw_html );
Suggested change:
-	$special = wp_get_internal_tag_processor( $raw_html );
+	$special = $get_internal_tag_processor( $raw_html );
+/**
+ * Returns a Tag Processor exposing the raw matched tokens.
+ *
+ * @since 6.6.0
+ *
+ * @param string $html Passed into the Tag Processor.
+ * @return WP_HTML_Tag_Processor|__anonymous@23567
+ */
+function wp_get_internal_tag_processor( $html ) {
+	return new class( $html ) extends WP_HTML_Tag_Processor {
+		/**
+		 * Returns the raw token from the input string at the
+		 * current location, if paused at a location.
+		 *
+		 * @return false|string
+		 */
+		public function get_raw_token() {
+			if (
+				WP_HTML_Tag_Processor::STATE_READY === $this->parser_state ||
+				WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state ||
+				WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state
+			) {
+				return false;
+			}
+
+			$this->set_bookmark( 'here' );
+			$here = $this->bookmarks['here'];
+
+			return substr( $this->html, $here->start, $here->length );
+		}
+	};
+}
Suggested change: delete the whole wp_get_internal_tag_processor() definition above, from its docblock through its closing brace.
Is this superseded by #9270?
Status
This is a work in progress.
Description
Replace the regular-expression approach to splitting HTML with the HTML API for a reliable parse.
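To make the motivation concrete, here is a minimal sketch of regex-based splitting. It uses a deliberately simplified pattern, assumed for illustration only, and NOT the actual get_html_split_regex() pattern; demo_regex_split() and DEMO_SPLIT_REGEX are hypothetical names. It shows the chunk layout that preg_split() with PREG_SPLIT_DELIM_CAPTURE produces (text and markup alternate) and one input where a pattern of this shape mis-splits: a ">" inside a quoted attribute value.

```php
<?php
// Simplified, hypothetical stand-in for get_html_split_regex(); NOT the WordPress pattern.
// Everything from "<" up to the first ">" (or to the end of input) is one markup chunk.
const DEMO_SPLIT_REGEX = '/(<[^>]*>?)/';

function demo_regex_split( $input ) {
	// PREG_SPLIT_DELIM_CAPTURE keeps the captured markup in the result, so chunks
	// alternate: even indices hold text, odd indices hold markup.
	return preg_split( DEMO_SPLIT_REGEX, $input, -1, PREG_SPLIT_DELIM_CAPTURE );
}

// Well-formed input splits as expected.
print_r( demo_regex_split( 'a<b>c</b>' ) );
// → [ 'a', '<b>', 'c', '</b>', '' ]

// A ">" inside a quoted attribute value ends the "tag" too early.
print_r( demo_regex_split( 'a<p title="x > y">b</p>' ) );
// → [ 'a', '<p title="x >', ' y">b', '</p>', '' ]
```

A tokenizer such as the HTML API's Tag Processor tracks quoted attribute values, so it should report the second input's opening tag as a single token; this sketch does not exercise that code path.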