@dmsnell dmsnell commented Jul 11, 2025

Trac ticket: Core-63724

Replaces #7407, dmsnell#5
Coordination in #9256

Review feedback

  • Historically, the value and whole properties of the returned array have held the raw parsed bytes from the HTML (with some exceptions), so HTML character references are not decoded. This is an abstraction leak between the HTML and the structural return value.
    • Should this refactor leave the messy return values in place, or should it decode the attribute values to match what developers likely expect when calling it (that all values are normal PHP strings and not HTML text node strings)?

Implementation

wp_kses_hair() is built around an impressive state machine for parsing the $attr of an HTML tag, that is, the span of text after the tag name and before the closing >. Unfortunately, that parsing code doesn’t fully implement the HTML specification and may be prone to mis-parsing.

This patch replaces the existing state machine with a straightforward use of the HTML API to parse the attributes for us, constructing a shell tag for the $attr string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag.
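
As a rough illustration of the shell-tag idea, the sketch below wraps the attribute string in a placeholder tag and reads the attributes back through the Tag Processor. It is not the code from this patch: the function name `sketch_parse_attributes()` and the `<wp>` shell tag are made up for the example. It assumes WordPress 6.2 or later is loaded and returns null otherwise.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress or of this patch.
 *
 * @param string $attr Attribute list as it would appear inside a tag.
 * @return array|null Map of attribute name to decoded value (true for
 *                    boolean attributes), or null if the HTML API is
 *                    unavailable or no tag could be parsed.
 */
function sketch_parse_attributes( string $attr ): ?array {
	if ( ! class_exists( 'WP_HTML_Tag_Processor' ) ) {
		return null; // Running outside WordPress.
	}

	// Assumption: `$attr` contains no unquoted `>`, which would end the shell tag early.
	$processor = new WP_HTML_Tag_Processor( "<wp {$attr}>" );
	if ( ! $processor->next_tag() ) {
		return null;
	}

	$attributes = array();
	foreach ( $processor->get_attribute_names_with_prefix( '' ) ?? array() as $name ) {
		$attributes[ $name ] = $processor->get_attribute( $name );
	}

	return $attributes;
}
```

Inside WordPress, `sketch_parse_attributes( 'title="x" hidden' )` would be expected to report `title` with a string value and `hidden` as a boolean attribute.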

Dependencies

github-actions bot commented Jul 11, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell, jonsurrell, westonruter.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 11, 2025
Trac ticket: Core-63694

`wp_kses_hair()` is built around an impressive state machine for parsing
the `$attr` of an HTML tag, that is, the span of text after the tag name
and before the closing `>`. Unfortunately, that parsing code doesn’t
fully implement the HTML specification and may be prone to mis-parsing.

This patch replaces the existing state machine with a straightforward
use of the HTML API to parse the attributes for us, constructing a shell
tag for the `$attr` string and reading the attributes structurally.
This shell is necessary because a previous stage of the pipeline has
already separated what it thinks is the so-called “attribute list” from
a tag.

Props: dmsnell
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 68c7746 to b476339 Compare July 11, 2025 22:37
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 11, 2025
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from b476339 to 6146ecd Compare July 11, 2025 22:45
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 11, 2025
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 6146ecd to d64f56e Compare July 11, 2025 22:46
@github-actions
Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 12, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jul 13, 2025
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 550921c to 5565848 Compare October 21, 2025 09:23
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Nov 24, 2025
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 5565848 to 63aecea Compare November 24, 2025 20:25
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Dec 18, 2025
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 63aecea to 726bcfd Compare December 18, 2025 03:52
@sirreal sirreal self-requested a review January 9, 2026 15:57
sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jan 9, 2026
Adds 60 test methods covering all aspects of wp_kses_hair() function
before the HTML API refactor in PR WordPress#9248.

Test coverage includes:
- Basic attribute parsing (single, double, unquoted, boolean)
- Character reference handling (named, numeric decimal/hex)
- Quote handling and normalization
- URL protocol filtering
- WordPress-specific attributes (data-*, aria-*)
- Edge cases: malformed input, whitespace variations
- Security patterns: injection attempts, contradictory values
- Real-world malformed patterns from broken editors/templates

Total: 60 test methods, 1,441 lines
File: tests/phpunit/tests/kses/wpKsesHair.php

This baseline test suite documents current behavior before refactoring
to ensure no regressions during the HTML API migration.

Related: WordPress#9248

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jan 9, 2026
Adds @ticket 63724 annotation to class-level docblock and all 60 test
methods, following WordPress PHPUnit testing conventions.

This links the test suite to Trac ticket 63724 (Core-63724) which tracks
the wp_kses_hair() refactoring in PR WordPress#9248.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jan 9, 2026
Improves test suite by adding explicit assertions documenting exact
expected behavior for all edge cases and malformed input.

Changes:
- Replace weak assertIsArray() with explicit expected arrays
- Document exact parsing behavior for malformed patterns
- Show protocol filtering results explicitly
- Document error recovery patterns for invalid input
- Add whitespace handling assertions

Results: 60 tests, 64 assertions, all passing (0.218s)

This provides comprehensive baseline documentation of current
wp_kses_hair() behavior before HTML API refactor in PR WordPress#9248.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jan 9, 2026
Related: WordPress#9248

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sirreal commented Jan 9, 2026

This function had quirks that change with this PR and I want to understand them.

I created a test suite for wp_kses_hair(), then merged this branch and updated the tests to get a diff of the behavior changes. I also looked at several of the most popular results from WP Directory to understand usage.

My review of the most common usages suggests that this change is safe to make and would not negatively impact plugin authors.

  • Historically, the value and whole properties of the returned array have held the raw parsed bytes from the HTML (with some exceptions), so HTML character references are not decoded. This is an abstraction leak between the HTML and the structural return value.
    • Should this refactor leave the messy return values in place, or should it decode the attribute values to match what developers likely expect when calling it (that all values are normal PHP strings and not HTML text node strings)?

This is a tricky question. It doesn't seem like folks rely on specifics of the input representation being present in the output; however, it's certainly possible.

In one of the examples from plugins, esc_attr() is called on the attribute value to construct a new HTML string. This should be perfectly fine: the value is re-encoded by this PR, and esc_attr() avoids double-encoding. They also statically wrap the value in double quotes, which already made the esc_attr() call necessary because the attribute value could have contained a " character!
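
To make the double-encoding point concrete, here is a minimal sketch in plain PHP. esc_attr() itself needs WordPress, so the hypothetical `sketch_escape_attribute()` below uses htmlspecialchars() with `$double_encode` set to false as a stand-in. That only approximates esc_attr() and is not how it is implemented.

```php
<?php
// Stand-in for esc_attr(): escape syntax characters but leave existing
// character references alone. This approximates, and is not, esc_attr().
function sketch_escape_attribute( string $value ): string {
	return htmlspecialchars( $value, ENT_QUOTES | ENT_HTML5, 'UTF-8', false );
}

// A value as this PR would report it (already normalized).
$normalized = 'He said &quot;hello&quot;';

// Re-escaping leaves the existing references untouched.
$attribute = 'title="' . sketch_escape_attribute( $normalized ) . '"';

// A raw value containing a double quote gets escaped instead,
// so the static double-quote wrapper stays intact.
$raw_attribute = 'title="' . sketch_escape_attribute( 'He said "hello"' ) . '"';
```

Both strings end up identical, which is why a caller that escapes the value before wrapping it in double quotes should see the same output before and after the change.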

After some reflection, I believe the behavior you've implemented here is a good decision. Consider that the input is HTML and the output (value and whole) have always been some form of HTML. The difference here is a normalization of the HTML in the output.
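
The normalization can be approximated in plain PHP as decode, then re-encode. This is only a sketch under assumptions: `sketch_normalize_attribute()` is a made-up name, and html_entity_decode() does not follow the HTML specification's attribute decoding rules in every case (the HTML API does), so edge cases may differ from what this PR produces.

```php
<?php
/**
 * Hypothetical helper: approximates the normalized shape reported for one attribute.
 *
 * @param string      $name      Attribute name.
 * @param string|null $raw_value Raw value from the HTML, or null for a boolean attribute.
 */
function sketch_normalize_attribute( string $name, ?string $raw_value ): array {
	if ( null === $raw_value ) {
		return array( 'name' => $name, 'value' => '', 'whole' => $name, 'vless' => 'y' );
	}

	// Decode whatever references PHP recognizes, then re-encode only & < > ' ".
	$decoded = html_entity_decode( $raw_value, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
	$value   = htmlspecialchars( $decoded, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

	return array(
		'name'  => $name,
		'value' => $value,
		'whole' => "{$name}=\"{$value}\"", // Always double-quoted.
		'vless' => 'n',
	);
}

$entry = sketch_normalize_attribute( 'title', '&#60;test&#62;' );
// $entry['whole'] is 'title="&lt;test&gt;"'
```

Numeric references come back in their named forms, and the original quoting style no longer leaks into `whole`.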


behavior diff
diff --git a/tests/phpunit/tests/kses/wpKsesHair.php b/tests/phpunit/tests/kses/wpKsesHair.php
index 2ed83679f2e3d..05d573bc070bc 100644
--- a/tests/phpunit/tests/kses/wpKsesHair.php
+++ b/tests/phpunit/tests/kses/wpKsesHair.php
@@ -57,7 +57,7 @@ public function data_attribute_parsing() {
 				'title' => array(
 					'name'  => 'title',
 					'value' => 'My Title',
-					'whole' => "title='My Title'",
+					'whole' => 'title="My Title"',
 					'vless' => 'n',
 				),
 			),
@@ -188,8 +188,8 @@ public function data_attribute_parsing() {
 			array(
 				'title' => array(
 					'name'  => 'title',
-					'value' => '&#60;test&#62;',
-					'whole' => 'title="&#60;test&#62;"',
+					'value' => '&lt;test&gt;',
+					'whole' => 'title="&lt;test&gt;"',
 					'vless' => 'n',
 				),
 			),
@@ -200,8 +200,8 @@ public function data_attribute_parsing() {
 			array(
 				'title' => array(
 					'name'  => 'title',
-					'value' => '&#x3C;hex&#x3E;',
-					'whole' => 'title="&#x3C;hex&#x3E;"',
+					'value' => '&lt;hex&gt;',
+					'whole' => 'title="&lt;hex&gt;"',
 					'vless' => 'n',
 				),
 			),
@@ -212,8 +212,8 @@ public function data_attribute_parsing() {
 			array(
 				'title' => array(
 					'name'  => 'title',
-					'value' => '&#X3C;HEX&#X3E;',
-					'whole' => 'title="&#X3C;HEX&#X3E;"',
+					'value' => '&lt;HEX&gt;',
+					'whole' => 'title="&lt;HEX&gt;"',
 					'vless' => 'n',
 				),
 			),
@@ -224,8 +224,8 @@ public function data_attribute_parsing() {
 			array(
 				'title' => array(
 					'name'  => 'title',
-					'value' => '&invalid; &#; &#x;',
-					'whole' => 'title="&invalid; &#; &#x;"',
+					'value' => '&amp;invalid; &amp;#; &amp;#x;',
+					'whole' => 'title="&amp;invalid; &amp;#; &amp;#x;"',
 					'vless' => 'n',
 				),
 			),
@@ -249,7 +249,7 @@ public function data_attribute_parsing() {
 				'data-text' => array(
 					'name'  => 'data-text',
 					'value' => 'Single quoted value',
-					'whole' => "data-text='Single quoted value'",
+					'whole' => 'data-text="Single quoted value"',
 					'vless' => 'n',
 				),
 			),
@@ -267,7 +267,7 @@ public function data_attribute_parsing() {
 				'alt'   => array(
 					'name'  => 'alt',
 					'value' => 'single',
-					'whole' => "alt='single'",
+					'whole' => 'alt="single"',
 					'vless' => 'n',
 				),
 				'id'    => array(
@@ -284,8 +284,8 @@ public function data_attribute_parsing() {
 			array(
 				'title' => array(
 					'name'  => 'title',
-					'value' => "It's working",
-					'whole' => 'title="It\'s working"',
+					'value' => 'It&apos;s working',
+					'whole' => 'title="It&apos;s working"',
 					'vless' => 'n',
 				),
 			),
@@ -296,8 +296,8 @@ public function data_attribute_parsing() {
 			array(
 				'title' => array(
 					'name'  => 'title',
-					'value' => 'He said "hello"',
-					'whole' => 'title=\'He said "hello"\'',
+					'value' => 'He said &quot;hello&quot;',
+					'whole' => 'title="He said &quot;hello&quot;"',
 					'vless' => 'n',
 				),
 			),
@@ -327,12 +327,32 @@ public function data_attribute_parsing() {
 
 		yield 'invalid attribute name starting with number' => array(
 			'1invalid="value"',
-			array(),
+			array(
+				'1invalid' => array(
+					'name'  => '1invalid',
+					'value' => 'value',
+					'whole' => '1invalid="value"',
+					'vless' => 'n',
+				),
+			),
 		);
 
 		yield 'invalid attribute name special chars' => array(
 			'@invalid="value" $bad="value"',
-			array(),
+			array(
+				'@invalid' => array(
+					'name'  => '@invalid',
+					'value' => 'value',
+					'whole' => '@invalid="value"',
+					'vless' => 'n',
+				),
+				'$bad'     => array(
+					'name'  => '$bad',
+					'value' => 'value',
+					'whole' => '$bad="value"',
+					'vless' => 'n',
+				),
+			),
 		);
 
 		yield 'duplicate attributes first wins' => array(
@@ -355,7 +375,20 @@ public function data_attribute_parsing() {
 
 		yield 'malformed unclosed double quote' => array(
 			'title="unclosed class="test"',
-			array(),
+			array(
+				'title' => array(
+					'name'  => 'title',
+					'value' => 'unclosed class=',
+					'whole' => 'title="unclosed class="',
+					'vless' => 'n',
+				),
+				'test"' => array(
+					'name'  => 'test"',
+					'value' => '',
+					'whole' => 'test"',
+					'vless' => 'y',
+				),
+			),
 		);
 
 		yield 'very long attribute value' => array(
@@ -610,7 +643,7 @@ public function data_attribute_parsing() {
 				'alt'   => array(
 					'name'  => 'alt',
 					'value' => '',
-					'whole' => "alt=''",
+					'whole' => 'alt=""',
 					'vless' => 'n',
 				),
 				'class' => array(
@@ -625,7 +658,7 @@ public function data_attribute_parsing() {
 		yield 'forward slashes between attributes' => array(
 			'att / att2=2 /// att3="3"',
 			array(
-				'att'   => array(
+				'att'  => array(
 					'name'  => 'att',
 					'value' => '',
 					'whole' => 'att',
@@ -652,13 +685,13 @@ public function data_attribute_parsing() {
 				'att'  => array(
 					'name'  => 'att',
 					'value' => 'val',
-					'whole' => "att='val'",
+					'whole' => 'att="val"',
 					'vless' => 'n',
 				),
 				'att2' => array(
 					'name'  => 'att2',
 					'value' => 'val2',
-					'whole' => "att2='val2'",
+					'whole' => 'att2="val2"',
 					'vless' => 'n',
 				),
 			),
@@ -670,13 +703,13 @@ public function data_attribute_parsing() {
 				'att'  => array(
 					'name'  => 'att',
 					'value' => 'val',
-					'whole' => "att='val'",
+					'whole' => 'att="val"',
 					'vless' => 'n',
 				),
 				'att2' => array(
 					'name'  => 'att2',
 					'value' => 'val2',
-					'whole' => "att2='val2'",
+					'whole' => 'att2="val2"',
 					'vless' => 'n',
 				),
 			),
@@ -688,13 +721,13 @@ public function data_attribute_parsing() {
 				'att'  => array(
 					'name'  => 'att',
 					'value' => 'val',
-					'whole' => "att='val'",
+					'whole' => 'att="val"',
 					'vless' => 'n',
 				),
 				'att2' => array(
 					'name'  => 'att2',
 					'value' => 'val2',
-					'whole' => "att2='val2'",
+					'whole' => 'att2="val2"',
 					'vless' => 'n',
 				),
 			),
@@ -706,13 +739,13 @@ public function data_attribute_parsing() {
 				'att'  => array(
 					'name'  => 'att',
 					'value' => 'val',
-					'whole' => "att='val'",
+					'whole' => 'att="val"',
 					'vless' => 'n',
 				),
 				'att2' => array(
 					'name'  => 'att2',
 					'value' => 'val2',
-					'whole' => "att2='val2'",
+					'whole' => 'att2="val2"',
 					'vless' => 'n',
 				),
 			),
@@ -739,34 +772,67 @@ public function data_attribute_parsing() {
 		// Malformed Equals Patterns.
 		yield 'multiple equals signs' => array(
 			'att=="val"',
-			array(),
+			array(
+				'att' => array(
+					'name'  => 'att',
+					'value' => '=&quot;val&quot;',
+					'whole' => 'att="=&quot;val&quot;"',
+					'vless' => 'n',
+				),
+			),
 		);
 
 		yield 'equals with strange spacing' => array(
 			'att= ="val"',
-			array(),
+			array(
+				'att' => array(
+					'name'  => 'att',
+					'value' => '=&quot;val&quot;',
+					'whole' => 'att="=&quot;val&quot;"',
+					'vless' => 'n',
+				),
+			),
 		);
 
 		yield 'triple equals signs' => array(
 			'att==="val"',
-			array(),
+			array(
+				'att' => array(
+					'name'  => 'att',
+					'value' => '==&quot;val&quot;',
+					'whole' => 'att="==&quot;val&quot;"',
+					'vless' => 'n',
+				),
+			),
 		);
 
 		yield 'equals echo pattern' => array(
 			"att==echo 'something'",
 			array(
-				'att' => array(
+				'att'         => array(
 					'name'  => 'att',
 					'value' => '=echo',
 					'whole' => 'att="=echo"',
 					'vless' => 'n',
 				),
+				"'something'" => array(
+					'name'  => "'something'",
+					'value' => '',
+					'whole' => "'something'",
+					'vless' => 'y',
+				),
 			),
 		);
 
 		yield 'attribute starting with equals' => array(
 			'= bool k=v',
 			array(
+				'='    => array(
+					'name'  => '=',
+					'value' => '',
+					'whole' => '=',
+					'vless' => 'y',
+				),
 				'bool' => array(
 					'name'  => 'bool',
 					'value' => '',
@@ -785,18 +851,43 @@ public function data_attribute_parsing() {
 		yield 'mixed quotes and equals chaos' => array(
 			'k=v ="' . "' j=w",
 			array(
-				'k' => array(
+				'k'        => array(
 					'name'  => 'k',
 					'value' => 'v',
 					'whole' => 'k="v"',
 					'vless' => 'n',
 				),
+				'="' . "'" => array(
+					'name'  => '="' . "'",
+					'value' => '',
+					'whole' => '="' . "'",
+					'vless' => 'y',
+				),
+				'j'        => array(
+					'name'  => 'j',
+					'value' => 'w',
+					'whole' => 'j="w"',
+					'vless' => 'n',
+				),
 			),
 		);
 
 		yield 'triple equals quoted whitespace' => array(
 			'==="  "',
-			array(),
+			array(
+				'=' => array(
+					'name'  => '=',
+					'value' => '=&quot;',
+					'whole' => '=="=&quot;"',
+					'vless' => 'n',
+				),
+				'"' => array(
+					'name'  => '"',
+					'value' => '',
+					'whole' => '"',
+					'vless' => 'y',
+				),
+			),
 		);
 
 		yield 'boolean with contradictory value' => array(
@@ -820,7 +911,13 @@ public function data_attribute_parsing() {
 		yield 'empty attribute name with value' => array(
 			'="value" class="test"',
 			array(
-				'class' => array(
+				'="value"' => array(
+					'name'  => '="value"',
+					'value' => '',
+					'whole' => '="value"',
+					'vless' => 'y',
+				),
+				'class'    => array(
 					'name'  => 'class',
 					'value' => 'test',
 					'whole' => 'class="test"',
@@ -890,7 +987,7 @@ public function data_protocol_filtering() {
 				'href' => array(
 					'name'  => 'href',
 					'value' => 'alert(1)',
-					'whole' => "href='alert(1)'",
+					'whole' => 'href="alert(1)"',
 					'vless' => 'n',
 				),
 			),
@@ -925,8 +1022,8 @@ public function data_protocol_filtering() {
 			array(
 				'src' => array(
 					'name'  => 'src',
-					'value' => 'text/html,<script>alert(1)</script>',
-					'whole' => 'src="text/html,<script>alert(1)</script>"',
+					'value' => 'text/html,&lt;script&gt;alert(1)&lt;/script&gt;',
+					'whole' => 'src="text/html,&lt;script&gt;alert(1)&lt;/script&gt;"',
 					'vless' => 'n',
 				),
 			),

Here are two examples from the most popular plugins in the WP Directory search:

From YITH (this appears to be part of the yith library used in many of their plugins):

	function yith_plugin_fw_html_attributes_to_string( $attributes = array(), $echo = false ) {
		$html_attributes = '';
		if ( ! ! $attributes ) {
			if ( is_string( $attributes ) ) {
				$parsed_attrs = wp_kses_hair( $attributes, wp_allowed_protocols() );
				$attributes   = array();
				foreach ( $parsed_attrs as $attr ) {
					$attributes[ $attr['name'] ] = 'n' === $attr['vless'] ? $attr['value'] : null;
				}
			}

			if ( is_array( $attributes ) ) {
				$html_attributes = array();
				foreach ( $attributes as $key => $value ) {
					if ( ! is_null( $value ) ) {
						$html_attributes[] = esc_attr( $key ) . '="' . esc_attr( $value ) . '"';
					} else {
						$html_attributes[] = esc_attr( $key );
					}
				}
				$html_attributes = implode( ' ', $html_attributes );
			}
		}

		if ( $echo ) {
			// Already escaped above.
			echo $html_attributes; // phpcs:ignore WordPress.Security.EscapeOutput.OutputNotEscaped
		}

		return $html_attributes;
	}

And Jetpack:

				$params = wp_kses_hair( $params, array( 'http' ) );

				$width  = isset( $params['width'] ) ? (int) $params['width']['value'] : 0;
				$height = isset( $params['height'] ) ? (int) $params['height']['value'] : 0;
				$wh     = '';

				if ( $width && $height ) {
					$wh = "&w=$width&h=$height";
				}

				$url = esc_url_raw( "https://www.youtube.com/watch?v={$match[3]}{$wh}" );

@sirreal sirreal left a comment

This seems like a good improvement.

* );
*
* @since 1.0.0
* @since 6.9.0 Rebuilt on HTML API
Suggested change
- * @since 6.9.0 Rebuilt on HTML API
+ * @since 7.0.0 Rebuilt on HTML API

dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jan 9, 2026
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 726bcfd to 34ba49c Compare January 9, 2026 22:37
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jan 9, 2026
Trac ticket: Core-63694

`wp_kses_hair()` is built around an impressive state machine for parsing
the `$attr` of an HTML tag, that is, the span of text after the tag name
and before the closing `>`. Unfortunately, that parsing code doesn’t
fully implement the HTML specification and may be prone to mis-parsing.

This patch replaces the existing state machine with a straightforward
use of the HTML API to parse the attributes for us, constructing a shell
tag for the `$attr` string and reading the attributes structurally.
This shell is necessary because a previous stage of the pipeline has
already separated what it thinks is the so-called “attribute list” from
a tag.

Props: adamziel, dmsnell, jonsurrell, jorbin.

Co-authored-by: Adam Zieliński <zieladam@git.wordpress.org>
Co-authored-by: Aaron Jorbin <jorbin@git.wordpress.org>
Co-authored-by: Jon Surrell <jonsurrell@git.wordpress.org>
Github-PR: 9248
Github-PR-URL: WordPress#9248
Trac-Ticket: 63694
Trac-Ticket-URL: https://core.trac.wordpress.org/ticket/63724
Git-Branch: html-api/refactor-wp-kses-hair-take-3
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 2fe88f3 to 60c3c67 Compare January 9, 2026 23:27
dmsnell added a commit to dmsnell/wordpress-develop that referenced this pull request Jan 9, 2026
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from 60c3c67 to f3d1a51 Compare January 9, 2026 23:36
@westonruter westonruter left a comment

(Props to Gemini)

Trac ticket: Core-63694

`wp_kses_hair()` is built around an impressive state machine for parsing
the `$attr` of an HTML tag, that is, the span of text after the tag name
and before the closing `>`. Unfortunately, that parsing code doesn’t
fully implement the HTML specification and may be prone to mis-parsing.

This patch replaces the existing state machine with a straightforward
use of the HTML API to parse the attributes for us, constructing a shell
tag for the `$attr` string and reading the attributes structurally.
This shell is necessary because a previous stage of the pipeline has
already separated what it thinks is the so-called “attribute list” from
a tag.

Props: adamziel, dmsnell, jonsurrell, jorbin, westonruter.

Co-authored-by: Adam Zieliński <zieladam@git.wordpress.org>
Co-authored-by: Aaron Jorbin <jorbin@git.wordpress.org>
Co-authored-by: Jon Surrell <jonsurrell@git.wordpress.org>
Co-authored-by: Weston Ruter <westonruter@git.wordpress.org>
Github-PR: 9248
Github-PR-URL: WordPress#9248
Trac-Ticket: 63694
Trac-Ticket-URL: https://core.trac.wordpress.org/ticket/63724
Git-Branch: html-api/refactor-wp-kses-hair-take-3
@dmsnell dmsnell force-pushed the html-api/refactor-wp-kses-hair-take-3 branch from f3d1a51 to 290c46a Compare January 10, 2026 20:42
pento pushed a commit that referenced this pull request Jan 10, 2026
`wp_kses_hair()` is built around an impressive state machine for parsing the span of text following an HTML tag name and the tag’s closing `>` into a structured representation of the attributes. Unfortunately that parsing code doesn’t comply with the HTML Living Standard and is prone to mis-parsing attributes, particularly in the presence of malformed inputs.

This patch replaces the existing state machine with the spec-compliant parsing from the HTML API. With a comprehensive test suite covering attribute parsing, the same reliability the Tag Processor affords will be applied to `wp_kses_hair()`, giving new guarantees not previously available in Core:

 - All attribute values are reported fully-normalized, where character references are decoded and then re-encoded in a predictable manner. Only the “big five” syntax characters (“&<>'"”) will remain, and in their named forms.
 - All `whole` values are fully normalized and presented either as boolean attributes without a value, or with double-quoted attribute values.
 - All attributes and their values will be properly parsed according to how a browser would parse them, bringing agreement between the server and user agents.

Developed in #9248
Discussed in https://core.trac.wordpress.org/ticket/63724

Props adamziel, dmsnell, jonsurrell, jorbin, westonruter.
Fixes #63724.


git-svn-id: https://develop.svn.wordpress.org/trunk@61467 602fd350-edb4-49c9-b593-d223f7449a82
@github-actions
A commit was made that fixes the Trac ticket referenced in the description of this pull request.

SVN changeset: 61467
GitHub commit: b5daf3e

This PR will be closed, but please confirm the accuracy of this and reopen if there is more work to be done.

@github-actions github-actions bot closed this Jan 10, 2026
@dmsnell dmsnell deleted the html-api/refactor-wp-kses-hair-take-3 branch January 10, 2026 21:46
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Jan 10, 2026
Built from https://develop.svn.wordpress.org/trunk@61467


git-svn-id: http://core.svn.wordpress.org/trunk@60779 1a063a9b-81f0-0310-95a4-ce76da25c4cd
github-actions bot pushed a commit to platformsh/wordpress-performance that referenced this pull request Jan 10, 2026
Built from https://develop.svn.wordpress.org/trunk@61467


git-svn-id: https://core.svn.wordpress.org/trunk@60779 1a063a9b-81f0-0310-95a4-ce76da25c4cd