Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions src/wp-includes/canonical.php
Original file line number Diff line number Diff line change
Expand Up @@ -774,6 +774,25 @@ function redirect_canonical( $requested_url = null, $do_redirect = true ) {
return;
}

/*
* Avoid redirects when URLs differ only in query string encoding.
* Per RFC 3986, certain characters can be represented in multiple equivalent ways:
* - Spaces: '+' vs '%20' (e.g., ?name=John+Doe vs ?name=John%20Doe)
* - Unreserved chars unnecessarily encoded: '~' vs '%7E', '-' vs '%2D', '_' vs '%5F', '.' vs '%2E'
* - Reserved chars in values: '/' vs '%2F', ':' vs '%3A', '@' vs '%40'
*
* Example problematic scenarios:
* - UTM params: ?utm_content=Hello+World vs ?utm_content=Hello%20World
* - Encoded paths: ?redirect=/path/to/page vs ?redirect=%2Fpath%2Fto%2Fpage
* - Email params: ?email=user@example.com vs ?email=user%40example.com
*
Comment on lines +779 to +788
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment claims the fix handles 'unreserved chars unnecessarily encoded' and 'reserved chars in values', but the implementation using urldecode() doesn't actually preserve the semantic distinction between these cases. For example, %2F (encoded slash) in a query parameter value has a different meaning than a literal / - the former is a data character while the latter is a path separator in RFC 3986. The comment should accurately reflect that the implementation treats all percent-encoded characters as equivalent to their decoded forms, which may not be the intended behavior for all the listed examples.

Suggested change
* Per RFC 3986, certain characters can be represented in multiple equivalent ways:
* - Spaces: '+' vs '%20' (e.g., ?name=John+Doe vs ?name=John%20Doe)
* - Unreserved chars unnecessarily encoded: '~' vs '%7E', '-' vs '%2D', '_' vs '%5F', '.' vs '%2E'
* - Reserved chars in values: '/' vs '%2F', ':' vs '%3A', '@' vs '%40'
*
* Example problematic scenarios:
* - UTM params: ?utm_content=Hello+World vs ?utm_content=Hello%20World
* - Encoded paths: ?redirect=/path/to/page vs ?redirect=%2Fpath%2Fto%2Fpage
* - Email params: ?email=user@example.com vs ?email=user%40example.com
*
* This comparison is performed on fully urldecoded URLs, so any
* percent-encoded octet is treated as equivalent to its decoded character.
* Typical cases covered include:
* - Spaces: '+' vs '%20' (e.g., ?name=John+Doe vs ?name=John%20Doe)
* - Unreserved chars unnecessarily encoded: '~' vs '%7E', '-' vs '%2D', '_' vs '%5F', '.' vs '%2E'
* It will also treat reserved chars in query parameter values as equivalent
* to their encoded form (e.g., '/' vs '%2F', ':' vs '%3A', '@' vs '%40'),
* which collapses the distinction between using these characters as data
* and using them as separators, but matches how WordPress historically
* handles canonical URL comparisons.
*

Copilot uses AI. Check for mistakes.
* Redirecting between these variants provides no SEO or functional benefit
* while potentially causing caching issues and breaking analytics.
*/
if ( urldecode( $redirect_url ) === urldecode( $requested_url ) ) {
Comment on lines +778 to +792
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The urldecode() function is too broad for this use case. It decodes the entire URL (including scheme, host, path, and query), which could suppress legitimate redirects when encoding differences exist in URL components other than query strings. Additionally, urldecode() converts both + to space AND decodes all percent-encoded characters, which means URLs like ?redirect=%2Fpath%2Fto%2Fpage (encoded slashes) and ?redirect=/path/to/page would be treated as identical, even though they have fundamentally different semantic meanings. Consider implementing a more targeted solution that only normalizes space encoding (+ vs %20) in the query string portion specifically, or use rawurldecode() which doesn't convert + to space. Alternatively, parse the URLs and compare query strings separately after normalizing only space encoding.

Suggested change
* Avoid redirects when URLs differ only in query string encoding.
* Per RFC 3986, certain characters can be represented in multiple equivalent ways:
* - Spaces: '+' vs '%20' (e.g., ?name=John+Doe vs ?name=John%20Doe)
* - Unreserved chars unnecessarily encoded: '~' vs '%7E', '-' vs '%2D', '_' vs '%5F', '.' vs '%2E'
* - Reserved chars in values: '/' vs '%2F', ':' vs '%3A', '@' vs '%40'
*
* Example problematic scenarios:
* - UTM params: ?utm_content=Hello+World vs ?utm_content=Hello%20World
* - Encoded paths: ?redirect=/path/to/page vs ?redirect=%2Fpath%2Fto%2Fpage
* - Email params: ?email=user@example.com vs ?email=user%40example.com
*
* Redirecting between these variants provides no SEO or functional benefit
* while potentially causing caching issues and breaking analytics.
*/
if ( urldecode( $redirect_url ) === urldecode( $requested_url ) ) {
* Avoid redirects when URLs differ only in query string space encoding ('+' vs '%20').
* Normalizing only the query portion prevents us from collapsing semantically
* distinct URLs that differ in how reserved characters (like '/') are encoded
* in paths or parameter values.
*/
if ( ! function_exists( 'wp_normalize_query_space_encoding' ) ) {
/**
* Normalizes space encoding in the query portion of a URL.
*
* Converts '+' to '%20' only in the query string so that URLs that differ
* solely by space encoding in their query are treated as equivalent.
*
* @since 3.1.0
* @ignore
*
* @param string $url The URL to normalize.
* @return string The URL with normalized query string space encoding.
*/
function wp_normalize_query_space_encoding( $url ) {
$qpos = strpos( $url, '?' );
if ( false === $qpos ) {
return $url;
}
$hashpos = strpos( $url, '#', $qpos );
if ( false === $hashpos ) {
$base = substr( $url, 0, $qpos + 1 );
$query = substr( $url, $qpos + 1 );
$fragment = '';
} else {
$base = substr( $url, 0, $qpos + 1 );
$query = substr( $url, $qpos + 1, $hashpos - ( $qpos + 1 ) );
$fragment = substr( $url, $hashpos );
}
$normalized_query = str_replace( '+', '%20', $query );
return $base . $normalized_query . $fragment;
}
}
$normalized_redirect_url = wp_normalize_query_space_encoding( $redirect_url );
$normalized_requested_url = wp_normalize_query_space_encoding( $requested_url );
if ( $normalized_redirect_url === $normalized_requested_url ) {

Copilot uses AI. Check for mistakes.
return;
}

// Hex-encoded octets are case-insensitive.
if ( str_contains( $requested_url, '%' ) ) {
if ( ! function_exists( 'lowercase_octets' ) ) {
Expand Down
64 changes: 64 additions & 0 deletions tests/phpunit/tests/canonical.php
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,12 @@ public function data_canonical() {
array( '/2008%20', '/2008' ),
array( '//2008////', '/2008/' ),

// Query string encoding variants should not redirect (Ticket #64376).
array( '/?test=one+two', '/?test=one+two' ), // Plus sign should not redirect to %20.
array( '/?test=one%20two', '/?test=one%20two' ), // %20 should not redirect to plus.
Comment on lines +267 to +268
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test cases verify self-referential behavior (plus doesn't redirect to plus, %20 doesn't redirect to %20), but don't test the cross-encoding behavior that the fix is intended to prevent. Add test cases that verify /?test=one+two doesn't redirect to /?test=one%20two and vice versa, which is the actual problem described in the PR description.

Copilot uses AI. Check for mistakes.
array( '/?email=user%40example.com', '/?email=user%40example.com' ), // Encoded @ should not redirect.
array( '/?redirect=%2Fpath%2Fto%2Fpage', '/?redirect=%2Fpath%2Fto%2Fpage' ), // Encoded slashes should not redirect.
Comment on lines +269 to +270
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test cases verify that encoded characters don't redirect to their encoded forms, but they don't test the more important scenarios: whether ?email=user%40example.com would incorrectly avoid redirecting to ?email=user@example.com (or vice versa), and whether ?redirect=%2Fpath%2Fto%2Fpage would incorrectly avoid redirecting to ?redirect=/path/to/page. The current tests only verify self-referential non-redirects, but the bug concern is about avoiding redirects between semantically different encodings. Add test cases that verify encoded and unencoded versions of these parameters are correctly handled as distinct URLs.

Copilot uses AI. Check for mistakes.

// @todo Endpoints (feeds, trackbacks, etc). More fuzzed mixed query variables, comment paging, Home page (static).
);
}
Expand Down Expand Up @@ -465,6 +471,64 @@ public function test_feed_canonical_with_not_exists_query() {
$this->assertNull( $redirect );
}

/**
* Test that query string encoding variants do not trigger redirects.
*
* Ensures that URLs differing only in encoding (e.g., '+' vs '%20' for spaces)
* do not cause unnecessary 301 redirects.
*
* @ticket 64376
*/
public function test_query_string_encoding_variants_no_redirect() {
// Create a static front page to match the original bug report scenario.
$page_id = self::factory()->post->create(
array(
'post_type' => 'page',
)
);
update_option( 'show_on_front', 'page' );
update_option( 'page_on_front', $page_id );

// Test 1: Plus signs in UTM parameters should not redirect to %20.
$url_with_plus = home_url( '/?utm_content=Hello+World' );
$url_with_percent = home_url( '/?utm_content=Hello%20World' );

$this->go_to( $url_with_plus );
$redirect_from_plus = redirect_canonical( $url_with_plus, false );

$this->go_to( $url_with_percent );
$redirect_from_percent = redirect_canonical( $url_with_percent, false );

// Both should return null (no redirect).
$this->assertNull( $redirect_from_plus, 'URL with + should not redirect' );
$this->assertNull( $redirect_from_percent, 'URL with %20 should not redirect' );

// Test 2: Encoded @ symbol in email parameters.
$url_encoded_at = home_url( '/?email=user%40example.com' );

$this->go_to( $url_encoded_at );
$redirect = redirect_canonical( $url_encoded_at, false );
$this->assertNull( $redirect, 'URL with encoded @ should not redirect' );

// Test 3: Encoded forward slashes in redirect parameters.
$url_encoded_slash = home_url( '/?redirect=%2Fpath%2Fto%2Fpage' );

$this->go_to( $url_encoded_slash );
$redirect = redirect_canonical( $url_encoded_slash, false );
$this->assertNull( $redirect, 'URL with encoded slashes should not redirect' );

// Test 4: Multiple query parameters with mixed encoding.
$url_mixed = home_url( '/?name=John+Doe&city=New+York&zip=12345' );

$this->go_to( $url_mixed );
$redirect = redirect_canonical( $url_mixed, false );
$this->assertNull( $redirect, 'URL with multiple plus-encoded parameters should not redirect' );

// Clean up.
delete_option( 'page_on_front' );
delete_option( 'show_on_front' );
}

/**
* Test canonical redirects for attachment pages when the option is disabled.
*
Expand Down
Loading