Fix isAscii returning true for malformed surrogates #9261

one-kash · 2026-01-21T19:49:38Z

Summary

The isAscii() extension function in OkHostnameVerifier used length == utf8Size() to decide whether a string was ASCII. This was incorrect because Okio's utf8Size() counts each malformed surrogate (an unpaired high or low surrogate) as the replacement character '?', which is 1 byte in UTF-8.

This caused strings like "\uD800.com" (unpaired high surrogate) to incorrectly report as ASCII since:

length = 5 (one char for surrogate + 4 for ".com")
utf8Size() = 5 (surrogate mapped to '?' = 1 byte + 4 for ".com")

The fix replaces the comparison with a simple character iteration check: all { it.code <= 127 }
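
The collision described above can be reproduced with a small sketch. `utf8SizeModel` is a hypothetical stand-in that follows the counting rule stated in this summary; it is not Okio's code, and `isAsciiOld` / `isAsciiNew` are illustrative names for the two versions of the check.

```kotlin
// Hypothetical model of the byte counting described above (not Okio's code).
fun utf8SizeModel(s: String): Long {
    var size = 0L
    var i = 0
    while (i < s.length) {
        val c = s[i]
        when {
            c.code < 0x80 -> { size += 1; i++ }      // ASCII: 1 byte
            c.code < 0x800 -> { size += 2; i++ }     // 2-byte sequence
            !c.isSurrogate() -> { size += 3; i++ }   // rest of the BMP: 3 bytes
            c.isHighSurrogate() && i + 1 < s.length && s[i + 1].isLowSurrogate() -> {
                size += 4; i += 2                    // valid surrogate pair: 4 bytes
            }
            else -> { size += 1; i++ }               // unpaired surrogate counted as '?'
        }
    }
    return size
}

// Old check, expressed against the model instead of Okio.
fun isAsciiOld(s: String): Boolean = s.length == utf8SizeModel(s).toInt()

// New check from this PR.
fun isAsciiNew(s: String): Boolean = s.all { it.code <= 127 }

fun main() {
    val host = "\uD800.com"
    println(host.length)          // 5
    println(utf8SizeModel(host))  // 5
    println(isAsciiOld(host))     // true
    println(isAsciiNew(host))     // false
}
```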

Changes

OkHostnameVerifier.kt: Changed isAscii() to iterate through the string's chars and check that each one's code (UTF-16 code unit) is at most 127
HostnameVerifierTest.kt: Added test for malformed surrogates

Test Plan

Added malformedSurrogatesAreNotAscii test that verifies unpaired high/low surrogates are rejected
All existing HostnameVerifierTest tests continue to pass

Fixes #6357

The previous implementation used `length == utf8Size()` to determine if a string was ASCII. This was incorrect because Okio's utf8Size() maps malformed surrogates (unpaired high/low surrogates) to the replacement character `?` which is 1 byte in UTF-8.

This caused strings like "\uD800.com" (unpaired high surrogate) to incorrectly report as ASCII since:

- length = 5 (one char for surrogate + 4 for ".com")
- utf8Size() = 5 (surrogate mapped to '?' = 1 byte + 4 for ".com")

The fix iterates through each character and checks if its code point is within the ASCII range (0-127).

Fixes square#6357

yschimke · 2026-01-25T15:40:40Z

okhttp/src/commonJvmAndroid/kotlin/okhttp3/internal/tls/OkHostnameVerifier.kt


  /** Returns true if the [String] is ASCII encoded (0-127). */
-  private fun String.isAscii() = length == utf8Size().toInt()
+  private fun String.isAscii() = all { it.code <= 127 }


I suspect we want to avoid Iterable.all to avoid an allocation.
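
If the allocation concern turns out to matter, one possible shape is an explicit index loop. This is only a sketch: `isAsciiIndexed` is a hypothetical name, and whether `all` on a String actually allocates is not something this snippet establishes.

```kotlin
// Sketch of an index-based ASCII check; walks the string with an Int counter.
fun String.isAsciiIndexed(): Boolean {
    for (i in 0 until length) {
        // Any UTF-16 code unit above 127 is outside the ASCII range.
        if (this[i].code > 127) return false
    }
    return true
}

fun main() {
    println("example.com".isAsciiIndexed())  // true
    println("\uD800.com".isAsciiIndexed())   // false
}
```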

yschimke · 2026-01-25T15:50:57Z

okhttp/src/jvmTest/kotlin/okhttp3/internal/tls/HostnameVerifierTest.kt

+    // Unpaired high surrogate - should not match any hostname
+    assertThat(verifier.verify("\uD800.com", session)).isFalse()
+    assertThat(verifier.verify("foo\uD800.com", session)).isFalse()
+
+    // Unpaired low surrogate - should not match any hostname
+    assertThat(verifier.verify("\uDC00.com", session)).isFalse()
+    assertThat(verifier.verify("foo\uDC00.com", session)).isFalse()


Would any of these have matched before the change? Is there some case where length == utf8Size().toInt() would match because of offsetting effects, for example a longer lowercase form combined with a malformed surrogate?
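
One rough way to explore the offsetting question is to count, per string, how many UTF-8 bytes exceed one byte per UTF-16 code unit. The sketch below assumes the counting rule described in the PR summary (unpaired surrogate = 1 byte); `extraBytes` is a hypothetical helper, not Okio's code. Under that assumption no term is negative, so an unpaired surrogate cannot cancel out a multi-byte character, and the old check would report ASCII only when every non-ASCII unit is an unpaired surrogate. It says nothing about whether verify() itself would have matched.

```kotlin
// Extra UTF-8 bytes beyond one per UTF-16 code unit, under the assumed rule.
// The old check (length == utf8Size) corresponds to extraBytes(s) == 0.
fun extraBytes(s: String): Int {
    var extra = 0
    var i = 0
    while (i < s.length) {
        val c = s[i]
        val paired = c.isHighSurrogate() && i + 1 < s.length && s[i + 1].isLowSurrogate()
        when {
            paired -> { extra += 2; i += 2 }       // 4 bytes for 2 code units
            c.isSurrogate() -> i++                 // unpaired: 1 byte for 1 unit
            c.code < 0x80 -> i++                   // ASCII: 1 byte
            c.code < 0x800 -> { extra += 1; i++ }  // 2 bytes
            else -> { extra += 2; i++ }            // 3 bytes
        }
    }
    return extra
}

fun main() {
    println(extraBytes("\uD800.com"))        // 0
    println(extraBytes("foo\uDC00.com"))     // 0
    println(extraBytes("\u00E9\uD800.com"))  // 1
    println(extraBytes("\uD83D\uDE00"))      // 2
}
```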
