Commit 4729b37

bpo-22833: Fix bytes/str inconsistency in email.header.decode_header()
This function's possible return types have been non-intuitive and surprising for the entirety of its Python 3.x history. It can return either:

1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]`, or
2. `typing.List[typing.Tuple[str, None]]`, of length exactly 1

This has meant that any caller must be prepared to accept either `bytes` or `str` as the first member of the 2-tuples it returns. That is surprising in Python 3.x, particularly given that the second member of the tuple is supposed to represent the charset/encoding of the first member.

This change eliminates case (2), ensuring that `email.header.decode_header()` always returns `bytes`, never `str`, as the first member of the 2-tuples it returns. It also adds test cases to verify this behavior.
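To illustrate the burden described above, here is a minimal sketch of the kind of wrapper a caller needs when `decode_header()` may return either form. The helper name `decode_header_bytes` is hypothetical and not part of the `email` package; the sketch assumes that a `str` result should be converted with the same `raw-unicode-escape` codec this commit uses.

```python
from email.header import decode_header


def decode_header_bytes(header):
    """Return (bytes, charset) pairs whichever form decode_header() yields.

    Hypothetical helper for illustration; not part of the email package.
    """
    pairs = []
    for value, charset in decode_header(header):
        if isinstance(value, str):
            # Case (2): a single unencoded part returned as str.
            # Assumption: mirror the commit and encode it with
            # 'raw-unicode-escape' so the caller always sees bytes.
            value = value.encode('raw-unicode-escape')
        pairs.append((value, charset))
    return pairs


plain = decode_header_bytes('header without encoded words')
encoded = decode_header_bytes('=?utf-8?q?caf=C3=A9?=')
```

With this wrapper, `plain` holds a single `(bytes, None)` pair and `encoded` holds a `(bytes, 'utf-8')` pair, so downstream code only has to handle `bytes`.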
1 parent dce642f commit 4729b37

3 files changed: +17 -2 lines changed
Lib/email/header.py

Lines changed: 2 additions & 2 deletions
@@ -61,7 +61,7 @@
 def decode_header(header):
     """Decode a message header value without converting charset.
 
-    Returns a list of (string, charset) pairs containing each of the decoded
+    Returns a list of (bytes, charset) pairs containing each of the decoded
     parts of the header. Charset is None for non-encoded parts of the header,
     otherwise a lower-case string containing the name of the character set
     specified in the encoded string.
@@ -78,7 +78,7 @@ def decode_header(header):
                     for string, charset in header._chunks]
     # If no encoding, just return the header with no charset.
     if not ecre.search(header):
-        return [(header, None)]
+        return [(bytes(header, 'raw-unicode-escape'), None)]
     # First step is to parse all the encoded parts into triplets of the form
     # (encoded_string, encoding, charset). For unencoded strings, the last
     # two parts will be None.
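
As a brief aside on the `raw-unicode-escape` codec used in the changed line, the sketch below shows how it treats three kinds of input. The sample strings are illustrative only: the first two are taken from the tests in this commit, and the euro sign is an added example.

```python
# ASCII text maps to identical bytes.
ascii_bytes = bytes('header without encoded words', 'raw-unicode-escape')

# Code points up to U+00FF become one raw byte each.
latin1_bytes = bytes('caract\xe8res', 'raw-unicode-escape')

# Code points above U+00FF become a literal backslash-u escape sequence.
euro_bytes = bytes('\u20ac', 'raw-unicode-escape')

# Decoding with the same codec recovers the original string here.
round_trip = latin1_bytes.decode('raw-unicode-escape')
```

Note that the byte `\xe8` produced for `è` is its Latin-1 value, not its UTF-8 encoding, which is why the second test in this commit expects `b'...caract\xe8res'`.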

Lib/test/test_email/test_email.py

Lines changed: 12 additions & 0 deletions
@@ -2432,6 +2432,18 @@ def test_multiline_header(self):
         self.assertEqual(str(make_header(decode_header(s))),
                          '"Müller T" <T.Mueller@xxx.com>')
 
+    def test_unencoded_ascii(self):
+        # issue 22833
+        s = 'header without encoded words'
+        self.assertEqual(decode_header(s),
+            [(b'header without encoded words', None)])
+
+    def test_unencoded_utf8(self):
+        # issue 22833
+        s = 'header with unexpected non ASCII caract\xe8res'
+        self.assertEqual(decode_header(s),
+            [(b'header with unexpected non ASCII caract\xe8res', None)])
+
 
 # Test the MIMEMessage class
 class TestMIMEMessage(TestEmailBase):

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+The :func:`email.header.decode_header` function now always provides :class:`bytes`,
+never :class:`str`, as the first member of the tuples it returns. Previously, it would
+return (str, None) when decoding a header consisting only of a single, unencoded part.
