Commit 6809811

miss-islington, bitdancer, and StanFromIreland authored
[3.14] Correctly fold unknown-8bit originating from encoded words. (GH-142517) (#143146)
The unknown-8bit trick was designed to deal with unknown bytes in an ASCII message, and it works fine for that. However, I also tried to extend it to handle bytes that can't be decoded using the charset specified in an encoded word, and there it fails because there can be other non-ASCII characters that were *successfully* decoded.

The fix is simple: do the unknown-8bit encoding using the utf-8 codec. This is especially appropriate since anyone trying to do recovery on an unknown byte string will probably attempt utf-8 first.

(cherry picked from commit 1e17ccd)

Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
1 parent b921374 commit 6809811

3 files changed: +13 -1 lines changed

Lib/email/_encoded_words.py

Lines changed: 1 addition & 1 deletion
@@ -219,7 +219,7 @@ def encode(string, charset='utf-8', encoding=None, lang=''):
 
     """
     if charset == 'unknown-8bit':
-        bstring = string.encode('ascii', 'surrogateescape')
+        bstring = string.encode('utf-8', 'surrogateescape')
     else:
         bstring = string.encode(charset)
     if encoding is None:
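
A minimal sketch of why the one-line change matters, using only built-in `str` and `bytes` methods rather than the private `email._encoded_words` module. The byte string is the payload from the test in this commit; the helper `try_encode` is a hypothetical name introduced for illustration. It assumes the decoder falls back to `surrogateescape` when the declared charset cannot decode a byte, as the commit message describes.

```python
# Payload of the Q-encoded word used in the test: five complete UTF-8
# characters followed by a lone 0xE7, which starts an incomplete sequence.
raw = bytes.fromhex('e5aea2e688b6e6ada3e8a68fe4baa4e7')

# Lenient decode: the five characters decode normally and the stray byte
# is smuggled through as the lone surrogate U+DCE7.
text = raw.decode('utf-8', 'surrogateescape')


def try_encode(s, codec):
    """Hypothetical helper: return the encoded bytes, or None on failure."""
    try:
        return s.encode(codec, 'surrogateescape')
    except UnicodeEncodeError:
        return None


# Old behaviour: the ascii codec can restore the surrogate, but it cannot
# encode the successfully decoded non-ASCII characters, so it fails.
ascii_result = try_encode(text, 'ascii')

# New behaviour: the utf-8 codec encodes the real characters and restores
# the escaped byte, giving back the original byte string.
utf8_result = try_encode(text, 'utf-8')

print(ascii_result)          # None
print(utf8_result == raw)    # True
```

The `surrogateescape` handler only covers the escaped bytes themselves; everything else in the string still has to be representable in the chosen codec, which is why `ascii` is too narrow here.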

Lib/test/test_email/test__header_value_parser.py

Lines changed: 8 additions & 0 deletions
@@ -3340,5 +3340,13 @@ def test_fold_unfoldable_element_stealing_whitespace(self):
         token = parser.get_address_list(text)[0]
         self._test(token, expected, policy=policy)
 
+    def test_encoded_word_with_undecodable_bytes(self):
+        self._test(parser.get_address_list(
+            ' =?utf-8?Q?=E5=AE=A2=E6=88=B6=E6=AD=A3=E8=A6=8F=E4=BA=A4=E7?='
+            )[0],
+            ' =?unknown-8bit?b?5a6i5oi25q2j6KaP5Lqk5w==?=\n',
+        )
+
+
 if __name__ == '__main__':
     unittest.main()
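
One way to read the new test is that the refolded `unknown-8bit` word should carry exactly the bytes of the original `utf-8` word. The sketch below checks that with the standard `binascii` and `base64` modules only; it does not run the email parser, and the variable names are illustrative rather than taken from the test suite.

```python
import base64
import binascii

# Payloads copied from the test: the Q-encoded input word and the
# base64-encoded word expected after folding.
q_payload = b'=E5=AE=A2=E6=88=B6=E6=AD=A3=E8=A6=8F=E4=BA=A4=E7'
b_payload = b'5a6i5oi25q2j6KaP5Lqk5w=='

# Undo each transfer encoding to get back to raw bytes.
raw_in = binascii.a2b_qp(q_payload, header=True)
raw_out = base64.b64decode(b_payload)

# The fold preserved the byte string, including the undecodable 0xE7.
same_bytes = raw_in == raw_out

# A recipient attempting utf-8 recovery gets the five readable characters
# plus a replacement character for the truncated trailing sequence.
recovered = raw_out.decode('utf-8', errors='replace')

print(same_bytes)
print(recovered)
```

This matches the commit message's point that utf-8 is the first codec a reader is likely to try when recovering an unknown byte string.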

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+The non-``compat32`` :mod:`email` policies now correctly handle refolding
+encoded words that contain bytes that can not be decoded in their specified
+character set. Previously this resulted in an encoding exception during
+folding.
