-
-
Notifications
You must be signed in to change notification settings - Fork 33.8k
gh-136702: Deprecate passing non-ascii *encoding* (str) to encodings.normalize_encoding
#140030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 5 commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
c8fc658
deprecate non-ascii
StanFromIreland 5b50daa
Relocate import
StanFromIreland 95f2e65
sanitize charset names in email
StanFromIreland fad52cd
Use table, replace with 'ascii'
StanFromIreland 3ac0804
Merge branch 'main' into encodings/non-ascii
StanFromIreland 9d6f06e
Review
StanFromIreland 16697dc
Fix second warning
StanFromIreland e4036f8
Convert to asserts
StanFromIreland b8fc5f4
Fix for platforms with ordered tests
StanFromIreland 7592af8
!fixup
StanFromIreland 8c59899
Fix CI on Android and iOS
StanFromIreland File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
3 changes: 3 additions & 0 deletions
3
Misc/NEWS.d/next/Library/2025-10-13-11-25-41.gh-issue-136702.uvLGK1.rst
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| :mod:`encodings`: Deprecate passing a non-ascii *encoding* name to | ||
| :func:`encodings.normalize_encoding` and schedule removal of support for | ||
| Python 3.17. |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the trigger for this change? Do I actually have a test that uses a non-ascii charset name? If I did it should be an error case, since non-ascii is not permitted in charset names per the RFCs. I'm surprised I don't appear to be registering a defect for that, though I didn't go through the code enough to be sure I don't ;)
Regardless it isn't clear to me that 'sanitizing' is a useful operation. It isn't likely to produce a valid charset name, we should just be falling back to ascii at that point. What led you to choose this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently done by
normalize_encoding.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. emal doesn't call lookup directly and no tests fail without the changes.
I presume you did this to preserve backward compatibility. Unless I'm missing something, I don't think we should bother to do that. Given a non-ascii charset name, there are two possible outcomes from the current code: the name after sanitizing is not a valid codec name, or it is. If it is valid after sanitizing, there are two cases: the sanitized name results in successful decoding, or it does not. It is only the first of these second two cases that would be affected by the post-deprecation change.
How often would that case occur in reality? I would guess it would be a vanishingly small number of cases, if it ever occurs at all.
I think it will be better to remove the changes to the email package from this PR. If anyone sees the deprecation warning maybe they'll open an issue, but I'm betting nobody ever sees it from the email package. The behavior after the deprecation is over is the behavior we want: if the codec name contains non-ascii it is not a valid codec name, so any non-ascii in the text being decoded using that charset name will ultimately get turned into the 'unknown character' glyph when decoded by the email package.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm no email expert and I did not dig into the specifications, so I did this to not change any behaviour. I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What't the conclusion here ? I still see the email package changes in place, but they look pretty harmless to me.