Commit e49766f

Merge remote-tracking branch 'upstream/3.12' into backport-f04bea4-3.12
2 parents 932d28d + a183a11 commit e49766f

31 files changed, with 548 additions and 318 deletions

.github/workflows/build.yml

Lines changed: 4 additions & 12 deletions
@@ -240,24 +240,16 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        # Cirrus and macos-14 are M1, macos-13 is default GHA Intel.
-        # macOS 13 only runs tests against the GIL-enabled CPython.
-        # Cirrus used for upstream, macos-14 for forks.
+        # macos-14 is M1, macos-15-intel is Intel.
+        # macos-15-intel only runs tests against the GIL-enabled CPython.
         os:
-        - ghcr.io/cirruslabs/macos-runner:sonoma
         - macos-14
-        - macos-13
-        is-fork: # only used for the exclusion trick
-        - ${{ github.repository_owner != 'python' }}
+        - macos-15-intel
         free-threading:
         - false
         # - true
         exclude:
-        - os: ghcr.io/cirruslabs/macos-runner:sonoma
-          is-fork: true
-        - os: macos-14
-          is-fork: false
-        - os: macos-13
+        - os: macos-15-intel
           free-threading: true
     uses: ./.github/workflows/reusable-macos.yml
     with:

Doc/Makefile

Lines changed: 1 addition & 1 deletion
@@ -183,7 +183,7 @@ venv:
 	fi
 
 .PHONY: dist-no-html
-dist-no-html: dist-text dist-pdf dist-epub dist-texinfo
+dist-no-html: dist-text dist-epub dist-texinfo
 
 .PHONY: dist
 dist:

Doc/library/html.parser.rst

Lines changed: 20 additions & 13 deletions
@@ -15,14 +15,18 @@
 This module defines a class :class:`HTMLParser` which serves as the basis for
 parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
 
-.. class:: HTMLParser(*, convert_charrefs=True)
+.. class:: HTMLParser(*, convert_charrefs=True, scripting=False)
 
    Create a parser instance able to parse invalid markup.
 
-   If *convert_charrefs* is ``True`` (the default), all character
-   references (except the ones in ``script``/``style`` elements) are
+   If *convert_charrefs* is true (the default), all character
+   references (except the ones in elements like ``script`` and ``style``) are
    automatically converted to the corresponding Unicode characters.
 
+   If *scripting* is false (the default), the content of the ``noscript``
+   element is parsed normally; if it's true, it's returned as is without
+   being parsed.
+
    An :class:`.HTMLParser` instance is fed HTML data and calls handler methods
    when start tags, end tags, text, comments, and other markup elements are
    encountered. The user should subclass :class:`.HTMLParser` and override its
@@ -37,6 +41,9 @@ parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML.
    .. versionchanged:: 3.5
       The default value for argument *convert_charrefs* is now ``True``.
 
+   .. versionchanged:: 3.12.13
+      Added the *scripting* parameter.
+
 
 Example HTML Parser Application
 -------------------------------
@@ -159,24 +166,24 @@ implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
 .. method:: HTMLParser.handle_data(data)
 
    This method is called to process arbitrary data (e.g. text nodes and the
-   content of ``<script>...</script>`` and ``<style>...</style>``).
+   content of elements like ``script`` and ``style``).
 
 
 .. method:: HTMLParser.handle_entityref(name)
 
    This method is called to process a named character reference of the form
    ``&name;`` (e.g. ``&gt;``), where *name* is a general entity reference
-   (e.g. ``'gt'``). This method is never called if *convert_charrefs* is
-   ``True``.
+   (e.g. ``'gt'``).
+   This method is only called if *convert_charrefs* is false.
 
 
 .. method:: HTMLParser.handle_charref(name)
 
    This method is called to process decimal and hexadecimal numeric character
    references of the form :samp:`&#{NNN};` and :samp:`&#x{NNN};`. For example, the decimal
    equivalent for ``&gt;`` is ``&#62;``, whereas the hexadecimal is ``&#x3E;``;
-   in this case the method will receive ``'62'`` or ``'x3E'``. This method
-   is never called if *convert_charrefs* is ``True``.
+   in this case the method will receive ``'62'`` or ``'x3E'``.
+   This method is only called if *convert_charrefs* is false.
 
 
 .. method:: HTMLParser.handle_comment(data)
@@ -284,8 +291,8 @@ Parsing an element with a few attributes and a title::
    Data     : Python
    End tag  : h1
 
-The content of ``script`` and ``style`` elements is returned as is, without
-further parsing::
+The content of elements like ``script`` and ``style`` is returned as is,
+without further parsing::
 
    >>> parser.feed('<style type="text/css">#python { color: green }</style>')
    Start tag: style
@@ -294,10 +301,10 @@ further parsing::
    End tag  : style
 
    >>> parser.feed('<script type="text/javascript">'
-   ...             'alert("<strong>hello!</strong>");</script>')
+   ...             'alert("<strong>hello! &#9786;</strong>");</script>')
    Start tag: script
         attr: ('type', 'text/javascript')
-   Data     : alert("<strong>hello!</strong>");
+   Data     : alert("<strong>hello! &#9786;</strong>");
    End tag  : script
 
 Parsing comments::
@@ -317,7 +324,7 @@ correct char (note: these 3 references are all equivalent to ``'>'``)::
 
 Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
 :meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``)::
+if *convert_charrefs* is false::
 
    >>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
    ...     parser.feed(chunk)
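
Reviewer sketch (not part of the commit): the documented behaviour for `script` content and for character references in ordinary text can be checked with a small subclass. The names `EventCollector`, `collect` and `joined_data` are hypothetical helpers introduced only for this illustration. The sketch relies solely on the long-standing *convert_charrefs* default, so it should not require the new *scripting* parameter.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser callbacks as (kind, value) tuples."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def collect(markup, **kwargs):
    """Feed one complete document and return the recorded events."""
    parser = EventCollector(**kwargs)
    parser.feed(markup)
    parser.close()
    return parser.events


def joined_data(events):
    """Concatenate all data chunks, since handle_data may fire more than once."""
    return "".join(value for kind, value in events if kind == "data")


# Inside <script>, markup and character references are passed through as is.
script_events = collect('<script>alert("<b>hi &#9786;</b>");</script>')

# In ordinary text, references are converted (convert_charrefs defaults to True).
text_events = collect("<p>1 &gt; 0</p>")

print(joined_data(script_events))
print(joined_data(text_events))
```

Joining the data chunks rather than comparing a single callback keeps the check independent of how the parser splits text across `handle_data` calls.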

Include/patchlevel.h

Lines changed: 2 additions & 2 deletions
@@ -18,12 +18,12 @@
 /*--start constants--*/
 #define PY_MAJOR_VERSION        3
 #define PY_MINOR_VERSION        12
-#define PY_MICRO_VERSION        11
+#define PY_MICRO_VERSION        12
 #define PY_RELEASE_LEVEL        PY_RELEASE_LEVEL_FINAL
 #define PY_RELEASE_SERIAL       0
 
 /* Version as a string */
-#define PY_VERSION              "3.12.11+"
+#define PY_VERSION              "3.12.12+"
 /*--end constants--*/
 
 /* Version as a single 4-byte hex number, e.g. 0x010502B2 == 1.5.2b2.
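
Reviewer sketch (not part of the commit): the "single 4-byte hex number" mentioned in the trailing comment packs major, minor and micro into one byte each, followed by a 4-bit release level and a 4-bit serial. The helper names `pack_version` and `unpack_version` are hypothetical; the bit layout is the one used by `sys.hexversion`.

```python
# Release-level nibbles as used in CPython's version hex encoding.
RELEASE_LEVELS = {0xA: "alpha", 0xB: "beta", 0xC: "candidate", 0xF: "final"}


def pack_version(major, minor, micro, level, serial):
    """Pack version fields into a 32-bit integer (level is the raw nibble)."""
    return (major << 24) | (minor << 16) | (micro << 8) | (level << 4) | serial


def unpack_version(hexversion):
    """Split a packed version into (major, minor, micro, level name, serial)."""
    return (
        (hexversion >> 24) & 0xFF,
        (hexversion >> 16) & 0xFF,
        (hexversion >> 8) & 0xFF,
        RELEASE_LEVELS.get((hexversion >> 4) & 0xF, "unknown"),
        hexversion & 0xF,
    )


# The example from the header comment: 0x010502B2 is 1.5.2b2.
print(unpack_version(0x010502B2))

# The constants in this hunk (3.12.12, final, serial 0) pack as follows.
print(hex(pack_version(3, 12, 12, 0xF, 0)))
```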

Lib/html/parser.py

Lines changed: 18 additions & 6 deletions
@@ -109,17 +109,25 @@ class HTMLParser(_markupbase.ParserBase):
     argument.
     """
 
-    CDATA_CONTENT_ELEMENTS = ("script", "style")
+    # See the HTML5 specs section "13.4 Parsing HTML fragments".
+    # https://html.spec.whatwg.org/multipage/parsing.html#parsing-html-fragments
+    # CDATA_CONTENT_ELEMENTS are parsed in RAWTEXT mode
+    CDATA_CONTENT_ELEMENTS = ("script", "style", "xmp", "iframe", "noembed", "noframes")
     RCDATA_CONTENT_ELEMENTS = ("textarea", "title")
 
-    def __init__(self, *, convert_charrefs=True):
+    def __init__(self, *, convert_charrefs=True, scripting=False):
         """Initialize and reset this instance.
 
-        If convert_charrefs is True (the default), all character references
+        If convert_charrefs is true (the default), all character references
         are automatically converted to the corresponding Unicode characters.
+
+        If *scripting* is false (the default), the content of the
+        ``noscript`` element is parsed normally; if it's true,
+        it's returned as is without being parsed.
         """
         super().__init__()
         self.convert_charrefs = convert_charrefs
+        self.scripting = scripting
         self.reset()
 
     def reset(self):
@@ -154,7 +162,9 @@ def get_starttag_text(self):
     def set_cdata_mode(self, elem, *, escapable=False):
         self.cdata_elem = elem.lower()
         self._escapable = escapable
-        if escapable and not self.convert_charrefs:
+        if self.cdata_elem == 'plaintext':
+            self.interesting = re.compile(r'\Z')
+        elif escapable and not self.convert_charrefs:
             self.interesting = re.compile(r'&|</%s(?=[\t\n\r\f />])' % self.cdata_elem,
                                           re.IGNORECASE|re.ASCII)
         else:
@@ -435,8 +445,10 @@ def parse_starttag(self, i):
             self.handle_startendtag(tag, attrs)
         else:
             self.handle_starttag(tag, attrs)
-            if tag in self.CDATA_CONTENT_ELEMENTS:
-                self.set_cdata_mode(tag)
+            if (tag in self.CDATA_CONTENT_ELEMENTS or
+                (self.scripting and tag == "noscript") or
+                tag == "plaintext"):
+                self.set_cdata_mode(tag, escapable=False)
             elif tag in self.RCDATA_CONTENT_ELEMENTS:
                 self.set_cdata_mode(tag, escapable=True)
         return endpos
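
Reviewer sketch (not part of the commit): one way to exercise the new *scripting* keyword without breaking on interpreters that predate this change is to check the constructor signature first. `TagLogger` and `parse_noscript` are hypothetical names used only here. With `scripting=True` on a patched interpreter the `noscript` body should arrive as raw data; on older interpreters the sketch falls back to the default parsing.

```python
import inspect
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records (kind, value) tuples for each callback."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_noscript(markup):
    """Parse markup, passing scripting=True only if the interpreter supports it."""
    params = inspect.signature(HTMLParser.__init__).parameters
    supported = "scripting" in params
    kwargs = {"scripting": True} if supported else {}
    parser = TagLogger(**kwargs)
    parser.feed(markup)
    parser.close()
    return supported, parser.events


supported, events = parse_noscript("<noscript><p>hi</p></noscript>")
print("scripting supported:", supported)
for event in events:
    print(event)
```

The signature check is a design choice for portability; code that targets only patched releases could pass `scripting=True` directly.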

Lib/ntpath.py

Lines changed: 41 additions & 85 deletions
@@ -409,17 +409,23 @@ def expanduser(path):
 # XXX With COMMAND.COM you can use any characters in a variable name,
 # XXX except '^|<>='.
 
+_varpattern = r"'[^']*'?|%(%|[^%]*%?)|\$(\$|[-\w]+|\{[^}]*\}?)"
+_varsub = None
+_varsubb = None
+
 def expandvars(path):
     """Expand shell variables of the forms $var, ${var} and %var%.
 
     Unknown variables are left unchanged."""
     path = os.fspath(path)
+    global _varsub, _varsubb
     if isinstance(path, bytes):
         if b'$' not in path and b'%' not in path:
             return path
-        import string
-        varchars = bytes(string.ascii_letters + string.digits + '_-', 'ascii')
-        quote = b'\''
+        if not _varsubb:
+            import re
+            _varsubb = re.compile(_varpattern.encode(), re.ASCII).sub
+        sub = _varsubb
         percent = b'%'
         brace = b'{'
         rbrace = b'}'
@@ -428,94 +434,44 @@ def expandvars(path):
     else:
         if '$' not in path and '%' not in path:
             return path
-        import string
-        varchars = string.ascii_letters + string.digits + '_-'
-        quote = '\''
+        if not _varsub:
+            import re
+            _varsub = re.compile(_varpattern, re.ASCII).sub
+        sub = _varsub
         percent = '%'
         brace = '{'
         rbrace = '}'
         dollar = '$'
         environ = os.environ
-    res = path[:0]
-    index = 0
-    pathlen = len(path)
-    while index < pathlen:
-        c = path[index:index+1]
-        if c == quote: # no expansion within single quotes
-            path = path[index + 1:]
-            pathlen = len(path)
-            try:
-                index = path.index(c)
-                res += c + path[:index + 1]
-            except ValueError:
-                res += c + path
-                index = pathlen - 1
-        elif c == percent: # variable or '%'
-            if path[index + 1:index + 2] == percent:
-                res += c
-                index += 1
-            else:
-                path = path[index+1:]
-                pathlen = len(path)
-                try:
-                    index = path.index(percent)
-                except ValueError:
-                    res += percent + path
-                    index = pathlen - 1
-                else:
-                    var = path[:index]
-                    try:
-                        if environ is None:
-                            value = os.fsencode(os.environ[os.fsdecode(var)])
-                        else:
-                            value = environ[var]
-                    except KeyError:
-                        value = percent + var + percent
-                    res += value
-        elif c == dollar: # variable or '$$'
-            if path[index + 1:index + 2] == dollar:
-                res += c
-                index += 1
-            elif path[index + 1:index + 2] == brace:
-                path = path[index+2:]
-                pathlen = len(path)
-                try:
-                    index = path.index(rbrace)
-                except ValueError:
-                    res += dollar + brace + path
-                    index = pathlen - 1
-                else:
-                    var = path[:index]
-                    try:
-                        if environ is None:
-                            value = os.fsencode(os.environ[os.fsdecode(var)])
-                        else:
-                            value = environ[var]
-                    except KeyError:
-                        value = dollar + brace + var + rbrace
-                    res += value
-            else:
-                var = path[:0]
-                index += 1
-                c = path[index:index + 1]
-                while c and c in varchars:
-                    var += c
-                    index += 1
-                    c = path[index:index + 1]
-                try:
-                    if environ is None:
-                        value = os.fsencode(os.environ[os.fsdecode(var)])
-                    else:
-                        value = environ[var]
-                except KeyError:
-                    value = dollar + var
-                res += value
-                if c:
-                    index -= 1
+
+    def repl(m):
+        lastindex = m.lastindex
+        if lastindex is None:
+            return m[0]
+        name = m[lastindex]
+        if lastindex == 1:
+            if name == percent:
+                return name
+            if not name.endswith(percent):
+                return m[0]
+            name = name[:-1]
         else:
-            res += c
-        index += 1
-    return res
+            if name == dollar:
+                return name
+            if name.startswith(brace):
+                if not name.endswith(rbrace):
+                    return m[0]
+                name = name[1:-1]
+
+        try:
+            if environ is None:
+                return os.fsencode(os.environ[os.fsdecode(name)])
+            else:
+                return environ[name]
+        except KeyError:
+            return m[0]
+
+    return sub(repl, path)
 
 
 # Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A\B.
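
Reviewer sketch (not part of the commit): the regex-based expansion above can be tried on any platform with a standalone, str-only approximation. The pattern is copied from `_varpattern` in the diff; the function name `expand` and the explicit `env` mapping are assumptions made for this illustration, replacing `os.environ` and the bytes branch of the real function.

```python
import re

# Pattern copied from _varpattern in the diff:
#   '...'        quoted span, left untouched (no capture group matches)
#   %name% / %%  captured in group 1
#   $name, ${name}, $$  captured in group 2
VARPATTERN = r"'[^']*'?|%(%|[^%]*%?)|\$(\$|[-\w]+|\{[^}]*\}?)"
_sub = re.compile(VARPATTERN, re.ASCII).sub


def expand(path, env):
    """Expand $var, ${var} and %var% in a str using the mapping env."""

    def repl(m):
        lastindex = m.lastindex
        if lastindex is None:          # single-quoted span: no expansion
            return m[0]
        name = m[lastindex]
        if lastindex == 1:             # %...% form
            if name == "%":            # '%%' collapses to '%'
                return name
            if not name.endswith("%"): # unterminated '%var': leave as is
                return m[0]
            name = name[:-1]
        else:                          # $... form
            if name == "$":            # '$$' collapses to '$'
                return name
            if name.startswith("{"):
                if not name.endswith("}"):  # unterminated '${var': leave as is
                    return m[0]
                name = name[1:-1]
        try:
            return env[name]
        except KeyError:               # unknown variables are left unchanged
            return m[0]

    return _sub(repl, path)


env = {"HOME": r"C:\Users\me"}
print(expand(r"%HOME%\docs", env))
print(expand("${HOME} and '$HOME'", env))
```

Because the replacement is a function rather than a template string, backslashes in the substituted value are inserted literally, which matters for Windows paths.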
