Skip to content
3 changes: 3 additions & 0 deletions Doc/whatsnew/3.15.rst
Original file line number Diff line number Diff line change
Expand Up @@ -660,6 +660,9 @@ xml.parsers.expat

.. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack

* Add support for namespace prefixes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be also documented in the expat docs.

Copy link
Contributor Author

@ukarroum ukarroum Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean I should add a link to the C libexpat doc ?

EDIT: I think you probably meant to document this here: https://docs.python.org/3/library/pyexpat.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, to the specs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

(Contributed by Yassir Karroum in :gh:`118317`.)


zlib
----
Expand Down
34 changes: 33 additions & 1 deletion Lib/test/test_sax.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# regression test for SAX 2.0

from xml.sax import make_parser, ContentHandler, \
SAXException, SAXReaderNotAvailable, SAXParseException
SAXException, SAXReaderNotAvailable, SAXParseException, handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why import handler when ContentHandler is already imported? why not use the latter directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the handler.ContentHandler, with ContentHandler.
but I still need to keep the handler import for the namespace/namespace features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?:

Suggested change
SAXException, SAXReaderNotAvailable, SAXParseException, handler
SAXException, SAXReaderNotAvailable, SAXParseException
from xml.sax.handler import feature_namespaces, feature_namespace_prefixes

import unittest
from unittest import mock
try:
Expand Down Expand Up @@ -1307,6 +1307,38 @@ def test_expat_locator_withinfo_nonascii(self):
self.assertEqual(parser.getSystemId(), fname)
self.assertEqual(parser.getPublicId(), None)

def test_namespace_prefix(self):
parser = create_parser()
parser.setFeature(handler.feature_namespaces, 1)
parser.setFeature(handler.feature_namespace_prefixes, 1)

class Handler(handler.ContentHandler):
def startElementNS(self, name, qname, attrs):
self.qname = qname

h = Handler()

parser.setContentHandler(h)
parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>")
parser.close()
self.assertEqual(h.qname, "Q:E")

def test_default_namespace(self):
parser = create_parser()
parser.setFeature(handler.feature_namespaces, 1)

class Handler(handler.ContentHandler):
def startElementNS(self, name, qname, attrs):
self.qname = qname

h = Handler()

parser.setContentHandler(h)
parser.feed("<E xmlns='http://example.org/testuri'/>")
parser.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use parser.parse(..., True)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but I didn't add the True, it seems that the parse method only accept one argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Wait, @hartwork should we do feed() + close() or parse()?

Copy link
Contributor

@hartwork hartwork Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz hi!

My understanding is that:

  • create_parser is xml.sax.expatreader.create_parser and it creates an xml.sax.xmlreader.XMLReader, first sentence in the docs
  • xml.sax.xmlreader.XMLReader provides .parse that parses in one single go, one single parameter, no boolean isFinal
  • xml.sax.xmlreader.XMLReader does not provide .feed or .close because that needs IncrementalParser, not a plain XMLReader.
  • (xml.sax.expatreader.create_parser creates xml.sax.expatreader.ExpatParser that is an instance of both IncrementalParser and XMLReader for me in practice.)
  • I therefore consider parser.parse("<E xmlns='http://example.org/testuri'/>") to be better suited here because availability of .feed and .close is not guaranteed on interface level.
  • (If you want to stick to .feed here or elsewhere maybe add self.assertIsInstance(parser, IncrementalParser) prior to a call to communicate that expectation and have the test fail meaningfully for regressions in the future.)

What do you think?

self.assertEqual(h.qname, "E")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't leave 3 blank lines, only 2 is sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed



# ===========================================================================
#
Expand Down
17 changes: 10 additions & 7 deletions Lib/xml/sax/expatreader.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ def __init__(self, namespaceHandling=0, bufsize=2**16-20):
self._entity_stack = []
self._external_ges = 0
self._interning = None
self._namespace_prefixes = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it 1 by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use a new namespacePrefixesHandling argument.


# XMLReader methods

Expand Down Expand Up @@ -126,8 +127,9 @@ def getFeature(self, name):
return self._namespaces
elif name == feature_string_interning:
return self._interning is not None
elif name in (feature_validation, feature_external_pes,
feature_namespace_prefixes):
elif name == feature_namespace_prefixes:
return self._namespace_prefixes
elif name in (feature_validation, feature_external_pes):
return 0
elif name == feature_external_ges:
return self._external_ges
Expand All @@ -147,6 +149,8 @@ def setFeature(self, name, state):
self._interning = {}
else:
self._interning = None
elif name == feature_namespace_prefixes:
self._namespace_prefixes = state
Comment on lines +152 to +153
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote for moving this one further down — right after elif name == feature_external_pes: — to match the left side of the diff.

elif name == feature_validation:
if state:
raise SAXNotSupportedException(
Expand All @@ -155,10 +159,6 @@ def setFeature(self, name, state):
if state:
raise SAXNotSupportedException(
"expat does not read external parameter entities")
elif name == feature_namespace_prefixes:
if state:
raise SAXNotSupportedException(
"expat does not report namespace prefixes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a limitation from the C extension module? was it a Expat version limitation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick git blame, and looks like this specific check was added in this commit: 18476a3 (from 2002).
and in this commit Expat version used is 1.95 (which should support namespace prefixes), so I would say the limitation was probably just in the python wrapper.

else:
raise SAXNotRecognizedException(
"Feature '%s' not recognized" % name)
Expand Down Expand Up @@ -347,11 +347,14 @@ def start_element_ns(self, name, attrs):
pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea whether this is correct or not here. Is there some specs that we could follow for the implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're referring specificaly to elem_qname, there is the spec: https://www.w3.org/TR/xml-names/#ns-qualnames.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added another test to test the "else" branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz these lines are 1:1 from PyXML 0.8.4:

meld

pair = (None, name)
elif len(pair) == 3:
elem_qname = "%s:%s" % (pair[2], pair[1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: This is the key value provider in this pull request. pair[2] was previously not exposed.

pair = pair[0], pair[1]
else:
# default namespace
elem_qname = pair[1]
pair = tuple(pair)

newattrs = {}
Expand All @@ -374,7 +377,7 @@ def start_element_ns(self, name, attrs):
newattrs[apair] = value
qnames[apair] = qname

self._cont_handler.startElementNS(pair, None,
self._cont_handler.startElementNS(pair, elem_qname,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will never pass None even with support for namespace prefixes disabled.

The following patch is arguably a bit silly, but it keeps things backwards compatible (and demos the idea):

Suggested change
self._cont_handler.startElementNS(pair, elem_qname,
if not self._namespace_prefixes:
elem_qname = None
self._cont_handler.startElementNS(pair, elem_qname,

AttributesNSImpl(newattrs, qnames))

def end_element_ns(self, name):
Expand Down
Loading