Skip to content
3 changes: 3 additions & 0 deletions Doc/whatsnew/3.15.rst
Original file line number Diff line number Diff line change
Expand Up @@ -824,6 +824,9 @@ xml.parsers.expat

.. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack

* Add support for `namespace prefixes <https://www.w3.org/TR/xml-names/#dt-prefix>`_.
(Contributed by Yassir Karroum in :gh:`118317`.)

Comment on lines +827 to +829
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding to section xml.parsers.expat while the changes in here are patching code elsewhere. (I'm unsure when/if news files should be written to directly. I have only seen things go through news files myself.)


zlib
----
Expand Down
20 changes: 18 additions & 2 deletions Lib/test/test_sax.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
# regression test for SAX 2.0

from xml.sax import make_parser, ContentHandler, \
SAXException, SAXReaderNotAvailable, SAXParseException
SAXException, SAXReaderNotAvailable, SAXParseException, handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why import handler when ContentHandler is already imported? why not use the latter directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the handler.ContentHandler, with ContentHandler.
but I still need to keep the handler import for the namespace/namespace features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?:

Suggested change
SAXException, SAXReaderNotAvailable, SAXParseException, handler
SAXException, SAXReaderNotAvailable, SAXParseException
from xml.sax.handler import feature_namespaces, feature_namespace_prefixes

import unittest
from unittest import mock
try:
Expand Down Expand Up @@ -1307,6 +1306,23 @@ def test_expat_locator_withinfo_nonascii(self):
self.assertEqual(parser.getSystemId(), fname)
self.assertEqual(parser.getPublicId(), None)

def test_qualified_names(self):

class Handler(ContentHandler):
def startElementNS(self, name, qname, attrs):
self.qname = qname

for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider this variant much more readable:

Suggested change
for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):
for xml_s, expected_qname in (
("<Q:E xmlns:Q='http://example.org/testuri'/>", "Q:E"),
("<E xmlns='http://example.org/testuri'/>", "E"),
("<E />", "E"),
):

parser = create_parser()
parser.setFeature(handler.feature_namespaces, 1)
parser.setFeature(handler.feature_namespace_prefixes, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there should also be a test for the behavior with this disabled where all QNames returned are None as previously.


h = Handler()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't leave 3 blank lines, only 2 is sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

parser.setContentHandler(h)
parser.parse(StringIO(xml_s))
self.assertEqual(h.qname, expected_qname)


# ===========================================================================
#
Expand Down
19 changes: 11 additions & 8 deletions Lib/xml/sax/expatreader.py
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,7 @@ def getSystemId(self):
class ExpatParser(xmlreader.IncrementalParser, xmlreader.Locator):
"""SAX driver for the pyexpat C module."""

def __init__(self, namespaceHandling=0, bufsize=2**16-20):
def __init__(self, namespaceHandling=0, namespacePrefixesHandling=0, bufsize=2**16-20):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking aloud: if setFeature can toggle this after initialization and PyXML 0.8.4 did not have this, maybe we do not need this direct channel and can just initialize with = 0 below. So my vote for simplification.

xmlreader.IncrementalParser.__init__(self, bufsize)
self._source = xmlreader.InputSource()
self._parser = None
Expand All @@ -91,6 +91,7 @@ def __init__(self, namespaceHandling=0, bufsize=2**16-20):
self._entity_stack = []
self._external_ges = 0
self._interning = None
self._namespace_prefixes = namespacePrefixesHandling

# XMLReader methods

Expand Down Expand Up @@ -126,8 +127,9 @@ def getFeature(self, name):
return self._namespaces
elif name == feature_string_interning:
return self._interning is not None
elif name in (feature_validation, feature_external_pes,
feature_namespace_prefixes):
elif name == feature_namespace_prefixes:
return self._namespace_prefixes
elif name in (feature_validation, feature_external_pes):
return 0
elif name == feature_external_ges:
return self._external_ges
Expand All @@ -147,6 +149,8 @@ def setFeature(self, name, state):
self._interning = {}
else:
self._interning = None
elif name == feature_namespace_prefixes:
self._namespace_prefixes = state
Comment on lines +152 to +153
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote for moving this one further down — right after elif name == feature_external_pes: — to match the left side of the diff.

elif name == feature_validation:
if state:
raise SAXNotSupportedException(
Expand All @@ -155,10 +159,6 @@ def setFeature(self, name, state):
if state:
raise SAXNotSupportedException(
"expat does not read external parameter entities")
elif name == feature_namespace_prefixes:
if state:
raise SAXNotSupportedException(
"expat does not report namespace prefixes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a limitation from the C extension module? was it a Expat version limitation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick git blame, and looks like this specific check was added in this commit: 18476a3 (from 2002).
and in this commit Expat version used is 1.95 (which should support namespace prefixes), so I would say the limitation was probably just in the python wrapper.

else:
raise SAXNotRecognizedException(
"Feature '%s' not recognized" % name)
Expand Down Expand Up @@ -347,11 +347,14 @@ def start_element_ns(self, name, attrs):
pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea whether this is correct or not here. Is there some specs that we could follow for the implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're referring specificaly to elem_qname, there is the spec: https://www.w3.org/TR/xml-names/#ns-qualnames.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added another test to test the "else" branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz these lines are 1:1 from PyXML 0.8.4:

meld

pair = (None, name)
elif len(pair) == 3:
elem_qname = "%s:%s" % (pair[2], pair[1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: This is the key value provider in this pull request. pair[2] was previously not exposed.

pair = pair[0], pair[1]
else:
# default namespace
elem_qname = pair[1]
pair = tuple(pair)

newattrs = {}
Expand All @@ -374,7 +377,7 @@ def start_element_ns(self, name, attrs):
newattrs[apair] = value
qnames[apair] = qname

self._cont_handler.startElementNS(pair, None,
self._cont_handler.startElementNS(pair, elem_qname,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will never pass None even with support for namespace prefixes disabled.

The following patch is arguably a bit silly, but it keeps things backwards compatible (and demos the idea):

Suggested change
self._cont_handler.startElementNS(pair, elem_qname,
if not self._namespace_prefixes:
elem_qname = None
self._cont_handler.startElementNS(pair, elem_qname,

AttributesNSImpl(newattrs, qnames))

def end_element_ns(self, name):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Backported namespaces prefixes support for xml.sax.expatreader from PyXml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Backported namespaces prefixes support for xml.sax.expatreader from PyXml
Backported namespaces prefixes support for xml.sax.expatreader from PyXML

(0.8.4)
Loading