gh-54873: Add support for namespaces prefixes to `xml.sax.expatreader` #118317

ukarroum · 2024-04-26T12:16:19Z

Issue: xml.sax.expatreader should support namespace prefixes #54873

…ader from PyXml (0.8.4)

picnixz

Thanks, but as this is a new feature:

Better tests must be written. Why are we using a new file?
A What's New entry must be added in addition to the NEWS entry.
Documentation must be added.
PyXML is no longer maintained. What about other XML parsers?

picnixz · 2025-11-02T16:42:10Z

Lib/test/test_xml_expatreader.py

+        self.parser.setContentHandler(h)
+        self.parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>")
+        self.parser.close()
+        print("self.assertEqual")


What are those?

A shamelessly forgotten debug print.
I removed it.

picnixz · 2025-11-02T16:42:28Z

Lib/test/test_xml_expatreader.py

+        self.parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>")
+        self.parser.close()
+        print("self.assertEqual")
+        self.assertFalse(h.qname is None)


This is not sufficient. We should match against the exact qname.

picnixz · 2025-11-02T16:42:52Z

Lib/test/test_xml_expatreader.py

@@ -0,0 +1,24 @@
+import unittest


Why do we have a new test file? There should already be a test file for the expat reader.

I assumed all xml unit tests were in the test_xml_* files. Looks like there is a test_sax.py.
Moved test there.

picnixz · 2025-11-02T16:43:23Z

Misc/NEWS.d/next/Library/2024-04-26-14-21-04.gh-issue-54873.vf2bfp.rst

@@ -0,0 +1,2 @@
+Backported namespaces prefixes support for xml.sax.expatreader from PyXml


A What's New entry should be written; this is a new feature. Also, I don't think we should indicate PyXML either. Please look at other changelog entries involving pyexpat to have an idea of how to write them.

Done.
The reason I added the "from PyXml" is that I didn't write the code from scratch.
I backported / copied it from the existing PyXml implementation.

Not sure how to correctly credit the original library.

picnixz · 2025-11-02T16:43:59Z

Lib/xml/sax/expatreader.py

-        elif name == feature_namespace_prefixes:
-            if state:
-                raise SAXNotSupportedException(
-                    "expat does not report namespace prefixes")


Was it a limitation of the C extension module? Was it an Expat version limitation?

I did a quick git blame, and it looks like this specific check was added in commit 18476a3 (from 2002).
At that commit the Expat version used is 1.95 (which should support namespace prefixes), so I would say the limitation was probably just in the Python wrapper.
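For anyone who wants to check which behaviour a given interpreter has, here is a minimal feature-detection sketch. The helper name `supports_namespace_prefixes` is hypothetical and not part of this PR; it only relies on the `SAXNotSupportedException` raised by the check shown in the diff above.

```python
from xml.sax import SAXNotSupportedException
from xml.sax.expatreader import create_parser
from xml.sax.handler import feature_namespace_prefixes


def supports_namespace_prefixes():
    """Return True if the expat reader lets us enable namespace prefixes.

    Hypothetical helper for illustration only.
    """
    parser = create_parser()
    try:
        parser.setFeature(feature_namespace_prefixes, True)
    except SAXNotSupportedException:
        # This is the branch taken while the removed check is still in place.
        return False
    return True


print(supports_namespace_prefixes())
```

Without this change the helper is expected to return False, because enabling the feature raises.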

picnixz · 2025-11-02T16:48:04Z

Lib/xml/sax/expatreader.py

        pair = name.split()
        if len(pair) == 1:
            # no namespace
+            elem_qname = name


I have no idea whether this is correct or not here. Are there specs that we could follow for the implementation?

If you're referring specifically to elem_qname, there is the spec: https://www.w3.org/TR/xml-names/#ns-qualnames.

I also added another test to test the "else" branch.

@picnixz these lines are 1:1 from PyXML 0.8.4:

bedevere-app · 2025-11-02T16:48:13Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

picnixz · 2025-11-02T19:56:13Z

Lib/test/test_sax.py


 from xml.sax import make_parser, ContentHandler, \
-                    SAXException, SAXReaderNotAvailable, SAXParseException
+                    SAXException, SAXReaderNotAvailable, SAXParseException, handler


Why import handler when ContentHandler is already imported? Why not use the latter directly?

I replaced handler.ContentHandler with ContentHandler,
but I still need to keep the handler import for the feature_namespaces and feature_namespace_prefixes features.

How about?:

Suggested change

-                    SAXException, SAXReaderNotAvailable, SAXParseException, handler
+                    SAXException, SAXReaderNotAvailable, SAXParseException
+from xml.sax.handler import feature_namespaces, feature_namespace_prefixes

picnixz · 2025-11-02T19:57:13Z

Lib/test/test_sax.py

+        parser.feed("<E xmlns='http://example.org/testuri'/>")
+        parser.close()
+        self.assertEqual(h.qname, "E")
+


Don't leave 3 blank lines; 2 are sufficient.

picnixz · 2025-11-02T19:57:34Z

Lib/test/test_sax.py

+        parser.feed("<E xmlns='http://example.org/testuri'/>")
+        parser.close()


Use parser.parse(..., True)

Done, but I didn't add the True; it seems that the parse method only accepts one argument.

Oh. Wait, @hartwork should we do feed() + close() or parse()?

@picnixz hi!

My understanding is that:

create_parser is xml.sax.expatreader.create_parser and it creates an xml.sax.xmlreader.XMLReader, first sentence in the docs

xml.sax.xmlreader.XMLReader provides .parse that parses in one single go, one single parameter, no boolean isFinal

xml.sax.xmlreader.XMLReader does not provide .feed or .close because that needs IncrementalParser, not a plain XMLReader.

(xml.sax.expatreader.create_parser creates xml.sax.expatreader.ExpatParser that is an instance of both IncrementalParser and XMLReader for me in practice.)

I therefore consider parser.parse("<E xmlns='http://example.org/testuri'/>") to be better suited here because availability of .feed and .close is not guaranteed on interface level.

(If you want to stick to .feed here or elsewhere maybe add self.assertIsInstance(parser, IncrementalParser) prior to a call to communicate that expectation and have the test fail meaningfully for regressions in the future.)

What do you think?
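To make the two options concrete, here is a minimal sketch of both call styles against the stdlib reader. The `NameRecorder` class and the two helper functions are illustrative names, not code from this PR. One assumption worth flagging: a plain string passed to `parse()` is treated as a system identifier (a file name or URL), so the sketch wraps the document in `io.BytesIO` instead of passing the XML text directly.

```python
import io
from xml.sax import ContentHandler
from xml.sax.expatreader import create_parser
from xml.sax.handler import feature_namespaces
from xml.sax.xmlreader import IncrementalParser

XML = b"<E xmlns='http://example.org/testuri'/>"


class NameRecorder(ContentHandler):
    """Illustrative handler that records the (uri, localname) pairs it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        self.names.append(name)


def parse_in_one_go(data):
    # Only relies on the XMLReader interface: a single parse() call.
    parser = create_parser()
    parser.setFeature(feature_namespaces, True)
    recorder = NameRecorder()
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(data))
    return recorder.names


def parse_incrementally(data):
    # feed()/close() belong to IncrementalParser, so state that expectation.
    parser = create_parser()
    assert isinstance(parser, IncrementalParser)
    parser.setFeature(feature_namespaces, True)
    recorder = NameRecorder()
    parser.setContentHandler(recorder)
    parser.feed(data)
    parser.close()
    return recorder.names


print(parse_in_one_go(XML))
print(parse_incrementally(XML))
```

Both helpers should report the same element name; the difference is only which interface the test depends on.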

picnixz · 2025-11-02T19:58:27Z

Lib/xml/sax/expatreader.py

        self._entity_stack = []
        self._external_ges = 0
        self._interning = None
+        self._namespace_prefixes = 1


Why is it 1 by default?

Changed to use a new namespacePrefixesHandling argument.

picnixz · 2025-11-02T19:59:17Z

Lib/xml/sax/expatreader.py

        pair = name.split()
        if len(pair) == 1:
            # no namespace
+            elem_qname = name


picnixz · 2025-11-02T19:59:38Z

Doc/whatsnew/3.15.rst


  .. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack

+* Add support for namespace prefixes.


This should be also documented in the expat docs.

Do you mean I should add a link to the C libexpat doc?

EDIT: I think you probably meant to document this here: https://docs.python.org/3/library/pyexpat.html

No, to the specs

picnixz · 2025-11-02T20:18:02Z

Sorry, I meant: keep the NEWS entry and add the What's New entry (IOW, both of them must be added).

Changes were made.

picnixz · 2025-11-02T21:34:31Z

cc @hartwork as the maintainer of Expat

hartwork · 2025-11-02T22:18:20Z

> cc @hartwork as the maintainer of Expat

@picnixz thanks!

At least for the first half of the starting week my attention will be elsewhere. Also I will feel more free to participate here after #139460 is merged and fully off my plate.

hartwork

Hi @ukarroum and @picnixz,

I believe I understand the ideas behind this pull request now. I have compared with what PyXML 0.8.4 did and will also attach my local play code here (that is derived from the test case in this pull request):

Download: sax_debug.py

I am optimistic that this pull request will be in mergeable shape soon. Here is what I found:

hartwork · 2025-11-26T19:00:56Z

Doc/whatsnew/3.15.rst

+* Add support for `namespace prefixes <https://www.w3.org/TR/xml-names/#dt-prefix>`_.
+  (Contributed by Yassir Karroum in :gh:`118317`.)
+


This is adding to section xml.parsers.expat while the changes in here are patching code elsewhere. (I'm unsure when/if news files should be written to directly. I have only seen things go through news files myself.)

hartwork · 2025-11-26T19:19:43Z

Lib/test/test_sax.py


 from xml.sax import make_parser, ContentHandler, \
-                    SAXException, SAXReaderNotAvailable, SAXParseException
+                    SAXException, SAXReaderNotAvailable, SAXParseException, handler


How about?:

Suggested change

-                    SAXException, SAXReaderNotAvailable, SAXParseException, handler
+                    SAXException, SAXReaderNotAvailable, SAXParseException
+from xml.sax.handler import feature_namespaces, feature_namespace_prefixes

hartwork · 2025-11-26T19:23:44Z

Lib/test/test_sax.py

+            def startElementNS(self, name, qname, attrs):
+                self.qname = qname
+
+        for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):


I would consider this variant much more readable:

Suggested change

-        for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):
+        for xml_s, expected_qname in (
+            ("<Q:E xmlns:Q='http://example.org/testuri'/>", "Q:E"),
+            ("<E xmlns='http://example.org/testuri'/>", "E"),
+            ("<E />", "E"),
+        ):

hartwork · 2025-11-26T20:09:16Z

Lib/xml/sax/expatreader.py

    """SAX driver for the pyexpat C module."""

-    def __init__(self, namespaceHandling=0, bufsize=2**16-20):
+    def __init__(self, namespaceHandling=0, namespacePrefixesHandling=0, bufsize=2**16-20):


Just thinking aloud: if setFeature can toggle this after initialization and PyXML 0.8.4 did not have this, maybe we do not need this direct channel and can just initialize with = 0 below. So my vote for simplification.

hartwork · 2025-11-26T20:11:44Z

Lib/xml/sax/expatreader.py

+        elif name == feature_namespace_prefixes:
+            self._namespace_prefixes = state


My vote for moving this one further down — right after elif name == feature_external_pes: — to match the left side of the diff.

hartwork · 2025-11-26T20:29:39Z

Lib/xml/sax/expatreader.py

+            elem_qname = name
            pair = (None, name)
        elif len(pair) == 3:
+            elem_qname = "%s:%s" % (pair[2], pair[1])


Note to self: This is the key value provider in this pull request. pair[2] was previously not exposed.
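To illustrate where `pair[2]` comes from, here is a small sketch that drives pyexpat directly with a space as namespace separator and `namespace_prefixes` enabled, then rebuilds the qualified name from the reported string. The function names `raw_names` and `split_expat_name` are hypothetical. The two-part branch (default namespace, qname equal to the local name) is an assumption inferred from the test expectations in this PR, not copied from the patch.

```python
from xml.parsers import expat


def raw_names(xml):
    """Return element names exactly as expat reports them."""
    names = []
    parser = expat.ParserCreate(None, " ")  # " " is the namespace separator
    parser.namespace_prefixes = True        # ask expat to append the prefix
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml, True)
    return names


def split_expat_name(name):
    """Return ((uri, localname), qname) for a name reported by expat."""
    parts = name.split()
    if len(parts) == 1:
        # No namespace: the qualified name is the name itself.
        return (None, name), name
    if len(parts) == 3:
        # Namespace plus prefix, reported as "uri localname prefix".
        uri, local, prefix = parts
        return (uri, local), "%s:%s" % (prefix, local)
    # Assumed: default namespace, reported as "uri localname".
    uri, local = parts
    return (uri, local), local


for doc in ("<Q:E xmlns:Q='http://example.org/testuri'/>",
            "<E xmlns='http://example.org/testuri'/>",
            "<E />"):
    for reported in raw_names(doc):
        print(repr(reported), split_expat_name(reported))
```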

hartwork · 2025-11-26T20:37:00Z

Lib/xml/sax/expatreader.py

        pair = name.split()
        if len(pair) == 1:
            # no namespace
+            elem_qname = name


@picnixz these lines are 1:1 from PyXML 0.8.4:

hartwork · 2025-11-26T20:50:49Z

Misc/NEWS.d/next/Library/2024-04-26-14-21-04.gh-issue-54873.vf2bfp.rst

@@ -0,0 +1,2 @@
+Backported namespaces prefixes support for xml.sax.expatreader from PyXml


Suggested change

-Backported namespaces prefixes support for xml.sax.expatreader from PyXml
+Backported namespaces prefixes support for xml.sax.expatreader from PyXML

hartwork · 2025-11-26T20:55:59Z

Lib/xml/sax/expatreader.py

            qnames[apair] = qname

-        self._cont_handler.startElementNS(pair, None,
+        self._cont_handler.startElementNS(pair, elem_qname,


This will never pass None even with support for namespace prefixes disabled.

The following patch is arguably a bit silly, but it keeps things backwards compatible (and demos the idea):

Suggested change

-        self._cont_handler.startElementNS(pair, elem_qname,
+        if not self._namespace_prefixes:
+            elem_qname = None
+        self._cont_handler.startElementNS(pair, elem_qname,

hartwork · 2025-11-26T21:18:08Z

Lib/test/test_sax.py

+        for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):
+            parser = create_parser()
+            parser.setFeature(handler.feature_namespaces, 1)
+            parser.setFeature(handler.feature_namespace_prefixes, 1)


I believe there should also be a test for the behavior with this disabled where all QNames returned are None as previously.
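A rough sketch of what such a check could look like, written as a plain helper rather than a test_sax.py method. `QNameRecorder` and `qnames_with_prefixes_disabled` are illustrative names. It assumes the disabled path keeps passing None as the qname, which is the backwards-compatible behaviour argued for in this review.

```python
import io
from xml.sax import ContentHandler
from xml.sax.expatreader import create_parser
from xml.sax.handler import feature_namespaces, feature_namespace_prefixes

DOCUMENTS = (
    "<Q:E xmlns:Q='http://example.org/testuri'/>",
    "<E xmlns='http://example.org/testuri'/>",
    "<E />",
)


class QNameRecorder(ContentHandler):
    """Illustrative handler that records the qname passed to startElementNS."""

    def __init__(self):
        super().__init__()
        self.qnames = []

    def startElementNS(self, name, qname, attrs):
        self.qnames.append(qname)


def qnames_with_prefixes_disabled(documents):
    seen = []
    for xml_s in documents:
        parser = create_parser()
        parser.setFeature(feature_namespaces, True)
        # Explicitly disabled: qnames are expected to stay None (assumption).
        parser.setFeature(feature_namespace_prefixes, False)
        recorder = QNameRecorder()
        parser.setContentHandler(recorder)
        parser.parse(io.StringIO(xml_s))
        seen.extend(recorder.qnames)
    return seen


print(qnames_with_prefixes_disabled(DOCUMENTS))
```

In the actual test this would presumably become a second loop with `self.assertIsNone(h.qname)` next to the existing enabled case.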

bpo-54873: Backported namespaces prefixes support for xml.sax.expatre…

6ecfc28

…ader from PyXml (0.8.4)

bedevere-app bot mentioned this pull request Apr 26, 2024

xml.sax.expatreader should support namespace prefixes #54873

Open

bedevere-app bot added the awaiting review label Apr 26, 2024

ukarroum added 4 commits April 26, 2024 14:21

bpo-54873: Added News entry

e9ac454

Merge branch 'main' into fix-issue-54873

9c365c2

Merge branch 'main' into fix-issue-54873

2fc666a

Merge branch 'main' into fix-issue-54873

fe9d82f

picnixz previously requested changes Nov 2, 2025

View reviewed changes

bedevere-app bot added awaiting changes and removed awaiting review labels Nov 2, 2025

Moved test_namespace_prefix to test_sax.py

be878cb

ukarroum requested a review from AA-Turner as a code owner November 2, 2025 19:52

picnixz reviewed Nov 2, 2025

View reviewed changes

picnixz self-requested a review November 2, 2025 20:18

bedevere-app bot added awaiting review and removed awaiting changes labels Nov 2, 2025

ukarroum added 2 commits November 2, 2025 21:14

test_sax: Replaced namespace tests with one test on qualified names

0e3f870

ExpatParser: Add namespacePrefixesHandling

c88e7ed

picnixz changed the title ~~gh-54873: Backported namespaces prefixes support for xml.sax.expatreader from PyXml (0.8.4)~~ gh-54873: Add support for namespaces prefixes to xml.sax.expatreader Nov 2, 2025

namespace prefixes: add link to w3.org

27cb273

Merge branch 'main' into fix-issue-54873

91112b9

ukarroum requested a review from hartwork November 24, 2025 04:02

hartwork suggested changes Nov 26, 2025

View reviewed changes

bedevere-app bot added awaiting core review and removed awaiting review labels Nov 26, 2025
