Skip to content

Conversation

@ukarroum
Copy link
Contributor

@ukarroum ukarroum commented Apr 26, 2024

picnixz
picnixz previously requested changes Nov 2, 2025
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but as this is a new feature:

  • Better tests must be written. Why are we using a new file?
  • A What's New entry must be added in addition to the NEWS entry.
  • Documentation must be added.
  • PyXML is no longer maintained. What about other XML parsers?

self.parser.setContentHandler(h)
self.parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>")
self.parser.close()
print("self.assertEqual")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A shamelessly forgot debug print.
I removed it

self.parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>")
self.parser.close()
print("self.assertEqual")
self.assertFalse(h.qname is None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not sufficient. We should match against the exact qname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -0,0 +1,24 @@
import unittest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we having a new test file? there should already be a testfile for expat reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed all xml unit tests were in the test_xml_* files. Looks like there is a test_sax.py.
Moved test there.

@@ -0,0 +1,2 @@
Backported namespaces prefixes support for xml.sax.expatreader from PyXml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A What's New entry should be written; this is a new feature. Also, I don't think we should indicate PyXML either. Please look at other changelog entries involving pyexpat to have an idea of how to write them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
reason why I added the "from PyXml" is that I didn't write the code from scratch.
I backported / copied it from the existing PyXml implementation.

Not sure how to correctly credit the original library.

elif name == feature_namespace_prefixes:
if state:
raise SAXNotSupportedException(
"expat does not report namespace prefixes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a limitation from the C extension module? was it a Expat version limitation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick git blame, and looks like this specific check was added in this commit: 18476a3 (from 2002).
and in this commit Expat version used is 1.95 (which should support namespace prefixes), so I would say the limitation was probably just in the python wrapper.

pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea whether this is correct or not here. Is there some specs that we could follow for the implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're referring specificaly to elem_qname, there is the spec: https://www.w3.org/TR/xml-names/#ns-qualnames.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added another test to test the "else" branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz these lines are 1:1 from PyXML 0.8.4:

meld

@bedevere-app
Copy link

bedevere-app bot commented Nov 2, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@ukarroum ukarroum requested a review from AA-Turner as a code owner November 2, 2025 19:52

from xml.sax import make_parser, ContentHandler, \
SAXException, SAXReaderNotAvailable, SAXParseException
SAXException, SAXReaderNotAvailable, SAXParseException, handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why import handler when ContentHandler is already imported? why not use the latter directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the handler.ContentHandler, with ContentHandler.
but I still need to keep the handler import for the namespace/namespace features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?:

Suggested change
SAXException, SAXReaderNotAvailable, SAXParseException, handler
SAXException, SAXReaderNotAvailable, SAXParseException
from xml.sax.handler import feature_namespaces, feature_namespace_prefixes

parser.feed("<E xmlns='http://example.org/testuri'/>")
parser.close()
self.assertEqual(h.qname, "E")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't leave 3 blank lines, only 2 is sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 1337 to 1338
parser.feed("<E xmlns='http://example.org/testuri'/>")
parser.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use parser.parse(..., True)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but I didn't add the True, it seems that the parse method only accept one argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Wait, @hartwork should we do feed() + close() or parse()?

Copy link
Contributor

@hartwork hartwork Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz hi!

My understanding is that:

  • create_parser is xml.sax.expatreader.create_parser and it creates an xml.sax.xmlreader.XMLReader, first sentence in the docs
  • xml.sax.xmlreader.XMLReader provides .parse that parses in one single go, one single parameter, no boolean isFinal
  • xml.sax.xmlreader.XMLReader does not provide .feed or .close because that needs IncrementalParser, not a plain XMLReader.
  • (xml.sax.expatreader.create_parser creates xml.sax.expatreader.ExpatParser that is an instance of both IncrementalParser and XMLReader for me in practice.)
  • I therefore consider parser.parse("<E xmlns='http://example.org/testuri'/>") to be better suited here because availability of .feed and .close is not guaranteed on interface level.
  • (If you want to stick to .feed here or elsewhere maybe add self.assertIsInstance(parser, IncrementalParser) prior to a call to communicate that expectation and have the test fail meaningfully for regressions in the future.)

What do you think?

self._entity_stack = []
self._external_ges = 0
self._interning = None
self._namespace_prefixes = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it 1 by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use a new namespacePrefixesHandling argument.

pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


.. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack

* Add support for namespace prefixes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be also documented in the expat docs.

Copy link
Contributor Author

@ukarroum ukarroum Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean I should add a link to the C libexpat doc ?

EDIT: I think you probably meant to document this here: https://docs.python.org/3/library/pyexpat.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, to the specs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@picnixz
Copy link
Member

picnixz commented Nov 2, 2025

Sorry, I meant: keep the NEWS entry and add the What's New entry (IOW, both of them must be added).

@picnixz picnixz self-requested a review November 2, 2025 20:18
@picnixz picnixz dismissed their stale review November 2, 2025 20:18

Changes were made.

@picnixz
Copy link
Member

picnixz commented Nov 2, 2025

cc @hartwork as the maintainer of Expat

@picnixz picnixz changed the title gh-54873: Backported namespaces prefixes support for xml.sax.expatreader from PyXml (0.8.4) gh-54873: Add support for namespaces prefixes to xml.sax.expatreader Nov 2, 2025
@hartwork
Copy link
Contributor

hartwork commented Nov 2, 2025

cc @hartwork as the maintainer of Expat

@picnixz thanks!

At least for the first half of the starting week my attention will be elsewhere. Also I will feel more free to participate here after #139460 is merged and fully off my plate.

@ukarroum ukarroum requested a review from hartwork November 24, 2025 04:02
Copy link
Contributor

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ukarroum and @picnixz,

I believe I understand the ideas behind this pull request now. I have compared with what PyXML 0.8.4 did and will also attach my local play code here (that is derived from the test case in this pull request):

I am optimistic that this pull request will be in mergable shape soon. Here is what I found:

Comment on lines +827 to +829
* Add support for `namespace prefixes <https://www.w3.org/TR/xml-names/#dt-prefix>`_.
(Contributed by Yassir Karroum in :gh:`118317`.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding to section xml.parsers.expat while the changes in here are patching code elsewhere. (I'm unsure when/if news files should be written to directly. I have only seen things go through news files myself.)


from xml.sax import make_parser, ContentHandler, \
SAXException, SAXReaderNotAvailable, SAXParseException
SAXException, SAXReaderNotAvailable, SAXParseException, handler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?:

Suggested change
SAXException, SAXReaderNotAvailable, SAXParseException, handler
SAXException, SAXReaderNotAvailable, SAXParseException
from xml.sax.handler import feature_namespaces, feature_namespace_prefixes

def startElementNS(self, name, qname, attrs):
self.qname = qname

for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider this variant much more readable:

Suggested change
for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):
for xml_s, expected_qname in (
("<Q:E xmlns:Q='http://example.org/testuri'/>", "Q:E"),
("<E xmlns='http://example.org/testuri'/>", "E"),
("<E />", "E"),
):

"""SAX driver for the pyexpat C module."""

def __init__(self, namespaceHandling=0, bufsize=2**16-20):
def __init__(self, namespaceHandling=0, namespacePrefixesHandling=0, bufsize=2**16-20):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking aloud: if setFeature can toggle this after initialization and PyXML 0.8.4 did not have this, maybe we do not need this direct channel and can just initialize with = 0 below. So my vote for simplification.

Comment on lines +152 to +153
elif name == feature_namespace_prefixes:
self._namespace_prefixes = state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote for moving this one further down — right after elif name == feature_external_pes: — to match the left side of the diff.

elem_qname = name
pair = (None, name)
elif len(pair) == 3:
elem_qname = "%s:%s" % (pair[2], pair[1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: This is the key value provider in this pull request. pair[2] was previously not exposed.

pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz these lines are 1:1 from PyXML 0.8.4:

meld

@@ -0,0 +1,2 @@
Backported namespaces prefixes support for xml.sax.expatreader from PyXml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Backported namespaces prefixes support for xml.sax.expatreader from PyXml
Backported namespaces prefixes support for xml.sax.expatreader from PyXML

qnames[apair] = qname

self._cont_handler.startElementNS(pair, None,
self._cont_handler.startElementNS(pair, elem_qname,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will never pass None even with support for namespace prefixes disabled.

The following patch is arguably a bit silly, but it keeps things backwards compatible (and demos the idea):

Suggested change
self._cont_handler.startElementNS(pair, elem_qname,
if not self._namespace_prefixes:
elem_qname = None
self._cont_handler.startElementNS(pair, elem_qname,

for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):
parser = create_parser()
parser.setFeature(handler.feature_namespaces, 1)
parser.setFeature(handler.feature_namespace_prefixes, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there should also be a test for the behavior with this disabled where all QNames returned are None as previously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants