Skip to content

gh-54873: Add support for namespaces prefixes to xml.sax.expatreader#118317

Open
ukarroum wants to merge 26 commits intopython:mainfrom
ukarroum:fix-issue-54873
Open

gh-54873: Add support for namespaces prefixes to xml.sax.expatreader#118317
ukarroum wants to merge 26 commits intopython:mainfrom
ukarroum:fix-issue-54873

Conversation

@ukarroum
Copy link
Copy Markdown
Contributor

@ukarroum ukarroum commented Apr 26, 2024

picnixz
picnixz previously requested changes Nov 2, 2025
Copy link
Copy Markdown
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but as this is a new feature:

  • Better tests must be written. Why are we using a new file?
  • A What's New entry must be added in addition to the NEWS entry.
  • Documentation must be added.
  • PyXML is no longer maintained. What about other XML parsers?

Comment thread Lib/test/test_xml_expatreader.py Outdated
self.parser.setContentHandler(h)
self.parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>")
self.parser.close()
print("self.assertEqual")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are those?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A shamelessly forgot debug print.
I removed it

Comment thread Lib/test/test_xml_expatreader.py Outdated
self.parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>")
self.parser.close()
print("self.assertEqual")
self.assertFalse(h.qname is None)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not sufficient. We should match against the exact qname.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread Lib/test/test_xml_expatreader.py Outdated
@@ -0,0 +1,24 @@
import unittest
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we having a new test file? there should already be a testfile for expat reader.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed all xml unit tests were in the test_xml_* files. Looks like there is a test_sax.py.
Moved test there.

@@ -0,0 +1,2 @@
Backported namespaces prefixes support for xml.sax.expatreader from PyXml
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A What's New entry should be written; this is a new feature. Also, I don't think we should indicate PyXML either. Please look at other changelog entries involving pyexpat to have an idea of how to write them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
reason why I added the "from PyXml" is that I didn't write the code from scratch.
I backported / copied it from the existing PyXml implementation.

Not sure how to correctly credit the original library.

elif name == feature_namespace_prefixes:
if state:
raise SAXNotSupportedException(
"expat does not report namespace prefixes")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a limitation from the C extension module? was it a Expat version limitation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick git blame, and looks like this specific check was added in this commit: 18476a3 (from 2002).
and in this commit Expat version used is 1.95 (which should support namespace prefixes), so I would say the limitation was probably just in the python wrapper.

pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea whether this is correct or not here. Is there some specs that we could follow for the implementation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're referring specificaly to elem_qname, there is the spec: https://www.w3.org/TR/xml-names/#ns-qualnames.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added another test to test the "else" branch.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz these lines are 1:1 from PyXML 0.8.4:

meld

@bedevere-app
Copy link
Copy Markdown

bedevere-app bot commented Nov 2, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@ukarroum ukarroum requested a review from AA-Turner as a code owner November 2, 2025 19:52
Comment thread Lib/test/test_sax.py Outdated
Comment thread Lib/test/test_sax.py
parser.feed("<E xmlns='http://example.org/testuri'/>")
parser.close()
self.assertEqual(h.qname, "E")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't leave 3 blank lines, only 2 is sufficient.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment thread Lib/test/test_sax.py Outdated
Comment on lines +1337 to +1338
parser.feed("<E xmlns='http://example.org/testuri'/>")
parser.close()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use parser.parse(..., True)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but I didn't add the True, it seems that the parse method only accept one argument.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Wait, @hartwork should we do feed() + close() or parse()?

Copy link
Copy Markdown
Contributor

@hartwork hartwork Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz hi!

My understanding is that:

  • create_parser is xml.sax.expatreader.create_parser and it creates an xml.sax.xmlreader.XMLReader, first sentence in the docs
  • xml.sax.xmlreader.XMLReader provides .parse that parses in one single go, one single parameter, no boolean isFinal
  • xml.sax.xmlreader.XMLReader does not provide .feed or .close because that needs IncrementalParser, not a plain XMLReader.
  • (xml.sax.expatreader.create_parser creates xml.sax.expatreader.ExpatParser that is an instance of both IncrementalParser and XMLReader for me in practice.)
  • I therefore consider parser.parse("<E xmlns='http://example.org/testuri'/>") to be better suited here because availability of .feed and .close is not guaranteed on interface level.
  • (If you want to stick to .feed here or elsewhere maybe add self.assertIsInstance(parser, IncrementalParser) prior to a call to communicate that expectation and have the test fail meaningfully for regressions in the future.)

What do you think?

Comment thread Lib/xml/sax/expatreader.py Outdated
self._entity_stack = []
self._external_ges = 0
self._interning = None
self._namespace_prefixes = 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it 1 by default?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use a new namespacePrefixesHandling argument.

pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment thread Doc/whatsnew/3.15.rst Outdated

.. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack

* Add support for namespace prefixes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be also documented in the expat docs.

Copy link
Copy Markdown
Contributor Author

@ukarroum ukarroum Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean I should add a link to the C libexpat doc ?

EDIT: I think you probably meant to document this here: https://docs.python.org/3/library/pyexpat.html

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, to the specs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Nov 2, 2025

Sorry, I meant: keep the NEWS entry and add the What's New entry (IOW, both of them must be added).

@picnixz picnixz self-requested a review November 2, 2025 20:18
@picnixz picnixz dismissed their stale review November 2, 2025 20:18

Changes were made.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Nov 2, 2025

cc @hartwork as the maintainer of Expat

@picnixz picnixz changed the title gh-54873: Backported namespaces prefixes support for xml.sax.expatreader from PyXml (0.8.4) gh-54873: Add support for namespaces prefixes to xml.sax.expatreader Nov 2, 2025
@hartwork
Copy link
Copy Markdown
Contributor

hartwork commented Nov 2, 2025

cc @hartwork as the maintainer of Expat

@picnixz thanks!

At least for the first half of the starting week my attention will be elsewhere. Also I will feel more free to participate here after #139460 is merged and fully off my plate.

@ukarroum ukarroum requested a review from hartwork November 24, 2025 04:02
Copy link
Copy Markdown
Contributor

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ukarroum and @picnixz,

I believe I understand the ideas behind this pull request now. I have compared with what PyXML 0.8.4 did and will also attach my local play code here (that is derived from the test case in this pull request):

I am optimistic that this pull request will be in mergable shape soon. Here is what I found:

Comment thread Doc/whatsnew/3.15.rst
Comment on lines +827 to +829
* Add support for `namespace prefixes <https://www.w3.org/TR/xml-names/#dt-prefix>`_.
(Contributed by Yassir Karroum in :gh:`118317`.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding to section xml.parsers.expat while the changes in here are patching code elsewhere. (I'm unsure when/if news files should be written to directly. I have only seen things go through news files myself.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ukarroum did you see this one? ⬆️

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I will move it to a new section.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's adding it in two places now, it looks like this upper entry ⬇️ still needs to be dropped?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry should have reviewed the diffs one more time before tagging you ^^"
fixed

Comment thread Lib/test/test_sax.py Outdated
Comment thread Lib/test/test_sax.py Outdated
Comment thread Lib/xml/sax/expatreader.py Outdated
"""SAX driver for the pyexpat C module."""

def __init__(self, namespaceHandling=0, bufsize=2**16-20):
def __init__(self, namespaceHandling=0, namespacePrefixesHandling=0, bufsize=2**16-20):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking aloud: if setFeature can toggle this after initialization and PyXML 0.8.4 did not have this, maybe we do not need this direct channel and can just initialize with = 0 below. So my vote for simplification.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread Lib/xml/sax/expatreader.py Outdated
Comment on lines +152 to +153
elif name == feature_namespace_prefixes:
self._namespace_prefixes = state
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote for moving this one further down — right after elif name == feature_external_pes: — to match the left side of the diff.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

elem_qname = name
pair = (None, name)
elif len(pair) == 3:
elem_qname = "%s:%s" % (pair[2], pair[1])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: This is the key value provider in this pull request. pair[2] was previously not exposed.

pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz these lines are 1:1 from PyXML 0.8.4:

meld

@@ -0,0 +1,2 @@
Backported namespaces prefixes support for xml.sax.expatreader from PyXml
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Backported namespaces prefixes support for xml.sax.expatreader from PyXml
Backported namespaces prefixes support for xml.sax.expatreader from PyXML

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread Lib/xml/sax/expatreader.py
Comment thread Lib/test/test_sax.py Outdated
for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]):
parser = create_parser()
parser.setFeature(handler.feature_namespaces, 1)
parser.setFeature(handler.feature_namespace_prefixes, 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there should also be a test for the behavior with this disabled where all QNames returned are None as previously.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Apr 16, 2026
ukarroum and others added 3 commits April 17, 2026 07:41
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Comment thread Misc/NEWS.d/next/Library/2024-04-26-14-21-04.gh-issue-54873.vf2bfp.rst Outdated
ukarroum and others added 4 commits April 17, 2026 16:15
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Comment thread Lib/test/test_sax.py Outdated

from xml.sax import make_parser, ContentHandler, \
SAXException, SAXReaderNotAvailable, SAXParseException
from xml.sax.handler import feature_namespaces, feature_namespace_prefixes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this line be dropped? It looks like both of these symbols are imported twice now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, removed it.

Comment thread Doc/whatsnew/3.15.rst
Comment on lines +827 to +829
* Add support for `namespace prefixes <https://www.w3.org/TR/xml-names/#dt-prefix>`_.
(Contributed by Yassir Karroum in :gh:`118317`.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ukarroum did you see this one? ⬆️

Comment thread Doc/whatsnew/3.15.rst Outdated
@ukarroum
Copy link
Copy Markdown
Contributor Author

@hartwork thanks a lot for the review !
I addressed all comments please let me know if you see any other issue

ukarroum and others added 3 commits April 17, 2026 17:38
if state:
raise SAXNotSupportedException(
"expat does not report namespace prefixes")
self._namespace_prefixes = state
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ukarroum I wonder what the expected behaviors is here when feature_namespaces is currently disabled. Or more generally speaking: Do we know anything about the relation between these two features? Potential options seem to be:

  • Block enabling/disabling when their dependency would be violated.
  • Auto-enable/disable to respect dependency at all times
  • Keep the two fully separate but check for both rather than just self._namespace_prefixes where checking for being active?

Just thinking aloud.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I think it will make sense to just enable feature_namespaces when feature_namespace_prefixes is also enabled (and also disable feature_namespace_prefixes if feature_namespaces is disabled).
I will update both set/get feature

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually one drawback I see for this is :

enable feature_namespace_prefixes
disable feature_namespaces
enable feature_namespaces

this will result in feature_namespace_prefixes being disabled which feels odd.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with 1) instead.
I pushed a small change to check if namespace is enabled when we enable namespace prefixes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ukarroum I like it. I think one small dedicated test for that would be great where namespaces are disabled and the exception is raise with the expected message. (Arguably, the success case is already covered by existing tests.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test

@hartwork
Copy link
Copy Markdown
Contributor

@hartwork thanks a lot for the review ! I addressed all comments please let me know if you see any other issue

@ukarroum thanks for the adjustments! I don't have any approval or merge permissions here, but from my point of view it's close to mergable by now.

@ukarroum
Copy link
Copy Markdown
Contributor Author

I have made the requested changes; please review again

@bedevere-app
Copy link
Copy Markdown

bedevere-app bot commented Apr 17, 2026

Thanks for making the requested changes!

: please review the changes made to this pull request.

Comment thread Lib/test/test_sax.py Outdated
ukarroum and others added 2 commits April 17, 2026 19:22
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
@hartwork
Copy link
Copy Markdown
Contributor

@ukarroum thanks for the adjustments!

@picnixz @AA-Turner I believe this is ready for official review now, could you have a look? Thank you! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting change review stale Stale PR or inactive for long period of time.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants