gh-54873: Add support for namespaces prefixes to xml.sax.expatreader#118317
gh-54873: Add support for namespaces prefixes to xml.sax.expatreader#118317ukarroum wants to merge 26 commits intopython:mainfrom
xml.sax.expatreader#118317Conversation
…ader from PyXml (0.8.4)
picnixz
left a comment
There was a problem hiding this comment.
Thanks, but as this is a new feature:
- Better tests must be written. Why are we using a new file?
- A What's New entry must be added in addition to the NEWS entry.
- Documentation must be added.
- PyXML is no longer maintained. What about other XML parsers?
| self.parser.setContentHandler(h) | ||
| self.parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>") | ||
| self.parser.close() | ||
| print("self.assertEqual") |
There was a problem hiding this comment.
A shamelessly forgot debug print.
I removed it
| self.parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>") | ||
| self.parser.close() | ||
| print("self.assertEqual") | ||
| self.assertFalse(h.qname is None) |
There was a problem hiding this comment.
This is not sufficient. We should match against the exact qname.
| @@ -0,0 +1,24 @@ | |||
| import unittest | |||
There was a problem hiding this comment.
Why are we having a new test file? there should already be a testfile for expat reader.
There was a problem hiding this comment.
I assumed all xml unit tests were in the test_xml_* files. Looks like there is a test_sax.py.
Moved test there.
| @@ -0,0 +1,2 @@ | |||
| Backported namespaces prefixes support for xml.sax.expatreader from PyXml | |||
There was a problem hiding this comment.
A What's New entry should be written; this is a new feature. Also, I don't think we should indicate PyXML either. Please look at other changelog entries involving pyexpat to have an idea of how to write them.
There was a problem hiding this comment.
Done.
reason why I added the "from PyXml" is that I didn't write the code from scratch.
I backported / copied it from the existing PyXml implementation.
Not sure how to correctly credit the original library.
| elif name == feature_namespace_prefixes: | ||
| if state: | ||
| raise SAXNotSupportedException( | ||
| "expat does not report namespace prefixes") |
There was a problem hiding this comment.
Was it a limitation from the C extension module? was it a Expat version limitation?
There was a problem hiding this comment.
I did a quick git blame, and looks like this specific check was added in this commit: 18476a3 (from 2002).
and in this commit Expat version used is 1.95 (which should support namespace prefixes), so I would say the limitation was probably just in the python wrapper.
| pair = name.split() | ||
| if len(pair) == 1: | ||
| # no namespace | ||
| elem_qname = name |
There was a problem hiding this comment.
I have no idea whether this is correct or not here. Is there some specs that we could follow for the implementation?
There was a problem hiding this comment.
if you're referring specificaly to elem_qname, there is the spec: https://www.w3.org/TR/xml-names/#ns-qualnames.
There was a problem hiding this comment.
I also added another test to test the "else" branch.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be poked with soft cushions! |
| parser.feed("<E xmlns='http://example.org/testuri'/>") | ||
| parser.close() | ||
| self.assertEqual(h.qname, "E") | ||
|
|
There was a problem hiding this comment.
Don't leave 3 blank lines, only 2 is sufficient.
| parser.feed("<E xmlns='http://example.org/testuri'/>") | ||
| parser.close() |
There was a problem hiding this comment.
Done, but I didn't add the True, it seems that the parse method only accept one argument.
There was a problem hiding this comment.
Oh. Wait, @hartwork should we do feed() + close() or parse()?
There was a problem hiding this comment.
@picnixz hi!
My understanding is that:
create_parserisxml.sax.expatreader.create_parserand it creates anxml.sax.xmlreader.XMLReader, first sentence in the docsxml.sax.xmlreader.XMLReaderprovides.parsethat parses in one single go, one single parameter, no booleanisFinalxml.sax.xmlreader.XMLReaderdoes not provide.feedor.closebecause that needsIncrementalParser, not a plainXMLReader.- (
xml.sax.expatreader.create_parsercreatesxml.sax.expatreader.ExpatParserthat is an instance of bothIncrementalParserandXMLReaderfor me in practice.) - I therefore consider
parser.parse("<E xmlns='http://example.org/testuri'/>")to be better suited here because availability of.feedand.closeis not guaranteed on interface level. - (If you want to stick to
.feedhere or elsewhere maybe addself.assertIsInstance(parser, IncrementalParser)prior to a call to communicate that expectation and have the test fail meaningfully for regressions in the future.)
What do you think?
| self._entity_stack = [] | ||
| self._external_ges = 0 | ||
| self._interning = None | ||
| self._namespace_prefixes = 1 |
There was a problem hiding this comment.
Changed to use a new namespacePrefixesHandling argument.
| pair = name.split() | ||
| if len(pair) == 1: | ||
| # no namespace | ||
| elem_qname = name |
|
|
||
| .. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack | ||
|
|
||
| * Add support for namespace prefixes. |
There was a problem hiding this comment.
This should be also documented in the expat docs.
There was a problem hiding this comment.
do you mean I should add a link to the C libexpat doc ?
EDIT: I think you probably meant to document this here: https://docs.python.org/3/library/pyexpat.html
|
Sorry, I meant: keep the NEWS entry and add the What's New entry (IOW, both of them must be added). |
|
cc @hartwork as the maintainer of Expat |
xml.sax.expatreader
hartwork
left a comment
There was a problem hiding this comment.
I believe I understand the ideas behind this pull request now. I have compared with what PyXML 0.8.4 did and will also attach my local play code here (that is derived from the test case in this pull request):
- Download: sax_debug.py
I am optimistic that this pull request will be in mergable shape soon. Here is what I found:
| * Add support for `namespace prefixes <https://www.w3.org/TR/xml-names/#dt-prefix>`_. | ||
| (Contributed by Yassir Karroum in :gh:`118317`.) | ||
|
|
There was a problem hiding this comment.
This is adding to section xml.parsers.expat while the changes in here are patching code elsewhere. (I'm unsure when/if news files should be written to directly. I have only seen things go through news files myself.)
There was a problem hiding this comment.
I agree, I will move it to a new section.
There was a problem hiding this comment.
I believe it's adding it in two places now, it looks like this upper entry ⬇️ still needs to be dropped?
There was a problem hiding this comment.
sorry should have reviewed the diffs one more time before tagging you ^^"
fixed
| """SAX driver for the pyexpat C module.""" | ||
|
|
||
| def __init__(self, namespaceHandling=0, bufsize=2**16-20): | ||
| def __init__(self, namespaceHandling=0, namespacePrefixesHandling=0, bufsize=2**16-20): |
There was a problem hiding this comment.
Just thinking aloud: if setFeature can toggle this after initialization and PyXML 0.8.4 did not have this, maybe we do not need this direct channel and can just initialize with = 0 below. So my vote for simplification.
| elif name == feature_namespace_prefixes: | ||
| self._namespace_prefixes = state |
There was a problem hiding this comment.
My vote for moving this one further down — right after elif name == feature_external_pes: — to match the left side of the diff.
| elem_qname = name | ||
| pair = (None, name) | ||
| elif len(pair) == 3: | ||
| elem_qname = "%s:%s" % (pair[2], pair[1]) |
There was a problem hiding this comment.
Note to self: This is the key value provider in this pull request. pair[2] was previously not exposed.
| pair = name.split() | ||
| if len(pair) == 1: | ||
| # no namespace | ||
| elem_qname = name |
| @@ -0,0 +1,2 @@ | |||
| Backported namespaces prefixes support for xml.sax.expatreader from PyXml | |||
There was a problem hiding this comment.
| Backported namespaces prefixes support for xml.sax.expatreader from PyXml | |
| Backported namespaces prefixes support for xml.sax.expatreader from PyXML |
| for xml_s, expected_qname in zip(["<Q:E xmlns:Q='http://example.org/testuri'/>", "<E xmlns='http://example.org/testuri'/>", "<E />"], ["Q:E", "E", "E"]): | ||
| parser = create_parser() | ||
| parser.setFeature(handler.feature_namespaces, 1) | ||
| parser.setFeature(handler.feature_namespace_prefixes, 1) |
There was a problem hiding this comment.
I believe there should also be a test for the behavior with this disabled where all QNames returned are None as previously.
|
This PR is stale because it has been open for 30 days with no activity. |
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
|
|
||
| from xml.sax import make_parser, ContentHandler, \ | ||
| SAXException, SAXReaderNotAvailable, SAXParseException | ||
| from xml.sax.handler import feature_namespaces, feature_namespace_prefixes |
There was a problem hiding this comment.
Can this line be dropped? It looks like both of these symbols are imported twice now?
There was a problem hiding this comment.
Indeed, removed it.
| * Add support for `namespace prefixes <https://www.w3.org/TR/xml-names/#dt-prefix>`_. | ||
| (Contributed by Yassir Karroum in :gh:`118317`.) | ||
|
|
|
@hartwork thanks a lot for the review ! |
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
| if state: | ||
| raise SAXNotSupportedException( | ||
| "expat does not report namespace prefixes") | ||
| self._namespace_prefixes = state |
There was a problem hiding this comment.
@ukarroum I wonder what the expected behaviors is here when feature_namespaces is currently disabled. Or more generally speaking: Do we know anything about the relation between these two features? Potential options seem to be:
- Block enabling/disabling when their dependency would be violated.
- Auto-enable/disable to respect dependency at all times
- Keep the two fully separate but check for both rather than just
self._namespace_prefixeswhere checking for being active?
Just thinking aloud.
There was a problem hiding this comment.
good point, I think it will make sense to just enable feature_namespaces when feature_namespace_prefixes is also enabled (and also disable feature_namespace_prefixes if feature_namespaces is disabled).
I will update both set/get feature
There was a problem hiding this comment.
actually one drawback I see for this is :
enable feature_namespace_prefixes
disable feature_namespaces
enable feature_namespaces
this will result in feature_namespace_prefixes being disabled which feels odd.
There was a problem hiding this comment.
I went with 1) instead.
I pushed a small change to check if namespace is enabled when we enable namespace prefixes.
There was a problem hiding this comment.
@ukarroum I like it. I think one small dedicated test for that would be great where namespaces are disabled and the exception is raise with the expected message. (Arguably, the success case is already covered by existing tests.)
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! : please review the changes made to this pull request. |
Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
|
@ukarroum thanks for the adjustments! @picnixz @AA-Turner I believe this is ready for official review now, could you have a look? Thank you! 🙏 |

Uh oh!
There was an error while loading. Please reload this page.