Open
Labels
3.13 (bugs and security fixes), 3.14 (bugs and security fixes), 3.15 (new features, bugs and security fixes), stdlib (Standard Library Python modules in the Lib/ directory), type-bug (An unexpected behavior, bug, or error)
Description
Bug report
Bug description:
When parsing the input `<![CDATA[]]>`, the `unknown_decl` hook is incorrectly called with the corrupted, partial string `'CDATA['`.
A correct parser has only two acceptable behaviors:
- If CDATA is supported: call `handle_cdata('')`.
- If the declaration is "unrecognized": the `unknown_decl` hook must receive the entire content inside `<!...>`, which would be `'[CDATA[]]'`.
The actual result (`'CDATA['`) matches neither of those. I use the private `_set_support_cdata(True)` method here because, as far as I can tell, it is the only available way to activate the code path that exposes the bug.
```python
from html.parser import HTMLParser

class CdataBugParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.unknown_decls = []

    def unknown_decl(self, data):
        self.unknown_decls.append(data)

html_input = "<![CDATA[]]>"
parser = CdataBugParser()
parser._set_support_cdata(True)
parser.feed(html_input)
print(parser.unknown_decls)
```

```
['CDATA[']
```

CPython versions tested on:
3.12
Operating systems tested on:
Linux
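
To make the comparison concrete, here is a minimal sketch that prints the value this report expects for the "unrecognized" case next to what the parser actually passes to `unknown_decl`. The names `expected_unknown_decl`, `RecordingParser`, and `observed_unknown_decls` are hypothetical helpers written for this illustration, not part of `html.parser`. Because `_set_support_cdata` is private and may be absent on some CPython versions, the sketch only calls it when it exists, so the observed value can differ between versions.

```python
from html.parser import HTMLParser


def expected_unknown_decl(markup):
    """Return everything between '<!' and the final '>' of a declaration.

    This encodes the expectation stated in this report for the
    "unrecognized" case; it is not taken from the html.parser docs.
    """
    if not (markup.startswith("<!") and markup.endswith(">")):
        raise ValueError("not a '<!...>' declaration")
    return markup[2:-1]


class RecordingParser(HTMLParser):
    """Hypothetical helper: records every unknown_decl() call."""

    def __init__(self):
        super().__init__()
        self.unknown_decls = []

    def unknown_decl(self, data):
        self.unknown_decls.append(data)


def observed_unknown_decls(markup):
    """Feed markup to the parser and return the recorded unknown_decl data."""
    parser = RecordingParser()
    # Assumption: _set_support_cdata is a private method that may not
    # exist on every CPython version, so only call it when present.
    enable_cdata = getattr(parser, "_set_support_cdata", None)
    if enable_cdata is not None:
        enable_cdata(True)
    parser.feed(markup)
    parser.close()
    return parser.unknown_decls


markup = "<![CDATA[]]>"
print("expected:", [expected_unknown_decl(markup)])
print("observed:", observed_unknown_decls(markup))
```

Turning the final two lines into an `assertEqual` would give a regression test for whichever of the two behaviors is decided to be correct.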