Conversation
It runs together in the built HTML.
It's not much use if it's private.
Run "tox -e doc" to build the documentation in doc/_build.
Right now the docs have entries for re-exports like html5lib.__init__.HTMLParser, including full class documentation. This is redundant with the docs for html5lib.html5parser.HTMLParser, which is a public name anyway, so I think that it is best to be explicit that this is a re-export.
HTMLTokenizer is now a private API (I cannot find a public export). HTMLSanitizer no longer exists as a tokenizer, and has been replaced with a filter.
|
The sanitizer still exists, but it got rewritten as a filter and is now in Just a drive-by comment on the off chance it helps. |
|
@willkg Yup, and the new form is documented. The old docs just weren't removed. |
Still lots more to do, as html5lib's sanitization likes to escape tags instead of dropping them. I ended up fixing up the html5lib docs while working on this: html5lib/html5lib-python#332
willkg
left a comment
There was a problem hiding this comment.
I'm really sorry this has been sitting around so long. I appreciate you working on it and fixing so many issues!
I have some comments. If you have time, can you look through them? If not, let me know and I can work through them.
Thank you! Looking forward to landing this!
html5lib/__init__.py
Outdated
|
|
||
| __all__ = ["HTMLParser", "parse", "parseFragment", "getTreeBuilder", | ||
| "getTreeWalker", "serialize"] | ||
| "getTreeWalker", "serialize", "__version__"] |
There was a problem hiding this comment.
I don't think we want __version__ to get imported when importing *.
There was a problem hiding this comment.
As __version__ is part of the public interface of this module, which __all__ defines, I think it should be here.
There was a problem hiding this comment.
I wouldn't think about __all__ that way. It's really for specifying what gets exported if the user does from html5lib import *. I don't want __version__ to get exported in that scenario. Please undo this change.
html5lib/__init__.py
Outdated
| "getTreeWalker", "serialize", "__version__"] | ||
|
|
||
| # this has to be at the top level, see how setup.py parses this | ||
| #: Distribution version number, which asymptotically approaches 1. |
There was a problem hiding this comment.
I'd remove this. Amongst other things, it's not going to be true for much longer.
There was a problem hiding this comment.
Thank goodness! Will do.
|
|
||
| * :class:`lint.Filter <html5lib.filters.lint.Filter>` raises | ||
| ``LintError`` exceptions on invalid tag and attribute names, invalid | ||
| :exc:`AssertionError` exceptions on invalid tag and attribute names, invalid |
There was a problem hiding this comment.
Is it really an AssertionError? If so, we should write up an issue to change that.
There was a problem hiding this comment.
Yeah, the implementation is basically all assert statements:
html5lib-python/html5lib/filters/lint.py
Lines 24 to 79 in 7bbde54
doc/movingparts.rst
Outdated
| ~~~~~~~ | ||
|
|
||
| You can alter the stream content with filters provided by html5lib: | ||
| html5lib provides several filters |
There was a problem hiding this comment.
Given that what follows is a bulleted list, can you add a : to the end of this line?
doc/movingparts.rst
Outdated
| html5lib provides a few tools for consuming token streams: | ||
|
|
||
| * :class:`~html5lib.serializer.HTMLSerializer`, to generate a stream of bytes; and | ||
| * filters, to manipulate the token stream. |
There was a problem hiding this comment.
The leader has "a few tools", but this list has two items. Further, the items seem like a sentence. I'd either change the bullet list into a sentence or unsentencify the bullet items. Maybe something like this?:
html5lib provides two tools for consuming token streams:
* :class:`~html5lib.serializer.HTMLSerializer` for generating a stream of bytes
* filters for manipulating the token stream
There was a problem hiding this comment.
I ended up going with a sentence, rather than the bulleted list, as there aren't exactly two (really there's the serializer and a bunch of filters), so it's really two categories of token consumer, but that distinction isn't really useful to point out.
| ================ | ||
|
|
||
| :mod:`html5lib` Package | ||
| ----------------------- |
There was a problem hiding this comment.
Why take the header out here?
There was a problem hiding this comment.
Otherwise there are two headers in a row with exactly the same text.
reST doesn't support nesting inline markup, so this shows up in rendered for with the backticks.
|
I think that I have addressed all the issues you noted as appropriate. Please let me know if anything else is required! Thanks! |
willkg
left a comment
There was a problem hiding this comment.
I have one outstanding issue with __version__ being listed in __all__. Otherwise this is good to go. Thank you!
|
@twm Thank you so much for this! |
When trying to figure out how to sanitize some HTML I noticed that the docs have lagged behind the implementation. So I removed the mention of
HTMLSanitizerand then... things got out of hand. Summary of changes:tox -e docenvironment so that it's easy to build the docs.__version__tohtml5lib.__all__html5lib.treeadapters.__all__(as this caused Sphinx warnings and didn't really make sense).I didn't squash as suggested in CONTRIBUTING.md because recent PRs don't seem to be following that procedure. I am happy to do so if you like, or to try to break this into smaller PRs.