
Commit c24cf76

improve readme documentation
1 parent b4a59a7 commit c24cf76


1 file changed, +4 -11 lines changed


README.rst

Lines changed: 4 additions & 11 deletions
@@ -22,23 +22,16 @@
 HTMLement
 ---------
 
-Why another Python HTML Parser? There is no "HTML Parser" in the "Python" Standard Library.
-Actually, there is the `html.parser.HTMLParser`_ that simply "traverses the DOM tree" and allows me to be notified as
-each tag is being parsed. Usually, when "parsing HTML" I want to query its elements and extract data from it.
+HTMLement is a pure Python HTML Parser.
 
-There are a few third party "HTML parsers" available like "lxml", "html5lib" and "beautifulsoup".
-* "lxml" is the best "parser" available, fast and reliable but since it requires "C libraries", it's not always possible to install.
-* "html5lib" is a "pure-python library" and is designed to conform to the "WHATWG HTML" specification. But it is very slow at parsing HTML.
-* "beautifulsoup" is also a "pure-python library" but is considered by most to be "very slow".
-
-The "Object" of this project is to be a "pure-python HTML parser" which is also "faster" than "beautifulsoup".
+The object of this project is to be a "pure-python HTML parser" which is also "faster" than "beautifulsoup".
 And like "beautifulsoup", will also parse invalid html.
-The most simple way to do this is to use `XPath expressions`__.
+
+The most simple way to do this is to use ElementTree `XPath expressions`__.
 Python does support a simple (read limited) XPath engine inside its "ElementTree" module.
 A benefit of using "ElementTree" is that it can use a "C implementation" whenever available.
 
 This "HTML Parser" extends `html.parser.HTMLParser`_ to build a tree of `ElementTree.Element`_ instances.
-The returned "root element" natively supports the ElementTree API.
 
 Install
 -------
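The README says HTMLement extends html.parser.HTMLParser to build a tree of ElementTree.Element instances and that it copes with invalid HTML. The following is a minimal sketch of how that idea can work, assuming only the Python 3 standard library; it is not HTMLement's own code. The class TreeBuilderParser, the function parse_html, the synthetic "root" wrapper element and the VOID_ELEMENTS subset are hypothetical names chosen for illustration, and the recovery rule for bad markup (close the nearest matching open tag, ignore stray end tags) is a deliberately simple assumption.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A subset of HTML's void elements (assumed for this sketch): they never get an
# end tag, so they are attached to the tree but never kept open on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class TreeBuilderParser(HTMLParser):
    """Hypothetical sketch: turn HTMLParser callbacks into ElementTree.Element nodes."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so entities arrive decoded in handle_data
        self.root = ET.Element("root")  # synthetic wrapper for top-level nodes
        self._stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes like "disabled".
        attrib = {name: ("" if value is None else value) for name, value in attrs}
        elem = ET.SubElement(self._stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self._stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag (implicitly closing anything
        # opened inside it); an end tag with no matching open element is ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # ElementTree keeps text before the first child in .text and text that
        # follows a child in that child's .tail.
        parent = self._stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_html(source):
    """Parse an HTML string and return the synthetic root Element."""
    parser = TreeBuilderParser()
    parser.feed(source)
    parser.close()
    return parser.root


root = parse_html('<div id="main"><p class="a">Hello<br>world</p><p>Bye</p></div>')
print(root.find(".//br").tail)  # world
```

Because the result is a plain Element, the standard ElementTree methods (find, findall, iter) work on it directly. Unclosed markup such as parse_html("<p>a<b>b</p>c") still yields a p element with a b child, and "c" ends up as the p element's tail, because the open b is closed implicitly when the p end tag arrives.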
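The README points out that Python's ElementTree module ships a simple, limited XPath engine. A short illustration of that subset, assuming Python 3 and the standard xml.etree.ElementTree module: descendant search, attribute-value predicates and positional predicates are accepted, while XPath functions such as contains() are rejected with a SyntaxError. The snippet is parsed with ET.fromstring, an XML parser, so the markup has to be well-formed; the markup and the names doc, nav_links, hrefs, first_among_siblings and supports_functions are made up for this example.

```python
import xml.etree.ElementTree as ET

# Well-formed markup, because ET.fromstring is an XML parser, not an HTML one.
doc = ET.fromstring(
    "<html><body>"
    "<a href='/one' class='nav'>One</a>"
    "<a href='/two'>Two</a>"
    "<div><a href='/three' class='nav'>Three</a></div>"
    "</body></html>"
)

# Descendant search (.//) combined with an attribute-value predicate.
nav_links = doc.findall(".//a[@class='nav']")
hrefs = [a.get("href") for a in nav_links]
print(hrefs)  # ['/one', '/three']

# Positional predicate: every <a> that is the first <a> among its siblings.
first_among_siblings = [a.get("href") for a in doc.findall(".//a[1]")]
print(first_among_siblings)  # ['/one', '/three']

# XPath functions are outside ElementTree's subset and fail to compile.
try:
    doc.findall(".//a[contains(@href, 'two')]")
    supports_functions = True
except SyntaxError:
    supports_functions = False
print(supports_functions)  # False
```

Queries that need functions or axes beyond this subset have to be expressed in Python instead, for example by filtering doc.iter("a") with an ordinary condition.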
