HeaderId Extension marked as Pending Deprecation.

waylan · waylan · commit 5389174da277 · 2015-01-01T13:25:04.000-05:00
Use the Table of Contents Extension instead. The HeaderId Extension will
raise a PendingDeprecationWarning.
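
Python's default warning filters ignore `PendingDeprecationWarning`, so the new warning is easy to miss. A minimal sketch of one way to surface it in a test suite follows; `make_extension` is a hypothetical stand-in for constructing the HeaderId Extension, used only so the example runs without the library installed.

```python
import warnings


def make_extension():
    # Hypothetical stand-in for HeaderIdExtension.__init__, which this
    # commit changes to emit the warning below.
    warnings.warn(
        'The HeaderId Extension is pending deprecation. '
        'Use the TOC Extension instead.',
        PendingDeprecationWarning
    )
    return object()


# Record every warning raised inside the block, including categories
# that the default filters would otherwise ignore.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    make_extension()

messages = [
    str(w.message) for w in caught
    if issubclass(w.category, PendingDeprecationWarning)
]
```

Running the interpreter with `-W always` (or `-W error` to turn the warning into an exception) has a similar effect for a whole program.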

The last few features of the HeaderId extension were migrated to TOC
including the baselevel and separator config options. Also, the
marker config option of TOC can be set to an empty string to disable
searching for a marker.
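
The effect of `baselevel` can be pictured as simple arithmetic on header levels. The sketch below is an illustration only and not the extension's code; the function name and the cap at level 6 are assumptions.

```python
def shift_header_level(level, baselevel=1):
    """Map a document header level (1-6) onto a template hierarchy.

    Assumption: levels are shifted by ``baselevel - 1`` and capped at 6,
    since HTML has no header element below <h6>.
    """
    return min(level + baselevel - 1, 6)


# With baselevel=3, "# Header" becomes <h3> and "## Header" becomes <h4>.
shifted = [shift_header_level(n, baselevel=3) for n in (1, 2, 5)]
```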

The `slugify`, `unique` and `stashedHTML2text` functions are now defined
in the TOC extension in preparation for the HeaderId extension being
removed. All corresponding tests are now run against the TOC Extension.
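
For reference, the behavior of the two helpers most likely to be imported by third parties can be sketched standalone. This is a copy of the definitions removed from `headerid.py` in this commit, with raw-string regexes added. It assumes the definitions in the TOC extension behave the same way.

```python
import re
import unicodedata

IDCOUNT_RE = re.compile(r'^(.*)_([0-9]+)$')


def slugify(value, separator):
    """ Slugify a string, to make it URL friendly. """
    # Drop anything that cannot be represented in ASCII.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    # Keep only word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value.decode('ascii')).strip().lower()
    # Collapse runs of whitespace (and of the separator) into one separator.
    return re.sub(r'[%s\s]+' % separator, separator, value)


def unique(id, ids):
    """ Ensure id is unique in set of ids. Append '_1', '_2'... if not """
    while id in ids or not id:
        m = IDCOUNT_RE.match(id)
        if m:
            id = '%s_%d' % (m.group(1), int(m.group(2)) + 1)
        else:
            id = '%s_%d' % (id, 1)
    ids.add(id)
    return id


seen = set()
header_ids = [unique(slugify('Header', '-'), seen) for _ in range(3)]
```

After this runs, `header_ids` holds `header`, `header_1` and `header_2`, matching the example added to the TOC documentation.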

The meta-data support of the HeaderId Extension was not migrated and no plan
exists to make that migration. The `forceid` config option makes no sense in
the TOC Extension and the only other config setting supported by meta-data
was the `header_level`. However, as that depends on the template, it makes
more sense not to define it at the document level.
diff --git a/docs/extensions/header_id.txt b/docs/extensions/header_id.txt
@@ -15,6 +15,13 @@ elements (`h1`-`h6`) in the resulting HTML document.
 
 This extension is included in the standard Markdown library.
 
+!!! warning
+    This extension is **Pending Deprecation**. The [Table of Contents][toc]
+    Extension should be used instead, which offers most of the features of this
+    extension and more.
+    
+[toc]: toc.html
+
 Syntax
 ------
 
@@ -55,7 +62,7 @@ The following options are provided to configure the output:
         >>>  text = '''
         ... #Some Header
         ... ## Next Level'''
-		>>> from markdown.extensions.headerid import HeaderIdExtension
+        >>> from markdown.extensions.headerid import HeaderIdExtension
         >>> html = markdown.markdown(text, extensions=[HeaderIdExtension(level=3)])
         >>> print html
         <h3 id="some_header">Some Header</h3>
diff --git a/docs/extensions/toc.txt b/docs/extensions/toc.txt
@@ -18,6 +18,20 @@ This extension is included in the standard Markdown library.
 Syntax
 ------
 
+By default, all headers will automatically have unique `id` attributes 
+generated based upon the text of the header. Note this example, in which all
+three headers would have the same `id`:
+
+    #Header
+    #Header
+    #Header
+
+Results in:
+
+    <h1 id="header">Header</h1>
+    <h1 id="header_1">Header</h1>
+    <h1 id="header_2">Header</h1>
+
 Place a marker in the document where you would like the Table of Contents to
 appear. Then, a nested list of all the headers in the document will replace the
 marker. The marker defaults to `[TOC]` so the following document:
@@ -41,6 +55,14 @@ would generate the following output:
     <h1 id="header-1">Header 1</h1>
     <h1 id="header-2">Header 2</h1>
 
+Regardless of whether a `marker` is found in the document (or disabled), the Table of
+Contents is available as an attribute (`toc`) on the Markdown class. This allows
+one to insert the Table of Contents elsewhere in their page template. For example:
+
+    >>> md = markdown.Markdown(extensions=['markdown.extensions.toc'])
+    >>> html = md.convert(text)
+    >>> page = render_some_template(context={'body': html, 'toc': md.toc})
+
 Usage
 -----
 
@@ -53,37 +75,57 @@ configuring extensions.
 The following options are provided to configure the output:
 
 * **`marker`**:
-    Text to find and replace with the Table of Contents. Defaults
-    to `[TOC]`.
+    Text to find and replace with the Table of Contents. Defaults to `[TOC]`.
+    
+    Set to an empty string to disable searching for a marker, which may save some time,
+    especially on long documents.
 
-    Regardless of whether a `marker` is found in the document, the Table of Contents is
-    also available as an attribute (`toc`) of the Markdown class. This allows one to insert
-    the Table of Contents elsewhere in their page template. For example:
+* **`title`**:
+    Title to insert in the Table of Contents' `<div>`. Defaults to `None`.
 
-        >>> text = '''
-        # Header 1
+* **`anchorlink`**:
+    Set to `True` to cause all headers to link to themselves. Default is `False`.
 
-        ## Header 2
-        '''
-        >>> md = markdown.Markdown(extensions=['markdown.extensions.toc'])
-        >>> html = md.convert(text)
-        >>> render_some_template(context={'body': html, 'toc': md.toc})
+* **`permalink`**:
+    Set to `True` or a string to generate permanent links at the end of each header.
+    Useful with Sphinx stylesheets.
+    
+    When set to `True` the paragraph symbol (&para; -- `&para;`) is used as the link
+    text. When set to a string, the provided string is used as the link text.
+
+* **`baselevel`**:
+    Base level for headers.
+
+    Default: `1`
+
+    The `baselevel` setting allows the header levels to be automatically adjusted to
+    fit within the hierarchy of your html templates. For example, suppose the 
+    Markdown text for a page should not contain any headers higher than level 3
+    (`<h3>`). The following will accomplish that:
+
+        >>> text = '''
+        ... #Some Header
+        ... ## Next Level'''
+        >>> from markdown.extensions.toc import TocExtension
+        >>> html = markdown.markdown(text, extensions=[TocExtension(baselevel=3)])
+        >>> print html
+        <h3 id="some-header">Some Header</h3>
+        <h4 id="next-level">Next Level</h4>
 
 * **`slugify`**:
-    Callable to generate anchors based on header text. Defaults to a built in
-    `slugify` method. The callable must accept two arguments, the first
-    contains the text content of the header and the second contains the
-    separator. It should then return a string which will be used as the anchor
-    text.
+    Callable to generate anchors.
 
-* **`title`**:
-    Title to insert in the Table of Contents' `<div>`. Defaults to `None`.
+    Default: `markdown.extensions.toc.slugify`
 
-* **`anchorlink`**:
-    Setting to `True` will cause the headers link to themselves. Default is
-    `False`.
+    In order to use a different algorithm to define the id attributes, define and
+    pass in a callable which takes the following two arguments:
 
-* **`permalink`**:
-    Set to `True` to have this extension generate a Sphinx-style permanent links
-    near the headers (for use with Sphinx stylesheets).
+    * `value`: The string to slugify.
+    * `separator`: The Word Separator.
+
+    The callable must return a string appropriate for use in HTML `id` attributes.
+
+* **`separator`**:
+    Word separator. Character which replaces whitespace in id.
 
+    Default: `-`
diff --git a/docs/release-2.6.txt b/docs/release-2.6.txt
@@ -96,6 +96,19 @@ Backwards-incompatible Changes
   be used instead. See the [documentation](reference.html#extension-configs) 
   for a full explaination of the current behavior.
 
+*   The [HeaderId][hid] Extension is pending deprecation and will raise a
+    **`PendingDeprecationWarning`** in version 2.6. The extension will be
+    deprecated in version 2.7 and raise an error in version 2.8. Use the
+    [Table of Contents][TOC] Extension instead, which offers most of the
+    features of the HeaderId Extension and more (support for meta data is missing).
+  
+    Extension authors who have been using the `slugify` and `unique` functions
+    defined in the HeaderId Extension should note that those functions are now
+    defined in the Table of Contents extension and should adjust their import
+    statements accordingly (`from markdown.extensions.toc import slugify, unique`).
+
+[hid]: extensions/header_id.html
+
 What's New in Python-Markdown 2.6
 ---------------------------------
 
@@ -110,15 +123,29 @@ What's New in Python-Markdown 2.6
 [Meta-Data]: extensions/meta_data.html
 [YAML]: http://yaml.org/
 
-*   The [TOC] Extension has been refactored. Significantly, the extension now
-    assigns the Table of Contents to the `toc` attrbibute of the Markdown class
-    regardless of whether a "marker" was found in the document. Third party
-    frameworks no longer need to insert a "marker," run the document through
-    Markdown, then extract the TOC from the document.
+*   The [Table of Contents][TOC] Extension has been refactored and some new features
+    have been added.  See the documentation for a full explanation of each feature
+    listed below:
+
+      * The extension now assigns the Table of Contents to the `toc` attribute of
+        the Markdown class regardless of whether a "marker" was found in the document.
+        Third party frameworks no longer need to insert a "marker," run the document
+        through Markdown, then extract the TOC from the document.
     
-    Additionaly, the TOC Extension is now a "registered extension." Therefore,
-    when the `reset` method of the Markdown class is called, the `toc` attribute
-    on the Markdown class is cleared (set to an empty string).
+      * The TOC Extension is now a "registered extension." Therefore, when the `reset`
+        method of the Markdown class is called, the `toc` attribute on the Markdown
+        class is cleared (set to an empty string).
+  
+      * When the `marker` config option is set to an empty string, the parser completely
+        skips the process of searching the document for markers. This should save parsing
+        time when the TOC Extension is being used only to assign ids to headers.
+
+      * A `separator` config option has been added allowing users to override the
+        separator character used by the slugify function.
+  
+      * A `baselevel` config option has been added allowing users to set the base level
+        of headers in their documents (h1-h6). This allows the header levels to be
+        automatically adjusted to fit within the hierarchy of an html template.
 
 [TOC]: extensions/toc.html
 
diff --git a/markdown/extensions/headerid.py b/markdown/extensions/headerid.py
@@ -19,64 +19,13 @@
 from __future__ import unicode_literals
 from . import Extension
 from ..treeprocessors import Treeprocessor
-from ..util import HTML_PLACEHOLDER_RE, parseBoolValue
-import re
+from ..util import parseBoolValue
+from .toc import slugify, unique, stashedHTML2text
 import logging
-import unicodedata
+import warnings
 
 logger = logging.getLogger('MARKDOWN')
-
-IDCOUNT_RE = re.compile(r'^(.*)_([0-9]+)$')
-
-
-def slugify(value, separator):
-    """ Slugify a string, to make it URL friendly. """
-    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
-    value = re.sub('[^\w\s-]', '', value.decode('ascii')).strip().lower()
-    return re.sub('[%s\s]+' % separator, separator, value)
-
-
-def unique(id, ids):
-    """ Ensure id is unique in set of ids. Append '_1', '_2'... if not """
-    while id in ids or not id:
-        m = IDCOUNT_RE.match(id)
-        if m:
-            id = '%s_%d' % (m.group(1), int(m.group(2))+1)
-        else:
-            id = '%s_%d' % (id, 1)
-    ids.add(id)
-    return id
-
-
-def itertext(elem):
-    """ Loop through all children and return text only.
-
-    Reimplements method of same name added to ElementTree in Python 2.7
-
-    """
-    if elem.text:
-        yield elem.text
-    for e in elem:
-        for s in itertext(e):
-            yield s
-        if e.tail:
-            yield e.tail
-
-
-def stashedHTML2text(text, md):
-    """ Extract raw HTML, reduce to plain text and swap with placeholder. """
-    def _html_sub(m):
-        """ Substitute raw html with plain text. """
-        try:
-            raw, safe = md.htmlStash.rawHtmlBlocks[int(m.group(1))]
-        except (IndexError, TypeError):
-            return m.group(0)
-        if md.safeMode and not safe:
-            return ''
-        # Strip out tags and entities - leaveing text
-        return re.sub(r'(<[^>]+>)|(&[\#a-zA-Z0-9]+;)', '', raw)
-
-    return HTML_PLACEHOLDER_RE.sub(_html_sub, text)
+logging.captureWarnings(True)
 
 
 class HeaderIdTreeprocessor(Treeprocessor):
@@ -94,7 +43,7 @@ def run(self, doc):
                     if "id" in elem.attrib:
                         id = elem.get('id')
                     else:
-                        id = stashedHTML2text(''.join(itertext(elem)), self.md)
+                        id = stashedHTML2text(''.join(elem.itertext()), self.md)
                         id = slugify(id, sep)
                     elem.set('id', unique(id, self.IDs))
                 if start_level:
@@ -127,6 +76,11 @@ def __init__(self, *args, **kwargs):
 
         super(HeaderIdExtension, self).__init__(*args, **kwargs)
 
+        warnings.warn(
+            'The HeaderId Extension is pending deprecation. Use the TOC Extension instead.',
+            PendingDeprecationWarning
+        )
+
     def extendMarkdown(self, md, md_globals):
         md.registerExtension(self)
         self.processor = HeaderIdTreeprocessor()
diff --git a/markdown/extensions/toc.py b/markdown/extensions/toc.py
diff --git a/tests/test_extensions.py b/tests/test_extensions.py