
Convert HTML to clean Markdown with one API call. Turn HTML from CMS exports, web scrapers, WYSIWYG editors, and emails into deterministic GitHub Flavored Markdown. No per-project libraries. No maintenance. Same input → same output every time


precisionsolutionstech-netizen/html-to-markdown-normalizer-api


HTML to Markdown Converter API — Examples & Documentation

Try HTML to Markdown Converter API on RapidAPI


Try the API on RapidAPI →




What is the HTML to Markdown Converter API?

The HTML to Markdown Converter API is a REST API that converts arbitrary HTML into clean, deterministic Markdown. It strips scripts, styles, layout noise, and tracking attributes while preserving semantic structure—headings, lists, tables, links, images, and code blocks. Output is GitHub Flavored Markdown (GFM) compatible with GitHub, Notion, static site generators, and LLM pipelines.

Key Features

| Feature | Description |
|---|---|
| Deterministic | Same HTML + options → same Markdown every time |
| Clean output | Strips scripts, event handlers, data-* attributes, inline styles |
| Malformed HTML | Best-effort parsing; handles unclosed tags, invalid nesting |
| Stateless | No data stored or logged; 25MB max per request |
| Three modes | strict, readable (default), llm-friendly |

Use Cases

  • CMS migration — WordPress, Notion, Drupal HTML exports → Markdown
  • Web scraping — Normalize scraped HTML before search indexing or analytics
  • WYSIWYG output — TinyMCE, Quill, CKEditor HTML → version-controlled Markdown
  • LLM pipelines — Clean text for embeddings, RAG, or prompt context
  • Documentation — Migrate HTML docs to Markdown for GitHub, MkDocs, Docusaurus
  • Email processing — Extract readable content from HTML emails

Also Searchable As

HTML to Markdown API • HTML Markdown converter • CMS HTML to MD • WYSIWYG to Markdown • GitHub Flavored Markdown API • scraped HTML converter • document conversion API • content migration Markdown • LLM text preprocessing API


Quick Start

Endpoint: POST /convert
Try it: RapidAPI — HTML to Markdown Converter

cURL

curl -X POST "https://html-to-markdown-converter1.p.rapidapi.com/convert" \
  -H "Content-Type: application/json" \
  -H "x-rapidapi-key: YOUR_RAPIDAPI_KEY" \
  -H "x-rapidapi-host: html-to-markdown-converter1.p.rapidapi.com" \
  -d '{"html":"<h1>Hello</h1><p>World <strong>bold</strong></p>","mode":"readable"}'

JavaScript / Node.js

const response = await fetch('https://html-to-markdown-converter1.p.rapidapi.com/convert', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-rapidapi-key': 'YOUR_RAPIDAPI_KEY',
    'x-rapidapi-host': 'html-to-markdown-converter1.p.rapidapi.com'
  },
  body: JSON.stringify({
    html: '<h1>Hello</h1><p>World <strong>bold</strong></p>',
    mode: 'readable'
  })
});
const { markdown } = await response.json();
console.log(markdown); // "# Hello\n\nWorld **bold**"

Python

import requests

url = "https://html-to-markdown-converter1.p.rapidapi.com/convert"
headers = {
    "Content-Type": "application/json",
    "x-rapidapi-key": "YOUR_RAPIDAPI_KEY",
    "x-rapidapi-host": "html-to-markdown-converter1.p.rapidapi.com"
}
payload = {
    "html": "<h1>Hello</h1><p>World <strong>bold</strong></p>",
    "mode": "readable"
}
response = requests.post(url, json=payload, headers=headers)
data = response.json()
print(data["markdown"])

Raw HTML (text/html)

Send HTML directly as the request body. Mode defaults to readable.

curl -X POST "https://html-to-markdown-converter1.p.rapidapi.com/convert" \
  -H "Content-Type: text/html" \
  -H "x-rapidapi-key: YOUR_RAPIDAPI_KEY" \
  -H "x-rapidapi-host: html-to-markdown-converter1.p.rapidapi.com" \
  -d '<h1>Hello</h1><p>World</p>'
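The same raw-HTML call can be made with Python's standard library when `requests` is not available. This is a sketch, not an official client: `build_raw_html_request` and `convert_raw_html` are hypothetical helper names, and it assumes the JSON response shape shown in the API Reference. Building the request separately from sending it lets you inspect or unit-test the request without a network call.

```python
import json
import os
import urllib.request

API_URL = "https://html-to-markdown-converter1.p.rapidapi.com/convert"
API_HOST = "html-to-markdown-converter1.p.rapidapi.com"


def build_raw_html_request(html: str, api_key: str) -> urllib.request.Request:
    """Build (without sending) a POST /convert request whose body is raw HTML."""
    return urllib.request.Request(
        API_URL,
        data=html.encode("utf-8"),  # the body is the HTML itself, as UTF-8 bytes
        headers={
            "Content-Type": "text/html",  # no "mode" field, so the API uses readable
            "x-rapidapi-key": api_key,
            "x-rapidapi-host": API_HOST,
        },
        method="POST",
    )


def convert_raw_html(html: str) -> str:
    """Send the request and return the "markdown" field of the JSON response."""
    request = build_raw_html_request(html, os.environ["RAPIDAPI_KEY"])
    # urlopen raises urllib.error.HTTPError for non-2xx responses
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["markdown"]
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the stored key is `Content-type`, not `Content-Type`, if you read headers back.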

Real-World Examples

Example 1: CMS Content Pipeline

Ingest HTML from a CMS or third-party API and convert to Markdown for your search index or storage layer.

const API_URL = 'https://html-to-markdown-converter1.p.rapidapi.com/convert';
const RAPIDAPI_HEADERS = {
  'Content-Type': 'application/json',
  'x-rapidapi-key': process.env.RAPIDAPI_KEY,
  'x-rapidapi-host': 'html-to-markdown-converter1.p.rapidapi.com'
};

async function ingestCmsArticle(cmsHtml) {
  const res = await fetch(API_URL, {
    method: 'POST',
    headers: RAPIDAPI_HEADERS,
    body: JSON.stringify({ html: cmsHtml, mode: 'readable' })
  });
  if (!res.ok) throw new Error(`Conversion failed with HTTP ${res.status}`);
  const { markdown } = await res.json();
  return markdown; // Ready for Elasticsearch, DB, or file storage
}

Example 2: LLM Pipeline — RAG & Embeddings

Prepare clean text from web or CMS content for embeddings or RAG. Use llm-friendly mode for link references and predictable structure.

// Reuses API_URL and RAPIDAPI_HEADERS from Example 1
async function prepareForEmbedding(htmlContent) {
  const res = await fetch(API_URL, {
    method: 'POST',
    headers: RAPIDAPI_HEADERS,
    body: JSON.stringify({
      html: htmlContent,
      mode: 'llm-friendly',
      includeMetadata: true
    })
  });
  if (!res.ok) throw new Error(`Conversion failed with HTTP ${res.status}`);
  const { markdown, metadata } = await res.json();
  // Feed markdown to OpenAI, Cohere, or your embedding model
  return { markdown, charCount: metadata?.outputCharacterCount };
}
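Example 2 passes a whole document's Markdown to the embedding model. Long pages usually need splitting first, and because headings come out as ATX headings (`#`, `##`, ...), splitting at headings is one simple option. The Python sketch below is a hypothetical helper, not part of the API: `split_by_headings` starts a new section at each heading line, ignores `#` lines inside fenced code, and enforces no size cap, so very long sections may still need a second split.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6} ")  # ATX heading: 1 to 6 '#' followed by a space
FENCE = "`" * 3  # marker that opens and closes a fenced code block


def split_by_headings(markdown: str) -> List[str]:
    """Split Markdown into sections that each begin at a heading.

    Lines inside fenced code blocks never start a section, and any text before
    the first heading becomes its own section.
    """
    sections: List[List[str]] = [[]]
    in_fence = False
    for line in markdown.split("\n"):
        if line.startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence and HEADING.match(line) and sections[-1]:
            sections.append([])  # close the current section, start a new one
        sections[-1].append(line)
    # Join each section back into text and drop empty ones
    return [text for text in ("\n".join(lines).strip() for lines in sections) if text]
```

Each returned section can then be embedded separately, with its heading line kept as context.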

Example 3: Batch Doc Migration (WordPress → Markdown)

Convert WordPress or Notion exports in parallel. One schema, one service.

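async function migrateDocsToMarkdown(htmlChunks) {
  // Starts every request at once; for very large exports, consider capping concurrency
  const results = await Promise.all(
    htmlChunks.map(async html => {
      const res = await fetch('https://html-to-markdown-converter1.p.rapidapi.com/convert', {
        method: 'POST',
        headers: {
          'Content-Type': 'text/html', // raw HTML body; mode defaults to readable
          'x-rapidapi-key': process.env.RAPIDAPI_KEY,
          'x-rapidapi-host': 'html-to-markdown-converter1.p.rapidapi.com'
        },
        body: html
      });
      if (!res.ok) throw new Error(`Conversion failed with HTTP ${res.status}`);
      return res.json();
    })
  );
  return results.map(r => r.markdown);
}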
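Example 3 starts every request at the same time. For large exports you may want to cap how many conversions run in parallel, for instance to stay within your RapidAPI plan's limits. A Python sketch using a thread pool; `migrate_docs` is a hypothetical name, and `convert` stands for any function that turns one HTML string into Markdown (for example, a wrapper around POST /convert).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def migrate_docs(
    html_chunks: Sequence[str],
    convert: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Convert many HTML documents with at most `max_workers` calls in flight.

    Results come back in the same order as `html_chunks`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert, html_chunks))
```

`pool.map` preserves input order. If any conversion raises, the whole call raises when that result is reached and no partial list is returned, so wrap `convert` yourself if you prefer to collect failures instead.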

Example 4: Web Scraper → Search Index

Normalize scraped HTML before indexing. Strip ads, scripts, and layout; keep structure.

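import os

import requests

# Read the key from the environment instead of hard-coding it
RAPIDAPI_KEY = os.environ["RAPIDAPI_KEY"]


def scrape_and_convert(url: str) -> str:
    # The API does not fetch URLs, so download the page first
    page = requests.get(url, timeout=30)
    page.raise_for_status()
    res = requests.post(
        "https://html-to-markdown-converter1.p.rapidapi.com/convert",
        headers={
            "Content-Type": "application/json",
            "x-rapidapi-key": RAPIDAPI_KEY,
            "x-rapidapi-host": "html-to-markdown-converter1.p.rapidapi.com"
        },
        json={"html": page.text, "mode": "strict"},
        timeout=60,
    )
    res.raise_for_status()  # surface 413, 422, and other errors as exceptions
    return res.json()["markdown"]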

API Reference

POST /convert

Convert HTML to Markdown.

Request body (application/json):

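| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| html | string | Yes | | Raw HTML to convert |
| mode | string | No | readable | strict, readable, or llm-friendly |
| includeMetadata | boolean | No | false | Include input and output character counts, the mode used, and the number of tags removed |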

Alternative: Send Content-Type: text/html with raw HTML as body. Mode defaults to readable.

Modes:

  • strict — Minimal Markdown; maximum cleanup
  • readable — Balanced; human-readable (default)
  • llm-friendly — Link references; predictable structure for LLMs
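The request body rules above can be checked on the client before spending a request. A minimal Python sketch; `build_payload` is a hypothetical helper, and treating whitespace-only HTML as missing is an assumption about what the server would reject.

```python
from typing import Any, Dict

MODES = ("strict", "readable", "llm-friendly")


def build_payload(html: str, mode: str = "readable", include_metadata: bool = False) -> Dict[str, Any]:
    """Return a JSON body for POST /convert after basic client-side checks."""
    if not isinstance(html, str) or not html.strip():
        # Assumed to be rejected server-side as well (likely as MISSING_HTML)
        raise ValueError("html must be a non-empty string")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    payload: Dict[str, Any] = {"html": html, "mode": mode}
    if include_metadata:
        payload["includeMetadata"] = True  # omitted otherwise; the API default is false
    return payload
```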

Response (200):

{
  "markdown": "# Hello\n\nWorld **bold**"
}

With includeMetadata: true:

{
  "markdown": "# Hello\n\nWorld",
  "metadata": {
    "originalCharacterCount": 45,
    "outputCharacterCount": 18,
    "mode": "readable",
    "tagsRemoved": 2
  }
}
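A Python sketch that reads either response shape into one object, assuming the field names shown above. `Conversion` and `parse_conversion` are hypothetical names, and `size_ratio` is a derived value (output characters divided by input characters), not something the API returns.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Conversion:
    markdown: str
    original_chars: Optional[int] = None
    output_chars: Optional[int] = None
    mode: Optional[str] = None
    tags_removed: Optional[int] = None

    @property
    def size_ratio(self) -> Optional[float]:
        """Output size as a fraction of input size, if metadata was returned."""
        if not self.original_chars or self.output_chars is None:
            return None
        return self.output_chars / self.original_chars


def parse_conversion(body: str) -> Conversion:
    """Parse a 200 response body; metadata fields stay None when not requested."""
    data = json.loads(body)
    meta = data.get("metadata") or {}
    return Conversion(
        markdown=data["markdown"],
        original_chars=meta.get("originalCharacterCount"),
        output_chars=meta.get("outputCharacterCount"),
        mode=meta.get("mode"),
        tags_removed=meta.get("tagsRemoved"),
    )
```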

Error codes: MISSING_HTML, INVALID_HTML, PAYLOAD_TOO_LARGE (HTTP 413), UNRECOVERABLE_PARSE_FAILURE (HTTP 422)
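A Python sketch for turning failed responses into exceptions. The error-body format is not documented in this README, so `extract_error_code` guesses (an assumption) that the code appears as a top-level `"code"` or `"error"` string or inside an `"error"` object; when it finds nothing, it falls back to the two status codes the FAQ documents (413 and 422). All names here are hypothetical. Since output is deterministic, resending the same input after one of these errors will fail the same way, so fix or split the input instead.

```python
import json
from typing import Optional

KNOWN_CODES = {"MISSING_HTML", "INVALID_HTML", "PAYLOAD_TOO_LARGE", "UNRECOVERABLE_PARSE_FAILURE"}
# Status codes documented in the FAQ; used when the body names no known code
STATUS_FALLBACK = {413: "PAYLOAD_TOO_LARGE", 422: "UNRECOVERABLE_PARSE_FAILURE"}


class ConversionError(Exception):
    """A non-2xx response from POST /convert."""

    def __init__(self, status: int, code: Optional[str]):
        super().__init__(f"HTTP {status}: {code or 'unrecognized error'}")
        self.status = status
        self.code = code


def extract_error_code(body: str) -> Optional[str]:
    """Look for a documented error code in an error body (the shape is assumed)."""
    try:
        data = json.loads(body)
    except ValueError:  # body is not JSON at all
        return None
    if not isinstance(data, dict):
        return None
    error = data.get("error")
    candidates = [data.get("code"), error]
    if isinstance(error, dict):
        candidates.append(error.get("code"))
    for value in candidates:
        if isinstance(value, str) and value in KNOWN_CODES:
            return value
    return None


def raise_for_conversion_error(status: int, body: str) -> None:
    """Do nothing on 2xx; otherwise raise ConversionError with the best-known code."""
    if 200 <= status < 300:
        return
    raise ConversionError(status, extract_error_code(body) or STATUS_FALLBACK.get(status))
```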

GET /health

Returns { "status": "ok" }.
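Before a long batch job, you might poll GET /health until it reports ok. A Python sketch; routing /health through the same RapidAPI host and headers as /convert is an assumption, and `fetch_health` and `wait_until_healthy` are hypothetical names. The fetch function is passed in so the retry logic can be tested without a network.

```python
import json
import time
import urllib.request
from typing import Callable

# ASSUMPTION: /health is exposed through the same RapidAPI host as /convert
HEALTH_URL = "https://html-to-markdown-converter1.p.rapidapi.com/health"


def fetch_health(api_key: str) -> dict:
    """Call GET /health once and return the parsed JSON body."""
    request = urllib.request.Request(
        HEALTH_URL,
        headers={
            "x-rapidapi-key": api_key,
            "x-rapidapi-host": "html-to-markdown-converter1.p.rapidapi.com",
        },
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `fetch` until it returns {"status": "ok"}, backing off exponentially."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except OSError:
            pass  # network failure (urllib's URLError is an OSError); try again
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    return False
```

Typical use is `wait_until_healthy(lambda: fetch_health(os.environ["RAPIDAPI_KEY"]))` before starting a migration.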


Conversion Coverage

| HTML | Markdown |
|---|---|
| Headings (h1–h6) | `#`, `##`, ... |
| Paragraphs | Blank-line separated |
| strong, em | `**bold**`, `*italic*` |
| Links | `[text](url)` |
| Images | `![alt](src)` |
| Lists (ul, ol) | `-` or `1.` |
| Tables | GitHub-flavored tables |
| Code blocks | Fenced `` ``` `` blocks |
| Blockquotes | `>` |

Stripped: Scripts, styles, event handlers, data-* attributes, inline styles, javascript: links, input/button elements, SVG. Iframes and videos are converted to links.
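If you want a regression guard in a pipeline, you can scan the returned Markdown for constructs from the stripped list. A hypothetical Python sketch, not an API feature: `audit_markdown` flags leftover `<script>` or `<style>` tags and `javascript:` link targets, including reference-style link definitions such as those llm-friendly mode may produce. A hit inside a fenced code sample (for example, a tutorial that shows HTML) can be legitimate, so treat results as review flags rather than failures.

```python
import re
from typing import List

# Constructs the API documents as stripped; a match means "review this output"
SUSPICIOUS = {
    "script tag": re.compile(r"<\s*script\b", re.IGNORECASE),
    "style tag": re.compile(r"<\s*style\b", re.IGNORECASE),
    # Inline links "](javascript:...)" or reference definitions "[1]: javascript:..."
    "javascript: link": re.compile(
        r"(\]\(|^\s{0,3}\[[^\]]+\]:)\s*<?\s*javascript:", re.IGNORECASE | re.MULTILINE
    ),
}


def audit_markdown(markdown: str) -> List[str]:
    """Return the names of documented-as-stripped constructs found in the output."""
    return [name for name, pattern in SUSPICIOUS.items() if pattern.search(markdown)]
```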


Related APIs

Explore more developer tools from Precision Solutions Tech on RapidAPI:

| API | Description |
|---|---|
| HTML to Markdown Converter | This API: convert HTML to Markdown |
| JSON Schema Validator | Validate JSON against structural schemas |
| JSON Diff Checker | Detect breaking changes between JSON versions |
| JSON Payload Consistency Checker | Detect data consistency issues in JSON |
| API Error & Status Normalization | Canonical error taxonomy and retry guidance |
| Sensitive Data Detection & Redaction | Detect and redact PII in text |
| Job Posting Normalization | Normalize job postings from 15+ job boards |
| Calendar Event Normalization | Normalize calendar events from Google, Outlook, Apple |

View all APIs →


FAQ

Can I send raw HTML without JSON?

Yes. Use Content-Type: text/html and send the HTML as the request body. Mode defaults to readable.

Does the API fetch URLs?

No. You must send the HTML in the request body. URL fetching is not supported.

What if the HTML is malformed?

The API uses best-effort parsing. It handles unclosed tags, invalid nesting, and partial fragments. If parsing fails entirely, it returns 422 with UNRECOVERABLE_PARSE_FAILURE.

Is the output deterministic?

Yes. Same input + same options always produce the same Markdown. No randomness or timestamps.
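Because the same HTML and options always produce the same Markdown, results can be memoized. A Python sketch keyed by a SHA-256 hash of the input and mode; `cache_key` and `ConversionCache` are hypothetical, and `convert` stands for your own API wrapper. If you persist the cache, consider adding a version tag of your own to the key in case the service's output changes between releases.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(html: str, mode: str = "readable") -> str:
    """SHA-256 over a canonical JSON encoding of the HTML and the mode."""
    canonical = json.dumps({"html": html, "mode": mode}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ConversionCache:
    """In-memory memoization of conversions, valid because output is deterministic."""

    def __init__(self, convert: Callable[[str, str], str]):
        self._convert = convert  # your own wrapper around POST /convert
        self._store: Dict[str, str] = {}
        self.calls = 0  # how many times `convert` actually ran

    def get(self, html: str, mode: str = "readable") -> str:
        key = cache_key(html, mode)
        if key not in self._store:
            self._store[key] = self._convert(html, mode)
            self.calls += 1
        return self._store[key]
```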

What Markdown dialect is used?

GitHub Flavored Markdown (GFM)—tables, fenced code blocks, standard syntax. Compatible with GitHub, Notion, MkDocs, Docusaurus, and most renderers.

Is my data stored or logged?

No. The API is stateless. HTML is processed in memory and discarded.

Can I use this for LLM pipelines?

Yes. Use mode: "llm-friendly" for link references and predictable structure. Output is suitable for embeddings, RAG, and prompt context.

What's the maximum payload size?

25MB per request. Larger payloads return 413 PAYLOAD_TOO_LARGE.
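To avoid a round trip that ends in 413, you can estimate the body size before sending. A Python sketch; reading "25MB" as 25,000,000 bytes is a conservative assumption (the limit may be 25 MiB), and this README does not say whether the limit applies to the whole body or only the HTML. In JSON mode the escaped body is larger than the HTML itself, so the sketch measures the serialized body. `MAX_BODY_BYTES`, `json_body_size`, and `fits_limit` are hypothetical names.

```python
import json

# ASSUMPTION: "25MB" read conservatively as 25,000,000 bytes of request body
MAX_BODY_BYTES = 25_000_000


def json_body_size(html: str, mode: str = "readable") -> int:
    """Bytes a JSON request body would occupy with Python's json.dumps defaults."""
    # json.dumps escapes non-ASCII as \uXXXX by default, so this errs on the
    # large side compared with clients that send raw UTF-8
    return len(json.dumps({"html": html, "mode": mode}).encode("utf-8"))


def fits_limit(html: str, mode: str = "readable", raw: bool = False) -> bool:
    """True if the body should stay under the limit; raw=True means a text/html body."""
    size = len(html.encode("utf-8")) if raw else json_body_size(html, mode)
    return size <= MAX_BODY_BYTES
```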


Try HTML to Markdown Converter API on RapidAPI · All APIs by Precision Solutions Tech
