Nutrient-powered PDF extraction that replaces the default pdfjs text extractor with structured Markdown output -- tables, headings, and reading order preserved.
OpenClaw's default PDF extractor (pdfjs) produces plain text. It scores 0.000 on table structure and 0.000 on heading preservation across 200 real documents.
When an agent asks "what's in row 3, column 4?" it is parsing word soup. Nutrient produces structured Markdown with proper table rows and columns that agents can look up directly.
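To make the "row 3, column 4" point concrete, here is a minimal sketch of how a consumer could address cells once tables arrive as Markdown pipe tables. The function names `parse_markdown_table` and `cell` are hypothetical helpers written for this illustration, not part of the plugin or of OpenClaw, and the sample input is simply the benchmark table from this README. The parser is deliberately naive: it assumes one table and no escaped `|` characters inside cells.

```python
# Sample input: the benchmark table from this README, as Markdown.
TABLE = """\
| Metric | pdfjs | Nutrient | Change |
|---|---|---|---|
| Overall accuracy | 0.578 | 0.880 | +52% |
| Table structure | 0.000 | 0.662 | -- |
| Heading fidelity | 0.000 | 0.811 | -- |
| Reading order | 0.871 | 0.924 | +6% |
"""


def parse_markdown_table(markdown: str) -> list[list[str]]:
    """Parse the first pipe table in `markdown` into rows (header row first)."""
    rows: list[list[str]] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if rows:
                break  # the first table has ended
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the delimiter row (e.g. |---|:---:|), which holds only dashes and colons.
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    return rows


def cell(rows: list[list[str]], row: int, col: int) -> str:
    """Return a cell by 1-based data row and 1-based column (header not counted)."""
    return rows[row][col - 1]


rows = parse_markdown_table(TABLE)
print(cell(rows, 3, 3))  # prints "0.811" (Heading fidelity, Nutrient)
```

With plain-text output there are no cell boundaries to split on, so the same lookup would need layout heuristics instead of an index.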
| Metric | pdfjs | Nutrient | Change |
|---|---|---|---|
| Overall accuracy | 0.578 | 0.880 | +52% |
| Table structure | 0.000 | 0.662 | -- |
| Heading fidelity | 0.000 | 0.811 | -- |
| Reading order | 0.871 | 0.924 | +6% |
Scored with NID (reading order), TEDS (table structure), and MHS (heading fidelity).

    openclaw plugins install @nutrient-sdk/openclaw-nutrient-pdf
    openclaw config set agents.defaults.pdfExtraction.engine auto

The first command installs the plugin. The second tells OpenClaw to use Nutrient for PDF extraction with automatic pdfjs fallback.
Verify:

    openclaw nutrient-pdf status

- The existing `pdf` tool automatically uses Nutrient when the engine is set to `auto`
- `nutrient_pdf_extract` tool is available for agents to explicitly request Nutrient extraction
- `openclaw nutrient-pdf extract <file.pdf>` extracts a PDF from the command line
- Falls back to pdfjs if the Nutrient CLI is not installed or fails
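The fallback behavior in the last bullet can be pictured with a short control-flow sketch. This is not the plugin's implementation. It assumes, purely for illustration, that the CLI accepts the PDF path as its only argument and writes Markdown to stdout, and that a timeout counts as a failure; check the CLI's own usage text before relying on either. The `fallback` callable is a hypothetical stand-in for the pdfjs extractor, and `extract_with_fallback` is a name invented here.

```python
import subprocess
from typing import Callable


def extract_with_fallback(
    pdf_path: str,
    fallback: Callable[[str], str],
    command: str = "pdf-to-markdown",
    timeout_ms: int = 30000,
) -> tuple[str, str]:
    """Return (engine, text): try the CLI first, use `fallback` on any failure."""
    try:
        # ASSUMPTION: `<command> <file.pdf>` prints Markdown to stdout.
        result = subprocess.run(
            [command, pdf_path],
            capture_output=True,
            text=True,
            check=True,                 # non-zero exit raises CalledProcessError
            timeout=timeout_ms / 1000,  # subprocess expects seconds
        )
        return "nutrient", result.stdout
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # OSError covers a missing or non-executable binary (FileNotFoundError etc.).
        return "pdfjs", fallback(pdf_path)


# Demo with a binary name that does not exist, so the fallback path runs.
engine, text = extract_with_fallback(
    "report.pdf",
    fallback=lambda path: f"(plain text of {path})",  # hypothetical pdfjs stand-in
    command="nutrient-cli-that-is-not-installed",
)
print(engine)  # prints "pdfjs"
```

Returning the engine name alongside the text lets a caller tell whether structured Markdown or plain text came back.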
All processing runs locally. No cloud uploads, no API keys.
Optional settings in your OpenClaw config:

    {
      plugins: {
        entries: {
          "nutrient-pdf": {
            config: {
              command: "pdf-to-markdown", // path to CLI binary
              timeoutMs: 30000,           // extraction timeout per document
            }
          }
        }
      }
    }

The pdf-to-markdown CLI includes 1,000 free documents per month. See nutrient.io for higher-volume licensing.
MIT -- see LICENSE for details and third-party dependency notice.

