Skip to content

PSPDFKit-labs/openclaw-nutrient-pdf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Nutrient PDF Plugin for OpenClaw

Nutrient-powered PDF extraction that replaces the default pdfjs text extractor with structured Markdown output -- tables, headings, and reading order preserved.

Table comparison: pdfjs word soup vs Nutrient structured markdown

Why

OpenClaw's default PDF extractor (pdfjs) produces plain text. It scores 0.000 on table structure and 0.000 on heading preservation across 200 real documents.

When an agent asks "what's in row 3, column 4?" it is parsing word soup. Nutrient produces structured Markdown with proper table rows and columns that agents can look up directly.

Benchmark scores: pdfjs vs Nutrient across 200 documents

Benchmark (200 documents, opendataloader-bench)

Metric pdfjs Nutrient Change
Overall accuracy 0.578 0.880 +52%
Table structure 0.000 0.662 --
Heading fidelity 0.000 0.811 --
Reading order 0.871 0.924 +6%

Scored with NID (reading order), TEDS (table structure), and MHS (heading fidelity).

Install

openclaw plugins install @nutrient-sdk/openclaw-nutrient-pdf
openclaw config set agents.defaults.pdfExtraction.engine auto

The first command installs the plugin. The second tells OpenClaw to use Nutrient for PDF extraction with automatic pdfjs fallback.

Verify:

openclaw nutrient-pdf status

What it does

  • The existing pdf tool automatically uses Nutrient when the engine is set to auto
  • nutrient_pdf_extract tool is available for agents to explicitly request Nutrient extraction
  • openclaw nutrient-pdf extract <file.pdf> extracts a PDF from the command line
  • Falls back to pdfjs if the Nutrient CLI is not installed or fails

All processing runs locally. No cloud uploads, no API keys.

Configuration

Optional settings in your OpenClaw config:

{
  plugins: {
    entries: {
      "nutrient-pdf": {
        config: {
          command: "pdf-to-markdown",  // path to CLI binary
          timeoutMs: 30000,            // extraction timeout per document
        }
      }
    }
  }
}

Free tier

The pdf-to-markdown CLI includes 1,000 free documents per month. See nutrient.io for higher-volume licensing.

Links

License

MIT -- see LICENSE for details and third-party dependency notice.

About

Nutrient-powered PDF extraction plugin for OpenClaw. 52% better accuracy on 200-document benchmark.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors