feat: add AG2 multi-agent document processing example by faridun-ag2 · Pull Request #4326 · Unstructured-IO/unstructured

faridun-ag2 · 2026-04-07T18:49:26Z

Summary

Add an example demonstrating the AG2 (formerly AutoGen) multi-agent framework combined with Unstructured for intelligent document processing and analysis.

Also addresses #4320 (confidence scores) by showcasing the existing detection_class_prob metadata — agents can filter low-confidence extractions for better RAG quality.

What's included

examples/ag2_multiagent_document_processing/run.py — Main example script
examples/ag2_multiagent_document_processing/test_e2e.py — 17 end-to-end tests
examples/ag2_multiagent_document_processing/README.md — Setup instructions

How it works

Unstructured partitions a document into structured elements (Title, NarrativeText, Table, ListItem, etc.), using the Hi-Res strategy for PDFs
The AG2 Document Agent wraps Unstructured as a registered tool that agents can call
The AG2 Analyst Agent receives the extracted content and produces a comprehensive analysis
The agents collaborate via AG2 GroupChat with automatic tool execution

Key features

25+ file types: PDF, HTML, Word, PowerPoint, email, images (OCR), and more
Multi-agent pipeline: Document extraction → Analysis → Summary
Confidence score filtering (#4320): Elements display [conf=X.XX] scores from the Hi-Res detection model; agents can filter low-confidence extractions via the filter_low_confidence tool
Tool registration: Unstructured partitioning exposed as AG2 tool via decorator pattern
End-to-end tests: 16 offline tests + 1 live LLM test
CLI support: python run.py --file path/to/document.pdf
Uses existing example-docs/: No new test fixtures needed
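
The decorator-based tool registration mentioned above can be illustrated with a generic, standard-library-only sketch. This is not AG2's registration API: `ToolRegistry`, `tool`, `call`, and the placeholder `partition_document` body are hypothetical names used only to show the pattern of registering a function under a name so that an agent loop can look it up and invoke it.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Hypothetical minimal registry; AG2's real registration API differs."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def tool(self, name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        """Return a decorator that stores the function under `name`."""

        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._tools[name] = func
            return func  # the function stays usable as a normal callable

        return decorator

    def call(self, name: str, **kwargs) -> str:
        """Look up a registered tool by name and invoke it."""
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.tool("partition_document")
def partition_document(file_path: str) -> str:
    # Placeholder body (assumption): the real example would call Unstructured here.
    return f"partitioned {file_path}"


result = registry.call("partition_document", file_path="report.pdf")
print(result)
```

In the actual example, the registered function would presumably return the extracted elements as text for the Analyst Agent rather than a placeholder string.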

Confidence scores demo (addresses #4320)

When processing PDFs with the Hi-Res strategy, each element shows its detection confidence:

[1] (Title) [conf=0.48] LayoutParser: A Unified Toolkit...
[2] (NarrativeText) [conf=0.95] Recent advances in document image analysis...
[3] (ListItem) [conf=0.29] 4 Z. Shen et al.
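
As a rough illustration of how lines in this format could be produced, here is a minimal sketch. The `Element` dataclass and `format_element` helper are hypothetical stand-ins, not the PR's `format_elements_summary` or Unstructured's element classes; the `confidence` field is assumed to mirror the `detection_class_prob` metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned element."""

    category: str
    text: str
    confidence: Optional[float] = None  # assumed to mirror detection_class_prob


def format_element(index: int, el: Element, max_chars: int = 60) -> str:
    """Render one element as '[n] (Category) [conf=X.XX] text'."""
    text = el.text if len(el.text) <= max_chars else el.text[:max_chars] + "..."
    # Elements without a score simply omit the [conf=...] tag.
    conf = f" [conf={el.confidence:.2f}]" if el.confidence is not None else ""
    return f"[{index}] ({el.category}){conf} {text}"


elements = [
    Element("Title", "LayoutParser: A Unified Toolkit", 0.48),
    Element("NarrativeText", "Recent advances in document image analysis", 0.95),
    Element("ListItem", "4 Z. Shen et al.", 0.29),
]
lines = [format_element(i, el) for i, el in enumerate(elements, start=1)]
print("\n".join(lines))
```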

The filter_low_confidence AG2 tool lets agents filter noisy elements:

Elements below a threshold are removed
Elements without scores (HTML, non-Hi-Res) are kept by default
Stats report kept/filtered/no-score counts
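
The three rules above can be sketched as follows. This is an illustrative approximation, not the PR's `filter_elements_by_confidence` implementation: the dict layout, the `confidence` key, the `keep_unscored` flag, and the 0.5 default threshold are all assumptions.

```python
from typing import Dict, List, Tuple


def filter_by_confidence(
    elements: List[dict], threshold: float = 0.5, keep_unscored: bool = True
) -> Tuple[List[dict], Dict[str, int]]:
    """Drop elements scored below `threshold`; return (kept_elements, stats)."""
    kept: List[dict] = []
    stats = {"kept": 0, "filtered": 0, "no_score": 0}
    for el in elements:
        score = el.get("confidence")
        if score is None:
            # No score (e.g. HTML or non-Hi-Res output): counted separately.
            stats["no_score"] += 1
            if keep_unscored:
                kept.append(el)
        elif score >= threshold:
            stats["kept"] += 1
            kept.append(el)
        else:
            stats["filtered"] += 1
    return kept, stats


sample = [
    {"category": "Title", "confidence": 0.48},
    {"category": "NarrativeText", "confidence": 0.95},
    {"category": "ListItem", "confidence": 0.29},
    {"category": "Table"},  # unscored element
]
kept, stats = filter_by_confidence(sample, threshold=0.5)
print([el["category"] for el in kept], stats)
```

Counting unscored elements separately from kept ones keeps the stats honest: a high "no_score" count signals that the threshold had little effect on that document.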

Testing

# Offline tests (no API key needed) — 16 tests
pytest examples/ag2_multiagent_document_processing/test_e2e.py -v -k "not live_llm"

# Full pipeline test (requires OPENAI_API_KEY)
pytest examples/ag2_multiagent_document_processing/test_e2e.py -v -k "live_llm"

Why AG2?

AG2 is a major multi-agent framework with 500K+ monthly PyPI downloads and 4,300+ GitHub stars.
Unstructured is the natural document processing layer for AG2 agentic RAG pipelines — this example
shows how they work together.

Checklist

Code formatted with Black (100 char line)
Ruff checks pass
Pre-commit hooks pass (except pre-existing flake8 E501 repo-wide issue)
17 e2e tests included and passing
Example is self-contained and runnable
Follows conventional commit format
Demonstrates confidence scores from the Hi-Res strategy (#4320)

Demonstrate AG2 (formerly AutoGen) multi-agent framework with Unstructured for intelligent document processing and analysis.

Two agents collaborate:
- Document Agent: partitions docs via Unstructured (25+ file types)
- Analyst Agent: analyzes extracted elements, produces summaries

Includes:
- examples/ag2_multiagent_document_processing/run.py: main script
- examples/ag2_multiagent_document_processing/test_e2e.py: 12 e2e tests
- examples/ag2_multiagent_document_processing/README.md: setup guide

…red-IO#4320)

Address issue Unstructured-IO#4320 by leveraging existing detection_class_prob metadata:
- Use hi_res strategy by default for PDFs to get confidence scores
- format_elements_summary displays [conf=X.XX] and [origin=...] per element
- New filter_elements_by_confidence() filters low-confidence extractions
- New AG2 tool filter_low_confidence lets agents filter noisy elements
- find_sample_document prefers PDFs for richer confidence demo
- 5 new tests for confidence filtering (16 total offline tests)

faridun-ag2 · 2026-04-07T20:23:17Z

Hi @cragwolfe @qued!
Would appreciate a review when you get a chance! This adds an AG2 (multi-agent framework) integration example that uses Unstructured for document processing, and also demonstrates the existing detection_class_prob confidence scores relevant to #4320.

Happy to address any feedback. Thanks!

faridun-ag2 added 2 commits April 7, 2026 11:46

faridun-ag2 wants to merge 2 commits into Unstructured-IO:main from faridun-ag2:faridun/ag2-document-processing-example
