Open-source platform for extracting structured data from documents using AI.
Updated May 15, 2025 - JavaScript
Open-source spreadsheet platform for deep research and document processing
Document AI. Extract structured data like JSON, Markdown and HTML from documents using LLMs and AI agents.
Open-source toolkit for reliable RAG pipelines: convert PDFs to Markdown, clean documents, inspect chunks, compare chunking strategies, and enrich metadata for LLM applications.
Effortlessly extract information from unstructured data with this library, which uses advanced AI techniques. Compose AI into customizable pipelines that draw on diverse sources for your projects.
A lightweight MCP (Model Context Protocol) server for integrating ComPDF AI with Claude Desktop, enabling AI-powered intelligent document processing and data extraction from PDFs via natural language.
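Final project for a digital image processing course: extracting and rectifying documents in photos. The overall approach detects straight lines with the Hough transform, derives the document's corner points from them, and then rectifies the document with a projective (perspective) transform. The project uses OpenCV only for image I/O; everything else (convolution, the Hough transform, the projective transform, etc.) is implemented by hand.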
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
Turn any website into clean, LLM-ready data. Open-source web crawler with stealth mode, distributed crawling, real-time WebSocket progress & Markdown output. Power your AI apps with GcrawlAI.
Guidance on deploying a generative AI document analysis solution with Amazon Bedrock AgentCore. It auto-classifies, enhances, and aggregates multi-type documents using Gestalt-informed vision prompts, and includes a custom analyzer creation wizard, scripted CDK deployment, and a Gradio frontend.
Parse any file in opencode. Supports PDF, DOCX, XLSX, PPTX, images, EPUB, HTML, Markdown, Jupyter, archives, and plain text.
Cloud-native document extraction platform — SaaS at kreuzberg.dev or self-host on any Kubernetes cluster. 90+ formats, REST API, webhooks. Built on Kreuzberg.
Tool to allow extraction of data from legal documents
AI-powered contract analysis tool
n8n community node for Veryfi document AI: extract structured data from invoices, receipts, checks, bank statements, W-9/W-2, IDs, and any document via blueprints.
AIVisionText is an advanced document analysis platform that harnesses the power of artificial intelligence (AI) to revolutionize the way you manage and extract insights from documents.
The official Laiye ADP CLI and skill built for humans and agents. It enables agents to perform parsing, classification, extraction, validation on any type of documents with high accuracy.
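The document-rectification course project listed above describes a three-step pipeline: Hough lines, then corner points, then a projective transform. A minimal NumPy sketch of the last two steps might look like the following. It is an illustration under stated assumptions, not the project's own code. The function names are hypothetical, lines are assumed to arrive in Hough normal form (rho, theta), the four corners are assumed to be ordered consistently between source and destination, and resampling uses nearest neighbour for brevity.

```python
import numpy as np

# Hypothetical helpers: a sketch of "Hough lines -> corners -> projective transform".

def line_intersection(rho1, theta1, rho2, theta2):
    """Intersect two lines in Hough normal form x*cos(t) + y*sin(t) = rho."""
    a = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    b = np.array([rho1, rho2])
    # Raises numpy.linalg.LinAlgError for parallel lines (singular matrix).
    x, y = np.linalg.solve(a, b)
    return float(x), float(y)

def homography_from_corners(src, dst):
    """Solve the 3x3 projective transform H (with h33 = 1) mapping 4 src points to 4 dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, x, y):
    """Map one point through H, dividing out the homogeneous coordinate."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

def warp_perspective(image, H_dst_to_src, out_h, out_w):
    """Inverse-map every output pixel through H and sample the source (nearest neighbour)."""
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, sw = H_dst_to_src @ pts
    sx = np.clip(np.rint(sx / sw).astype(int), 0, image.shape[1] - 1)
    sy = np.clip(np.rint(sy / sw).astype(int), 0, image.shape[0] - 1)
    return image[sy, sx].reshape(out_h, out_w, *image.shape[2:])
```

To rectify a page, one would intersect the four detected border lines to get the corners, call `homography_from_corners(output_rect, detected_corners)` so that H maps output pixels back into the photo, and pass that matrix to `warp_perspective`. Inverse mapping is used so that every output pixel receives a value, which avoids the holes that forward mapping leaves.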