# STOP — READ THIS FIRST
**This file must be read first by all AI agents.**
## What Is This?
This is the AI manifest for **docudactyl** — a multi-format HPC document
extraction engine targeting British Library scale (~170M items).
## Canonical Locations
### Machine-Readable Metadata: `.machine_readable/` ONLY
| File | Purpose |
|------|---------|
| `.machine_readable/6a2/STATE.a2ml` | Project state, progress, blockers |
| `.machine_readable/6a2/META.a2ml` | Architecture decisions, governance |
| `.machine_readable/6a2/ECOSYSTEM.a2ml` | Position in ecosystem, relationships |
| `.machine_readable/6a2/AGENTIC.a2ml` | AI agent interaction patterns |
| `.machine_readable/6a2/NEUROSYM.a2ml` | Neurosymbolic integration config |
| `.machine_readable/6a2/PLAYBOOK.a2ml` | Operational runbook |
**CRITICAL:** These files must NEVER be created in the repository root.
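The root-placement rule lends itself to an automated check. The following is a minimal sketch, not part of the repository: the function name `misplaced_state_files` is hypothetical, and only the six file names are taken from the table above.

```python
from pathlib import Path

# File names come from the table above; everything else here is illustrative.
STATE_FILES = (
    "STATE.a2ml",
    "META.a2ml",
    "ECOSYSTEM.a2ml",
    "AGENTIC.a2ml",
    "NEUROSYM.a2ml",
    "PLAYBOOK.a2ml",
)


def misplaced_state_files(repo_root):
    """Return the state-file names that exist directly in the repository root."""
    root = Path(repo_root)
    return sorted(name for name in STATE_FILES if (root / name).is_file())
```

An empty list means the rule holds; any returned name points at a file that should live under `.machine_readable/6a2/` instead.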
### Agent Instructions
| File | Scope |
|------|-------|
| `0-AI-MANIFEST.a2ml` | Universal AI entry point (this file) |
| `.claude/CLAUDE.md` | Claude-specific patterns |
## Repository Structure
```
docudactyl/
├── 0-AI-MANIFEST.a2ml  # THIS FILE
├── src/
│   ├── chapel/  # HPC engine (hot path)
│   │   ├── DocudactylHPC.chpl  # Main entry point
│   │   ├── Config.chpl  # Runtime config (--manifestPath, etc.)
│   │   ├── ManifestLoader.chpl
│   │   ├── FFIBridge.chpl  # C FFI declarations
│   │   ├── FaultHandler.chpl
│   │   ├── ProgressReporter.chpl
│   │   ├── ShardedOutput.chpl
│   │   ├── ResultAggregator.chpl
│   │   ├── Checkpoint.chpl  # Resume after node failure
│   │   └── ContentType.chpl
│   ├── Docudactyl/ABI/  # Idris2 formal proofs (compile-time)
│   │   ├── Types.idr  # ContentKind, ParseStatus, ParseResult
│   │   ├── Layout.idr  # Struct layout proofs (952 bytes, 8-byte aligned)
│   │   └── Foreign.idr  # FFI declarations with safety proofs
│   ├── ocaml/  # Offline Scheme transformer (not HPC)
│   ├── ada/  # Terminal UI (standalone)
│   └── julia/  # Legacy (frozen, replaced by Chapel)
├── ffi/zig/  # Zig FFI dispatcher (zero-cost C wrapper)
│   ├── src/docudactyl_ffi.zig  # Multi-format parser (Poppler, Tesseract, FFmpeg, etc.)
│   ├── build.zig
│   └── test/integration_test.zig
├── generated/abi/  # C headers generated from Idris2 ABI
├── deploy/
│   ├── Containerfile  # Multi-stage build (Wolfi runtime)
│   └── slurm-docudactyl.sh  # Slurm job script (64+ nodes)
├── .machine_readable/  # ALL structured metadata
└── .well-known/  # RFC 9116, ai.txt, humans.txt
```
## Core Invariants
1. **No state files in root** — `.machine_readable/` is authoritative
2. **Chapel is the hot path** — OCaml/Julia are NOT called during HPC runs
3. **Zig FFI has zero runtime overhead** — compiles to same code as direct C calls
4. **No banned patterns in Idris2** — no `believe_me`, `assert_total`, `unsafePerformIO`
5. **License: PMPL-1.0-or-later** on all original code
6. **Author: Jonathan D.A. Jewell** `<j.d.a.jewell@open.ac.uk>`
7. **Containers use Podman** — never Docker. Files named `Containerfile`, never `Dockerfile`
8. **Base images: Chainguard Wolfi** — `cgr.dev/chainguard/wolfi-base:latest`
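Invariant 4 can be approximated with a plain text scan. This is a rough sketch under stated assumptions: `find_banned` is a hypothetical helper, it matches whole identifiers only, and it does not parse Idris2, so occurrences inside comments or string literals are reported too.

```python
import re

# The three identifiers named in invariant 4.
BANNED = ("believe_me", "assert_total", "unsafePerformIO")

# \b keeps longer identifiers such as `my_believe_me` from matching.
_PATTERN = re.compile(r"\b(" + "|".join(BANNED) + r")\b")


def find_banned(source):
    """Return (line_number, identifier) pairs for each banned name in the text."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_no, match.group(1)))
    return hits
```

Running it over the contents of each `.idr` file and failing when the result is non-empty would give a coarse pre-commit gate; a real check would need to skip comments.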
## Architecture (Hot Path)
```
Manifest (170M paths) → Chapel (block-distribute across N locales)
  → forall: Zig FFI (detect format, dispatch to C library)
      → Poppler (PDF), Tesseract (Image OCR), FFmpeg (Audio/Video),
        libxml2 (EPUB), GDAL (GeoSpatial), libvips (Image metadata)
  → Per-locale sharded output + checkpoint files
  → Global reduce → run-report.scm + run-report.json
```
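The first and last stages of the pipeline above can be illustrated in a few lines. This is a sketch in Python rather than Chapel, and it makes assumptions: `block_ranges` and `global_reduce` are hypothetical names, and the even split shown here is one common block-partitioning rule, not necessarily the exact formula Chapel's block distribution applies.

```python
from collections import Counter


def block_ranges(n_items, n_locales):
    """Split range(n_items) into n_locales contiguous half-open (start, stop) blocks."""
    if n_locales < 1:
        raise ValueError("n_locales must be at least 1")
    base, extra = divmod(n_items, n_locales)
    ranges, start = [], 0
    for locale in range(n_locales):
        # The first `extra` locales take one additional item each.
        size = base + (1 if locale < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def global_reduce(per_locale_tallies):
    """Sum per-locale status tallies into one run-wide tally."""
    total = Counter()
    for tally in per_locale_tallies:
        total.update(tally)
    return total
```

For example, `block_ranges(10, 3)` gives three blocks of sizes 4, 3, and 3, and each locale would process only the manifest entries in its own block before its tally is folded into the global result.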
## Session Startup
1. Read this manifest
2. Read `.machine_readable/6a2/STATE.a2ml` for current progress
3. Read `.claude/CLAUDE.md` for code patterns
4. Check `TOPOLOGY.md` for completion dashboard
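The startup order above can be expressed as data so that a tool can confirm each file is present before an agent session begins. This is a minimal sketch: `startup_plan` is a hypothetical helper, and placing `TOPOLOGY.md` in the repository root is an assumption, since the manifest does not state its location.

```python
from pathlib import Path

# Read order from the list above. The TOPOLOGY.md location is assumed to be the root.
STARTUP_ORDER = (
    "0-AI-MANIFEST.a2ml",
    ".machine_readable/6a2/STATE.a2ml",
    ".claude/CLAUDE.md",
    "TOPOLOGY.md",
)


def startup_plan(repo_root):
    """Return (relative_path, exists) pairs in the order they should be read."""
    root = Path(repo_root)
    return [(rel, (root / rel).is_file()) for rel in STARTUP_ORDER]
```

A caller could read the files whose flag is `True` in sequence and warn about any that are missing.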
## Meta
- Format Version: 1.0.0
- Created: 2026-02-20
- License: PMPL-1.0-or-later
- Protocol: https://github.com/hyperpolymath/0-ai-gatekeeper-protocol