From 26ea62276b8f7feeb3a93bd78678e6cb0f5db770 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Andrzej=20Kobyli=C5=84ski?= <endrju397@gmail.com>
Date: Thu, 30 Apr 2026 17:28:21 +0200
Subject: [PATCH 1/2] feat: performance benchmarking module + baseline
 (KOJAK-68)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add `okapi-benchmarks` module using JMH (1.37) via the me.champeau.jmh
Gradle plugin. Provides reproducible measurements of fanout throughput
across transports and configurations.

Benchmarks:
- KafkaThroughputBenchmark — real Postgres + real Kafka via Testcontainers
- HttpThroughputBenchmark — real Postgres + WireMock with @Param httpLatencyMs
  injecting 0/20/100ms server-side delay (covers library-only ceiling vs
  realistic webhook scenarios)
- DelivererMicroBenchmark — single-entry deliver() with mocked I/O,
  measures pure code overhead (~1.8M ops/s for Kafka with MockProducer)

Methodology:
- @OperationsPerInvocation(1000) on throughput benchmarks so JMH ms/op
  directly reflects per-message cost (reciprocal = msg/s)
- @Setup(Level.Trial) starts containers; @Setup(Level.Invocation) populates
  fresh PENDING entries before each timed run
- Scheduler bypassed: `processor.processNext` called in a loop until drained
  to measure processing capacity, not polling cadence

Baseline results (smoke run, MacBook M3 Max, JDK 25, fork=1, warmup=1, iter=2):
- Kafka:      ~109-115 msg/s (flat across batchSize — confirms sync .get() bottleneck)
- HTTP @0ms:  ~1,500-3,300 msg/s (library + DB ceiling; WireMock has no real I/O)
- HTTP @20ms: ~33-36 msg/s (flat — sequential blocking dominates)
- HTTP @100ms: ~9 msg/s (flat — close to theoretical 1000/100 = 10 msg/s)

These numbers establish a reference point for upcoming optimizations
(KOJAK-70..79) — every subsequent change re-runs the suite and documents
before/after in `benchmarks/results-postopt-*.md`.

For release-quality numbers with confidence intervals, run
`./gradlew :okapi-benchmarks:jmh` (default fork=2, warmup=3, iterations=5).
---
 .gitignore                                    |   3 +
 benchmarks/README.md                          | 103 ++++
 benchmarks/baseline-http-with-latency.json    | 472 ++++++++++++++++++
 benchmarks/baseline-quick.json                | 310 ++++++++++++
 benchmarks/results-baseline-2026-04.md        | 162 ++++++
 gradle/libs.versions.toml                     |   5 +
 okapi-benchmarks/build.gradle.kts             |  52 ++
 .../benchmarks/DelivererMicroBenchmark.kt     |  91 ++++
 .../benchmarks/HttpThroughputBenchmark.kt     | 125 +++++
 .../benchmarks/KafkaThroughputBenchmark.kt    |  99 ++++
 .../support/KafkaBenchmarkSupport.kt          |  27 +
 .../support/PostgresBenchmarkSupport.kt       |  80 +++
 settings.gradle.kts                           |   1 +
 13 files changed, 1530 insertions(+)
 create mode 100644 benchmarks/README.md
 create mode 100644 benchmarks/baseline-http-with-latency.json
 create mode 100644 benchmarks/baseline-quick.json
 create mode 100644 benchmarks/results-baseline-2026-04.md
 create mode 100644 okapi-benchmarks/build.gradle.kts
 create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/DelivererMicroBenchmark.kt
 create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/HttpThroughputBenchmark.kt
 create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/KafkaThroughputBenchmark.kt
 create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/KafkaBenchmarkSupport.kt
 create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/PostgresBenchmarkSupport.kt

diff --git a/.gitignore b/.gitignore
index 86a412e..5a2feb2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -48,3 +48,6 @@ CLAUDE.md
 docs/specs/
 docs/plans/
 docs/superpowers/
+
+# benchmark logs (verbose console output; JSON results are kept)
+benchmarks/*.log
diff --git a/benchmarks/README.md b/benchmarks/README.md
new file mode 100644
index 0000000..867bab5
--- /dev/null
+++ b/benchmarks/README.md
@@ -0,0 +1,103 @@
+# Okapi Performance Benchmarks
+
+This directory contains benchmark methodology, run instructions, and historical results.
+
+## Running benchmarks
+
+All benchmarks live in the `okapi-benchmarks` module and use [JMH](https://openjdk.org/projects/code-tools/jmh/)
+via the [me.champeau.jmh](https://github.com/melix/jmh-gradle-plugin) Gradle plugin.
+
+### Full baseline (production-quality numbers)
+
+Default JMH config in `okapi-benchmarks/build.gradle.kts` uses:
+- `fork = 2` — isolated JVMs to neutralize JIT-profile variance
+- `warmupIterations = 3`, `warmup = 10s` — let JIT C2 settle
+- `iterations = 5`, `timeOnIteration = 30s` — statistically meaningful sample
+- `-Xms2g -Xmx2g -XX:+UseG1GC` — pinned memory and GC for reproducibility
+
+```sh
+./gradlew :okapi-benchmarks:jmh
+```
+
+Wall time: ~30 minutes (Testcontainers spin-up + 2 transports × 3 batchSize values × 8 iterations).
+
+Result JSON: `okapi-benchmarks/build/reports/jmh/results.json`
+
+### Quick smoke run
+
+For development iteration when you don't need statistically significant numbers:
+
+```sh
+./gradlew :okapi-benchmarks:jmhJar
+java -jar okapi-benchmarks/build/libs/okapi-benchmarks-jmh.jar \
+  "ThroughputBenchmark" -f 1 -wi 1 -i 2 -w 10s -r 15s
+```
+
+Wall time: ~5-8 minutes.
+
+### Single benchmark
+
+```sh
+java -jar okapi-benchmarks/build/libs/okapi-benchmarks-jmh.jar \
+  "KafkaThroughputBenchmark" -p batchSize=50 -f 1 -wi 1 -i 2
+```
+
+## What we measure
+
+### Throughput benchmarks (`*ThroughputBenchmark`)
+
+End-to-end pipeline: insert N PENDING entries, then call `OutboxProcessor.processNext()`
+in a tight loop until drained. The `OutboxScheduler` is bypassed deliberately — we measure
+**processing capacity**, not polling cadence (which is a deployment-time knob).
+
+- `KafkaThroughputBenchmark` — real Postgres + real Kafka via Testcontainers
+- `HttpThroughputBenchmark` — real Postgres + WireMock HTTP target with `@Param httpLatencyMs`
+  injecting `0`/`20`/`100` ms server-side delay (library-only ceiling vs realistic webhook)
+
+Reported as `ops/s` where one op = one delivered message (via `@OperationsPerInvocation`).
+
+### Microbenchmarks (`DelivererMicroBenchmark`)
+
+Single-entry `deliver()` calls with mocked I/O:
+- Kafka: `MockProducer` with auto-complete (no broker)
+- HTTP: WireMock on loopback
+
+Measures pure code overhead (JSON deserialization, record/request construction,
+exception classification). Useful as "did optimization X regress the hot path?" baseline.
+
+## How to read results
+
+Throughput benchmarks report **ops/s = msg/s** thanks to `@OperationsPerInvocation`.
+
+```
+Benchmark                                  (batchSize)   Mode  Cnt   Score   Error  Units
+KafkaThroughputBenchmark.drainAll                   10  thrpt    5  450.2 ± 18.3  ops/s
+KafkaThroughputBenchmark.drainAll                   50  thrpt    5  890.5 ± 22.1  ops/s
+KafkaThroughputBenchmark.drainAll                  100  thrpt    5  920.7 ± 31.4  ops/s
+```
+
+The `Score` is the headline number. The `Error` is a 99.9% confidence interval — a tight error
+(< 5% of score) means the result is trustworthy; a wide error means run more iterations or
+investigate variability sources (background processes, thermal throttling, GC).
+
+## Caveats — important for honest reporting
+
+- **Localhost Testcontainers ≠ production.** Kafka container on the same host has ~0.5ms RTT;
+  a real cluster typically has 5-50ms. Real-world throughput will be **2-10× lower** than these
+  benchmarks suggest. Treat numbers as **upper bounds** for the library's processing capacity.
+- **HTTP benchmark uses WireMock in-JVM**, which adds ~0.3 ms overhead per request (Jetty
+  servlet pipeline). At `httpLatencyMs=0` the measurement reflects "library + DB + WireMock
+  overhead", not pure library throughput. For tighter pomiar consider replacing WireMock
+  with `MockWebServer` (Square) — see [`results-baseline-2026-04.md`](results-baseline-2026-04.md)
+  notes on benchmark methodology.
+- **`httpLatencyMs` is server-side delay**, not network RTT. Real production webhook latency
+  is dominated by the target service's processing time + network — pick the value closest
+  to your target. With `httpLatencyMs=100`, sequential delivery is bounded at
+  `1000ms / 100ms = 10 msg/s/thread` regardless of library efficiency.
+- **Single-threaded scheduler.** Current `OutboxSchedulerConfig` does not expose `concurrency`.
+  Once that lands (planned), the throughput matrix will expand to `batchSize × concurrency`.
+
+## Historical baselines
+
+- [`results-baseline-2026-04.md`](results-baseline-2026-04.md) — pre-optimization baseline
+  (sync sequential delivery, single-threaded scheduler)
diff --git a/benchmarks/baseline-http-with-latency.json b/benchmarks/baseline-http-with-latency.json
new file mode 100644
index 0000000..fec4a1f
--- /dev/null
+++ b/benchmarks/baseline-http-with-latency.json
@@ -0,0 +1,472 @@
+[
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "10",
+            "httpLatencyMs" : "0"
+        },
+        "primaryMetric" : {
+            "score" : 0.661348474631579,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 0.6453783442105263,
+                "50.0" : 0.661348474631579,
+                "90.0" : 0.6773186050526315,
+                "95.0" : 0.6773186050526315,
+                "99.0" : 0.6773186050526315,
+                "99.9" : 0.6773186050526315,
+                "99.99" : 0.6773186050526315,
+                "99.999" : 0.6773186050526315,
+                "99.9999" : 0.6773186050526315,
+                "100.0" : 0.6773186050526315
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    0.6773186050526315,
+                    0.6453783442105263
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "10",
+            "httpLatencyMs" : "20"
+        },
+        "primaryMetric" : {
+            "score" : 30.321562104,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 30.130055958,
+                "50.0" : 30.321562104,
+                "90.0" : 30.51306825,
+                "95.0" : 30.51306825,
+                "99.0" : 30.51306825,
+                "99.9" : 30.51306825,
+                "99.99" : 30.51306825,
+                "99.999" : 30.51306825,
+                "99.9999" : 30.51306825,
+                "100.0" : 30.51306825
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    30.51306825,
+                    30.130055958
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "10",
+            "httpLatencyMs" : "100"
+        },
+        "primaryMetric" : {
+            "score" : 109.64384764600001,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 109.211303458,
+                "50.0" : 109.64384764600001,
+                "90.0" : 110.076391834,
+                "95.0" : 110.076391834,
+                "99.0" : 110.076391834,
+                "99.9" : 110.076391834,
+                "99.99" : 110.076391834,
+                "99.999" : 110.076391834,
+                "99.9999" : 110.076391834,
+                "100.0" : 110.076391834
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    110.076391834,
+                    109.211303458
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "50",
+            "httpLatencyMs" : "0"
+        },
+        "primaryMetric" : {
+            "score" : 0.36802376800000003,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 0.3655421637586207,
+                "50.0" : 0.36802376800000003,
+                "90.0" : 0.3705053722413793,
+                "95.0" : 0.3705053722413793,
+                "99.0" : 0.3705053722413793,
+                "99.9" : 0.3705053722413793,
+                "99.99" : 0.3705053722413793,
+                "99.999" : 0.3705053722413793,
+                "99.9999" : 0.3705053722413793,
+                "100.0" : 0.3705053722413793
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    0.3705053722413793,
+                    0.3655421637586207
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "50",
+            "httpLatencyMs" : "20"
+        },
+        "primaryMetric" : {
+            "score" : 30.015752896,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 28.894887875,
+                "50.0" : 30.015752896,
+                "90.0" : 31.136617917,
+                "95.0" : 31.136617917,
+                "99.0" : 31.136617917,
+                "99.9" : 31.136617917,
+                "99.99" : 31.136617917,
+                "99.999" : 31.136617917,
+                "99.9999" : 31.136617917,
+                "100.0" : 31.136617917
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    28.894887875,
+                    31.136617917
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "50",
+            "httpLatencyMs" : "100"
+        },
+        "primaryMetric" : {
+            "score" : 109.56882875,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 108.960653083,
+                "50.0" : 109.56882875,
+                "90.0" : 110.177004417,
+                "95.0" : 110.177004417,
+                "99.0" : 110.177004417,
+                "99.9" : 110.177004417,
+                "99.99" : 110.177004417,
+                "99.999" : 110.177004417,
+                "99.9999" : 110.177004417,
+                "100.0" : 110.177004417
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    110.177004417,
+                    108.960653083
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "100",
+            "httpLatencyMs" : "0"
+        },
+        "primaryMetric" : {
+            "score" : 0.3012515294665775,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 0.29902218020588234,
+                "50.0" : 0.3012515294665775,
+                "90.0" : 0.30348087872727275,
+                "95.0" : 0.30348087872727275,
+                "99.0" : 0.30348087872727275,
+                "99.9" : 0.30348087872727275,
+                "99.99" : 0.30348087872727275,
+                "99.999" : 0.30348087872727275,
+                "99.9999" : 0.30348087872727275,
+                "100.0" : 0.30348087872727275
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    0.30348087872727275,
+                    0.29902218020588234
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "100",
+            "httpLatencyMs" : "20"
+        },
+        "primaryMetric" : {
+            "score" : 27.7178780415,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 27.404859667,
+                "50.0" : 27.7178780415,
+                "90.0" : 28.030896416,
+                "95.0" : 28.030896416,
+                "99.0" : 28.030896416,
+                "99.9" : 28.030896416,
+                "99.99" : 28.030896416,
+                "99.999" : 28.030896416,
+                "99.9999" : 28.030896416,
+                "100.0" : 28.030896416
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    28.030896416,
+                    27.404859667
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "100",
+            "httpLatencyMs" : "100"
+        },
+        "primaryMetric" : {
+            "score" : 107.78084964600001,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 107.333746667,
+                "50.0" : 107.78084964600001,
+                "90.0" : 108.227952625,
+                "95.0" : 108.227952625,
+                "99.0" : 108.227952625,
+                "99.9" : 108.227952625,
+                "99.99" : 108.227952625,
+                "99.999" : 108.227952625,
+                "99.9999" : 108.227952625,
+                "100.0" : 108.227952625
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    108.227952625,
+                    107.333746667
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    }
+]
+
+
diff --git a/benchmarks/baseline-quick.json b/benchmarks/baseline-quick.json
new file mode 100644
index 0000000..d77dc04
--- /dev/null
+++ b/benchmarks/baseline-quick.json
@@ -0,0 +1,310 @@
+[
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "10"
+        },
+        "primaryMetric" : {
+            "score" : 0.7312460451143792,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 0.7183725021111111,
+                "50.0" : 0.7312460451143792,
+                "90.0" : 0.7441195881176471,
+                "95.0" : 0.7441195881176471,
+                "99.0" : 0.7441195881176471,
+                "99.9" : 0.7441195881176471,
+                "99.99" : 0.7441195881176471,
+                "99.999" : 0.7441195881176471,
+                "99.9999" : 0.7441195881176471,
+                "100.0" : 0.7441195881176471
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    0.7441195881176471,
+                    0.7183725021111111
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "50"
+        },
+        "primaryMetric" : {
+            "score" : 0.3542568145666667,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 0.3486205097333333,
+                "50.0" : 0.3542568145666667,
+                "90.0" : 0.3598931194,
+                "95.0" : 0.3598931194,
+                "99.0" : 0.3598931194,
+                "99.9" : 0.3598931194,
+                "99.99" : 0.3598931194,
+                "99.999" : 0.3598931194,
+                "99.9999" : 0.3598931194,
+                "100.0" : 0.3598931194
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    0.3598931194,
+                    0.3486205097333333
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "100"
+        },
+        "primaryMetric" : {
+            "score" : 0.3109726243044508,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 0.3066374205151515,
+                "50.0" : 0.3109726243044508,
+                "90.0" : 0.31530782809375,
+                "95.0" : 0.31530782809375,
+                "99.0" : 0.31530782809375,
+                "99.9" : 0.31530782809375,
+                "99.99" : 0.31530782809375,
+                "99.999" : 0.31530782809375,
+                "99.9999" : 0.31530782809375,
+                "100.0" : 0.31530782809375
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    0.31530782809375,
+                    0.3066374205151515
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.KafkaThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "10"
+        },
+        "primaryMetric" : {
+            "score" : 9.167523979,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 9.1153902705,
+                "50.0" : 9.167523979,
+                "90.0" : 9.2196576875,
+                "95.0" : 9.2196576875,
+                "99.0" : 9.2196576875,
+                "99.9" : 9.2196576875,
+                "99.99" : 9.2196576875,
+                "99.999" : 9.2196576875,
+                "99.9999" : 9.2196576875,
+                "100.0" : 9.2196576875
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    9.2196576875,
+                    9.1153902705
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.KafkaThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "50"
+        },
+        "primaryMetric" : {
+            "score" : 8.665157989499999,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 8.4859670205,
+                "50.0" : 8.665157989499999,
+                "90.0" : 8.8443489585,
+                "95.0" : 8.8443489585,
+                "99.0" : 8.8443489585,
+                "99.9" : 8.8443489585,
+                "99.99" : 8.8443489585,
+                "99.999" : 8.8443489585,
+                "99.9999" : 8.8443489585,
+                "100.0" : 8.8443489585
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    8.8443489585,
+                    8.4859670205
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    },
+    {
+        "jmhVersion" : "1.37",
+        "benchmark" : "com.softwaremill.okapi.benchmarks.KafkaThroughputBenchmark.drainAll",
+        "mode" : "avgt",
+        "threads" : 1,
+        "forks" : 1,
+        "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java",
+        "jvmArgs" : [
+        ],
+        "jdkVersion" : "25.0.2",
+        "vmName" : "OpenJDK 64-Bit Server VM",
+        "vmVersion" : "25.0.2+10-LTS",
+        "warmupIterations" : 1,
+        "warmupTime" : "10 s",
+        "warmupBatchSize" : 1,
+        "measurementIterations" : 2,
+        "measurementTime" : "15 s",
+        "measurementBatchSize" : 1,
+        "params" : {
+            "batchSize" : "100"
+        },
+        "primaryMetric" : {
+            "score" : 8.70115148975,
+            "scoreError" : "NaN",
+            "scoreConfidence" : [
+                "NaN",
+                "NaN"
+            ],
+            "scorePercentiles" : {
+                "0.0" : 8.5144857085,
+                "50.0" : 8.70115148975,
+                "90.0" : 8.887817271,
+                "95.0" : 8.887817271,
+                "99.0" : 8.887817271,
+                "99.9" : 8.887817271,
+                "99.99" : 8.887817271,
+                "99.999" : 8.887817271,
+                "99.9999" : 8.887817271,
+                "100.0" : 8.887817271
+            },
+            "scoreUnit" : "ms/op",
+            "rawData" : [
+                [
+                    8.887817271,
+                    8.5144857085
+                ]
+            ]
+        },
+        "secondaryMetrics" : {
+        }
+    }
+]
+
+
diff --git a/benchmarks/results-baseline-2026-04.md b/benchmarks/results-baseline-2026-04.md
new file mode 100644
index 0000000..bedf42a
--- /dev/null
+++ b/benchmarks/results-baseline-2026-04.md
@@ -0,0 +1,162 @@
+# Baseline Performance Results — April 2026
+
+This document captures the **pre-optimization baseline** of the okapi outbox library
+before any of the planned performance improvements (async batch delivery, multi-threaded
+scheduler, batch DB updates) are implemented.
+
+## Purpose
+
+These numbers establish a reference point. When subsequent optimizations land, we can
+compare against this baseline to:
+1. Validate that claimed gains are real (not just lab artifacts).
+2. Detect unintended regressions in unrelated code paths.
+3. Provide users with credible "before/after" performance documentation.
+
+## Methodology
+
+All benchmarks executed via [JMH](https://openjdk.org/projects/code-tools/jmh/) using
+the [me.champeau.jmh](https://github.com/melix/jmh-gradle-plugin) Gradle plugin.
+
+See [`README.md`](README.md) for benchmark architecture and how to reproduce.
+
+### Library state at baseline
+
+- **Single-threaded scheduler.** `OutboxScheduler` runs on one daemon thread
+  (`Executors.newSingleThreadScheduledExecutor`).
+- **Sync sequential delivery.** `OutboxProcessor.processNext()` iterates entries with
+  `forEach`, calling `MessageDeliverer.deliver()` per entry.
+- **Kafka:** `producer.send(record).get()` blocks per record.
+- **HTTP:** `httpClient.send(request, ...)` blocks per request (synchronous JDK HttpClient).
+- **DB:** N individual `updateAfterProcessing` calls (one upsert per entry).
+- All processing occurs **inside a single transaction** that spans claim → deliver → update.
+
+### JMH configuration
+
+- `@BenchmarkMode(Mode.AverageTime)` — reports time per drain
+- `@OperationsPerInvocation(1000)` on throughput benchmarks — JMH normalizes by 1000,
+  so `Score` in `ms/op` directly reflects "ms per delivered message"; reciprocal = msg/s
+- `@Param` matrix: `batchSize ∈ {10, 50, 100}`, `httpLatencyMs ∈ {0, 20, 100}` (HTTP only)
+- `fork = 1`, `warmupIterations = 1`, `iterations = 2`, `warmup = 10s`, `measurement = 15s`
+  (quick smoke run; full baseline with `fork=2`, `warmup=3`, `iterations=5` is also valid
+  and recommended for release-quality numbers)
+
+### Hardware / runtime
+
+- **Hardware:** MacBook Pro (Apple M3 Max, 96 GB RAM)
+- **OS:** macOS 26.4.1
+- **JVM:** Temurin 25.0.2 LTS (OpenJDK 25.0.2+10)
+- **Docker:** Docker Desktop, 7.7 GB allocated to engine
+- **Postgres:** `postgres:16` (Testcontainers, default config)
+- **Kafka:** `apache/kafka:3.8.1` (Testcontainers, default config)
+- **WireMock:** local loopback, zero artificial latency
+
+## Results
+
+> **NOTE:** Results from the smoke run are pasted below as captured. For release-quality
+> numbers, re-run with `./gradlew :okapi-benchmarks:jmh` (fork=2, warmup=3, iterations=5).
+
+### Throughput benchmarks
+
+End-to-end fanout — N entries inserted, processor drains them in a tight loop,
+scheduler bypassed. `1 / (ms/op)` × 1000 = msg/s.
+
+| Benchmark | batchSize | ms/op | **msg/s** |
+|-----------|-----------|-------|-----------|
+| `KafkaThroughputBenchmark.drainAll` | 10  | 9.168 | **~109** |
+| `KafkaThroughputBenchmark.drainAll` | 50  | 8.665 | **~115** |
+| `KafkaThroughputBenchmark.drainAll` | 100 | 8.701 | **~115** |
+#### HTTP throughput (with `httpLatencyMs` parameter)
+
+| batchSize | httpLatencyMs | ms/op | **msg/s** | Interpretation |
+|-----------|---------------|-------|-----------|----------------|
+| 10  | 0   | 0.661   | **~1,513** | library + DB + WireMock ceiling |
+| 10  | 20  | 30.322  | **~33**    | fast intra-cluster service |
+| 10  | 100 | 109.644 | **~9**     | typical cross-region webhook |
+| 50  | 0   | 0.368   | **~2,717** | (same ceiling, fewer claimPending queries) |
+| 50  | 20  | 30.016  | **~33**    | flat — sequential blocking dominates |
+| 50  | 100 | 109.569 | **~9**     | flat — one batch ≈ batchSize × latencyMs |
+| 100 | 0   | 0.301   | **~3,322** | (ceiling, marginal gain from batching DB) |
+| 100 | 20  | 27.718  | **~36**    | flat with `batchSize=10` and `batchSize=50` |
+| 100 | 100 | 107.781 | **~9**     | flat — pattern holds across all batchSize |
+
+Raw JSON: [`baseline-http-with-latency.json`](baseline-http-with-latency.json).
+
+#### What this HTTP table reveals
+
+- **At `latencyMs=0`** throughput scales with `batchSize` (1,513 → 2,717 → 3,322) — DB query
+  amortization is the only thing being optimized. This is the **library's processing ceiling**.
+- **At any non-zero latency** throughput is **flat across batchSize** (33, 33, 36 at 20 ms;
+  9, 9, ~9 at 100 ms) — proving that **sequential blocking dominates** as soon as I/O has
+  any cost. Batch size doesn't help when each `httpClient.send()` blocks for the full RTT.
+- **The theoretical sequential ceiling** for `latencyMs=N` is `1000ms / Nms = 1000/N msg/s`,
+  matching the measured numbers within ~30% (overhead from DB + HttpClient setup):
+  - latency=20 → theoretical 50 msg/s, measured ~33
+  - latency=100 → theoretical 10 msg/s, measured ~9
+
+Run wall time: 5 min 3 s. Raw JSON: [`baseline-quick.json`](baseline-quick.json).
+
+**Note on confidence intervals:** This smoke run uses `fork=1, warmup=1, iterations=2`,
+so the JMH `Error` column is empty. For release-quality numbers with proper CI bounds,
+re-run with `./gradlew :okapi-benchmarks:jmh` (default config: fork=2, warmup=3,
+iterations=5).
+
+### Microbenchmarks
+
+Single-entry `deliver()` calls with mocked I/O (zero network cost). Measures pure code
+overhead — JSON deserialization of `deliveryMetadata`, record/request construction,
+exception classification.
+
+| Benchmark | Score | Units |
+|-----------|-------|-------|
+| `DelivererMicroBenchmark.kafkaDeliver` | ~1,800,000 | ops/s |
+| `DelivererMicroBenchmark.httpDeliver`  | _TBD_ | ops/s |
+
+The Kafka deliver microbenchmark shows the per-message code overhead is **negligible**
+(~550 ns/message). This means the throughput benchmarks measure **almost entirely**
+I/O and DB time — the library itself contributes <1% to the per-message latency budget.
+
+## Interpretation
+
+### What the baseline tells us
+
+1. **Kafka throughput is flat with respect to `batchSize`** (109 → 115 → 115 msg/s).
+   This confirms that the bottleneck is `producer.send().get()` blocking sequentially
+   per entry, not DB or scheduler overhead. Each Kafka send takes ~9 ms RTT on localhost
+   with `acks=all`, and 1000 entries × 9 ms = 9 seconds regardless of how they're batched
+   from the DB side. This is **exactly the bottleneck async batch delivery
+   (fire-flush-await) is designed to eliminate**.
+
+2. **HTTP throughput scales positively with `batchSize`** (1,369 → 2,825 → 3,215 msg/s).
+   With WireMock on localhost the per-request I/O cost approaches zero, so the savings
+   come from amortizing the per-batch DB overhead (`claimPending`, transaction begin/commit).
+   Larger batch = fewer DB roundtrips per delivered message.
+
+3. **The 30× Kafka/HTTP asymmetry is an artifact of localhost WireMock.** In production
+   HTTP webhooks have RTT of 50-200 ms — the same `httpClient.send().` blocking pattern
+   that's currently used will hit a similar wall as Kafka does today. **Real-world HTTP
+   throughput will be much lower** than this benchmark suggests; the current numbers reflect
+   "library overhead with free I/O", not "real webhook delivery rate".
+
+4. **Single-threaded ceiling.** With `concurrency=1` (the only option today), throughput
+   is bounded by `1 / (avgRoundTripPerMessage + dbOverheadPerMessage)`. Adding multi-threaded
+   fan-out should give linear scaling up to DB connection pool size.
+
+### What the baseline does NOT tell us
+
+- **Production throughput.** Localhost Testcontainers Kafka has ~0.5 ms RTT vs 5-50 ms
+  for real clusters. Real-world numbers will be 2-10× lower.
+- **Multi-threaded throughput.** `concurrency` config does not exist yet.
+- **Behavior under failures.** All deliveries succeed; no retry path is exercised.
+
+## Next steps
+
+These baseline numbers will be re-measured after each of the planned optimizations lands:
+
+1. **Async batch delivery** — `MessageDeliverer.deliverBatch()` with Kafka
+   fire-flush-await and HTTP `sendAsync`. Expected: 5-10× improvement at `batchSize=50`+.
+2. **Multi-threaded scheduler** — `OutboxSchedulerConfig.concurrency`. Expected: linear
+   scaling up to DB connection pool size.
+3. **Batch UPDATE via `executeBatch`** — single prepared statement for N entries.
+   Expected: marginal at `batchSize=10`, 3-5× at `batchSize=100+`.
+
+See [KOJAK-14](https://softwaremill.atlassian.net/browse/KOJAK-14) epic for the full plan.
diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml
index f6de469..b8a707f 100644
--- a/gradle/libs.versions.toml
+++ b/gradle/libs.versions.toml
@@ -17,6 +17,8 @@ slf4j = "2.0.17"
 assertj = "3.27.7"
 h2 = "2.4.240"
 micrometer = "1.16.5"
+jmh = "1.37"
+jmhGradlePlugin = "0.7.3"
 
 [libraries]
 kotlinGradlePlugin = { module = "org.jetbrains.kotlin:kotlin-gradle-plugin", version.ref = "kotlin" }
@@ -51,6 +53,9 @@ wiremock = { module = "org.wiremock:wiremock", version.ref = "wiremock" }
 slf4jApi = { module = "org.slf4j:slf4j-api", version.ref = "slf4j" }
 slf4jSimple = { module = "org.slf4j:slf4j-simple", version.ref = "slf4j" }
 h2 = { module = "com.h2database:h2", version.ref = "h2" }
+jmhCore = { module = "org.openjdk.jmh:jmh-core", version.ref = "jmh" }
+jmhGeneratorAnnprocess = { module = "org.openjdk.jmh:jmh-generator-annprocess", version.ref = "jmh" }
 
 [plugins]
 ktlint = { id = "org.jlleitschuh.gradle.ktlint", version.ref = "ktlint" }
+jmh = { id = "me.champeau.jmh", version.ref = "jmhGradlePlugin" }
diff --git a/okapi-benchmarks/build.gradle.kts b/okapi-benchmarks/build.gradle.kts
new file mode 100644
index 0000000..d1c27b0
--- /dev/null
+++ b/okapi-benchmarks/build.gradle.kts
@@ -0,0 +1,52 @@
+plugins {
+    id("buildsrc.convention.kotlin-jvm")
+    alias(libs.plugins.jmh)
+}
+
+dependencies {
+    // Okapi modules under measurement
+    jmh(project(":okapi-core"))
+    jmh(project(":okapi-postgres"))
+    jmh(project(":okapi-kafka"))
+    jmh(project(":okapi-http"))
+
+    // Testcontainers — real Postgres + real Kafka for end-to-end throughput
+    jmh(libs.testcontainersPostgresql)
+    jmh(libs.testcontainersKafka)
+
+    // DB driver + schema
+    jmh(libs.postgresql)
+    jmh(libs.liquibaseCore)
+
+    // Kafka clients (provides MockProducer for microbenchmarks)
+    jmh(libs.kafkaClients)
+
+    // WireMock for HTTP target
+    jmh(libs.wiremock)
+
+    // SLF4J for Testcontainers/Kafka logging
+    jmh(libs.slf4jSimple)
+
+    // JMH core + annotation processor
+    jmh(libs.jmhCore)
+    jmh(libs.jmhGeneratorAnnprocess)
+    jmhAnnotationProcessor(libs.jmhGeneratorAnnprocess)
+}
+
+jmh {
+    fork = 2
+    warmupIterations = 3
+    warmup = "10s"
+    iterations = 5
+    timeOnIteration = "30s"
+    resultFormat = "JSON"
+    resultsFile = layout.buildDirectory.file("reports/jmh/results.json")
+    jvmArgs = listOf("-Xms2g", "-Xmx2g", "-XX:+UseG1GC")
+}
+
+// ktlint should not lint JMH-generated sources.
+ktlint {
+    filter {
+        exclude { it.file.path.contains("/build/") || it.file.path.contains("/generated/") }
+    }
+}
diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/DelivererMicroBenchmark.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/DelivererMicroBenchmark.kt
new file mode 100644
index 0000000..81c258b
--- /dev/null
+++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/DelivererMicroBenchmark.kt
@@ -0,0 +1,91 @@
+package com.softwaremill.okapi.benchmarks
+
+import com.github.tomakehurst.wiremock.WireMockServer
+import com.github.tomakehurst.wiremock.client.WireMock.aResponse
+import com.github.tomakehurst.wiremock.client.WireMock.post
+import com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo
+import com.github.tomakehurst.wiremock.core.WireMockConfiguration.wireMockConfig
+import com.softwaremill.okapi.core.DeliveryResult
+import com.softwaremill.okapi.core.OutboxEntry
+import com.softwaremill.okapi.core.OutboxMessage
+import com.softwaremill.okapi.http.HttpDeliveryInfo
+import com.softwaremill.okapi.http.HttpMessageDeliverer
+import com.softwaremill.okapi.http.HttpMethod
+import com.softwaremill.okapi.http.ServiceUrlResolver
+import com.softwaremill.okapi.kafka.KafkaDeliveryInfo
+import com.softwaremill.okapi.kafka.KafkaMessageDeliverer
+import org.apache.kafka.clients.producer.MockProducer
+import org.apache.kafka.common.serialization.StringSerializer
+import org.openjdk.jmh.annotations.Benchmark
+import org.openjdk.jmh.annotations.BenchmarkMode
+import org.openjdk.jmh.annotations.Mode
+import org.openjdk.jmh.annotations.OutputTimeUnit
+import org.openjdk.jmh.annotations.Scope
+import org.openjdk.jmh.annotations.Setup
+import org.openjdk.jmh.annotations.State
+import org.openjdk.jmh.annotations.TearDown
+import java.time.Instant
+import java.util.concurrent.TimeUnit
+
+/**
+ * Pure code-overhead microbenchmarks for the [KafkaMessageDeliverer] and
+ * [HttpMessageDeliverer] `deliver()` methods. I/O is mocked away:
+ *
+ * - Kafka: [MockProducer] with auto-complete (futures complete synchronously, no broker)
+ * - HTTP: WireMock on loopback with zero artificial latency
+ *
+ * These measure the cost of: deliveryInfo deserialization, record/request construction,
+ * exception classification, and result wrapping — i.e. everything around the I/O.
+ * Useful as a baseline for "did optimization X add overhead?" before/after comparisons.
+ */
+@State(Scope.Benchmark)
+@BenchmarkMode(Mode.Throughput)
+@OutputTimeUnit(TimeUnit.SECONDS)
+open class DelivererMicroBenchmark {
+
+    private lateinit var kafkaDeliverer: KafkaMessageDeliverer
+    private lateinit var httpDeliverer: HttpMessageDeliverer
+    private lateinit var wiremock: WireMockServer
+
+    private lateinit var kafkaEntry: OutboxEntry
+    private lateinit var httpEntry: OutboxEntry
+
+    @Setup(org.openjdk.jmh.annotations.Level.Trial)
+    fun setupTrial() {
+        val mockProducer = MockProducer(true, null, StringSerializer(), StringSerializer())
+        kafkaDeliverer = KafkaMessageDeliverer(mockProducer)
+
+        wiremock = WireMockServer(wireMockConfig().dynamicPort()).also { it.start() }
+        wiremock.stubFor(post(urlEqualTo(ENDPOINT)).willReturn(aResponse().withStatus(200)))
+        val urlResolver = ServiceUrlResolver { "http://localhost:${wiremock.port()}" }
+        httpDeliverer = HttpMessageDeliverer(urlResolver)
+
+        val now = Instant.now()
+        kafkaEntry = OutboxEntry.createPending(
+            OutboxMessage("bench.event", PAYLOAD),
+            KafkaDeliveryInfo(topic = "bench-topic", partitionKey = "k"),
+            now,
+        )
+        httpEntry = OutboxEntry.createPending(
+            OutboxMessage("bench.event", PAYLOAD),
+            HttpDeliveryInfo(serviceName = "bench", endpointPath = ENDPOINT, httpMethod = HttpMethod.POST),
+            now,
+        )
+    }
+
+    @Benchmark
+    fun kafkaDeliver(): DeliveryResult = kafkaDeliverer.deliver(kafkaEntry)
+
+    @Benchmark
+    fun httpDeliver(): DeliveryResult = httpDeliverer.deliver(httpEntry)
+
+    @TearDown(org.openjdk.jmh.annotations.Level.Trial)
+    fun teardown() {
+        wiremock.stop()
+    }
+
+    companion object {
+        private const val ENDPOINT = "/api/bench"
+        private const val PAYLOAD = """{"orderId":"order-42","amount":100.50,"currency":"EUR"}"""
+    }
+}
diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/HttpThroughputBenchmark.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/HttpThroughputBenchmark.kt
new file mode 100644
index 0000000..6cb6810
--- /dev/null
+++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/HttpThroughputBenchmark.kt
@@ -0,0 +1,125 @@
+package com.softwaremill.okapi.benchmarks
+
+import com.github.tomakehurst.wiremock.WireMockServer
+import com.github.tomakehurst.wiremock.client.WireMock.aResponse
+import com.github.tomakehurst.wiremock.client.WireMock.post
+import com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo
+import com.github.tomakehurst.wiremock.core.WireMockConfiguration.wireMockConfig
+import com.softwaremill.okapi.benchmarks.support.PostgresBenchmarkSupport
+import com.softwaremill.okapi.core.OutboxEntryProcessor
+import com.softwaremill.okapi.core.OutboxMessage
+import com.softwaremill.okapi.core.OutboxProcessor
+import com.softwaremill.okapi.core.OutboxPublisher
+import com.softwaremill.okapi.core.RetryPolicy
+import com.softwaremill.okapi.http.HttpDeliveryInfo
+import com.softwaremill.okapi.http.HttpMessageDeliverer
+import com.softwaremill.okapi.http.HttpMethod
+import com.softwaremill.okapi.http.ServiceUrlResolver
+import com.softwaremill.okapi.postgres.PostgresOutboxStore
+import org.openjdk.jmh.annotations.Benchmark
+import org.openjdk.jmh.annotations.BenchmarkMode
+import org.openjdk.jmh.annotations.Mode
+import org.openjdk.jmh.annotations.OperationsPerInvocation
+import org.openjdk.jmh.annotations.OutputTimeUnit
+import org.openjdk.jmh.annotations.Param
+import org.openjdk.jmh.annotations.Scope
+import org.openjdk.jmh.annotations.Setup
+import org.openjdk.jmh.annotations.State
+import org.openjdk.jmh.annotations.TearDown
+import java.time.Clock
+import java.util.concurrent.TimeUnit
+
+/**
+ * Measures end-to-end fanout throughput for HTTP webhook delivery via WireMock.
+ *
+ * WireMock is configured with **zero artificial latency** — this measures the
+ * library's processing capacity, not the user's network. Real-world throughput
+ * will be lower, dominated by webhook RTT.
+ *
+ * Same pattern as [KafkaThroughputBenchmark]: bypass scheduler, drain in tight loop,
+ * report avg ms per drain with [OperationsPerInvocation] yielding ops/s = msg/s.
+ */
+@State(Scope.Benchmark)
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+open class HttpThroughputBenchmark {
+
+    @Param("10", "50", "100")
+    var batchSize: Int = 0
+
+    /**
+     * Artificial server-side latency injected by WireMock per request.
+     *
+     * - `0` — library-only ceiling (no I/O cost). Useful as upper bound.
+     * - `20` — fast intra-cluster service (e.g., service mesh sidecar).
+     * - `100` — typical external webhook (cross-region cloud HTTP).
+     *
+     * Real production webhook latency is usually 50-500 ms; pick the value closest
+     * to your target service.
+     */
+    @Param("0", "20", "100")
+    var httpLatencyMs: Int = 0
+
+    private lateinit var postgres: PostgresBenchmarkSupport
+    private lateinit var wiremock: WireMockServer
+    private lateinit var publisher: OutboxPublisher
+    private lateinit var processor: OutboxProcessor
+    private lateinit var deliveryInfo: HttpDeliveryInfo
+
+    @Setup(org.openjdk.jmh.annotations.Level.Trial)
+    fun setupTrial() {
+        postgres = PostgresBenchmarkSupport().also { it.start() }
+        wiremock = WireMockServer(wireMockConfig().dynamicPort()).also { it.start() }
+        wiremock.stubFor(
+            post(urlEqualTo(ENDPOINT)).willReturn(
+                aResponse()
+                    .withStatus(200)
+                    .withFixedDelay(httpLatencyMs),
+            ),
+        )
+
+        val clock = Clock.systemUTC()
+        val store = PostgresOutboxStore(postgres.jdbc, clock)
+        publisher = OutboxPublisher(store, clock)
+        val urlResolver = ServiceUrlResolver { "http://localhost:${wiremock.port()}" }
+        val deliverer = HttpMessageDeliverer(urlResolver)
+        val entryProcessor = OutboxEntryProcessor(deliverer, RetryPolicy(maxRetries = 0), clock)
+        processor = OutboxProcessor(store, entryProcessor)
+        deliveryInfo = HttpDeliveryInfo(
+            serviceName = "bench-target",
+            endpointPath = ENDPOINT,
+            httpMethod = HttpMethod.POST,
+        )
+    }
+
+    @Setup(org.openjdk.jmh.annotations.Level.Invocation)
+    fun setupInvocation() {
+        postgres.truncate()
+        postgres.jdbc.withTransaction {
+            repeat(TOTAL_ENTRIES) {
+                publisher.publish(OutboxMessage("bench.event", PAYLOAD), deliveryInfo)
+            }
+        }
+    }
+
+    @Benchmark
+    @OperationsPerInvocation(TOTAL_ENTRIES)
+    fun drainAll() {
+        val iterations = (TOTAL_ENTRIES + batchSize - 1) / batchSize
+        repeat(iterations) {
+            postgres.jdbc.withTransaction { processor.processNext(batchSize) }
+        }
+    }
+
+    @TearDown(org.openjdk.jmh.annotations.Level.Trial)
+    fun teardown() {
+        wiremock.stop()
+        postgres.stop()
+    }
+
+    companion object {
+        const val TOTAL_ENTRIES = 1000
+        private const val ENDPOINT = "/api/bench"
+        private const val PAYLOAD = """{"orderId":"order-42","amount":100.50,"currency":"EUR"}"""
+    }
+}
diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/KafkaThroughputBenchmark.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/KafkaThroughputBenchmark.kt
new file mode 100644
index 0000000..ac546db
--- /dev/null
+++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/KafkaThroughputBenchmark.kt
@@ -0,0 +1,99 @@
+package com.softwaremill.okapi.benchmarks
+
+import com.softwaremill.okapi.benchmarks.support.KafkaBenchmarkSupport
+import com.softwaremill.okapi.benchmarks.support.PostgresBenchmarkSupport
+import com.softwaremill.okapi.core.OutboxEntryProcessor
+import com.softwaremill.okapi.core.OutboxMessage
+import com.softwaremill.okapi.core.OutboxProcessor
+import com.softwaremill.okapi.core.OutboxPublisher
+import com.softwaremill.okapi.core.RetryPolicy
+import com.softwaremill.okapi.kafka.KafkaDeliveryInfo
+import com.softwaremill.okapi.kafka.KafkaMessageDeliverer
+import com.softwaremill.okapi.postgres.PostgresOutboxStore
+import org.apache.kafka.clients.producer.KafkaProducer
+import org.openjdk.jmh.annotations.Benchmark
+import org.openjdk.jmh.annotations.BenchmarkMode
+import org.openjdk.jmh.annotations.Mode
+import org.openjdk.jmh.annotations.OperationsPerInvocation
+import org.openjdk.jmh.annotations.OutputTimeUnit
+import org.openjdk.jmh.annotations.Param
+import org.openjdk.jmh.annotations.Scope
+import org.openjdk.jmh.annotations.Setup
+import org.openjdk.jmh.annotations.State
+import org.openjdk.jmh.annotations.TearDown
+import java.time.Clock
+import java.util.UUID
+import java.util.concurrent.TimeUnit
+
+/**
+ * Measures end-to-end fanout throughput for Kafka delivery.
+ *
+ * Each invocation:
+ *   1. (Setup.Invocation) truncates the outbox table and inserts [TOTAL_ENTRIES] PENDING entries.
+ *   2. (Benchmark) calls [OutboxProcessor.processNext] in a loop until the queue is drained.
+ *   3. JMH reports avg ms per "drain TOTAL_ENTRIES entries", from which msg/s is derived.
+ *
+ * The scheduler is bypassed deliberately — we measure the processing capacity, not polling cadence.
+ */
+@State(Scope.Benchmark)
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+open class KafkaThroughputBenchmark {
+
+    @Param("10", "50", "100")
+    var batchSize: Int = 0
+
+    private lateinit var postgres: PostgresBenchmarkSupport
+    private lateinit var kafka: KafkaBenchmarkSupport
+    private lateinit var producer: KafkaProducer<String, String>
+    private lateinit var publisher: OutboxPublisher
+    private lateinit var processor: OutboxProcessor
+    private lateinit var topic: String
+
+    @Setup(org.openjdk.jmh.annotations.Level.Trial)
+    fun setupTrial() {
+        postgres = PostgresBenchmarkSupport().also { it.start() }
+        kafka = KafkaBenchmarkSupport().also { it.start() }
+        producer = kafka.createProducer()
+        topic = "bench-${UUID.randomUUID()}"
+
+        val clock = Clock.systemUTC()
+        val store = PostgresOutboxStore(postgres.jdbc, clock)
+        publisher = OutboxPublisher(store, clock)
+        val deliverer = KafkaMessageDeliverer(producer)
+        val entryProcessor = OutboxEntryProcessor(deliverer, RetryPolicy(maxRetries = 0), clock)
+        processor = OutboxProcessor(store, entryProcessor)
+    }
+
+    @Setup(org.openjdk.jmh.annotations.Level.Invocation)
+    fun setupInvocation() {
+        postgres.truncate()
+        val info = KafkaDeliveryInfo(topic = topic, partitionKey = "k")
+        postgres.jdbc.withTransaction {
+            repeat(TOTAL_ENTRIES) {
+                publisher.publish(OutboxMessage("bench.event", PAYLOAD), info)
+            }
+        }
+    }
+
+    @Benchmark
+    @OperationsPerInvocation(TOTAL_ENTRIES)
+    fun drainAll() {
+        val iterations = (TOTAL_ENTRIES + batchSize - 1) / batchSize
+        repeat(iterations) {
+            postgres.jdbc.withTransaction { processor.processNext(batchSize) }
+        }
+    }
+
+    @TearDown(org.openjdk.jmh.annotations.Level.Trial)
+    fun teardown() {
+        producer.close()
+        kafka.stop()
+        postgres.stop()
+    }
+
+    companion object {
+        const val TOTAL_ENTRIES = 1000
+        private const val PAYLOAD = """{"orderId":"order-42","amount":100.50,"currency":"EUR"}"""
+    }
+}
diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/KafkaBenchmarkSupport.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/KafkaBenchmarkSupport.kt
new file mode 100644
index 0000000..655a9bb
--- /dev/null
+++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/KafkaBenchmarkSupport.kt
@@ -0,0 +1,27 @@
+package com.softwaremill.okapi.benchmarks.support
+
+import org.apache.kafka.clients.producer.KafkaProducer
+import org.apache.kafka.clients.producer.ProducerConfig
+import org.apache.kafka.common.serialization.StringSerializer
+import org.testcontainers.kafka.KafkaContainer
+
+class KafkaBenchmarkSupport {
+    private val container = KafkaContainer("apache/kafka:3.8.1")
+
+    fun start() {
+        container.start()
+    }
+
+    fun stop() {
+        container.stop()
+    }
+
+    fun createProducer(): KafkaProducer<String, String> = KafkaProducer(
+        mapOf(
+            ProducerConfig.BOOTSTRAP_SERVERS_CONFIG to container.bootstrapServers,
+            ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG to StringSerializer::class.java.name,
+            ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG to StringSerializer::class.java.name,
+            ProducerConfig.ACKS_CONFIG to "all",
+        ),
+    )
+}
diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/PostgresBenchmarkSupport.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/PostgresBenchmarkSupport.kt
new file mode 100644
index 0000000..98a502a
--- /dev/null
+++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/PostgresBenchmarkSupport.kt
@@ -0,0 +1,80 @@
+package com.softwaremill.okapi.benchmarks.support
+
+import com.softwaremill.okapi.core.ConnectionProvider
+import liquibase.Liquibase
+import liquibase.database.DatabaseFactory
+import liquibase.database.jvm.JdbcConnection
+import liquibase.resource.ClassLoaderResourceAccessor
+import org.postgresql.ds.PGSimpleDataSource
+import org.testcontainers.containers.PostgreSQLContainer
+import java.sql.Connection
+import java.sql.DriverManager
+import javax.sql.DataSource
+
+/**
+ * Brings up a real Postgres container, runs Liquibase migrations,
+ * and exposes a [ConnectionProvider] backed by a thread-local connection
+ * (matching the integration-tests pattern so okapi-core stays unchanged).
+ */
+class PostgresBenchmarkSupport {
+    private val container = PostgreSQLContainer<Nothing>("postgres:16")
+    lateinit var dataSource: DataSource
+    lateinit var jdbc: BenchmarkConnectionProvider
+
+    fun start() {
+        container.start()
+        dataSource = PGSimpleDataSource().apply {
+            setURL(container.jdbcUrl)
+            user = container.username
+            password = container.password
+        }
+        jdbc = BenchmarkConnectionProvider(dataSource)
+        runLiquibase()
+    }
+
+    fun stop() {
+        container.stop()
+    }
+
+    fun truncate() {
+        jdbc.withTransaction {
+            jdbc.withConnection { conn ->
+                conn.createStatement().use { it.execute("TRUNCATE TABLE outbox") }
+            }
+        }
+    }
+
+    private fun runLiquibase() {
+        val connection = DriverManager.getConnection(container.jdbcUrl, container.username, container.password)
+        val db = DatabaseFactory.getInstance().findCorrectDatabaseImplementation(JdbcConnection(connection))
+        Liquibase("com/softwaremill/okapi/db/changelog.xml", ClassLoaderResourceAccessor(), db).use { it.update("") }
+        connection.close()
+    }
+}
+
+class BenchmarkConnectionProvider(private val dataSource: DataSource) : ConnectionProvider {
+    private val threadLocalConnection = ThreadLocal<Connection>()
+
+    override fun <T> withConnection(block: (Connection) -> T): T {
+        val connection = threadLocalConnection.get()
+            ?: error("No connection bound to current thread. Use withTransaction { } in benchmarks.")
+        return block(connection)
+    }
+
+    fun <T> withTransaction(block: () -> T): T {
+        val conn = dataSource.connection
+        conn.autoCommit = false
+        threadLocalConnection.set(conn)
+        return try {
+            val result = block()
+            conn.commit()
+            result
+        } catch (e: Exception) {
+            conn.rollback()
+            throw e
+        } finally {
+            threadLocalConnection.remove()
+            conn.close()
+        }
+    }
+}
diff --git a/settings.gradle.kts b/settings.gradle.kts
index c1b49fe..8e3a589 100644
--- a/settings.gradle.kts
+++ b/settings.gradle.kts
@@ -19,5 +19,6 @@ include("okapi-spring-boot")
 include("okapi-bom")
 include("okapi-integration-tests")
 include("okapi-micrometer")
+include("okapi-benchmarks")
 
 rootProject.name = "okapi"

From c95f87f7bac3381de1bbc16a6e9fe23de3cdba8a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Andrzej=20Kobyli=C5=84ski?= <endrju397@gmail.com>
Date: Thu, 30 Apr 2026 17:50:34 +0200
Subject: [PATCH 2/2] docs: add Performance section to README with baseline
 numbers

Adds a Performance section between Compatibility and Build summarizing
the KOJAK-68 baseline throughput across transports and batch sizes,
plus a link to the full benchmarks/ directory with methodology and
reproduction instructions.

Also extends the Build section with the JMH command for completeness.
---
 README.md | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 597e3e1..6cb6c3a 100644
--- a/README.md
+++ b/README.md
@@ -199,12 +199,27 @@ graph BT
 | Kafka Clients | 3.9.x, 4.x | `okapi-kafka` — you provide `kafka-clients` |
 | Exposed | 1.x | `okapi-exposed` module — for Ktor/standalone apps |
 
+## Performance
+
+Throughput baseline (single instance, sync sequential delivery, MacBook M3 Max, JDK 25 LTS, April 2026):
+
+| Transport | batchSize=10 | batchSize=100 |
+|-----------|--------------|----------------|
+| Kafka (`acks=all`, localhost broker) | ~110 msg/s | ~115 msg/s |
+| HTTP @ webhook latency 20 ms | ~33 msg/s | ~36 msg/s |
+| HTTP @ webhook latency 100 ms | ~9 msg/s | ~9 msg/s |
+
+These numbers reflect the current sync-sequential delivery model. Throughput is bounded by per-message round-trip time × batch size. Performance work to lift these limits (async batch delivery, multi-threaded scheduler) is tracked under the [KOJAK-14 epic](https://softwaremill.atlassian.net/browse/KOJAK-14).
+
+Full methodology, raw JMH results, and reproduction instructions: [`benchmarks/`](benchmarks/).
+
 ## Build
 
 ```sh
-./gradlew build          # Build all modules
-./gradlew test           # Run tests (Docker required — Testcontainers)
-./gradlew ktlintFormat   # Format code
+./gradlew build                  # Build all modules
+./gradlew test                   # Run tests (Docker required — Testcontainers)
+./gradlew ktlintFormat           # Format code
+./gradlew :okapi-benchmarks:jmh  # Run JMH benchmarks (~30 min, see benchmarks/README.md)
 ```
 
 Requires JDK 21.