From 26ea62276b8f7feeb3a93bd78678e6cb0f5db770 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andrzej=20Kobyli=C5=84ski?= Date: Thu, 30 Apr 2026 17:28:21 +0200 Subject: [PATCH 1/2] feat: performance benchmarking module + baseline (KOJAK-68) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add `okapi-benchmarks` module using JMH (1.37) via the me.champeau.jmh Gradle plugin. Provides reproducible measurements of fanout throughput across transports and configurations. Benchmarks: - KafkaThroughputBenchmark — real Postgres + real Kafka via Testcontainers - HttpThroughputBenchmark — real Postgres + WireMock with @Param httpLatencyMs injecting 0/20/100ms server-side delay (covers library-only ceiling vs realistic webhook scenarios) - DelivererMicroBenchmark — single-entry deliver() with mocked I/O, measures pure code overhead (~1.8M ops/s for Kafka with MockProducer) Methodology: - @OperationsPerInvocation(1000) on throughput benchmarks so JMH ms/op directly reflects per-message cost (reciprocal = msg/s) - @Setup(Level.Trial) starts containers; @Setup(Level.Invocation) populates fresh PENDING entries before each timed run - Scheduler bypassed: `processor.processNext` called in a loop until drained to measure processing capacity, not polling cadence Baseline results (smoke run, MacBook M3 Max, JDK 25, fork=1, warmup=1, iter=2): - Kafka: ~109-115 msg/s (flat across batchSize — confirms sync .get() bottleneck) - HTTP @0ms: ~1,500-3,300 msg/s (library + DB ceiling; WireMock has no real I/O) - HTTP @20ms: ~33-36 msg/s (flat — sequential blocking dominates) - HTTP @100ms: ~9 msg/s (flat — close to theoretical 1000/100 = 10 msg/s) These numbers establish a reference point for upcoming optimizations (KOJAK-70..79) — every subsequent change re-runs the suite and documents before/after in `benchmarks/results-postopt-*.md`. For release-quality numbers with confidence intervals, run `./gradlew :okapi-benchmarks:jmh` (default fork=2, warmup=3, iterations=5). --- .gitignore | 3 + benchmarks/README.md | 103 ++++ benchmarks/baseline-http-with-latency.json | 472 ++++++++++++++++++ benchmarks/baseline-quick.json | 310 ++++++++++++ benchmarks/results-baseline-2026-04.md | 162 ++++++ gradle/libs.versions.toml | 5 + okapi-benchmarks/build.gradle.kts | 52 ++ .../benchmarks/DelivererMicroBenchmark.kt | 91 ++++ .../benchmarks/HttpThroughputBenchmark.kt | 125 +++++ .../benchmarks/KafkaThroughputBenchmark.kt | 99 ++++ .../support/KafkaBenchmarkSupport.kt | 27 + .../support/PostgresBenchmarkSupport.kt | 80 +++ settings.gradle.kts | 1 + 13 files changed, 1530 insertions(+) create mode 100644 benchmarks/README.md create mode 100644 benchmarks/baseline-http-with-latency.json create mode 100644 benchmarks/baseline-quick.json create mode 100644 benchmarks/results-baseline-2026-04.md create mode 100644 okapi-benchmarks/build.gradle.kts create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/DelivererMicroBenchmark.kt create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/HttpThroughputBenchmark.kt create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/KafkaThroughputBenchmark.kt create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/KafkaBenchmarkSupport.kt create mode 100644 okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/PostgresBenchmarkSupport.kt diff --git a/.gitignore b/.gitignore index 86a412e..5a2feb2 100644 --- a/.gitignore +++ b/.gitignore @@ -48,3 +48,6 @@ CLAUDE.md docs/specs/ docs/plans/ docs/superpowers/ + +# benchmark logs (verbose console output; JSON results are kept) +benchmarks/*.log diff --git a/benchmarks/README.md b/benchmarks/README.md new file mode 100644 index 0000000..867bab5 --- /dev/null +++ b/benchmarks/README.md @@ -0,0 +1,103 @@ +# Okapi Performance Benchmarks + +This directory contains benchmark methodology, run instructions, and historical results. + +## Running benchmarks + +All benchmarks live in the `okapi-benchmarks` module and use [JMH](https://openjdk.org/projects/code-tools/jmh/) +via the [me.champeau.jmh](https://github.com/melix/jmh-gradle-plugin) Gradle plugin. + +### Full baseline (production-quality numbers) + +Default JMH config in `okapi-benchmarks/build.gradle.kts` uses: +- `fork = 2` — isolated JVMs to neutralize JIT-profile variance +- `warmupIterations = 3`, `warmup = 10s` — let JIT C2 settle +- `iterations = 5`, `timeOnIteration = 30s` — statistically meaningful sample +- `-Xms2g -Xmx2g -XX:+UseG1GC` — pinned memory and GC for reproducibility + +```sh +./gradlew :okapi-benchmarks:jmh +``` + +Wall time: ~30 minutes (Testcontainers spin-up + 2 transports × 3 batchSize values × 8 iterations). + +Result JSON: `okapi-benchmarks/build/reports/jmh/results.json` + +### Quick smoke run + +For development iteration when you don't need statistically significant numbers: + +```sh +./gradlew :okapi-benchmarks:jmhJar +java -jar okapi-benchmarks/build/libs/okapi-benchmarks-jmh.jar \ + "ThroughputBenchmark" -f 1 -wi 1 -i 2 -w 10s -r 15s +``` + +Wall time: ~5-8 minutes. + +### Single benchmark + +```sh +java -jar okapi-benchmarks/build/libs/okapi-benchmarks-jmh.jar \ + "KafkaThroughputBenchmark" -p batchSize=50 -f 1 -wi 1 -i 2 +``` + +## What we measure + +### Throughput benchmarks (`*ThroughputBenchmark`) + +End-to-end pipeline: insert N PENDING entries, then call `OutboxProcessor.processNext()` +in a tight loop until drained. The `OutboxScheduler` is bypassed deliberately — we measure +**processing capacity**, not polling cadence (which is a deployment-time knob). + +- `KafkaThroughputBenchmark` — real Postgres + real Kafka via Testcontainers +- `HttpThroughputBenchmark` — real Postgres + WireMock HTTP target with `@Param httpLatencyMs` + injecting `0`/`20`/`100` ms server-side delay (library-only ceiling vs realistic webhook) + +Reported as `ops/s` where one op = one delivered message (via `@OperationsPerInvocation`). + +### Microbenchmarks (`DelivererMicroBenchmark`) + +Single-entry `deliver()` calls with mocked I/O: +- Kafka: `MockProducer` with auto-complete (no broker) +- HTTP: WireMock on loopback + +Measures pure code overhead (JSON deserialization, record/request construction, +exception classification). Useful as "did optimization X regress the hot path?" baseline. + +## How to read results + +Throughput benchmarks report **ops/s = msg/s** thanks to `@OperationsPerInvocation`. + +``` +Benchmark (batchSize) Mode Cnt Score Error Units +KafkaThroughputBenchmark.drainAll 10 thrpt 5 450.2 ± 18.3 ops/s +KafkaThroughputBenchmark.drainAll 50 thrpt 5 890.5 ± 22.1 ops/s +KafkaThroughputBenchmark.drainAll 100 thrpt 5 920.7 ± 31.4 ops/s +``` + +The `Score` is the headline number. The `Error` is a 99.9% confidence interval — a tight error +(< 5% of score) means the result is trustworthy; a wide error means run more iterations or +investigate variability sources (background processes, thermal throttling, GC). + +## Caveats — important for honest reporting + +- **Localhost Testcontainers ≠ production.** Kafka container on the same host has ~0.5ms RTT; + a real cluster typically has 5-50ms. Real-world throughput will be **2-10× lower** than these + benchmarks suggest. Treat numbers as **upper bounds** for the library's processing capacity. +- **HTTP benchmark uses WireMock in-JVM**, which adds ~0.3 ms overhead per request (Jetty + servlet pipeline). At `httpLatencyMs=0` the measurement reflects "library + DB + WireMock + overhead", not pure library throughput. For tighter pomiar consider replacing WireMock + with `MockWebServer` (Square) — see [`results-baseline-2026-04.md`](results-baseline-2026-04.md) + notes on benchmark methodology. +- **`httpLatencyMs` is server-side delay**, not network RTT. Real production webhook latency + is dominated by the target service's processing time + network — pick the value closest + to your target. With `httpLatencyMs=100`, sequential delivery is bounded at + `1000ms / 100ms = 10 msg/s/thread` regardless of library efficiency. +- **Single-threaded scheduler.** Current `OutboxSchedulerConfig` does not expose `concurrency`. + Once that lands (planned), the throughput matrix will expand to `batchSize × concurrency`. + +## Historical baselines + +- [`results-baseline-2026-04.md`](results-baseline-2026-04.md) — pre-optimization baseline + (sync sequential delivery, single-threaded scheduler) diff --git a/benchmarks/baseline-http-with-latency.json b/benchmarks/baseline-http-with-latency.json new file mode 100644 index 0000000..fec4a1f --- /dev/null +++ b/benchmarks/baseline-http-with-latency.json @@ -0,0 +1,472 @@ +[ + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "10", + "httpLatencyMs" : "0" + }, + "primaryMetric" : { + "score" : 0.661348474631579, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 0.6453783442105263, + "50.0" : 0.661348474631579, + "90.0" : 0.6773186050526315, + "95.0" : 0.6773186050526315, + "99.0" : 0.6773186050526315, + "99.9" : 0.6773186050526315, + "99.99" : 0.6773186050526315, + "99.999" : 0.6773186050526315, + "99.9999" : 0.6773186050526315, + "100.0" : 0.6773186050526315 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 0.6773186050526315, + 0.6453783442105263 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "10", + "httpLatencyMs" : "20" + }, + "primaryMetric" : { + "score" : 30.321562104, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 30.130055958, + "50.0" : 30.321562104, + "90.0" : 30.51306825, + "95.0" : 30.51306825, + "99.0" : 30.51306825, + "99.9" : 30.51306825, + "99.99" : 30.51306825, + "99.999" : 30.51306825, + "99.9999" : 30.51306825, + "100.0" : 30.51306825 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 30.51306825, + 30.130055958 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "10", + "httpLatencyMs" : "100" + }, + "primaryMetric" : { + "score" : 109.64384764600001, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 109.211303458, + "50.0" : 109.64384764600001, + "90.0" : 110.076391834, + "95.0" : 110.076391834, + "99.0" : 110.076391834, + "99.9" : 110.076391834, + "99.99" : 110.076391834, + "99.999" : 110.076391834, + "99.9999" : 110.076391834, + "100.0" : 110.076391834 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 110.076391834, + 109.211303458 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "50", + "httpLatencyMs" : "0" + }, + "primaryMetric" : { + "score" : 0.36802376800000003, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 0.3655421637586207, + "50.0" : 0.36802376800000003, + "90.0" : 0.3705053722413793, + "95.0" : 0.3705053722413793, + "99.0" : 0.3705053722413793, + "99.9" : 0.3705053722413793, + "99.99" : 0.3705053722413793, + "99.999" : 0.3705053722413793, + "99.9999" : 0.3705053722413793, + "100.0" : 0.3705053722413793 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 0.3705053722413793, + 0.3655421637586207 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "50", + "httpLatencyMs" : "20" + }, + "primaryMetric" : { + "score" : 30.015752896, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 28.894887875, + "50.0" : 30.015752896, + "90.0" : 31.136617917, + "95.0" : 31.136617917, + "99.0" : 31.136617917, + "99.9" : 31.136617917, + "99.99" : 31.136617917, + "99.999" : 31.136617917, + "99.9999" : 31.136617917, + "100.0" : 31.136617917 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 28.894887875, + 31.136617917 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "50", + "httpLatencyMs" : "100" + }, + "primaryMetric" : { + "score" : 109.56882875, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 108.960653083, + "50.0" : 109.56882875, + "90.0" : 110.177004417, + "95.0" : 110.177004417, + "99.0" : 110.177004417, + "99.9" : 110.177004417, + "99.99" : 110.177004417, + "99.999" : 110.177004417, + "99.9999" : 110.177004417, + "100.0" : 110.177004417 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 110.177004417, + 108.960653083 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "100", + "httpLatencyMs" : "0" + }, + "primaryMetric" : { + "score" : 0.3012515294665775, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 0.29902218020588234, + "50.0" : 0.3012515294665775, + "90.0" : 0.30348087872727275, + "95.0" : 0.30348087872727275, + "99.0" : 0.30348087872727275, + "99.9" : 0.30348087872727275, + "99.99" : 0.30348087872727275, + "99.999" : 0.30348087872727275, + "99.9999" : 0.30348087872727275, + "100.0" : 0.30348087872727275 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 0.30348087872727275, + 0.29902218020588234 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "100", + "httpLatencyMs" : "20" + }, + "primaryMetric" : { + "score" : 27.7178780415, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 27.404859667, + "50.0" : 27.7178780415, + "90.0" : 28.030896416, + "95.0" : 28.030896416, + "99.0" : 28.030896416, + "99.9" : 28.030896416, + "99.99" : 28.030896416, + "99.999" : 28.030896416, + "99.9999" : 28.030896416, + "100.0" : 28.030896416 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 28.030896416, + 27.404859667 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "100", + "httpLatencyMs" : "100" + }, + "primaryMetric" : { + "score" : 107.78084964600001, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 107.333746667, + "50.0" : 107.78084964600001, + "90.0" : 108.227952625, + "95.0" : 108.227952625, + "99.0" : 108.227952625, + "99.9" : 108.227952625, + "99.99" : 108.227952625, + "99.999" : 108.227952625, + "99.9999" : 108.227952625, + "100.0" : 108.227952625 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 108.227952625, + 107.333746667 + ] + ] + }, + "secondaryMetrics" : { + } + } +] + + diff --git a/benchmarks/baseline-quick.json b/benchmarks/baseline-quick.json new file mode 100644 index 0000000..d77dc04 --- /dev/null +++ b/benchmarks/baseline-quick.json @@ -0,0 +1,310 @@ +[ + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "10" + }, + "primaryMetric" : { + "score" : 0.7312460451143792, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 0.7183725021111111, + "50.0" : 0.7312460451143792, + "90.0" : 0.7441195881176471, + "95.0" : 0.7441195881176471, + "99.0" : 0.7441195881176471, + "99.9" : 0.7441195881176471, + "99.99" : 0.7441195881176471, + "99.999" : 0.7441195881176471, + "99.9999" : 0.7441195881176471, + "100.0" : 0.7441195881176471 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 0.7441195881176471, + 0.7183725021111111 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "50" + }, + "primaryMetric" : { + "score" : 0.3542568145666667, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 0.3486205097333333, + "50.0" : 0.3542568145666667, + "90.0" : 0.3598931194, + "95.0" : 0.3598931194, + "99.0" : 0.3598931194, + "99.9" : 0.3598931194, + "99.99" : 0.3598931194, + "99.999" : 0.3598931194, + "99.9999" : 0.3598931194, + "100.0" : 0.3598931194 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 0.3598931194, + 0.3486205097333333 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.HttpThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "100" + }, + "primaryMetric" : { + "score" : 0.3109726243044508, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 0.3066374205151515, + "50.0" : 0.3109726243044508, + "90.0" : 0.31530782809375, + "95.0" : 0.31530782809375, + "99.0" : 0.31530782809375, + "99.9" : 0.31530782809375, + "99.99" : 0.31530782809375, + "99.999" : 0.31530782809375, + "99.9999" : 0.31530782809375, + "100.0" : 0.31530782809375 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 0.31530782809375, + 0.3066374205151515 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.KafkaThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "10" + }, + "primaryMetric" : { + "score" : 9.167523979, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 9.1153902705, + "50.0" : 9.167523979, + "90.0" : 9.2196576875, + "95.0" : 9.2196576875, + "99.0" : 9.2196576875, + "99.9" : 9.2196576875, + "99.99" : 9.2196576875, + "99.999" : 9.2196576875, + "99.9999" : 9.2196576875, + "100.0" : 9.2196576875 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 9.2196576875, + 9.1153902705 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.KafkaThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "50" + }, + "primaryMetric" : { + "score" : 8.665157989499999, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 8.4859670205, + "50.0" : 8.665157989499999, + "90.0" : 8.8443489585, + "95.0" : 8.8443489585, + "99.0" : 8.8443489585, + "99.9" : 8.8443489585, + "99.99" : 8.8443489585, + "99.999" : 8.8443489585, + "99.9999" : 8.8443489585, + "100.0" : 8.8443489585 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 8.8443489585, + 8.4859670205 + ] + ] + }, + "secondaryMetrics" : { + } + }, + { + "jmhVersion" : "1.37", + "benchmark" : "com.softwaremill.okapi.benchmarks.KafkaThroughputBenchmark.drainAll", + "mode" : "avgt", + "threads" : 1, + "forks" : 1, + "jvm" : "/Users/andrzej.kobylinski/.sdkman/candidates/java/25.0.2-tem/bin/java", + "jvmArgs" : [ + ], + "jdkVersion" : "25.0.2", + "vmName" : "OpenJDK 64-Bit Server VM", + "vmVersion" : "25.0.2+10-LTS", + "warmupIterations" : 1, + "warmupTime" : "10 s", + "warmupBatchSize" : 1, + "measurementIterations" : 2, + "measurementTime" : "15 s", + "measurementBatchSize" : 1, + "params" : { + "batchSize" : "100" + }, + "primaryMetric" : { + "score" : 8.70115148975, + "scoreError" : "NaN", + "scoreConfidence" : [ + "NaN", + "NaN" + ], + "scorePercentiles" : { + "0.0" : 8.5144857085, + "50.0" : 8.70115148975, + "90.0" : 8.887817271, + "95.0" : 8.887817271, + "99.0" : 8.887817271, + "99.9" : 8.887817271, + "99.99" : 8.887817271, + "99.999" : 8.887817271, + "99.9999" : 8.887817271, + "100.0" : 8.887817271 + }, + "scoreUnit" : "ms/op", + "rawData" : [ + [ + 8.887817271, + 8.5144857085 + ] + ] + }, + "secondaryMetrics" : { + } + } +] + + diff --git a/benchmarks/results-baseline-2026-04.md b/benchmarks/results-baseline-2026-04.md new file mode 100644 index 0000000..bedf42a --- /dev/null +++ b/benchmarks/results-baseline-2026-04.md @@ -0,0 +1,162 @@ +# Baseline Performance Results — April 2026 + +This document captures the **pre-optimization baseline** of the okapi outbox library +before any of the planned performance improvements (async batch delivery, multi-threaded +scheduler, batch DB updates) are implemented. + +## Purpose + +These numbers establish a reference point. When subsequent optimizations land, we can +compare against this baseline to: +1. Validate that claimed gains are real (not just lab artifacts). +2. Detect unintended regressions in unrelated code paths. +3. Provide users with credible "before/after" performance documentation. + +## Methodology + +All benchmarks executed via [JMH](https://openjdk.org/projects/code-tools/jmh/) using +the [me.champeau.jmh](https://github.com/melix/jmh-gradle-plugin) Gradle plugin. + +See [`README.md`](README.md) for benchmark architecture and how to reproduce. + +### Library state at baseline + +- **Single-threaded scheduler.** `OutboxScheduler` runs on one daemon thread + (`Executors.newSingleThreadScheduledExecutor`). +- **Sync sequential delivery.** `OutboxProcessor.processNext()` iterates entries with + `forEach`, calling `MessageDeliverer.deliver()` per entry. +- **Kafka:** `producer.send(record).get()` blocks per record. +- **HTTP:** `httpClient.send(request, ...)` blocks per request (synchronous JDK HttpClient). +- **DB:** N individual `updateAfterProcessing` calls (one upsert per entry). +- All processing occurs **inside a single transaction** that spans claim → deliver → update. + +### JMH configuration + +- `@BenchmarkMode(Mode.AverageTime)` — reports time per drain +- `@OperationsPerInvocation(1000)` on throughput benchmarks — JMH normalizes by 1000, + so `Score` in `ms/op` directly reflects "ms per delivered message"; reciprocal = msg/s +- `@Param` matrix: `batchSize ∈ {10, 50, 100}`, `httpLatencyMs ∈ {0, 20, 100}` (HTTP only) +- `fork = 1`, `warmupIterations = 1`, `iterations = 2`, `warmup = 10s`, `measurement = 15s` + (quick smoke run; full baseline with `fork=2`, `warmup=3`, `iterations=5` is also valid + and recommended for release-quality numbers) + +### Hardware / runtime + +- **Hardware:** MacBook Pro (Apple M3 Max, 96 GB RAM) +- **OS:** macOS 26.4.1 +- **JVM:** Temurin 25.0.2 LTS (OpenJDK 25.0.2+10) +- **Docker:** Docker Desktop, 7.7 GB allocated to engine +- **Postgres:** `postgres:16` (Testcontainers, default config) +- **Kafka:** `apache/kafka:3.8.1` (Testcontainers, default config) +- **WireMock:** local loopback, zero artificial latency + +## Results + +> **NOTE:** Results from the smoke run are pasted below as captured. For release-quality +> numbers, re-run with `./gradlew :okapi-benchmarks:jmh` (fork=2, warmup=3, iterations=5). + +### Throughput benchmarks + +End-to-end fanout — N entries inserted, processor drains them in a tight loop, +scheduler bypassed. `1 / (ms/op)` × 1000 = msg/s. + +| Benchmark | batchSize | ms/op | **msg/s** | +|-----------|-----------|-------|-----------| +| `KafkaThroughputBenchmark.drainAll` | 10 | 9.168 | **~109** | +| `KafkaThroughputBenchmark.drainAll` | 50 | 8.665 | **~115** | +| `KafkaThroughputBenchmark.drainAll` | 100 | 8.701 | **~115** | +#### HTTP throughput (with `httpLatencyMs` parameter) + +| batchSize | httpLatencyMs | ms/op | **msg/s** | Interpretation | +|-----------|---------------|-------|-----------|----------------| +| 10 | 0 | 0.661 | **~1,513** | library + DB + WireMock ceiling | +| 10 | 20 | 30.322 | **~33** | fast intra-cluster service | +| 10 | 100 | 109.644 | **~9** | typical cross-region webhook | +| 50 | 0 | 0.368 | **~2,717** | (same ceiling, fewer claimPending queries) | +| 50 | 20 | 30.016 | **~33** | flat — sequential blocking dominates | +| 50 | 100 | 109.569 | **~9** | flat — one batch ≈ batchSize × latencyMs | +| 100 | 0 | 0.301 | **~3,322** | (ceiling, marginal gain from batching DB) | +| 100 | 20 | 27.718 | **~36** | flat with `batchSize=10` and `batchSize=50` | +| 100 | 100 | 107.781 | **~9** | flat — pattern holds across all batchSize | + +Raw JSON: [`baseline-http-with-latency.json`](baseline-http-with-latency.json). + +#### What this HTTP table reveals + +- **At `latencyMs=0`** throughput scales with `batchSize` (1,513 → 2,717 → 3,322) — DB query + amortization is the only thing being optimized. This is the **library's processing ceiling**. +- **At any non-zero latency** throughput is **flat across batchSize** (33, 33, 36 at 20 ms; + 9, 9, ~9 at 100 ms) — proving that **sequential blocking dominates** as soon as I/O has + any cost. Batch size doesn't help when each `httpClient.send()` blocks for the full RTT. +- **The theoretical sequential ceiling** for `latencyMs=N` is `1000ms / Nms = 1000/N msg/s`, + matching the measured numbers within ~30% (overhead from DB + HttpClient setup): + - latency=20 → theoretical 50 msg/s, measured ~33 + - latency=100 → theoretical 10 msg/s, measured ~9 + +Run wall time: 5 min 3 s. Raw JSON: [`baseline-quick.json`](baseline-quick.json). + +**Note on confidence intervals:** This smoke run uses `fork=1, warmup=1, iterations=2`, +so the JMH `Error` column is empty. For release-quality numbers with proper CI bounds, +re-run with `./gradlew :okapi-benchmarks:jmh` (default config: fork=2, warmup=3, +iterations=5). + +### Microbenchmarks + +Single-entry `deliver()` calls with mocked I/O (zero network cost). Measures pure code +overhead — JSON deserialization of `deliveryMetadata`, record/request construction, +exception classification. + +| Benchmark | Score | Units | +|-----------|-------|-------| +| `DelivererMicroBenchmark.kafkaDeliver` | ~1,800,000 | ops/s | +| `DelivererMicroBenchmark.httpDeliver` | _TBD_ | ops/s | + +The Kafka deliver microbenchmark shows the per-message code overhead is **negligible** +(~550 ns/message). This means the throughput benchmarks measure **almost entirely** +I/O and DB time — the library itself contributes <1% to the per-message latency budget. + +## Interpretation + +### What the baseline tells us + +1. **Kafka throughput is flat with respect to `batchSize`** (109 → 115 → 115 msg/s). + This confirms that the bottleneck is `producer.send().get()` blocking sequentially + per entry, not DB or scheduler overhead. Each Kafka send takes ~9 ms RTT on localhost + with `acks=all`, and 1000 entries × 9 ms = 9 seconds regardless of how they're batched + from the DB side. This is **exactly the bottleneck async batch delivery + (fire-flush-await) is designed to eliminate**. + +2. **HTTP throughput scales positively with `batchSize`** (1,369 → 2,825 → 3,215 msg/s). + With WireMock on localhost the per-request I/O cost approaches zero, so the savings + come from amortizing the per-batch DB overhead (`claimPending`, transaction begin/commit). + Larger batch = fewer DB roundtrips per delivered message. + +3. **The 30× Kafka/HTTP asymmetry is an artifact of localhost WireMock.** In production + HTTP webhooks have RTT of 50-200 ms — the same `httpClient.send().` blocking pattern + that's currently used will hit a similar wall as Kafka does today. **Real-world HTTP + throughput will be much lower** than this benchmark suggests; the current numbers reflect + "library overhead with free I/O", not "real webhook delivery rate". + +4. **Single-threaded ceiling.** With `concurrency=1` (the only option today), throughput + is bounded by `1 / (avgRoundTripPerMessage + dbOverheadPerMessage)`. Adding multi-threaded + fan-out should give linear scaling up to DB connection pool size. + +### What the baseline does NOT tell us + +- **Production throughput.** Localhost Testcontainers Kafka has ~0.5 ms RTT vs 5-50 ms + for real clusters. Real-world numbers will be 2-10× lower. +- **Multi-threaded throughput.** `concurrency` config does not exist yet. +- **Behavior under failures.** All deliveries succeed; no retry path is exercised. + +## Next steps + +These baseline numbers will be re-measured after each of the planned optimizations lands: + +1. **Async batch delivery** — `MessageDeliverer.deliverBatch()` with Kafka + fire-flush-await and HTTP `sendAsync`. Expected: 5-10× improvement at `batchSize=50`+. +2. **Multi-threaded scheduler** — `OutboxSchedulerConfig.concurrency`. Expected: linear + scaling up to DB connection pool size. +3. **Batch UPDATE via `executeBatch`** — single prepared statement for N entries. + Expected: marginal at `batchSize=10`, 3-5× at `batchSize=100+`. + +See [KOJAK-14](https://softwaremill.atlassian.net/browse/KOJAK-14) epic for the full plan. diff --git a/gradle/libs.versions.toml b/gradle/libs.versions.toml index f6de469..b8a707f 100644 --- a/gradle/libs.versions.toml +++ b/gradle/libs.versions.toml @@ -17,6 +17,8 @@ slf4j = "2.0.17" assertj = "3.27.7" h2 = "2.4.240" micrometer = "1.16.5" +jmh = "1.37" +jmhGradlePlugin = "0.7.3" [libraries] kotlinGradlePlugin = { module = "org.jetbrains.kotlin:kotlin-gradle-plugin", version.ref = "kotlin" } @@ -51,6 +53,9 @@ wiremock = { module = "org.wiremock:wiremock", version.ref = "wiremock" } slf4jApi = { module = "org.slf4j:slf4j-api", version.ref = "slf4j" } slf4jSimple = { module = "org.slf4j:slf4j-simple", version.ref = "slf4j" } h2 = { module = "com.h2database:h2", version.ref = "h2" } +jmhCore = { module = "org.openjdk.jmh:jmh-core", version.ref = "jmh" } +jmhGeneratorAnnprocess = { module = "org.openjdk.jmh:jmh-generator-annprocess", version.ref = "jmh" } [plugins] ktlint = { id = "org.jlleitschuh.gradle.ktlint", version.ref = "ktlint" } +jmh = { id = "me.champeau.jmh", version.ref = "jmhGradlePlugin" } diff --git a/okapi-benchmarks/build.gradle.kts b/okapi-benchmarks/build.gradle.kts new file mode 100644 index 0000000..d1c27b0 --- /dev/null +++ b/okapi-benchmarks/build.gradle.kts @@ -0,0 +1,52 @@ +plugins { + id("buildsrc.convention.kotlin-jvm") + alias(libs.plugins.jmh) +} + +dependencies { + // Okapi modules under measurement + jmh(project(":okapi-core")) + jmh(project(":okapi-postgres")) + jmh(project(":okapi-kafka")) + jmh(project(":okapi-http")) + + // Testcontainers — real Postgres + real Kafka for end-to-end throughput + jmh(libs.testcontainersPostgresql) + jmh(libs.testcontainersKafka) + + // DB driver + schema + jmh(libs.postgresql) + jmh(libs.liquibaseCore) + + // Kafka clients (provides MockProducer for microbenchmarks) + jmh(libs.kafkaClients) + + // WireMock for HTTP target + jmh(libs.wiremock) + + // SLF4J for Testcontainers/Kafka logging + jmh(libs.slf4jSimple) + + // JMH core + annotation processor + jmh(libs.jmhCore) + jmh(libs.jmhGeneratorAnnprocess) + jmhAnnotationProcessor(libs.jmhGeneratorAnnprocess) +} + +jmh { + fork = 2 + warmupIterations = 3 + warmup = "10s" + iterations = 5 + timeOnIteration = "30s" + resultFormat = "JSON" + resultsFile = layout.buildDirectory.file("reports/jmh/results.json") + jvmArgs = listOf("-Xms2g", "-Xmx2g", "-XX:+UseG1GC") +} + +// ktlint should not lint JMH-generated sources. +ktlint { + filter { + exclude { it.file.path.contains("/build/") || it.file.path.contains("/generated/") } + } +} diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/DelivererMicroBenchmark.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/DelivererMicroBenchmark.kt new file mode 100644 index 0000000..81c258b --- /dev/null +++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/DelivererMicroBenchmark.kt @@ -0,0 +1,91 @@ +package com.softwaremill.okapi.benchmarks + +import com.github.tomakehurst.wiremock.WireMockServer +import com.github.tomakehurst.wiremock.client.WireMock.aResponse +import com.github.tomakehurst.wiremock.client.WireMock.post +import com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo +import com.github.tomakehurst.wiremock.core.WireMockConfiguration.wireMockConfig +import com.softwaremill.okapi.core.DeliveryResult +import com.softwaremill.okapi.core.OutboxEntry +import com.softwaremill.okapi.core.OutboxMessage +import com.softwaremill.okapi.http.HttpDeliveryInfo +import com.softwaremill.okapi.http.HttpMessageDeliverer +import com.softwaremill.okapi.http.HttpMethod +import com.softwaremill.okapi.http.ServiceUrlResolver +import com.softwaremill.okapi.kafka.KafkaDeliveryInfo +import com.softwaremill.okapi.kafka.KafkaMessageDeliverer +import org.apache.kafka.clients.producer.MockProducer +import org.apache.kafka.common.serialization.StringSerializer +import org.openjdk.jmh.annotations.Benchmark +import org.openjdk.jmh.annotations.BenchmarkMode +import org.openjdk.jmh.annotations.Mode +import org.openjdk.jmh.annotations.OutputTimeUnit +import org.openjdk.jmh.annotations.Scope +import org.openjdk.jmh.annotations.Setup +import org.openjdk.jmh.annotations.State +import org.openjdk.jmh.annotations.TearDown +import java.time.Instant +import java.util.concurrent.TimeUnit + +/** + * Pure code-overhead microbenchmarks for the [KafkaMessageDeliverer] and + * [HttpMessageDeliverer] `deliver()` methods. I/O is mocked away: + * + * - Kafka: [MockProducer] with auto-complete (futures complete synchronously, no broker) + * - HTTP: WireMock on loopback with zero artificial latency + * + * These measure the cost of: deliveryInfo deserialization, record/request construction, + * exception classification, and result wrapping — i.e. everything around the I/O. + * Useful as a baseline for "did optimization X add overhead?" before/after comparisons. + */ +@State(Scope.Benchmark) +@BenchmarkMode(Mode.Throughput) +@OutputTimeUnit(TimeUnit.SECONDS) +open class DelivererMicroBenchmark { + + private lateinit var kafkaDeliverer: KafkaMessageDeliverer + private lateinit var httpDeliverer: HttpMessageDeliverer + private lateinit var wiremock: WireMockServer + + private lateinit var kafkaEntry: OutboxEntry + private lateinit var httpEntry: OutboxEntry + + @Setup(org.openjdk.jmh.annotations.Level.Trial) + fun setupTrial() { + val mockProducer = MockProducer(true, null, StringSerializer(), StringSerializer()) + kafkaDeliverer = KafkaMessageDeliverer(mockProducer) + + wiremock = WireMockServer(wireMockConfig().dynamicPort()).also { it.start() } + wiremock.stubFor(post(urlEqualTo(ENDPOINT)).willReturn(aResponse().withStatus(200))) + val urlResolver = ServiceUrlResolver { "http://localhost:${wiremock.port()}" } + httpDeliverer = HttpMessageDeliverer(urlResolver) + + val now = Instant.now() + kafkaEntry = OutboxEntry.createPending( + OutboxMessage("bench.event", PAYLOAD), + KafkaDeliveryInfo(topic = "bench-topic", partitionKey = "k"), + now, + ) + httpEntry = OutboxEntry.createPending( + OutboxMessage("bench.event", PAYLOAD), + HttpDeliveryInfo(serviceName = "bench", endpointPath = ENDPOINT, httpMethod = HttpMethod.POST), + now, + ) + } + + @Benchmark + fun kafkaDeliver(): DeliveryResult = kafkaDeliverer.deliver(kafkaEntry) + + @Benchmark + fun httpDeliver(): DeliveryResult = httpDeliverer.deliver(httpEntry) + + @TearDown(org.openjdk.jmh.annotations.Level.Trial) + fun teardown() { + wiremock.stop() + } + + companion object { + private const val ENDPOINT = "/api/bench" + private const val PAYLOAD = """{"orderId":"order-42","amount":100.50,"currency":"EUR"}""" + } +} diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/HttpThroughputBenchmark.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/HttpThroughputBenchmark.kt new file mode 100644 index 0000000..6cb6810 --- /dev/null +++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/HttpThroughputBenchmark.kt @@ -0,0 +1,125 @@ +package com.softwaremill.okapi.benchmarks + +import com.github.tomakehurst.wiremock.WireMockServer +import com.github.tomakehurst.wiremock.client.WireMock.aResponse +import com.github.tomakehurst.wiremock.client.WireMock.post +import com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo +import com.github.tomakehurst.wiremock.core.WireMockConfiguration.wireMockConfig +import com.softwaremill.okapi.benchmarks.support.PostgresBenchmarkSupport +import com.softwaremill.okapi.core.OutboxEntryProcessor +import com.softwaremill.okapi.core.OutboxMessage +import com.softwaremill.okapi.core.OutboxProcessor +import com.softwaremill.okapi.core.OutboxPublisher +import com.softwaremill.okapi.core.RetryPolicy +import com.softwaremill.okapi.http.HttpDeliveryInfo +import com.softwaremill.okapi.http.HttpMessageDeliverer +import com.softwaremill.okapi.http.HttpMethod +import com.softwaremill.okapi.http.ServiceUrlResolver +import com.softwaremill.okapi.postgres.PostgresOutboxStore +import org.openjdk.jmh.annotations.Benchmark +import org.openjdk.jmh.annotations.BenchmarkMode +import org.openjdk.jmh.annotations.Mode +import org.openjdk.jmh.annotations.OperationsPerInvocation +import org.openjdk.jmh.annotations.OutputTimeUnit +import org.openjdk.jmh.annotations.Param +import org.openjdk.jmh.annotations.Scope +import org.openjdk.jmh.annotations.Setup +import org.openjdk.jmh.annotations.State +import org.openjdk.jmh.annotations.TearDown +import java.time.Clock +import java.util.concurrent.TimeUnit + +/** + * Measures end-to-end fanout throughput for HTTP webhook delivery via WireMock. + * + * WireMock is configured with **zero artificial latency** — this measures the + * library's processing capacity, not the user's network. Real-world throughput + * will be lower, dominated by webhook RTT. + * + * Same pattern as [KafkaThroughputBenchmark]: bypass scheduler, drain in tight loop, + * report avg ms per drain with [OperationsPerInvocation] yielding ops/s = msg/s. + */ +@State(Scope.Benchmark) +@BenchmarkMode(Mode.AverageTime) +@OutputTimeUnit(TimeUnit.MILLISECONDS) +open class HttpThroughputBenchmark { + + @Param("10", "50", "100") + var batchSize: Int = 0 + + /** + * Artificial server-side latency injected by WireMock per request. + * + * - `0` — library-only ceiling (no I/O cost). Useful as upper bound. + * - `20` — fast intra-cluster service (e.g., service mesh sidecar). + * - `100` — typical external webhook (cross-region cloud HTTP). + * + * Real production webhook latency is usually 50-500 ms; pick the value closest + * to your target service. + */ + @Param("0", "20", "100") + var httpLatencyMs: Int = 0 + + private lateinit var postgres: PostgresBenchmarkSupport + private lateinit var wiremock: WireMockServer + private lateinit var publisher: OutboxPublisher + private lateinit var processor: OutboxProcessor + private lateinit var deliveryInfo: HttpDeliveryInfo + + @Setup(org.openjdk.jmh.annotations.Level.Trial) + fun setupTrial() { + postgres = PostgresBenchmarkSupport().also { it.start() } + wiremock = WireMockServer(wireMockConfig().dynamicPort()).also { it.start() } + wiremock.stubFor( + post(urlEqualTo(ENDPOINT)).willReturn( + aResponse() + .withStatus(200) + .withFixedDelay(httpLatencyMs), + ), + ) + + val clock = Clock.systemUTC() + val store = PostgresOutboxStore(postgres.jdbc, clock) + publisher = OutboxPublisher(store, clock) + val urlResolver = ServiceUrlResolver { "http://localhost:${wiremock.port()}" } + val deliverer = HttpMessageDeliverer(urlResolver) + val entryProcessor = OutboxEntryProcessor(deliverer, RetryPolicy(maxRetries = 0), clock) + processor = OutboxProcessor(store, entryProcessor) + deliveryInfo = HttpDeliveryInfo( + serviceName = "bench-target", + endpointPath = ENDPOINT, + httpMethod = HttpMethod.POST, + ) + } + + @Setup(org.openjdk.jmh.annotations.Level.Invocation) + fun setupInvocation() { + postgres.truncate() + postgres.jdbc.withTransaction { + repeat(TOTAL_ENTRIES) { + publisher.publish(OutboxMessage("bench.event", PAYLOAD), deliveryInfo) + } + } + } + + @Benchmark + @OperationsPerInvocation(TOTAL_ENTRIES) + fun drainAll() { + val iterations = (TOTAL_ENTRIES + batchSize - 1) / batchSize + repeat(iterations) { + postgres.jdbc.withTransaction { processor.processNext(batchSize) } + } + } + + @TearDown(org.openjdk.jmh.annotations.Level.Trial) + fun teardown() { + wiremock.stop() + postgres.stop() + } + + companion object { + const val TOTAL_ENTRIES = 1000 + private const val ENDPOINT = "/api/bench" + private const val PAYLOAD = """{"orderId":"order-42","amount":100.50,"currency":"EUR"}""" + } +} diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/KafkaThroughputBenchmark.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/KafkaThroughputBenchmark.kt new file mode 100644 index 0000000..ac546db --- /dev/null +++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/KafkaThroughputBenchmark.kt @@ -0,0 +1,99 @@ +package com.softwaremill.okapi.benchmarks + +import com.softwaremill.okapi.benchmarks.support.KafkaBenchmarkSupport +import com.softwaremill.okapi.benchmarks.support.PostgresBenchmarkSupport +import com.softwaremill.okapi.core.OutboxEntryProcessor +import com.softwaremill.okapi.core.OutboxMessage +import com.softwaremill.okapi.core.OutboxProcessor +import com.softwaremill.okapi.core.OutboxPublisher +import com.softwaremill.okapi.core.RetryPolicy +import com.softwaremill.okapi.kafka.KafkaDeliveryInfo +import com.softwaremill.okapi.kafka.KafkaMessageDeliverer +import com.softwaremill.okapi.postgres.PostgresOutboxStore +import org.apache.kafka.clients.producer.KafkaProducer +import org.openjdk.jmh.annotations.Benchmark +import org.openjdk.jmh.annotations.BenchmarkMode +import org.openjdk.jmh.annotations.Mode +import org.openjdk.jmh.annotations.OperationsPerInvocation +import org.openjdk.jmh.annotations.OutputTimeUnit +import org.openjdk.jmh.annotations.Param +import org.openjdk.jmh.annotations.Scope +import org.openjdk.jmh.annotations.Setup +import org.openjdk.jmh.annotations.State +import org.openjdk.jmh.annotations.TearDown +import java.time.Clock +import java.util.UUID +import java.util.concurrent.TimeUnit + +/** + * Measures end-to-end fanout throughput for Kafka delivery. + * + * Each invocation: + * 1. (Setup.Invocation) truncates the outbox table and inserts [TOTAL_ENTRIES] PENDING entries. + * 2. (Benchmark) calls [OutboxProcessor.processNext] in a loop until the queue is drained. + * 3. JMH reports avg ms per "drain TOTAL_ENTRIES entries", from which msg/s is derived. + * + * The scheduler is bypassed deliberately — we measure the processing capacity, not polling cadence. + */ +@State(Scope.Benchmark) +@BenchmarkMode(Mode.AverageTime) +@OutputTimeUnit(TimeUnit.MILLISECONDS) +open class KafkaThroughputBenchmark { + + @Param("10", "50", "100") + var batchSize: Int = 0 + + private lateinit var postgres: PostgresBenchmarkSupport + private lateinit var kafka: KafkaBenchmarkSupport + private lateinit var producer: KafkaProducer + private lateinit var publisher: OutboxPublisher + private lateinit var processor: OutboxProcessor + private lateinit var topic: String + + @Setup(org.openjdk.jmh.annotations.Level.Trial) + fun setupTrial() { + postgres = PostgresBenchmarkSupport().also { it.start() } + kafka = KafkaBenchmarkSupport().also { it.start() } + producer = kafka.createProducer() + topic = "bench-${UUID.randomUUID()}" + + val clock = Clock.systemUTC() + val store = PostgresOutboxStore(postgres.jdbc, clock) + publisher = OutboxPublisher(store, clock) + val deliverer = KafkaMessageDeliverer(producer) + val entryProcessor = OutboxEntryProcessor(deliverer, RetryPolicy(maxRetries = 0), clock) + processor = OutboxProcessor(store, entryProcessor) + } + + @Setup(org.openjdk.jmh.annotations.Level.Invocation) + fun setupInvocation() { + postgres.truncate() + val info = KafkaDeliveryInfo(topic = topic, partitionKey = "k") + postgres.jdbc.withTransaction { + repeat(TOTAL_ENTRIES) { + publisher.publish(OutboxMessage("bench.event", PAYLOAD), info) + } + } + } + + @Benchmark + @OperationsPerInvocation(TOTAL_ENTRIES) + fun drainAll() { + val iterations = (TOTAL_ENTRIES + batchSize - 1) / batchSize + repeat(iterations) { + postgres.jdbc.withTransaction { processor.processNext(batchSize) } + } + } + + @TearDown(org.openjdk.jmh.annotations.Level.Trial) + fun teardown() { + producer.close() + kafka.stop() + postgres.stop() + } + + companion object { + const val TOTAL_ENTRIES = 1000 + private const val PAYLOAD = """{"orderId":"order-42","amount":100.50,"currency":"EUR"}""" + } +} diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/KafkaBenchmarkSupport.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/KafkaBenchmarkSupport.kt new file mode 100644 index 0000000..655a9bb --- /dev/null +++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/KafkaBenchmarkSupport.kt @@ -0,0 +1,27 @@ +package com.softwaremill.okapi.benchmarks.support + +import org.apache.kafka.clients.producer.KafkaProducer +import org.apache.kafka.clients.producer.ProducerConfig +import org.apache.kafka.common.serialization.StringSerializer +import org.testcontainers.kafka.KafkaContainer + +class KafkaBenchmarkSupport { + private val container = KafkaContainer("apache/kafka:3.8.1") + + fun start() { + container.start() + } + + fun stop() { + container.stop() + } + + fun createProducer(): KafkaProducer = KafkaProducer( + mapOf( + ProducerConfig.BOOTSTRAP_SERVERS_CONFIG to container.bootstrapServers, + ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG to StringSerializer::class.java.name, + ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG to StringSerializer::class.java.name, + ProducerConfig.ACKS_CONFIG to "all", + ), + ) +} diff --git a/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/PostgresBenchmarkSupport.kt b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/PostgresBenchmarkSupport.kt new file mode 100644 index 0000000..98a502a --- /dev/null +++ b/okapi-benchmarks/src/jmh/kotlin/com/softwaremill/okapi/benchmarks/support/PostgresBenchmarkSupport.kt @@ -0,0 +1,80 @@ +package com.softwaremill.okapi.benchmarks.support + +import com.softwaremill.okapi.core.ConnectionProvider +import liquibase.Liquibase +import liquibase.database.DatabaseFactory +import liquibase.database.jvm.JdbcConnection +import liquibase.resource.ClassLoaderResourceAccessor +import org.postgresql.ds.PGSimpleDataSource +import org.testcontainers.containers.PostgreSQLContainer +import java.sql.Connection +import java.sql.DriverManager +import javax.sql.DataSource + +/** + * Brings up a real Postgres container, runs Liquibase migrations, + * and exposes a [ConnectionProvider] backed by a thread-local connection + * (matching the integration-tests pattern so okapi-core stays unchanged). + */ +class PostgresBenchmarkSupport { + private val container = PostgreSQLContainer("postgres:16") + lateinit var dataSource: DataSource + lateinit var jdbc: BenchmarkConnectionProvider + + fun start() { + container.start() + dataSource = PGSimpleDataSource().apply { + setURL(container.jdbcUrl) + user = container.username + password = container.password + } + jdbc = BenchmarkConnectionProvider(dataSource) + runLiquibase() + } + + fun stop() { + container.stop() + } + + fun truncate() { + jdbc.withTransaction { + jdbc.withConnection { conn -> + conn.createStatement().use { it.execute("TRUNCATE TABLE outbox") } + } + } + } + + private fun runLiquibase() { + val connection = DriverManager.getConnection(container.jdbcUrl, container.username, container.password) + val db = DatabaseFactory.getInstance().findCorrectDatabaseImplementation(JdbcConnection(connection)) + Liquibase("com/softwaremill/okapi/db/changelog.xml", ClassLoaderResourceAccessor(), db).use { it.update("") } + connection.close() + } +} + +class BenchmarkConnectionProvider(private val dataSource: DataSource) : ConnectionProvider { + private val threadLocalConnection = ThreadLocal() + + override fun withConnection(block: (Connection) -> T): T { + val connection = threadLocalConnection.get() + ?: error("No connection bound to current thread. Use withTransaction { } in benchmarks.") + return block(connection) + } + + fun withTransaction(block: () -> T): T { + val conn = dataSource.connection + conn.autoCommit = false + threadLocalConnection.set(conn) + return try { + val result = block() + conn.commit() + result + } catch (e: Exception) { + conn.rollback() + throw e + } finally { + threadLocalConnection.remove() + conn.close() + } + } +} diff --git a/settings.gradle.kts b/settings.gradle.kts index c1b49fe..8e3a589 100644 --- a/settings.gradle.kts +++ b/settings.gradle.kts @@ -19,5 +19,6 @@ include("okapi-spring-boot") include("okapi-bom") include("okapi-integration-tests") include("okapi-micrometer") +include("okapi-benchmarks") rootProject.name = "okapi" From c95f87f7bac3381de1bbc16a6e9fe23de3cdba8a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andrzej=20Kobyli=C5=84ski?= Date: Thu, 30 Apr 2026 17:50:34 +0200 Subject: [PATCH 2/2] docs: add Performance section to README with baseline numbers Adds a Performance section between Compatibility and Build summarizing the KOJAK-68 baseline throughput across transports and batch sizes, plus a link to the full benchmarks/ directory with methodology and reproduction instructions. Also extends the Build section with the JMH command for completeness. --- README.md | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 597e3e1..6cb6c3a 100644 --- a/README.md +++ b/README.md @@ -199,12 +199,27 @@ graph BT | Kafka Clients | 3.9.x, 4.x | `okapi-kafka` — you provide `kafka-clients` | | Exposed | 1.x | `okapi-exposed` module — for Ktor/standalone apps | +## Performance + +Throughput baseline (single instance, sync sequential delivery, MacBook M3 Max, JDK 25 LTS, April 2026): + +| Transport | batchSize=10 | batchSize=100 | +|-----------|--------------|----------------| +| Kafka (`acks=all`, localhost broker) | ~110 msg/s | ~115 msg/s | +| HTTP @ webhook latency 20 ms | ~33 msg/s | ~36 msg/s | +| HTTP @ webhook latency 100 ms | ~9 msg/s | ~9 msg/s | + +These numbers reflect the current sync-sequential delivery model. Throughput is bounded by per-message round-trip time × batch size. Performance work to lift these limits (async batch delivery, multi-threaded scheduler) is tracked under the [KOJAK-14 epic](https://softwaremill.atlassian.net/browse/KOJAK-14). + +Full methodology, raw JMH results, and reproduction instructions: [`benchmarks/`](benchmarks/). + ## Build ```sh -./gradlew build # Build all modules -./gradlew test # Run tests (Docker required — Testcontainers) -./gradlew ktlintFormat # Format code +./gradlew build # Build all modules +./gradlew test # Run tests (Docker required — Testcontainers) +./gradlew ktlintFormat # Format code +./gradlew :okapi-benchmarks:jmh # Run JMH benchmarks (~30 min, see benchmarks/README.md) ``` Requires JDK 21.