feat(ingest): local Maple binary (maple start) — OTLP ingest + embedded chDB #64

Draft
Makisuo wants to merge 2 commits into main from local-maple-backend

@Makisuo commented May 28, 2026

Summary

Phase 2 of the lightweight local Maple effort: a standalone single-binary local mode started with maple start. It runs an OTLP/HTTP ingest endpoint backed by an embedded in-process ClickHouse (chDB) — no separate server, no Docker — and reuses the production OTLP→NDJSON encoders and the generated ClickHouse schema, so local rows are shaped identically to cloud rows. Single-tenant: every row is pinned to OrgId = "local".

This PR is backend only (Phases 3–5 — component extraction, the local SPA, and cross-platform packaging — are deferred). The bundled UI is a placeholder page.
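To illustrate the org pinning described in the summary, here is a minimal sketch of how an `INSERT … SELECT … FROM format(JSONEachRow, …)` statement could be assembled so that `OrgId` is always the literal `'local'`, regardless of what the payload carries. The function names, the explicit column list, and the escaping helper are assumptions for illustration; the statement the PR actually generates is not shown in this description.

```rust
/// Escape a payload for use inside a single-quoted ClickHouse string literal.
/// Backslashes and single quotes are the two characters that need escaping.
fn escape_sql_literal(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '\'' => out.push_str("\\'"),
            other => out.push(other),
        }
    }
    out
}

/// Build an INSERT ... SELECT that reads NDJSON rows and pins OrgId.
/// Hypothetical helper: `table` and `columns` are assumed to come from
/// trusted, generated schema metadata, never from request input.
fn build_local_insert(table: &str, columns: &[&str], ndjson: &str) -> String {
    let cols = columns.join(", ");
    format!(
        "INSERT INTO {table} (OrgId, {cols}) SELECT 'local' AS OrgId, {cols} \
         FROM format(JSONEachRow, '{payload}')",
        payload = escape_sql_literal(ndjson)
    )
}
```

Because `OrgId` is selected as a constant rather than read from the payload, a row that arrives with a different org value would still land under `local` in this sketch.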

What's here

  • chdb module (apps/ingest/src/chdb.rs) — chDB is single-owner (one OS thread per data dir), so a dedicated writer thread owns the session and all bootstrap / insert / query work is funneled to it over a channel. Schema bootstrap via Arg::MultiQuery; inserts via INSERT … SELECT … FROM format(JSONEachRow, …) with the org pinned.
  • maple bin (apps/ingest/src/bin/local.rs) — gated behind a new local cargo feature so the production maple-ingest build never links libchdb (~319 MB). clap CLI (start --port --data-dir), Axum routes (/v1/{traces,logs,metrics}, /local/query, /health), rust-embed SPA fallback.
  • telemetry::encode_local_{traces,logs,metrics} — thin wrappers over the existing private encoders for zero row-mapping divergence with the Tinybird path.
  • Schema codegen: generate-clickhouse-schema-sql.ts + generate-clickhouse-insert-mappings.ts emit local-schema.sql + local-inserts.json from the Tinybird manifest, wired into the clickhouse:schema task. (Includes the db2debb5 revision bump that the artifacts embed.)
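The single-owner constraint in the chdb module bullet can be sketched with the standard library alone: one thread creates and keeps the session, and every caller sends it a job plus a reply channel. This is an illustration of the pattern, not the code in chdb.rs; `FakeSession`, `Job`, and `DbHandle` are hypothetical names, and the fake session only mimics an embedded engine.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for the embedded database session (hypothetical; not the chDB API).
struct FakeSession {
    rows: Vec<String>,
}

impl FakeSession {
    fn execute(&mut self, sql: &str) -> Result<String, String> {
        if sql.starts_with("INSERT ") {
            self.rows.push(sql.to_owned());
            Ok(String::new())
        } else if sql.starts_with("SELECT count()") {
            Ok(self.rows.len().to_string())
        } else {
            Err(format!("unsupported statement: {sql}"))
        }
    }
}

/// One unit of work plus a channel on which to send the result back.
struct Job {
    sql: String,
    reply: mpsc::Sender<Result<String, String>>,
}

/// Cloneable handle; every caller funnels work to the single owner thread.
#[derive(Clone)]
struct DbHandle {
    jobs: mpsc::Sender<Job>,
}

impl DbHandle {
    fn spawn() -> Self {
        let (jobs, inbox) = mpsc::channel::<Job>();
        thread::spawn(move || {
            // The session is created on this thread and never leaves it.
            let mut session = FakeSession { rows: Vec::new() };
            // The loop ends once every DbHandle clone has been dropped.
            for job in inbox {
                // Ignore send errors: the caller may have stopped waiting.
                let _ = job.reply.send(session.execute(&job.sql));
            }
        });
        DbHandle { jobs }
    }

    /// Send one statement to the owner thread and block until it replies.
    fn run(&self, sql: &str) -> Result<String, String> {
        let (reply, result) = mpsc::channel();
        self.jobs
            .send(Job { sql: sql.to_owned(), reply })
            .map_err(|_| "writer thread has stopped".to_string())?;
        result
            .recv()
            .map_err(|_| "writer thread dropped the reply".to_string())?
    }
}
```

Since all statements pass through one queue, bootstrap, inserts, and queries are serialized in arrival order. Inside an async handler the blocking `recv` would likely be swapped for an async one-shot channel; that choice is outside this sketch.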

Validated

  • OTLP/HTTP ingest (protobuf and JSON) for traces / logs / metrics → reused encoders → chDB writer → schema bootstrap.
  • Materialized views fire on insert (trace_list_mv, service_map_spans, traces_aggregates_hourly AggregatingMergeTree) and computed DEFAULT columns populate (IsEntryPoint, SampleRate).
  • /local/query returns JSON arrays; empty → [], bad SQL → 500.
  • Production cargo check --lib stays clean (no libchdb without --features local).
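The `/local/query` contract listed above (JSON array on success, `[]` when empty, 500 on bad SQL) could be expressed as a small pure function. This sketch assumes the engine hands back JSONEachRow text (one JSON object per line) and that success maps to status 200; neither detail is confirmed by the PR description, and both function names are hypothetical.

```rust
/// Join JSONEachRow output (one JSON object per line) into a JSON array.
/// Blank lines are skipped, so empty output becomes "[]".
fn json_each_row_to_array(output: &str) -> String {
    let rows: Vec<&str> = output
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect();
    format!("[{}]", rows.join(","))
}

/// Map a query outcome to an (HTTP status, body) pair.
/// Assumption: any engine error, including bad SQL, is reported as 500.
fn query_response(outcome: Result<String, String>) -> (u16, String) {
    match outcome {
        Ok(output) => (200, json_each_row_to_array(&output)),
        Err(message) => (500, message),
    }
}
```

Keeping the mapping free of any HTTP framework types makes the empty and error cases easy to cover in a unit test.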

Not yet (follow-ups)

  • Phase 3: extract trace-timeline + logs components from apps/web into packages/ui
  • Phase 4: apps/local-ui SPA, bundled into the binary
  • Phase 5: cross-platform release build; resolve libchdb linking / rpath (DYLD_LIBRARY_PATH currently required at runtime) + macOS codesigning

Test plan

  • cargo build --features local -p maple-ingest --bin maple
  • maple start, send OTLP via an instrumented app, POST /local/query with SELECT count() FROM traces
  • bun run clickhouse:schema:check passes
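The `--features local` flag in the build command relies on Cargo feature gating. As a rough sketch of the mechanism (not the crate's actual layout), code behind `#[cfg(feature = "local")]` is compiled only when the feature is enabled, so a default build never references it. The function name and return strings below are hypothetical.

```rust
/// Compiled only when the crate is built with `--features local`.
#[cfg(feature = "local")]
pub fn compiled_backend() -> &'static str {
    "embedded"
}

/// Compiled in every other build, including the default production build.
#[cfg(not(feature = "local"))]
pub fn compiled_backend() -> &'static str {
    "remote-only"
}
```

On the manifest side, Cargo lets a `[[bin]]` target declare `required-features`, which makes Cargo skip that binary unless the listed features are enabled. Whether this PR uses that key or another arrangement is not stated here.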

🤖 Generated with Claude Code


github-actions Bot commented May 28, 2026

Ingest Rust Test + Benchmark Results

Commit: 9f985acd06d279d56760b16a8597635c0b369b45

Load Benchmark — tinybird mode, median of 3 run(s) vs main

| Metric | main (median) | PR (median) | Delta |
| --- | --- | --- | --- |
| Requests/sec | 1915.22 | 2101.60 | +9.7% better |
| Rows/sec | 19152.17 | 21016.05 | +9.7% better |
| p50 latency | 32.29 ms | 29.88 ms | -7.5% better |
| p95 latency | 39.04 ms | 34.26 ms | -12.2% better |
| p99 latency | 40.24 ms | 42.02 ms | +4.4% worse |
| Export catch-up | 0.026 s | 0.026 s | -0.5% better |
| Max RSS | 103.05 MiB | 98.38 MiB | -4.5% better |
| Failures | 0 | 0 | same |

Same code path on both sides (same LOAD_TEST_INGEST_MODE), so the delta column is meaningful. Numbers come from ubuntu-latest, which is noisy — treat single-digit-percent deltas as noise.
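For readers who want to check the table against the per-iteration JSON, the summary can be reproduced by taking the median of each metric across the three runs and then the relative change of the PR median against the main median. The helpers below are a sketch of that arithmetic, not the benchmark workflow's own code.

```rust
/// Median of a sample; returns None for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        // Even count: average the two middle values.
        (sorted[mid - 1] + sorted[mid]) / 2.0
    })
}

/// Relative change of `pr` versus `base`, in percent.
fn delta_percent(base: f64, pr: f64) -> f64 {
    (pr - base) / base * 100.0
}
```

Feeding in the three `request_rps` values from each JSON block gives medians of about 2101.60 (PR) and 1915.22 (main), a delta of roughly +9.7%, which matches the Requests/sec row.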

PR load benchmark JSON (per-iteration)
[
  {
    "ingest_mode": "tinybird",
    "requests": 2000,
    "successes": 2000,
    "failures": 0,
    "rows_sent": 20000,
    "rows_exported": 20000,
    "imports": 27,
    "duration_seconds": 1.050530852,
    "export_catchup_seconds": 0.026313127,
    "request_rps": 1903.799394556001,
    "row_rps": 19037.99394556001,
    "p50_ms": 32.175,
    "p95_ms": 39.655,
    "p99_ms": 42.022,
    "max_rss_mb": 101.7421875,
    "max_cpu_percent": 62.5,
    "avg_cpu_percent": 47.03333333333333
  },
  {
    "ingest_mode": "tinybird",
    "requests": 2000,
    "successes": 2000,
    "failures": 0,
    "rows_sent": 20000,
    "rows_exported": 20000,
    "imports": 24,
    "duration_seconds": 0.941078326,
    "export_catchup_seconds": 0.025968481,
    "request_rps": 2125.221615187852,
    "row_rps": 21252.21615187852,
    "p50_ms": 28.951,
    "p95_ms": 33.652,
    "p99_ms": 43.952,
    "max_rss_mb": 98.375,
    "max_cpu_percent": 69.6,
    "avg_cpu_percent": 43.099999999999994
  },
  {
    "ingest_mode": "tinybird",
    "requests": 2000,
    "successes": 2000,
    "failures": 0,
    "rows_sent": 20000,
    "rows_exported": 20000,
    "imports": 24,
    "duration_seconds": 0.951653758,
    "export_catchup_seconds": 0.026092371,
    "request_rps": 2101.604688876771,
    "row_rps": 21016.046888767713,
    "p50_ms": 29.88,
    "p95_ms": 34.262,
    "p99_ms": 39.114,
    "max_rss_mb": 97.703125,
    "max_cpu_percent": 67.8,
    "avg_cpu_percent": 42.2
  }
]
main load benchmark JSON (per-iteration)
[
  {
    "ingest_mode": "tinybird",
    "requests": 2000,
    "successes": 2000,
    "failures": 0,
    "rows_sent": 20000,
    "rows_exported": 20000,
    "imports": 25,
    "duration_seconds": 1.086509461,
    "export_catchup_seconds": 0.025871174,
    "request_rps": 1840.7570958095873,
    "row_rps": 18407.570958095876,
    "p50_ms": 34.35,
    "p95_ms": 39.041,
    "p99_ms": 40.245,
    "max_rss_mb": 103.05078125,
    "max_cpu_percent": 60.7,
    "avg_cpu_percent": 45.5
  },
  {
    "ingest_mode": "tinybird",
    "requests": 2000,
    "successes": 2000,
    "failures": 0,
    "rows_sent": 20000,
    "rows_exported": 20000,
    "imports": 26,
    "duration_seconds": 1.04426806,
    "export_catchup_seconds": 0.026219733,
    "request_rps": 1915.217056432809,
    "row_rps": 19152.170564328087,
    "p50_ms": 32.294,
    "p95_ms": 39.253,
    "p99_ms": 54.175,
    "max_rss_mb": 104.3359375,
    "max_cpu_percent": 66.0,
    "avg_cpu_percent": 48.0
  },
  {
    "ingest_mode": "tinybird",
    "requests": 2000,
    "successes": 2000,
    "failures": 0,
    "rows_sent": 20000,
    "rows_exported": 20000,
    "imports": 25,
    "duration_seconds": 1.001076635,
    "export_catchup_seconds": 0.027458026,
    "request_rps": 1997.8490457925832,
    "row_rps": 19978.490457925833,
    "p50_ms": 31.151,
    "p95_ms": 35.031,
    "p99_ms": 35.913,
    "max_rss_mb": 102.89453125,
    "max_cpu_percent": 67.8,
    "avg_cpu_percent": 54.96666666666666
  }
]

WAL-acked microbench (cargo bench --bench ingest_bench)

   Compiling maple-ingest v0.1.0 (/home/runner/work/maple/maple/apps/ingest)
    Finished `bench` profile [optimized] target(s) in 33.16s
     Running benches/ingest_bench.rs (target/release/deps/ingest_bench-56e0fe315b7f3811)
Gnuplot not found, using plotters backend
test ingest_accept/logs_10_rows_wal_ack ... bench:      553592 ns/iter (+/- 8891)
test ingest_accept/traces_10_spans_wal_ack ... bench:      582302 ns/iter (+/- 22359)

cargo test

    Updating crates.io index
   Compiling maple-ingest v0.1.0 (/home/runner/work/maple/maple/apps/ingest)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 5.92s
     Running unittests src/lib.rs (target/debug/deps/maple_ingest-8b9e9fc61a910385)

running 22 tests
test otel::tests::build_resource_sets_runtime_and_sdk_type ... ok
test telemetry::tests::apply_attribute_mappings_rewrites_span_attributes ... ok
test telemetry::tests::hex_empty_for_zero_ids ... ok
test telemetry::tests::log_encoder_matches_tinybird_row_shape ... ok
test telemetry::tests::logs_emit_exactly_the_jsonpaths_declared_in_datasources_ts ... ok
test telemetry::tests::custom_datasource_names_propagate_to_frames ... ok
test telemetry::tests::logs_severity_text_falls_back_to_mapped_number ... ok
test telemetry::tests::logs_use_observed_time_when_time_unix_nano_is_zero ... ok
test telemetry::tests::metric_encoder_matches_all_tinybird_datasource_shapes ... ok
test telemetry::tests::metrics_summary_data_points_are_dropped ... ok
test telemetry::tests::metrics_emit_exactly_the_jsonpaths_declared_in_datasources_ts ... ok
test telemetry::tests::sampling_keeps_errors_even_when_ratio_low ... ok
test telemetry::tests::timestamp_has_nano_precision ... ok
test telemetry::tests::timestamps_match_clickhouse_datetime64_nine_format ... ok
test telemetry::tests::trace_encoder_matches_tinybird_row_shape ... ok
test telemetry::tests::traces_emit_exactly_the_jsonpaths_declared_in_datasources_ts ... ok
test telemetry::tests::wal_partial_drain_advances_cursor_without_truncating ... ok
test telemetry::tests::wal_round_trips_frame ... ok
test telemetry::tests::wal_truncates_after_full_drain_allowing_further_appends ... ok
test telemetry::tests::pipeline_e2e_exports_traces_to_fake_tinybird ... ok
test telemetry::tests::pipeline_e2e_exports_gzip_ndjson_to_fake_tinybird ... ok
test telemetry::tests::pipeline_e2e_exports_metrics_to_fake_tinybird ... ok

test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.14s

     Running unittests src/bin/load_test.rs (target/debug/deps/load_test-3ae74910c06cd17d)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/debug/deps/maple_ingest-c2f428c94e6b99e8)

running 18 tests
test tests::cloudflare_log_record_maps_body_severity_and_attributes ... ok
test tests::cloudflare_timestamps_support_rfc3339_unix_and_unix_nano ... ok
test tests::cloudflare_validation_payload_is_detected ... ok
test tests::cloudflare_ndjson_payload_parses_multiple_records ... ok
test tests::d1_response_parses_empty_results_as_no_match ... ok
test tests::d1_response_parses_failure_with_errors ... ok
test tests::d1_truthy_accepts_int_and_bool_self_managed ... ok
test tests::enrichment_overwrites_tenant_fields ... ok
test tests::extract_ingest_key_returns_sentinel_literal_unchanged ... ok
test tests::d1_response_parses_success_with_rows ... ok
test tests::hash_is_deterministic ... ok
test tests::non_self_managed_goes_to_shared_pool ... ok
test tests::resolve_ingest_key_returns_none_when_hash_missing ... ok
test tests::self_managed_degrades_to_shared_when_endpoint_unset ... ok
test tests::resolve_ingest_key_returns_self_managed_false_when_no_settings_row ... ok
test tests::self_managed_goes_to_self_managed_pool_when_configured ... ok
test tests::sentinel_token_matches_only_exact_literal ... ok
test tests::resolve_ingest_key_returns_self_managed_true_when_active_settings_row ... ok

test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

   Doc-tests maple_ingest

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Adds `maple start`, a standalone single-binary local mode: OTLP/HTTP
ingest into an embedded in-process ClickHouse (chDB), reusing the
production OTLP→NDJSON encoders and the generated ClickHouse schema so
local rows are shaped identically to cloud. Single-tenant (OrgId="local").

- chdb module: one dedicated writer thread owns the chDB session; all
  bootstrap/insert/query is funneled through it (chDB is single-owner).
- new `maple` bin gated behind the `local` cargo feature so the
  production maple-ingest build never links libchdb; clap CLI, Axum
  routes, rust-embed SPA fallback.
- telemetry::encode_local_{traces,logs,metrics} wrap the private encoders
  for zero row-mapping divergence with the Tinybird path.
- schema codegen: emit local-schema.sql + local-inserts.json from the
  Tinybird manifest, wired into the clickhouse:schema task.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Makisuo force-pushed the local-maple-backend branch from 2566c68 to ba7db11 on May 29, 2026 14:02
Pre-existing regression on `main` (introduced by ac723de "feat: fix some
react stuff", which rewrote this provider to use createElement). main's CI is
red on the same error; this branch inherits it via rebase onto main.

AutocompleteValuesProvider's `children` prop is required, so React 19's
createElement overload requires it in the props object — passing it as the
variadic 3rd arg leaves the required prop unsatisfied (TS2769). Move children
into the props object. Functionally identical; unblocks @maple/web typecheck.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>