cardano-tracer | timeseries: Prometheus-aligned HTTP API, node info endpoints, Grafana datasource#6562
Open
Russoul wants to merge 34 commits into
…eseries profile
**cardano-tracer / cardano-timeseries-io**
- Add `Cardano.Timeseries.JSON` with `ToJSON` instances for `Value`, `Instant`,
`Timeseries` and `SeriesIdentifier` (orphan module, imported for side-effects)
- Switch `POST /timeseries/query` response from plain text to `application/json`
**cardano-profile**
- Add `timeseries :: Bool` field to `Tracer`
- Add `tracerTimeseries` primitive
- Add `ci-bench-timeseries` profile: 2-node local cluster with timeseries endpoint
enabled, no shutdown condition, generator runs for 100 000 epochs (effectively
indefinite) — intended for interactive exploration via Grafana
- Regenerate `all-profiles-coay.json` and `wb_profiles.mk`; update all test fixtures
**Nix**
- `cardano-tracer-service-workbench.nix`: add `timeseries` option → `hasTimeseries`
- `cardano-tracer-service.nix`: add `timeseriesEnable/Host/Port` NixOS options
- `tracer.nix`: wire `profile.tracer.timeseries` → `{epHost, epPort = 3400}`
- `supervisor.sh`: replace hanging `netstat -pltn` with `lsof -nP -iTCP:9001
-sTCP:LISTEN` for macOS compatibility
**bench/grafana-datasource** (new)
- TypeScript Grafana datasource plugin (`iog-cardanotimeseries-datasource`)
- Sends `POST /timeseries/query` via Grafana server-side proxy (avoids CORS)
- Converts `Value` tagged-union JSON to Grafana `DataFrame[]`
- `docker-compose.yaml` for local development; datasource auto-provisioned at
`http://host.docker.internal:3400`; Colima-compatible (`extra_hosts`)
- Parse/eval errors from the server are now surfaced as `DataQueryError`: the banner shows the first line (summary); the full multi-line message (source location + caret) is in `data.message` for the inspector
- Requests go via Grafana's server-side proxy (`instanceSettings.url`) to avoid CORS and host-resolution issues in the browser
- Add `Unit` value: renders as an empty frame (no data to display)
- POST /timeseries/query now accepts JSON body {"query": "...", "time": <optional Unix seconds>}
- Success response wrapped in {"status":"success","data":...} envelope
- Error responses use {"status":"error","errorType":"...","error":"..."} format
- errorType mirrors Prometheus: "parse", "bad_data", "execution"
- Plugin updated to send application/json body and unwrap response envelope
- Remove ConfigEditor.tsx (Grafana's built-in URL field suffices)
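The envelope handling described in the bullets above could look roughly like the following on the plugin side. This is a minimal sketch, not the plugin's actual code: `QueryResponse`, `QueryError`, and `unwrapEnvelope` are hypothetical names, and only the `status`, `data`, `errorType`, and `error` fields come from the commit message.

```typescript
// Hypothetical model of the response envelope; only the field names
// status/data/errorType/error are taken from the commit message.
type QueryResponse<T> =
  | { status: "success"; data: T }
  | { status: "error"; errorType: string; error: string };

// Error that keeps the server's errorType ("parse", "bad_data", "execution").
class QueryError extends Error {
  readonly errorType: string;

  constructor(errorType: string, message: string) {
    super(message);
    this.name = "QueryError";
    this.errorType = errorType;
  }
}

// Return the payload on success; throw a QueryError on an error envelope.
function unwrapEnvelope<T>(response: QueryResponse<T>): T {
  if (response.status === "success") {
    return response.data;
  }
  throw new QueryError(response.errorType, response.error);
}
```

Modelling the envelope as a discriminated union lets the compiler check that `data` is only read on the success branch.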
Introduces `Nil` and `Cons` constructors to both the expression language and the value domain. The primary motivation is to give the built-in `Metrics` expression a well-typed return value: it now returns a proper linked list of `Text` values (metric names) rather than a right-nested `Pair` tuple terminated by `Unit`, making it possible to assign it the type `List Text` in the elaborator.
- Interp: `Metrics` now builds a `Nil`/`Cons` spine instead of a `Pair`/`Unit` tuple
- JSON: `ToJSON` instances for `Nil` and `Cons` (tag/head/tail encoding)
- Show: `show Nil = "[]"`, `show (Cons h t) = "[h|t]"`
- Resolve: structural pass-throughs for `Nil` and `Cons`
- Grafana plugin: `Nil`/`Cons` variants in the TypeScript `Value` union; the `Cons` case walks the spine into a string table frame, and `Text` items render their actual string value
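Walking a `Nil`/`Cons` spine, as the Grafana plugin is described as doing, can be sketched as below. The `Value` subset and its field names (`tag`, `head`, `tail`, `value`) are assumptions inferred from the "tag/head/tail encoding" wording; `listToStrings` is a hypothetical helper, not the plugin's function.

```typescript
// Hypothetical subset of the Value union; field names are assumptions
// based on the "tag/head/tail" encoding mentioned in the commit message.
type Value =
  | { tag: "Nil" }
  | { tag: "Cons"; head: Value; tail: Value }
  | { tag: "Text"; value: string };

// Walk the spine from the outermost Cons to the terminating Nil,
// collecting one display string per element.
function listToStrings(list: Value): string[] {
  const out: string[] = [];
  let cursor: Value = list;
  while (cursor.tag === "Cons") {
    const head = cursor.head;
    // Text items render as their string; anything else falls back to JSON.
    out.push(head.tag === "Text" ? head.value : JSON.stringify(head));
    cursor = cursor.tail;
  }
  return out;
}
```

The loop is iterative rather than recursive so that a long metric list cannot exhaust the call stack.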
…ew in ci-bench-timeseries
- Add provisioned Grafana dashboard mimicking RTView's four sections: Resources, Blockchain, Leadership, Transactions (26 panels, row-based collapsible sections, byte units for memory/mempool panels)
- Pin datasource uid in provisioning so dashboard references are stable
- Mount dashboards directory in docker-compose
- Enable RTView (`hasRTView`, port 3300) in the `ci-bench-timeseries` profile alongside the existing timeseries endpoint
… inconsistencies
Implements pointwise arithmetic (add, sub, mul, div) between RangeVector Scalar and Scalar, following the existing InstantVector/Scalar pattern. This enables queries like 'metric[now - 1h; now] / 1048576' in Grafana panels.
Fixes ten inconsistencies found in typing.txt:
- missing List type in grammar
- filter_by_label argument order
- A->B type formation rule conclusion
- := vs ≔ notation
- 'Syntax hugar' typo
- map variable name mismatch
- RangeVector missing type parameter in quantile_over_time
- four missing instant_vector_scalar relation rules
- mul_instant_vector_scalar missing arguments
- stale metrics return type (Text -> List Text)
Adds a precondition note to 'interp' that the input expression must be well-typed.
…mar docs
Add a cardano-timeseries-test suite (tasty/tasty-hunit) mirroring the
cardano-recon-framework layout, with three suites:
- Elab.Expr.Parser.Suite: comprehensive parser coverage across all constructs,
operator precedence checks, and wrong-arity error cases (152 tests total)
- Elab.Suite: well-typed / ill-typed elaboration checks
- Interp.Suite: end-to-end execution via API.execute against an empty store
Fix parser bug: the `let` RHS was parsed as `exprOr` instead of `exprUniverse`,
preventing unparenthesised lambdas and nested lets from appearing as let-RHS.
The `in` keyword acts as a natural terminator so no ambiguity arises.
Update grammar documentation:
- Elab/Expr.hs inline grammar: `let x = t{> universe}` -> `t{≥ universe}`
- docs/elab.txt: fix `<int>min` -> `<int>m`; add missing `round t`,
`earliest x`, `latest x`
… elab bug
The test suite covers scalar/bool arithmetic and comparisons, duration and timestamp literals, type conversions (to_scalar, abs, round), pairs, let/lambda, InstantVector lookup/aggregation/filter/map/label-filter/unless/join, InstantVector-Scalar arithmetic and relations, RangeVector aggregations (avg_over_time, sum_over_time, rate, increase, quantile_over_time), and metrics.
Writing the tests surfaced a bug in Elab.hs where Surface.SumOverTime was elaborated to Semantic.AvgOverTime instead of Semantic.SumOverTime, causing sum_over_time queries to silently return the mean. Fixed.
Drop the checkFresh guard so duplicate names are no longer rejected. Change variable lookup to search the context right-to-left (Seq.reverse) so that the innermost binding wins, as scoping requires.
St now carries availableMetrics :: Set MetricIdentifier, populated at the call site via metrics store (or Set.empty in metric-free contexts like the elab test suite). The fallback Variable case — previously an unconditional metric assumption — now checks membership in availableMetrics and throws "Undefined name: <v>" when the name appears neither in the local context nor in the store, replacing the confusing downstream type-mismatch error that occurred before.
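The lookup order described in the two commit messages above (innermost local binding first, then the metric store, otherwise an error) can be modelled as follows. This is an illustrative sketch only: `resolveName` and `Resolution` are hypothetical, and the real elaborator is Haskell code operating on a `Seq` context and a `Set MetricIdentifier`.

```typescript
// Hypothetical result of resolving a name during elaboration.
type Resolution =
  | { kind: "local"; index: number }
  | { kind: "metric"; name: string };

function resolveName(
  name: string,
  context: readonly string[],
  availableMetrics: ReadonlySet<string>
): Resolution {
  // Search right-to-left so the innermost (most recent) binding wins.
  for (let i = context.length - 1; i >= 0; i--) {
    if (context[i] === name) {
      return { kind: "local", index: i };
    }
  }
  // Fall back to the metric store only when it actually knows the name.
  if (availableMetrics.has(name)) {
    return { kind: "metric", name };
  }
  throw new Error(`Undefined name: ${name}`);
}
```

With a context of `["x", "y", "x"]`, resolving `x` yields the binding at index 2, which is the behaviour shadowing requires.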
…tamp 0
The `epoch` keyword was mapped to `Now` in the parser, making it identical to `now` (i.e. current server time). It now produces `Epoch`, which the interpreter evaluates to `Timestamp 0` (Unix epoch), so expressions like `epoch + 1778499300385ms` yield the intended absolute timestamp rather than ~year 2082.
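The ~2082 figure can be checked with plain millisecond arithmetic. The sketch below assumes, purely for reproducibility, that the server's `now` coincides with the offset instant itself; `yearOf` is a hypothetical helper.

```typescript
// Offset taken from the example expression `epoch + 1778499300385ms`.
const offsetMs = 1778499300385;

// UTC calendar year of a Unix timestamp given in milliseconds.
function yearOf(unixMs: number): number {
  return new Date(unixMs).getUTCFullYear();
}

// Fixed behaviour: `epoch` is Timestamp 0, so the offset is absolute.
const intendedYear = yearOf(0 + offsetMs);

// Old behaviour: `epoch` parsed as `now`. Assume now == offset instant.
const assumedNowMs = offsetMs;
const buggyYear = yearOf(assumedNowMs + offsetMs);

console.log(intendedYear); // 2026
console.log(buggyYear); // 2082
```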
The ci-bench-timeseries profile has ~900M transactions (100 000 epochs × ~9k txs/epoch) but inherited fundsDefault (10 000 ADA) from base, which covers only a handful of transactions before the generator exits with "insufficient funds". Add fundsTimeseries (25 000 000 ADA = 25 × 10^15 lovelace) to Vocabulary and a baseTimeseries variant of base that uses it. Switch ciTimeseries02Value to baseTimeseries so the ci-bench-timeseries profile has enough genesis funds to sustain a long-running workload.
100 000 epochs needed ~23B ADA in the generator wallet (split grows exponentially with tx_count via unfoldSplitSequence) but genesis could only supply 22.5B, causing an immediate "insufficient funds" crash. 12 epochs at 15 tps / timescaleCompressed = ~108 000 transactions = 2 hours. The required split is now ~2.3M ADA, well within the 22.5B available.
When a variable is not found in either the local context or the metric store, the elaborator now appends a "Did you mean: ..." hint by ranking all candidate names (locals + metrics) by Levenshtein distance and showing the closest ones (up to 5) within threshold max(1, len/3). Uses the `edit-distance` library (the canonical Haskell implementation, also used by GHC for its own "did you mean" diagnostics).
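The ranking step can be illustrated with a small self-contained sketch. It is not the elaborator's code (which uses the Haskell `edit-distance` library): `levenshtein` and `suggest` are hypothetical, and two details are assumptions, namely that `len` is the length of the misspelled name and that `len/3` rounds down.

```typescript
// Classic dynamic-programming Levenshtein distance (two-row variant).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Rank candidates by distance and keep the closest ones within the threshold.
// Assumption: threshold = max(1, floor(name.length / 3)).
function suggest(name: string, candidates: readonly string[], limit = 5): string[] {
  const threshold = Math.max(1, Math.floor(name.length / 3));
  return candidates
    .map((candidate) => ({ candidate, distance: levenshtein(name, candidate) }))
    .filter((entry) => entry.distance <= threshold)
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit)
    .map((entry) => entry.candidate);
}
```

For a six-character name such as `mempol` the threshold is 2, so `mempool` (distance 1) is suggested while `memory` (distance 3) is not.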
Replace hardcoded [now - 1h; now] in all 25 dashboard panels with [$__from; $__to] so the Grafana time picker controls the window. Pre-process $__from/$__to in the datasource plugin before getTemplateSrv() sees them — Grafana treats these as built-in variables and ignores scopedVars overrides — converting them to epoch + Nms expressions that the query language interprets as absolute timestamps.
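The pre-processing step might be sketched as below. `rewriteTimeMacros` is a hypothetical name, and the exact replacement text the plugin emits is an assumption beyond the `epoch + Nms` form named in the commit message; the millisecond values would come from the panel's time range.

```typescript
// Replace Grafana's $__from / $__to macros with absolute-timestamp
// expressions before any template substitution runs.
// split/join is used instead of String.replace so that "$" has no
// special meaning in either the pattern or the replacement.
function rewriteTimeMacros(query: string, fromMs: number, toMs: number): string {
  return query
    .split("$__from")
    .join(`epoch + ${fromMs}ms`)
    .split("$__to")
    .join(`epoch + ${toMs}ms`);
}

const rewritten = rewriteTimeMacros("metric[$__from; $__to]", 1000, 2000);
console.log(rewritten); // metric[epoch + 1000ms; epoch + 2000ms]
```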
Replaces the 7-test skeleton with a comprehensive suite covering all elaborator code paths: literals, duration literals, arithmetic (well-typed and ill-typed), comparisons, boolean operators, let/lambda, pairs/projections, to_scalar, abs/round, metrics builtin, range vectors, instant-vector operations, and metric-name resolution (known metric, undefined name, Levenshtein suggestions).
Notes two elaborator bidirectionality constraints discovered via test failures:
- `Duration + Duration` requires the result type to be driven by a containing expression (tested via `now + (1s + 2s)`).
- `\x -> x + 1` leaves `x`'s type unconstrained; replaced with `\x -> now + x` which forces `x : Duration` via the Timestamp+? noncanonical rule.
When both operands are known to be Duration but the result type is still a hole, force the hole to Duration and re-queue as the canonical Duration + Duration : Duration problem. This allows standalone expressions like `1s + 2s` to elaborate without needing an outer context to drive the result type.
All three rules fire only when the known type information uniquely
determines the outcome:
A. ? - Duration : ? -> Timestamp - Duration : Timestamp
   Only Timestamp - Duration exists with Duration on the Sub rhs.
   Enables: \x -> x - 1s (infer x : Timestamp)
B. ? + ? : Duration -> Duration + Duration : Duration
   Only Duration + Duration produces Duration.
   Enables: \x -> \y -> m [now; now : x + y] (infer both : Duration)
C. ? - ? : Timestamp -> Timestamp - Duration : Timestamp
   Only Timestamp - Duration produces Timestamp via Sub.
   Enables: \x -> \y -> m [x - y; x] (infer x : Timestamp, y : Duration)
Ordering: A (? - Duration) before C (? - ? : Timestamp) so the more
specific rhsTy=Duration match takes priority. B (? + ? : Duration)
after the existing Duration + Duration : ? rule so the known-both-sides
case is tried first.
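As a toy model of these rules, the sketch below represents an unknown type (a hole) as `null` and tries the patterns in the order given above. It is a simplification under stated assumptions: `Problem` and `refine` are hypothetical, holes are matched strictly, and the real elaborator works on unification problems in Haskell rather than on a three-field record.

```typescript
type Ty = "Timestamp" | "Duration";

// A binary arithmetic elaboration problem; null marks an unsolved hole.
interface Problem {
  op: "add" | "sub";
  lhs: Ty | null;
  rhs: Ty | null;
  result: Ty | null;
}

// Return the refined problem when exactly one typing is possible, else null.
function refine(p: Problem): Problem | null {
  // Existing rule, tried before B: Duration + Duration : ?
  if (p.op === "add" && p.lhs === "Duration" && p.rhs === "Duration" && p.result === null) {
    return { ...p, result: "Duration" };
  }
  // A: ? - Duration : ?  ->  Timestamp - Duration : Timestamp
  if (p.op === "sub" && p.lhs === null && p.rhs === "Duration" && p.result === null) {
    return { op: "sub", lhs: "Timestamp", rhs: "Duration", result: "Timestamp" };
  }
  // B: ? + ? : Duration  ->  Duration + Duration : Duration
  if (p.op === "add" && p.lhs === null && p.rhs === null && p.result === "Duration") {
    return { op: "add", lhs: "Duration", rhs: "Duration", result: "Duration" };
  }
  // C: ? - ? : Timestamp  ->  Timestamp - Duration : Timestamp
  if (p.op === "sub" && p.lhs === null && p.rhs === null && p.result === "Timestamp") {
    return { op: "sub", lhs: "Timestamp", rhs: "Duration", result: "Timestamp" };
  }
  // Not uniquely determined: leave the problem for other rules.
  return null;
}
```

Returning `null` for everything else mirrors the stated principle that a rule fires only when the known information pins down a single outcome.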
- JSON.hs: rename tag/value to resultType/result; use Prometheus resultType names (scalar, vector, matrix); encode timestamps and durations as Unix seconds (Double); encode data-point values as strings; rename labels->metric and data->values in Instant/Timeseries
- TimeseriesServer: parse query endpoint body as application/x-www-form-urlencoded (Prometheus wire format); add GET support alongside POST, sharing a single handleQuery helper
- grafana-datasource: update types, toDataFrames, and datasource to match new wire format; switch fetch to x-www-form-urlencoded
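On the client side, a form-encoded request body for the query endpoint could be assembled as in this sketch. `buildQueryBody` and `buildPostRequest` are hypothetical helpers; only the `query` and `time` parameter names and the content type come from the PR text.

```typescript
// Encode the query (and optional evaluation time in Unix seconds)
// as an application/x-www-form-urlencoded body.
function buildQueryBody(query: string, timeSeconds?: number): string {
  const parts = [`query=${encodeURIComponent(query)}`];
  if (timeSeconds !== undefined) {
    parts.push(`time=${encodeURIComponent(String(timeSeconds))}`);
  }
  return parts.join("&");
}

// Plain description of the POST request; how it is sent (for example via
// Grafana's server-side proxy) is outside the scope of this sketch.
function buildPostRequest(query: string, timeSeconds?: number) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildQueryBody(query, timeSeconds),
  };
}
```

The same encoded string can serve as the URL query string for the `GET` variant, since both routes are described as sharing one handler.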
Remove the 2-node ci-bench-timeseries profile along with its dedicated baseTimeseries and fundsTimeseries combinators (now dead code). Add 6-dense-timeseries-1h as a copy of 6-dense-1h with tracerTimeseries enabled: 6 nodes, 1-hour duration, dense topology. The tracerTimeseries combinator in Primitives.hs is retained. Regenerate all-profiles-coay.json.
21567bd to
d9368e4
Compare
…files-coay.json
all-profiles-coay.json was corrupted by cabal build output leaking into stdout during a previous regeneration. Regenerated cleanly by separating the build step from the capture step.
…(excludes playground profiles)
Replace GNU ps long-option syntax with pgrep -P, which is available on both macOS (BSD) and Linux.
…ies server
New routes on the timeseries server:
GET /timeseries/nodes — list connected node IDs
GET /timeseries/node/{id}/info — NodeInfo + uptimeSeconds
GET /timeseries/node/{id}/startup — NodeStartupInfo
GET /timeseries/node/{id}/state (RTVIEW) — sync progress %
The server now receives TracerEnv instead of individual fields so it can
access teDPRequestors, teCurrentDPLock, and teConnectedNodes.
The /state endpoint uses data point key "NodeAddBlock" (the namespace
cardano-node actually stores the NodeState data point under) rather than
"NodeState", which was always empty.
…d dashboard panels
New query types in the datasource plugin:
- nodes: lists all connected node IDs (used by $node_id variable)
- node-info: name, protocol, version, commit, start time, uptime
- node-startup: era, slot length, epoch length, KES period
- node-state: sync progress %
New panels in rtview.json (all repeat by $node_id variable):
- Connected Nodes table
- Node Info table
- Startup table
- Sync % stat
- Uptime stat
metricFindQuery populates the $node_id query variable automatically from /timeseries/nodes.
$__from/$__to in timeseries queries are pre-processed to valid timestamp expressions before template variable expansion.
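Populating a query variable from the node list amounts to mapping each ID to a text/value option. The sketch below assumes the `/timeseries/nodes` response has already been parsed into a string array; `MetricFindValue` is declared locally as a stand-in for Grafana's type of the same name, and `nodeIdsToVariableOptions` is a hypothetical helper.

```typescript
// Local stand-in for the option shape Grafana expects from metricFindQuery.
interface MetricFindValue {
  text: string;
  value?: string | number;
}

// Turn the list of connected node IDs into variable options,
// using the ID as both the label and the value.
function nodeIdsToVariableOptions(nodeIds: readonly string[]): MetricFindValue[] {
  return nodeIds.map((id) => ({ text: id, value: id }));
}
```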
… code review
- Elab.hs: fix copy-paste error in binary arithmetic op elab (rhs hole was unified against lhsTy instead of rhsTy); rename evalBinaryArithmethicOpElabProblem to evalBinaryArithmeticOpElabProblem (typo)
- Elab.hs: elaborate `metrics` as List Text (was Text); add Str elab case
- Elab/Typing.hs, Resolve.hs, Unify.hs: add List Ty to support metrics type
- Interp.hs: guard avg/min/max against empty instant vector; fix rate to error on single-point timeseries instead of dividing by zero
- Interp/Value.hs: use showFFloat in Show instance for Scalar to avoid scientific notation in JSON output
- TimeseriesServer.hs: fix minimumRetentionMillis units (seconds → ms); remove unused RecordWildCards pragma; align sleep delay with Monitoring.hs
- Acceptors/Utils.hs: align new imports with surrounding import block
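The two interpreter guards can be illustrated as follows. This is a sketch under assumptions, not a port of Interp.hs: it takes `rate` to mean the value change between the first and last point divided by the elapsed seconds, and the zero-elapsed-time check is an extra precaution that the commit message does not mention.

```typescript
interface Point {
  timestampSeconds: number;
  value: number;
}

// Per-second rate over a series; errors instead of dividing by zero.
function rate(points: readonly Point[]): number {
  if (points.length < 2) {
    throw new Error("rate: need at least two data points");
  }
  const first = points[0];
  const last = points[points.length - 1];
  const elapsed = last.timestampSeconds - first.timestampSeconds;
  if (elapsed === 0) {
    // Extra precaution, not taken from the commit message.
    throw new Error("rate: first and last points share a timestamp");
  }
  return (last.value - first.value) / elapsed;
}

// Average of an instant vector's values; errors on an empty vector.
function average(values: readonly number[]): number {
  if (values.length === 0) {
    throw new Error("avg: empty instant vector");
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```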
1297902 to
560cae8
Compare
Summary
Query language improvements
- `epoch` parser bug fixed: `epoch` now correctly produces `Timestamp 0` (Unix epoch); previously it parsed identically to `now`, making `epoch + Nms` evaluate to ~year 2082 instead of the intended absolute timestamp
- … `Undefined name: <x>` errors at elaboration time instead of confusing downstream type-mismatch errors; `St` carries `availableMetrics :: Set MetricIdentifier` populated from the store
- … `edit-distance`) to known metric names and local bindings, appending the closest match to the error message
- … `let x = 1 in let x = 2 in x` now correctly returns `2`; previously the elaborator rejected re-use of any name
- `RangeVector`/`Scalar` arithmetic: `+`, `-`, `*`, `/` between a range vector and a scalar are now supported in both elaboration and interpretation
- `Unit` and `List` types: `Unit`, `Nil`, and `Cons` constructors added to `Value` and the semantic `Expr`; `metrics` expression returns the metric list as a `Cons`-list of `Text` values
- `sum_over_time` elab bug fixed: `Surface.SumOverTime` was incorrectly elaborated to `Semantic.AvgOverTime` (copy-paste error)
- `Duration + Duration : ?` → result is `Duration` (e.g. `1s + 2s` now elaborates standalone)
- `? - Duration : ?` → lhs must be `Timestamp` (e.g. `\x -> x - 1s` infers `x : Timestamp`)
- `? + ? : Duration` → both args must be `Duration` (e.g. `\x -> \y -> m[now; now : x + y]` infers both lambda args)
- `? - ? : Timestamp` → lhs must be `Timestamp`, rhs must be `Duration` (e.g. `\x -> \y -> m[x - y; x]` infers arg types from the range start position)
Test suite
- … `to_scalar`, `abs`/`round`, the `metrics` builtin, range vectors, instant-vector operations, metric-name resolution, and "Did you mean?" suggestions
- … `avg_over_time`, `sum_over_time`, `rate`, `increase`, `quantile_over_time`), label filtering, `metrics` query, shadowing, and error cases
`cardano-tracer` timeseries HTTP server
- `POST /timeseries/query`: body is `application/x-www-form-urlencoded`; parameters:
  - `query=<expr>` (required): the query expression
  - `time=<unix_s>` (optional, float): evaluation instant; defaults to server time (`now`)
- … `{"status":"success","data":<Value>}` or `{"status":"error","errorType":"...","error":"..."}`
- `GET /timeseries/query`: same parameters and response, passed as URL query string
- … `TracerEnv`):
  - `GET /timeseries/nodes`: JSON array of connected node ID strings
  - `GET /timeseries/node/{id}/info`: `NodeInfo` fields plus server-computed `uptimeSeconds`
  - `GET /timeseries/node/{id}/startup`: `NodeStartupInfo` (era, slot length, epoch length, KES period)
  - `GET /timeseries/node/{id}/state` (RTVIEW): `{"syncProgress":<pct>}` from the `NodeAddBlock` data point
- … `/state` endpoint uses key `"NodeAddBlock"` (the namespace `cardano-node` actually registers the data point under) rather than `"NodeState"`, which was always empty
Grafana datasource plugin (`bench/grafana-datasource/`)
- `$__from`/`$__to` support: the plugin pre-processes queries before template substitution, rewriting `$__from` and `$__to` to `epoch + Nms` expressions the query language understands; dashboard panels use `[$__from; $__to]` ranges driven by the Grafana time picker
- … `DataQueryError` with the first line as the panel message and the full elaboration/interpretation trace in the inspector
- `nodes`: list of node IDs (populates the `$node_id` template variable via `metricFindQuery`)
- `node-info`: name, protocol, version, commit, start time, uptime in seconds
- `node-startup`: era, slot length, epoch length, KES period length
- `node-state`: sync progress percentage
- … `$node_id` via a multi-select query variable)
- … `.` and `-` in EKG metric names are replaced with `_` so names match the `cardano_node_metrics_*` convention used in queries
Workbench profiles
- `6-dense-timeseries-1h`: 6-node dense-topology profile with `tracer.timeseries = true` and `tracer.rtview = true`; 1-hour duration; replaces the earlier 2-node `ci-bench-timeseries` profile
- … `ps h --ppid` (GNU ps) replaced with `pgrep -P` in `supervisor.sh`; the former errors on macOS BSD `ps`, causing noisy output on local runs
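The metric-name normalisation mentioned in the Grafana plugin section (`.` and `-` replaced with `_`) can be sketched in one line. `sanitizeMetricName` is a hypothetical helper, the sample name is made up, and where in the pipeline this replacement actually happens is not stated in the text above.

```typescript
// Replace every "." and "-" with "_" so the name matches the
// underscore-separated convention used in queries.
function sanitizeMetricName(name: string): string {
  return name.replace(/[.-]/g, "_");
}

const sanitized = sanitizeMetricName("some.ekg-metric.name");
console.log(sanitized); // some_ekg_metric_name
```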