From 4252888217d65118d68e466fe3b273a622063eaf Mon Sep 17 00:00:00 2001 From: warku123 Date: Tue, 28 Apr 2026 17:22:51 +0800 Subject: [PATCH 1/4] feat(metrics): refine block_transaction_count buckets and add METRICS_CHANGELOG.md Densify the default histogram buckets around 0-300 (the typical TPS range) so percentile interpolation for per-SR throughput is more accurate. New buckets: [0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000]. Also introduce METRICS_CHANGELOG.md to track Prometheus metric additions, changes, and removals going forward, with a Pre-4.8.2 baseline snapshot of currently emitted metrics. --- METRICS_CHANGELOG.md | 85 +++++++++++++++++++ .../common/prometheus/MetricsHistogram.java | 2 +- 2 files changed, 86 insertions(+), 1 deletion(-) create mode 100644 METRICS_CHANGELOG.md diff --git a/METRICS_CHANGELOG.md b/METRICS_CHANGELOG.md new file mode 100644 index 00000000000..9f0b0016a17 --- /dev/null +++ b/METRICS_CHANGELOG.md @@ -0,0 +1,85 @@ +Metrics Changelog +================= + +This file tracks Prometheus metric additions, changes, and removals in java-tron. For the full set of metrics emitted today, see the references at the bottom. + +**4.8.2** + +### New Metrics + +#### Core + +- `tron:block_transaction_count` (Histogram, label `miner`) — per-block transaction count, sampled at the entry of `Manager#pushBlock` before any early return so duplicate, stale, and fork-switched pushes are observed alongside applied blocks. Primary use cases: empty-block detection per super representative, and per-SR TPS / throughput percentile interpolation. Default buckets `[0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000]` are densified around 0–300 so percentile interpolation in the typical TPS range is more accurate. 
([#6624](https://github.com/tronprotocol/java-tron/pull/6624)) + +#### Consensus + +- `tron:sr_set_change` (Counter, labels `action`, `witness`) — incremented once per witness whenever the active SR set rotates at a maintenance boundary. `action` is one of `add` / `remove`. Cardinality grows with the number of distinct witnesses that have ever entered or left the active set, not with the active set size at any given moment. ([#6624](https://github.com/tronprotocol/java-tron/pull/6624)) + +**Pre-4.8.2 Baseline** + +Snapshot of metrics emitted prior to this changelog. Per-version provenance is not tracked here; consult `git log` on [`common/src/main/java/org/tron/common/prometheus/`](common/src/main/java/org/tron/common/prometheus/) for exact origin of each metric. + +### Existing Metrics + +#### Core (block / transaction processing) + +- `tron:header_height` (Gauge) — latest block height on this node. +- `tron:header_time` (Gauge) — latest block timestamp on this node. +- `tron:block_push_latency_seconds` (Histogram) — `Manager#pushBlock` latency. +- `tron:block_process_latency_seconds` (Histogram, label `sync`) — `TronNetDelegate#processBlock` latency. +- `tron:block_generate_latency_seconds` (Histogram, label `address`) — block generation latency per producer. +- `tron:block_fetch_latency_seconds` (Histogram) — block fetch latency. +- `tron:block_receive_delay_seconds` (Histogram) — `receiveTime - blockTime`. +- `tron:block_fork` (Counter, label `type`) — fork events by type. +- `tron:lock_acquire_latency_seconds` (Histogram, label `type`) — DB / chain lock acquisition latency. +- `tron:miner` (Counter, labels `miner`, `type`) — blocks produced by an SR. +- `tron:miner_latency_seconds` (Histogram, label `miner`) — block mining latency per producer. +- `tron:miner_delay_seconds` (Histogram, label `miner`) — `actualTime - planTime` for block production. +- `tron:txs` (Counter, labels `type`, `detail`) — transaction counts. 
+- `tron:process_transaction_latency_seconds` (Histogram, labels `type`, `contract`) — transaction processing latency. +- `tron:verify_sign_latency_seconds` (Histogram, label `type`) — signature verification latency for transactions and blocks. +- `tron:tx_cache` (Gauge, label `type`) — transaction cache stats. +- `tron:manager_queue_size` (Gauge, label `type`) — `Manager` queue sizes (pending / popped / queued / repush). + +#### Net (P2P) + +- `tron:peers` (Gauge, label `type`) — peer counts. +- `tron:p2p_error` (Counter, label `type`) — P2P error events. +- `tron:p2p_disconnect` (Counter, label `type`) — P2P disconnect events. +- `tron:ping_pong_latency_seconds` (Histogram) — peer ping-pong RTT. +- `tron:message_process_latency_seconds` (Histogram, label `type`) — peer message processing latency. +- `tron:tcp_bytes` (Histogram, label `type`) — TCP traffic. +- `tron:udp_bytes` (Histogram, label `type`) — UDP traffic. + +#### API + +- `tron:http_service_latency_seconds` (Histogram, label `url`) — HTTP endpoint latency. +- `tron:http_bytes` (Histogram, labels `url`, `status`) — HTTP traffic. +- `tron:grpc_service_latency_seconds` (Histogram, label `endpoint`) — gRPC endpoint latency. +- `tron:jsonrpc_service_latency_seconds` (Histogram, label `method`) — JSON-RPC method latency. +- `tron:internal_service_latency_seconds` (Histogram, labels `class`, `method`) — internal service-call latency. +- `tron:internal_service_fail` (Counter, labels `class`, `method`) — internal service-call failure count. + +#### DB + +- `tron:db_size_bytes` (Gauge, labels `type`, `db`, `level`) — LevelDB storage size. +- `tron:db_sst_level` (Gauge, labels `type`, `db`, `level`) — LevelDB SST files per compaction level. +- `tron:guava_cache_hit_rate`, `tron:guava_cache_request`, `tron:guava_cache_eviction_count` — Guava cache stats, registered via `GuavaCacheExports` for caches that opt in. 
+ +#### System + +Emitted by `OperatingSystemExports` (no labels): + +- `system_available_cpus`, `system_cpu_load`, `system_load_average`, `system_total_physical_memory_bytes`, `system_free_physical_memory_bytes`, `system_total_swap_spaces_bytes`, `system_free_swap_spaces_bytes`. + +#### JVM / process + +Auto-emitted by the Prometheus client library via `DefaultExports.initialize()` (`simpleclient_hotspot`). The full list is owned by the upstream library and not enumerated here; see the [client_java](https://github.com/prometheus/client_java) docs. Common ones: `jvm_memory_bytes_*`, `jvm_gc_collection_seconds_*`, `jvm_threads_*`, `process_cpu_seconds_total`, `process_open_fds`, `process_resident_memory_bytes`. + +--- + +**References** + +- [Official metrics documentation](https://tronprotocol.github.io/documentation-en/using_javatron/metrics/) — descriptions, configuration, and example queries. +- [tron-docker `metric_monitor/README.md`](https://github.com/tronprotocol/tron-docker/blob/main/metric_monitor/README.md) — operator-oriented overview with deployment guidance. +- [java-tron-server (Grafana dashboard 16567)](https://grafana.com/grafana/dashboards/16567-java-tron-server/) — reference dashboard template. 
diff --git a/common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java b/common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java index 6a66dc76bb3..51c5894738c 100644 --- a/common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java +++ b/common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java @@ -51,7 +51,7 @@ public class MetricsHistogram { init(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, "Distribution of transaction counts per block.", - new double[]{0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000}, + new double[]{0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000}, MetricLabels.Histogram.MINER); } From 61d19934eb060ff86bedf3826b909c61b1557611 Mon Sep 17 00:00:00 2001 From: warku123 Date: Wed, 6 May 2026 11:43:57 +0800 Subject: [PATCH 2/4] docs(metrics): point references to in-repo Grafana dashboard JSON Replace the grafana.com/dashboards/16567 link with the maintained JSON in tron-docker, per @Sunny6889's review on #6730. The community dashboard 16567 is no longer being updated; the in-org JSON is the current source of truth. --- METRICS_CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/METRICS_CHANGELOG.md b/METRICS_CHANGELOG.md index 9f0b0016a17..39d4933d0e2 100644 --- a/METRICS_CHANGELOG.md +++ b/METRICS_CHANGELOG.md @@ -82,4 +82,4 @@ Auto-emitted by the Prometheus client library via `DefaultExports.initialize()` - [Official metrics documentation](https://tronprotocol.github.io/documentation-en/using_javatron/metrics/) — descriptions, configuration, and example queries. - [tron-docker `metric_monitor/README.md`](https://github.com/tronprotocol/tron-docker/blob/main/metric_monitor/README.md) — operator-oriented overview with deployment guidance. -- [java-tron-server (Grafana dashboard 16567)](https://grafana.com/grafana/dashboards/16567-java-tron-server/) — reference dashboard template. 
+- [java-tron-server Grafana dashboard](https://github.com/tronprotocol/tron-docker/blob/main/metric_monitor/grafana_dashboard/java-tron-server.json) — maintained reference dashboard JSON. From b1bdb5dde71bd1a721569efe6b3e17887b44d8e3 Mon Sep 17 00:00:00 2001 From: warku123 Date: Fri, 8 May 2026 15:24:52 +0800 Subject: [PATCH 3/4] docs(metrics): fix baseline gaps and add operational notes in METRICS_CHANGELOG - Add missing tron:error_info (Counter) to Logging subsection - Add missing process_cpu_load to System metrics list - Fix tron:db_size_bytes / tron:db_sst_level descriptions to be engine-aware (LEVELDB or ROCKSDB) instead of LevelDB-only - Expand Guava cache entries with type label and individual descriptions - Add +Inf overflow operational note under tron:block_transaction_count --- METRICS_CHANGELOG.md | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/METRICS_CHANGELOG.md b/METRICS_CHANGELOG.md index 39d4933d0e2..a31e30822cf 100644 --- a/METRICS_CHANGELOG.md +++ b/METRICS_CHANGELOG.md @@ -11,6 +11,8 @@ This file tracks Prometheus metric additions, changes, and removals in java-tron - `tron:block_transaction_count` (Histogram, label `miner`) — per-block transaction count, sampled at the entry of `Manager#pushBlock` before any early return so duplicate, stale, and fork-switched pushes are observed alongside applied blocks. Primary use cases: empty-block detection per super representative, and per-SR TPS / throughput percentile interpolation. Default buckets `[0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000]` are densified around 0–300 so percentile interpolation in the typical TPS range is more accurate. ([#6624](https://github.com/tronprotocol/java-tron/pull/6624)) + > **Operational note:** The upper bound is 2000; blocks with more transactions land in `+Inf` with no further bucket resolution. Monitor the overflow ratio — e.g. 
`(rate(tron_block_transaction_count_bucket{le="+Inf"}[5m]) - rate(tron_block_transaction_count_bucket{le="2000"}[5m])) / rate(tron_block_transaction_count_count[5m]) > 0.01` — as a signal to re-tune the upper bound. + #### Consensus - `tron:sr_set_change` (Counter, labels `action`, `witness`) — incremented once per witness whenever the active SR set rotates at a maintenance boundary. `action` is one of `add` / `remove`. Cardinality grows with the number of distinct witnesses that have ever entered or left the active set, not with the active set size at any given moment. ([#6624](https://github.com/tronprotocol/java-tron/pull/6624)) @@ -62,15 +64,22 @@ Snapshot of metrics emitted prior to this changelog. Per-version provenance is n #### DB -- `tron:db_size_bytes` (Gauge, labels `type`, `db`, `level`) — LevelDB storage size. -- `tron:db_sst_level` (Gauge, labels `type`, `db`, `level`) — LevelDB SST files per compaction level. -- `tron:guava_cache_hit_rate`, `tron:guava_cache_request`, `tron:guava_cache_eviction_count` — Guava cache stats, registered via `GuavaCacheExports` for caches that opt in. +- `tron:db_size_bytes` (Gauge, labels `type`, `db`, `level`) — storage size in bytes per engine, database, and level; `type` is the storage engine (`LEVELDB` or `ROCKSDB`) depending on node configuration. +- `tron:db_sst_level` (Gauge, labels `type`, `db`, `level`) — SST files per compaction level per engine and database; `type` is the storage engine (`LEVELDB` or `ROCKSDB`) depending on node configuration. +- `tron:guava_cache_hit_rate` (Gauge, label `type`) — hit rate of a Guava cache; `type` is the cache name. +- `tron:guava_cache_request` (Gauge, label `type`) — total request count of a Guava cache; `type` is the cache name. +- `tron:guava_cache_eviction_count` (Gauge, label `type`) — eviction count of a Guava cache; `type` is the cache name. +- (Registered via `GuavaCacheExports` for caches that opt in to `CacheManager`.) 
+ +#### Logging + +- `tron:error_info` (Counter, labels `topic`, `type`) — incremented on every ERROR-level log line by `InstrumentedAppender`. #### System Emitted by `OperatingSystemExports` (no labels): -- `system_available_cpus`, `system_cpu_load`, `system_load_average`, `system_total_physical_memory_bytes`, `system_free_physical_memory_bytes`, `system_total_swap_spaces_bytes`, `system_free_swap_spaces_bytes`. +- `system_available_cpus`, `process_cpu_load`, `system_cpu_load`, `system_load_average`, `system_total_physical_memory_bytes`, `system_free_physical_memory_bytes`, `system_total_swap_spaces_bytes`, `system_free_swap_spaces_bytes`. #### JVM / process From 61afb5dd81985eecca8fd3775721ea3395908a8d Mon Sep 17 00:00:00 2001 From: warku123 Date: Fri, 8 May 2026 17:00:04 +0800 Subject: [PATCH 4/4] feat(metrics): add 5000/10000 safety-net buckets to block_transaction_count Restore 5000 and 10000 as upper safety-net buckets so outlier events (stress tests, repush storms, fork-switched blocks) retain resolution above 2000. The 0-300 densification for typical-TPS percentiles is unchanged. Update METRICS_CHANGELOG.md and its operational note accordingly. --- METRICS_CHANGELOG.md | 4 ++-- .../java/org/tron/common/prometheus/MetricsHistogram.java | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/METRICS_CHANGELOG.md b/METRICS_CHANGELOG.md index a31e30822cf..3c599796d7a 100644 --- a/METRICS_CHANGELOG.md +++ b/METRICS_CHANGELOG.md @@ -9,9 +9,9 @@ This file tracks Prometheus metric additions, changes, and removals in java-tron #### Core -- `tron:block_transaction_count` (Histogram, label `miner`) — per-block transaction count, sampled at the entry of `Manager#pushBlock` before any early return so duplicate, stale, and fork-switched pushes are observed alongside applied blocks. Primary use cases: empty-block detection per super representative, and per-SR TPS / throughput percentile interpolation. 
Default buckets `[0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000]` are densified around 0–300 so percentile interpolation in the typical TPS range is more accurate. ([#6624](https://github.com/tronprotocol/java-tron/pull/6624)) +- `tron:block_transaction_count` (Histogram, label `miner`) — per-block transaction count, sampled at the entry of `Manager#pushBlock` before any early return so duplicate, stale, and fork-switched pushes are observed alongside applied blocks. Primary use cases: empty-block detection per super representative, and per-SR TPS / throughput percentile interpolation. Default buckets `[0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000, 5000, 10000]` are densified around 0–300 for percentile interpolation in the typical TPS range; 5000 and 10000 are retained as safety-net buckets to preserve resolution for outlier events such as stress tests or repush storms. ([#6624](https://github.com/tronprotocol/java-tron/pull/6624)) - > **Operational note:** The upper bound is 2000; blocks with more transactions land in `+Inf` with no further bucket resolution. Monitor the overflow ratio — e.g. `(rate(tron_block_transaction_count_bucket{le="+Inf"}[5m]) - rate(tron_block_transaction_count_bucket{le="2000"}[5m])) / rate(tron_block_transaction_count_count[5m]) > 0.01` — as a signal to re-tune the upper bound. + > **Operational note:** The largest finite bucket bound is 10000; blocks exceeding that land only in `+Inf`. Monitor the overflow ratio as a signal to re-tune the upper bound, e.g. `1 - rate(tron_block_transaction_count_bucket{le="10000.0"}[5m]) / ignoring(le) rate(tron_block_transaction_count_count[5m]) > 0.01`. The `ignoring(le)` modifier is needed because the bucket series carries an `le` label that the `_count` series lacks, and the Java client renders bucket bounds as floats, hence `le="10000.0"`. 
#### Consensus diff --git a/common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java b/common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java index 51c5894738c..fa42a59aeaa 100644 --- a/common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java +++ b/common/src/main/java/org/tron/common/prometheus/MetricsHistogram.java @@ -51,7 +51,7 @@ public class MetricsHistogram { init(MetricKeys.Histogram.BLOCK_TRANSACTION_COUNT, "Distribution of transaction counts per block.", - new double[]{0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000}, + new double[]{0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000, 5000, 10000}, MetricLabels.Histogram.MINER); }
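
Reviewer sketch of how the final bucket layout behaves. This is a minimal, dependency-free illustration of cumulative `le` bucket counting, assuming only the bounds from the last hunk above. It is not the Prometheus client implementation, and the class and method names (`BlockTxBucketSketch`, `bucketIndex`, `cumulativeCounts`, `overflowRatio`) are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: mimics cumulative "le" bucket counting for the bounds in this patch. */
final class BlockTxBucketSketch {
  // Upper bounds copied from the final MetricsHistogram change in this series.
  static final double[] BOUNDS = {0, 20, 50, 80, 100, 120, 140, 160, 180, 200,
      230, 260, 300, 500, 2000, 5000, 10000};

  /** Index of the first bucket whose bound is >= value; BOUNDS.length stands for +Inf. */
  static int bucketIndex(double value) {
    for (int i = 0; i < BOUNDS.length; i++) {
      if (value <= BOUNDS[i]) {
        return i;
      }
    }
    return BOUNDS.length;
  }

  /** Cumulative counts per bound, with the +Inf bucket as the last element. */
  static long[] cumulativeCounts(int[] txCounts) {
    long[] counts = new long[BOUNDS.length + 1];
    for (int c : txCounts) {
      counts[bucketIndex(c)]++;
    }
    // Turn per-bucket counts into cumulative counts, as exposed by "le" series.
    for (int i = 1; i < counts.length; i++) {
      counts[i] += counts[i - 1];
    }
    return counts;
  }

  /** Fraction of observations above the largest finite bound (here 10000). */
  static double overflowRatio(long[] cumulative) {
    long total = cumulative[cumulative.length - 1];
    if (total == 0) {
      return 0.0;
    }
    long withinFiniteBounds = cumulative[cumulative.length - 2];
    return (double) (total - withinFiniteBounds) / total;
  }

  public static void main(String[] args) {
    // Hypothetical per-block transaction counts, including two empty blocks and one outlier.
    int[] sample = {0, 0, 15, 120, 250, 1800, 12000};
    long[] cumulative = cumulativeCounts(sample);
    System.out.println(Arrays.toString(cumulative));
    System.out.println(overflowRatio(cumulative));
  }
}
```

In this sketch the first cumulative entry (`le = 0`) counts empty blocks directly, which is the empty-block use case named in the changelog, and `overflowRatio` is the in-process analogue of the overflow query in the operational note.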