feat(metrics): refine block_transaction_count buckets, add changelog #6730
Merged: kuny0707 merged 4 commits into tronprotocol:develop from warku123:feat/metrics-changelog on May 8, 2026 (+95 −1)
Commits (4):
- 4252888 feat(metrics): refine block_transaction_count buckets and add METRICS… (warku123)
- 61d1993 docs(metrics): point references to in-repo Grafana dashboard JSON (warku123)
- b1bdb5d docs(metrics): fix baseline gaps and add operational notes in METRICS… (warku123)
- 61afb5d feat(metrics): add 5000/10000 safety-net buckets to block_transaction… (warku123)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,94 @@ | ||
| Metrics Changelog | ||
| ================= | ||
|
|
||
| This file tracks Prometheus metric additions, changes, and removals in java-tron. For the full set of metrics emitted today, see the references at the bottom. | ||
|
|
||
| **4.8.2** | ||
|
|
||
| ### New Metrics | ||
|
|
||
| #### Core | ||
|
|
||
| - `tron:block_transaction_count` (Histogram, label `miner`) — per-block transaction count, sampled at the entry of `Manager#pushBlock` before any early return so duplicate, stale, and fork-switched pushes are observed alongside applied blocks. Primary use cases: empty-block detection per super representative, and per-SR TPS / throughput percentile interpolation. Default buckets `[0, 20, 50, 80, 100, 120, 140, 160, 180, 200, 230, 260, 300, 500, 2000, 5000, 10000]` are densified around 0–300 for percentile interpolation in the typical TPS range; 5000 and 10000 are retained as safety-net buckets to preserve resolution for outlier events such as stress tests or repush storms. ([#6624](https://github.com/tronprotocol/java-tron/pull/6624)) | ||
|
|
||
| > **Operational note:** The effective upper bound is 10000; blocks exceeding that land in `+Inf`. Monitor the overflow ratio as a signal to re-tune the upper bound, e.g. `(rate(tron:block_transaction_count_bucket{le="+Inf"}[5m]) - rate(tron:block_transaction_count_bucket{le="10000"}[5m])) / rate(tron:block_transaction_count_count[5m]) > 0.01`. | ||
|
|
||
| #### Consensus | ||
|
|
||
| - `tron:sr_set_change` (Counter, labels `action`, `witness`) — incremented once per witness whenever the active SR set rotates at a maintenance boundary. `action` is one of `add` / `remove`. Cardinality grows with the number of distinct witnesses that have ever entered or left the active set, not with the active set size at any given moment. ([#6624](https://github.com/tronprotocol/java-tron/pull/6624)) | ||
|
|
||
| **Pre-4.8.2 Baseline** | ||
|
|
||
| Snapshot of metrics emitted prior to this changelog. Per-version provenance is not tracked here; consult `git log` on [`common/src/main/java/org/tron/common/prometheus/`](common/src/main/java/org/tron/common/prometheus/) for exact origin of each metric. | ||
|
|
||
| ### Existing Metrics | ||
|
|
||
| #### Core (block / transaction processing) | ||
|
|
||
| - `tron:header_height` (Gauge) — latest block height on this node. | ||
| - `tron:header_time` (Gauge) — latest block timestamp on this node. | ||
| - `tron:block_push_latency_seconds` (Histogram) — `Manager#pushBlock` latency. | ||
| - `tron:block_process_latency_seconds` (Histogram, label `sync`) — `TronNetDelegate#processBlock` latency. | ||
| - `tron:block_generate_latency_seconds` (Histogram, label `address`) — block generation latency per producer. | ||
| - `tron:block_fetch_latency_seconds` (Histogram) — block fetch latency. | ||
| - `tron:block_receive_delay_seconds` (Histogram) — `receiveTime - blockTime`. | ||
| - `tron:block_fork` (Counter, label `type`) — fork events by type. | ||
| - `tron:lock_acquire_latency_seconds` (Histogram, label `type`) — DB / chain lock acquisition latency. | ||
| - `tron:miner` (Counter, labels `miner`, `type`) — blocks produced by an SR. | ||
| - `tron:miner_latency_seconds` (Histogram, label `miner`) — block mining latency per producer. | ||
| - `tron:miner_delay_seconds` (Histogram, label `miner`) — `actualTime - planTime` for block production. | ||
| - `tron:txs` (Counter, labels `type`, `detail`) — transaction counts. | ||
| - `tron:process_transaction_latency_seconds` (Histogram, labels `type`, `contract`) — transaction processing latency. | ||
| - `tron:verify_sign_latency_seconds` (Histogram, label `type`) — signature verification latency for transactions and blocks. | ||
| - `tron:tx_cache` (Gauge, label `type`) — transaction cache stats. | ||
| - `tron:manager_queue_size` (Gauge, label `type`) — `Manager` queue sizes (pending / popped / queued / repush). | ||
|
|
||
| #### Net (P2P) | ||
|
|
||
| - `tron:peers` (Gauge, label `type`) — peer counts. | ||
| - `tron:p2p_error` (Counter, label `type`) — P2P error events. | ||
| - `tron:p2p_disconnect` (Counter, label `type`) — P2P disconnect events. | ||
| - `tron:ping_pong_latency_seconds` (Histogram) — peer ping-pong RTT. | ||
| - `tron:message_process_latency_seconds` (Histogram, label `type`) — peer message processing latency. | ||
| - `tron:tcp_bytes` (Histogram, label `type`) — TCP traffic. | ||
| - `tron:udp_bytes` (Histogram, label `type`) — UDP traffic. | ||
|
|
||
| #### API | ||
|
|
||
| - `tron:http_service_latency_seconds` (Histogram, label `url`) — HTTP endpoint latency. | ||
| - `tron:http_bytes` (Histogram, labels `url`, `status`) — HTTP traffic. | ||
| - `tron:grpc_service_latency_seconds` (Histogram, label `endpoint`) — gRPC endpoint latency. | ||
| - `tron:jsonrpc_service_latency_seconds` (Histogram, label `method`) — JSON-RPC method latency. | ||
| - `tron:internal_service_latency_seconds` (Histogram, labels `class`, `method`) — internal service-call latency. | ||
| - `tron:internal_service_fail` (Counter, labels `class`, `method`) — internal service-call failure count. | ||
|
|
||
| #### DB | ||
|
|
||
| - `tron:db_size_bytes` (Gauge, labels `type`, `db`, `level`) — storage size in bytes per engine, database, and level; `type` is the storage engine (`LEVELDB` or `ROCKSDB`) depending on node configuration. | ||
| - `tron:db_sst_level` (Gauge, labels `type`, `db`, `level`) — SST files per compaction level per engine and database; `type` is the storage engine (`LEVELDB` or `ROCKSDB`) depending on node configuration. | ||
| - `tron:guava_cache_hit_rate` (Gauge, label `type`) — hit rate of a Guava cache; `type` is the cache name. | ||
| - `tron:guava_cache_request` (Gauge, label `type`) — total request count of a Guava cache; `type` is the cache name. | ||
| - `tron:guava_cache_eviction_count` (Gauge, label `type`) — eviction count of a Guava cache; `type` is the cache name. | ||
| - (Registered via `GuavaCacheExports` for caches that opt in to `CacheManager`.) | ||
|
|
||
| #### Logging | ||
|
|
||
| - `tron:error_info` (Counter, labels `topic`, `type`) — incremented on every ERROR-level log line by `InstrumentedAppender`. | ||
|
|
||
| #### System | ||
|
|
||
| Emitted by `OperatingSystemExports` (no labels): | ||
|
|
||
| - `system_available_cpus`, `process_cpu_load`, `system_cpu_load`, `system_load_average`, `system_total_physical_memory_bytes`, `system_free_physical_memory_bytes`, `system_total_swap_spaces_bytes`, `system_free_swap_spaces_bytes`. | ||
|
|
||
| #### JVM / process | ||
|
|
||
| Auto-emitted by the Prometheus client library via `DefaultExports.initialize()` (`simpleclient_hotspot`). The full list is owned by the upstream library and not enumerated here; see the [client_java](https://github.com/prometheus/client_java) docs. Common ones: `jvm_memory_bytes_*`, `jvm_gc_collection_seconds_*`, `jvm_threads_*`, `process_cpu_seconds_total`, `process_open_fds`, `process_resident_memory_bytes`. | ||
|
|
||
| --- | ||
|
|
||
| **References** | ||
|
|
||
| - [Official metrics documentation](https://tronprotocol.github.io/documentation-en/using_javatron/metrics/) — descriptions, configuration, and example queries. | ||
| - [tron-docker `metric_monitor/README.md`](https://github.com/tronprotocol/tron-docker/blob/main/metric_monitor/README.md) — operator-oriented overview with deployment guidance. | ||
| - [java-tron-server Grafana dashboard](https://github.com/tronprotocol/tron-docker/blob/main/metric_monitor/grafana_dashboard/java-tron-server.json) — maintained reference dashboard JSON. | ||
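
The bucket layout and overflow ratio described for `tron:block_transaction_count` in the changelog above can be illustrated offline. The following is a minimal sketch in plain Python, not java-tron code: the bucket list is copied from the changelog entry, while the function names and the sample data are assumptions made for illustration. The `quantile` helper approximates the linear interpolation that PromQL's `histogram_quantile` performs inside a bucket and reports the largest finite bound when the rank falls in `+Inf`.

```python
from bisect import bisect_left
import math

# Default upper bounds quoted in the changelog entry; the +Inf bucket is implicit.
BUCKETS = [0, 20, 50, 80, 100, 120, 140, 160, 180, 200,
           230, 260, 300, 500, 2000, 5000, 10000]


def cumulative_counts(samples, bounds=BUCKETS):
    """Return cumulative counts, one per finite bound plus a final +Inf entry."""
    per_bucket = [0] * (len(bounds) + 1)
    for value in samples:
        # bisect_left finds the first bound >= value, matching `le` semantics.
        per_bucket[bisect_left(bounds, value)] += 1
    total, out = 0, []
    for n in per_bucket:
        total += n
        out.append(total)
    return out


def overflow_ratio(cumulative):
    """Share of observations above the largest finite bound."""
    total = cumulative[-1]
    if total == 0:
        return 0.0
    return (cumulative[-1] - cumulative[-2]) / total


def quantile(q, cumulative, bounds=BUCKETS):
    """Estimate the q-quantile by linear interpolation inside the target bucket."""
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    idx = next(i for i, c in enumerate(cumulative) if c >= rank)
    if idx == len(bounds):
        # Rank falls in +Inf: only the largest finite bound can be reported.
        return float(bounds[-1])
    lower = float(bounds[idx - 1]) if idx > 0 else 0.0
    prev = cumulative[idx - 1] if idx > 0 else 0
    in_bucket = cumulative[idx] - prev
    if in_bucket == 0:
        return lower
    return lower + (bounds[idx] - lower) * (rank - prev) / in_bucket


# Hypothetical per-block transaction counts, including one outlier above 10000.
samples = [0, 0, 150, 150, 150, 150, 250, 400, 12000, 90]
counts = cumulative_counts(samples)
print(overflow_ratio(counts))   # 0.1
print(quantile(0.5, counts))    # 150.0
print(quantile(0.99, counts))   # 10000.0
```

With these samples the median lands in the (140, 160] bucket, where the dense spacing keeps the interpolation error small, while the 0.99 estimate saturates at 10000. That saturation is the situation the overflow ratio is meant to reveal.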
[NIT] Pre-4.8.2 baseline omits `tron:error_info`

The baseline claims to be a complete snapshot of metrics emitted prior to this changelog, but it does not list `tron:error_info` (Counter, labels `topic`, `type`), which is registered in `common/src/main/java/org/tron/common/prometheus/InstrumentedAppender.java:12-17` and emitted on every ERROR-level log line. The metric is already actively asserted in `framework/src/test/java/org/tron/core/metrics/prometheus/PrometheusApiServiceTest.java:114`.

Leaving it out of the baseline means a future PR that renames or removes it will have no documented baseline to compare against, which is the exact failure mode this changelog is meant to prevent.

Suggestion: Add a one-line entry under one of the existing subsections (Core or a new "Logging" subsection) describing `tron:error_info` (Counter, labels `topic`, `type`): emitted on every ERROR-level log line by `InstrumentedAppender`.
Added a dedicated Logging subsection to the Pre-4.8.2 Baseline with `tron:error_info` (Counter, labels `topic`, `type`): incremented on every ERROR-level log line by `InstrumentedAppender`.

Also took the opportunity to do a full audit of all metric registration sites (`MetricsCounter`, `MetricsGauge`, `MetricsHistogram`, `InstrumentedAppender`, `OperatingSystemExports`, `GuavaCacheExports`). Found one more gap: `process_cpu_load` (defined in `OperatingSystemExports`) was missing from the System section. Added that in the same commit.
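
An audit like the one described in this thread can be partly mechanized by comparing the metric names documented in the changelog with the names registered in code. The following is a minimal sketch and not part of the PR: the function names, the regular expression, and the sample inputs are assumptions, and the list of registered names would have to be collected separately (for example from a node's metrics endpoint). It only recognizes bullets of the form `` - `name` (Type ...) ``, so comma-separated lists such as the System section are not covered.

```python
import re

# Matches changelog bullets such as "- `tron:txs` (Counter, ...)",
# optionally preceded by the diff-table pipe. Assumed format, not a spec.
BULLET = re.compile(
    r"^\s*(?:\|\s*)?-\s+`([A-Za-z_:][A-Za-z0-9_:]*)`\s*\(",
    re.MULTILINE,
)


def documented_metrics(markdown):
    """Collect metric names that appear as typed bullets in the changelog text."""
    return set(BULLET.findall(markdown))


def audit(markdown, registered):
    """Return (registered but undocumented, documented but unregistered)."""
    documented = documented_metrics(markdown)
    registered = set(registered)
    return sorted(registered - documented), sorted(documented - registered)


# Hypothetical inputs for illustration only.
changelog = """
- `tron:header_height` (Gauge) latest block height on this node.
- `tron:txs` (Counter, labels `type`, `detail`) transaction counts.
"""
registered_names = ["tron:header_height", "tron:txs", "tron:error_info"]

missing, stale = audit(changelog, registered_names)
print(missing)  # ['tron:error_info']
print(stale)    # []
```

A non-empty first list corresponds to the kind of gap raised in the review comment above; a non-empty second list would point at a renamed or removed metric that the changelog still documents.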