docs: monitoring: document undocumented metrics and fix tables#2378

Open
eschabell wants to merge 1 commit into fluent:master from eschabell:erics_monitoring_fixes

Conversation

@eschabell
Collaborator

@eschabell eschabell commented Feb 13, 2026

  • Add fluentbit_input_ring_buffer_writes_total metric
  • Add fluentbit_input_ring_buffer_retries_total metric
  • Add fluentbit_input_ring_buffer_retry_failures_total metric
  • Add fluentbit_output_chunk_available_capacity_percent metric
  • Sort v2 metrics table alphabetically
  • Sort v2 storage layer table alphabetically
  • Fix unit for upstream connection metrics from bytes to connections
  • Fix grammar in storage_chunks_busy description
  • Fix grammar in fs_chunks_busy and fs_chunks_busy_bytes descriptions
  • Fix grammar in upstream connection metric descriptions
  • Fix stray markdown bold on Complete coverage bullet point
  • Remove trailing period from Mem_Buf_Limit in storage_overlimit description
  • Normalize table column formatting

Fixes #2377
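
The alphabetical sort listed above can be reproduced mechanically. The following is a minimal sketch, assuming the table is a plain Markdown pipe table whose first column holds the metric name and whose cells contain no literal `|` characters. The function name `sort_markdown_table` and the shortened three-row excerpt are hypothetical and exist only for illustration.

```python
def sort_markdown_table(table: str) -> str:
    """Sort the body rows of a Markdown pipe table by their first cell."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header, separator, rows = lines[0], lines[1], lines[2:]

    def first_cell(row: str) -> str:
        # Drop the outer pipes, take the first cell, and ignore backticks and case.
        return row.strip().strip("|").split("|")[0].strip().strip("`").lower()

    return "\n".join([header, separator, *sorted(rows, key=first_cell)])


# Hypothetical excerpt; descriptions are shortened for illustration.
TABLE = """
| Metric name | Description | Type | Unit |
| ----------- | ----------- | ---- | ---- |
| `fluentbit_input_ring_buffer_writes_total` | Ring buffer write operations. | counter | writes |
| `fluentbit_input_ring_buffer_retries_total` | Ring buffer write retries. | counter | retries |
| `fluentbit_input_ring_buffer_retry_failures_total` | Ring buffer write retry failures. | counter | failures |
"""

sorted_table = sort_markdown_table(TABLE)
print(sorted_table)
```

Sorting on the bare metric name (without backticks) keeps the order independent of how each cell is formatted.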

Summary by CodeRabbit

  • Documentation
    • Reorganized and reformatted monitoring metric tables for clearer v1/v2 presentation.
    • Added many input/output/storage metrics and reinstated several previously removed metrics (build, hot-reload, start-time, uptime, latency, and various counters).
    • Removed/deprecated legacy duplicates and consolidated metrics across sections.
    • Reworked storage-layer organization and labeling conventions (host vs name).
    • Clarified output latency descriptions, bucket boundaries, and improved naming consistency.

@eschabell eschabell self-assigned this Feb 13, 2026
@eschabell eschabell requested a review from a team as a code owner February 13, 2026 12:59
@eschabell eschabell added the waiting-on-review (Waiting on a review from maintainers) and 5.0 labels Feb 13, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 13, 2026

Walkthrough

Reorganized and reformatted v1/v2 metric tables in administration/monitoring.md; added many new Fluent Bit metrics (input, output, build/hot-reload, uptime/process start) and substantially restructured storage-layer metric names and label conventions. No code changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Monitoring docs (metrics + storage): `administration/monitoring.md` | Reflowed and reformatted v1/v2 metric tables and column layout; added numerous new metrics (many fluentbit_input_*, fluentbit_output_*, fluentbit_build_info, fluentbit_hot_reloaded_times, fluentbit_process_start_time_seconds, fluentbit_uptime, etc.); removed/deprecated several entries from older lists; reorganized storage-layer metrics and adjusted label conventions (hostname vs. name aliases); wording/formatting tweaks in output latency section. |

Sequence Diagram(s)

(Skipped — documentation-only changes.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • cosmo0920
  • patrick-stephens

Poem

🐰 I hopped through metrics, neat and bright,
New names lined up in tidy sight.
Inputs, outputs, uptime hum,
Storage labels find their sum,
A little rabbit cheers tonight. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: documenting undocumented metrics and fixing table formatting, which aligns with the PR objectives. |
| Linked Issues check | ✅ Passed | The PR adds missing metrics, updates descriptions for accuracy, fixes table formatting/ordering, and corrects units as required by issue #2377. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to documentation updates in administration/monitoring.md as defined by the linked issue objectives. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Copilot AI left a comment

Pull request overview

This pull request updates the Fluent Bit monitoring documentation by adding four previously undocumented metrics, correcting units and grammar in existing metric descriptions, and improving table organization through alphabetical sorting and normalized formatting.

Changes:

  • Added four new ring buffer and output capacity metrics to the v2 metrics table
  • Corrected units for upstream connection metrics from "bytes" to "connections"
  • Fixed grammar issues in multiple metric descriptions and corrected a stray markdown bold

@eschabell
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Feb 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

@eschabell eschabell force-pushed the erics_monitoring_fixes branch from ca93ce4 to a3a7b41 Compare February 13, 2026 13:25
@eschabell
Collaborator Author

@cosmo0920 and @patrick-stephens ready for review!

@c-neto
Contributor

c-neto commented Feb 14, 2026

Hi @eschabell!

I noticed that this PR does not include the fluentbit_input_long_line_skipped_total new metric created in this PR: fluent/fluent-bit#11457. Additionally, the description for fluentbit_input_long_line_truncated_total states that this metric is available when skip_long_lines is enabled, but this metric only applies when truncate_long_lines is defined. This was the motivation for creating issue fluent/fluent-bit#11457 to expose long line occurrences.

@eschabell eschabell force-pushed the erics_monitoring_fixes branch from 2faebed to 1b12e11 Compare February 18, 2026 10:59
@eschabell
Collaborator Author

Thanks @c-neto, added those to the PR, appreciate that so tagging you for another review!

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
administration/monitoring.md (1)

210-212: Ring buffer metrics are missing an availability qualifier.

Every other conditionally-emitted metric in this table (files, long_line, memrb) has a note explaining when it appears. These three ring buffer metrics have no such qualifier, leaving users guessing when to expect them.

Based on learnings, ring buffer parameters use the thread.ring_buffer.* configuration prefix (e.g., thread.ring_buffer.capacity), so a note along these lines would align with the pattern used elsewhere in the table:

📝 Suggested addition (same pattern as surrounding rows)
-| `fluentbit_input_ring_buffer_retries_total` | name: the name or alias for the input instance | The number of ring buffer write retries. | counter | retries |
-| `fluentbit_input_ring_buffer_retry_failures_total` | name: the name or alias for the input instance | The number of ring buffer write retry failures. | counter | failures |
-| `fluentbit_input_ring_buffer_writes_total` | name: the name or alias for the input instance | The number of ring buffer write operations. | counter | writes |
+| `fluentbit_input_ring_buffer_retries_total` | name: the name or alias for the input instance | The number of ring buffer write retries. Only available for input plugins configured with ring buffer mode (using `thread.ring_buffer.*` parameters). | counter | retries |
+| `fluentbit_input_ring_buffer_retry_failures_total` | name: the name or alias for the input instance | The number of ring buffer write retry failures. Only available for input plugins configured with ring buffer mode (using `thread.ring_buffer.*` parameters). | counter | failures |
+| `fluentbit_input_ring_buffer_writes_total` | name: the name or alias for the input instance | The number of ring buffer write operations. Only available for input plugins configured with ring buffer mode (using `thread.ring_buffer.*` parameters). | counter | writes |

Based on learnings, the correct configuration key prefix for ring buffer parameters is thread. (not threaded.).
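
Because these counters are emitted conditionally, a reader may want to confirm whether a given scrape contains them. Below is a minimal sketch, assuming the metrics are read in the Prometheus text exposition format. The helper names and the sample scrape (label values and sample values included) are hypothetical; only the metric names come from this PR.

```python
RING_BUFFER_METRICS = (
    "fluentbit_input_ring_buffer_retries_total",
    "fluentbit_input_ring_buffer_retry_failures_total",
    "fluentbit_input_ring_buffer_writes_total",
)


def metric_names(exposition: str) -> set:
    """Return the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or the first space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_ring_buffer_metrics(exposition: str) -> list:
    """List the ring buffer metrics that do not appear in the scrape."""
    found = metric_names(exposition)
    return [name for name in RING_BUFFER_METRICS if name not in found]


# Hypothetical scrape output; label values and sample values are made up.
SAMPLE = """\
# TYPE fluentbit_input_ring_buffer_writes_total counter
fluentbit_input_ring_buffer_writes_total{name="tail.0"} 42
fluentbit_input_records_total{name="tail.0"} 1000
"""

print(missing_ring_buffer_metrics(SAMPLE))
```

An empty result would suggest all three counters are being emitted for at least one input; a non-empty result does not by itself say why a counter is absent.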

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@administration/monitoring.md` around lines 210 - 212, Add an availability
qualifier for the three ring buffer metrics
(`fluentbit_input_ring_buffer_retries_total`,
`fluentbit_input_ring_buffer_retry_failures_total`,
`fluentbit_input_ring_buffer_writes_total`) matching the pattern used for other
conditional metrics (e.g., files, long_line, memrb): state that these appear
only when ring buffering is enabled via the `thread.ring_buffer.*` configuration
keys (use the correct `thread.` prefix, not `threaded.`), and include a short
note in the table cell explaining when users should expect the metrics to be
emitted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@administration/monitoring.md`:
- Line 231: Replace the incorrect metric name
`fluentbit_input_chunks.storage_chunks` with the correct Prometheus-compliant
metric `fluentbit_input_storage_chunks` in the documentation table; ensure the
period is removed and the underscore-only convention used elsewhere is followed
so the entry matches the actual Fluent Bit metric naming (e.g., aligns with
`fluentbit_input_storage_chunks` and related rows).
- Line 193: Update the metric documentation for fluentbit_build_info: change its
description to state that it is an informational metric that always exports the
value 1 and carries build metadata via labels (e.g., hostname, version, os), and
change its unit from "seconds" to a dimensionless unit such as "1" (or
"dimensionless") so it follows Prometheus `_info` conventions; locate the entry
referencing `fluentbit_build_info` in the monitoring docs and update the
description text and the unit column accordingly.
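
The naming problem flagged for line 231 can be checked with a short script. This is a sketch only, assuming the classic Prometheus metric-name pattern (letters, digits, underscores, and colons, with no leading digit); newer Prometheus releases can also accept a wider character set, so treat the pattern as the conservative convention rather than a hard limit. The function name is hypothetical.

```python
import re

# Classic Prometheus metric-name pattern (an assumption about which
# convention the documentation should follow).
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True when the whole name matches the classic pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


for candidate in (
    "fluentbit_input_chunks.storage_chunks",  # contains a period
    "fluentbit_input_storage_chunks",         # underscore-only form
):
    print(candidate, is_valid_metric_name(candidate))
```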


@eschabell eschabell force-pushed the erics_monitoring_fixes branch from 1b12e11 to 48caef0 Compare February 18, 2026 11:20
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
administration/monitoring.md (1)

210-212: Consider adding an availability note to the ring buffer metrics for consistency.

fluentbit_input_memrb_dropped_* (lines 206–207) notes "Only available for input plugins with storage.type set to memrb", and the Tail-specific metrics note their plugin constraint. The ring buffer metrics are only emitted for threaded input plugins with thread.ring_buffer.* configured; adding a similar note would keep the table consistent and help users know when to expect these metrics.

✏️ Suggested wording additions
-| `fluentbit_input_ring_buffer_retries_total` | name: the name or alias for the input instance | The number of ring buffer write retries. | counter | retries |
-| `fluentbit_input_ring_buffer_retry_failures_total` | name: the name or alias for the input instance | The number of ring buffer write retry failures. | counter | failures |
-| `fluentbit_input_ring_buffer_writes_total` | name: the name or alias for the input instance | The number of ring buffer write operations. | counter | writes |
+| `fluentbit_input_ring_buffer_retries_total` | name: the name or alias for the input instance | The number of ring buffer write retries. Only available for input plugins with the ring buffer enabled (`thread.ring_buffer.*` configuration). | counter | retries |
+| `fluentbit_input_ring_buffer_retry_failures_total` | name: the name or alias for the input instance | The number of ring buffer write retry failures. Only available for input plugins with the ring buffer enabled (`thread.ring_buffer.*` configuration). | counter | failures |
+| `fluentbit_input_ring_buffer_writes_total` | name: the name or alias for the input instance | The number of ring buffer write operations. Only available for input plugins with the ring buffer enabled (`thread.ring_buffer.*` configuration). | counter | writes |

Based on learnings: the correct configuration key prefix for ring buffer parameters in Fluent Bit input plugins is thread. (e.g., thread.ring_buffer.capacity, thread.ring_buffer.window, thread.ring_buffer.retry_limit).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@administration/monitoring.md` around lines 210 - 212, Add an availability
note to the ring buffer metric rows
(`fluentbit_input_ring_buffer_retries_total`,
`fluentbit_input_ring_buffer_retry_failures_total`,
`fluentbit_input_ring_buffer_writes_total`) stating these metrics are only
emitted for threaded input plugins when the thread.ring_buffer.* parameters are
configured (e.g., thread.ring_buffer.capacity, thread.ring_buffer.window,
thread.ring_buffer.retry_limit), matching the style of the existing
`fluentbit_input_memrb_dropped_*` and Tail-specific notes for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@administration/monitoring.md`:
- Line 245: Update the metric name documented in the monitoring table from
fluentbit_storage_mem_chunk to the correct plural fluentbit_storage_mem_chunks
to match Fluent Bit's actual metric (registered in flb_storage.c as mem_chunks
and equivalent to chunks.mem_chunks); change the table entry label and any
references to the singular form so it aligns with adjacent storage metrics like
fs_chunks and fs_chunks_busy.

---

Duplicate comments:
In `@administration/monitoring.md`:
- Line 193: The table row for fluentbit_build_info is incorrect: update the
description to state that fluentbit_build_info is a Prometheus _info metric that
always exports the constant value 1 and carries metadata (hostname, version, os)
as labels, and change the unit from "seconds" to "unitless" (or "1"); modify the
row for the metric named fluentbit_build_info in the monitoring document to
reflect this corrected description and unit.
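
Whatever value the sample carries, the build metadata for this metric lives in its labels. The sketch below assumes a single Prometheus text-format sample line and reads the labels without interpreting the value. The parser, the label values, and the sample value are all hypothetical; only the metric name and the label keys (hostname, version, os) are taken from the comment above.

```python
import re

# name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?\s*$'
)
# key="value" pairs; escaped characters inside the value are kept as-is.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Split one labelled sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


# Hypothetical sample; the label values and the numeric value are made up.
LINE = 'fluentbit_build_info{hostname="node-1",version="4.0.0",os="linux"} 1771410000'

name, labels, value = parse_sample(LINE)
print(name, labels["version"], labels["os"])
```

Reading only the labels keeps a consumer correct whether the exported value is a constant or a timestamp.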


  - add fluentbit_input_ring_buffer_writes_total metric
  - add fluentbit_input_ring_buffer_retries_total metric
  - add fluentbit_input_ring_buffer_retry_failures_total metric
  - add fluentbit_output_chunk_available_capacity_percent metric
  - sort v2 metrics table alphabetically
  - sort v2 storage layer table alphabetically
  - fix unit for upstream connection metrics from bytes to connections
  - fix grammar in storage_chunks_busy description
  - fix grammar in fs_chunks_busy and fs_chunks_busy_bytes descriptions
  - fix grammar in upstream connection metric descriptions
  - fix stray markdown bold on Complete coverage bullet point
  - remove trailing period from Mem_Buf_Limit in storage_overlimit
  description
  - normalize table column formatting
  - fix unit for fluentbit_hot_reloaded_times metric
  - fix unit for fluentbit_input_ring_buffer_retry_failures_total metric
  - add 5 Tail-specific metrics to the v2 metrics table:
    - fluentbit_input_files_closed_total
    - fluentbit_input_files_opened_total
    - fluentbit_input_files_rotated_total
    - fluentbit_input_long_line_truncated_total
    - fluentbit_input_multiline_truncated_total
    - add missing fluentbit_input_long_line_skipped_total metric
      (added in fluent-bit#11457 for tracking skipped long lines
      when skip_long_lines is enabled)
    - fix fluentbit_input_long_line_truncated_total description to
      reference truncate_long_lines instead of skip_long_lines
    - clean up fluentbit_build_info description to accurately reflect
      the v2 cmetrics behavior (value is the init_time epoch timestamp)
    - remove invalid fluentbit_input_chunks.storage_chunks row
    - fix fluentbit_storage_mem_chunks metric name (add missing s)

Fixes fluent#2377 and fixes fluent#2379 (thanks to Carlos Neto <carlos.neto.dev@gmail.com>)

Signed-off-by: Eric D. Schabell <eric@schabell.org>
@eschabell eschabell force-pushed the erics_monitoring_fixes branch from 48caef0 to 28c4e8f Compare February 18, 2026 11:33

Labels

5.0, waiting-on-review (Waiting on a review from maintainers)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Monitoring doc page needs general update and new metrics to be added.

2 participants