Skip to content

feat: Add disk usage percentage and warn on high usage#57

Open
lfrancke wants to merge 11 commits into main from feat/disk-usage-percent

Conversation

@lfrancke
Member

@lfrancke lfrancke commented Mar 30, 2026

Summary

  • Add usage_percent field to disk collection output, calculated as (total - available) / total * 100
  • Log at WARN level when any disk exceeds 85% usage, making it easy to spot in Graylog/Vector
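
A minimal sketch of the calculation and threshold check described above, not the code from this PR. The function names and the `HIGH_USAGE_THRESHOLD_PERCENT` constant are hypothetical; only the formula and the 85% figure come from the summary.

```rust
/// Hypothetical constant; the 85% figure is taken from the summary above.
const HIGH_USAGE_THRESHOLD_PERCENT: f64 = 85.0;

/// Computes (total - available) / total * 100, returning 0.0 for a zero-sized disk.
fn usage_percent(total_space: u64, available_space: u64) -> f64 {
    if total_space == 0 {
        return 0.0;
    }
    // saturating_sub keeps the result at 0 if available_space exceeds total_space.
    let used = total_space.saturating_sub(available_space);
    used as f64 / total_space as f64 * 100.0
}

/// Returns true when usage is strictly above the threshold.
fn is_high_usage(percent: f64) -> bool {
    percent > HIGH_USAGE_THRESHOLD_PERCENT
}

fn main() {
    let percent = usage_percent(1_000, 100);
    // Pick the log level the way the summary describes: WARN above 85%, INFO otherwise.
    let level = if is_high_usage(percent) { "WARN" } else { "INFO" };
    println!("{level}: disk usage {percent:.1}%");
}
```

Whether exactly 85% should already warn is a design choice; this sketch treats "exceeds" as strictly greater than.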

Motivated by a customer running out of space on an attached PVC over the weekend.

Test plan

  • cargo test --all-features passes
  • cargo clippy with RUSTFLAGS="-D warnings" passes

lfrancke and others added 11 commits March 30, 2026 10:45
The tracing statement for `user.gid` was reading from `user.uid`
instead of `user.gid`, causing the wrong value to be reported.
Replace `into_iter().next().is_none()` with `list().is_empty()`
for clarity, and use `list().iter()` for the actual collection.
This was likely a debugging leftover — the error source chain is
already captured via the `successors` iterator below.
JSON serialization and file write can fail at runtime (e.g. disk
full). Log the error and continue the loop instead of crashing,
since this tool may run continuously for hours.
std::thread::sleep blocks the entire tokio worker thread.
Since main is already async, use the non-blocking alternative.
In a container debugging tool, broken DNS config (/etc/resolv.conf)
is a likely scenario to diagnose. Log the error and skip DNS lookups
instead of panicking.
…andling

The network collector silently swallowed interface listing errors by
returning empty data. Now it returns Result so the orchestrator wraps
it in ComponentResult, matching the pattern used by other fallible
collectors. Errors appear in JSON output instead of being silently
lost.
HashMap produces non-deterministic JSON output, making it hard to
diff containerdebug output across runs. BTreeMap sorts keys
consistently.
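
As a small illustration of the ordering guarantee this commit relies on (a sketch with made-up keys, not code from the repository): `BTreeMap` iterates in ascending key order regardless of insertion order, so a serializer that walks the map in iteration order emits the keys in a stable sequence.

```rust
use std::collections::BTreeMap;

/// Builds a map with hypothetical keys inserted out of order and returns
/// the keys in iteration order.
fn sample_keys() -> Vec<&'static str> {
    let mut env = BTreeMap::new();
    env.insert("PATH", "/usr/bin");
    env.insert("HOME", "/root");
    env.insert("LANG", "C");
    // BTreeMap yields keys sorted ascending, independent of insertion order.
    env.keys().copied().collect()
}

fn main() {
    println!("{:?}", sample_keys());
}
```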
Add `usage_percent` field to disk collection output. When a disk
exceeds 85% usage, log at WARN level instead of INFO so it stands
out in log aggregation systems.
@lfrancke lfrancke moved this to Development: Waiting for Review in Stackable Engineering Mar 30, 2026
@lfrancke lfrancke self-assigned this Mar 30, 2026
@sbernauer sbernauer self-requested a review March 31, 2026 06:58
@sbernauer sbernauer moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Mar 31, 2026

impl From<&sysinfo::Disk> for Disk {
fn from(sysinfo_disk: &sysinfo::Disk) -> Self {
let total_space = sysinfo_disk.total_space();
Member

I would like to propose a stylistic improvement (and safety guard). It also stores the used bytes, and I figured "why not report that as well?"

diff --git a/src/system_information/disk.rs b/src/system_information/disk.rs
index 1cbc429..aadf840 100644
--- a/src/system_information/disk.rs
+++ b/src/system_information/disk.rs
@@ -5,6 +5,7 @@ pub struct Disk {
     pub name: String,
     pub mount_point: String,
     pub total_space: u64,
+    pub used_space: u64,
     pub available_space: u64,
     pub usage_percent: f64,
 }
@@ -24,16 +25,19 @@ impl From<&sysinfo::Disk> for Disk {
     fn from(sysinfo_disk: &sysinfo::Disk) -> Self {
         let total_space = sysinfo_disk.total_space();
         let available_space = sysinfo_disk.available_space();
-        let usage_percent = if total_space > 0 {
-            (total_space - available_space) as f64 / total_space as f64 * 100.0
-        } else {
-            0.0
+        // There shouldn't be negative used bytes. We saturate at zero to prevent underflow, so
+        // that we never falsely report more used than total space.
+        let used_space = total_space.saturating_sub(available_space);
+        let usage_percent = match used_space {
+            0 => 0.0,
+            used_space => used_space as f64 / total_space as f64 * 100.0,
         };
 
         let disk = Disk {
             name: sysinfo_disk.name().to_string_lossy().into_owned(),
             mount_point: sysinfo_disk.mount_point().to_string_lossy().into_owned(),
             total_space,
+            used_space,
             available_space,
             usage_percent,
         };
@@ -43,6 +47,7 @@ impl From<&sysinfo::Disk> for Disk {
                 disk.mount_point,
                 disk.name,
                 disk.space.total = disk.total_space,
+                disk.space.used = disk.used_space,
                 disk.space.available = disk.available_space,
                 disk.space.usage_percent = format!("{:.1}%", disk.usage_percent),
                 "disk usage high"
@@ -52,6 +57,7 @@ impl From<&sysinfo::Disk> for Disk {
                 disk.mount_point,
                 disk.name,
                 disk.space.total = disk.total_space,
+                disk.space.used = disk.used_space,
                 disk.space.available = disk.available_space,
                 disk.space.usage_percent = format!("{:.1}%", disk.usage_percent),
                 "found disk"

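To see how the suggested `saturating_sub` plus `match` behaves on edge cases, here is a standalone sketch. The helper name `used_and_percent` is hypothetical and the input values are made up; the body mirrors the logic in the diff above.

```rust
/// Mirrors the match from the suggested diff: a used_space of 0 short-circuits
/// to 0.0, which also covers total_space == 0, so no division by zero occurs.
fn used_and_percent(total_space: u64, available_space: u64) -> (u64, f64) {
    let used_space = total_space.saturating_sub(available_space);
    let usage_percent = match used_space {
        0 => 0.0,
        used => used as f64 / total_space as f64 * 100.0,
    };
    (used_space, usage_percent)
}

fn main() {
    // Inconsistent reading: more available than total. A plain `-` on u64
    // would panic in debug builds and wrap around in release builds.
    assert_eq!(50u64.checked_sub(80), None);
    println!("{:?}", used_and_percent(50, 80));
    // Zero-sized disk.
    println!("{:?}", used_and_percent(0, 0));
    // Ordinary case: 150 of 200 bytes used.
    println!("{:?}", used_and_percent(200, 50));
}
```

Any nonzero `used_space` implies `total_space > 0`, which is why the `match` needs no separate zero check on the divisor.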