Conversation

@technowhizz
Contributor

Adds support for collecting SMART metrics from NVMe drives using pysmart and smartctl JSON output. Includes updates to the deployment playbooks, tests, and dashboards.

@technowhizz technowhizz requested a review from a team as a code owner December 12, 2025 11:00
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for monitoring NVMe drives by integrating smartctl's JSON output into smartmon.py, removing the dependency on nvme-cli. The changes are comprehensive, covering Ansible playbooks, the monitoring script, tests, and Grafana dashboards. The implementation is generally robust, with good error handling and updated tests. However, I've identified a few issues that could impact monitoring coverage and data accuracy. A logic change in smartmon.py might cause some disks to be skipped from monitoring. Additionally, some Prometheus queries in the updated Grafana dashboards are missing necessary filters, which could lead to incorrect data aggregation and display. I have provided specific suggestions to address these points.
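To illustrate the approach the review describes (reading NVMe health data from smartctl's JSON output rather than from nvme-cli), here is a minimal sketch. It is not the code from this pull request: the function name `nvme_metrics`, the `smartmon_nvme_` metric prefix, the `disk` label, and the sample values are all assumptions made for illustration. The `nvme_smart_health_information_log` key is the section smartctl emits for NVMe devices in JSON mode; the sample below mimics a small part of it instead of calling smartctl.

```python
import json

# Hypothetical sample mimicking part of `smartctl --json -a <device>` output
# for an NVMe drive. The values are made up for illustration.
SAMPLE = """
{
  "device": {"name": "/dev/nvme0", "type": "nvme"},
  "model_name": "Example NVMe",
  "nvme_smart_health_information_log": {
    "critical_warning": 0,
    "temperature": 35,
    "available_spare": 100,
    "percentage_used": 2,
    "power_on_hours": 1234
  }
}
"""


def nvme_metrics(smartctl_json: str, prefix: str = "smartmon_nvme_") -> list[str]:
    """Turn the NVMe health log into Prometheus text-format sample lines.

    The metric prefix and the `disk` label are assumptions, not the names
    used by smartmon.py.
    """
    data = json.loads(smartctl_json)
    log = data.get("nvme_smart_health_information_log", {})
    device = data.get("device", {}).get("name", "unknown")

    lines = []
    for key, value in sorted(log.items()):
        # Keep plain numeric fields only; skip nested structures and booleans.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append(f'{prefix}{key}{{disk="{device}"}} {value}')
    return lines


if __name__ == "__main__":
    print("\n".join(nvme_metrics(SAMPLE)))
```

Because every sample carries a device label, dashboard queries can filter or group by disk, which is the kind of filtering the review asks for in the Grafana panels.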

Member

@dougszumski dougszumski left a comment

Thanks for pushing this forward @technowhizz! I haven't got round to trying it yet, but it looks like a good effort overall. Well done.

@dougszumski dougszumski enabled auto-merge (squash) January 16, 2026 11:07
@dougszumski dougszumski merged commit 7029ca8 into stackhpc/2025.1 Jan 16, 2026
20 of 22 checks passed
@dougszumski dougszumski deleted the updated_smartmon_nvme_support branch January 16, 2026 13:40

3 participants