OCPBUGS-12953: Fix missing metrics for Windows nodes on Nodes list and detail pages#16525
OCPBUGS-12953: Fix missing metrics for Windows nodes on Nodes list and detail pages#16525sg00dwin wants to merge 3 commits into
Conversation
…il pages Assisted by: Claude (Opus 4.6)
…ndows metric Assisted by: Claude (Opus 4.6)
Assisted by: Claude (Opus 4.6)
|
@sg00dwin: This pull request references Jira Issue OCPBUGS-12953, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis PR improves Prometheus node metrics computation by introducing IP-to-hostname mapping utilities and refining PromQL CPU queries. Exported helper functions translate Prometheus instance labels (including IP:port forms) into Kubernetes node names. PromQL CPU query templates are updated to filter by IP address, and the metrics fetching logic integrates these mappings when storing resolved metric values. ChangesNode Metrics IP Resolution
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 15✅ Passed checks (15 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sg00dwin The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
frontend/packages/console-app/src/components/nodes/NodesPage.tsx (1)
1074-1090: ⚡ Quick winDepending on the whole
nodesarray may trigger excessive metric refetches.
nodescomes fromuseK8sWatchResource, whose reference changes on any watched-node update (status/heartbeat churn), not just when the set of nodes or their IPs change. Each change clears the 30s interval, fires an immediatefetchNodeMetrics(several Prometheus queries), and re-arms the interval. Since the IP→hostname map only needs nodename+InternalIP, consider depending on a stable derived key instead of the full array.♻️ Sketch: derive a stable dependency
+ const nodeIPKey = useMemo( + () => + nodes + .map((n) => `${n.metadata?.name}:${n.status?.addresses?.find((a) => a.type === 'InternalIP')?.address}`) + .sort() + .join(','), + [nodes], + ); useEffect(() => { const updateMetrics = async () => { try { const metrics = await fetchNodeMetrics(nodes); ... - }, [dispatch, nodes]); + // eslint-disable-next-line react-hooks/exhaustive-deps + }, [dispatch, nodeIPKey]);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/packages/console-app/src/components/nodes/NodesPage.tsx` around lines 1074 - 1090, The effect depends on the entire nodes array which changes frequently; instead derive a stable dependency (e.g. a memoized string or array of node identifiers composed of node.metadata.name + InternalIP) using useMemo and replace nodes in the useEffect dependency list with that derived key plus showMetrics and dispatch; keep the existing updateMetrics (which calls fetchNodeMetrics and dispatch(setNodeMetrics)) and interval logic but trigger it only when the derived key changes so you avoid clearing/recreating the 30s interval on unrelated node status churn.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@frontend/packages/console-app/src/components/nodes/NodesPage.tsx`:
- Around line 599-608: The function resolveInstanceLabel's signature understates
its real contract: it can be called with undefined (e.g. callers pass
data.metric.instance || data.metric.node) and can return undefined at runtime.
Change the signature of resolveInstanceLabel to accept instance: string |
undefined and return string | undefined, and update the implementation to
short-circuit when instance is undefined (return undefined); keep existing
behavior for colon-separated instances using ipToHostname.get(ip) || instance.
Ensure any callers/uses expecting a string handle the possible undefined return.
---
Nitpick comments:
In `@frontend/packages/console-app/src/components/nodes/NodesPage.tsx`:
- Around line 1074-1090: The effect depends on the entire nodes array which
changes frequently; instead derive a stable dependency (e.g. a memoized string
or array of node identifiers composed of node.metadata.name + InternalIP) using
useMemo and replace nodes in the useEffect dependency list with that derived key
plus showMetrics and dispatch; keep the existing updateMetrics (which calls
fetchNodeMetrics and dispatch(setNodeMetrics)) and interval logic but trigger it
only when the derived key changes so you avoid clearing/recreating the 30s
interval on unrelated node status churn.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d4b14bf4-e524-4cc3-9886-3b2744ea2f42
📒 Files selected for processing (3)
frontend/packages/console-app/src/components/nodes/NodesPage.tsxfrontend/packages/console-app/src/components/nodes/__tests__/NodesPage.spec.tsfrontend/packages/console-app/src/components/nodes/node-dashboard/queries.ts
| export const resolveInstanceLabel = ( | ||
| instance: string, | ||
| ipToHostname: Map<string, string>, | ||
| ): string => { | ||
| if (instance?.includes(':')) { | ||
| const ip = instance.split(':')[0]; | ||
| return ipToHostname.get(ip) || instance; | ||
| } | ||
| return instance; | ||
| }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check whether strictNullChecks is enabled for this package's TS config
fd -t f 'tsconfig*.json' frontend/packages/console-app frontend 2>/dev/null | xargs -r rg -n 'strict|strictNullChecks'Repository: openshift/console
Length of output: 407
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Show resolveInstanceLabel + local call sites in NodesPage.tsx
node_file="frontend/packages/console-app/src/components/nodes/NodesPage.tsx"
echo "---- resolveInstanceLabel definition (near lines 580-640) ----"
sed -n '560,640p' "$node_file" | nl -ba
echo "---- usages of resolveInstanceLabel within NodesPage.tsx ----"
rg -n "resolveInstanceLabel" "$node_file" || true
# 2) Locate NodesPage.spec.ts and show the spec that mentions resolveInstanceLabel(undefined,...)
spec_file="frontend/packages/console-app/src/components/nodes/NodesPage.spec.ts"
echo "---- spec file existence / relevant section ----"
if [ -f "$spec_file" ]; then
sed -n '1,120p' "$spec_file" | nl -ba
rg -n "resolveInstanceLabel" "$spec_file" || true
else
echo "Missing: $spec_file"
fi
# 3) Inspect tsconfig files under frontend/packages/console-app (and any that reference it)
echo "---- tsconfig files under frontend/packages/console-app ----"
fd -t f 'tsconfig*.json' frontend/packages/console-app | sort | while read -r f; do
echo "### $f"
# print only relevant strict-related lines plus compilerOptions nullability settings if any
rg -n "strictNullChecks|strict[\"']?|compilerOptions" "$f" || true
done
# 4) If there is a tsconfig specific to this package, show the strict-related block verbatim
echo "---- strict/strictNullChecks blocks verbatim (console-app tsconfigs) ----"
fd -t f 'tsconfig*.json' frontend/packages/console-app | sort | while read -r f; do
echo "### $f (snippet)"
sed -n '1,120p' "$f" | nl -ba | sed -n '1,120p' | head -n 120
doneRepository: openshift/console
Length of output: 185
🏁 Script executed:
#!/bin/bash
set -euo pipefail
node_file="frontend/packages/console-app/src/components/nodes/NodesPage.tsx"
spec_file="frontend/packages/console-app/src/components/nodes/NodesPage.spec.ts"
show_numbered() { awk '{printf "%d:%s\n", NR, $0}'; }
echo "---- resolveInstanceLabel definition (NodesPage.tsx, ~580-640) ----"
sed -n '560,640p' "$node_file" | show_numbered
echo "---- usages of resolveInstanceLabel within NodesPage.tsx ----"
rg -n "resolveInstanceLabel" "$node_file" || true
echo "---- spec file section where resolveInstanceLabel is referenced ----"
if [ -f "$spec_file" ]; then
rg -n "resolveInstanceLabel" "$spec_file" || true
echo "---- first 120 lines of NodesPage.spec.ts ----"
sed -n '1,120p' "$spec_file" | show_numbered
else
echo "Missing: $spec_file"
fi
echo "---- tsconfig files under frontend/packages/console-app ----"
fd -t f 'tsconfig*.json' frontend/packages/console-app | sort | while read -r f; do
echo "### $f"
rg -n '"strictNullChecks"|"strict"' "$f" || true
done
echo "---- contents of any tsconfig found that mention strictNullChecks/strict (full line excerpts) ----"
fd -t f 'tsconfig*.json' frontend/packages/console-app | sort | while read -r f; do
rg -n "strictNullChecks|\"strict\"" "$f" && echo "---" || true
doneRepository: openshift/console
Length of output: 3390
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the call site around the reported usage (~line 656)
node_file="frontend/packages/console-app/src/components/nodes/NodesPage.tsx"
echo "---- resolveInstanceLabel call site (NodesPage.tsx, ~640-700) ----"
sed -n '630,720p' "$node_file" | awk '{print NR ":" $0}'
echo "---- all usages of resolveInstanceLabel in console-app package ----"
rg -n "resolveInstanceLabel" frontend/packages/console-app || true
echo "---- locate NodesPage.*spec.* ----"
fd -t f "NodesPage.*spec.*" frontend/packages/console-app -0 2>/dev/null | xargs -0 -I{} echo "{}" || true
echo "---- global search for resolveInstanceLabel(undefined" # checks for the spec mention
rg -n "resolveInstanceLabel\s*\(\s*undefined" frontend || true
echo "---- global search for NodesPage.spec mention in repo ----"
rg -n "NodesPage\.spec" frontend/packages/console-app || true
echo "---- tsconfig files relevant to console-app (show extends/strict/null settings) ----"
fd -t f 'tsconfig*.json' frontend/packages/console-app frontend -0 2>/dev/null | xargs -0 -I{} sh -c 'echo "### {}"; rg -n "\"extends\"|\"strict\"|\"strictNullChecks\"|compilerOptions" "{}" || true'Repository: openshift/console
Length of output: 6542
🏁 Script executed:
#!/bin/bash
set -euo pipefail
pkg_dir="frontend/packages/console-app"
echo "---- list files in console-app package root (tsconfig-related) ----"
ls -la "$pkg_dir" | head -n 200
echo "---- find tsconfig* under console-app ----"
fd -t f 'tsconfig*.json' "$pkg_dir" || true
echo "---- inspect package.json scripts for console-app (ts-jest/tsc config) ----"
if [ -f "$pkg_dir/package.json" ]; then
sed -n '1,200p' "$pkg_dir/package.json" | nl -ba
rg -n "tsconfig|jest|ts-jest|babel-jest|ts-node|tsc" "$pkg_dir/package.json" || true
else
echo "Missing: $pkg_dir/package.json"
fi
echo "---- locate any tsconfig-base/extends used by console-app ----"
fd -t f 'tsconfig*.json' frontend | rg -n "tsconfig-base|extends" || true
# Search for how NodesPage is compiled/tested (ts-jest config references)
echo "---- search for NodesPage.tsx in repo ----"
rg -n "NodesPage\.tsx" frontend/packages/console-app -S || true
echo "---- inspect NodesPage.spec.ts imports to see test TS config assumptions ----"
spec_file="frontend/packages/console-app/src/components/nodes/__tests__/NodesPage.spec.ts"
if [ -f "$spec_file" ]; then
sed -n '1,120p' "$spec_file" | nl -ba
fiRepository: openshift/console
Length of output: 737
🏁 Script executed:
#!/bin/bash
set -euo pipefail
pkg_dir="frontend/packages/console-app"
root_tsconfig="frontend/tsconfig.json"
spec_file="frontend/packages/console-app/src/components/nodes/__tests__/NodesPage.spec.ts"
echo "---- console-app/package.json (scripts/config sections only) ----"
sed -n '1,220p' "$pkg_dir/package.json" | rg -n "scripts|jest|ts-jest|ts-node|test|transform|moduleNameMapper|globals|testRegex|testMatch|setupFiles" || true
echo "---- full scripts block (if present) ----"
node -e "const p=require('./$pkg_dir/package.json'); console.log(JSON.stringify(p.scripts,null,2))" 2>/dev/null || true
echo "---- Does console-app have any tsconfig locally? ----"
fd -t f 'tsconfig*.json' "$pkg_dir" || true
echo "---- frontend/tsconfig.json (compilerOptions strict/strictNullChecks) ----"
sed -n '1,120p' "$root_tsconfig" | rg -n "strictNullChecks|strict|compilerOptions" || true
sed -n '1,140p' "$root_tsconfig" | cat
echo "---- search for strictNullChecks in frontend tsconfigs ----"
rg -n "\"strictNullChecks\"|\"strict\"" frontend/tsconfig.json frontend -S --glob 'tsconfig*.json' || true
echo "---- NodesPage.spec.ts ----"
if [ -f "$spec_file" ]; then
sed -n '1,120p' "$spec_file" | cat
fi
echo "---- locate any jest/ts-jest config files in console-app or frontend ----"
fd -t f 'jest*.js|jest*.ts|jest*.json' frontend | head -n 50
fd -t f 'ts-jest*|ts-jest*.js|ts-jest*.ts' frontend | head -n 50Repository: openshift/console
Length of output: 5861
Update resolveInstanceLabel types to match its real contract
resolveInstanceLabel returns undefined when called with undefined (covered by NodesPage.spec.ts), and the caller can pass data.metric.instance || data.metric.node which may be undefined. Since frontend/tsconfig.json has strict: false, this doesn’t currently fail compilation, but the signature understates the runtime behavior.
♻️ Proposed signature change
export const resolveInstanceLabel = (
- instance: string,
+ instance: string | undefined,
ipToHostname: Map<string, string>,
-): string => {
+): string | undefined => {
if (instance?.includes(':')) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@frontend/packages/console-app/src/components/nodes/NodesPage.tsx` around
lines 599 - 608, The function resolveInstanceLabel's signature understates its
real contract: it can be called with undefined (e.g. callers pass
data.metric.instance || data.metric.node) and can return undefined at runtime.
Change the signature of resolveInstanceLabel to accept instance: string |
undefined and return string | undefined, and update the implementation to
short-circuit when instance is undefined (return undefined); keep existing
behavior for colon-separated instances using ipToHostname.get(ip) || instance.
Ensure any callers/uses expecting a string handle the possible undefined return.
|
/label tide/merge-method-squash |
|
/jira refresh |
|
@sg00dwin: This pull request references Jira Issue OCPBUGS-12953, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (rrasouli@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
@sg00dwin: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |


Analysis / Root cause:
The CPU column shows
-for Windows nodes on the Compute > Nodes list page. This is caused by two issues:Missing CPU query fallback: The Console's CPU queries rely on Prometheus recording rules (
instance:node_cpu:rate:sumandinstance:node_num_cpu:sum) created by the kube-prometheus stack for Linux nodes. WMCO createsinstance:node_cpu:rate:sumfor Windows but does not createinstance:node_num_cpu:sum. TheCPUCellcomponent requires both values to be finite — missing either shows-. The storage queries in the same function already handle Windows viaorfallback, but CPU did not.Instance label format mismatch: On some platforms (e.g. AWS), the Windows exporter uses
IP:portformat for the Prometheusinstancelabel (e.g.10.0.61.250:9182), while Linux node-exporter uses hostnames (e.g.ip-10-0-61-250.us-east-2.compute.internal). The Console maps metrics to nodes by matching theinstancelabel to the node name, so Windows metrics fail to map.Jira: https://redhat.atlassian.net/browse/OCPBUGS-12953
Solution description:
Add
orWindows CPU fallback queries to both the node list page (NodesPage.tsx) and node detail page (queries.ts), matching the existing pattern used for Windows storage metrics:rate(windows_cpu_time_total{mode!="idle"}[3m])— sums non-idle CPU time, matching the Linuxinstance:node_cpu:rate:sumsemanticscount ... (windows_cpu_time_total{mode="idle"})— counts one series per core, matching how the Linuxinstance:node_num_cpu:sumworksRemap IP:port instance labels to hostnames in
fetchNodeMetrics()by building an IP→hostname map from each node's InternalIP address. When a Prometheus result has aninstancelabel containing:, the IP portion is looked up in the map and replaced with the hostname. Falls back to the original label if no match is found.Extract helpers and add unit tests —
buildIPToHostnameMapandresolveInstanceLabelare exported as testable functions with 9 unit tests covering mapping, edge cases (empty nodes, missing IP, unknown IP), and passthrough behavior for Linux nodes.Screenshots / screen recording:

Before
After

Test setup:
oc label namespace openshift-windows-machine-config-operator openshift.io/cluster-monitoring="true"ReadystatusTest cases:
-)sum by(instance)(instance:node_num_cpu:sum) or count by(instance)(windows_cpu_time_total{mode="idle"})returns results for both Linux and Windows nodesAdditional info:
instance:node_cpu:rate:sumrecording rule — it sumsmode="idle"(idle time) instead ofmode!="idle"(usage time). This Console fix bypasses that by using rawwindows_cpu_time_totalmetrics directly.orfallback pattern is proven — storage queries in the same function have used it for Windows metrics without issues.:(IP:port format). Linux node-exporter in OpenShift always uses hostname-style labels (confirmed via ServiceMonitor relabeling config), so Linux nodes are never affected.Assisted by: Claude (Opus 4.6)
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests