Skip to content

Consolidate MCPRegistry status to match MCPServer pattern#4643

Draft
ChrisJBurns wants to merge 1 commit intomainfrom
chrisburns/mcpregistry-status-consolidation
Draft

Consolidate MCPRegistry status to match MCPServer pattern#4643
ChrisJBurns wants to merge 1 commit intomainfrom
chrisburns/mcpregistry-status-consolidation

Conversation

@ChrisJBurns
Copy link
Copy Markdown
Collaborator

Summary

MCPRegistry had an over-engineered three-phase status model with separate SyncStatus and APIStatus sub-objects, a DeriveOverallPhase() method, and a StatusCollector batching pattern. With sync operations moved to the registry server itself, only deployment readiness remains as a status source — making this complexity unnecessary and coupling the operator to internal registry server state.

This flattens MCPRegistryStatus to the standard Kubernetes workload pattern used by MCPServer: Phase + Ready condition + ReadyReplicas + URL. This enables kubectl wait --for=condition=Ready and consistent tooling across all ToolHive workloads.

Closes #4251
Closes #4269
Closes #4270
Closes #4510

Type of change

  • Refactoring (no behavior change)

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Changes

File Change
api/v1alpha1/mcpregistry_types.go Flatten MCPRegistryStatus to Phase/Message/URL/ReadyReplicas/Conditions/ObservedGeneration. Remove SyncStatus, APIStatus, StorageReference, SyncPhase, APIPhase types. Simplify MCPRegistryPhase to Pending/Running/Failed/Terminating. Update printer columns to match MCPServer.
controllers/mcpregistry_controller.go Rewrite status logic to set fields directly — no StatusManager/StatusDeriver. Add updateRegistryStatus() with refetch-before-update. Add setRegistryReadyCondition() using shared ConditionTypeReady. Set ObservedGeneration on conditions.
pkg/registryapi/types.go Relocate Error type from deleted mcpregistrystatus package. Add GetReadyReplicas() to Manager interface.
pkg/registryapi/manager.go Update Error references, replace ConditionAPIReady with ConditionTypeReady, add GetReadyReplicas implementation.
pkg/mcpregistrystatus/ Deleted — entire package (7 files). StatusCollector, StatusDeriver, APIStatusCollector patterns removed.
pkg/validation/image_validation.go MCPRegistryPhaseReadyMCPRegistryPhaseRunning
test-integration/mcp-registry/ Remove sync-related status helpers. Update all phase/condition references across 6 test files.
DESIGN.md Document simplified status model with history of removed StatusCollector pattern.
CRD YAMLs Regenerated to reflect new status schema and printer columns.

Does this introduce a user-facing change?

Yes — MCPRegistry status is simplified:

  • .status.phase values change: Ready is now Running, Syncing is removed
  • .status.syncStatus and .status.apiStatus fields are removed
  • New fields: .status.url and .status.readyReplicas
  • kubectl get mcpregistries columns change from Phase/Sync/API/Servers/Last Sync/Age to Status/Ready/Replicas/URL/Age
  • kubectl wait --for=condition=Ready now works

This is a breaking API change to v1alpha1 which carries no stability guarantee.

Special notes for reviewers

  • Breaking CRD change: v1alpha1 has no stability guarantee, no conversion webhook needed. Orphaned etcd fields will be overwritten on next reconciliation.
  • Stopped phase intentionally omitted — MCPServer has it for scale-to-zero but MCPRegistry has no replicas spec field.
  • Pre-existing test failures: TestMCPRemoteProxyStatusProgression has a data race unrelated to this PR. The vmcp task gen mockgen failures are a Go version mismatch (go1.24 mockgen vs go1.26 runtime) also pre-existing.

Generated with Claude Code

MCPRegistry had an over-engineered three-phase status model with separate
SyncStatus and APIStatus sub-objects, a DeriveOverallPhase() method, and
a StatusCollector batching pattern. With sync operations moved to the
registry server itself, only deployment readiness remains as a status
source, making this complexity unnecessary.

Flatten MCPRegistryStatus to the standard Kubernetes workload pattern
used by MCPServer: Phase + Ready condition + ReadyReplicas + URL. This
enables kubectl wait --for=condition=Ready and consistent tooling across
all ToolHive workloads.

Closes #4251
Closes #4269
Closes #4270
Closes #4510

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 7, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 65.65657% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.82%. Comparing base (f8b6051) to head (7929c07).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
cmd/thv-operator/pkg/registryapi/manager.go 13.04% 20 Missing ⚠️
...thv-operator/controllers/mcpregistry_controller.go 83.09% 8 Missing and 4 partials ⚠️
cmd/thv-operator/pkg/registryapi/types.go 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4643      +/-   ##
==========================================
- Coverage   68.89%   68.82%   -0.07%     
==========================================
  Files         505      502       -3     
  Lines       52425    52329      -96     
==========================================
- Hits        36118    36016     -102     
- Misses      13514    13527      +13     
+ Partials     2793     2786       -7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

@claude Please review this using the agents

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 7, 2026

Claude finished @ChrisJBurns's task in 3m 15s —— View job


PR Review: Consolidate MCPRegistry status to match MCPServer pattern

  • Gather context and understand the PR
  • Read key changed files
  • Run code-reviewer agent for Go/conventions analysis
  • Run kubernetes-expert agent for operator patterns
  • Synthesize findings and post review

Overview

The refactoring goal is sound and the overall design is correct. The MCPRegistryStatus struct now properly matches the MCPServer pattern (Phase + Ready condition + ReadyReplicas + URL), kubectl wait --for=condition=Ready works, list markers are correct, and the CRD change is handled appropriately for v1alpha1. Several concrete issues are worth addressing before merge.


Issues to Fix

1. Stale URL not cleared on Failed/Pending transitions
cmd/thv-operator/controllers/mcpregistry_controller.go — error branch (~line 217) and not-ready branch (~line 242)

When a registry transitions from RunningFailed or RunningPending, latest.Status.URL is never cleared. A kubectl get mcpregistry would show Phase: Failed alongside a valid-looking URL, misleading users into thinking the registry is accessible.

// In the not-ready branch:
latest.Status.Phase = mcpv1alpha1.MCPRegistryPhasePending
latest.Status.URL = ""   // add this
// In the error branch:
latest.Status.Phase = mcpv1alpha1.MCPRegistryPhaseFailed
latest.Status.URL = ""   // add this

Fix this →

2. IsAPIReady called twice per reconcile — TOCTOU + wasted API call
cmd/thv-operator/controllers/mcpregistry_controller.go — called inside updateRegistryStatus (~line 230) and again in the requeue check (~line 173)

Each call makes a live client.Get against the Kubernetes API. If the deployment state changes between the two calls (TOCTOU), the status could say Phase: Pending but the requeue skips the 30s wait (or vice versa). The .Times(2) expectations in tests document the duplication rather than fixing it.

The fix is to return the readiness result from updateRegistryStatus and reuse it:

func (r *MCPRegistryReconciler) updateRegistryStatus(...) (bool, error) { // return isReady
    ...
    isReady := r.registryAPIManager.IsAPIReady(ctx, mcpRegistry)
    ...
    return isReady, r.Status().Update(ctx, latest)
}

Fix this →

3. Redundant ReadyReplicas assignment in not-ready branch
cmd/thv-operator/controllers/mcpregistry_controller.go — not-ready branch

latest.Status.ReadyReplicas = readyReplicas is assigned before the if isReady branch and then redundantly re-assigned inside the else branch. Remove the second assignment.

4. Status update failure silently ignored on valid PodTemplateSpec path
cmd/thv-operator/controllers/mcpregistry_controller.govalidateAndUpdatePodTemplateStatus (~line 326)

The invalid-spec path returns false on status update failure, but the valid-spec path logs the error and returns true anyway — silently swallowing the condition. The PodTemplateValid=True condition is lost if the write fails. Align the valid-spec path to also return false on update failure.

5. Use registryapi.RegistryAPIPort instead of magic number 8080
cmd/thv-operator/controllers/mcpregistry_controller.go — ~line 235

The constant RegistryAPIPort = 8080 already exists in cmd/thv-operator/pkg/registryapi/types.go. Using the magic number creates a silent drift risk if the port ever changes.

endpoint := fmt.Sprintf("http://%s.%s:%d",
    mcpRegistry.GetAPIResourceName(), mcpRegistry.Namespace, registryapi.RegistryAPIPort)

Suggestions (non-blocking)

6. Missing tests for IsAPIReady and GetReadyReplicas in manager.go

Codecov reports manager.go at 13% patch coverage. The two new methods IsAPIReady and GetReadyReplicas are untested — both make API calls and have distinct failure modes (deployment not found, found-but-not-ready). Worth covering at minimum: (a) deployment not found → 0/false, (b) deployment found and ready → true/N.

7. Duplicate condition reason values
cmd/thv-operator/api/v1alpha1/mcpregistry_types.goConditionReasonRegistryReady = "Ready" and ConditionReasonRegistryNotReady = "NotReady"

These values are identical to ConditionReasonReady / ConditionReasonNotReady already defined in mcpserver_types.go:27-30. Consider either reusing the existing constants or adding a comment explaining why registry has its own names.

8. Unconditional Status().Update() in updateRegistryStatus

The status is always written even when nothing changed. This causes a minor reconcile-loop amplification (triggers a new watch event → re-enqueues). This is pre-existing debt shared with the MCPServer controller, so not a regression — worth a follow-up issue.

9. Large PR justification section needed

The automated review bot (github-actions) requested a ## Large PR Justification section in the PR description to dismiss its review. The PR description already contains the motivation clearly (large because it's an atomic refactor deleting 7 files of an entire package), just needs the explicit section:

## Large PR Justification
- Deletes the entire `pkg/mcpregistrystatus/` package (7 files, ~1100 lines) atomically — splitting would leave the codebase in an inconsistent state between old and new status models
- CRD and deepcopy files are generated and cannot be partially split

Kubernetes Operator Pattern Verification

The kubernetes-expert agent confirmed:

  • MCPRegistryStatus fields now match MCPServer exactly (Conditions, ObservedGeneration, Phase, Message, URL, ReadyReplicas)
  • kubectl wait --for=condition=Ready correctly supported — ConditionTypeReady = "Ready" with metav1.ConditionTrue/False values
  • +listType=map / +listMapKey=type markers on Conditions are correct
  • MCPRegistryPhase enum constraint correctly placed on type definition
  • No conversion webhook needed for v1alpha1 — correct call; orphaned etcd fields will be pruned on next write
  • Shared ConditionPodTemplateValid constants correctly defined once in mcpserver_types.go and reused

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

1 participant