Open
44 commits
217329a
Add MCPEmbedding CRD for embedding model deployment in operator
ptelang Jan 14, 2026
1d91025
Rename MCPEmbedding crd as EmbeddingServer
ptelang Jan 15, 2026
f100ffd
Updated image and model names
ptelang Jan 15, 2026
3daccec
Remove unnecessary GroupRef from EmbeddingServers crd
ptelang Jan 15, 2026
7279a2d
Fixed reconciliation loop issue causing no service creation
ptelang Jan 15, 2026
fec2932
Rename examples/operator/embeddings to examples/opeartor/embedding-se…
ptelang Jan 15, 2026
00ed558
Updated embedding server example yamls
ptelang Jan 15, 2026
c529656
Bump toolhive operator version and fix linting issues
ptelang Jan 16, 2026
6d2ec66
Added e2e tests and fixed a bug
ptelang Jan 16, 2026
5d0efce
Convert EmbeddingServer to use StatefulSets and add HuggingFace token…
ptelang Jan 20, 2026
73f74a7
Fix linting issues
ptelang Jan 20, 2026
b40b3e5
Update Helm chart documentation
ptelang Jan 20, 2026
aef5d8c
Batch all EmbeddingServer status updates to a single call to prevent …
ptelang Jan 20, 2026
5b0064a
Fix README files
ptelang Jan 20, 2026
84f5d67
Updated CRD api docs
ptelang Jan 20, 2026
ea0c4f6
Fixed ensureStatefulSet and ensureService functions to prevent early …
ptelang Jan 20, 2026
989cfd7
Bump toolhive-operator-crds chart version to 0.0.99
ptelang Jan 20, 2026
e4978ab
Added toolhive-test-ns-1 and toolhive-test-ns-2 namespaces to test co…
ptelang Jan 20, 2026
d0499bb
Use smallest supported embedding model for e2e tests
ptelang Jan 20, 2026
931ad7c
Modify embeddingserver e2e tests to support slow model file downloads
ptelang Jan 20, 2026
d32eb3f
add envtest for EmbeddingServer
jerm-dro Jan 20, 2026
62a039b
add tests that demonstrate gaps
jerm-dro Jan 20, 2026
05e1f4f
Fix bugs in the tests
ptelang Jan 21, 2026
317a789
Add sleep before checking PVC status in embeddingserver e2e test
ptelang Jan 21, 2026
0dfb7e6
Update image location for huggingface inference engine
ptelang Jan 21, 2026
8ff356b
Addressed TODOs in the embedding-server integration tests
ptelang Jan 21, 2026
e1b679c
Add SPDX license header to embedding-server files
ptelang Jan 21, 2026
113b981
Fixed a linting issue by refactoring a high cyclomatic complexity fun…
ptelang Jan 21, 2026
9d2cc02
Merge branch 'main' into add-embedding-engine
ptelang Jan 21, 2026
60f052e
Merge branch 'main' into add-embedding-engine
ptelang Jan 22, 2026
47f3623
Bump toolhive-operator-crds chart version
ptelang Jan 22, 2026
5a8e464
Update all places from deployment to statefulset in ref to embeddings…
ptelang Jan 23, 2026
de85d9d
Remove the unnecessary updateStatefulSetWithRetry function
ptelang Jan 23, 2026
56d4f9b
Fix embedding server statefulset update detection to support sidecar …
ptelang Jan 23, 2026
9a5d19d
Refactored statefulSetNeedsUpdate function in embedding server contro…
ptelang Jan 23, 2026
e558afd
Removed left-over TODO comment
ptelang Jan 23, 2026
941537f
Replaced conditional branches with an immediately-invoked anonymous f…
ptelang Jan 23, 2026
79ae443
Removed unnecessary README.md files from test scenarios
ptelang Jan 23, 2026
a7cde8a
Add header forward middleware for remote MCP servers (#3423)
jhrozek Jan 23, 2026
2d8da5d
Add E2E tests for group endpoints (#3402)
dmjb Jan 23, 2026
5429aa0
authserver DCR hardening: Add grant_types and response_types allowlis…
jhrozek Jan 23, 2026
f802358
Refactor RBAC management to eliminate code duplication (#3368)
yrobla Jan 23, 2026
b7af76f
Add token endpoint handler (#3408)
jhrozek Jan 23, 2026
ff03438
Merge branch 'main' into add-embedding-engine
ptelang Jan 23, 2026
34 changes: 34 additions & 0 deletions CLAUDE.md
@@ -313,6 +313,40 @@ For the complete documentation structure and navigation, see `docs/arch/README.m
- Do not use "Conventional Commits", e.g. starting with `feat`, `fix`, `chore`, etc.
- Use mockgen for creating mocks instead of generating mocks by hand.

### Go Coding Style

- **Prefer immutable variable assignment with anonymous functions**:
When you need to assign a variable based on complex conditional logic, prefer using an immediately-invoked anonymous function instead of mutating the variable across multiple branches:

```go
// ✅ Good: Immutable assignment with anonymous function
phase := func() PhaseType {
	if someCondition {
		return PhaseA
	}
	if anotherCondition {
		return PhaseB
	}
	return PhaseDefault
}()

// ❌ Avoid: Mutable variable across branches
var phase PhaseType
if someCondition {
	phase = PhaseA
} else if anotherCondition {
	phase = PhaseB
} else {
	phase = PhaseDefault
}
```

**Benefits**:
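- The variable is assigned exactly once, reducing bugs from accidental modification (Go does not enforce immutability for local variables, but a single assignment makes any later reassignment stand out)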
- All decision logic is in one place with explicit returns
- Clearer logic flow and easier to understand
- Reduces cognitive load from tracking which branch sets which value
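
A minimal runnable sketch of the pattern above, for readers who want to try it. The `Phase` type, its constants, and the `derivePhase` helper with its inputs are hypothetical names chosen for illustration; they are not taken from the operator code.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for a status phase type.
type Phase string

const (
	PhasePending Phase = "Pending"
	PhaseRunning Phase = "Running"
	PhaseFailed  Phase = "Failed"
)

// derivePhase picks a phase from hypothetical inputs. The value is produced
// by an immediately-invoked anonymous function, so phase is assigned once
// and every branch ends in an explicit return.
func derivePhase(readyReplicas, wantReplicas int32, failed bool) Phase {
	phase := func() Phase {
		if failed {
			return PhaseFailed
		}
		if readyReplicas >= wantReplicas {
			return PhaseRunning
		}
		return PhasePending
	}()
	return phase
}

func main() {
	fmt.Println(derivePhase(0, 1, false))
	fmt.Println(derivePhase(1, 1, false))
	fmt.Println(derivePhase(1, 1, true))
}
```

When the selection logic is reused in several places, a small named function such as `derivePhase` may read better than repeating the inline closure.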

## Error Handling Guidelines

See `docs/error-handling.md` for comprehensive documentation.
272 changes: 272 additions & 0 deletions cmd/thv-operator/api/v1alpha1/embeddingserver_types.go
@@ -0,0 +1,272 @@
// SPDX-License-Identifier: Apache-2.0

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// Condition types for EmbeddingServer (reuses common conditions from MCPServer)
// ConditionImageValidated and ConditionPodTemplateValid are shared with MCPServer

const (
	// ConditionModelReady indicates whether the embedding model is downloaded and ready
	ConditionModelReady = "ModelReady"

	// ConditionVolumeReady indicates whether the PVC for model caching is ready
	ConditionVolumeReady = "VolumeReady"
)

// Condition reasons for EmbeddingServer
// Image validation and PodTemplate reasons are shared with MCPServer

const (
	// ConditionReasonModelDownloading indicates the model is being downloaded
	ConditionReasonModelDownloading = "ModelDownloading"
	// ConditionReasonModelReady indicates the model is downloaded and ready
	ConditionReasonModelReady = "ModelReady"
	// ConditionReasonModelFailed indicates the model download or initialization failed
	ConditionReasonModelFailed = "ModelFailed"

	// ConditionReasonVolumeCreating indicates the PVC is being created
	ConditionReasonVolumeCreating = "VolumeCreating"
	// ConditionReasonVolumeReady indicates the PVC is ready
	ConditionReasonVolumeReady = "VolumeReady"
	// ConditionReasonVolumeFailed indicates the PVC creation failed
	ConditionReasonVolumeFailed = "VolumeFailed"
)

// EmbeddingServerSpec defines the desired state of EmbeddingServer
type EmbeddingServerSpec struct {
	// Model is the HuggingFace embedding model to use (e.g., "sentence-transformers/all-MiniLM-L6-v2")
	// +kubebuilder:validation:Required
	Model string `json:"model"`

	// HFTokenSecretRef is a reference to a Kubernetes Secret containing the HuggingFace token.
	// If provided, the secret value is passed to the embedding server for authentication with HuggingFace.
	// +optional
	HFTokenSecretRef *SecretKeyRef `json:"hfTokenSecretRef,omitempty"`

	// Image is the container image for the HuggingFace text-embeddings-inference server
	// +kubebuilder:validation:Required
	// +kubebuilder:default="ghcr.io/huggingface/text-embeddings-inference:latest"
	Image string `json:"image,omitempty"`

	// ImagePullPolicy defines the pull policy for the container image
	// +kubebuilder:validation:Enum=Always;Never;IfNotPresent
	// +kubebuilder:default="IfNotPresent"
	// +optional
	ImagePullPolicy string `json:"imagePullPolicy,omitempty"`

	// Port is the port to expose the embedding service on
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=65535
	// +kubebuilder:default=8080
	Port int32 `json:"port,omitempty"`

	// Args are additional arguments to pass to the embedding inference server
	// +optional
	Args []string `json:"args,omitempty"`

	// Env are environment variables to set in the container
	// +optional
	Env []EnvVar `json:"env,omitempty"`

	// Resources defines compute resources for the embedding server
	// +optional
	Resources ResourceRequirements `json:"resources,omitempty"`

	// ModelCache configures persistent storage for downloaded models.
	// When enabled, models are cached in a PVC and reused across pod restarts.
	// +optional
	ModelCache *ModelCacheConfig `json:"modelCache,omitempty"`

	// PodTemplateSpec allows customizing the pod (node selection, tolerations, etc.)
	// This field accepts a PodTemplateSpec object as JSON/YAML.
	// Note that to modify the specific container the embedding server runs in, you must specify
	// the 'embedding' container name in the PodTemplateSpec.
	// +optional
	// +kubebuilder:pruning:PreserveUnknownFields
	// +kubebuilder:validation:Type=object
	PodTemplateSpec *runtime.RawExtension `json:"podTemplateSpec,omitempty"`

	// ResourceOverrides allows overriding annotations and labels for resources created by the operator
	// +optional
	ResourceOverrides *EmbeddingResourceOverrides `json:"resourceOverrides,omitempty"`

	// Replicas is the number of embedding server replicas to run
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:default=1
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`
}

// ModelCacheConfig configures persistent storage for model caching
type ModelCacheConfig struct {
	// Enabled controls whether model caching is enabled
	// +kubebuilder:default=true
	// +optional
	Enabled bool `json:"enabled,omitempty"`

	// StorageClassName is the storage class to use for the PVC
	// If not specified, uses the cluster's default storage class
	// +optional
	StorageClassName *string `json:"storageClassName,omitempty"`

	// Size is the size of the PVC for model caching (e.g., "10Gi")
	// +kubebuilder:default="10Gi"
	// +optional
	Size string `json:"size,omitempty"`

	// AccessMode is the access mode for the PVC
	// +kubebuilder:default="ReadWriteOnce"
	// +kubebuilder:validation:Enum=ReadWriteOnce;ReadWriteMany;ReadOnlyMany
	// +optional
	AccessMode string `json:"accessMode,omitempty"`
}

// EmbeddingResourceOverrides defines overrides for annotations and labels on created resources
type EmbeddingResourceOverrides struct {
	// StatefulSet defines overrides for the StatefulSet resource
	// +optional
	StatefulSet *EmbeddingStatefulSetOverrides `json:"statefulSet,omitempty"`

	// Service defines overrides for the Service resource
	// +optional
	Service *ResourceMetadataOverrides `json:"service,omitempty"`

	// PersistentVolumeClaim defines overrides for the PVC resource
	// +optional
	PersistentVolumeClaim *ResourceMetadataOverrides `json:"persistentVolumeClaim,omitempty"`
}

// EmbeddingStatefulSetOverrides defines overrides specific to the embedding statefulset
type EmbeddingStatefulSetOverrides struct {
	// ResourceMetadataOverrides is embedded to inherit annotations and labels fields
	ResourceMetadataOverrides `json:",inline"` // nolint:revive

	// PodTemplateMetadataOverrides defines metadata overrides for the pod template
	// +optional
	PodTemplateMetadataOverrides *ResourceMetadataOverrides `json:"podTemplateMetadataOverrides,omitempty"`
}

// EmbeddingServerStatus defines the observed state of EmbeddingServer
type EmbeddingServerStatus struct {
	// Conditions represent the latest available observations of the EmbeddingServer's state
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// Phase is the current phase of the EmbeddingServer
	// +optional
	Phase EmbeddingServerPhase `json:"phase,omitempty"`

	// Message provides additional information about the current phase
	// +optional
	Message string `json:"message,omitempty"`

	// URL is the URL where the embedding service can be accessed
	// +optional
	URL string `json:"url,omitempty"`

	// ReadyReplicas is the number of ready replicas
	// +optional
	ReadyReplicas int32 `json:"readyReplicas,omitempty"`

	// ObservedGeneration reflects the generation most recently observed by the controller
	// +optional
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
}

// EmbeddingServerPhase is the phase of the EmbeddingServer
// +kubebuilder:validation:Enum=Pending;Downloading;Running;Failed;Terminating
type EmbeddingServerPhase string

const (
	// EmbeddingServerPhasePending means the EmbeddingServer is being created
	EmbeddingServerPhasePending EmbeddingServerPhase = "Pending"

	// EmbeddingServerPhaseDownloading means the model is being downloaded
	EmbeddingServerPhaseDownloading EmbeddingServerPhase = "Downloading"

	// EmbeddingServerPhaseRunning means the EmbeddingServer is running and ready
	EmbeddingServerPhaseRunning EmbeddingServerPhase = "Running"

	// EmbeddingServerPhaseFailed means the EmbeddingServer failed to start
	EmbeddingServerPhaseFailed EmbeddingServerPhase = "Failed"

	// EmbeddingServerPhaseTerminating means the EmbeddingServer is being deleted
	EmbeddingServerPhaseTerminating EmbeddingServerPhase = "Terminating"
)

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="Status",type="string",JSONPath=".status.phase"
//+kubebuilder:printcolumn:name="Model",type="string",JSONPath=".spec.model"
//+kubebuilder:printcolumn:name="Ready",type="integer",JSONPath=".status.readyReplicas"
//+kubebuilder:printcolumn:name="URL",type="string",JSONPath=".status.url"
//+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"

// EmbeddingServer is the Schema for the embeddingservers API
type EmbeddingServer struct {
	metav1.TypeMeta   `json:",inline"` // nolint:revive
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   EmbeddingServerSpec   `json:"spec,omitempty"`
	Status EmbeddingServerStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// EmbeddingServerList contains a list of EmbeddingServer
type EmbeddingServerList struct {
	metav1.TypeMeta `json:",inline"` // nolint:revive
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []EmbeddingServer `json:"items"`
}

// GetName returns the name of the EmbeddingServer
func (e *EmbeddingServer) GetName() string {
	return e.Name
}

// GetNamespace returns the namespace of the EmbeddingServer
func (e *EmbeddingServer) GetNamespace() string {
	return e.Namespace
}

// GetPort returns the port of the EmbeddingServer
func (e *EmbeddingServer) GetPort() int32 {
	if e.Spec.Port > 0 {
		return e.Spec.Port
	}
	return 8080
}

// GetReplicas returns the number of replicas for the EmbeddingServer
func (e *EmbeddingServer) GetReplicas() int32 {
	if e.Spec.Replicas != nil {
		return *e.Spec.Replicas
	}
	return 1
}

// IsModelCacheEnabled returns whether model caching is enabled
func (e *EmbeddingServer) IsModelCacheEnabled() bool {
	if e.Spec.ModelCache == nil {
		return false
	}
	return e.Spec.ModelCache.Enabled
}

// GetImagePullPolicy returns the image pull policy for the EmbeddingServer
func (e *EmbeddingServer) GetImagePullPolicy() string {
	if e.Spec.ImagePullPolicy != "" {
		return e.Spec.ImagePullPolicy
	}
	return "IfNotPresent"
}

func init() {
	SchemeBuilder.Register(&EmbeddingServer{}, &EmbeddingServerList{})
}
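
The accessors above fall back to fixed defaults when a field is unset, and several fields combine `omitempty` with a `+kubebuilder:default` marker. The sketch below illustrates both behaviors with trimmed, hypothetical copies of the types (`spec`, `modelCache`, and the lowercase methods are illustration-only names, not part of the PR). It assumes nothing beyond the standard library. One point it may help to see concretely: with `omitempty`, a Go client that sets `Enabled: false` serializes no `enabled` key at all, so an API server applying the `default=true` marker would be expected to fill in `true`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelCache and spec are trimmed, hypothetical copies of ModelCacheConfig
// and EmbeddingServerSpec; only the fields needed here are kept.
type modelCache struct {
	Enabled bool   `json:"enabled,omitempty"`
	Size    string `json:"size,omitempty"`
}

type spec struct {
	Model      string      `json:"model"`
	Port       int32       `json:"port,omitempty"`
	Replicas   *int32      `json:"replicas,omitempty"`
	ModelCache *modelCache `json:"modelCache,omitempty"`
}

// port mirrors GetPort: a zero Port falls back to 8080.
func (s spec) port() int32 {
	if s.Port > 0 {
		return s.Port
	}
	return 8080
}

// replicas mirrors GetReplicas: a nil pointer falls back to 1.
func (s spec) replicas() int32 {
	if s.Replicas != nil {
		return *s.Replicas
	}
	return 1
}

// cacheEnabled mirrors IsModelCacheEnabled: a nil ModelCache means disabled.
func (s spec) cacheEnabled() bool {
	return s.ModelCache != nil && s.ModelCache.Enabled
}

// mustJSON marshals v and panics on error, which is acceptable in a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	s := spec{Model: "sentence-transformers/all-MiniLM-L6-v2"}
	// Unset fields resolve to the accessor defaults.
	fmt.Println(s.port(), s.replicas(), s.cacheEnabled())

	// Enabled=false is a zero value, so omitempty drops it from the JSON.
	s.ModelCache = &modelCache{Enabled: false, Size: "10Gi"}
	fmt.Println(mustJSON(s))
}
```

Running this prints `8080 1 false`, followed by a JSON document whose `modelCache` object contains only `size`. If an explicit `false` must survive a round trip, a pointer field (`*bool`) is the usual alternative, at the cost of a nil check in the accessor.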
5 changes: 5 additions & 0 deletions cmd/thv-operator/api/v1alpha1/mcpremoteproxy_types.go
@@ -59,6 +59,11 @@ type MCPRemoteProxySpec struct {
	// +optional
	Resources ResourceRequirements `json:"resources,omitempty"`

	// ServiceAccount is the name of an existing service account for the proxy to use.
	// If not specified, a ServiceAccount is created automatically and used by the proxy.
	// +optional
	ServiceAccount *string `json:"serviceAccount,omitempty"`

	// TrustProxyHeaders indicates whether to trust X-Forwarded-* headers from reverse proxies
	// When enabled, the proxy will use X-Forwarded-Proto, X-Forwarded-Host, X-Forwarded-Port,
	// and X-Forwarded-Prefix headers to construct endpoint URLs
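
The optional `ServiceAccount` field implies a two-way choice in the controller: use the named account, or create one. A minimal sketch of that choice follows. The `proxySpec` type, the `serviceAccountName` helper, and the `<name>-sa` fallback naming are all assumptions made for illustration; the diff shown here does not reveal how the operator actually names the generated account.

```go
package main

import "fmt"

// proxySpec is a trimmed, hypothetical stand-in for MCPRemoteProxySpec.
type proxySpec struct {
	ServiceAccount *string
}

// serviceAccountName returns the user-supplied ServiceAccount name and true,
// or a generated fallback name and false when none was supplied.
// The "<name>-sa" scheme is an assumption for illustration only.
func serviceAccountName(resourceName string, s proxySpec) (string, bool) {
	if s.ServiceAccount != nil && *s.ServiceAccount != "" {
		return *s.ServiceAccount, true
	}
	return resourceName + "-sa", false
}

func main() {
	existing := "shared-sa"
	fmt.Println(serviceAccountName("my-proxy", proxySpec{ServiceAccount: &existing}))
	fmt.Println(serviceAccountName("my-proxy", proxySpec{}))
}
```

Returning the boolean alongside the name lets the caller skip creating a ServiceAccount when the user has supplied one.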
5 changes: 5 additions & 0 deletions cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go
@@ -34,6 +34,11 @@ type VirtualMCPServerSpec struct {
	// +optional
	ServiceType string `json:"serviceType,omitempty"`

	// ServiceAccount is the name of an existing service account for the Virtual MCP server to use.
	// If not specified, a ServiceAccount is created automatically and used by the Virtual MCP server.
	// +optional
	ServiceAccount *string `json:"serviceAccount,omitempty"`

	// PodTemplateSpec defines the pod template to use for the Virtual MCP server
	// This allows for customizing the pod configuration beyond what is provided by the other fields.
	// Note that to modify the specific container the Virtual MCP server runs in, you must specify