Add embedding engine #3280
Open. ptelang wants to merge 44 commits into jerm/2026-01-13-optimizer-in-vmcp from add-embedding-engine
Commits
217329a Add MCPEmbedding CRD for embedding model deployment in operator (ptelang)
1d91025 Rename MCPEmbedding crd as EmbeddingServer (ptelang)
f100ffd Updated image and model names (ptelang)
3daccec Remove unnecessary GroupRef from EmbeddingServers crd (ptelang)
7279a2d Fixed reconciliation loop issue causing no service creation (ptelang)
fec2932 Rename examples/operator/embeddings to examples/opeartor/embedding-se… (ptelang)
00ed558 Updated embedding server example yamls (ptelang)
c529656 Bump toolhive operator version and fix linting issues (ptelang)
6d2ec66 Added e2e tests and fixed a bug (ptelang)
5d0efce Convert EmbeddingServer to use StatefulSets and add HuggingFace token… (ptelang)
73f74a7 Fix linting issues (ptelang)
b40b3e5 Update Helm chart documentation (ptelang)
aef5d8c Batch all EmbeddingServer status updates to a single call to prevent … (ptelang)
5b0064a Fix README files (ptelang)
84f5d67 Updated CRD api docs (ptelang)
ea0c4f6 Fixed ensureStatefulSet and ensureService functions to prevent early … (ptelang)
989cfd7 Bump toolhive-operator-crds chart version to 0.0.99 (ptelang)
e4978ab Added toolhive-test-ns-1 and toolhive-test-ns-2 namespaces to test co… (ptelang)
d0499bb Use smallest supported embedding model for e2e tests (ptelang)
931ad7c Modify embeddingserver e2e tests to support slow model file downloads (ptelang)
d32eb3f add envtest for EmbeddingServer (jerm-dro)
62a039b add tests that demonstrate gaps (jerm-dro)
05e1f4f Fix bugs in the tests (ptelang)
317a789 Add sleep before checking PVC status in embeddingserver e2e test (ptelang)
0dfb7e6 Update image location for huggingface inference engine (ptelang)
8ff356b Addressed TODOs in the embedding-server integration tests (ptelang)
e1b679c Add SPDX license header to embedding-server files (ptelang)
113b981 Fixed a linting issue by refactoring a high cyclomatic complexity fun… (ptelang)
9d2cc02 Merge branch 'main' into add-embedding-engine (ptelang)
60f052e Merge branch 'main' into add-embedding-engine (ptelang)
47f3623 Bump toolhive-operator-crds chart version (ptelang)
5a8e464 Update all places from deployment to statefulset in ref to embeddings… (ptelang)
de85d9d Remove the unnecessary updateStatefulSetWithRetry function (ptelang)
56d4f9b Fix embedding server statefulset update detection to support sidecar … (ptelang)
9a5d19d Refactored statefulSetNeedsUpdate function in embedding server contro… (ptelang)
e558afd Removed left-over TODO comment (ptelang)
941537f Replaced conditional branches with an immediately-invoked anonymous f… (ptelang)
79ae443 Removed unnecessary README.md files from test scenarios (ptelang)
a7cde8a Add header forward middleware for remote MCP servers (#3423) (jhrozek)
2d8da5d Add E2E tests for group endpoints (#3402) (dmjb)
5429aa0 authserver DCR hardening: Add grant_types and response_types allowlis… (jhrozek)
f802358 Refactor RBAC management to eliminate code duplication (#3368) (yrobla)
b7af76f Add token endpoint handler (#3408) (jhrozek)
ff03438 Merge branch 'main' into add-embedding-engine (ptelang)
@@ -0,0 +1,272 @@

```go
// SPDX-License-Identifier: Apache-2.0

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// Condition types for EmbeddingServer (reuses common conditions from MCPServer)
// ConditionImageValidated and ConditionPodTemplateValid are shared with MCPServer

const (
	// ConditionModelReady indicates whether the embedding model is downloaded and ready
	ConditionModelReady = "ModelReady"

	// ConditionVolumeReady indicates whether the PVC for model caching is ready
	ConditionVolumeReady = "VolumeReady"
)

// Condition reasons for EmbeddingServer
// Image validation and PodTemplate reasons are shared with MCPServer

const (
	// ConditionReasonModelDownloading indicates the model is being downloaded
	ConditionReasonModelDownloading = "ModelDownloading"
	// ConditionReasonModelReady indicates the model is downloaded and ready
	ConditionReasonModelReady = "ModelReady"
	// ConditionReasonModelFailed indicates the model download or initialization failed
	ConditionReasonModelFailed = "ModelFailed"

	// ConditionReasonVolumeCreating indicates the PVC is being created
	ConditionReasonVolumeCreating = "VolumeCreating"
	// ConditionReasonVolumeReady indicates the PVC is ready
	ConditionReasonVolumeReady = "VolumeReady"
	// ConditionReasonVolumeFailed indicates the PVC creation failed
	ConditionReasonVolumeFailed = "VolumeFailed"
)

// EmbeddingServerSpec defines the desired state of EmbeddingServer
type EmbeddingServerSpec struct {
	// Model is the HuggingFace embedding model to use (e.g., "sentence-transformers/all-MiniLM-L6-v2")
	// +kubebuilder:validation:Required
	Model string `json:"model"`

	// HFTokenSecretRef is a reference to a Kubernetes Secret containing the huggingface token.
	// If provided, the secret value will be provided to the embedding server for authentication with huggingface.
	// +optional
	HFTokenSecretRef *SecretKeyRef `json:"hfTokenSecretRef,omitempty"`

	// Image is the container image for huggingface-embedding-inference
	// +kubebuilder:validation:Required
	// +kubebuilder:default="ghcr.io/huggingface/text-embeddings-inference:latest"
	Image string `json:"image,omitempty"`

	// ImagePullPolicy defines the pull policy for the container image
	// +kubebuilder:validation:Enum=Always;Never;IfNotPresent
	// +kubebuilder:default="IfNotPresent"
	// +optional
	ImagePullPolicy string `json:"imagePullPolicy,omitempty"`

	// Port is the port to expose the embedding service on
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=65535
	// +kubebuilder:default=8080
	Port int32 `json:"port,omitempty"`

	// Args are additional arguments to pass to the embedding inference server
	// +optional
	Args []string `json:"args,omitempty"`

	// Env are environment variables to set in the container
	// +optional
	Env []EnvVar `json:"env,omitempty"`

	// Resources defines compute resources for the embedding server
	// +optional
	Resources ResourceRequirements `json:"resources,omitempty"`

	// ModelCache configures persistent storage for downloaded models
	// When enabled, models are cached in a PVC and reused across pod restarts
	// +optional
	ModelCache *ModelCacheConfig `json:"modelCache,omitempty"`

	// PodTemplateSpec allows customizing the pod (node selection, tolerations, etc.)
	// This field accepts a PodTemplateSpec object as JSON/YAML.
	// Note that to modify the specific container the embedding server runs in, you must specify
	// the 'embedding' container name in the PodTemplateSpec.
	// +optional
	// +kubebuilder:pruning:PreserveUnknownFields
	// +kubebuilder:validation:Type=object
	PodTemplateSpec *runtime.RawExtension `json:"podTemplateSpec,omitempty"`

	// ResourceOverrides allows overriding annotations and labels for resources created by the operator
	// +optional
	ResourceOverrides *EmbeddingResourceOverrides `json:"resourceOverrides,omitempty"`

	// Replicas is the number of embedding server replicas to run
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:default=1
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`
}

// ModelCacheConfig configures persistent storage for model caching
type ModelCacheConfig struct {
	// Enabled controls whether model caching is enabled
	// +kubebuilder:default=true
	// +optional
	Enabled bool `json:"enabled,omitempty"`

	// StorageClassName is the storage class to use for the PVC
	// If not specified, uses the cluster's default storage class
	// +optional
	StorageClassName *string `json:"storageClassName,omitempty"`

	// Size is the size of the PVC for model caching (e.g., "10Gi")
	// +kubebuilder:default="10Gi"
	// +optional
	Size string `json:"size,omitempty"`

	// AccessMode is the access mode for the PVC
	// +kubebuilder:default="ReadWriteOnce"
	// +kubebuilder:validation:Enum=ReadWriteOnce;ReadWriteMany;ReadOnlyMany
	// +optional
	AccessMode string `json:"accessMode,omitempty"`
}

// EmbeddingResourceOverrides defines overrides for annotations and labels on created resources
type EmbeddingResourceOverrides struct {
	// StatefulSet defines overrides for the StatefulSet resource
	// +optional
	StatefulSet *EmbeddingStatefulSetOverrides `json:"statefulSet,omitempty"`

	// Service defines overrides for the Service resource
	// +optional
	Service *ResourceMetadataOverrides `json:"service,omitempty"`

	// PersistentVolumeClaim defines overrides for the PVC resource
	// +optional
	PersistentVolumeClaim *ResourceMetadataOverrides `json:"persistentVolumeClaim,omitempty"`
}

// EmbeddingStatefulSetOverrides defines overrides specific to the embedding statefulset
type EmbeddingStatefulSetOverrides struct {
	// ResourceMetadataOverrides is embedded to inherit annotations and labels fields
	ResourceMetadataOverrides `json:",inline"` // nolint:revive

	// PodTemplateMetadataOverrides defines metadata overrides for the pod template
	// +optional
	PodTemplateMetadataOverrides *ResourceMetadataOverrides `json:"podTemplateMetadataOverrides,omitempty"`
}

// EmbeddingServerStatus defines the observed state of EmbeddingServer
type EmbeddingServerStatus struct {
	// Conditions represent the latest available observations of the EmbeddingServer's state
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// Phase is the current phase of the EmbeddingServer
	// +optional
	Phase EmbeddingServerPhase `json:"phase,omitempty"`

	// Message provides additional information about the current phase
	// +optional
	Message string `json:"message,omitempty"`

	// URL is the URL where the embedding service can be accessed
	// +optional
	URL string `json:"url,omitempty"`

	// ReadyReplicas is the number of ready replicas
	// +optional
	ReadyReplicas int32 `json:"readyReplicas,omitempty"`

	// ObservedGeneration reflects the generation most recently observed by the controller
	// +optional
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
}

// EmbeddingServerPhase is the phase of the EmbeddingServer
// +kubebuilder:validation:Enum=Pending;Downloading;Running;Failed;Terminating
type EmbeddingServerPhase string

const (
	// EmbeddingServerPhasePending means the EmbeddingServer is being created
	EmbeddingServerPhasePending EmbeddingServerPhase = "Pending"

	// EmbeddingServerPhaseDownloading means the model is being downloaded
	EmbeddingServerPhaseDownloading EmbeddingServerPhase = "Downloading"

	// EmbeddingServerPhaseRunning means the EmbeddingServer is running and ready
	EmbeddingServerPhaseRunning EmbeddingServerPhase = "Running"

	// EmbeddingServerPhaseFailed means the EmbeddingServer failed to start
	EmbeddingServerPhaseFailed EmbeddingServerPhase = "Failed"

	// EmbeddingServerPhaseTerminating means the EmbeddingServer is being deleted
	EmbeddingServerPhaseTerminating EmbeddingServerPhase = "Terminating"
)

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="Status",type="string",JSONPath=".status.phase"
//+kubebuilder:printcolumn:name="Model",type="string",JSONPath=".spec.model"
//+kubebuilder:printcolumn:name="Ready",type="integer",JSONPath=".status.readyReplicas"
//+kubebuilder:printcolumn:name="URL",type="string",JSONPath=".status.url"
//+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"

// EmbeddingServer is the Schema for the embeddingservers API
type EmbeddingServer struct {
	metav1.TypeMeta   `json:",inline"` // nolint:revive
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   EmbeddingServerSpec   `json:"spec,omitempty"`
	Status EmbeddingServerStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// EmbeddingServerList contains a list of EmbeddingServer
type EmbeddingServerList struct {
	metav1.TypeMeta `json:",inline"` // nolint:revive
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []EmbeddingServer `json:"items"`
}

// GetName returns the name of the EmbeddingServer
func (e *EmbeddingServer) GetName() string {
	return e.Name
}

// GetNamespace returns the namespace of the EmbeddingServer
func (e *EmbeddingServer) GetNamespace() string {
	return e.Namespace
}

// GetPort returns the port of the EmbeddingServer
func (e *EmbeddingServer) GetPort() int32 {
	if e.Spec.Port > 0 {
		return e.Spec.Port
	}
	return 8080
}

// GetReplicas returns the number of replicas for the EmbeddingServer
func (e *EmbeddingServer) GetReplicas() int32 {
	if e.Spec.Replicas != nil {
		return *e.Spec.Replicas
	}
	return 1
}

// IsModelCacheEnabled returns whether model caching is enabled
func (e *EmbeddingServer) IsModelCacheEnabled() bool {
	if e.Spec.ModelCache == nil {
		return false
	}
	return e.Spec.ModelCache.Enabled
}

// GetImagePullPolicy returns the image pull policy for the EmbeddingServer
func (e *EmbeddingServer) GetImagePullPolicy() string {
	if e.Spec.ImagePullPolicy != "" {
		return e.Spec.ImagePullPolicy
	}
	return "IfNotPresent"
}

func init() {
	SchemeBuilder.Register(&EmbeddingServer{}, &EmbeddingServerList{})
}
```
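The accessor methods at the end of this file fall back to fixed values when the corresponding spec field is unset. The following is a minimal, dependency-free sketch of that fallback behavior. It assumes simplified stand-in types that keep only the fields the accessors read, so the struct definitions here are illustrative and are not the types from the PR (which also carry Kubernetes metadata and JSON tags); the method bodies mirror the diff.

```go
package main

import "fmt"

// ModelCacheConfig is a simplified stand-in (assumption: only Enabled is kept).
type ModelCacheConfig struct {
	Enabled bool
}

// EmbeddingServerSpec is a simplified stand-in that holds only the fields the
// accessors read; the spec in the diff has many more fields.
type EmbeddingServerSpec struct {
	ImagePullPolicy string
	Port            int32
	ModelCache      *ModelCacheConfig
	Replicas        *int32
}

// EmbeddingServer is a simplified stand-in without TypeMeta/ObjectMeta.
type EmbeddingServer struct {
	Spec EmbeddingServerSpec
}

// GetPort returns the configured port, or 8080 when the field is zero.
func (e *EmbeddingServer) GetPort() int32 {
	if e.Spec.Port > 0 {
		return e.Spec.Port
	}
	return 8080
}

// GetReplicas returns the configured replica count, or 1 when the pointer is nil.
func (e *EmbeddingServer) GetReplicas() int32 {
	if e.Spec.Replicas != nil {
		return *e.Spec.Replicas
	}
	return 1
}

// IsModelCacheEnabled returns false when no ModelCache block is present.
func (e *EmbeddingServer) IsModelCacheEnabled() bool {
	if e.Spec.ModelCache == nil {
		return false
	}
	return e.Spec.ModelCache.Enabled
}

// GetImagePullPolicy returns the configured policy, or "IfNotPresent" when empty.
func (e *EmbeddingServer) GetImagePullPolicy() string {
	if e.Spec.ImagePullPolicy != "" {
		return e.Spec.ImagePullPolicy
	}
	return "IfNotPresent"
}

func main() {
	// A zero-value object exercises every fallback path.
	unset := &EmbeddingServer{}
	fmt.Println(unset.GetPort(), unset.GetReplicas(), unset.GetImagePullPolicy(), unset.IsModelCacheEnabled())
	// prints: 8080 1 IfNotPresent false

	// Explicit values are returned unchanged.
	replicas := int32(3)
	custom := &EmbeddingServer{Spec: EmbeddingServerSpec{
		ImagePullPolicy: "Always",
		Port:            9090,
		ModelCache:      &ModelCacheConfig{Enabled: true},
		Replicas:        &replicas,
	}}
	fmt.Println(custom.GetPort(), custom.GetReplicas(), custom.GetImagePullPolicy(), custom.IsModelCacheEnabled())
	// prints: 9090 3 Always true
}
```

In this sketch, as in the diff, IsModelCacheEnabled returns false when ModelCache is nil. The `+kubebuilder:default=true` marker on `enabled` is a schema-level default and is not applied by the Go accessor itself.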