
RFC: First-class vMCP support in the local CLI experience#59

Draft
JAORMX wants to merge 2 commits into main from jaosorior/vmcp-local-experience

Conversation

Contributor

@JAORMX commented Mar 24, 2026

Summary

  • Proposes integrating vMCP into the main thv CLI as a thv vmcp subcommand (serve, validate, init)
  • Brings the optimizer to the local experience with three tiers: FTS5-only (--optimizer), managed TEI container (--optimizer-embedding), and full config control
  • Formalizes the library embedding pattern used by brood-box as an officially supported integration path
  • Preserves the standalone vmcp binary for K8s deployments and backwards compatibility

Key Design Decisions

  • Zero-config quickstart: thv vmcp serve --group default works without a config file
  • Shared logic in pkg/vmcp/cli/: Both thv vmcp and standalone vmcp delegate to the same package
  • Managed TEI lifecycle: --optimizer-embedding auto-manages a TEI container with named volume for model caching, health polling, and graceful fallback to FTS5-only
  • Library stability table: Packages rated Stable/Beta/Internal to guide downstream consumers

Related RFCs

  • THV-0008 (Virtual MCP Server)
  • THV-0014 (K8s-Aware vMCP)
  • THV-0022 (Optimizer Migration)
  • THV-0034 (Long-Running Local Server)

🤖 Generated with Claude Code

Add RFC proposing integration of the Virtual MCP Server (vMCP) into the
main thv CLI as a `thv vmcp` subcommand, bringing local optimizer support
with managed TEI container lifecycle, and formalizing the library embedding
pattern used by brood-box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX marked this pull request as draft March 24, 2026 17:44
Contributor

@jerm-dro left a comment


LGTM! The fact that this already works for brood-box indicates this is very feasible.

- **Regression tests**: Verify that the standalone `vmcp serve` command still works identically after the refactor.
- **Security tests**: Verify that quick mode binds to `127.0.0.1` only, that strict YAML parsing rejects unknown fields, and that HMAC session binding is enforced when configured.

## Documentation
Contributor


The docs website will need to be updated. We can also then remove the legacy mcp-optimizer.


- Add a `thv vmcp` subcommand with `serve`, `validate`, and `init` sub-commands that integrate vMCP into the main CLI
- Support a zero-config quickstart: `thv vmcp serve --group <name>` should work without a config file for simple aggregation
- Bring the optimizer to the local experience with managed embedding service lifecycle (`--optimizer` flag auto-manages a TEI container)
Contributor


You could more simply state the goal is to deprecate and later remove the legacy optimizer.

- **Examples**: `examples/vmcp-local-quickstart/` with a minimal setup, `examples/vmcp-advanced/` with auth, composite tools, and telemetry
- **Existing docs updates**: Update `docs/arch/10-virtual-mcp-architecture.md` to reference the new CLI integration and library embedding path

## Open Questions
Contributor


Are there sharp edges given that vMCP has typically been run in K8s? Or can you easily replace or disable all the K8s dependencies?

Contributor Author


It's unclear so far. I haven't personally stumbled upon any issues, but we need to test and get more data.

Member

@aponcedeleonch left a comment


Hey! Really solid RFC, the whole local experience story makes a ton of sense. Left some inline comments — mostly questions and suggestions. Nothing heavy, just thinking out loud. 🤙

```yaml
# tools:
# - workload: "github"
# filter: ["create_pr", "list_issues"]
```

Member


Hey, quick question — do we actually need this commented-out backends section? What would be the use case for static overrides here?

I'm a bit worried this could cause unexpected bugs if someone uncomments these and they get out of sync with the actual running backends. Since backends are auto-discovered at runtime from the group, I'd suggest we just tell users to modify their backends directly (via thv run flags or group config) if they need changes to their MCP servers, rather than us providing an override path from the vMCP config.

Keeping a single source of truth for backend definitions feels cleaner and less error-prone, no?

- **Health check**: Poll `GET /health` with backoff until the TEI server reports ready. TEI must download and load the model on first start, which can take 30-60 seconds.
- **Port binding**: Default `127.0.0.1:8384`. Chosen to avoid conflicting with the vMCP port (4483) or common dev ports.
- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or becomes unhealthy, vMCP falls back to FTS5-only mode with a warning.
- **Idempotent start**: If a `thv-embedding-<group>` container is already running (e.g., from a previous invocation), reuse it rather than creating a new one.
Member


The TEI container image should also be configurable via a flag, something like --embedding-image. This way users can use images that better fit their architecture and potentially take advantage of hardware acceleration (CUDA GPUs, etc.).

For example:

```shell
thv vmcp serve --group default --optimizer-embedding \
  --embedding-image ghcr.io/huggingface/text-embeddings-inference:turing-latest
```

This also ties into open question #7 about GPU support — making the image configurable is the simplest way to support it without us having to auto-detect GPU availability.


Users with advanced requirements (custom embedding endpoint, GPU-accelerated TEI, external service) can configure the optimizer directly in the config file:

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call noting the platform considerations here. For context: Docker Desktop on Apple Silicon does handle amd64-only images automatically via Rosetta 2 emulation — you don't need to specify --platform linux/amd64. The overhead is roughly 5-15% for CPU-bound workloads, which is acceptable for TEI.

So docker run ghcr.io/huggingface/text-embeddings-inference:cpu-latest will work on Mac without extra flags, but under emulation. Worth mentioning this explicitly so users know what to expect performance-wise.

If we go with the --embedding-image flag I suggested above, architecture becomes the user's problem to solve (pick the right image), which is simpler than us trying to detect arch and pick the right variant.

- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or becomes unhealthy, vMCP falls back to FTS5-only mode with a warning.
- **Idempotent start**: If a `thv-embedding-<group>` container is already running (e.g., from a previous invocation), reuse it rather than creating a new one.
- **Platform considerations**: The TEI CPU image is amd64. On ARM64 hosts (Apple Silicon), the container runs under emulation. A future enhancement could detect architecture and select an appropriate image variant.

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about coupling the container name to the group with thv-embedding-<group>. This ties the embedding server tightly to a single vMCP/optimizer instance.

We could potentially re-use the same embedding server across multiple vMCPs — the embeddings for tool names/descriptions don't depend on which vMCP is asking. Maybe something like thv-embedding (shared) or thv-embedding-<model-hash> (shared per model) would give us more flexibility?

That way if someone runs two vMCPs with different groups but the same embedding model, they share one TEI container instead of spinning up two.
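The `thv-embedding-<model-hash>` idea could be derived deterministically from the model reference, so two vMCPs configured with the same model compute the same container name and reuse one TEI instance. A minimal sketch — `containerName` is a hypothetical helper, not an existing toolhive function:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// containerName derives a stable, shareable container name from the
// embedding model reference. Any vMCP using the same model gets the same
// name, regardless of which group it serves, so they share one TEI container.
func containerName(model string) string {
	sum := sha256.Sum256([]byte(model))
	// A short prefix of the hash keeps the name readable while making
	// collisions between different models vanishingly unlikely.
	return fmt.Sprintf("thv-embedding-%x", sum[:6])
}

func main() {
	fmt.Println(containerName("BAAI/bge-small-en-v1.5"))
	fmt.Println(containerName("BAAI/bge-small-en-v1.5")) // same model, same name
	fmt.Println(containerName("some-other-model"))       // different model, different name
}
```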

- **Platform considerations**: The TEI CPU image is amd64. On ARM64 hosts (Apple Silicon), the container runs under emulation. A future enhancement could detect architecture and select an appropriate image variant.

**Tier 3: Full config control**

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we start the TEI container on a random available port instead of defaulting to a specific one? If a user runs multiple vMCPs (or has something else on 8384), we'd hit a port conflict.

A random port (reported back to the user in logs) avoids this entirely. Did you have a specific reason for wanting a fixed default port, like making the URL predictable for the config file case?


**Tier 3: Full config control**

Users with advanced requirements (custom embedding endpoint, GPU-accelerated TEI, external service) can configure the optimizer directly in the config file:
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean toward being stricter here — if the TEI container fails to start or become healthy when --optimizer-embedding was explicitly requested, I think we probably shouldn't start the vMCP at all rather than falling back to FTS5-only.

Here's my concern: if we silently degrade, the performance of find_tool will be noticeably worse, and the user will think the optimizer isn't working well when it's really a config/environment issue (Docker not running, port conflict, etc.). Failing fast with a clear error message gives the user something actionable.

If they just want FTS5, they can use --optimizer explicitly. The --optimizer-embedding flag is a clear signal that they want semantic search.

```go
type EmbeddingServiceManager struct { ... }

// Start launches (or reuses) the TEI container and waits for readiness.
// Returns the embedding service URL.
```
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I've been thinking — we probably need a way to know whether toolhive is in charge of the embedding container's lifecycle. In K8s this isn't an issue since the embedding server is always operator-managed, but locally the user might bring their own TEI.

Initially I was thinking a config flag like toolhive-managed: true in the optimizer section, BUT — we already have pkg/labels/ for exactly this pattern. We could just slap a label like io.toolhive.managed=true on the container when we create it, then query by label to check ownership. This is the same pattern docker-compose, traefik, and others use. It's more robust (survives restarts, no external state to get out of sync) and doesn't require a config flag.

What do you think? Labels over config flag?
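The label-over-config-flag idea boils down to tagging containers at creation time and checking the tag before touching their lifecycle. A minimal sketch, where `Container` stands in for whatever the real container-runtime API returns and `io.toolhive.managed` is the label name proposed in this comment:

```go
package main

import "fmt"

const managedLabel = "io.toolhive.managed"

// Container is a stand-in for the runtime's container description.
type Container struct {
	Name   string
	Labels map[string]string
}

// ownedByToolhive reports whether thv created this container and is
// therefore responsible for stopping it. A user-supplied TEI container
// simply lacks the label, so thv leaves its lifecycle alone. The label
// survives restarts, so no external state can get out of sync.
func ownedByToolhive(c Container) bool {
	return c.Labels[managedLabel] == "true"
}

func main() {
	ours := Container{
		Name:   "thv-embedding-default",
		Labels: map[string]string{managedLabel: "true"}, // set by thv at create time
	}
	theirs := Container{Name: "my-own-tei", Labels: map[string]string{}}

	fmt.Println(ownedByToolhive(ours))   // true: thv stops it on shutdown
	fmt.Println(ownedByToolhive(theirs)) // false: bring-your-own TEI, hands off
}
```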


### Data Security

- vMCP proxies MCP protocol messages between clients and backends. It does not persist tool call results or resource contents.
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're also thinking about changing thv serve to use a Unix socket, should we consider running vMCP as a socket too for the local case?

For local single-user scenarios, a Unix socket sidesteps the whole auth/port-binding discussion entirely — it's secured by filesystem permissions, no port conflicts, no need for OIDC or anonymous auth decisions. Could be a --transport socket flag or similar.

Just food for thought for the security model here, not necessarily blocking for this RFC.



3 participants