
RFC: First-class vMCP support in the local CLI experience#59

Draft
JAORMX wants to merge 2 commits into main from jaosorior/vmcp-local-experience

Conversation

Contributor

@JAORMX commented Mar 24, 2026

Summary

  • Proposes integrating vMCP into the main thv CLI as a thv vmcp subcommand (serve, validate, init)
  • Brings the optimizer to the local experience with three tiers: FTS5-only (--optimizer), managed TEI container (--optimizer-embedding), and full config control
  • Formalizes the library embedding pattern used by brood-box as an officially supported integration path
  • Preserves the standalone vmcp binary for K8s deployments and backwards compatibility

Key Design Decisions

  • Zero-config quickstart: thv vmcp serve --group default works without a config file
  • Shared logic in pkg/vmcp/cli/: Both thv vmcp and standalone vmcp delegate to the same package
  • Managed TEI lifecycle: --optimizer-embedding auto-manages a TEI container with named volume for model caching, health polling, and graceful fallback to FTS5-only
  • Library stability table: Packages rated Stable/Beta/Internal to guide downstream consumers

Related RFCs

  • THV-0008 (Virtual MCP Server)
  • THV-0014 (K8s-Aware vMCP)
  • THV-0022 (Optimizer Migration)
  • THV-0034 (Long-Running Local Server)

🤖 Generated with Claude Code

Add RFC proposing integration of the Virtual MCP Server (vMCP) into the
main thv CLI as a `thv vmcp` subcommand, bringing local optimizer support
with managed TEI container lifecycle, and formalizing the library embedding
pattern used by brood-box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX marked this pull request as draft March 24, 2026 17:44
Contributor

@jerm-dro left a comment


LGTM! The fact that this already works for brood-box indicates this is very feasible.

- **Regression tests**: Verify that the standalone `vmcp serve` command still works identically after the refactor.
- **Security tests**: Verify that quick mode binds to `127.0.0.1` only, that strict YAML parsing rejects unknown fields, and that HMAC session binding is enforced when configured.

## Documentation
Contributor


The docs website will need to be updated. We can also then remove the legacy mcp-optimizer.


- Add a `thv vmcp` subcommand with `serve`, `validate`, and `init` sub-commands that integrate vMCP into the main CLI
- Support a zero-config quickstart: `thv vmcp serve --group <name>` should work without a config file for simple aggregation
- Bring the optimizer to the local experience with managed embedding service lifecycle (`--optimizer` flag auto-manages a TEI container)
Contributor


You could more simply state the goal is to deprecate and later remove the legacy optimizer.

- **Examples**: `examples/vmcp-local-quickstart/` with a minimal setup, `examples/vmcp-advanced/` with auth, composite tools, and telemetry
- **Existing docs updates**: Update `docs/arch/10-virtual-mcp-architecture.md` to reference the new CLI integration and library embedding path

## Open Questions
Contributor


Are there sharp edges given that vMCP has typically been run in K8s? Or can you easily replace or disable all the K8s dependencies?

Contributor Author


It's unclear so far. I haven't personally stumbled upon any issues, but we need to test and get more data.

Member

@aponcedeleonch left a comment


Hey! Really solid RFC, the whole local experience story makes a ton of sense. Left some inline comments — mostly questions and suggestions. Nothing heavy, just thinking out loud. 🤙

```yaml
# tools:
# - workload: "github"
# filter: ["create_pr", "list_issues"]
```

Member


Hey, quick question — do we actually need this commented-out backends section? What would be the use case for static overrides here?

I'm a bit worried this could cause unexpected bugs if someone uncomments these and they get out of sync with the actual running backends. Since backends are auto-discovered at runtime from the group, I'd suggest we just tell users to modify their backends directly (via thv run flags or group config) if they need changes to their MCP servers, rather than us providing an override path from the vMCP config.

Keeping a single source of truth for backend definitions feels cleaner and less error-prone, no?

- **Health check**: Poll `GET /health` with backoff until the TEI server reports ready. TEI must download and load the model on first start, which can take 30-60 seconds.
- **Port binding**: Default `127.0.0.1:8384`. Chosen to avoid conflicting with the vMCP port (4483) or common dev ports.
- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or becomes unhealthy, vMCP falls back to FTS5-only mode with a warning.
- **Idempotent start**: If a `thv-embedding-<group>` container is already running (e.g., from a previous invocation), reuse it rather than creating a new one.
Member


The TEI container image should also be configurable via a flag, something like --embedding-image. This way users can use images that better fit their architecture and potentially take advantage of hardware acceleration (CUDA GPUs, etc.).

For example:

```shell
thv vmcp serve --group default --optimizer-embedding \
  --embedding-image ghcr.io/huggingface/text-embeddings-inference:turing-latest
```

This also ties into open question #7 about GPU support — making the image configurable is the simplest way to support it without us having to auto-detect GPU availability.


Users with advanced requirements (custom embedding endpoint, GPU-accelerated TEI, external service) can configure the optimizer directly in the config file:

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call noting the platform considerations here. For context: Docker Desktop on Apple Silicon does handle amd64-only images automatically via Rosetta 2 emulation — you don't need to specify --platform linux/amd64. The overhead is roughly 5-15% for CPU-bound workloads, which is acceptable for TEI.

So docker run ghcr.io/huggingface/text-embeddings-inference:cpu-latest will work on Mac without extra flags, but under emulation. Worth mentioning this explicitly so users know what to expect performance-wise.

If we go with the --embedding-image flag I suggested above, architecture becomes the user's problem to solve (pick the right image), which is simpler than us trying to detect arch and pick the right variant.

- **Lifecycle coupling**: The TEI container is started before the vMCP server and stopped after it shuts down. If the TEI container fails to start or becomes unhealthy, vMCP falls back to FTS5-only mode with a warning.
- **Idempotent start**: If a `thv-embedding-<group>` container is already running (e.g., from a previous invocation), reuse it rather than creating a new one.
- **Platform considerations**: The TEI CPU image is amd64. On ARM64 hosts (Apple Silicon), the container runs under emulation. A future enhancement could detect architecture and select an appropriate image variant.

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about coupling the container name to the group with thv-embedding-<group>. This ties the embedding server tightly to a single vMCP/optimizer instance.

We could potentially re-use the same embedding server across multiple vMCPs — the embeddings for tool names/descriptions don't depend on which vMCP is asking. Maybe something like thv-embedding (shared) or thv-embedding-<model-hash> (shared per model) would give us more flexibility?

That way if someone runs two vMCPs with different groups but the same embedding model, they share one TEI container instead of spinning up two.
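The `thv-embedding-<model-hash>` idea could be derived deterministically from the model reference, so two vMCPs configured with the same model compute the same container name and reuse one TEI instance. A minimal sketch — `containerName` is a hypothetical helper, not an existing toolhive function:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// containerName derives a stable, shareable container name from the
// embedding model reference. Any vMCP using the same model gets the same
// name, regardless of which group it serves, so they share one TEI container.
func containerName(model string) string {
	sum := sha256.Sum256([]byte(model))
	// A short prefix of the hash keeps the name readable while making
	// collisions between different models vanishingly unlikely.
	return fmt.Sprintf("thv-embedding-%x", sum[:6])
}

func main() {
	fmt.Println(containerName("BAAI/bge-small-en-v1.5"))
	fmt.Println(containerName("BAAI/bge-small-en-v1.5")) // same model, same name
	fmt.Println(containerName("some-other-model"))       // different model, different name
}
```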

- **Platform considerations**: The TEI CPU image is amd64. On ARM64 hosts (Apple Silicon), the container runs under emulation. A future enhancement could detect architecture and select an appropriate image variant.

**Tier 3: Full config control**

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we start the TEI container on a random available port instead of defaulting to a specific one? If a user runs multiple vMCPs (or has something else on 8384), we'd hit a port conflict.

A random port (reported back to the user in logs) avoids this entirely. Did you have a specific reason for wanting a fixed default port, like making the URL predictable for the config file case?


**Tier 3: Full config control**

Users with advanced requirements (custom embedding endpoint, GPU-accelerated TEI, external service) can configure the optimizer directly in the config file:
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean toward being stricter here — if the TEI container fails to start or become healthy when --optimizer-embedding was explicitly requested, I think we probably shouldn't start the vMCP at all rather than falling back to FTS5-only.

Here's my concern: if we silently degrade, the performance of find_tool will be noticeably worse, and the user will think the optimizer isn't working well when it's really a config/environment issue (Docker not running, port conflict, etc.). Failing fast with a clear error message gives the user something actionable.

If they just want FTS5, they can use --optimizer explicitly. The --optimizer-embedding flag is a clear signal that they want semantic search.

```go
type EmbeddingServiceManager struct { ... }

// Start launches (or reuses) the TEI container and waits for readiness.
// Returns the embedding service URL.
```
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I've been thinking — we probably need a way to know whether toolhive is in charge of the embedding container's lifecycle. In K8s this isn't an issue since the embedding server is always operator-managed, but locally the user might bring their own TEI.

Initially I was thinking a config flag like toolhive-managed: true in the optimizer section, BUT — we already have pkg/labels/ for exactly this pattern. We could just slap a label like io.toolhive.managed=true on the container when we create it, then query by label to check ownership. This is the same pattern docker-compose, traefik, and others use. It's more robust (survives restarts, no external state to get out of sync) and doesn't require a config flag.

What do you think? Labels over config flag?
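The label-over-config-flag idea boils down to tagging containers at creation time and checking the tag before touching their lifecycle. A minimal sketch, where `Container` stands in for whatever the real container-runtime API returns and `io.toolhive.managed` is the label name proposed in this comment:

```go
package main

import "fmt"

const managedLabel = "io.toolhive.managed"

// Container is a stand-in for the runtime's container description.
type Container struct {
	Name   string
	Labels map[string]string
}

// ownedByToolhive reports whether thv created this container and is
// therefore responsible for stopping it. A user-supplied TEI container
// simply lacks the label, so thv leaves its lifecycle alone. The label
// survives restarts, so no external state can get out of sync.
func ownedByToolhive(c Container) bool {
	return c.Labels[managedLabel] == "true"
}

func main() {
	ours := Container{
		Name:   "thv-embedding-default",
		Labels: map[string]string{managedLabel: "true"}, // set by thv at create time
	}
	theirs := Container{Name: "my-own-tei", Labels: map[string]string{}}

	fmt.Println(ownedByToolhive(ours))   // true: thv stops it on shutdown
	fmt.Println(ownedByToolhive(theirs)) // false: bring-your-own TEI, hands off
}
```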


### Data Security

- vMCP proxies MCP protocol messages between clients and backends. It does not persist tool call results or resource contents.
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're also thinking about changing thv serve to use a Unix socket, should we consider running vMCP as a socket too for the local case?

For local single-user scenarios, a Unix socket sidesteps the whole auth/port-binding discussion entirely — it's secured by filesystem permissions, no port conflicts, no need for OIDC or anonymous auth decisions. Could be a --transport socket flag or similar.

Just food for thought for the security model here, not necessarily blocking for this RFC.



3 participants