feat: add MCP server example for sandboxed JavaScript execution #35

Open
simongdavies wants to merge 2 commits into main from add-mcp-example

Conversation

@simongdavies
Contributor

Add an MCP (Model Context Protocol) server that exposes an execute_javascript tool, allowing AI agents to run arbitrary JavaScript inside an isolated Hyperlight micro-VM sandbox with strict CPU time limits and automatic snapshot/restore recovery after timeouts.

Includes server implementation, demo scripts (PowerShell and Bash), vitest test suite, and documentation.

@simongdavies added the kind/enhancement (New feature or improvement) label Mar 3, 2026
Add an MCP (Model Context Protocol) server that exposes an
execute_javascript tool, allowing AI agents to run arbitrary JavaScript
inside an isolated Hyperlight micro-VM sandbox with strict CPU time
limits and automatic snapshot/restore recovery after timeouts.

Includes server implementation, demo scripts (PowerShell and Bash),
vitest test suite, and documentation.

Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Adds a new example MCP (Model Context Protocol) server under src/js-host-api/examples/mcp-server that lets MCP clients execute JavaScript inside a Hyperlight sandbox with configurable resource limits, plus demo scripts, documentation, and a Vitest-based integration test suite.

Changes:

  • Introduces an MCP stdio server (execute_javascript) that compiles/runs JS inside a reusable Hyperlight sandbox with CPU + wall-clock timeouts, snapshot/restore recovery, and optional timing/code logs.
  • Adds Vitest config + multiple integration-style test suites covering tool behavior, timeouts/recovery, env-var configurability, and timing log output.
  • Adds end-to-end demo scripts (bash + PowerShell) and a README describing setup and client configuration.

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| src/js-host-api/examples/mcp-server/server.js | MCP server implementation; sandbox lifecycle, limits, logging, and tool registration. |
| src/js-host-api/examples/mcp-server/package.json | Example package definition with MCP SDK, Zod, and Vitest. |
| src/js-host-api/examples/mcp-server/vitest.config.js | Vitest configuration for the example's tests and timeouts. |
| src/js-host-api/examples/mcp-server/tests/mcp-server.test.js | End-to-end MCP protocol/tool integration tests via stdio NDJSON. |
| src/js-host-api/examples/mcp-server/tests/config.test.js | Tests for env-configurable limits, defaults, and stderr warnings. |
| src/js-host-api/examples/mcp-server/tests/timing.test.js | Tests for HYPERLIGHT_TIMING_LOG JSONL output and timing fields. |
| src/js-host-api/examples/mcp-server/tests/prompt-examples.test.js | Large suite validating outputs for "README prompt" examples. |
| src/js-host-api/examples/mcp-server/demo-copilot-cli.sh | Bash demo script to run prompts via Copilot CLI with MCP config. |
| src/js-host-api/examples/mcp-server/demo-copilot-cli.ps1 | PowerShell demo script to run prompts via Copilot CLI with MCP config. |
| src/js-host-api/examples/mcp-server/README.md | End-user documentation for the example server and demos. |
| src/js-host-api/eslint.config.mjs | Adds performance as an allowed global (used by the new server). |

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 6 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 5 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 6 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 3 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 4 comments.


@simongdavies deleted the add-mcp-example branch March 27, 2026 19:03
@simongdavies restored the add-mcp-example branch March 27, 2026 19:32
@simongdavies reopened this Mar 27, 2026
@simongdavies requested a review from Copilot March 27, 2026 22:00

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 4 comments.


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 13 changed files in this pull request and generated 8 comments.


Comment on lines +198 to +202
it('should have all timing values as non-negative integers', async () => {
    const records = readTimingRecords();
    expect(records.length).toBeGreaterThanOrEqual(1);

    const record = records[records.length - 1];

Copilot AI Mar 27, 2026

This test (and several below) assumes a previous test already executed the tool and wrote at least one timing record. That makes the suite order-dependent and can fail when running a single test or if the runner ever shuffles tests. Consider ensuring each test arranges its own tool invocation (or add a beforeEach that performs one call and/or clears the timing log).

Comment on lines +161 to +165
let inside = 0;
const N = 100000;
for (let i = 0; i < N; i++) {
    const x = Math.random();
    const y = Math.random();

Copilot AI Mar 27, 2026

This prompt-implementation uses Math.random() inside the sandbox, which makes the test nondeterministic and potentially flaky (even with wide bounds) and also increases runtime (100k samples). For CI stability, consider swapping in a small deterministic PRNG with a fixed seed (or otherwise removing randomness) so outputs and performance are reproducible.

Suggested change
- let inside = 0;
- const N = 100000;
- for (let i = 0; i < N; i++) {
-     const x = Math.random();
-     const y = Math.random();
+ // Deterministic PRNG (xorshift32) for reproducible tests
+ let seed = 123456789;
+ function rand() {
+     seed ^= seed << 13;
+     seed ^= seed >>> 17;
+     seed ^= seed << 5;
+     return (seed >>> 0) / 0x100000000;
+ }
+ let inside = 0;
+ const N = 100000;
+ for (let i = 0; i < N; i++) {
+     const x = rand();
+     const y = rand();
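
For illustration, the seeded approach can be sketched as a complete Monte Carlo estimate. This is a minimal sketch, not the test code from the PR: `makeRand` and `estimatePi` are hypothetical names, and only the xorshift32 constants and the seed come from the suggestion above.

```javascript
// Hypothetical helper: seeded xorshift32 generator yielding values in [0, 1).
// The seed must be non-zero, otherwise the state stays at zero forever.
function makeRand(seed) {
  let state = seed | 0;
  return function rand() {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    return (state >>> 0) / 0x100000000;
  };
}

// Hypothetical helper: estimate pi by sampling points in the unit square
// and counting those that fall inside the quarter circle.
function estimatePi(samples, seed) {
  const rand = makeRand(seed);
  let inside = 0;
  for (let i = 0; i < samples; i++) {
    const x = rand();
    const y = rand();
    if (x * x + y * y <= 1) inside++;
  }
  return (4 * inside) / samples;
}

const first = estimatePi(100000, 123456789);
const second = estimatePi(100000, 123456789);
console.log(first === second); // same seed, so both runs give the same estimate
```

Because the generator state is local to each `makeRand` call, repeated runs with the same seed are reproducible, which lets a test assert on a fixed value or a tight tolerance instead of wide bounds.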

Comment on lines +891 to +895
// 100 particles with random velocities over 1000 steps will
// always produce bounces — the probability of zero bounces
// is vanishingly small (each particle has ~50% chance of
// bouncing per axis per step).
expect(result.totalBounces).toBeGreaterThan(0);

Copilot AI Mar 27, 2026

expect(result.totalBounces).toBeGreaterThan(0) is probabilistic because initial positions/velocities are random; it can fail (rarely) if no particle crosses a boundary. To avoid flaky CI, make the simulation deterministic (seeded PRNG) or assert only deterministic invariants (e.g., allInBounds, array sizes, numeric types/ranges).

Suggested change
- // 100 particles with random velocities over 1000 steps will
- // always produce bounces — the probability of zero bounces
- // is vanishingly small (each particle has ~50% chance of
- // bouncing per axis per step).
- expect(result.totalBounces).toBeGreaterThan(0);
+ // totalBounces should be a non-negative integer; the exact
+ // value depends on random initial conditions and velocities.
+ expect(result.totalBounces).toBeGreaterThanOrEqual(0);
+ expect(Number.isInteger(result.totalBounces)).toBe(true);

Comment on lines +339 to +345
#   --available-tools  Restrict model to ONLY our MCP tool plus
#                      internal tools the agent needs to function
#                      (task_complete, report_intent). The model
#                      cannot call shell, file write, web fetch,
#                      or any other tool. This is the security
#                      layer — even though --allow-all-tools is
#                      set, only whitelisted tools are visible.

Copilot AI Mar 27, 2026

The comments describe using --available-tools to restrict the model’s visible toolset, but the actual copilot invocation doesn’t include --available-tools. Either add the flag (if still supported) or update the comments, since this currently overstates the security restrictions being applied.

Suggested change
- #   --available-tools  Restrict model to ONLY our MCP tool plus
- #                      internal tools the agent needs to function
- #                      (task_complete, report_intent). The model
- #                      cannot call shell, file write, web fetch,
- #                      or any other tool. This is the security
- #                      layer — even though --allow-all-tools is
- #                      set, only whitelisted tools are visible.
+ #   --available-tools  (Optional) Can be used to restrict the model
+ #                      to ONLY specific tools plus internal tools
+ #                      the agent needs to function (for example,
+ #                      task_complete, report_intent). When set,
+ #                      the model cannot call shell, file write,
+ #                      web fetch, or any other non-whitelisted
+ #                      tool. NOTE: this demo script does NOT
+ #                      currently pass --available-tools; do not
+ #                      assume such a restriction is in effect.

Comment on lines +208 to +212
    type: 'stdio',
    command: 'node',
    args: ['${SERVER_JS}'],
    env,
};

Copilot AI Mar 27, 2026

This embedded node -e snippet interpolates ${SERVER_JS} (and other values) directly into a JavaScript string. Paths containing spaces, quotes, or backslashes can break the generated JS/JSON. Prefer passing values via environment variables/stdin, or JSON-escaping them (e.g., via JSON.stringify) before embedding.
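
As a rough illustration of the JSON-escaping route, the sketch below builds the server entry from plain values and serializes it with `JSON.stringify`, so no path is ever spliced into JavaScript source text. The function name `buildServerEntry` and the sample path are hypothetical. The entry's fields mirror the snippet above, and in the shell script the values would more likely arrive through `process.env` than as literals.

```javascript
// Hypothetical helper: build the stdio server entry from plain values.
// Nothing is interpolated into source code, so quoting cannot break it.
function buildServerEntry(serverJs, env = {}) {
  return {
    type: 'stdio',
    command: 'node',
    args: [serverJs],
    env,
  };
}

// A deliberately awkward path: spaces, a single quote, double quotes, backslashes.
const trickyPath = 'C:\\Users\\O\'Brien "dev"\\my project\\server.js';

const json = JSON.stringify(
  buildServerEntry(trickyPath, { HYPERLIGHT_TIMING_LOG: '/tmp/timing log.jsonl' }),
  null,
  2
);

// Parsing the generated JSON recovers the exact original path.
const roundTripped = JSON.parse(json);
console.log(roundTripped.args[0] === trickyPath);
```

The design point is that escaping is delegated entirely to the JSON serializer, rather than relying on the shell and JavaScript quoting rules happening to agree.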

Comment on lines +566 to +572
try {
    # Use --% (stop-parsing token) to prevent PS from mangling
    # the native command arguments. Pass prompt via temp file.
    $rawOutput = & $script:CopilotBin `
        -p $fullPrompt `
        -s `
        --additional-mcp-config "@$mcpTmp" `

Copilot AI Mar 27, 2026

The comment says to use PowerShell’s --% stop-parsing token to prevent argument mangling, but the command invocation below doesn’t use --%. Either adjust the comment or actually use the recommended mechanism (or a temp file via @file) so the rationale matches the implementation and avoids confusion when debugging.

Comment on lines +377 to +409
// Track objects during traversal to detect true circular
// references. We use a replacer that adds objects on entry
// and removes them on exit (post-order), so DAG-shared refs
// (e.g. { a: obj, b: obj }) are correctly duplicated rather
// than replaced with "[Circular]".
const ancestors = new Set();
return JSON.stringify(
    value,
    function (key, val) {
        if (typeof val === 'bigint') {
            return val.toString();
        }
        if (typeof val === 'object' && val !== null) {
            if (ancestors.has(val)) {
                return '[Circular]';
            }
            ancestors.add(val);
            // Schedule removal after this subtree is fully traversed.
            // JSON.stringify calls the replacer depth-first, so by the
            // time we return from this key the children are already
            // processed. We use a finally-scheduled microtask to
            // remove after the current synchronous stringify pass.
            // Actually — JSON.stringify is synchronous, so we can
            // lean on the fact that the replacer is called in-order
            // and use a post-processing cleanup. For simplicity,
            // just leave the Set as-is — true cycles will be caught,
            // and shared non-cyclic refs in practice don't occur in
            // sandbox return values (they're freshly JSON-parsed).
        }
        return val;
    },
    2
);

Copilot AI Mar 27, 2026

safeStringifyResult marks any repeated object reference as "[Circular]" because ancestors is never cleared during traversal. This will incorrectly replace shared (non-cyclic) references and can also grow memory for large results. Consider using a stack-based cycle detector (add on entry + remove on exit) or a proven circular-safe serializer so only true cycles are replaced.

Suggested change
- // Track objects during traversal to detect true circular
- // references. We use a replacer that adds objects on entry
- // and removes them on exit (post-order), so DAG-shared refs
- // (e.g. { a: obj, b: obj }) are correctly duplicated rather
- // than replaced with "[Circular]".
- const ancestors = new Set();
- return JSON.stringify(
-     value,
-     function (key, val) {
-         if (typeof val === 'bigint') {
-             return val.toString();
-         }
-         if (typeof val === 'object' && val !== null) {
-             if (ancestors.has(val)) {
-                 return '[Circular]';
-             }
-             ancestors.add(val);
-             // Schedule removal after this subtree is fully traversed.
-             // JSON.stringify calls the replacer depth-first, so by the
-             // time we return from this key the children are already
-             // processed. We use a finally-scheduled microtask to
-             // remove after the current synchronous stringify pass.
-             // Actually — JSON.stringify is synchronous, so we can
-             // lean on the fact that the replacer is called in-order
-             // and use a post-processing cleanup. For simplicity,
-             // just leave the Set as-is — true cycles will be caught,
-             // and shared non-cyclic refs in practice don't occur in
-             // sandbox return values (they're freshly JSON-parsed).
-         }
-         return val;
-     },
-     2
- );
+ // Track the current traversal path to detect true circular
+ // references. Objects are added on entry and removed on exit,
+ // so only objects that appear again on the active stack are
+ // treated as "[Circular]". Shared, non-cyclic references are
+ // serialized normally.
+ const ancestors = [];
+ const serialize = (val) => {
+     if (typeof val === 'bigint') {
+         return val.toString();
+     }
+     if (val === null || typeof val !== 'object') {
+         return val;
+     }
+     if (ancestors.includes(val)) {
+         return '[Circular]';
+     }
+     ancestors.push(val);
+     try {
+         if (Array.isArray(val)) {
+             return val.map((item) => serialize(item));
+         }
+         const resultObj = {};
+         for (const [k, v] of Object.entries(val)) {
+             resultObj[k] = serialize(v);
+         }
+         return resultObj;
+     } finally {
+         ancestors.pop();
+     }
+ };
+ const processed = serialize(value);
+ return JSON.stringify(processed, null, 2);
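
To make the difference between shared and circular references concrete, here is a minimal standalone sketch of the stack-based approach. The wrapper name `safeStringify` is hypothetical, and the output is left compact rather than indented to keep the example short. It follows the suggestion above but is not the server's actual function.

```javascript
// Hypothetical standalone version of the stack-based serializer.
function safeStringify(value) {
  const ancestors = []; // objects on the current traversal path only
  const serialize = (val) => {
    if (typeof val === 'bigint') return val.toString();
    if (val === null || typeof val !== 'object') return val;
    if (ancestors.includes(val)) return '[Circular]';
    ancestors.push(val);
    try {
      if (Array.isArray(val)) return val.map((item) => serialize(item));
      const out = {};
      for (const [k, v] of Object.entries(val)) out[k] = serialize(v);
      return out;
    } finally {
      ancestors.pop(); // leave the path once this subtree is done
    }
  };
  return JSON.stringify(serialize(value));
}

// Shared (non-cyclic) reference: the same object appears under two keys.
const shared = { n: 1 };
const dag = { a: shared, b: shared };

// True cycle: the object refers back to itself.
const cyclic = { name: 'root' };
cyclic.self = cyclic;

console.log(safeStringify(dag));    // {"a":{"n":1},"b":{"n":1}}
console.log(safeStringify(cyclic)); // {"name":"root","self":"[Circular]"}
console.log(safeStringify({ big: 10n })); // {"big":"10"}
```

One caveat with this pre-processing style: because it copies objects via `Object.entries` before `JSON.stringify` runs, values that rely on `toJSON` (for example `Date`, which has no own enumerable properties) would come out as `{}` unless they are handled explicitly.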

Comment on lines +95 to +106
/** Guest heap size in bytes. Override with HYPERLIGHT_HEAP_SIZE_MB (megabytes). */
const HEAP_SIZE_BYTES = Math.min(
    parsePositiveInt(process.env.HYPERLIGHT_HEAP_SIZE_MB, DEFAULT_HEAP_SIZE_MB) * 1024 * 1024,
    MAX_BYTES
);

/** Guest scratch size in bytes. Override with HYPERLIGHT_SCRATCH_SIZE_MB (megabytes).
 *  Maps to setScratchSize() on the SandboxBuilder API. */
const SCRATCH_SIZE_BYTES = Math.min(
    parsePositiveInt(process.env.HYPERLIGHT_SCRATCH_SIZE_MB, DEFAULT_SCRATCH_SIZE_MB) * 1024 * 1024,
    MAX_BYTES
);

Copilot AI Mar 27, 2026

Clamping heap/scratch bytes with Math.min(..., 0xffffffff) can produce non-integer MiB values (e.g., 4096MB becomes 4095.999...MB) and may pass a byte size that isn’t aligned to MiB. Prefer clamping at the MB level (or rounding down to a MiB boundary) before converting to bytes so the configured/printed sizes stay consistent and predictable.

Suggested change
- /** Guest heap size in bytes. Override with HYPERLIGHT_HEAP_SIZE_MB (megabytes). */
- const HEAP_SIZE_BYTES = Math.min(
-     parsePositiveInt(process.env.HYPERLIGHT_HEAP_SIZE_MB, DEFAULT_HEAP_SIZE_MB) * 1024 * 1024,
-     MAX_BYTES
- );
- /** Guest scratch size in bytes. Override with HYPERLIGHT_SCRATCH_SIZE_MB (megabytes).
-  *  Maps to setScratchSize() on the SandboxBuilder API. */
- const SCRATCH_SIZE_BYTES = Math.min(
-     parsePositiveInt(process.env.HYPERLIGHT_SCRATCH_SIZE_MB, DEFAULT_SCRATCH_SIZE_MB) * 1024 * 1024,
-     MAX_BYTES
- );
+ /** Maximum heap/scratch size in mebibytes that still fits within MAX_BYTES. */
+ const MAX_MIB = Math.floor(MAX_BYTES / (1024 * 1024));
+ /** Guest heap size in bytes. Override with HYPERLIGHT_HEAP_SIZE_MB (megabytes). */
+ const HEAP_SIZE_BYTES =
+     Math.min(
+         parsePositiveInt(process.env.HYPERLIGHT_HEAP_SIZE_MB, DEFAULT_HEAP_SIZE_MB),
+         MAX_MIB
+     ) *
+     1024 *
+     1024;
+ /** Guest scratch size in bytes. Override with HYPERLIGHT_SCRATCH_SIZE_MB (megabytes).
+  *  Maps to setScratchSize() on the SandboxBuilder API. */
+ const SCRATCH_SIZE_BYTES =
+     Math.min(
+         parsePositiveInt(process.env.HYPERLIGHT_SCRATCH_SIZE_MB, DEFAULT_SCRATCH_SIZE_MB),
+         MAX_MIB
+     ) *
+     1024 *
+     1024;
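
The arithmetic behind this comment can be checked with a small worked example. This is a sketch under stated assumptions: `MAX_BYTES` is taken to be `0xffffffff` as the comment implies, `parsePositiveInt` is a hypothetical stand-in for the server's helper, and the two `clamp*` functions exist only to compare the strategies.

```javascript
const MIB = 1024 * 1024;
const MAX_BYTES = 0xffffffff; // assumed value, per the review comment
const MAX_MIB = Math.floor(MAX_BYTES / MIB); // 4095

// Hypothetical stand-in: parse a positive integer or fall back to a default.
function parsePositiveInt(raw, fallback) {
  const n = Number.parseInt(raw ?? '', 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Original strategy: convert to bytes first, then clamp.
function clampBytesFirst(mb) {
  return Math.min(mb * MIB, MAX_BYTES);
}

// Suggested strategy: clamp in MiB first, then convert to bytes.
function clampMibFirst(mb) {
  return Math.min(mb, MAX_MIB) * MIB;
}

const requested = parsePositiveInt('4096', 16);

const bytesFirst = clampBytesFirst(requested); // 4294967295, not a multiple of 1 MiB
const mibFirst = clampMibFirst(requested);     // 4293918720, exactly 4095 MiB

console.log(bytesFirst % MIB === 0); // false
console.log(mibFirst / MIB);         // 4095
```

Clamping at the MiB level keeps the effective size a whole number of mebibytes, so any size the server logs or reports matches what is actually passed to the sandbox.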
