Summary
On a fresh local OpenShell gateway, `inference.local` inside a sandbox consistently returns `404 page not found` for both:

- `POST /v1/chat/completions` (OpenAI-style)
- `POST /v1/responses` (per the docs' "Verify from sandbox" example)

This happens even though:

- Gateway inference is configured with a valid NVIDIA provider and a Nemotron 3 model.
- The sandbox proxy does intercept these calls and routes them through `navigator_router` to `https://integrate.api.nvidia.com/v1` with the expected paths.

This effectively breaks the documented `https://inference.local` inference routing path.
Environment
- Host: Windows 11 + WSL2 (Ubuntu, Docker Engine in WSL)
- OpenShell CLI: installed via `uv pip install openshell --pre` from the internal `nv-shared-pypi` index
- Docker: logged in to `ghcr.io` with a PAT (including SSO) and able to pull `ghcr.io/nvidia/openshell/*` images
- Gateway: started via `openshell gateway start` on the WSL host
- Inference backend: NVIDIA Inference API, Nemotron 3 Nano 30B (works directly from WSL with my key)
Steps to Reproduce
1. Start gateway (host / WSL)

```shell
# In WSL
uv venv .venv
source .venv/bin/activate
uv pip install openshell --upgrade --pre \
  --index-url https://urm.nvidia.com/artifactory/api/pypi/nv-shared-pypi/simple
openshell gateway start
```

→ Gateway ready, e.g. `Endpoint: https://127.0.0.1:8080`
2. Configure NVIDIA provider + Nemotron 3 inference (host / WSL)

```shell
export NVIDIA_API_KEY="YOUR_INFERENCE_API_KEY"  # same key that works directly against inference-api.nvidia.com
openshell provider create \
  --name nvidia-prod \
  --type nvidia \
  --from-existing
openshell inference set \
  --provider nvidia-prod \
  --model nvidia/nvidia/Nemotron-3-Nano-30B-A3B
openshell inference get
```

Output:

```
Gateway inference:
  Provider: nvidia-prod
  Model:    nvidia/nvidia/Nemotron-3-Nano-30B-A3B
  Version:  1

System inference:
  Not configured
```
3. Create and connect to sandbox

```shell
openshell sandbox create --name test
openshell sandbox list      # wait until Ready
openshell sandbox connect test
```

→ prompt: `sandbox@test:~$`
4. Test /v1/chat/completions from sandbox

```shell
pip install openai
python - << 'EOF'
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.local/v1",
    api_key="dummy",  # ignored by OpenShell; routing uses the configured provider
)
resp = client.chat.completions.create(
    model="anything",  # should be rewritten to the configured model
    messages=[{"role": "user", "content": "Hello from OpenShell sandbox!"}],
    temperature=0.7,
    max_tokens=128,
)
print(resp.choices[0].message.content)
EOF
```

Actual result:

```
openai.NotFoundError: 404 page not found
```
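One detail worth capturing here: the bare `404 page not found` body happens to match the default response of Go's `net/http` NotFound handler, whereas API upstreams usually wrap errors in JSON. A small sketch of the debugging helper I used to tell the two apart (my own heuristic, not part of OpenShell; the classification strings are assumptions):

```python
def classify_404(body: str) -> str:
    """Heuristic: Go's net/http default NotFound handler emits the bare
    string '404 page not found', while API upstreams typically return a
    JSON error object. (Assumption, for debugging only.)"""
    text = body.strip()
    if text == "404 page not found":
        return "likely-go-router-default"
    if text.startswith("{"):
        return "likely-upstream-json-error"
    return "unknown"

if __name__ == "__main__":
    import requests  # third-party; `pip install requests` in the sandbox
    resp = requests.post(
        "https://inference.local/v1/chat/completions",
        json={"model": "anything",
              "messages": [{"role": "user", "content": "ping"}]},
        timeout=60,
    )
    print("Status:", resp.status_code)
    print("Server header:", resp.headers.get("server"))
    print("Body class:", classify_404(resp.text))
```

In my runs the body classifies as the Go default, which suggests the 404 is generated by a router in the path rather than by the NVIDIA upstream, though I can't rule out the upstream from inside the sandbox.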
5. Test /v1/responses from sandbox (per docs)

```shell
pip install requests
python - << 'EOF'
import requests

url = "https://inference.local/v1/responses"
payload = {
    "instructions": "You are a helpful assistant.",
    "input": "Hello from OpenShell sandbox!",
}
resp = requests.post(url, json=payload, timeout=60)
print("Status:", resp.status_code)
print("Body:", resp.text[:500])
EOF
```

Actual result:

```
Status: 404
Body: 404 page not found
```
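Since the router logs below list `model_discovery` among the supported protocols, a useful differential check is whether `GET /v1/models` also 404s. A sketch of that probe (helper names are my own; the interpretation is a heuristic, not documented OpenShell behavior):

```python
def diagnose(models_status: int, chat_status: int) -> str:
    """Interpret the pair of status codes (debugging heuristic):
    - both 404           -> the whole inference route looks dead
    - models OK, chat 404 -> only the POST inference paths are mis-routed
    """
    if models_status == 404 and chat_status == 404:
        return "route-table-dead"
    if models_status < 400 and chat_status == 404:
        return "post-paths-only"
    return "inconclusive"

if __name__ == "__main__":
    import requests  # pip install requests
    models = requests.get("https://inference.local/v1/models", timeout=30)
    chat = requests.post(
        "https://inference.local/v1/chat/completions",
        json={"model": "anything",
              "messages": [{"role": "user", "content": "ping"}]},
        timeout=30,
    )
    print("GET /v1/models ->", models.status_code)
    print("POST /v1/chat/completions ->", chat.status_code)
    print("Diagnosis:", diagnose(models.status_code, chat.status_code))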
What I Expected
Given:

- `openshell inference get` shows a configured NVIDIA provider + Nemotron model.
- Docs state that `/v1/chat/completions` and `/v1/responses` are recognized inference patterns for `inference.local`.
- The "Verify the Endpoint from a Sandbox" example uses `POST /v1/responses`.

I expected `POST https://inference.local/v1/chat/completions` and `POST https://inference.local/v1/responses` to return a normal model response (HTTP 200 + JSON) from inside the sandbox.
What Actually Happens

- Both endpoints return a plain `404 page not found` from inside the sandbox.
- There is no obvious configuration error on the host/sandbox side (gateway, provider, and inference are all reported as healthy).
Relevant Logs (`openshell logs -g openshell`)

```
1773260787.772 INFO Fetching inference route bundle from gateway endpoint=https://openshell.openshell.svc.cluster.local:8080
1773260787.822 INFO Loaded inference route bundle revision=6ce65bfa03d7bff0 route_count=1
1773260787.822 INFO Inference routing enabled with local execution route_count=1
1773260787.823 INFO Proxy listening (tcp) addr=10.200.0.1:3128
... sandbox [navigator_sandbox::proxy] Intercepted inference request, routing locally kind=chat_completion method=POST path=/v1/chat/completions protocol=openai_chat_completions
1773260870.962 INFO routing proxy inference request endpoint=https://integrate.api.nvidia.com/v1 method=POST path=/v1/chat/completions protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
... sandbox [navigator_sandbox::proxy] Intercepted inference request, routing locally kind=responses method=POST path=/v1/responses protocol=openai_responses
1773261095.914 INFO routing proxy inference request endpoint=https://integrate.api.nvidia.com/v1 method=POST path=/v1/responses protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
```
Notes:

- The proxy does intercept `inference.local` and classifies both `/v1/chat/completions` and `/v1/responses` as inference requests.
- `navigator_router` is invoked with `endpoint=https://integrate.api.nvidia.com/v1` and `path=/v1/...`.
- Despite this, the sandbox receives `404 page not found` for both URLs.
Separately, I’ve confirmed that my NVIDIA Inference API key + Nemotron 3 model work fine directly from WSL against `https://inference-api.nvidia.com/v1/chat/completions` with the same model ID.
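Note the host mismatch: the router logs show `integrate.api.nvidia.com`, while my manual verification used `inference-api.nvidia.com`. Whether the former is even the intended upstream is exactly my first question below. A sketch of the comparison I would run from WSL against both bases with the same key (the URL-joining helper is my assumption about how the router builds paths, not its actual code):

```python
import os

# The two upstream bases in play: the one the router logs show, and the one
# I verified manually. Which is correct for this build is the open question.
ROUTER_BASE = "https://integrate.api.nvidia.com/v1"
VERIFIED_BASE = "https://inference-api.nvidia.com/v1"

def chat_url(base: str) -> str:
    """Join a base URL and the chat-completions path the way I assume the
    router does (simple concatenation)."""
    return base.rstrip("/") + "/chat/completions"

if __name__ == "__main__":
    import requests  # pip install requests
    key = os.environ["NVIDIA_API_KEY"]
    body = {
        "model": "nvidia/nvidia/Nemotron-3-Nano-30B-A3B",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    }
    for base in (ROUTER_BASE, VERIFIED_BASE):
        r = requests.post(
            chat_url(base),
            headers={"Authorization": f"Bearer {key}"},
            json=body,
            timeout=60,
        )
        print(base, "->", r.status_code)
```

If the `integrate.api.nvidia.com` base rejects the same key/model that works against `inference-api.nvidia.com`, that would point at the provider's endpoint configuration rather than the sandbox proxy.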
Questions

- Is `integrate.api.nvidia.com/v1` the intended upstream endpoint for the `nvidia` provider in this build?
- Should the router be constructing `/v1/chat/completions` and `/v1/responses` against that base as-is, or is there a known issue with the current OpenShell server image’s inference routing?
- Is there a different path or configuration I should be using to exercise `inference.local` from inside a sandbox on the current version?
Happy to provide more logs or try a specific build/tag if that helps narrow it down.