feature: dynamo integration with nat profiler and prometheus/grafana dashboard #1486
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
Walkthrough

Adds an optimized Thompson Sampling router and processors for Dynamo with dynamic discovery, Prometheus/Grafana monitoring and dashboards, multi-worker startup/stop/orchestration and image-build tooling, NAT profiler integration for Dynamo metrics, transport-level LLM hint injection, many monitoring/observability scripts, and extensive docs and demos.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Frontend
    participant Processor
    participant Router
    participant Worker
    participant Prometheus
    Client->>Frontend: HTTP generate request (headers / nvext.annotations)
    Frontend->>Processor: route to backend.generate
    Processor->>Processor: extract hints (prefix_id, reuse_budget, osl, iat)
    Processor->>Router: find_worker(tokens, prefix_id, reuse_budget)
    Router->>Prometheus: record decision metric
    Router-->>Processor: worker_id, decision_id
    Processor->>Worker: forward/stream generate request
    Worker-->>Processor: stream response chunks
    Processor->>Prometheus: update latency/token metrics
    Processor->>Prometheus: async update KVE metrics
    Processor-->>Frontend: stream response back
    Frontend-->>Client: deliver response
    Processor->>Router: feedback(decision_id, latency, success)
    Router->>Prometheus: record feedback / update baselines
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 11
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
external/dynamo/start_dynamo_unified.sh (1)
17-25: Update the header comment to match the new multi-worker defaults. The header still mentions GPUs 0-3 with TP=4, but defaults are now 0-7 and TP=2 with multiple workers. Please refresh the description. As per coding guidelines.
📝 Example update

```diff
-# - Unified Worker (GPUs 0,1,2,3, TP=4, no disaggregation)
+# - Unified Workers (default GPUs 0-7, TP=2 per worker, no disaggregation)
```
🤖 Fix all issues with AI agents
In `@external/dynamo/collect_metrics.sh`:
- Around line 27-41: The curl calls that write to "$frontend_file" and
"$worker_file" should use --fail and --max-time (e.g., --max-time 10) and check
curl's exit status so failures don't create empty/partial files; fetch into temp
files, verify each curl succeeded, log an error or warning to stderr with
context ("frontend metrics" / "worker metrics") and either exit 1 or skip
inclusion into "$combined_file" based on desired behavior, then atomically move
the temp files to "$frontend_file" and "$worker_file" before building
"$combined_file".
In
`@external/dynamo/monitoring/grafana/provisioning/dashboards/json/dynamo-overview.json`:
- Around line 1-4: Add the SPDX Apache-2.0 header object according to the repo’s
JSON convention immediately before the existing dashboard content (i.e., before
the top-level "annotations" key); update the JSON to include the standard SPDX
fields (e.g., "spdx": { "license": "Apache-2.0", "copyright": "...", ... })
following the repository’s header schema so the file begins with the SPDX header
object then the existing dashboard object with "annotations".
In `@external/dynamo/monitoring/README.md`:
- Around line 1-3: Add the SPDX Apache-2.0 license header as the very first line
of the README before the "# Dynamo Monitoring Stack" title; update the README.md
for the "Dynamo Monitoring Stack" document to include the exact SPDX identifier
"SPDX-License-Identifier: Apache-2.0" on its own line at the top of the file.
In `@external/dynamo/optimized/ARCHITECTURE.md`:
- Around line 1-2: The ARCHITECTURE.md file titled "Optimized Thompson Sampling
Router Architecture" is missing the required SPDX Apache-2.0 header; add the
standard SPDX header comment (e.g., "SPDX-License-Identifier: Apache-2.0") at
the very top of the file before any content so the document complies with
licensing guidelines and linter checks.
In `@external/dynamo/optimized/PARAMETERS.md`:
- Around line 1-3: Add the standard SPDX Apache-2.0 header to the top of
PARAMETERS.md to comply with licensing guidelines; insert the SPDX identifier
line "SPDX-License-Identifier: Apache-2.0" (and optional year/owner comment
above or beside it per project convention) as the first lines of the file so the
new Markdown doc describing WorkloadAwareRouter in router.py includes the
required license header.
In `@external/dynamo/optimized/processor.py`:
- Around line 718-745: Add a return type annotation to parse_args() (->
argparse.Namespace) and add a new CLI argument "--kv-cache-block-size"
(type=int) that defaults from the DYNAMO_KV_BLOCK_SIZE environment variable or
64 when unset; ensure parse_args returns the Namespace with this value and then
thread that value into the call to register_llm(...) (or wherever the model/card
is registered) so register_llm receives kv_cache_block_size instead of the
hard-coded 1, keeping the argument name kv_cache_block_size consistent with
existing worker/frontend usage.
In `@external/dynamo/optimized/README.md`:
- Around line 1-3: Add the SPDX license header to the top of the README so the
file begins with the standard SPDX identifier before the existing title line "#
Optimized Thompson Sampling Router Architecture"; insert a single-line header
"SPDX-License-Identifier: Apache-2.0" (with no other characters) as the first
line of the file so the README follows the repository's Markdown licensing
guideline.
In `@external/dynamo/start_dynamo_optimized_thompson_hints.sh`:
- Around line 284-301: The script currently uses read -p to prompt for HF_TOKEN
which can echo the secret to the terminal; change the interactive prompt to use
a silent read (e.g., read -s -r HF_TOKEN) so the token is not displayed, add a
newline/confirmation echo after the silent read to keep UX clean, and ensure the
later checks that set HF_TOKEN="dummy" or echo "✓ HuggingFace token received"
remain unchanged; update the prompt logic around the HF_TOKEN and
LOCAL_MODEL_DIR checks to use the silent read and not print the token.
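The silent-read pattern the prompt describes can be sketched like this (the function wrapper is illustrative; the real script sets `HF_TOKEN` inline):

```bash
# Sketch: read -s suppresses terminal echo so the token is never displayed.
prompt_hf_token() {
    local token
    read -r -s -p "Enter HF_TOKEN (input hidden): " token
    echo >&2                      # newline after the silent read keeps UX clean
    if [ -z "$token" ]; then
        token="dummy"             # preserve the existing empty-input fallback
    else
        echo "✓ HuggingFace token received" >&2
    fi
    printf '%s' "$token"
}

# HF_TOKEN="$(prompt_hf_token)"
```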
- Around line 77-83: Validate GPU/TP sizing after computing NUM_GPUS and before
using NUM_WORKERS: check that TP_SIZE is a positive integer and that NUM_GPUS >=
TP_SIZE and that NUM_GPUS is divisible by TP_SIZE so NUM_WORKERS computed from
NUM_GPUS/TP_SIZE is > 0; if the check fails, print a clear error using
WORKER_GPUS, TP_SIZE and NUM_GPUS context and exit non‑zero to avoid later
indexing in wait_for_worker or using CONTAINER_GPU_INDICES when NUM_WORKERS ==
0. Ensure the validation runs right after NUM_WORKERS is assigned and prevents
further execution on bad configs.
In `@external/dynamo/start_dynamo_unified.sh`:
- Around line 53-55: NUM_WORKERS calculation can yield 0 or incorrect values
when TP_SIZE > NUM_GPUS or when NUM_GPUS is not divisible by TP_SIZE; add
validation after computing NUM_WORKERS (and before accessing WORKER_PIDS) to
check that NUM_GPUS and TP_SIZE are positive integers, that TP_SIZE <= NUM_GPUS,
and that NUM_GPUS % TP_SIZE == 0, and if any check fails print a clear error and
exit (use the same logging/exit pattern as the script). Also consider computing
NUM_WORKERS as an integer division only after validation to avoid truncation
surprises and reference NUM_WORKERS, NUM_GPUS, TP_SIZE and WORKER_PIDS in the
error message so the failure is easy to debug.
In `@src/nat/llm/dynamo_llm.py`:
- Around line 393-410: The code always generates a prefix ID even when
prefix_template is intended to disable hints; change the logic in the block that
sets unique_id/prefix_id so that if prefix_template is None you set prefix_id =
None (do not generate unique_id or call uuid.uuid4()), otherwise generate
unique_id and compute prefix_id = prefix_template.format(uuid=unique_id) (or
fallback f"nat-dynamo-{unique_id}" when prefix_template is truthy but empty),
update the logger to only log the created prefix when prefix_id is not None, and
pass that prefix_id (possibly None) into _DynamoTransport so the transport can
skip injecting hints when prefix_id is None.
🟡 Minor comments (19)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml (1)

243-246: Update Prometheus scrape interval reference to match actual config. The comment states Prometheus scrapes every 5s, but the Prometheus configuration in `external/dynamo/monitoring/prometheus.yml` defines `scrape_interval: 2s`. Update the minimum range calculation accordingly.

Proposed fix

```diff
-        # Minimum: 15s (Prometheus scrapes every 5s, need ≥3 points for reliable rates)
+        # Minimum: 6s (Prometheus scrapes every 2s, need ≥3 points for reliable rates)
```

external/dynamo/optimized/ARCHITECTURE.md (1)
13-14: Replace inanimate possessive 's in Markdown. Phrases like "Frontend's built-in router" and "Dynamo's `--model-name`/`--model-path`" violate the doc style rule. Please rephrase (e.g., "the built-in router of the frontend", "the Dynamo `--model-name`/`--model-path`"). As per coding guidelines.

✍️ Example rephrases

```diff
-3. **Frontend's built-in router becomes irrelevant**
+3. **The built-in router of the frontend becomes irrelevant**
-| Model Mapping | Custom `FRONTEND_MODEL_MAPPING` | Dynamo's `--model-name`/`--model-path` |
+| Model Mapping | Custom `FRONTEND_MODEL_MAPPING` | the Dynamo `--model-name`/`--model-path` |
-> default frontend, but this is **irrelevant** in our architecture. The frontend's built-in
+> default frontend, but this is **irrelevant** in our architecture. The built-in router of the frontend
```

Also applies to: 216-224, 360-363
external/dynamo/optimized/ARCHITECTURE.md (1)

15-21: Add language identifiers to fenced code blocks (MD040). Markdownlint flags these fenced blocks for missing language tags. Please add `text`, `bash`, or `json` as appropriate.

🧩 Example fix

````diff
-```
+```text
 Frontend (built-in router: round-robin)
   → routes to dynamo.backend.generate → OUR PROCESSOR (intercepts!)
   → queries Thompson Sampling router
   → forwards to dynamo.worker.generate (actual SGLang workers)
-```
+```
````

Also applies to: 23-24, 297-304, 307-313
src/nat/llm/dynamo_llm.py (1)

22-23: Docstring still references event hooks; update to transport-based injection. The module now uses a custom transport, but the docstring still describes event hooks.

📝 Suggested update

```diff
-The implementation uses httpx event hooks to inject hints at the HTTP transport level,
+The implementation uses a custom httpx transport to inject hints at the HTTP level,
```

external/dynamo/monitoring/README.md (1)
383-392: Wrap bare URLs in inline code or links (MD034). Bare URLs are flagged by markdownlint; please wrap them in inline code or convert to links (e.g., `http://localhost:3000`).

🔗 Example fix

```diff
-Then open http://localhost:3000 in your browser.
+Then open `http://localhost:3000` in your browser.
```

Example queries:

```promql
# Request rate (requests/second)
rate(dynamo_frontend_requests_total[1m])
```

Also applies to: 414-488
external/dynamo/optimized/README.md (1)

7-30: Specify languages for the untyped fenced blocks (MD040). Add a language such as `text` to the architecture diagram and expected-output snippet to satisfy markdownlint.

🧹 Example fix

````diff
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────┐
 │                Client Request (with nvext.annotations)                  │
 ...
 └─────────────────────────────────────────────────────────────────────────┘
-```
+```
...
-```
+```text
 Step 3: Starting Custom Processor (Registers as backend.generate)...
 ...
 INFO processor.generate: Routing decision: worker=... decision=...
-```
+```
````

Also applies to: 157-174
external/dynamo/optimized/README.md (1)

13-48: Avoid possessive 's for inanimate objects in docs. Rephrase instances like "Processor's model card", "Frontend's ModelWatcher", and "frontend's expected model" to non-possessive forms for doc style compliance. As per coding guidelines.

✍️ Example wording updates

```diff
-│           ↓ finds Processor's model card!                               │
+│           ↓ finds the model card for the processor                      │
...
-3. **Frontend's ModelWatcher** discovers the processor's model card
+3. **Frontend ModelWatcher** discovers the model card for the processor
...
-- `--model-name`: Served model name (must match frontend's expected model)
+- `--model-name`: Served model name (must match the expected model for the frontend)
```

Also applies to: 86-88
external/dynamo/monitoring/README.md (1)

24-55: Add language identifiers to fenced blocks flagged by markdownlint. Several code fences (diagram, KVES formula, file tree, metric lists) are missing a language tag. Add `text` (or a more specific language) to satisfy MD040.

🧹 Example fixes

````diff
-```
+```text
 ┌──────────────────────────────────────────────────────────────────────────────┐
 │                              Dynamo Stack                                    │
 ...
 └──────────────────────────────────────────────────────────────────────────────┘
-```
+```
...
-```
+```text
 KVES = (TotalWork - ActualWork) / TotalWork ∈ [0,1]
 ...
 w_hit = (w_gpu_hit, w_cpu_hit, w_disk_hit)   # weights per hit source
-```
+```
````

Also applies to: 165-173, 295-308, 531-545
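As an aside, the simplified KVES formula shown in that fence reduces to a one-line ratio; a small illustration (ignoring the per-source hit weights):

```python
def kves(total_work: float, actual_work: float) -> float:
    """KVES = (TotalWork - ActualWork) / TotalWork, clamped to [0, 1]."""
    if total_work <= 0:
        return 0.0
    return max(0.0, min(1.0, (total_work - actual_work) / total_work))


# e.g. a prefix cache that saves a quarter of the prefill work:
# kves(total_work=1000, actual_work=750) == 0.25
```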
src/nat/profiler/profile_runner.py (1)

230-251: Preserve stack traces and satisfy BLE001 when Dynamo metrics collection fails. The broad catch logs with `logger.warning`, which drops the stack trace and still triggers BLE001. If the broad catch is intentional, switch to `logger.exception` and annotate with `# noqa: BLE001` (or narrow the exception list). As per coding guidelines.

🛠️ Suggested change

```diff
-        except Exception as e:
-            logger.warning("Failed to collect Dynamo metrics: %s", e)
+        except Exception:  # noqa: BLE001
+            logger.exception("Failed to collect Dynamo metrics")
```

external/dynamo/start_dynamo_unified.sh (1)
480-482: Fix the separator typo in the output banner. The separator line prints `...ca...`; this looks like an accidental keystroke.

🧹 Proposed fix

```diff
-    echo "=====================================ca===================="
+    echo "========================================================="
```

external/dynamo/monitoring/README.md (1)
185-195: Doc style tweaks: avoid inanimate possessives and hyphenate compounds. Rephrase "SGLang's/Dynamo's …" and change "KV cache related" → "KV cache-related"; also remove the blank line inside the blockquote to satisfy MD028. As per coding guidelines.

✍️ Example wording updates

```diff
-> **Why use SGLang's native metric?** SGLang computes cache hit rate internally but doesn't include
+> **Why use the native metric from SGLang?** SGLang computes cache hit rate internally but doesn't include
 > `cached_tokens` in its API responses. The processor's `thompson_kve_*` counters will show 0
 > unless the underlying engine provides `usage.prompt_tokens_details.cached_tokens`.
-
-> **Note on Full KVES**: To implement the full KVES equation with CPU/disk hit weights, you would need
+> **Note on the full KVES**: To implement the full KVES equation with CPU/disk hit weights, you would need
...
-This section documents the working status of all KV cache related metrics across the Dynamo stack.
+This section documents the working status of all KV cache-related metrics across the Dynamo stack.
```

external/dynamo/monitoring/grafana/provisioning/dashboards/json/dynamo-overview.json (1)
96-104: Align the Requests panel query with the "req/min" unit. The panel is labeled "Requests (1m)" with unit "reqpm", but the query uses a 5s window. Use a 1m increase window to match the declared per-minute metric.

Suggested query change

```diff
-          "expr": "sum(increase(dynamo_frontend_requests_total[5s]))",
+          "expr": "sum(increase(dynamo_frontend_requests_total[1m]))",
```

external/dynamo/optimized/router.py (1)
1054-1062: Remove TODO in production code. The TODO for metrics querying violates the no-TODOs guideline. Either implement a stubbed query with an explicit config gate or track it in an issue. As per coding guidelines, avoid TODOs in committed code/comments.
Do you want me to draft a minimal metrics client stub (or open an issue template) for this block?
src/nat/profiler/inference_optimization/dynamo_metrics.py (1)

1059-1062: Docstring typo: `kv_cache_efficiency` doesn't exist. The field is `kv_efficiency`. This example will mislead users. As per coding guidelines, keep documentation accurate.

✅ Suggested fix

```diff
-    print(f"KV Efficiency: {core.kv_cache_efficiency:.2%}")
+    print(f"KV Efficiency: {core.kv_efficiency:.2%}")
```

external/dynamo/.env.example (1)
108-115: Export KV tuning vars so child processes see them. Without `export`, these values won't reach the startup scripts and defaults will be used.

✅ Suggested fix

```diff
-DYNAMO_KV_BLOCK_SIZE=64
+export DYNAMO_KV_BLOCK_SIZE=64
@@
-DYNAMO_MEM_FRACTION_STATIC=0.9
+export DYNAMO_MEM_FRACTION_STATIC=0.9
```

external/dynamo/optimized/processor.py (1)
232-234: Remove hard-coded Dynamo version in docstring. Version numbers shouldn't be embedded in `.py` docs. As per coding guidelines, keep docs version-agnostic.

✅ Suggested fix

```diff
-    # doesn't expose custom bucket configuration in Dynamo 0.7.1)
+    # doesn't expose custom bucket configuration in the current Dynamo release)
```

external/dynamo/optimized/router.py (1)
1259-1328: Add return type hint for the CLI parser. Public functions require return type hints. Add `-> argparse.Namespace` to `parse_args()`.

✅ Suggested fix

```diff
-def parse_args():
+def parse_args() -> argparse.Namespace:
```

external/dynamo/optimized/processor.py (1)
488-491: Use `logger.exception()` to capture stack trace information. This exception handler catches and logs without re-raising, so it should use `logger.exception()` to preserve the full stack trace. Also, when using `logger.exception()`, omit the exception object from the format arguments; it's automatically captured.

Suggested fix

```diff
         except Exception as e:
-            logger.error("Failed to pick worker: %s", e)
+            logger.exception("Failed to pick worker")
             self._metrics.router_errors_total.inc()
             return None, None
```

Consider narrowing the exception type if possible based on what the router client can raise.

src/nat/profiler/inference_optimization/dynamo_metrics.py (1)
src/nat/profiler/inference_optimization/dynamo_metrics.py-717-720 (1)
717-720: Uselogger.exceptionwhen swallowing errors without re-raisingThese exception handlers catch exceptions without re-raising and don't capture stack traces. According to coding guidelines, use
logger.exception()(orlogger.error(exc_info=True)) to preserve full diagnostic information for exceptions caught without re-raising.✅ Example adjustment for lines 717-720
except Exception as e: error_msg = f"Failed to collect {metric_name}: {e}" - logger.warning(error_msg) + logger.exception(error_msg) result.errors.append(error_msg)Also applies to:
- Lines 972-974: Replace
logger.debug()withlogger.exception()- Lines 1001-1002 and 1008-1009: Add
logger.exception()calls to capture stack traces before appending to the errors list
🧹 Nitpick comments (4)
external/dynamo/monitoring/docker-compose.yml (1)
33-42: Follow the environment variable pattern for Grafana credentials and use safer defaults. This file is documented for local development only, but the hardcoded admin credentials and anonymous Admin access should still fall back to environment variables for consistency with best practices. Additionally, fix the file ending: the file currently ends with 2 blank lines instead of 1.

Recommended changes

```diff
-      - GF_AUTH_ANONYMOUS_ENABLED=true
-      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
-      - GF_AUTH_DISABLE_LOGIN_FORM=true
-      - GF_SECURITY_ADMIN_USER=admin
-      - GF_SECURITY_ADMIN_PASSWORD=admin
+      - GF_AUTH_ANONYMOUS_ENABLED=${GRAFANA_ANON_ENABLED:-false}
+      - GF_AUTH_ANONYMOUS_ORG_ROLE=${GRAFANA_ANON_ROLE:-Viewer}
+      - GF_AUTH_DISABLE_LOGIN_FORM=${GRAFANA_DISABLE_LOGIN_FORM:-false}
+      - GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER:-admin}
+      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-admin}
```

Remove trailing blank lines at end of file to have only a single newline.
src/nat/data_models/profiler.py (1)
44-105: Trim the `DynamoMetricsConfig` docstring to Google-style format. Guidelines require Google-style docstrings for public classes; consider a concise summary plus an `Attributes:` section, and move the long how-to content to external docs. As per coding guidelines.

external/dynamo/monitoring/grafana/provisioning/dashboards/json/dynamo-overview.json (1)
572-584: Replace hard-coded worker instance queries with dynamic per-instance aggregation. The current queries for refId D and E pin specific instances (`localhost:18081` and `localhost:18082`), which breaks the panel if the worker count changes. Use `sum by (instance)` to automatically show all worker instances.

♻️ Suggested query change

```diff
-        {
-          "expr": "rate(dynamo_component_requests_total{dynamo_namespace=\"workers\",dynamo_component=\"worker\",instance=\"localhost:18081\"}[5s])",
-          "legendFormat": "4. Worker 0 (18081) [rate(dynamo_component_requests_total{worker:18081}[5s])]",
-          "refId": "D"
-        },
-        {
-          "expr": "rate(dynamo_component_requests_total{dynamo_namespace=\"workers\",dynamo_component=\"worker\",instance=\"localhost:18082\"}[5s])",
-          "legendFormat": "4. Worker 1 (18082) [rate(dynamo_component_requests_total{worker:18082}[5s])]",
-          "refId": "E"
-        },
+        {
+          "expr": "sum by (instance) (rate(dynamo_component_requests_total{dynamo_namespace=\"workers\",dynamo_component=\"worker\"}[5s]))",
+          "legendFormat": "4. Worker ({{instance}}) [rate(dynamo_component_requests_total{worker}[5s])]",
+          "refId": "D"
+        },
```

external/dynamo/optimized/router.py (1)
472-557: Harden debug trace storage location. Defaulting to `/tmp` is vulnerable to symlink races on shared systems. Consider a user-scoped cache dir and restrictive permissions.

♻️ Suggested hardening

```diff
-        debug_trace_dir: str = "/tmp/dynamo_router_traces",
+        debug_trace_dir: str = "~/.cache/dynamo_router_traces",
@@
-        self.debug_trace_dir = str(debug_trace_dir)
+        self.debug_trace_dir = os.path.expanduser(str(debug_trace_dir))
@@
-        os.makedirs(self.debug_trace_dir, exist_ok=True)
+        os.makedirs(self.debug_trace_dir, mode=0o700, exist_ok=True)
```
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
.pre-commit-config.yaml (1)
37-42: Add `jupyter` to `additional_dependencies` — the hook will fail without it. The shell script is properly executable with a correct shebang, but it calls `jupyter nbconvert` on line 25. Since `language: python` creates an isolated venv, only `nbconvert` is installed from `additional_dependencies: ["nbconvert"]`. The `jupyter` command won't be available, causing the hook to fail at runtime. Add `"jupyter"` to additional_dependencies. (Note: `nbconvert` is already pinned to 7.16.6 in `uv.lock`, so the version pinning concern is addressed.)

external/dynamo/start_dynamo_disagg.sh (1)
27-27: Stale comment: port reference is outdated. The comment states "Frontend: Port 8099" but the default HTTP port has been changed to 8000 on line 39. Update the comment to reflect the new default.

```diff
-# Frontend: Port 8099 (HTTP API)
+# Frontend: Port 8000 (HTTP API, configurable via DYNAMO_HTTP_PORT)
```

external/dynamo/start_dynamo_unified_thompson_hints.sh (1)
28-28: Stale comment: port reference is outdated. The comment states "Frontend: Port 8099" but the default HTTP port has been changed to 8000 on line 44. Update the comment for consistency.

```diff
-# Frontend: Port 8099 (HTTP API with prefix hint headers)
+# Frontend: Port 8000 (HTTP API with prefix hint headers)
```
🤖 Fix all issues with AI agents
In `@external/dynamo/monitoring/docker-compose.yml`:
- Around line 30-43: The Grafana service is exposed insecurely via network_mode:
host with anonymous admin access and login disabled; update the grafana service
to bind only to localhost (remove network_mode: host and use a ports mapping
like 127.0.0.1:3000:3000) and harden the environment defaults by setting
GF_AUTH_ANONYMOUS_ENABLED=false and GF_AUTH_DISABLE_LOGIN_FORM=false while
keeping GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD present; make
these defaults overrideable via environment or an .env file so developers can
re-enable anonymous/local-only access when explicitly needed (refer to the
grafana service, network_mode, GF_AUTH_ANONYMOUS_ENABLED,
GF_AUTH_DISABLE_LOGIN_FORM, GF_AUTH_ANONYMOUS_ORG_ROLE).
In `@external/dynamo/monitoring/README.md`:
- Around line 222-224: Replace inanimate possessive phrasing like "SGLang's
native metric" with non-possessive alternatives such as "the native SGLang
metric" or "the native metric from SGLang" in the README text (specifically the
sentence starting "SGLang computes cache hit rate internally..."). Update any
other occurrences of similar inanimate possessives in this section (e.g.,
"SGLang's", "the processor's `thompson_kve_*` counters") to use equivalent
non-possessive constructions to match the documentation style guidelines.
- Around line 229-233: Update the header paragraph "KV Cache Metrics Status" to
hyphenate the compound adjective by replacing the phrase "KV cache related
metrics" with "KV cache-related metrics" (preferably using a non-breaking hyphen
U+2011 if you want to prevent line-wrapping); locate the sentence in the same
paragraph that mentions the `${backend}` template variable and make the
single-word change there to ensure grammatical correctness.
- Around line 571-573: The README example uses a user-specific absolute path;
update the example invocation to use a generic placeholder instead (e.g.,
/path/to/repo or $REPO_ROOT) so it’s portable; modify the example line that runs
./collect_metrics.sh ./metrics_output 30 to show changing directory to a generic
repo root (instead of
/localhome/local-bbednarski/NeMo-Agent-Toolkit/external/dynamo) and keep the
collect_metrics.sh command unchanged to preserve the usage example.
In `@external/dynamo/monitoring/scripts/cache_experiment.sh`:
- Around line 1-14: The script cache_experiment.sh currently has an abbreviated
header; replace it with the full Apache-2.0 license header block (the standard
boilerplate used in the repo) including the SPDX-License-Identifier: Apache-2.0
line, the full copyright notice (Copyright (c) 2025-2026, NVIDIA CORPORATION &
AFFILIATES. All rights reserved.), and the complete Apache License 2.0 notice
text (permissions, conditions, and disclaimer) so the file contains the
canonical license template at the top.
In `@external/dynamo/monitoring/scripts/kv_event_observer.py`:
- Line 85: The parameter parent_hash in function record_stored is unused and
triggers a lint warning; either rename it to _parent_hash to indicate
intentional unusedness or actually use it (e.g., log or store ancestry info).
Edit the record_stored signature (record_stored) to rename parent_hash to
_parent_hash if you don't need it, or add a single-line usage inside
record_stored (such as including parent_hash in the emitted metric/log or
appending it to an ancestry list) so the linter no longer flags it.
- Around line 1-35: Add the required SPDX Apache-2.0 license header immediately
after the shebang line in kv_event_observer.py; place the standard multi-line
SPDX/Apache-2.0 header (including SPDX-License-Identifier: Apache-2.0 and the
copyright/notice block per repo policy) right after "#!/usr/bin/env python3" so
the file header is compliant before the module docstring and any definitions (no
changes to functions or logic like the KVEventBatch parsing or metrics polling).
In `@external/dynamo/monitoring/scripts/README.md`:
- Around line 1-5: Add the SPDX license header "SPDX-License-Identifier:
Apache-2.0" to the very top of the README.md for the "KV Cache Event Observer"
(place the header before the first "# KV Cache Event Observer" line) so the file
includes the required Apache-2.0 license identifier per project guidelines.
In `@external/dynamo/optimized/ARCHITECTURE.md`:
- Around line 9-14: Update the inanimate possessive phrasing in ARCHITECTURE.md:
replace occurrences like "Frontend's built-in router" with "the built-in router
of the frontend" (and similar rephrasings) throughout the document, including
the other instances noted around the section discussing DYN_ROUTER_MODE,
dynamo.backend.generate, and dynamo.worker.generate (lines referenced in the
review); preserve meaning and references to DYN_ROUTER_MODE,
dynamo.backend.generate, and dynamo.worker.generate while only adjusting
possessive wording to match the style guide.
- Around line 15-24: The fenced code blocks containing the "Frontend (built-in
router: round-robin) → routes to dynamo.backend.generate → OUR PROCESSOR ..."
snippet (and the other similar blocks around the ASCII box starting with
"┌─────────────────────────────────────────────────────────────────────────────────┐")
are missing language identifiers and there is a blank line inside a blockquote;
add an appropriate language tag (e.g., ```text, ```bash or ```json) to each
fenced code block and remove the empty line inside the blockquote so the block
is contiguous, ensuring all occurrences (including the other blocks referenced
in the comment) follow the same fixes.
In `@external/dynamo/start_dynamo_optimized_thompson_hints_vllm.sh`:
- Around line 309-330: The prompt currently uses read -p which echoes the
HF_TOKEN to the terminal; change the interactive prompt in the HF_TOKEN check
(the block referencing HF_TOKEN and LOCAL_MODEL_DIR and the read call) to use
silent input by replacing read -p with read -s -p and then emit an echo newline
after the read so the terminal prompt formatting remains correct, preserving the
existing logic that checks for an empty value and sets HF_TOKEN="dummy" or
prints confirmation.
- Around line 86-92: After computing NUM_GPUS, CONTAINER_GPU_INDICES and
NUM_WORKERS, add a guard that validates TP_SIZE against NUM_GPUS: if TP_SIZE is
zero, greater than NUM_GPUS, or NUM_GPUS is not evenly divisible by TP_SIZE
(NUM_WORKERS equals 0 or NUM_GPUS % TP_SIZE != 0), immediately print a clear
error and exit with non‑zero status. Reference the variables NUM_GPUS, TP_SIZE,
NUM_WORKERS and CONTAINER_GPU_INDICES when implementing the check so the script
fails fast before any worker loops or indexing occur.
In `@external/dynamo/start_dynamo_unified.sh`:
- Line 481: The separator banner echo contains a stray "ca" that should be
removed; locate the echo call that prints
"=====================================ca====================" (in
start_dynamo_unified.sh) and update the string to a clean separator without the
"ca" characters so the output shows a continuous line of "=" characters.
🧹 Nitpick comments (4)
external/dynamo/monitoring/scripts/cache_experiment.sh (2)
188-198: Dependency on `bc` may cause failures on minimal systems. The `bc` utility is used for arithmetic calculations but may not be installed by default. Consider adding a dependency check or using bash arithmetic where possible.

Alternative using awk (commonly available)

```diff
-echo -e "  Cache hits:    ${final_hits} (delta: +$(echo "$final_hits - $initial_hits" | bc))"
-echo -e "  Cache queries: ${final_queries} (delta: +$(echo "$final_queries - $initial_queries" | bc))"
+delta_hits_calc=$(awk "BEGIN {print $final_hits - $initial_hits}")
+delta_queries_calc=$(awk "BEGIN {print $final_queries - $initial_queries}")
+echo -e "  Cache hits:    ${final_hits} (delta: +${delta_hits_calc})"
+echo -e "  Cache queries: ${final_queries} (delta: +${delta_queries_calc})"
 echo ""
 # Calculate hit rate for this experiment
-delta_hits=$(echo "$final_hits - $initial_hits" | bc)
-delta_queries=$(echo "$final_queries - $initial_queries" | bc)
+delta_hits=$delta_hits_calc
+delta_queries=$delta_queries_calc
 if [ "$delta_queries" != "0" ]; then
-    hit_rate=$(echo "scale=1; $delta_hits * 100 / $delta_queries" | bc)
+    hit_rate=$(awk "BEGIN {printf \"%.1f\", $delta_hits * 100 / $delta_queries}")
     echo -e "${GREEN}Experiment hit rate: ${hit_rate}%${NC}"
 fi
```
173-176: Process cleanup could leave an orphaned process on signal. If the script receives a signal (e.g., Ctrl+C) before reaching line 174, the observer process will be orphaned. Consider adding a trap for cleanup.
Suggested fix
Add after line 144:
```bash
# Cleanup trap
trap 'kill $OBSERVER_PID 2>/dev/null; rm -f "$OBSERVER_LOG"' EXIT INT TERM
```

Then the explicit cleanup at lines 174-176 and 201 can be simplified or removed since the trap handles it.
external/dynamo/stop_dynamo.sh (1)
142-147: Missing error handling for volume removal. If the Prometheus data volume doesn't exist, `docker volume rm` will output an error message. Consider suppressing the error or checking existence first.

Suggested fix
```diff
 if [ "$CLEAR_METRICS" = true ]; then
     echo ""
     echo "Clearing Prometheus data volume..."
-    docker volume rm monitoring_prometheus_data && echo "✓ Prometheus data volume removed (old metrics cleared)"
+    if docker volume rm monitoring_prometheus_data 2>/dev/null; then
+        echo "✓ Prometheus data volume removed (old metrics cleared)"
+    else
+        echo "  (Prometheus data volume not found or already removed)"
+    fi
 fi
```

external/dynamo/.env.example (1)
59-62: Align `export` usage in the `.env` example. This file mixes `export` with bare assignments. If it's intended to be sourced in a shell, `DYNAMO_HTTP_PORT` should be exported like the others; if it's for a Docker Compose `.env`, remove the `export` keywords or add a note to avoid confusion.

🛠️ Consistency option (shell-source)
- DYNAMO_HTTP_PORT=8000 + export DYNAMO_HTTP_PORT=8000
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
external/dynamo/generalized/processor.py (1)
428-433: ⚠️ Potential issue | 🔴 Critical — Update the code to match the deployed Dynamo 0.7.1 or add a version compatibility layer.

Lines 428-433 assume Dynamo ≥0.8.x (where `create_service()` was removed), but all deployment scripts in `external/dynamo/` are pinned to Dynamo 0.7.1, where `create_service()` is still required. This code will fail at runtime with the current deployment version. Either restore the `create_service()` call for 0.7.1 compatibility, update the deployment scripts to Dynamo 0.8.x+, or add conditional logic to detect the runtime version and handle both APIs.

The same issue affects multiple files: `generalized/router.py`, `optimized/router_multilru.py`, and `optimized/processor_multilru.py`.
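One possible shape for the conditional-logic option, sketched with hypothetical stand-in components rather than the real Dynamo component API:

```python
import asyncio

async def ensure_service(component) -> bool:
    """Call create_service() only when the runtime still provides it.

    Returns True when the legacy call was made (Dynamo 0.7.x-style),
    False when the runtime registers endpoints automatically (0.8.x+).
    """
    create = getattr(component, "create_service", None)
    if create is None:
        return False
    await create()
    return True

# Hypothetical stand-ins for the two runtime generations.
class LegacyComponent:
    def __init__(self):
        self.served = False

    async def create_service(self):
        self.served = True

class ModernComponent:
    pass

legacy = LegacyComponent()
print(asyncio.run(ensure_service(legacy)), legacy.served)  # → True True
print(asyncio.run(ensure_service(ModernComponent())))      # → False
```

A `getattr`-based guard like this avoids pinning the code to either API while the deployment scripts and the routers converge on one Dynamo version.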
🤖 Fix all issues with AI agents
In `@external/dynamo/.env.example`:
- Around line 16-18: The HF_TOKEN placeholder in the .env example is active and
may be accidentally sourced; change the HF_TOKEN export to be commented out
(e.g., prefix the HF_TOKEN line with a comment marker) so users must explicitly
set or export HF_TOKEN themselves; ensure HF_HOME remains exported and leave a
commented example or instruction for users to add their own HF_TOKEN instead of
exporting a dummy value.
- Around line 59-62: Update the .env.example so all variables are consistently
exported and remove unnecessary quotes: add export to variables missing it
(e.g., DYNAMO_HTTP_PORT and the other non-exported entries referenced in the
comment) so they propagate to child processes, and remove the surrounding quotes
from the image URL on the quoted line so the value follows dotenv conventions;
ensure other exported lines (those already using export) remain unchanged and
keep naming identical to existing keys.
In `@external/dynamo/build_multi_lru_image.sh`:
- Around line 193-207: The script currently calls ./container/build.sh while
shell option set -e may cause the script to exit before BUILD_EXIT_CODE is
captured; modify the invocation around the build step (the block that calls
./container/build.sh and assigns BUILD_EXIT_CODE) to temporarily disable set -e
(e.g., set +e), run ./container/build.sh with the existing flags (framework
VLLM, --target "$BUILD_TARGET", --tag "$IMAGE_TAG", --enable-kvbm, build-arg and
vllm-max-jobs, $NO_CACHE, $DRY_RUN), capture the exit status into
BUILD_EXIT_CODE, then re-enable set -e (set -e) so subsequent error handling
works reliably; alternatively implement the &&/|| pattern to set BUILD_EXIT_CODE
explicitly after the command.
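Both suggested patterns can be sketched generically; `run_build` and the `false` command below stand in for the actual `./container/build.sh` invocation and its flags:

```shell
#!/usr/bin/env bash
# Sketch of capturing a child's exit status while set -e is active.
set -euo pipefail

run_build() {
  set +e            # let the build fail without aborting the script
  "$@"
  BUILD_EXIT_CODE=$?
  set -e            # restore fail-fast for the rest of the script
}

run_build false     # 'false' simulates a failing build command
echo "build exited with ${BUILD_EXIT_CODE}"

# Equivalent &&/|| pattern without toggling set -e:
./this/does/not/exist 2>/dev/null && BUILD_EXIT_CODE=0 || BUILD_EXIT_CODE=$?
echo "alternative pattern captured ${BUILD_EXIT_CODE}"
```

Either form guarantees `BUILD_EXIT_CODE` is assigned before any subsequent error handling runs.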
- Around line 122-128: When updating an existing clone (KVBM_NEXT_DIR) the
current git sequence uses git checkout "$BRANCH" which can fail if a local
branch named BRANCH doesn't exist; replace that step with logic that ensures a
local branch is created from the remote if missing: after git fetch origin,
check if the local branch exists (e.g., git show-ref --verify --quiet
refs/heads/"$BRANCH"), and if it does git checkout "$BRANCH", otherwise create
it from origin with git checkout -b "$BRANCH" "origin/$BRANCH" (or git branch
--track "$BRANCH" "origin/$BRANCH" then checkout) before running git pull and
git submodule update --init --recursive.
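A sketch of the suggested checkout logic, with the git commands from the prompt wrapped in a hypothetical helper; the surrounding script context (KVBM_NEXT_DIR handling) is omitted:

```shell
#!/usr/bin/env bash
# Sketch: create the local branch from origin when it does not exist yet.
set -euo pipefail

checkout_branch() {
  local branch="$1"
  git fetch origin
  if git show-ref --verify --quiet "refs/heads/${branch}"; then
    git checkout "${branch}"
    git pull --ff-only
  else
    # A branch freshly created from origin/<branch> is already current,
    # so no pull is needed on this arm.
    git checkout -b "${branch}" "origin/${branch}"
  fi
  git submodule update --init --recursive
}
```

The `git show-ref --verify --quiet` test is what distinguishes "local branch exists" from "only the remote-tracking ref exists", which is the case that makes a bare `git checkout "$BRANCH"` fail.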
In `@external/dynamo/generalized/router.py`:
- Around line 1069-1071: The comment about create_service() being removed is
inconsistent with other router implementations that still call await
component.create_service(); inspect which Dynamo version the project targets
(pyproject.toml) and either (A) make removal intentional by removing
create_service() everywhere and update the comment near
runtime.namespace("dynamo").component("router") / WorkloadAwareRouter to
explicitly state “create_service removed in Dynamo >=0.8; router endpoints
register automatically” or (B) preserve compatibility by calling create_service
conditionally—e.g., check for hasattr(component, "create_service") or a
runtime/version flag and call await component.create_service() when present—so
that the behavior matches the optimized router implementation (which currently
calls create_service), keeping the comment aligned with the chosen approach.
In `@external/dynamo/optimized/processor_multilru.py`:
- Around line 507-510: Replace the logger.error call in the except block in
external/dynamo/optimized/processor_multilru.py with logger.exception so the
full stack trace is recorded when an exception is caught (keep the existing
message string and the subsequent self._metrics.router_errors_total.inc() and
return None, None intact); locate the except block that currently does
logger.error("Failed to pick worker: %s", e) and change it to
logger.exception(...) to capture the traceback for debugging.
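A minimal illustration of why `logger.exception` is preferable here: it records the traceback that `logger.error("%s", e)` discards (the `pick_worker` stub is hypothetical):

```python
import io
import logging

# Capture log output in memory so the difference is visible.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("router_demo")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

def pick_worker():
    raise RuntimeError("no workers registered")

try:
    pick_worker()
except Exception:
    # logger.exception implies exc_info=True, so the full traceback
    # (including the raising line inside pick_worker) is logged.
    logger.exception("Failed to pick worker")

log_text = stream.getvalue()
print("Traceback" in log_text, "no workers registered" in log_text)  # → True True
```

With `logger.error("Failed to pick worker: %s", e)` only the message line would appear; the failure site inside the worker-selection code would be lost.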
- Around line 389-394: The _extract_hints function's signature and docstring are
out of sync with its actual return values: update the return type annotation
from tuple[str, int, str, str] to include the fifth value (likely a bool for
use_frequency_backend), e.g. tuple[str, int, str, str, bool], and revise the
docstring "Returns:" line to list all five returned items in the correct order
with brief types/names (prefix_id, total_requests, osl, iat,
use_frequency_backend) so documentation matches the implementation.
- Around line 661-664: The fire-and-forget call using
asyncio.create_task(self._update_kve_metrics_async(kve_data)) can let the Task
be garbage-collected; add a class-level set (e.g., self._background_tasks) in
the class __init__ and store the created Task in it, e.g., task =
asyncio.create_task(self._update_kve_metrics_async(kve_data));
self._background_tasks.add(task), and ensure completed tasks are removed (attach
a done callback or await/cleanup in shutdown) so tasks are tracked and not
prematurely collected.
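The tracked-task pattern described above, sketched with a minimal stand-in class (the metrics coroutine is simulated):

```python
import asyncio

class Processor:
    """Minimal stand-in showing the tracked background-task pattern."""

    def __init__(self):
        # Strong references keep fire-and-forget tasks alive until done.
        self._background_tasks: set[asyncio.Task] = set()
        self.updates = []

    async def _update_kve_metrics_async(self, kve_data):
        await asyncio.sleep(0)  # simulate an async metrics push
        self.updates.append(kve_data)

    def schedule_kve_update(self, kve_data):
        task = asyncio.create_task(self._update_kve_metrics_async(kve_data))
        self._background_tasks.add(task)
        # Drop the reference once the task completes.
        task.add_done_callback(self._background_tasks.discard)

    async def shutdown(self):
        # Drain any in-flight updates before exiting.
        if self._background_tasks:
            await asyncio.gather(*list(self._background_tasks))

async def demo():
    p = Processor()
    p.schedule_kve_update({"cached_tokens": 128})
    await p.shutdown()
    return p.updates

print(asyncio.run(demo()))  # → [{'cached_tokens': 128}]
```

Without the set, the event loop holds only a weak reference to the task, so it can be garbage-collected mid-flight; the done callback keeps the set from growing unboundedly.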
🧹 Nitpick comments (9)
external/dynamo/demo_priority_eviction.sh (1)
53-56: Consider adding defensive handling for the metrics fetch. With `set -euo pipefail`, if the metrics endpoint doesn't contain `prefix_cache_hits_total{`, `grep` returns exit code 1 and terminates the script. Additionally, if the container isn't running or curl fails, downstream arithmetic on empty values will error.

🛠️ Proposed fix to add fallback for robustness
```diff
 get_hits() {
-    docker exec dynamo-vllm curl -s http://localhost:18081/metrics 2>/dev/null | \
-        grep "prefix_cache_hits_total{" | grep -v external | awk '{print $NF}'
+    local hits
+    hits=$(docker exec dynamo-vllm curl -s http://localhost:18081/metrics 2>/dev/null | \
+        grep "prefix_cache_hits_total{" | grep -v external | awk '{print $NF}' || true)
+    echo "${hits:-0}"
 }
```

external/dynamo/optimized/router_multilru.py (7)
79-81: Add a `noqa` comment for intentionally unused arguments. The fallback implementation intentionally ignores `tokens` and `min_overlap` to maintain interface compatibility. Add a `noqa` comment to suppress the static-analysis warning and document the intent.

Suggested fix
- async def find_matches_for_request(self, tokens: list[int], min_overlap: int) -> OverlapScores: + async def find_matches_for_request(self, tokens: list[int], min_overlap: int) -> OverlapScores: # noqa: ARG002 """Find overlap scores for each worker. Returns empty scores (round-robin fallback).""" return OverlapScores({})
171-171: Consider using `tempfile.gettempdir()` for secure temporary-directory handling. Hardcoding `/tmp` can be insecure on multi-user systems. The path is configurable via config, but the default should use the system's secure temp directory.

Suggested fix
+import tempfile + def get_builtin_defaults() -> dict[str, Any]: """Return built-in default configuration (matches config.yaml).""" return { # ... other config ... "debug": { "traces_enabled": False, - "trace_dir": "/tmp/dynamo_router_traces", + "trace_dir": os.path.join(tempfile.gettempdir(), "dynamo_router_traces"), "buffer_size": 2000, }, }
378-408: Consider extracting shared Pydantic models to avoid duplication. These models (`RouterRequest`, `RouterResponse`, `FeedbackRequest`, `FeedbackAck`) are duplicated in `external/dynamo/generalized/router.py` (lines 80-108) and `processor_multilru.py`. Consider extracting them into a shared `models.py` module to follow the DRY principle.
502-503: Log a warning when `lin_forget` is clamped to the valid range. The clamping silently modifies user-provided values. Consider logging when the value is adjusted to help with debugging.
Suggested fix
self.lin_forget = float(lints_forget) + original_forget = self.lin_forget self.lin_forget = max(1e-6, min(self.lin_forget, 0.999999)) + if self.lin_forget != original_forget: + logger.warning("lin_forget clamped from %s to %s", original_forget, self.lin_forget)
568-574: Consider logging exception details instead of just the message. The blind `Exception` catch is acceptable here since trace writing should never crash the router, but logging only `%s` loses the exception type. Use `logger.debug(..., exc_info=True)` or at least include the exception type.

Suggested fix
try: path = os.path.join(self.debug_trace_dir, "router_traces.jsonl") with open(path, "a", encoding="utf-8") as f: f.write(json.dumps(item, separators=(",", ":")) + "\n") - except Exception as e: - logger.debug("Trace write failed: %s", e) + except Exception: + logger.debug("Trace write failed", exc_info=True)
1075-1075: Prefix the unused variable with an underscore. `chosen_ctx` is unpacked but never used. Prefix it with an underscore to indicate intentional discard and suppress the linter warning.

Suggested fix
- chosen, chosen_ctx, all_ctx, raw_scores, probs = self._select_worker(worker_ids, req, metrics, scores) + chosen, _chosen_ctx, all_ctx, raw_scores, probs = self._select_worker(worker_ids, req, metrics, scores)
1257-1264: Add a return type hint to `_get_underloaded`. Per coding guidelines, public and internal APIs should have type hints. This method returns a tuple that should be annotated.
Suggested fix
- def _get_underloaded(self, metrics: dict[str, Any] | None): + def _get_underloaded(self, metrics: dict[str, Any] | None) -> tuple[int, float]: if not metrics or not metrics.get("endpoints"): wid = int(random.choice(list(self.engine_client.instance_ids()))) return wid, 0.0external/dynamo/optimized/processor_multilru.py (1)
157-157: Sort `__slots__` alphabetically for consistency. Static analysis recommends natural sorting of `__slots__`. This is a minor style convention.

Suggested fix
- __slots__ = ("prompt_tokens", "cached_tokens", "device_blocks", "host_blocks", "disk_blocks") + __slots__ = ("cached_tokens", "device_blocks", "disk_blocks", "host_blocks", "prompt_tokens")
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml (1)
103-111: ⚠️ Potential issue | 🟠 Major — Update `eval_llm` to use port 8000 for consistency.

The `eval_llm` uses port 8099 while `dynamo_llm` uses port 8000 in the same file. This is inconsistent. The `profile_rethinking_full_test.yml` config shows that `eval_llm` works correctly with port 8000, and most other eval configs keep the same port for both LLMs. Update line 106 to `base_url: http://localhost:8000/v1`.
🤖 Fix all issues with AI agents
In `@external/dynamo/optimized/processor_multilru.py`:
- Around line 742-769: Update parse_args() to declare and return an
argparse.Namespace return type and add a new CLI flag for kv_cache_block_size
(e.g., parser.add_argument("--kv-cache-block-size", type=int, default=1,
help="...")), then propagate that parsed value into register_llm where
kv_cache_block_size is currently hardcoded to 1 so the function uses
args.kv_cache_block_size; ensure register_llm signature or call sites accept the
new parameter if needed and update any references in the file (including the
other occurrence around lines 809-816) to use the parsed value.
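A sketch of the suggested flag wiring, assuming an argparse-based `parse_args`; only the new option is shown, and `register_llm` is referenced in a comment rather than implemented:

```python
import argparse

def parse_args(argv=None) -> argparse.Namespace:
    """Sketch of the suggested --kv-cache-block-size flag."""
    parser = argparse.ArgumentParser(description="processor sketch")
    parser.add_argument(
        "--kv-cache-block-size",
        type=int,
        default=1,
        help="KV cache block size forwarded to register_llm "
             "(previously hardcoded to 1).",
    )
    return parser.parse_args(argv)

args = parse_args(["--kv-cache-block-size", "16"])
# At the call site, the hardcoded value becomes:
# register_llm(..., kv_cache_block_size=args.kv_cache_block_size)
print(args.kv_cache_block_size)  # → 16
```

The default of 1 preserves the current behavior for existing deployments while letting operators tune the value without editing the source.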
In `@external/dynamo/optimized/processor.py`:
- Around line 497-500: Replace the plain error log with an exception log so the
full stack trace is captured: in the exception handler that currently calls
logger.error("Failed to pick worker: %s", e) (the block that then increments
self._metrics.router_errors_total and returns None, None), use
logger.exception(...) instead (preserving the message text) so the stack trace
is recorded when the method fails to pick a worker.
- Line 654: The create_task call that fires-and-forgets
self._update_kve_metrics_async(kve_data) can be garbage-collected; add a
persistent container (e.g., self._background_tasks: set[asyncio.Task]) in the
class __init__ and replace the bare asyncio.create_task(...) with creating the
task, adding it to self._background_tasks, and attaching a done callback to
remove the task from the set when finished; reference the existing call to
self._update_kve_metrics_async and the class initializer to locate where to add
the set and where to store/remove the task.
In `@external/dynamo/optimized/router_multilru.py`:
- Around line 1343-1345: The multilru router/processor files are inconsistent
with router.py/processor.py regarding component.create_service(); update
router_multilru.py and processor_multilru.py to match the non-multilru files by
conditionally invoking create_service when present: after obtaining component
via runtime.namespace("dynamo").component("router") (and similarly for
processor), check for the existence of create_service on the component and await
it if available (e.g., use a guard like hasattr or getattr to avoid failing on
Dynamo 0.8.x), and update/remove the outdated comment so both variants behave
consistently across Dynamo versions.
🧹 Nitpick comments (13)
external/dynamo/monitoring/scripts/kv_event_observer.py (2)
171-206: Consider validating the URL scheme for defensive coding. The `urlopen` call constructs URLs from the `host` and `metrics_port` parameters. While this is an operator-run monitoring script, validating that the constructed URL uses the `http://` scheme would be more defensive against misconfiguration.

🛡️ Optional: Add scheme validation
def _poll_metrics(self): """Background thread to poll Prometheus metrics for cache hits.""" metrics_url = f"http://{self.host}:{self.metrics_port}/metrics" + if not metrics_url.startswith(("http://", "https://")): + raise ValueError(f"Invalid metrics URL scheme: {metrics_url}") while self.running:
507-508: Consider prefixing unused signal-handler arguments. The signal-handler signature requires `(signum, frame)` arguments, but they're unused. Prefixing them with `_` improves clarity and silences linter warnings.

♻️ Optional: Prefix unused args
- signal.signal(signal.SIGINT, lambda s, f: setattr(observer, 'running', False)) - signal.signal(signal.SIGTERM, lambda s, f: setattr(observer, 'running', False)) + signal.signal(signal.SIGINT, lambda _s, _f: setattr(observer, 'running', False)) + signal.signal(signal.SIGTERM, lambda _s, _f: setattr(observer, 'running', False))external/dynamo/optimized/router.py (5)
79-81: Unused parameters in the fallback `KvIndexer.find_matches_for_request`. The `tokens` and `min_overlap` parameters are intentionally unused in this fallback stub implementation. Consider prefixing them with underscores to indicate they are deliberately ignored.

Suggested fix
- async def find_matches_for_request(self, tokens: list[int], min_overlap: int) -> OverlapScores: + async def find_matches_for_request(self, _tokens: list[int], _min_overlap: int) -> OverlapScores: """Find overlap scores for each worker. Returns empty scores (round-robin fallback).""" return OverlapScores({})
171-171: Hardcoded `/tmp` path for debug traces. Using `/tmp` as the default trace directory can be a security concern in shared environments. Consider using a user-specific temp directory or requiring explicit configuration.

Suggested fix using tempfile
+import tempfile + def get_builtin_defaults() -> dict[str, Any]: """Return built-in default configuration (matches config.yaml).""" return { ... "debug": { "traces_enabled": False, - "trace_dir": "/tmp/dynamo_router_traces", + "trace_dir": os.path.join(tempfile.gettempdir(), "dynamo_router_traces"), "buffer_size": 2000, }, }Apply similar change to the constructor default at line 473.
Also applies to: 473-473
568-573: Use `logger.debug` with exception info for trace-write failures. The current implementation catches a blind `Exception` and logs with `%s`. Consider using `logger.debug(..., exc_info=True)` to capture the full context for debugging trace-write issues.

Suggested fix
try: path = os.path.join(self.debug_trace_dir, "router_traces.jsonl") with open(path, "a", encoding="utf-8") as f: f.write(json.dumps(item, separators=(",", ":")) + "\n") - except Exception as e: - logger.debug("Trace write failed: %s", e) + except OSError: + logger.debug("Trace write failed", exc_info=True)
1073-1073: Unused variable `chosen_ctx` should be prefixed with an underscore. The unpacked `chosen_ctx` variable is not used. Prefix it with an underscore to indicate intentional discard.

Suggested fix
- chosen, chosen_ctx, all_ctx, raw_scores, probs = self._select_worker(worker_ids, req, metrics, scores) + chosen, _chosen_ctx, all_ctx, raw_scores, probs = self._select_worker(worker_ids, req, metrics, scores)
1266-1333: Add a return type annotation to `parse_args()`. Per coding guidelines, public functions require type hints on return values.
Suggested fix
-def parse_args(): +def parse_args() -> argparse.Namespace: """Parse minimal CLI arguments.external/dynamo/optimized/processor.py (1)
157-157: Sort the `__slots__` tuple for consistency. Ruff suggests sorting `__slots__` alphabetically for maintainability.

Suggested fix
- __slots__ = ("prompt_tokens", "cached_tokens", "device_blocks", "host_blocks", "disk_blocks") + __slots__ = ("cached_tokens", "device_blocks", "disk_blocks", "host_blocks", "prompt_tokens")external/dynamo/optimized/processor_multilru.py (1)
157-157: Sort the `__slots__` tuple for consistency. Same issue as in processor.py; consider sorting alphabetically.
Suggested fix
- __slots__ = ("prompt_tokens", "cached_tokens", "device_blocks", "host_blocks", "disk_blocks") + __slots__ = ("cached_tokens", "device_blocks", "disk_blocks", "host_blocks", "prompt_tokens")external/dynamo/optimized/router_multilru.py (4)
79-81: Unused parameters in the fallback `KvIndexer.find_matches_for_request`. Same issue as router.py; prefix the unused parameters with underscores.
Suggested fix
- async def find_matches_for_request(self, tokens: list[int], min_overlap: int) -> OverlapScores: + async def find_matches_for_request(self, _tokens: list[int], _min_overlap: int) -> OverlapScores: """Find overlap scores for each worker. Returns empty scores (round-robin fallback).""" return OverlapScores({})
171-171: Hardcoded `/tmp` path for debug traces. Same security concern as router.py; consider using `tempfile.gettempdir()` or requiring explicit configuration.

Also applies to: 473-473, 1388-1388
1073-1073: Unused variable `chosen_ctx` should be prefixed with an underscore. Same issue as router.py.
Suggested fix
- chosen, chosen_ctx, all_ctx, raw_scores, probs = self._select_worker(worker_ids, req, metrics, scores) + chosen, _chosen_ctx, all_ctx, raw_scores, probs = self._select_worker(worker_ids, req, metrics, scores)
1266-1333: Add a return type annotation to `parse_args()`. Same issue as router.py; public functions require return type hints.
Suggested fix
-def parse_args(): +def parse_args() -> argparse.Namespace: """Parse minimal CLI arguments.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@external/dynamo/monitoring/README.md`:
- Line 3: Replace the inanimate possessive phrase "ai-dynamo's Prometheus API"
in the README with a non-possessive form such as "the ai-dynamo Prometheus API"
or "the Prometheus API from ai-dynamo" so the sentence reads e.g. "Metrics are
collected at 2-second resolution directly from the ai-dynamo Prometheus API for
per-request granularity."; locate the exact phrase "ai-dynamo's Prometheus API"
in the README and update it to one of the suggested alternatives.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@external/dynamo/E2E_SEQUENCE.md`:
- Around line 329-371: The stub and usage comment disagree on
OverlapScores.scores: update the stub and comment so they match the Rust binding
— decide whether OverlapScores.scores returns raw counts (Dict[int, int]) or
normalized ratios (Dict[int, float]); then change the type annotation on
OverlapScores.scores and the Router usage comment to the chosen representation,
and add a short note pointing to KvIndexer.find_matches_for_request (or
find_matches) as the place where normalization/float() casting occurs so callers
know whether they must cast or if values are already normalized.
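If the raw-count representation is chosen, the stub and the call-site normalization could look like this hypothetical sketch (the real Rust binding is not reproduced; `normalized` is an illustrative helper):

```python
from dataclasses import dataclass

# Hypothetical stub assuming the raw-count representation: scores maps
# worker_id -> matched block count (int); callers normalize explicitly.
@dataclass
class OverlapScores:
    scores: dict[int, int]

def normalized(overlap: OverlapScores, total_blocks: int) -> dict[int, float]:
    """Convert raw matched-block counts into hit ratios at the call site."""
    if total_blocks <= 0:
        return {wid: 0.0 for wid in overlap.scores}
    return {wid: count / total_blocks
            for wid, count in overlap.scores.items()}

ratios = normalized(OverlapScores({0: 3, 1: 1}), total_blocks=4)
print(ratios)  # → {0: 0.75, 1: 0.25}
```

Under the alternative (already-normalized) representation, `scores` would be `dict[int, float]` and this helper would disappear; the doc just needs to commit to one of the two.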
- Line 165: The sequence diagram's RouterResponse field name is inconsistent
with the JSON example and router.py; pick one canonical name and make them
match: either change the sequence diagram label RouterResponse to use
"prefix_hit_rate" to match router.py and the JSON example, or change the JSON
example and router.py usage to use "overlap" instead — update all occurrences
(sequence diagram arrow label, the JSON example block, and the RouterResponse
handling in router.py) so RouterResponse consistently uses the same field name
("prefix_hit_rate" or "overlap") across the diagram, JSON example, and code.
- Around line 7-51: The markdown has two fenced code blocks missing language
identifiers (the ASCII-art architecture diagram and the raw HTTP headers block),
which triggers MD040; update the first fenced block containing the NeMo Agent
Toolkit ASCII diagram to use ```text and update the second block showing example
headers (the x-prefix-id/x-prefix-total-requests snippet) to use ```http (or
```text) so both code fences include a language specifier; locate the blocks by
the ASCII art labeled "NeMo Agent Toolkit / DynamoModelConfig" and the raw
headers section that begins with "x-prefix-id" and add the language tokens to
the opening backticks.
- Around line 1-3: Add the standard Apache License 2.0 header comment to the
very top of E2E_SEQUENCE.md (before any content); include the SPDX short
identifier "SPDX-License-Identifier: Apache-2.0" and the usual Apache 2.0
boilerplate header (copyright notice placeholder and permission/limitation
lines) wrapped as a top-of-file comment appropriate for Markdown (e.g., an HTML
comment block <!-- ... -->) so the file is explicitly licensed under Apache-2.0.
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Force-pushed from 6421ddf to 556c7bb
Signed-off-by: bbednarski9 <bbednarski@nvidia.com>
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
…or agent_hints validation Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
/ok to test 2fab656
Force-pushed from d95f130 to 8b8b3c6
Salonijain27
left a comment
Approved from a dependency point of view
Signed-off-by: bbednarski9 <bbednarski@nvidia.com>
/ok to test 2ee1229
Signed-off-by: bbednarski9 <bbednarski@nvidia.com>
/merge
…dashboard (NVIDIA#1486) This PR brings the ai-dynamo prometheus metrics API integration with grafana dash boarding into NAT. This is a core feature of the dynamo optimization pipeline for the NAT/dynamo integration. We also integrate the prometheus API with the NAT profiler, so that we can begin end-to-end agentic header-injection optimization. Closes ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Built-in Prometheus/Grafana monitoring, dashboards and a metrics collector; enhanced Dynamo metrics surfaced to the profiler. * Optimized Thompson Sampling routing and processor with observability and multi-worker support. * Multi-worker orchestration, MultiLRU tooling, KV-event observer, demo and image-build scripts, and new deployment start/stop helpers. * **Documentation** * Extensive monitoring, architecture, parameters, and end-to-end deployment guides and diagrams. * **Chores** * Default HTTP port changed from 8099 to 8000; configs and examples updated. Authors: - Bryan Bednarski (https://github.com/bbednarski9) - Dhruv Nandakumar (https://github.com/dnandakumar-nv) Approvers: - Dhruv Nandakumar (https://github.com/dnandakumar-nv) - https://github.com/Salonijain27 URL: NVIDIA#1486
Description
This PR brings the ai-dynamo Prometheus metrics API integration with Grafana dashboarding into NAT. This is a core feature of the dynamo optimization pipeline for the NAT/dynamo integration. We also integrate the Prometheus API with the NAT profiler, so that we can begin end-to-end agentic header-injection optimization.
Closes
By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Documentation
Chores