
feat(inference): WIP inference lifecycle prototype — discover, validate, register, serve#302

Closed
bussyjd wants to merge 1 commit into main from feat/inference-lifecycle

Conversation

@bussyjd
Collaborator

@bussyjd bussyjd commented Mar 29, 2026

Draft PR for issue #300 — Hermes Agent automated inference lifecycle.

What's here

1,942 lines of Python prototype across 8 files in internal/inference/lifecycle/:

File                          Lines  Purpose
registry.py                     253  SQLite model registry with state machine
hardware.py                     188  GPU/system auto-detection, optimal ngl
discover.py                     218  x/LocalLLaMA signal sourcing via x-cli
validate.py                     347  llama-bench + ToolCall-15 eval pipeline
serve.py                        316  systemd service mgmt, hot-swap + rollback
api.py                          272  FastAPI routes for all lifecycle endpoints
test_inference_lifecycle.py     348  Unit tests for all modules
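The "optimal ngl" calculation in hardware.py isn't visible in this view; as a rough illustration of what such a heuristic typically does (every name, signature, and constant below is an assumption, not the actual module):

```python
# Hypothetical sketch of an "optimal ngl" heuristic: offload as many
# transformer layers as fit in free VRAM, reserving headroom for the
# KV cache. Not the real hardware.py implementation.

def optimal_ngl(free_vram_gb: float, n_layers: int, model_size_gb: float,
                kv_cache_gb: float = 1.0) -> int:
    """Estimate how many layers (-ngl) to offload to the GPU."""
    if model_size_gb <= 0 or n_layers <= 0:
        return 0
    usable = free_vram_gb - kv_cache_gb      # keep headroom for KV cache
    per_layer = model_size_gb / n_layers     # rough per-layer footprint
    ngl = int(usable // per_layer)
    return max(0, min(ngl, n_layers))        # clamp to [0, n_layers]

print(optimal_ngl(free_vram_gb=16.0, n_layers=48, model_size_gb=22.0))
```

The clamp matters at both ends: a card with more VRAM than the model needs simply offloads all layers, and a card with too little falls back to CPU-only (ngl = 0) rather than going negative.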

Context

Built during prototyping session where we:

  • Set up llama.cpp + TurboQuant on SilverMesh (AMD RX 6800 XT)
  • Benchmarked Qwen3.5-27B Q6_K → 15/15 on ToolCall-15
  • Validated x-cli for community signal sourcing
  • Tested llama-server as systemd service with auto-restart
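The "15/15 on ToolCall-15" result above is the kind of pass/fail gate validate.py presumably applies before a model can be promoted. A minimal sketch, assuming a throughput floor plus a perfect tool-call score (thresholds, field names, and the gate itself are illustrative, not the actual pipeline):

```python
# Hypothetical validation gate: promote a model only if it clears a
# tokens/sec floor (from llama-bench) AND a perfect ToolCall-15 score.

from dataclasses import dataclass

@dataclass
class BenchResult:
    tokens_per_sec: float
    toolcall_score: int      # out of toolcall_total
    toolcall_total: int = 15

def passes_gate(r: BenchResult, min_tps: float = 10.0) -> bool:
    """Both criteria must hold; a single failed tool call rejects the model."""
    return r.tokens_per_sec >= min_tps and r.toolcall_score == r.toolcall_total

print(passes_gate(BenchResult(tokens_per_sec=24.3, toolcall_score=15)))  # True
print(passes_gate(BenchResult(tokens_per_sec=24.3, toolcall_score=14)))  # False
```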

Status

WIP — needs review, integration with obol-stack ServiceOffer CRD, and x402 payment wiring from PR #288.

Refs: #300

…te, register, serve

Implements issue #300 prototype with Python modules:
- registry.py: SQLite-backed model registry with state machine
- hardware.py: GPU/system auto-detection, optimal ngl calculation
- discover.py: x/LocalLLaMA signal sourcing via x-cli
- validate.py: llama-bench + ToolCall-15 evaluation pipeline
- serve.py: systemd service management with hot-swap + rollback
- api.py: FastAPI routes for all lifecycle endpoints
- tests: unit tests for registry, hardware, discovery, validation

1,942 lines total. All new files under internal/inference/lifecycle/.
No existing Go files modified.

Refs: #300
]
removed = len(candidates) - len(filtered)
if removed:
logger.info("Filtered out %d candidates exceeding %.1f GB", removed, max_size_gb)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit, or contact support if the problem persists.
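No patch was generated for this discover.py alert, but a manual fix in the same spirit as the suggestions further down is straightforward: make sure only locally computed numbers reach the log sink. A self-contained sketch (the candidate list and variable names are stand-ins for the real objects):

```python
# Hedged sketch of a manual fix for the discover.py log-injection alert:
# coerce the interpolated values to plain numerics so no attacker-
# controlled string can forge extra log lines.

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("discover")

# Stand-ins for the real candidate objects (illustrative only)
candidates = ["model-a", "model-b", "model-c"]
filtered = ["model-a"]
max_size_gb = 20.0

removed = len(candidates) - len(filtered)
if removed:
    # int()/float() guarantee the %d / %.1f substitutions are pure
    # numbers and therefore cannot contain \r or \n
    logger.info("Filtered out %d candidates exceeding %.1f GB",
                int(removed), float(max_size_gb))
```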

conn.execute(f"UPDATE models SET {set_clause} WHERE id = ?",
list(updates.values()) + [model_id])
conn.commit()
logger.info("Model %s: %s -> %s", model_id, current.status.value, new_status.value)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value. (Three alert variants trace to this sink.)

Copilot Autofix

AI 1 day ago

In general, to fix log injection you should sanitize any user‑controlled strings before logging: strip or replace newline and carriage‑return characters (and optionally other control characters), and avoid logging raw request bodies or other large, untrusted text without normalization. For structured logs that end up as plain text, removing \r and \n is typically sufficient to prevent forging extra log lines.

For this specific case, the best minimal fix is to sanitize model_id before logging it in Registry.update_status. The model_id value can be arbitrarily controlled via the promote_model and retire_model endpoints, and is logged at line 163. We can defensively normalize it right before logging by replacing any \r and \n characters with safe placeholders (e.g., a space or empty string) in a local variable used only for logging. This keeps the database operations and functional behavior unchanged, because all real uses of model_id (SQL queries, state transitions) continue to use the original string; only the log message sees the sanitized version.

Concretely in internal/inference/lifecycle/registry.py, inside update_status, we will:

  • Introduce a local variable safe_model_id right before the logger.info call.
  • Set safe_model_id to a sanitized version of model_id, e.g. model_id.replace("\r", "").replace("\n", "").
  • Change the logger.info call to log safe_model_id instead of model_id.

No additional imports are needed; we rely only on standard string methods. This single change addresses all three CodeQL alert variants because they all trace to the same logging sink.


Suggested changeset 1
internal/inference/lifecycle/registry.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/internal/inference/lifecycle/registry.py b/internal/inference/lifecycle/registry.py
--- a/internal/inference/lifecycle/registry.py
+++ b/internal/inference/lifecycle/registry.py
@@ -160,7 +160,8 @@
         conn.execute(f"UPDATE models SET {set_clause} WHERE id = ?",
                      list(updates.values()) + [model_id])
         conn.commit()
-        logger.info("Model %s: %s -> %s", model_id, current.status.value, new_status.value)
+        safe_model_id = model_id.replace("\r", "").replace("\n", "")
+        logger.info("Model %s: %s -> %s", safe_model_id, current.status.value, new_status.value)
         return self.get_model(model_id)
 
     def update_benchmark(self, model_id: str, eval_suite: str, score: float,
EOF
Copilot is powered by AI and may make mistakes. Always verify output.

# Update registry
registry.promote(model_id)
logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value.

Copilot Autofix

AI 1 day ago

In general, to fix log injection issues, sanitize any user-controlled strings before logging them, especially removing or neutralizing newline and carriage-return characters (and optionally other control characters). When using structured/log-framework formatting (like %s in logger.info), keep that pattern but pass in a sanitized version of the untrusted value.

For this specific case, the best targeted fix is to sanitize model_id before it is logged in start_serving. Since we cannot assume how Registry constrains IDs and we do not want to change functional behavior, we will introduce a small helper function in serve.py that strips \r and \n (and, for completeness, any ASCII control characters) from strings used in logs. We then apply this helper when logging model IDs in start_serving and hot_swap. This preserves all existing functionality (the registry operations use the original model_id), while ensuring that the string sent to the logger cannot inject extra log lines.

Concretely:

  • In internal/inference/lifecycle/serve.py, define a helper _sanitize_for_log(value: str) -> str near the top of the file.
  • Replace logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port) with the same log message but passing sanitized versions of model_id and model.name to be conservative.
  • Replace logger.error("Hot swap to %s failed: %s", new_model_id, e) and logger.info("Rolling back to %s", old_model_id) similarly, so that any logged model IDs are sanitized.
  • No changes are required in api.py since it does not log the tainted data directly.

No new imports are necessary; the helper can be implemented with basic string methods and a simple comprehension.

Suggested changeset 1
internal/inference/lifecycle/serve.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/internal/inference/lifecycle/serve.py b/internal/inference/lifecycle/serve.py
--- a/internal/inference/lifecycle/serve.py
+++ b/internal/inference/lifecycle/serve.py
@@ -13,6 +13,20 @@
 
 logger = logging.getLogger(__name__)
 
+
+def _sanitize_for_log(value: Optional[str]) -> Optional[str]:
+    """Sanitize a string for safe logging by removing control characters.
+
+    This helps prevent log injection via embedded newlines or other
+    non-printable characters while preserving the original value for
+    non-logging use.
+    """
+    if value is None:
+        return None
+    # Remove ASCII control characters (0x00-0x1F and 0x7F)
+    return "".join(ch for ch in value if 32 <= ord(ch) < 127)
+
+
 LLAMA_SERVER_BINARY = Path.home() / "Development" / "llama-cpp-turboquant-cuda" / "build" / "bin" / "llama-server"
 SERVICE_NAME = "obol-inference"
 DEFAULT_PORT = 8080
@@ -194,7 +208,12 @@
 
     # Update registry
     registry.promote(model_id)
-    logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)
+    logger.info(
+        "Model %s (%s) now serving on port %d",
+        _sanitize_for_log(model_id),
+        _sanitize_for_log(model.name),
+        port,
+    )
 
     return get_serving_status(registry)
 
@@ -244,9 +263,9 @@
     try:
         return start_serving(new_model_id, registry, port)
     except RuntimeError as e:
-        logger.error("Hot swap to %s failed: %s", new_model_id, e)
+        logger.error("Hot swap to %s failed: %s", _sanitize_for_log(new_model_id), e)
         if rollback_on_failure and old_model_id:
-            logger.info("Rolling back to %s", old_model_id)
+            logger.info("Rolling back to %s", _sanitize_for_log(old_model_id))
             try:
                 return start_serving(old_model_id, registry, port)
             except RuntimeError as rollback_err:
EOF

# Update registry
registry.promote(model_id)
logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit, or contact support if the problem persists.

try:
return start_serving(new_model_id, registry, port)
except RuntimeError as e:
logger.error("Hot swap to %s failed: %s", new_model_id, e)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value.

Copilot Autofix

AI 1 day ago

In general, to fix log injection when logging user-controlled values, sanitize or normalize those values before logging. For plain-text logs, the most important step is to remove or neutralize newline and carriage-return characters (and optionally other non-printable control characters) so that a single log call cannot create multiple apparent log entries or otherwise corrupt log structure.

For this specific case, the best minimal fix without changing behavior is to ensure that any model_id value logged by hot_swap is sanitized first. We can achieve this by introducing a small helper function within internal/inference/lifecycle/serve.py that strips \r and \n from a string (and safely handles None and non-strings), and then using that helper when passing new_model_id and old_model_id into log calls. This keeps the functional behavior of the system intact while ensuring that any untrusted input appearing in the logs cannot inject extra log lines.

Concretely:

  • Add a helper such as _sanitize_for_log(value: Optional[str]) -> str near the top of serve.py (after logger definition, for example). It should:
    • Return "" for None or non-string values.
    • Replace \r and \n with empty strings (or space).
  • Update logger.error("Hot swap to %s failed: %s", new_model_id, e) to use the sanitized value: logger.error("Hot swap to %s failed: %s", _sanitize_for_log(new_model_id), e).
  • Optionally, for completeness and consistency, also sanitize old_model_id when logging “Rolling back to %s”.

No new external dependencies are required; we only use standard Python string methods.


Suggested changeset 1
internal/inference/lifecycle/serve.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/internal/inference/lifecycle/serve.py b/internal/inference/lifecycle/serve.py
--- a/internal/inference/lifecycle/serve.py
+++ b/internal/inference/lifecycle/serve.py
@@ -13,6 +13,17 @@
 
 logger = logging.getLogger(__name__)
 
+
+def _sanitize_for_log(value: Optional[str]) -> str:
+    """Sanitize potentially untrusted values before logging.
+
+    Removes newline and carriage-return characters to prevent log injection.
+    """
+    if not isinstance(value, str):
+        return ""
+    return value.replace("\r", "").replace("\n", "")
+
+
 LLAMA_SERVER_BINARY = Path.home() / "Development" / "llama-cpp-turboquant-cuda" / "build" / "bin" / "llama-server"
 SERVICE_NAME = "obol-inference"
 DEFAULT_PORT = 8080
@@ -244,9 +255,13 @@
     try:
         return start_serving(new_model_id, registry, port)
     except RuntimeError as e:
-        logger.error("Hot swap to %s failed: %s", new_model_id, e)
+        logger.error(
+            "Hot swap to %s failed: %s",
+            _sanitize_for_log(new_model_id),
+            e,
+        )
         if rollback_on_failure and old_model_id:
-            logger.info("Rolling back to %s", old_model_id)
+            logger.info("Rolling back to %s", _sanitize_for_log(old_model_id))
             try:
                 return start_serving(old_model_id, registry, port)
             except RuntimeError as rollback_err:
EOF
@bussyjd bussyjd closed this Mar 30, 2026
@bussyjd bussyjd deleted the feat/inference-lifecycle branch March 30, 2026 03:58
