
feat(inference): WIP inference lifecycle prototype — discover, validate, register, serve#302

Closed
bussyjd wants to merge 1 commit into main from feat/inference-lifecycle

Conversation

@bussyjd
Collaborator

@bussyjd bussyjd commented Mar 29, 2026

Draft PR for issue #300 — Hermes Agent automated inference lifecycle.

What's here

1,942 lines of Python prototype across 8 files in internal/inference/lifecycle/:

File                          Lines  Purpose
registry.py                     253  SQLite model registry with state machine
hardware.py                     188  GPU/system auto-detection, optimal ngl
discover.py                     218  x/LocalLLaMA signal sourcing via x-cli
validate.py                     347  llama-bench + ToolCall-15 eval pipeline
serve.py                        316  systemd service mgmt, hot-swap + rollback
api.py                          272  FastAPI routes for all lifecycle endpoints
test_inference_lifecycle.py     348  Unit tests for all modules
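The "optimal ngl" calculation in hardware.py isn't visible in this view; as a rough illustration of what such a heuristic typically does (every name, signature, and constant below is an assumption, not the actual module):

```python
# Hypothetical sketch of an "optimal ngl" heuristic: offload as many
# transformer layers as fit in free VRAM, reserving headroom for the
# KV cache. Not the real hardware.py implementation.

def optimal_ngl(free_vram_gb: float, n_layers: int, model_size_gb: float,
                kv_cache_gb: float = 1.0) -> int:
    """Estimate how many layers (-ngl) to offload to the GPU."""
    if model_size_gb <= 0 or n_layers <= 0:
        return 0
    usable = free_vram_gb - kv_cache_gb      # keep headroom for KV cache
    per_layer = model_size_gb / n_layers     # rough per-layer footprint
    ngl = int(usable // per_layer)
    return max(0, min(ngl, n_layers))        # clamp to [0, n_layers]

print(optimal_ngl(free_vram_gb=16.0, n_layers=48, model_size_gb=22.0))
```

The clamp matters at both ends: a card with more VRAM than the model needs simply offloads all layers, and a card with too little falls back to CPU-only (ngl = 0) rather than going negative.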

Context

Built during prototyping session where we:

  • Set up llama.cpp + TurboQuant on SilverMesh (AMD RX 6800 XT)
  • Benchmarked Qwen3.5-27B Q6_K → 15/15 on ToolCall-15
  • Validated x-cli for community signal sourcing
  • Tested llama-server as systemd service with auto-restart
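The "15/15 on ToolCall-15" result above is the kind of pass/fail gate validate.py presumably applies before a model can be promoted. A minimal sketch, assuming a throughput floor plus a perfect tool-call score (thresholds, field names, and the gate itself are illustrative, not the actual pipeline):

```python
# Hypothetical validation gate: promote a model only if it clears a
# tokens/sec floor (from llama-bench) AND a perfect ToolCall-15 score.

from dataclasses import dataclass

@dataclass
class BenchResult:
    tokens_per_sec: float
    toolcall_score: int      # out of toolcall_total
    toolcall_total: int = 15

def passes_gate(r: BenchResult, min_tps: float = 10.0) -> bool:
    """Both criteria must hold; a single failed tool call rejects the model."""
    return r.tokens_per_sec >= min_tps and r.toolcall_score == r.toolcall_total

print(passes_gate(BenchResult(tokens_per_sec=24.3, toolcall_score=15)))  # True
print(passes_gate(BenchResult(tokens_per_sec=24.3, toolcall_score=14)))  # False
```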

Status

WIP — needs review, integration with obol-stack ServiceOffer CRD, and x402 payment wiring from PR #288.

Refs: #300

…te, register, serve

Implements issue #300 prototype with Python modules:
- registry.py: SQLite-backed model registry with state machine
- hardware.py: GPU/system auto-detection, optimal ngl calculation
- discover.py: x/LocalLLaMA signal sourcing via x-cli
- validate.py: llama-bench + ToolCall-15 evaluation pipeline
- serve.py: systemd service management with hot-swap + rollback
- api.py: FastAPI routes for all lifecycle endpoints
- tests: unit tests for registry, hardware, discovery, validation

1,942 lines total. All new files under internal/inference/lifecycle/.
No existing Go files modified.

Refs: #300
]
removed = len(candidates) - len(filtered)
if removed:
logger.info("Filtered out %d candidates exceeding %.1f GB", removed, max_size_gb)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit, or contact support if the problem persists.
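No patch was generated for this discover.py alert, but a manual fix in the same spirit as the suggestions further down is straightforward: make sure only locally computed numbers reach the log sink. A self-contained sketch (the candidate list and variable names are stand-ins for the real objects):

```python
# Hedged sketch of a manual fix for the discover.py log-injection alert:
# coerce the interpolated values to plain numerics so no attacker-
# controlled string can forge extra log lines.

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("discover")

# Stand-ins for the real candidate objects (illustrative only)
candidates = ["model-a", "model-b", "model-c"]
filtered = ["model-a"]
max_size_gb = 20.0

removed = len(candidates) - len(filtered)
if removed:
    # int()/float() guarantee the %d / %.1f substitutions are pure
    # numbers and therefore cannot contain \r or \n
    logger.info("Filtered out %d candidates exceeding %.1f GB",
                int(removed), float(max_size_gb))
```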

conn.execute(f"UPDATE models SET {set_clause} WHERE id = ?",
list(updates.values()) + [model_id])
conn.commit()
logger.info("Model %s: %s -> %s", model_id, current.status.value, new_status.value)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value. (Three alert variants trace to this sink.)

Copilot Autofix

AI 1 day ago

In general, to fix log injection you should sanitize any user‑controlled strings before logging: strip or replace newline and carriage‑return characters (and optionally other control characters), and avoid logging raw request bodies or other large, untrusted text without normalization. For structured logs that end up as plain text, removing \r and \n is typically sufficient to prevent forging extra log lines.

For this specific case, the best minimal fix is to sanitize model_id before logging it in Registry.update_status. The model_id value can be arbitrarily controlled via the promote_model and retire_model endpoints, and is logged at line 163. We can defensively normalize it right before logging by replacing any \r and \n characters with safe placeholders (e.g., a space or empty string) in a local variable used only for logging. This keeps the database operations and functional behavior unchanged, because all real uses of model_id (SQL queries, state transitions) continue to use the original string; only the log message sees the sanitized version.

Concretely in internal/inference/lifecycle/registry.py, inside update_status, we will:

  • Introduce a local variable safe_model_id right before the logger.info call.
  • Set safe_model_id to a sanitized version of model_id, e.g. model_id.replace("\r", "").replace("\n", "").
  • Change the logger.info call to log safe_model_id instead of model_id.

No additional imports are needed; we rely only on standard string methods. This single change addresses all three CodeQL alert variants because they all trace to the same logging sink.


Suggested changeset 1
internal/inference/lifecycle/registry.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/internal/inference/lifecycle/registry.py b/internal/inference/lifecycle/registry.py
--- a/internal/inference/lifecycle/registry.py
+++ b/internal/inference/lifecycle/registry.py
@@ -160,7 +160,8 @@
         conn.execute(f"UPDATE models SET {set_clause} WHERE id = ?",
                      list(updates.values()) + [model_id])
         conn.commit()
-        logger.info("Model %s: %s -> %s", model_id, current.status.value, new_status.value)
+        safe_model_id = model_id.replace("\r", "").replace("\n", "")
+        logger.info("Model %s: %s -> %s", safe_model_id, current.status.value, new_status.value)
         return self.get_model(model_id)
 
     def update_benchmark(self, model_id: str, eval_suite: str, score: float,
EOF
Copilot is powered by AI and may make mistakes. Always verify output.

# Update registry
registry.promote(model_id)
logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value.

Copilot Autofix

AI 1 day ago

In general, to fix log injection issues, sanitize any user-controlled strings before logging them, especially removing or neutralizing newline and carriage-return characters (and optionally other control characters). When using structured/log-framework formatting (like %s in logger.info), keep that pattern but pass in a sanitized version of the untrusted value.

For this specific case, the best targeted fix is to sanitize model_id before it is logged in start_serving. Since we cannot assume how Registry constrains IDs and we do not want to change functional behavior, we will introduce a small helper function in serve.py that strips \r and \n (and, for completeness, any ASCII control characters) from strings used in logs. We then apply this helper when logging model IDs in start_serving and hot_swap. This preserves all existing functionality (the registry operations use the original model_id), while ensuring that the string sent to the logger cannot inject extra log lines.

Concretely:

  • In internal/inference/lifecycle/serve.py, define a helper _sanitize_for_log(value: str) -> str near the top of the file.
  • Replace logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port) with the same log message but passing sanitized versions of model_id and model.name to be conservative.
  • Replace logger.error("Hot swap to %s failed: %s", new_model_id, e) and logger.info("Rolling back to %s", old_model_id) similarly, so that any logged model IDs are sanitized.
  • No changes are required in api.py since it does not log the tainted data directly.

No new imports are necessary; the helper can be implemented with basic string methods and a simple comprehension.

Suggested changeset 1
internal/inference/lifecycle/serve.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/internal/inference/lifecycle/serve.py b/internal/inference/lifecycle/serve.py
--- a/internal/inference/lifecycle/serve.py
+++ b/internal/inference/lifecycle/serve.py
@@ -13,6 +13,20 @@
 
 logger = logging.getLogger(__name__)
 
+
+def _sanitize_for_log(value: Optional[str]) -> Optional[str]:
+    """Sanitize a string for safe logging by removing control characters.
+
+    This helps prevent log injection via embedded newlines or other
+    non-printable characters while preserving the original value for
+    non-logging use.
+    """
+    if value is None:
+        return None
+    # Remove ASCII control characters (0x00-0x1F and 0x7F)
+    return "".join(ch for ch in value if 32 <= ord(ch) < 127)
+
+
 LLAMA_SERVER_BINARY = Path.home() / "Development" / "llama-cpp-turboquant-cuda" / "build" / "bin" / "llama-server"
 SERVICE_NAME = "obol-inference"
 DEFAULT_PORT = 8080
@@ -194,7 +208,12 @@
 
     # Update registry
     registry.promote(model_id)
-    logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)
+    logger.info(
+        "Model %s (%s) now serving on port %d",
+        _sanitize_for_log(model_id),
+        _sanitize_for_log(model.name),
+        port,
+    )
 
     return get_serving_status(registry)
 
@@ -244,9 +263,9 @@
     try:
         return start_serving(new_model_id, registry, port)
     except RuntimeError as e:
-        logger.error("Hot swap to %s failed: %s", new_model_id, e)
+        logger.error("Hot swap to %s failed: %s", _sanitize_for_log(new_model_id), e)
         if rollback_on_failure and old_model_id:
-            logger.info("Rolling back to %s", old_model_id)
+            logger.info("Rolling back to %s", _sanitize_for_log(old_model_id))
             try:
                 return start_serving(old_model_id, registry, port)
             except RuntimeError as rollback_err:
EOF

# Update registry
registry.promote(model_id)
logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value.

Copilot Autofix

AI 1 day ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit, or contact support if the problem persists.

try:
return start_serving(new_model_id, registry, port)
except RuntimeError as e:
logger.error("Hot swap to %s failed: %s", new_model_id, e)

Check failure

Code scanning / CodeQL

Log Injection (High)

This log entry depends on a user-provided value.

Copilot Autofix

AI 1 day ago

In general, to fix log injection when logging user-controlled values, sanitize or normalize those values before logging. For plain-text logs, the most important step is to remove or neutralize newline and carriage-return characters (and optionally other non-printable control characters) so that a single log call cannot create multiple apparent log entries or otherwise corrupt log structure.

For this specific case, the best minimal fix without changing behavior is to ensure that any model_id value logged by hot_swap is sanitized first. We can achieve this by introducing a small helper function within internal/inference/lifecycle/serve.py that strips \r and \n from a string (and safely handles None and non-strings), and then using that helper when passing new_model_id and old_model_id into log calls. This keeps the functional behavior of the system intact while ensuring that any untrusted input appearing in the logs cannot inject extra log lines.

Concretely:

  • Add a helper such as _sanitize_for_log(value: Optional[str]) -> str near the top of serve.py (after logger definition, for example). It should:
    • Return "" for None or non-string values.
    • Replace \r and \n with empty strings (or space).
  • Update logger.error("Hot swap to %s failed: %s", new_model_id, e) to use the sanitized value: logger.error("Hot swap to %s failed: %s", _sanitize_for_log(new_model_id), e).
  • Optionally, for completeness and consistency, also sanitize old_model_id when logging “Rolling back to %s”.

No new external dependencies are required; we only use standard Python string methods.


Suggested changeset 1
internal/inference/lifecycle/serve.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/internal/inference/lifecycle/serve.py b/internal/inference/lifecycle/serve.py
--- a/internal/inference/lifecycle/serve.py
+++ b/internal/inference/lifecycle/serve.py
@@ -13,6 +13,17 @@
 
 logger = logging.getLogger(__name__)
 
+
+def _sanitize_for_log(value: Optional[str]) -> str:
+    """Sanitize potentially untrusted values before logging.
+
+    Removes newline and carriage-return characters to prevent log injection.
+    """
+    if not isinstance(value, str):
+        return ""
+    return value.replace("\r", "").replace("\n", "")
+
+
 LLAMA_SERVER_BINARY = Path.home() / "Development" / "llama-cpp-turboquant-cuda" / "build" / "bin" / "llama-server"
 SERVICE_NAME = "obol-inference"
 DEFAULT_PORT = 8080
@@ -244,9 +255,13 @@
     try:
         return start_serving(new_model_id, registry, port)
     except RuntimeError as e:
-        logger.error("Hot swap to %s failed: %s", new_model_id, e)
+        logger.error(
+            "Hot swap to %s failed: %s",
+            _sanitize_for_log(new_model_id),
+            e,
+        )
         if rollback_on_failure and old_model_id:
-            logger.info("Rolling back to %s", old_model_id)
+            logger.info("Rolling back to %s", _sanitize_for_log(old_model_id))
             try:
                 return start_serving(old_model_id, registry, port)
             except RuntimeError as rollback_err:
EOF
@bussyjd bussyjd closed this Mar 30, 2026
@bussyjd bussyjd deleted the feat/inference-lifecycle branch March 30, 2026 03:58
