feat(inference): WIP inference lifecycle prototype — discover, validate, register, serve #302
Conversation
…te, register, serve

Implements issue #300 prototype with Python modules:
- registry.py: SQLite-backed model registry with state machine
- hardware.py: GPU/system auto-detection, optimal ngl calculation
- discover.py: x/LocalLLaMA signal sourcing via x-cli
- validate.py: llama-bench + ToolCall-15 evaluation pipeline
- serve.py: systemd service management with hot-swap + rollback
- api.py: FastAPI routes for all lifecycle endpoints
- tests: unit tests for registry, hardware, discovery, validation

1,942 lines total. All new files under internal/inference/lifecycle/. No existing Go files modified.

Refs: #300
```python
]
removed = len(candidates) - len(filtered)
if removed:
    logger.info("Filtered out %d candidates exceeding %.1f GB", removed, max_size_gb)
```
Check failure
Code scanning / CodeQL
Log Injection (High)

Copilot Autofix (AI, 1 day ago)
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or, if the problem persists, contact support.
```python
conn.execute(f"UPDATE models SET {set_clause} WHERE id = ?",
             list(updates.values()) + [model_id])
conn.commit()
logger.info("Model %s: %s -> %s", model_id, current.status.value, new_status.value)
```
Check failure
Code scanning / CodeQL
Log Injection (High)

Copilot Autofix (AI, 1 day ago)
In general, to fix log injection you should sanitize any user‑controlled strings before logging: strip or replace newline and carriage‑return characters (and optionally other control characters), and avoid logging raw request bodies or other large, untrusted text without normalization. For structured logs that end up as plain text, removing \r and \n is typically sufficient to prevent forging extra log lines.
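As a minimal standalone sketch of this general advice (the helper name `strip_crlf` is ours, not from the PR), normalizing CR/LF before the value reaches the logger looks like:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger(__name__)


def strip_crlf(value: str) -> str:
    """Remove CR and LF so one log call always emits exactly one log line."""
    return value.replace("\r", "").replace("\n", "")


# An ID embedding a newline would otherwise forge a second log record.
tainted = "model-1\nINFO forged entry"
logger.info("Updating %s", strip_crlf(tainted))
```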
For this specific case, the best minimal fix is to sanitize model_id before logging it in Registry.update_status. The model_id value can be arbitrarily controlled via the promote_model and retire_model endpoints, and is logged at line 163. We can defensively normalize it right before logging by replacing any \r and \n characters with safe placeholders (e.g., a space or empty string) in a local variable used only for logging. This keeps the database operations and functional behavior unchanged, because all real uses of model_id (SQL queries, state transitions) continue to use the original string; only the log message sees the sanitized version.
Concretely in internal/inference/lifecycle/registry.py, inside update_status, we will:
- Introduce a local variable `safe_model_id` right before the `logger.info` call.
- Set `safe_model_id` to a sanitized version of `model_id`, e.g. `model_id.replace("\r", "").replace("\n", "")`.
- Change the `logger.info` call to log `safe_model_id` instead of `model_id`.

No additional imports are needed; we rely only on standard string methods. This single change addresses all three CodeQL alert variants because they all trace to the same logging sink.
```diff
@@ -160,7 +160,8 @@
         conn.execute(f"UPDATE models SET {set_clause} WHERE id = ?",
                      list(updates.values()) + [model_id])
         conn.commit()
-        logger.info("Model %s: %s -> %s", model_id, current.status.value, new_status.value)
+        safe_model_id = model_id.replace("\r", "").replace("\n", "")
+        logger.info("Model %s: %s -> %s", safe_model_id, current.status.value, new_status.value)
         return self.get_model(model_id)

     def update_benchmark(self, model_id: str, eval_suite: str, score: float,
```
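To see why this normalization neutralizes the alert, a self-contained check (with a hypothetical malicious `model_id` value) confirms the rendered message can no longer span multiple log lines:

```python
def sanitize(model_id: str) -> str:
    # Same CR/LF removal applied to safe_model_id in the suggestion.
    return model_id.replace("\r", "").replace("\n", "")


# A model_id crafted to fake a second status-transition record.
malicious = "m-42\nModel m-99: retired -> serving"
message = "Model %s: %s -> %s" % (sanitize(malicious), "candidate", "serving")
assert "\n" not in message  # the rendered message stays a single line
```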
```python
# Update registry
registry.promote(model_id)
logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)
```
Check failure
Code scanning / CodeQL
Log Injection (High)

Copilot Autofix (AI, 1 day ago)
In general, to fix log injection issues, sanitize any user-controlled strings before logging them, especially removing or neutralizing newline and carriage-return characters (and optionally other control characters). When using structured/log-framework formatting (like %s in logger.info), keep that pattern but pass in a sanitized version of the untrusted value.
For this specific case, the best targeted fix is to sanitize model_id before it is logged in start_serving. Since we cannot assume how Registry constrains IDs and we do not want to change functional behavior, we will introduce a small helper function in serve.py that strips \r and \n (and, for completeness, any ASCII control characters) from strings used in logs. We then apply this helper when logging model IDs in start_serving and hot_swap. This preserves all existing functionality (the registry operations use the original model_id), while ensuring that the string sent to the logger cannot inject extra log lines.
Concretely:
- In `internal/inference/lifecycle/serve.py`, define a helper `_sanitize_for_log(value: str) -> str` near the top of the file.
- Replace `logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)` with the same log message, but passing sanitized versions of `model_id` and `model.name` to be conservative.
- Replace `logger.error("Hot swap to %s failed: %s", new_model_id, e)` and `logger.info("Rolling back to %s", old_model_id)` similarly, so that any logged model IDs are sanitized.
- No changes are required in `api.py`, since it does not log the tainted data directly.

No new imports are necessary; the helper can be implemented with basic string methods and a simple comprehension.
```diff
@@ -13,6 +13,20 @@
 logger = logging.getLogger(__name__)


+def _sanitize_for_log(value: Optional[str]) -> Optional[str]:
+    """Sanitize a string for safe logging by removing control characters.
+
+    This helps prevent log injection via embedded newlines or other
+    non-printable characters while preserving the original value for
+    non-logging use.
+    """
+    if value is None:
+        return None
+    # Remove ASCII control characters (0x00-0x1F and 0x7F)
+    return "".join(ch for ch in value if 32 <= ord(ch) < 127)
+
+
 LLAMA_SERVER_BINARY = Path.home() / "Development" / "llama-cpp-turboquant-cuda" / "build" / "bin" / "llama-server"
 SERVICE_NAME = "obol-inference"
 DEFAULT_PORT = 8080
@@ -194,7 +208,12 @@
     # Update registry
     registry.promote(model_id)
-    logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)
+    logger.info(
+        "Model %s (%s) now serving on port %d",
+        _sanitize_for_log(model_id),
+        _sanitize_for_log(model.name),
+        port,
+    )

     return get_serving_status(registry)

@@ -244,9 +263,9 @@
     try:
         return start_serving(new_model_id, registry, port)
     except RuntimeError as e:
-        logger.error("Hot swap to %s failed: %s", new_model_id, e)
+        logger.error("Hot swap to %s failed: %s", _sanitize_for_log(new_model_id), e)
         if rollback_on_failure and old_model_id:
-            logger.info("Rolling back to %s", old_model_id)
+            logger.info("Rolling back to %s", _sanitize_for_log(old_model_id))
             try:
                 return start_serving(old_model_id, registry, port)
             except RuntimeError as rollback_err:
```
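The allowlist comprehension in this suggestion keeps only printable ASCII (0x20-0x7E), which also drops non-ASCII characters such as accented model names. A standalone sketch of the same filter (outside the PR's module) behaves like this:

```python
from typing import Optional


def sanitize_for_log(value: Optional[str]) -> Optional[str]:
    """Keep printable ASCII only; drops CR, LF, other control chars, and non-ASCII."""
    if value is None:
        return None
    return "".join(ch for ch in value if 32 <= ord(ch) < 127)


# Control characters are stripped; note the non-ASCII "é" is dropped too,
# a trade-off of the allowlist approach versus removing only CR/LF.
assert sanitize_for_log("qwen-7b\n\x00") == "qwen-7b"
assert sanitize_for_log("café") == "caf"
assert sanitize_for_log(None) is None
```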
```python
# Update registry
registry.promote(model_id)
logger.info("Model %s (%s) now serving on port %d", model_id, model.name, port)
```
Check failure
Code scanning / CodeQL
Log Injection (High)

Copilot Autofix (AI, 1 day ago)
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or, if the problem persists, contact support.
```python
try:
    return start_serving(new_model_id, registry, port)
except RuntimeError as e:
    logger.error("Hot swap to %s failed: %s", new_model_id, e)
```
Check failure
Code scanning / CodeQL
Log Injection (High)

Copilot Autofix (AI, 1 day ago)
In general, to fix log injection when logging user-controlled values, sanitize or normalize those values before logging. For plain-text logs, the most important step is to remove or neutralize newline and carriage-return characters (and optionally other non-printable control characters) so that a single log call cannot create multiple apparent log entries or otherwise corrupt log structure.
For this specific case, the best minimal fix without changing behavior is to ensure that any model_id value logged by hot_swap is sanitized first. We can achieve this by introducing a small helper function within internal/inference/lifecycle/serve.py that strips \r and \n from a string (and safely handles None and non-strings), and then using that helper when passing new_model_id and old_model_id into log calls. This keeps the functional behavior of the system intact while ensuring that any untrusted input appearing in the logs cannot inject extra log lines.
Concretely:
- Add a helper such as `_sanitize_for_log(value: Optional[str]) -> str` near the top of `serve.py` (after the logger definition, for example). It should:
  - Return `""` for `None` or non-string values.
  - Replace `\r` and `\n` with empty strings (or a space).
- Update `logger.error("Hot swap to %s failed: %s", new_model_id, e)` to use the sanitized value: `logger.error("Hot swap to %s failed: %s", _sanitize_for_log(new_model_id), e)`.
- Optionally, for completeness and consistency, also sanitize `old_model_id` when logging "Rolling back to %s".

No new external dependencies are required; we only use standard Python string methods.
```diff
@@ -13,6 +13,17 @@
 logger = logging.getLogger(__name__)


+def _sanitize_for_log(value: Optional[str]) -> str:
+    """Sanitize potentially untrusted values before logging.
+
+    Removes newline and carriage-return characters to prevent log injection.
+    """
+    if not isinstance(value, str):
+        return ""
+    return value.replace("\r", "").replace("\n", "")
+
+
 LLAMA_SERVER_BINARY = Path.home() / "Development" / "llama-cpp-turboquant-cuda" / "build" / "bin" / "llama-server"
 SERVICE_NAME = "obol-inference"
 DEFAULT_PORT = 8080
@@ -244,9 +255,13 @@
     try:
         return start_serving(new_model_id, registry, port)
     except RuntimeError as e:
-        logger.error("Hot swap to %s failed: %s", new_model_id, e)
+        logger.error(
+            "Hot swap to %s failed: %s",
+            _sanitize_for_log(new_model_id),
+            e,
+        )
         if rollback_on_failure and old_model_id:
-            logger.info("Rolling back to %s", old_model_id)
+            logger.info("Rolling back to %s", _sanitize_for_log(old_model_id))
             try:
                 return start_serving(old_model_id, registry, port)
             except RuntimeError as rollback_err:
```
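This second helper variant differs from the allowlist version suggested for `start_serving`: it returns `""` for non-strings and removes only CR/LF, so non-ASCII names pass through unchanged. A quick standalone comparison of its behavior:

```python
def sanitize_for_log(value) -> str:
    """Return '' for None/non-strings; otherwise drop only CR and LF."""
    if not isinstance(value, str):
        return ""
    return value.replace("\r", "").replace("\n", "")


assert sanitize_for_log(None) == ""
assert sanitize_for_log(42) == ""
assert sanitize_for_log("héllo\r\nworld") == "hélloworld"  # Unicode survives
```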
Draft PR for issue #300 — Hermes Agent automated inference lifecycle.
What's here
1,942 lines of Python prototype across 8 files in internal/inference/lifecycle/.

Context
Built during a prototyping session where we:
Status
WIP — needs review, integration with obol-stack ServiceOffer CRD, and x402 payment wiring from PR #288.
Refs: #300