Commit b68ca07

Merge pull request #2 from hotdata-dev/feat/workspace-selection-contract
Define runtime workspace selection contract
2 parents 9bb0ade + 9beb00a commit b68ca07

11 files changed

Lines changed: 563 additions & 28 deletions

CONTRACT.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# hotdata-runtime Contract

`hotdata-runtime` is the framework-agnostic runtime contract for Hotdata integrations.

## Scope

This package provides shared primitives for:

- Environment and workspace resolution
- Query execution and polling
- Normalized tabular result handling
- Basic workspace health checks

## Public Runtime Contract

The supported import surface is:

- `HotdataClient`
- `QueryResult`
- `from_env`
- `workspace_health_lines`
- `default_api_key`
- `default_host`
- `default_session_id`
- `explicit_workspace_id`
- `list_workspaces`
- `normalize_host`
- `pick_workspace`
- `resolve_workspace_selection`
- `ResultSummary`
- `RunHistoryItem`
- `WorkspaceSelection`

Adapters should import from `hotdata_runtime` and treat this surface as the stable API.

## Semantic Guarantees

### `HotdataClient`

- Represents runtime context: API key, host, workspace, optional session.
- `from_env()` resolves runtime context from env vars and selected workspace.
- `execute_sql(sql)` returns `QueryResult` or raises `RuntimeError`/`TimeoutError`.
- `get_result(result_id)` returns a ready `QueryResult` and waits for readiness when needed.
- `connections()` returns the connections API wrapper for adapter UI/status features.
- `query_runs()` returns the query-runs API wrapper for adapter history views.
- `results()` returns the results API wrapper for adapter result pickers.
- `list_recent_results(...)` returns normalized `ResultSummary` entries.
- `list_run_history(limit=...)` returns normalized `RunHistoryItem` entries.
- `list_qualified_table_names(...)` returns sorted fully qualified table names.
- `columns_for_qualified(qualified, connection_id=...)` resolves table columns, and
  adapters should pass `connection_id` when known.
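
The readiness wait described for `get_result` can be pictured as a polling loop. The sketch below is an illustration under stated assumptions, not the package's implementation: `wait_until_ready`, `fetch_status`, and the timeout defaults are hypothetical names chosen for the example. Only the status values `ready`, `failed`, and `cancelled` and the `RuntimeError`/`TimeoutError` split are taken from this commit.

```python
import time
from typing import Callable

# Failure statuses as they appear in this commit; the rest is illustrative.
RESULT_FAILURE = frozenset({"failed", "cancelled"})


def wait_until_ready(
    fetch_status: Callable[[], str],
    *,
    timeout_s: float = 5.0,
    interval_s: float = 0.01,
) -> str:
    """Poll fetch_status() until it reports "ready".

    Raises RuntimeError on a failure status and TimeoutError once the
    deadline passes. fetch_status is a hypothetical stand-in for a call
    that looks up the result status on the server.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == "ready":
            return status
        if status in RESULT_FAILURE:
            raise RuntimeError(f"Result {status}")
        time.sleep(interval_s)
    raise TimeoutError(f"Result not ready after {timeout_s}s")


# Simulated server that becomes ready on the third poll.
_statuses = iter(["pending", "processing", "ready"])
final_status = wait_until_ready(lambda: next(_statuses))
```

Using `time.monotonic()` for the deadline keeps the loop unaffected by wall-clock adjustments, which is a design choice of this sketch rather than something the contract states.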

### `QueryResult`

- Canonical tabular result model with `columns`, `rows`, and `row_count`.
- Carries server identifiers and execution metadata when available.
- `to_pandas()` converts to a DataFrame with stable column ordering.
- `to_records(max_rows=...)` returns row dicts keyed by column names.
- `metadata_dict()` returns normalized result metadata for adapter rendering.
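
To make "row dicts keyed by column names" concrete, here is a small sketch. `MiniResult` is a hypothetical stand-in written for this example and is not the real `QueryResult`; the internal row layout (a list of lists) and the truncate-then-convert order are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MiniResult:
    """Hypothetical stand-in for a tabular result; not the real QueryResult."""

    columns: list[str]
    rows: list[list[Any]]

    @property
    def row_count(self) -> int:
        return len(self.rows)

    def to_records(self, max_rows: int | None = None) -> list[dict[str, Any]]:
        # Truncate first, then pair each value with its column name.
        selected = self.rows if max_rows is None else self.rows[:max_rows]
        return [dict(zip(self.columns, row)) for row in selected]


result = MiniResult(columns=["id", "name"], rows=[[1, "a"], [2, "b"], [3, "c"]])
records = result.to_records(max_rows=2)
```

Here `records` holds the first two rows as dictionaries, while `result.row_count` still reports all three rows.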

### Env Resolution

- `default_api_key()` reads `HOTDATA_API_KEY`.
- `default_host()` reads `HOTDATA_API_URL` (default: `https://api.hotdata.dev`) and normalizes it.
- `default_session_id()` reads `HOTDATA_SANDBOX`.
- `explicit_workspace_id()` reads `HOTDATA_WORKSPACE` (workspace public id).
- `pick_workspace()` prefers explicit env workspace, then active workspace, then first workspace.
- `resolve_workspace_selection()` is the canonical workspace selection algorithm. It returns `WorkspaceSelection` with selected workspace id, selection source, and discovered workspaces when auto-selected.
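
The precedence described above (explicit, then active, then first) can be sketched as follows. This is a simplified illustration, not the package's algorithm: `Workspace`, `choose_workspace`, the field names, the source labels, and the behavior when no workspace exists are all assumptions, and the real function returns a `WorkspaceSelection` rather than a tuple.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical workspace record; field names are assumptions."""

    public_id: str
    active: bool = False


def choose_workspace(
    explicit_id: str | None,
    workspaces: list[Workspace],
) -> tuple[str, str]:
    """Return (workspace_id, source) using explicit > active > first."""
    if explicit_id:
        return explicit_id, "explicit"
    for ws in workspaces:
        if ws.active:
            return ws.public_id, "active"
    if workspaces:
        return workspaces[0].public_id, "first"
    # Assumed behavior: the contract text does not say what happens here.
    raise RuntimeError("No workspaces available")


found = [Workspace("ws_a"), Workspace("ws_b", active=True)]
from_explicit = choose_workspace("ws_env", found)
from_active = choose_workspace(None, found)
from_first = choose_workspace(None, [Workspace("ws_a")])
```

Returning the source alongside the id mirrors the contract's note that the selection result records how the workspace was chosen, which lets an adapter show why a given workspace is in use.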

## Adapter Responsibilities

Framework packages (Jupyter, Marimo, LangChain, LangGraph, LlamaIndex, Streamlit) own:

- Framework-native lifecycle and state management
- Rendering/UI concerns
- Tool/agent wrappers and callback integration

They should not duplicate runtime env/workspace/query semantics.

## Runtime Non-Goals

`hotdata-runtime` does not define framework UI primitives and does not require framework dependencies.

## Versioning Policy

- Backward-incompatible contract changes require a major version bump.
- Additive contract changes are minor versions.
- Bug fixes that preserve contract semantics are patch versions.
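
As a worked illustration of the three rules above, the sketch below maps a change type to the next version string. `Change` and `next_version` are hypothetical helpers written for this example; resetting the lower components to zero follows the usual semantic-versioning convention, which the policy text implies but does not spell out.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical classification of a contract change."""

    BREAKING = "breaking"
    ADDITIVE = "additive"
    FIX = "fix"


def next_version(current: str, change: Change) -> str:
    """Bump a MAJOR.MINOR.PATCH string according to the change type."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change is Change.BREAKING:
        return f"{major + 1}.0.0"
    if change is Change.ADDITIVE:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


after_breaking = next_version("1.4.2", Change.BREAKING)
after_additive = next_version("1.4.2", Change.ADDITIVE)
after_fix = next_version("1.4.2", Change.FIX)
```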

## Enforcement

Contract stability is enforced by tests that verify the public export surface and key behavioral invariants.
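
One way an export-surface check could look is sketched below. It is not the test shipped with the package: `export_drift` and the stand-in module are hypothetical, and the expected names are copied from the import surface listed in the contract, so the real `__all__` may contain additional entries.

```python
from __future__ import annotations

import types

# Names copied from the "Public Runtime Contract" list above.
EXPECTED_EXPORTS = frozenset({
    "HotdataClient", "QueryResult", "from_env", "workspace_health_lines",
    "default_api_key", "default_host", "default_session_id",
    "explicit_workspace_id", "list_workspaces", "normalize_host",
    "pick_workspace", "resolve_workspace_selection", "ResultSummary",
    "RunHistoryItem", "WorkspaceSelection",
})


def export_drift(module: types.ModuleType) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) names relative to EXPECTED_EXPORTS."""
    exported = set(getattr(module, "__all__", ()))
    return set(EXPECTED_EXPORTS - exported), set(exported - EXPECTED_EXPORTS)


# Stand-in module so the sketch runs without the package installed.
fake = types.ModuleType("fake_runtime")
fake.__all__ = sorted(EXPECTED_EXPORTS - {"pick_workspace"}) + ["extra_helper"]
missing, unexpected = export_drift(fake)
```

In a real test suite the stand-in would be replaced by the imported package, and the test would assert that `missing` is empty.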

README.md

Lines changed: 19 additions & 0 deletions
@@ -2,13 +2,32 @@

 Shared runtime primitives for Hotdata integrations: workspace/session semantics, execution context, query state, run history, and replayable result handles. Framework packages (Marimo, Jupyter, Streamlit, LangGraph) depend on this package.

+Runtime boundary and guarantees are defined in `CONTRACT.md`.
+
+## Features
+
+- **Environment-driven client setup** — create clients from `HOTDATA_API_KEY`, optional `HOTDATA_API_URL`, `HOTDATA_WORKSPACE`, and `HOTDATA_SANDBOX`.
+- **Workspace resolution** — choose an explicit workspace from env, otherwise discover workspaces and select the active workspace or first available workspace.
+- **Sandbox/session propagation** — pass sandbox session context through the SDK via `X-Session-Id`.
+- **HTTP resilience** — configure SDK retries for transient connection failures and retry SQL execution on stale pooled sockets.
+- **SQL execution helper** — run SQL through `POST /v1/query`, poll async query runs when needed, and return a `QueryResult`.
+- **Result utilities** — convert query results to records, pandas DataFrames, or metadata dictionaries for adapter display layers.
+- **History helpers** — list recent results and query run history with normalized dataclasses.
+- **Health helpers** — build compact API/workspace health summaries for UI integrations.
+
 Install:

 ```bash
 uv pip install hotdata-runtime
 # or: pip install hotdata-runtime
 ```

+Example:
+
+```bash
+python examples/basic_usage.py
+```
+
 Development (uses **uv**; creates `.venv/` in this repo):

 ```bash

examples/basic_usage.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
"""Basic hotdata-runtime usage."""

from hotdata_runtime import from_env


def main() -> None:
    client = from_env()
    result = client.execute_sql("SELECT 1 AS ok")

    print("result metadata:", result.metadata_dict())
    print("records:", result.to_records(max_rows=5))

    print("recent results:")
    for item in client.list_recent_results(limit=5, offset=0):
        print(item.to_dict())

    print("run history:")
    for item in client.list_run_history(limit=5):
        print(item.to_dict())

    client.close()


if __name__ == "__main__":
    main()

hotdata_runtime/__init__.py

Lines changed: 12 additions & 1 deletion
@@ -2,7 +2,12 @@

 from importlib.metadata import PackageNotFoundError, version

-from hotdata_runtime.client import HotdataClient, from_env
+from hotdata_runtime.client import (
+    HotdataClient,
+    ResultSummary,
+    RunHistoryItem,
+    from_env,
+)
 from hotdata_runtime.env import (
     default_api_key,
     default_host,
@@ -11,6 +16,8 @@
     list_workspaces,
     normalize_host,
     pick_workspace,
+    resolve_workspace_selection,
+    WorkspaceSelection,
 )
 from hotdata_runtime.health import workspace_health_lines
 from hotdata_runtime.result import QueryResult
@@ -33,4 +40,8 @@
     "list_workspaces",
     "normalize_host",
     "pick_workspace",
+    "resolve_workspace_selection",
+    "ResultSummary",
+    "RunHistoryItem",
+    "WorkspaceSelection",
 ]

hotdata_runtime/client.py

Lines changed: 102 additions & 11 deletions
@@ -1,8 +1,12 @@
 from __future__ import annotations

+from dataclasses import asdict, dataclass
 import time
 from typing import Any, Iterator

+from urllib3.exceptions import HTTPError as Urllib3HTTPError
+from urllib3.exceptions import ProtocolError
+
 from hotdata import ApiClient, Configuration
 from hotdata.api.connections_api import ConnectionsApi
 from hotdata.api.information_schema_api import InformationSchemaApi
@@ -22,9 +26,33 @@
     normalize_host,
     pick_workspace,
 )
+from hotdata_runtime.http import default_http_retries
 from hotdata_runtime.result import QueryResult

 _TERMINAL = frozenset({"succeeded", "failed", "cancelled"})
+_RESULT_FAILURE = frozenset({"failed", "cancelled"})
+
+
+@dataclass(frozen=True)
+class ResultSummary:
+    result_id: str
+    status: str
+    created_at: str | None
+
+    def to_dict(self) -> dict[str, Any]:
+        return asdict(self)
+
+
+@dataclass(frozen=True)
+class RunHistoryItem:
+    query_run_id: str
+    status: str
+    created_at: str | None
+    execution_time_ms: int | None
+    result_id: str | None
+
+    def to_dict(self) -> dict[str, Any]:
+        return asdict(self)


 class HotdataClient:
@@ -47,16 +75,15 @@ def __init__(
             api_key=api_key,
             workspace_id=workspace_id,
             session_id=session_id,
+            retries=default_http_retries(),
         )
         self._api = ApiClient(self._config)

     @classmethod
     def from_env(cls) -> HotdataClient:
         api_key = default_api_key()
         if not api_key:
-            raise RuntimeError(
-                "HOTDATA_API_KEY or HOTDATA_TOKEN must be set."
-            )
+            raise RuntimeError("HOTDATA_API_KEY must be set.")
         host = default_host()
         session = default_session_id()
         workspace_id = pick_workspace(api_key, host, session)
@@ -108,6 +135,39 @@ def query_runs(self) -> QueryRunsApi:
     def results(self) -> ResultsApi:
         return self._results_api()

+    def list_recent_results(
+        self,
+        *,
+        limit: int = 50,
+        offset: int = 0,
+    ) -> list[ResultSummary]:
+        listing = self.results().list_results(limit=limit, offset=offset)
+        return [
+            ResultSummary(
+                result_id=r.id,
+                status=r.status,
+                created_at=r.created_at,
+            )
+            for r in listing.results
+        ]
+
+    def list_run_history(
+        self,
+        *,
+        limit: int = 20,
+    ) -> list[RunHistoryItem]:
+        listing = self.query_runs().list_query_runs(limit=limit)
+        return [
+            RunHistoryItem(
+                query_run_id=r.id,
+                status=r.status,
+                created_at=r.created_at,
+                execution_time_ms=r.execution_time_ms,
+                result_id=r.result_id,
+            )
+            for r in listing.query_runs
+        ]
+
     def iter_tables(
         self,
         *,
@@ -143,9 +203,26 @@ def list_qualified_table_names(

     def connection_id_by_name(self) -> dict[str, str]:
         listing = self.connections().list_connections()
-        return {c.name: c.id for c in listing.connections}
+        id_map: dict[str, str] = {}
+        duplicate_names: set[str] = set()
+        for c in listing.connections:
+            if c.name in id_map and id_map[c.name] != c.id:
+                duplicate_names.add(c.name)
+            id_map[c.name] = c.id
+        if duplicate_names:
+            names = ", ".join(sorted(duplicate_names))
+            raise RuntimeError(
+                f"Duplicate connection names found: {names}. "
+                "Use an explicit connection_id."
+            )
+        return id_map

-    def columns_for_qualified(self, qualified: str) -> list[TableInfo]:
+    def columns_for_qualified(
+        self,
+        qualified: str,
+        *,
+        connection_id: str | None = None,
+    ) -> list[TableInfo]:
         parts = qualified.split(".")
         if len(parts) < 3:
             raise ValueError(
@@ -156,10 +233,12 @@ def columns_for_qualified(self, qualified: str) -> list[TableInfo]:
             parts[1],
             ".".join(parts[2:]),
         )
-        id_map = self.connection_id_by_name()
-        conn_id = id_map.get(conn_name)
-        if not conn_id:
-            raise KeyError(f"Unknown connection {conn_name!r}")
+        conn_id = connection_id
+        if conn_id is None:
+            id_map = self.connection_id_by_name()
+            conn_id = id_map.get(conn_name)
+            if not conn_id:
+                raise KeyError(f"Unknown connection {conn_name!r}")
         resp = self._information_schema().information_schema(
             connection_id=conn_id,
             var_schema=schema_name,
@@ -206,9 +285,9 @@ def _wait_result_ready(
             last = results.get_result(result_id)
             if last.status == "ready":
                 return last
-            if last.status == "failed":
+            if last.status in _RESULT_FAILURE:
                 raise RuntimeError(
-                    last.error_message or "Result persistence failed"
+                    last.error_message or f"Result {last.status}"
                 )
             time.sleep(interval_s)
         raise TimeoutError(
@@ -217,6 +296,18 @@ def _wait_result_ready(
         )

     def execute_sql(self, sql: str) -> QueryResult:
+        last_err: BaseException | None = None
+        for attempt in range(3):
+            try:
+                return self._execute_sql_once(sql)
+            except (ProtocolError, ConnectionResetError, Urllib3HTTPError) as e:
+                last_err = e
+                if attempt == 2:
+                    raise
+                time.sleep(0.2 * (2**attempt))
+        raise last_err # pragma: no cover
+
+    def _execute_sql_once(self, sql: str) -> QueryResult:
         q = self._query_api()
         try:
             raw = q.query(QueryRequest(sql=sql))
