hotdata-dev · eddietejeda · May 7, 2026
diff --git a/README.md b/README.md
@@ -1,163 +1,115 @@
 # ibis-hotdata
 
-Experimental [Ibis](https://ibis-project.org/) backend for [Hotdata](https://www.hotdata.dev/docs/api-reference)—federated, Postgres-compatible SQL executed over HTTPS.
+Experimental [Ibis](https://ibis-project.org/) backend for [Hotdata](https://www.hotdata.dev/docs/api-reference): compile expressions with Ibis, run federated SQL over the Hotdata API. REST calls use the official **[hotdata](https://github.com/hotdata-dev/sdk-python)** Python SDK. Repo examples use **httpx** (listed under the **dev** dependency group).
 
-Hotdata exposes `POST /v1/query`, optional asynchronous execution (`202` + `GET /v1/query-runs/{id}` + `GET /v1/results/{id}`), and catalog metadata via `GET /v1/information_schema`. This package forwards compiled Ibis SQL through those endpoints.
+**Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.1.
 
 ## Install
 
-**From PyPI** (pick your installer):
-
 ```bash
 uv pip install ibis-hotdata
-# or
-python -m pip install ibis-hotdata
+# or: python -m pip install ibis-hotdata
 ```
 
-Use Python **3.10+**. This package pins **`ibis-framework>=10,<11`** to match the Ibis major line.
-
 ## Connect
 
+Programmatic API:
+
 ```python
 import ibis
 
 con = ibis.hotdata.connect(
     api_url="https://api.hotdata.dev",
     token="YOUR_API_TOKEN",
     workspace_id="ws_…",
-    session_id=None,  # optional sandbox: X-Session-Id
+    session_id=None,       # optional: X-Session-Id (sandbox)
     verify_ssl=True,
     timeout=120.0,
-    default_connection=None,  # Hotdata connection id (Ibis “catalog”); see below
-    default_schema=None,        # remote schema name (Ibis “database”)
-    prefer_async=False,         # set True to prefer async query submission
+    default_connection=None,  # Hotdata connection id → Ibis catalog
+    default_schema=None,      # remote schema → Ibis database
+    prefer_async=False,
 )
 ```
 
-### URL form
+URL style (token may live in the query string or the URL “password” segment):
 
 ```python
 con = ibis.connect(
     "hotdata://api.hotdata.dev/?token=…&workspace_id=ws_…&verify_ssl=true"
 )
 ```
 
-The host becomes `https://{host}` (plus any path on the URL). You may place the token in the password segment (`hotdata://x:TOKEN@host/…`) instead of the query string.
-
-After `pip install`, both `ibis.hotdata.connect(...)` and `ibis.connect("hotdata://…")` resolve to this backend via the `ibis.backends` entry point.
-
-## Headers and sessions
-
-Per the [Hotdata API](https://www.hotdata.dev/docs/api-reference), the client sends:
-
-- `Authorization: Bearer <token>`
-- `X-Workspace-Id: <workspace_public_id>`
-- optionally `X-Session-Id: <sandbox_public_id>` when `session_id` is set.
-
-## Ibis identifiers vs Hotdata hierarchy
-
-Following Ibis terminology ([catalog → database → table](https://ibis-project.org/concepts/backend-table-hierarchy.qmd)), this backend maps:
-
-| Ibis surface | Hotdata meaning |
-|-------------|----------------|
-| **Catalog** | Connection **id** from `GET /v1/connections` (same identifier as `connection` on `information_schema` rows). |
-| **Database** | Remote **schema name** surfaced by Hotdata. |
-| **Table name** | Remote table name. |
-
-Typical federated references in SQL are `connection.schema.table` (quoted as needed):
-
-```python
-orders = con.table("orders", database=("conn_abc", "public"))
-```
-
-If the workspace exposes **exactly one** connection and **one** schema discovered for it, defaults are inferred; otherwise provide `default_connection` / `default_schema` when connecting.
-
-## SQL dialect and compilation
-
-The backend reuses Ibis’s **PostgreSQL SQLGlot compiler** (`postgres` dialect) so expressions compile to Postgres-oriented SQL aligned with Hotdata’s documented Postgres-style surface. Operational SQL details and federation edge cases belong in the [Hotdata SQL docs](https://www.hotdata.dev/docs/sql)—this client does not re-validate server capabilities.
-
-## Query execution and async
-
-- By default queries use synchronous `POST /v1/query` with `"async": false`.
-- With `prefer_async=True`, requests use `"async": true`. The HTTP client honors `202` by polling **`GET /v1/query-runs/{id}`** until `succeeded`, then **`GET /v1/results/{id}`** until tabular payload is available.
-- You can tune `poll_interval_s` and `poll_timeout_s` on `connect()`.
-
-## Types and result materialization
-
-- **Known tables:** column types come from `information_schema` when `include_columns=true` and are parsed with the same `PostgresType` mapper Ibis uses for PostgreSQL, with graceful fallback to `string`.
-- **`con.sql(...)`:**
-  inferred via `SELECT * FROM (<your query>) AS ibis_hotdata_preview LIMIT 1`, using HTTP `columns`/`nullable` and the first JSON row shape for coarse inference (Decimals from JSON rarely round-trip cleanly; timestamps may appear as ISO strings unless the API returns richer metadata; nested structures map toward `JSON` / `Array(JSON)`).
+**Mapping:** Ibis **catalog** = Hotdata connection id; **database** = remote schema; **table** = table name. SQL references look like `connection.schema.table`. With a single connection and schema, defaults are inferred; otherwise set `default_connection` / `default_schema` or qualify `con.table(..., database=(conn_id, schema))`.
 
-Results are fetched into **pandas** by default (`execute`), matching core SQL backends. PyArrow batches follow Ibis’s `to_pyarrow` / `to_pyarrow_batches` path over the same row materialization.
+**Execution:** SQL is compiled with Ibis’s **Postgres** SQLGlot compiler. The client uses `POST /v1/query`; with `prefer_async=True` it follows `202` and polls query-run and result endpoints until rows are ready. Tuning: `poll_interval_s`, `poll_timeout_s` on `connect()`.
 
-## Out of scope (v1)
+**Types:** Column types for known tables come from Hotdata’s information schema. `con.sql(...)` types are inferred from a small preview query; see [Hotdata SQL](https://www.hotdata.dev/docs/sql) for server behavior.
 
-Table creation/DML helpers, uploads, embeddings, indexes, dataset lifecycle—these remain unimplemented unless you drive them explicitly with `.sql(...)`.
+**Not in v1:** Ibis `create_table` / `drop_table`, embeddings, indexes. **Uploads:** call `upload_file`, then `create_dataset_from_upload`, on the connection object (or use raw SQL); query datasets as `datasets.<schema>.<table>` per Hotdata.
 
 ## Development
 
-This repo uses **[uv](https://docs.astral.sh/uv/)** for environments and **`uv.lock`**.
-
 ```bash
-uv sync               # editable project + dev group (pytest, pytest-httpserver, ruff)
+uv sync --group dev   # pytest, pytest-httpserver, ruff, httpx (for examples)
 uv run pytest
 uv run ruff check src tests examples
 ```
 
-Optional Python pin:
+Lockfile CI: `uv sync --locked --group dev && uv run pytest`.
 
-```bash
-uv python pin 3.12
-uv sync
-```
+## TPC-H for the examples
+
+Examples assume something like **`tpch.tpch_sf1.customer`**. Provision TPC-H in your workspace (commonly a **DuckDB** connection, then DuckDB’s `tpch` extension and `CALL dbgen(sf = 1)` — see [DuckDB TPC-H](https://www.duckdb.org/docs/current/core_extensions/tpch.html) and [Hotdata Quick Start](https://www.hotdata.dev/docs/quick-start)). If your data lives under `main` instead, pass `--default-schema` / `--default-connection` or set `HOTDATA_DEFAULT_*` (see `examples/_helpers.py`).
+
+## Examples
 
-CI-oriented checks:
+The examples need `HOTDATA_TOKEN` and `HOTDATA_WORKSPACE_ID` set in the environment.
 
 ```bash
-uv sync --locked      # fail if uv.lock is out of date relative to pyproject.toml
-uv run pytest
+uv sync --group dev
+export HOTDATA_TOKEN=…
+export HOTDATA_WORKSPACE_ID=…
+uv run python examples/01_catalog_introspection.py
+uv run python examples/02_execute_sql.py 'SELECT COUNT(*) AS n FROM tpch.tpch_sf1.customer'
+uv run python examples/03_connect_via_url.py
+uv run python examples/04_ibis_table_workflows.py
 ```
 
-Without uv, use `pip install -e .` and install dev tools separately (`pytest`, `pytest-httpserver`, `ruff`).
+### Ibis tables → pandas DataFrames
 
-## TPC-H in Hotdata
+Calling **`.execute()`** on a table expression runs the compiled SQL on Hotdata and returns a **pandas** `DataFrame` (Ibis’s default for this backend).
 
-Hotdata does not ship TPC-H as a single “upload this file” dataset. You expose the benchmark tables through a **connection** in your workspace, then query them like any other federated tables. See [Quick Start](https://www.hotdata.dev/docs/quick-start) (workspaces and connections) and [Data Sources](https://www.hotdata.dev/docs/data-sources) (supported engines).
+Hotdata’s SQL often uses a **federated prefix** (for example `tpch.tpch_sf1`) that may not match the Ibis **catalog** string (the connection id). A reliable pattern is to start from **`con.sql("SELECT * FROM tpch.tpch_sf1.mytable", dialect="postgres")`**, then chain filters and aggregates—see **`examples/04_ibis_table_workflows.py`**.
 
-A practical approach is a **DuckDB** connection: in the [Hotdata app](https://app.hotdata.dev/), add DuckDB for your workspace, then run SQL against that connection (for example with `hotdata query '…' --workspace-id … --connection …` from the CLI) to install and generate data using DuckDB’s built-in TPC-H extension:
+When **`con.table("mytable")`** is enough (single connection/schema and names align with compiled SQL), the same operations apply:
 
-```sql
-INSTALL tpch;
-LOAD tpch;
-CALL dbgen(sf = 1);
-```
+```python
+t = con.table("customer")  # or con.table("customer", database=(conn_id, "tpch_sf1"))
 
-Details, cleanup between runs, and optional query harnesses are in the [DuckDB TPC-H extension](https://www.duckdb.org/docs/current/core_extensions/tpch.html) documentation. By default, `dbgen` creates the TPC-H tables in DuckDB’s default schema (often `main`).
+df = (
+    t.filter(t.c_mktsegment == "AUTOMOBILE")
+    .select("c_custkey", "c_name")
+    .limit(100)
+    .execute()
+)
 
-The examples in this repo assume federated names like **`tpch.tpch_sf1.customer`**: a connection whose id matches **`tpch`** (or is picked by the helper’s resolver) and a schema **`tpch_sf1`**. If your tables live in `main` instead, run the examples with `--default-schema main` and the correct `--default-connection`, or set **`HOTDATA_TPCH_RESOLVE=false`** and **`HOTDATA_DEFAULT_SCHEMA`** / **`HOTDATA_DEFAULT_CONNECTION`** (see `examples/_helpers.py`). Alternatively, create a `tpch_sf1` schema in DuckDB and move or recreate the generated tables there so the layout matches the defaults.
+by_seg = t.group_by(t.c_mktsegment).agg(n=t.count()).execute()
 
-## Examples
-
-The `examples/` directory has small CLIs that assume TPC-H defaults (**`tpch` / `tpch_sf1`**
-for REST metadata, aligning with federation SQL **`tpch.tpch_sf1.*`**). Helpers resolve the
-friendly labels to Hotdata connection ids when possible (`examples/_helpers.py`). Override
-via `--default-connection`, `--default-schema`, or **`HOTDATA_DEFAULT_*`**.
+o = con.table("orders")
+orders_with_names = (
+    t.join(o, t.c_custkey == o.o_custkey)
+    .select(t.c_name, o.o_totalprice)
+    .limit(50)
+    .execute()
+)
 
-```bash
-uv sync
-export HOTDATA_TOKEN=...
-export HOTDATA_WORKSPACE_ID=...
-uv run python examples/01_catalog_introspection.py
-uv run python examples/02_execute_sql.py 'SELECT COUNT(*) AS n FROM tpch.tpch_sf1.customer'
-uv run python examples/03_connect_via_url.py
+total = t.c_acctbal.sum().execute()
 ```
 
-See each script's docstring and `examples/_helpers.py` for flags (`--catalog`, `--schema`, `--prefer-async`, `--insecure`, …).
-
-Tests use **pytest-httpserver**; no workspace tokens are embedded in this repository.
+Other useful paths: **`.to_pyarrow()`** / **`.to_pyarrow_batches()`** for Arrow; **`con.sql("SELECT …", dialect="postgres")`** then chain the returned table expression.
 
 ## References
 
-- [Hotdata API reference](https://www.hotdata.dev/docs/api-reference)
-- [Hotdata SQL reference](https://www.hotdata.dev/docs/sql)
-- [Ibis](https://ibis-project.org/)
+- [Hotdata Python SDK](https://github.com/hotdata-dev/sdk-python)
+- [Hotdata API](https://www.hotdata.dev/docs/api-reference) · [Hotdata SQL](https://www.hotdata.dev/docs/sql)
+- [Ibis](https://ibis-project.org/) · [Ibis backend hierarchy](https://ibis-project.org/concepts/backend-table-hierarchy.qmd)
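
Review note on the **Execution** paragraph in the README above: the sketch below illustrates the kind of poll loop that `poll_interval_s` and `poll_timeout_s` would bound. It is not the backend's implementation. `poll_until_succeeded`, the `get_status` callable, and the `queued` / `running` / `failed` status strings are hypothetical; only `succeeded` appears in the earlier README text. The demonstration runs offline against a scripted status sequence.

```python
import time
from typing import Callable


def poll_until_succeeded(
    get_status: Callable[[], str],
    *,
    poll_interval_s: float = 0.5,
    poll_timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call get_status() until it returns "succeeded"; return the number of calls made."""
    deadline = clock() + poll_timeout_s
    calls = 0
    while True:
        calls += 1
        status = get_status()
        if status == "succeeded":
            return calls
        if status == "failed":  # assumed terminal status, not taken from the API docs
            raise RuntimeError("query run failed")
        if clock() >= deadline:
            raise TimeoutError(f"query run still {status!r} after {poll_timeout_s}s")
        sleep(poll_interval_s)


# Offline demonstration: a scripted status sequence stands in for HTTP calls,
# and sleep is replaced with a no-op so the example finishes immediately.
_statuses = iter(["queued", "running", "succeeded"])
calls_made = poll_until_succeeded(lambda: next(_statuses), sleep=lambda _s: None)
print(calls_made)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a real client would pass a function that fetches the query-run status.
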
diff --git a/examples/04_ibis_table_workflows.py b/examples/04_ibis_table_workflows.py
@@ -0,0 +1,70 @@
+#!/usr/bin/env python3
+"""
+Ibis table expressions on TPC-H, executed to pandas via Hotdata.
+
+Hotdata SQL often uses a short federated prefix (e.g. ``tpch.tpch_sf1``) that may not
+match the Ibis **catalog** string (connection id). Building from ``con.sql(...)`` keeps
+qualifiers aligned with working ``SELECT ... FROM tpch.tpch_sf1.*`` queries.
+
+From the repo root::
+
+    HOTDATA_TOKEN=... HOTDATA_WORKSPACE_ID=... \\
+      uv run python examples/04_ibis_table_workflows.py
+"""
+
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+_examples = Path(__file__).resolve().parent
+sys.path.insert(0, str(_examples))
+
+import ibis
+
+from _helpers import connect_kwargs, parsed_args, parser
+
+_argp = parser("Ibis table workflows → pandas (Hotdata / TPC-H).")
+_ns = parsed_args(_argp)
+con = ibis.hotdata.connect(**connect_kwargs(_ns))
+
+# Federation prefix as in ``examples/02_execute_sql.py`` (not always == Ibis catalog id).
+FED = "tpch.tpch_sf1"
+
+
+def main() -> None:
+    customer = con.sql(f"SELECT * FROM {FED}.customer", dialect="postgres")
+    orders = con.sql(f"SELECT * FROM {FED}.orders", dialect="postgres")
+
+    print("— project + limit —")
+    q1 = customer.select("c_custkey", "c_name", "c_mktsegment").limit(5)
+    print(con.compile(q1))
+    print(q1.execute(), end="\n\n")
+
+    print("— filter + limit —")
+    q2 = customer.filter(customer.c_mktsegment == "AUTOMOBILE").limit(5)
+    print(con.compile(q2))
+    print(q2.execute(), end="\n\n")
+
+    print("— group by segment —")
+    q3 = customer.group_by(customer.c_mktsegment).agg(n=customer.count())
+    print(con.compile(q3))
+    print(q3.execute(), end="\n\n")
+
+    print("— join customer to orders —")
+    q4 = (
+        customer.join(orders, customer.c_custkey == orders.o_custkey)
+        .select(customer.c_name, orders.o_totalprice, orders.o_orderkey)
+        .limit(8)
+    )
+    print(con.compile(q4))
+    print(q4.execute(), end="\n\n")
+
+    print("— scalar aggregate —")
+    expr = customer.c_acctbal.sum()
+    print(con.compile(expr))
+    print(expr.execute())
+
+
+if __name__ == "__main__":
+    main()
diff --git a/pyproject.toml b/pyproject.toml
@@ -24,7 +24,7 @@ classifiers = [
 ]
 dependencies = [
   "ibis-framework>=10.0,<11",
-  "httpx>=0.27",
+  "hotdata>=0.1.0",
   "pyarrow>=15",
   "pyarrow-hotfix>=0.6",
   "pandas>=2",
@@ -33,9 +33,10 @@ dependencies = [
 
 [dependency-groups]
 dev = [
-    "pytest>=8",
-    "pytest-httpserver>=1",
-    "ruff>=0.5",
+  "httpx>=0.27",
+  "pytest>=8",
+  "pytest-httpserver>=1",
+  "ruff>=0.5",
 ]
 
 [project.urls]

diff --git a/src/ibis_hotdata/backend.py b/src/ibis_hotdata/backend.py
@@ -176,7 +176,7 @@ def do_connect(
         timeout
             HTTP timeout in seconds (per request).
         verify_ssl
-            Passed to ``httpx`` (boolean or path to a CA bundle).
+            Passed through to the Hotdata SDK configuration (boolean or path to a CA bundle).
         default_connection
             Optional default **catalog** (Hotdata connection id). If omitted and the
             workspace exposes exactly one connection, it is chosen automatically;
@@ -270,7 +270,7 @@ def _to_catalog_db_tuple(self, table_loc: sge.Table):
         return sg_cat, sg_db
 
     def _connection_ids(self) -> list[str]:
-        data = self._http.get_json("/v1/connections")
+        data = self._http.list_connections()
         return [c["id"] for c in data["connections"]]
 
     def list_catalogs(self, *, like: str | None = None) -> list[str]:
@@ -322,7 +322,7 @@ def _iterate_information_schema(
             params["include_columns"] = include_columns
             if cursor:
                 params["cursor"] = cursor
-            chunk = self._http.get_json("/v1/information_schema", params=params)
+            chunk = self._http.get_information_schema(params)
             yield from chunk["tables"]
             if not chunk.get("has_more"):
                 break
@@ -425,8 +425,41 @@ def _fetch_from_cursor(self, cursor, schema: sch.Schema) -> pd.DataFrame:
         df = PandasData.convert_table(df, schema)
         return df
 
+    def upload_file(self, data: bytes) -> dict[str, Any]:
+        """POST ``/v1/files``; returns the upload record (use ``id`` with :meth:`create_dataset_from_upload`)."""
+        try:
+            return self._http.upload_file(data)
+        except HotdataAPIError as exc:
+            raise com.IbisError(str(exc)) from exc
+
+    def create_dataset_from_upload(
+        self,
+        upload_id: str,
+        label: str,
+        *,
+        table_name: str | None = None,
+        file_format: str = "csv",
+    ) -> dict[str, Any]:
+        """POST ``/v1/datasets`` with an upload source—materializes a queryable dataset table.
+
+        The response includes ``schema_name`` and ``table_name``. Reference the table in SQL as
+        ``datasets.<schema_name>.<table_name>`` (see Hotdata ``datasets`` documentation).
+        """
+        try:
+            return self._http.create_dataset_from_upload(
+                upload_id=upload_id,
+                label=label,
+                table_name=table_name,
+                file_format=file_format,
+            )
+        except HotdataAPIError as exc:
+            raise com.IbisError(str(exc)) from exc
+
     def create_table(self, *_args: Any, **_kwargs: Any) -> ir.Table:
-        raise NotImplementedError("Hotdata backend does not implement create_table in v1.")
+        raise NotImplementedError(
+            "Hotdata does not implement Ibis create_table in v1; use upload_file + "
+            "create_dataset_from_upload, then SQL or con.table with the returned names."
+        )
 
     def drop_table(self, *_args: Any, **_kwargs: Any) -> None:
         raise NotImplementedError("Hotdata backend does not implement drop_table in v1.")
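
Review note on the new upload helpers: a minimal sketch of how a caller might chain `upload_file` and `create_dataset_from_upload` and build the `datasets.<schema_name>.<table_name>` reference described in the docstring. The method names, parameters, and the `id` / `schema_name` / `table_name` keys come from this diff; `FakeConnection`, `upload_csv_as_dataset`, and every value the fake returns are hypothetical stand-ins so the example runs without a workspace.

```python
from __future__ import annotations

from typing import Any


class FakeConnection:
    """Offline stand-in that mirrors the two method signatures added in this diff."""

    def upload_file(self, data: bytes) -> dict[str, Any]:
        # Made-up upload record; a real response may carry other fields.
        return {"id": "upl_demo", "size": len(data)}

    def create_dataset_from_upload(
        self,
        upload_id: str,
        label: str,
        *,
        table_name: str | None = None,
        file_format: str = "csv",
    ) -> dict[str, Any]:
        # Made-up dataset record with the two keys the docstring mentions.
        return {"schema_name": "main", "table_name": table_name or label}


def upload_csv_as_dataset(
    con: Any, data: bytes, label: str, *, table_name: str | None = None
) -> str:
    """Upload CSV bytes, create a dataset from the upload, return its SQL name."""
    upload = con.upload_file(data)
    dataset = con.create_dataset_from_upload(
        upload["id"], label, table_name=table_name, file_format="csv"
    )
    # No identifier quoting is applied here; names needing quotes are out of scope.
    return f"datasets.{dataset['schema_name']}.{dataset['table_name']}"


qualified = upload_csv_as_dataset(FakeConnection(), b"id,name\n1,Ada\n", "people")
print(qualified)
```

With a real connection, the returned name could then be passed to `con.sql(f"SELECT * FROM {qualified}", dialect="postgres")`, following the pattern used in `examples/04_ibis_table_workflows.py`.
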