Commit bbbb2a0

Merge pull request #89 from posit-dev/rust-api
Add high-level Rust API and rework Python bindings
2 parents 177e000 + ba2d566 commit bbbb2a0

21 files changed

Lines changed: 3535 additions & 406 deletions

CLAUDE.md

Lines changed: 194 additions & 28 deletions
@@ -147,6 +147,79 @@ DRAW line MAPPING month AS x, total AS y
 
 ---
 
+## Public API
+
+### Quick Start
+
+```rust
+use ggsql::reader::{DuckDBReader, Reader};
+use ggsql::writer::VegaLiteWriter;
+
+// Create a reader
+let reader = DuckDBReader::from_connection_string("duckdb://memory")?;
+
+// Execute the ggsql query
+let spec = reader.execute(
+    "SELECT x, y FROM data VISUALISE x, y DRAW point"
+)?;
+
+// Render to Vega-Lite JSON
+let writer = VegaLiteWriter::new();
+let json = writer.render(&spec)?;
+```
+
+### Core Functions
+
+| Function | Purpose |
+| ----------------------- | ------------------------------------------------------ |
+| `reader.execute(query)` | Main entry point: parse, execute SQL, resolve mappings |
+| `writer.render(spec)` | Generate output from a Spec |
+| `validate(query)` | Validate syntax + semantics, inspect query structure |
+
+### Key Types
+
+**`Validated`** - Result of `validate()`:
+
+- `has_visual()` - Whether query has VISUALISE clause
+- `sql()` - The SQL portion (before VISUALISE)
+- `visual()` - The VISUALISE portion (raw text)
+- `tree()` - CST for advanced inspection
+- `valid()` - Whether query is valid
+- `errors()` - Validation errors
+- `warnings()` - Validation warnings
+
+**`Spec`** - Result of `reader.execute()`, ready for rendering:
+
+- `plot()` - Resolved plot specification
+- `metadata()` - Rows, columns, layer count
+- `warnings()` - Validation warnings from execution
+- `data()` / `layer_data(i)` / `stat_data(i)` - Access DataFrames
+- `sql()` / `visual()` / `layer_sql(i)` / `stat_sql(i)` - Query introspection
+
+**`Metadata`**:
+
+- `rows` - Number of rows in primary data
+- `columns` - Column names
+- `layer_count` - Number of layers
+
+### Reader & Writer
+
+**Reader trait** (data source abstraction):
+
+- `execute_sql(sql)` - Run SQL, return DataFrame
+- `register(name, df)` - Register DataFrame as table
+- `unregister(name)` - Unregister a previously registered table
+- Implementation: `DuckDBReader`
+
+**Writer trait** (output format abstraction):
+
+- `write(spec, data)` - Generate output string
+- Implementation: `VegaLiteWriter` (Vega-Lite v6 JSON)
+
+For detailed API documentation, see [`src/doc/API.md`](src/doc/API.md).
+
+---
+
 ## Component Breakdown
 
 ### 1. Parser Module (`src/parser/`)
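
Reviewer sketch: the `Validated` accessors added in the hunk above expose a query in two parts, with `sql()` returning the portion before `VISUALISE` and `visual()` returning the raw `VISUALISE` text. The Python snippet below only illustrates that split by cutting at the first `VISUALISE` keyword. `naive_split` is a hypothetical helper, not a ggsql function; the library works from a parsed CST, and this regex would mis-split a query where the keyword appears inside a string literal.

```python
import re

# Hypothetical helper for illustration only; not part of the ggsql API.
_VISUALISE = re.compile(r"\bVISUALISE\b", re.IGNORECASE)


def naive_split(query: str) -> tuple[str, str]:
    """Split a query at the first VISUALISE keyword.

    Returns (sql_part, visual_part). If the keyword is absent, the whole
    query is treated as SQL and the visual part is an empty string.
    """
    match = _VISUALISE.search(query)
    if match is None:
        return query.strip(), ""
    return query[: match.start()].strip(), query[match.start():].strip()


sql_part, visual_part = naive_split(
    "SELECT x, y FROM data VISUALISE x, y DRAW point"
)
print(sql_part)     # SELECT x, y FROM data
print(visual_part)  # VISUALISE x, y DRAW point
```
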
@@ -432,7 +505,7 @@ pub type Result<T> = std::result::Result<T, GgsqlError>;
 
 ```rust
 pub trait Reader {
-    fn execute(&self, sql: &str) -> Result<DataFrame>;
+    fn execute_sql(&self, sql: &str) -> Result<DataFrame>;
     fn supports_query(&self, sql: &str) -> bool;
 }
 ```
@@ -462,7 +535,6 @@ pub fn parse_connection_string(uri: &str) -> Result<ConnectionInfo> {
 The codebase includes connection string parsing and feature flags for additional readers, but they are not yet implemented:
 
 - **PostgreSQL Reader** (`postgres://...`)
-
   - Feature flag: `postgres`
   - Connection string parsing exists in `connection.rs`
   - Reader implementation: Not yet available
@@ -792,15 +864,18 @@ When running in Positron IDE, the extension provides enhanced functionality:
 
 ### 8. Python Bindings (`ggsql-python/`)
 
-**Responsibility**: Python bindings for ggsql, enabling Python users to render Altair charts using ggsql's VISUALISE syntax.
+**Responsibility**: Python bindings for ggsql, enabling Python users to create visualizations using ggsql's VISUALISE syntax.
 
 **Features**:
 
 - PyO3-based Rust bindings compiled to a native Python extension
+- Two-stage API mirroring the Rust API: `reader.execute()` → `render()`
+- DuckDB reader with DataFrame registration
+- Custom Python reader support: any object with `execute_sql(sql) -> DataFrame` method
 - Works with any narwhals-compatible DataFrame (polars, pandas, etc.)
 - LazyFrames are collected automatically
-- Returns native `altair.Chart` objects for easy display and customization
-- Query splitting to separate SQL from VISUALISE portions
+- Returns native `altair.Chart` objects via `render_altair()` convenience function
+- Query validation and introspection (SQL, layer queries, stat queries)
 
 **Installation**:
 
@@ -817,26 +892,117 @@ maturin develop
 import ggsql
 import polars as pl
 
-# Split a ggSQL query into SQL and VISUALISE portions
-sql, viz = ggsql.split_query("""
-    SELECT date, revenue FROM sales
-    VISUALISE date AS x, revenue AS y
-    DRAW line
-""")
+# Create reader and register data
+reader = ggsql.DuckDBReader("duckdb://memory")
+df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
+reader.register("data", df)
+
+# Execute visualization
+spec = reader.execute(
+    "SELECT * FROM data VISUALISE x, y DRAW point"
+)
+
+# Inspect metadata
+print(f"Rows: {spec.metadata()['rows']}")
+print(f"Columns: {spec.metadata()['columns']}")
+print(f"SQL: {spec.sql()}")
+
+# Render to Vega-Lite JSON
+writer = ggsql.VegaLiteWriter()
+json_output = writer.render(spec)
+```
+
+**Convenience Function** (`render_altair`):
+
+For quick visualizations without explicit reader setup:
+
+```python
+import ggsql
+import polars as pl
 
-# Execute SQL and render to Altair chart
 df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
-chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")
 
-# Display or save
+# Render DataFrame to Altair chart in one call
+chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")
 chart.display() # In Jupyter
-chart.save("chart.html")
 ```
 
+**Query Validation**:
+
+```python
+# Validate syntax without execution
+validated = ggsql.validate(
+    "SELECT x, y FROM data VISUALISE x, y DRAW point"
+)
+print(f"Valid: {validated.valid()}")
+print(f"Has VISUALISE: {validated.has_visual()}")
+print(f"SQL portion: {validated.sql()}")
+print(f"Errors: {validated.errors()}")
+```
+
+**Classes**:
+
+| Class | Description |
+| -------------------------- | ------------------------------------------------- |
+| `DuckDBReader(connection)` | Database reader with DataFrame registration |
+| `VegaLiteWriter()` | Vega-Lite JSON output writer |
+| `Validated` | Result of `validate()` with query inspection |
+| `Spec` | Result of `reader.execute()`, ready for rendering |
+
 **Functions**:
 
-- `split_query(query: str) -> tuple[str, str]` - Split ggSQL query into SQL and VISUALISE portions
-- `render_altair(df, viz, **kwargs) -> altair.Chart` - Render DataFrame with VISUALISE spec to Altair chart
+| Function | Description |
+| ------------------------ | ------------------------------------------------ |
+| `validate(query)` | Syntax/semantic validation with query inspection |
+| `reader.execute(query)` | Execute ggsql query, return Spec |
+| `execute(query, reader)` | Execute with custom reader (bridge path) |
+| `render_altair(df, viz)` | Convenience: render DataFrame to Altair chart |
+
+**Spec Methods**:
+
+| Method | Description |
+| ---------------- | -------------------------------------------- |
+| `render(writer)` | Generate Vega-Lite JSON |
+| `metadata()` | Get rows, columns, layer_count |
+| `sql()` | Get the SQL portion |
+| `visual()` | Get the VISUALISE portion |
+| `layer_count()` | Number of DRAW layers |
+| `data()` | Get the main DataFrame |
+| `layer_data(i)` | Get layer-specific DataFrame (if filtered) |
+| `stat_data(i)` | Get stat transform DataFrame (if applicable) |
+| `layer_sql(i)` | Get layer filter SQL (if applicable) |
+| `stat_sql(i)` | Get stat transform SQL (if applicable) |
+| `warnings()` | Get validation warnings |
+
+**Custom Python Readers**:
+
+Any Python object with an `execute_sql(sql: str) -> polars.DataFrame` method can be used as a reader:
+
+```python
+import ggsql
+import polars as pl
+
+class MyReader:
+    """Custom reader that returns static data."""
+
+    def execute_sql(self, sql: str) -> pl.DataFrame:
+        return pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
+
+# Use custom reader with ggsql.execute()
+reader = MyReader()
+spec = ggsql.execute(
+    "SELECT * FROM data VISUALISE x, y DRAW point",
+    reader
+)
+```
+
+Optional methods for custom readers:
+
+- `supports_register() -> bool` - Return `True` if registration is supported
+- `register(name: str, df: polars.DataFrame) -> None` - Register a DataFrame as a table
+- `unregister(name: str) -> None` - Unregister a previously registered table
+
+Native readers (e.g., `DuckDBReader`) use an optimized fast path, while custom Python readers are automatically bridged via IPC serialization.
 
 **Dependencies**:
 
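
Reviewer sketch: the custom-reader contract described in this hunk is duck-typed, with one required method (`execute_sql`) and three optional ones (`supports_register`, `register`, `unregister`). One way to write that contract down on the Python side is a `typing.Protocol`. Everything below is an assumption for illustration: `ReaderLike`, `can_register`, `StaticReader` and `RegisteringReader` are hypothetical names, the sketch returns plain lists instead of `polars.DataFrame` so that it runs with the standard library alone, and it does not show how ggsql itself detects these methods.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ReaderLike(Protocol):
    """Hypothetical protocol: the one method a custom reader must provide."""

    def execute_sql(self, sql: str) -> Any: ...


def can_register(reader: object) -> bool:
    """Return True only if the reader opts in through supports_register()."""
    probe = getattr(reader, "supports_register", None)
    return bool(probe()) if callable(probe) else False


class StaticReader:
    """Minimal reader: only execute_sql, so registration is unsupported."""

    def execute_sql(self, sql: str) -> list[dict[str, int]]:
        # A real reader would return a polars.DataFrame here.
        return [{"x": 1, "y": 10}, {"x": 2, "y": 20}]


class RegisteringReader(StaticReader):
    """Reader that also implements the optional registration methods."""

    def __init__(self) -> None:
        self.tables: dict[str, Any] = {}

    def supports_register(self) -> bool:
        return True

    def register(self, name: str, df: Any) -> None:
        self.tables[name] = df

    def unregister(self, name: str) -> None:
        self.tables.pop(name, None)


print(isinstance(StaticReader(), ReaderLike))  # True
print(isinstance(object(), ReaderLike))        # False
print(can_register(StaticReader()))            # False
print(can_register(RegisteringReader()))       # True
```

A `runtime_checkable` protocol only checks that the method exists, not its signature or return type, so it is a cheap guard rather than a full validation.
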
@@ -920,22 +1086,23 @@ cargo build --all-features
 ```
 
 Where `<global_mapping>` can be:
+
 - Empty: `VISUALISE` (layers must define all mappings)
 - Mappings: `VISUALISE x, y, date AS x` (mixed implicit/explicit)
 - Wildcard: `VISUALISE *` (map all columns)
 
 ### Clause Types
 
-| Clause | Repeatable | Purpose | Example |
-| -------------- | ---------- | ------------------ | ------------------------------------ |
-| `VISUALISE` | ✅ Yes | Entry point | `VISUALISE date AS x, revenue AS y` |
-| `DRAW` | ✅ Yes | Define layers | `DRAW line MAPPING date AS x, value AS y` |
-| `SCALE` | ✅ Yes | Configure scales | `SCALE x SETTING type => 'date'` |
-| `FACET` | ❌ No | Small multiples | `FACET WRAP region` |
-| `COORD` | ❌ No | Coordinate system | `COORD cartesian SETTING xlim => [0,100]` |
-| `LABEL` | ❌ No | Text labels | `LABEL title => 'My Chart', x => 'Date'` |
-| `GUIDE` | ✅ Yes | Legend/axis config | `GUIDE color SETTING position => 'right'` |
-| `THEME` | ❌ No | Visual styling | `THEME minimal` |
+| Clause | Repeatable | Purpose | Example |
+| ----------- | ---------- | ------------------ | ----------------------------------------- |
+| `VISUALISE` | ✅ Yes | Entry point | `VISUALISE date AS x, revenue AS y` |
+| `DRAW` | ✅ Yes | Define layers | `DRAW line MAPPING date AS x, value AS y` |
+| `SCALE` | ✅ Yes | Configure scales | `SCALE x SETTING type => 'date'` |
+| `FACET` | ❌ No | Small multiples | `FACET WRAP region` |
+| `COORD` | ❌ No | Coordinate system | `COORD cartesian SETTING xlim => [0,100]` |
+| `LABEL` | ❌ No | Text labels | `LABEL title => 'My Chart', x => 'Date'` |
+| `GUIDE` | ✅ Yes | Legend/axis config | `GUIDE color SETTING position => 'right'` |
+| `THEME` | ❌ No | Visual styling | `THEME minimal` |
 
 ### DRAW Clause (Layers)
 
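
Reviewer sketch: the Repeatable column of the Clause Types table above can be read as a simple rule, namely that `FACET`, `COORD`, `LABEL` and `THEME` may appear at most once per query. The Python snippet below encodes the table and reports violations for a list of clause keywords that has already been extracted. `repeated_single_use_clauses` is a hypothetical helper, and the assumption that ggsql enforces the rule this way (or at all) is mine; the diff only documents the flags.

```python
from collections import Counter

# Repeatability flags copied from the Clause Types table in CLAUDE.md.
REPEATABLE = {
    "VISUALISE": True,
    "DRAW": True,
    "SCALE": True,
    "FACET": False,
    "COORD": False,
    "LABEL": False,
    "GUIDE": True,
    "THEME": False,
}


def repeated_single_use_clauses(clauses: list[str]) -> list[str]:
    """Return non-repeatable clause keywords that occur more than once.

    Keywords missing from the table are ignored rather than rejected.
    """
    counts = Counter(keyword.upper() for keyword in clauses)
    return sorted(
        keyword
        for keyword, count in counts.items()
        if count > 1 and not REPEATABLE.get(keyword, True)
    )


violations = repeated_single_use_clauses(
    ["VISUALISE", "DRAW", "DRAW", "FACET", "FACET", "THEME"]
)
print(violations)  # ['FACET']
```
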
@@ -1201,7 +1368,6 @@ COORD cartesian SETTING xlim => [0, 100], ylim => [0, 200]
 LABEL x => 'Category', y => 'Count'
 ```
 
-
 ### LABEL Clause
 
 **Syntax**:

Cargo.toml

Lines changed: 2 additions & 1 deletion
@@ -32,7 +32,8 @@ csscolorparser = "0.8.1"
 polars = { version = "0.52", features = ["lazy", "sql", "ipc"] }
 
 # Readers
-duckdb = { version = "1.1", features = ["bundled"] }
+duckdb = { version = "1.4", features = ["bundled", "vtab-arrow"] }
+arrow = { version = "56", default-features = false, features = ["ipc"] }
 postgres = "0.19"
 sqlx = { version = "0.8", features = ["postgres", "runtime-tokio-rustls"] }
 rusqlite = "0.32"

README.md

Lines changed: 39 additions & 1 deletion
@@ -30,6 +30,7 @@ THEME minimal
 - ✅ REST API server (`ggsql-rest`) with CORS support
 - ✅ Jupyter kernel (`ggsql-jupyter`) with inline Vega-Lite visualizations
 - ✅ VS Code extension (`ggsql-vscode`) with syntax highlighting and Positron IDE integration
+- ✅ Python bindings (`ggsql-python`) with Altair chart output
 
 **Planned:**
 
@@ -93,7 +94,9 @@ ggsql/
 
 ├── ggsql-jupyter/ # Jupyter kernel
 
-└── ggsql-vscode/ # VS Code extension
+├── ggsql-vscode/ # VS Code extension
+
+└── ggsql-python/ # Python bindings
 ```
 
 ## Development Workflow
@@ -297,6 +300,41 @@ When running in Positron IDE, the extension provides additional features:
 - **Language runtime registration** for executing ggsql queries directly within Positron
 - **Plot pane integration** - visualizations are automatically routed to Positron's Plots pane
 
+## Python Bindings
+
+The `ggsql-python` package provides Python bindings for using ggsql with DataFrames.
+
+### Installation
+
+```bash
+cd ggsql-python
+pip install maturin
+maturin develop
+```
+
+### Usage
+
+```python
+import ggsql
+import polars as pl
+
+# Simple usage with render_altair
+df = pl.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
+chart = ggsql.render_altair(df, "VISUALISE x, y DRAW point")
+chart.display()
+
+# Two-stage API for full control
+reader = ggsql.DuckDBReader("duckdb://memory")
+reader.register("data", df)
+
+spec = reader.execute("SELECT * FROM data VISUALISE x, y DRAW point")
+
+writer = ggsql.VegaLiteWriter()
+json_output = writer.render(spec)
+```
+
+See the [ggsql-python README](ggsql-python/README.md) for complete API documentation.
+
 ## CLI
 
 ### Installation
