diff --git a/README.md b/README.md index 2cdc1a2..b25777a 100644 --- a/README.md +++ b/README.md @@ -175,7 +175,7 @@ sqlrite> DELETE FROM users WHERE age < 30; | `CREATE TABLE` | `PRIMARY KEY`, `UNIQUE`, `NOT NULL`; `IF NOT EXISTS` (idempotent re-create); duplicate-column detection; types `INTEGER`/`INT`/`BIGINT`/`SMALLINT`, `TEXT`/`VARCHAR`, `REAL`/`FLOAT`/`DOUBLE`/`DECIMAL`, `BOOLEAN`. Auto-creates `sqlrite_autoindex__` for every PK + UNIQUE column | | `CREATE [UNIQUE] INDEX` | Single-column, named indexes; `IF NOT EXISTS`; persists as a dedicated cell-based B-Tree. INTEGER + TEXT columns only | | `INSERT INTO` | Explicit column list required; auto-ROWID for `INTEGER PRIMARY KEY`; multi-row `VALUES (…), (…)`; UNIQUE enforcement; clean type errors (no panics); NULL padding for omitted columns | -| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, col …]`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER] JOIN ... ON ...` with table aliases and qualified `t.col` references; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` | +| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, col …]`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER\|CROSS] JOIN` with `ON ...` / `USING (...)` / `NATURAL` constraints, table aliases and qualified `t.col` references; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. 
Catalog introspection via `SELECT … FROM sqlrite_master` | | `UPDATE` | Multi-column `SET`; `WHERE`; UNIQUE + type enforcement; arithmetic in assignments (`SET age = age + 1`) | | `DELETE` | `WHERE` predicate or full-table delete | | `BEGIN` / `COMMIT` / `ROLLBACK` | Real transactions, snapshot-based; WAL-backed commit; single-level (no savepoints); auto-rollback if `COMMIT`'s disk write fails | @@ -193,7 +193,7 @@ Expressions in `WHERE` and `UPDATE`'s `SET` RHS: - String concat — `||` - Literals — integer + real numbers, `'single-quoted strings'`, `TRUE` / `FALSE`, `NULL`; parentheses for grouping -**Not yet supported** (common ones): subqueries, CTEs, `HAVING`, `LIKE … ESCAPE ''`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, `JOIN ... USING`, `NATURAL JOIN`, `CROSS JOIN`, comma joins, aggregates / DISTINCT / GROUP BY *over* JOIN results. The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference. +**Not yet supported** (common ones): subqueries, CTEs, `HAVING`, `LIKE … ESCAPE ''`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, comma joins (`FROM a, b`), aggregates / DISTINCT / GROUP BY *over* JOIN results. The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference. #### Meta commands @@ -250,7 +250,7 @@ The project is staged in phases, each independently shippable. 
A finished phase - [x] `CREATE TABLE` with `PRIMARY KEY`, `UNIQUE`, `NOT NULL`; duplicate-column detection; in-memory `BTreeMap` indexes on PK/UNIQUE columns - [x] `INSERT` with auto-ROWID for `INTEGER PRIMARY KEY`, UNIQUE enforcement, NULL padding for missing columns - [x] `SELECT` — projection, `WHERE`, `ORDER BY`, `LIMIT` -- [x] `JOIN` — `INNER`, `LEFT OUTER`, `RIGHT OUTER`, `FULL OUTER` with `ON` (SQLR-5) +- [x] `JOIN` — `INNER`, `LEFT OUTER`, `RIGHT OUTER`, `FULL OUTER`, `CROSS` with `ON` / `USING (...)` / `NATURAL` (SQLR-5) - [x] `UPDATE ... SET ... WHERE ...` with type + UNIQUE enforcement at write time - [x] `DELETE ... WHERE ...` - [x] Expression evaluator: `=`/`<>`/`<`/`<=`/`>`/`>=`, `AND`/`OR`/`NOT`, arithmetic `+`/`-`/`*`/`/`/`%`, string concat `||`, NULL-as-false in `WHERE` @@ -332,7 +332,6 @@ Lockstep versioning — one dispatch bumps every product to the same `vX.Y.Z`. T - [ ] *(deferred to Phase 8)* Full-text search with BM25 + hybrid retrieval **Possible extras** *(no committed phase)* -- Joins (`INNER`, `LEFT OUTER`, `CROSS` — SQLite does not support `RIGHT`/`FULL OUTER`) - `HAVING`, `IN (subquery)`, `BETWEEN`, `GLOB` / `REGEXP`, `GROUP_CONCAT`, window functions - Composite and expression indexes (with cost analysis) - Alternate storage engines — LSM/SSTable for write-heavy workloads alongside the B-Tree diff --git a/docs/roadmap.md b/docs/roadmap.md index 6bbd9a5..8c95d6b 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -561,7 +561,7 @@ The biggest single SQL-surface jump in the project's history. - Self-joins require an alias on at least one side. - `WHERE` runs after joins (the standard `LEFT JOIN ... WHERE right.col IS NULL` anti-join idiom works). -Not yet supported: `CROSS JOIN`, comma-separated FROMs, `NATURAL JOIN`, `JOIN ... USING (col)`, aggregates / `GROUP BY` / `DISTINCT` *over* a join, `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization. 
+`ON`, `USING (...)`, `NATURAL`, and `CROSS JOIN` are all supported. Not yet supported: comma-separated FROMs (`FROM a, b`), aggregates / `GROUP BY` / `DISTINCT` *over* a join, `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization. ### ✅ Phase 9g — Prepared statements + parameter binding *(v0.9.0, SQLR-23)* diff --git a/docs/sql-engine.md b/docs/sql-engine.md index d534893..72dc8b8 100644 --- a/docs/sql-engine.md +++ b/docs/sql-engine.md @@ -53,7 +53,7 @@ The `sqlparser` AST is designed to cover every SQL dialect, so its types are hug `SelectQuery::joins` (SQLR-5) is a `Vec<JoinClause>` evaluated left-to-right by `execute_select_rows_joined`. Each clause carries a `JoinType` (`Inner` / `LeftOuter` / `RightOuter` / `FullOuter`), the right-table name + optional alias, and a required `ON` expression. Empty = single-table SELECT, the existing fast path with HNSW / FTS / bounded-heap optimizations. -Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — `JOIN ... USING`, `NATURAL JOIN`, `CROSS JOIN`, comma joins, aggregates / GROUP BY / DISTINCT over JOINs, `HAVING`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE ''`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. +Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — comma joins (`FROM a, b`), aggregates / GROUP BY / DISTINCT over JOINs, `HAVING`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE ''`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. (`JOIN ... USING`, `NATURAL JOIN`, and `CROSS JOIN` are now supported — see [`supported-sql.md`](supported-sql.md#join-semantics-sqlr-5).) 
## Statement dispatch diff --git a/docs/supported-sql.md b/docs/supported-sql.md index 8f1a9b5..abbd4a7 100644 --- a/docs/supported-sql.md +++ b/docs/supported-sql.md @@ -210,19 +210,27 @@ COUNT([DISTINCT] ) -- counts non-NULL values, option ### `JOIN` semantics (SQLR-5) -Four flavors are supported, all with explicit `ON` conditions: +Four flavors are supported, with `ON`, `USING (...)`, or `NATURAL` match +conditions, plus `CROSS JOIN`: | Flavor | Keeps unmatched rows from… | |---|---| -| `INNER JOIN` | …neither side. Only ON-matched pairs survive. | +| `INNER JOIN` | …neither side. Only matched pairs survive. | | `LEFT [OUTER] JOIN` | …the left side; right-side columns become `NULL` for unmatched left rows. | | `RIGHT [OUTER] JOIN` | …the right side; left-side columns become `NULL` for unmatched right rows. | | `FULL [OUTER] JOIN` | …both sides, NULL-padded on the unmatched side. | +| `CROSS JOIN` | …both sides (cross product — every left row paired with every right row). | - **Engine choice:** SQLite ships only `INNER` and `LEFT OUTER`. SQLRite implements all four because the per-flavor differences boil down to NULL-padding policy on top of one shared nested-loop driver — adding `RIGHT` / `FULL` was effectively free once the executor had a multi-table scope. See [`docs/design-decisions.md`](design-decisions.md) for the rationale. +- **Match conditions:** + - **`ON <expr>`** — any boolean expression over the in-scope tables. + - **`USING (col[, col…])`** — shorthand for `left.col = right.col` AND-chained over each named column. The column must exist on the right side and on some left-side table; in a chain (`A JOIN B USING(x) JOIN C USING(x)`) each `x` resolves against the first left table that has it. + - **`NATURAL`** — equivalent to `USING (<shared columns>)`, discovered automatically from the schemas. If the sides share no column names, a `NATURAL JOIN` degrades to a cross product (matching SQLite). Combines with a flavor: `NATURAL LEFT JOIN`. 
+ - **`CROSS JOIN`** — the cross product; the engine treats it as `INNER JOIN ... ON true`. +- **`SELECT *` with `USING` / `NATURAL`:** each joined-on column appears **once** (SQLite convention), taking the left side's value; the right side's duplicate is omitted. Plain `ON` joins keep both copies. - **Aliases:** `FROM customers AS c INNER JOIN orders AS o ON c.id = o.customer_id`. When an alias is supplied the original table name leaves scope (SQL standard) — qualifier resolution uses the alias. - **Qualified column references:** `<table>.<col>` and `<alias>.<col>` resolve to that specific side. Bare `<col>` references must resolve to exactly one in-scope table; ambiguous references error with a "qualify it as `<table>.col`" hint. -- **Output of `SELECT *`** over a join is every column of every in-scope table, in source order. Duplicate header names are permitted (SQLite-style). Disambiguate with explicit `SELECT t.col AS t_col, u.col AS u_col`. +- **Output of `SELECT *`** over a join is every column of every in-scope table, in source order (minus `USING` / `NATURAL` duplicates, see above). Duplicate header names are otherwise permitted (SQLite-style). Disambiguate with explicit `SELECT t.col AS t_col, u.col AS u_col`. - **Multi-join** chains left-fold: `A JOIN B ON ... JOIN C ON ...` evaluates as `(A ⨝ B) ⨝ C`. Each new clause sees every prior alias / table in its `ON` expression. - **Self-joins** require an alias on at least one side: `FROM nodes AS p INNER JOIN nodes AS c ON p.id = c.parent_id`. Without one, you get a `duplicate table reference` error so qualifiers stay unambiguous. - **`WHERE` runs after joins.** A `WHERE right.col IS NULL` filter on a `LEFT JOIN` correctly returns left rows with no match (the standard "anti-join via outer-join" idiom). @@ -231,8 +239,7 @@ Four flavors are supported, all with explicit `ON` conditions: #### What's not supported in JOINs -- `JOIN ... USING (col)` and `NATURAL JOIN` — explicit `ON` only. (Both are deferred — `USING` is straightforward but adds a column-resolution rule we haven't needed yet.) -- `CROSS JOIN` (write `INNER JOIN ... ON true` instead) and comma-separated FROM lists. +- Comma-separated FROM lists (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN` instead. - Aggregates / `GROUP BY` / `DISTINCT` *over* a join. The single-table aggregator is wired against one rowid stream; rewiring it for joined rows is a separate increment. Surfaces as a clean `NotImplemented` at parse time. - `fts_match` / `bm25_score` inside a JOIN expression. They need to look up an FTS index by column, which is single-table-bound today. Use them on a single-table SELECT first, or fold the FTS lookup into the FROM side. 
@@ -255,7 +262,7 @@ The executor includes a tiny optimizer: if the `WHERE` is exactly ` ### What doesn't work -- **`CROSS JOIN`**, **comma-separated FROM lists**, **`NATURAL JOIN`**, **`JOIN ... USING (col)`** — explicit `INNER` / `LEFT` / `RIGHT` / `FULL OUTER JOIN ... ON ...` only (see [JOIN semantics](#join-semantics-sqlr-5)) +- **Comma-separated FROM lists** (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN`. `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` with `ON` / `USING` / `NATURAL` are all supported (see [JOIN semantics](#join-semantics-sqlr-5)) - **Aggregates** / **`GROUP BY`** / **`DISTINCT`** over a JOIN — pipe through a subquery once subqueries land - **Subqueries**, CTEs (`WITH`), views - **`HAVING`** — pre-aggregation `WHERE` works; post-aggregation filtering does not yet @@ -700,7 +707,7 @@ A REPL launched with `sqlrite --readonly foo.sqlrite` (or `sqlrite::open_databas For context when you hit `NotImplemented`. See [Roadmap](roadmap.md) for when these land: ### Joins & composition -- `CROSS JOIN`, comma joins, `NATURAL JOIN`, `JOIN ... USING` — explicit `INNER` / `LEFT` / `RIGHT` / `FULL OUTER JOIN ... ON ...` works (SQLR-5); the others don't +- `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS JOIN` with `ON` / `USING (...)` / `NATURAL` all work (SQLR-5). 
Comma-separated FROM joins (`FROM a, b`) don't — use an explicit `JOIN` / `CROSS JOIN` - Aggregates / `GROUP BY` / `DISTINCT` *over* a JOIN — pipe through a subquery once subqueries land - `fts_match` / `bm25_score` inside a JOIN expression — single-table-bound today - Subqueries (scalar, `IN (SELECT ...)`, correlated) diff --git a/src/sql/executor.rs b/src/sql/executor.rs index 99ccac3..6a65ab6 100644 --- a/src/sql/executor.rs +++ b/src/sql/executor.rs @@ -6,7 +6,7 @@ use std::cmp::Ordering; use prettytable::{Cell as PrintCell, Row as PrintRow, Table as PrintTable}; use sqlparser::ast::{ AlterTable, AlterTableOperation, AssignmentTarget, BinaryOperator, CreateIndex, Delete, Expr, - FromTable, FunctionArg, FunctionArgExpr, FunctionArguments, IndexType, ObjectName, + FromTable, FunctionArg, FunctionArgExpr, FunctionArguments, Ident, IndexType, ObjectName, ObjectNamePart, RenameTableNameKind, Statement, TableFactor, TableWithJoins, UnaryOperator, Update, Value as AstValue, }; @@ -21,7 +21,8 @@ use crate::sql::db::table::{ use crate::sql::fts::{Bm25Params, PostingList}; use crate::sql::hnsw::{DistanceMetric, HnswIndex}; use crate::sql::parser::select::{ - AggregateArg, JoinType, OrderByClause, Projection, ProjectionItem, ProjectionKind, SelectQuery, + AggregateArg, JoinConstraintKind, JoinType, OrderByClause, Projection, ProjectionItem, + ProjectionKind, SelectQuery, }; // ----------------------------------------------------------------- @@ -405,6 +406,121 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result, +} + +/// Turn a [`JoinConstraintKind`] into the `ON` predicate the nested-loop +/// driver evaluates. `tables[..right_pos]` are the tables in scope on +/// the left of this join; `tables[right_pos]` is the table being joined. +/// +/// - `On` passes its predicate through unchanged. +/// - `Using(cols)` becomes `left.col = right.col` AND-chained over every +/// named column. 
The left qualifier is the first in-scope table that +/// actually has the column, so the rewrite is correct for join chains +/// (`A JOIN B USING(x) JOIN C USING(x)` resolves both `x`es against +/// `A`). A column missing from either side is an error. +/// - `Natural` discovers the shared column names first (right table's +/// columns that also appear somewhere on the left), then proceeds +/// exactly like `Using`. No shared columns ⇒ an always-true predicate, +/// i.e. a cross product, matching SQLite. +fn resolve_join_constraint( + constraint: &JoinConstraintKind, + tables: &[JoinedTableRef<'_>], + right_pos: usize, +) -> Result<ResolvedJoin> { + match constraint { + JoinConstraintKind::On(expr) => Ok(ResolvedJoin { + on: (**expr).clone(), + using_columns: Vec::new(), + }), + JoinConstraintKind::Using(cols) => build_using_join(cols, tables, right_pos), + JoinConstraintKind::Natural => { + // Shared columns = the right table's columns that also exist + // on some left table, preserving the right table's column + // order for determinism. + let shared: Vec<String> = tables[right_pos] + .table + .column_names() + .into_iter() + .filter(|c| { + tables[..right_pos] + .iter() + .any(|t| t.table.contains_column(c.clone())) + }) + .collect(); + build_using_join(&shared, tables, right_pos) + } + } +} + +/// Shared lowering for `USING` and `NATURAL`: synthesize the AND-chain +/// of `left.col = right.col` equalities and report the deduplicated +/// columns. An empty `cols` (a `NATURAL` join with nothing in common) +/// yields an always-true predicate and no dedup, i.e. a cross product. 
+fn build_using_join( + cols: &[String], + tables: &[JoinedTableRef<'_>], + right_pos: usize, +) -> Result<ResolvedJoin> { + let right = &tables[right_pos]; + let mut predicate: Option<Expr> = None; + for col in cols { + // The named column must exist on the right side … + if !right.table.contains_column(col.clone()) { + return Err(SQLRiteError::Internal(format!( + "cannot join USING column '{col}' — it is not present on table '{}'", + right.scope_name + ))); + } + // … and on at least one left-side table. Qualify the left + // reference with whichever table actually has it. + let left = tables[..right_pos] + .iter() + .find(|t| t.table.contains_column(col.clone())) + .ok_or_else(|| { + SQLRiteError::Internal(format!( + "cannot join USING column '{col}' — it is not present on any left-side table" + )) + })?; + let eq = col_eq(&left.scope_name, &right.scope_name, col); + predicate = Some(match predicate { + None => eq, + Some(prev) => Expr::BinaryOp { + left: Box::new(prev), + op: BinaryOperator::And, + right: Box::new(eq), + }, + }); + } + Ok(ResolvedJoin { + on: predicate + .unwrap_or_else(|| Expr::Value(sqlparser::ast::Value::Boolean(true).with_empty_span())), + using_columns: cols.to_vec(), + }) +} + +/// Build the `left_scope.col = right_scope.col` equality used to lower +/// `USING` / `NATURAL` joins onto the existing `ON` evaluation path. 
+fn col_eq(left_scope: &str, right_scope: &str, col: &str) -> Expr { + let col_ref = |scope: &str| { + Expr::CompoundIdentifier(vec![ + Ident::new(scope.to_string()), + Ident::new(col.to_string()), + ]) + }; + Expr::BinaryOp { + left: Box::new(col_ref(left_scope)), + op: BinaryOperator::Eq, + right: Box::new(col_ref(right_scope)), + } +} + // ----------------------------------------------------------------- // SQLR-5 — Joined SELECT execution // ----------------------------------------------------------------- @@ -480,6 +596,20 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result = query + .joins + .iter() + .enumerate() + .map(|(j_idx, join)| resolve_join_constraint(&join.constraint, &joined_tables, j_idx + 1)) + .collect::>>()?; + // Validate qualified projection column references against the // table they qualify. Unqualified names are validated by the // first scope lookup at row materialization — the runtime check @@ -495,9 +625,25 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result Result = using + .rows + .iter() + .map(|r| (r[0].to_display_string(), r[1].clone())) + .collect(); + assert_eq!(pairs.len(), 3); + assert_eq!( + using.rows, on.rows, + "USING must mirror the explicit ON rows" + ); + } + + /// `SELECT *` over a USING join shows the joined-on column once + /// (SQLite convention), taking the left side's copy. + #[test] + fn select_star_using_dedups_joined_column() { + let db = seed_join_fixture(); + let r = run_rows(&db, "SELECT * FROM customers INNER JOIN orders USING (id);"); + // Without USING dedup this would be 5 columns (id,name,id, + // customer_id,amount). USING(id) collapses the duplicate `id` + // to one, leaving 4 in source order. 
+ assert_eq!( + r.columns, + vec![ + "id".to_string(), + "name".to_string(), + "customer_id".to_string(), + "amount".to_string(), + ] + ); + assert_eq!(r.rows.len(), 3); + // Each surviving row's single `id` equals both sides' id (they + // were matched on equality), so the left copy is correct. + for row in &r.rows { + assert!(matches!(row[0], Value::Integer(_))); + } + } + + fn seed_natural_fixture() -> Database { + let mut db = Database::new("t".to_string()); + for sql in [ + // Distinct PK names (lid / rid) so the *only* shared columns + // are k1 and k2 — NATURAL must match on both with AND. + "CREATE TABLE l (lid INTEGER PRIMARY KEY, k1 INTEGER, k2 INTEGER, v1 TEXT);", + "CREATE TABLE r (rid INTEGER PRIMARY KEY, k1 INTEGER, k2 INTEGER, v2 TEXT);", + "INSERT INTO l (k1, k2, v1) VALUES (1, 1, 'l-a');", + "INSERT INTO l (k1, k2, v1) VALUES (1, 2, 'l-b');", + "INSERT INTO l (k1, k2, v1) VALUES (2, 1, 'l-c');", + "INSERT INTO r (k1, k2, v2) VALUES (1, 1, 'r-a');", + "INSERT INTO r (k1, k2, v2) VALUES (1, 2, 'r-b');", + "INSERT INTO r (k1, k2, v2) VALUES (9, 9, 'r-z');", + ] { + crate::sql::process_command(sql, &mut db).unwrap(); + } + db + } + + /// NATURAL JOIN auto-discovers the shared columns (k1, k2) and + /// matches on both with AND. + #[test] + fn natural_join_matches_on_all_shared_columns() { + let db = seed_natural_fixture(); + let natural = run_rows(&db, "SELECT v1, v2 FROM l NATURAL JOIN r ORDER BY v1;"); + // (1,1)->l-a/r-a and (1,2)->l-b/r-b match. (2,1) and (9,9) don't. + let pairs: Vec<(String, String)> = natural + .rows + .iter() + .map(|r| (r[0].to_display_string(), r[1].to_display_string())) + .collect(); + assert_eq!( + pairs, + vec![ + ("l-a".to_string(), "r-a".to_string()), + ("l-b".to_string(), "r-b".to_string()), + ] + ); + // Equivalent explicit form yields the same rows. 
+ let explicit = run_rows( + &db, + "SELECT v1, v2 FROM l INNER JOIN r ON l.k1 = r.k1 AND l.k2 = r.k2 ORDER BY v1;", + ); + assert_eq!(natural.rows, explicit.rows); + } + + /// `SELECT *` over a NATURAL join shows each shared column once. + #[test] + fn select_star_natural_dedups_shared_columns() { + let db = seed_natural_fixture(); + let r = run_rows(&db, "SELECT * FROM l NATURAL JOIN r;"); + // Source order with k1,k2 taken from the left only: + // l: lid, k1, k2, v1 ; r: rid, v2 (k1,k2 dropped from r). + assert_eq!( + r.columns, + vec![ + "lid".to_string(), + "k1".to_string(), + "k2".to_string(), + "v1".to_string(), + "rid".to_string(), + "v2".to_string(), + ] + ); + assert_eq!(r.rows.len(), 2); + } + + /// NATURAL JOIN between tables with no shared column names degrades + /// to a cross product, matching SQLite. + #[test] + fn natural_join_without_common_columns_is_cross_product() { let mut db = Database::new("t".to_string()); - crate::sql::process_command("CREATE TABLE a (id INTEGER PRIMARY KEY);", &mut db).unwrap(); - crate::sql::process_command("CREATE TABLE b (id INTEGER PRIMARY KEY);", &mut db).unwrap(); - let err = crate::sql::process_command("SELECT * FROM a INNER JOIN b USING (id);", &mut db); - assert!(err.is_err(), "USING is not yet supported"); + for sql in [ + "CREATE TABLE p (pid INTEGER PRIMARY KEY, pa TEXT);", + "CREATE TABLE q (qid INTEGER PRIMARY KEY, qb TEXT);", + "INSERT INTO p (pa) VALUES ('p1');", + "INSERT INTO p (pa) VALUES ('p2');", + "INSERT INTO q (qb) VALUES ('q1');", + "INSERT INTO q (qb) VALUES ('q2');", + "INSERT INTO q (qb) VALUES ('q3');", + ] { + crate::sql::process_command(sql, &mut db).unwrap(); + } + let r = run_rows(&db, "SELECT p.pa, q.qb FROM p NATURAL JOIN q;"); + assert_eq!(r.rows.len(), 2 * 3, "no shared columns ⇒ cross product"); + } + + /// CROSS JOIN produces the full cartesian product and is equivalent + /// to `INNER JOIN ... ON 1`. 
+ #[test] + fn cross_join_produces_cartesian_product() { + let db = seed_join_fixture(); + let cross = run_rows( + &db, + "SELECT customers.name, orders.amount FROM customers CROSS JOIN orders;", + ); + // 3 customers × 4 orders = 12 rows. + assert_eq!(cross.rows.len(), 12); + let on_true = run_rows( + &db, + "SELECT customers.name, orders.amount FROM customers INNER JOIN orders ON 1;", + ); + assert_eq!(cross.rows.len(), on_true.rows.len()); + // SELECT * over a cross join keeps every column from both sides. + let star = run_rows(&db, "SELECT * FROM customers CROSS JOIN orders;"); + assert_eq!(star.columns.len(), 5); + assert_eq!(star.rows.len(), 12); + } + + /// A LEFT OUTER join expressed with USING still preserves unmatched + /// left rows (NULL-padding the right), and the deduplicated column + /// keeps the left side's value. + #[test] + fn left_outer_join_using_preserves_unmatched_left() { + let db = seed_join_fixture(); + let r = run_rows( + &db, + "SELECT * FROM customers LEFT OUTER JOIN orders USING (id);", + ); + // Customer ids 1, 2, 3 each match an order id, so this fixture has + // no unmatched left row; the test checks the dedup and row count + // instead: 4 columns and 3 rows (order ids outside 1..3 have no + // customer to pair with and are dropped by the LEFT join). + assert_eq!(r.columns.len(), 4, "id is shown once"); + assert_eq!(r.rows.len(), 3); + } - let err = crate::sql::process_command("SELECT * FROM a NATURAL JOIN b;", &mut db); - assert!(err.is_err(), "NATURAL is not supported"); + /// USING a column that doesn't exist on one of the sides is a clean + /// error, not a silent empty result. 
+ #[test] + fn using_unknown_column_errors() { + let db = seed_join_fixture(); + let q = parse_select("SELECT * FROM customers INNER JOIN orders USING (nope);"); + let res = execute_select_rows(q, &db); + assert!(res.is_err(), "USING (nope) must error — column absent"); } #[test] diff --git a/src/sql/parser/select.rs b/src/sql/parser/select.rs index 988a16d..23d5c4a 100644 --- a/src/sql/parser/select.rs +++ b/src/sql/parser/select.rs @@ -1,7 +1,7 @@ use sqlparser::ast::{ DuplicateTreatment, Expr, FunctionArg, FunctionArgExpr, FunctionArguments, JoinConstraint, - JoinOperator, LimitClause, OrderByKind, Query, Select, SelectItem, SetExpr, Statement, - TableFactor, TableWithJoins, + JoinOperator, LimitClause, ObjectName, ObjectNamePart, OrderByKind, Query, Select, SelectItem, + SetExpr, Statement, TableFactor, TableWithJoins, Value, }; use crate::error::{Result, SQLRiteError}; @@ -162,10 +162,38 @@ impl JoinType { } } +/// How a JOIN matches rows. SQLR-5 originally shipped `ON` only; the +/// USING / NATURAL increment adds the two name-based constraints. +/// `ON` carries its predicate straight from the parser. `USING` and +/// `NATURAL` defer their equality synthesis to the executor because +/// they need table schemas (which column names exist, and — for +/// `NATURAL` — which are shared) that the parser doesn't have. The +/// executor turns both into the same `left.col = right.col [AND …]` +/// predicate the `ON` path already evaluates. `CROSS JOIN` is rewritten +/// to `ON true` at parse time (no schema needed) and so reuses the +/// `On` variant directly. +#[derive(Debug, Clone)] +pub enum JoinConstraintKind { + /// `ON <expr>` (and the parse-time rewrite of `CROSS JOIN` to + /// `ON true`). Evaluated per-row over the multi-table scope. Boxed + /// to keep this enum small — `Expr` dwarfs the other variants. + On(Box<Expr>), + /// `USING (col[, col…])` — equality on each named column, plus the + /// SQLite convention that each named column appears once in + /// `SELECT *`. 
Columns are validated and the predicate is + /// synthesized at execution time. + Using(Vec<String>), + /// `NATURAL` — the shared column names of the two sides are + /// discovered at execution time, then treated exactly like + /// `USING (<shared columns>)`. No shared columns ⇒ a cross product. + Natural, +} + /// One JOIN clause from the FROM list. Multi-join queries /// (`A JOIN B ... JOIN C ...`) become a `Vec<JoinClause>` evaluated -/// left-to-right against the accumulator. v1 requires an ON condition; -/// USING / NATURAL / CROSS are deferred. +/// left-to-right against the accumulator. The match condition is one +/// of `ON` / `USING` / `NATURAL` (see [`JoinConstraintKind`]); +/// `CROSS JOIN` arrives here already rewritten to `ON true`. #[derive(Debug, Clone)] pub struct JoinClause { pub join_type: JoinType, @@ -174,9 +202,8 @@ pub struct JoinClause { /// from `right_table` so the executor can normalize on /// `alias.unwrap_or(right_table)` for qualifier matching. pub right_alias: Option<String>, - /// `ON <expr>` — required. Evaluated per-row by the executor over - /// the multi-table scope. - pub on: Expr, + /// What the join matches on. See [`JoinConstraintKind`]. + pub constraint: JoinConstraintKind, } /// A parsed, simplified SELECT query. @@ -342,11 +369,11 @@ impl SelectQuery { } /// Pull the leading FROM table (with optional alias) and any JOIN -/// clauses out of the parsed FROM list. v1 supports a single base -/// table plus zero or more INNER / LEFT / RIGHT / FULL OUTER joins -/// with explicit `ON` conditions. Comma-separated FROM lists, -/// USING / NATURAL constraints, and CROSS / SEMI / ANTI / ASOF joins -/// surface as `NotImplemented`. +/// clauses out of the parsed FROM list. Supports a single base table +/// plus zero or more INNER / LEFT / RIGHT / FULL OUTER joins with an +/// `ON`, `USING (...)`, or `NATURAL` constraint, and `CROSS JOIN` +/// (rewritten to `INNER ... ON true`). Comma-separated FROM lists and +/// SEMI / ANTI / ASOF / APPLY joins surface as `NotImplemented`. 
fn extract_from_clause( from: &[TableWithJoins], ) -> Result<(String, Option<String>, Vec<JoinClause>)> { @@ -366,20 +393,28 @@ fn extract_from_clause( let mut joins = Vec::with_capacity(twj.joins.len()); for j in &twj.joins { let (right_table, right_alias) = extract_table_factor(&j.relation)?; - let (join_type, on_expr) = match &j.join_operator { + let (join_type, constraint) = match &j.join_operator { // Bare `JOIN` defaults to INNER per SQL standard. - JoinOperator::Join(c) | JoinOperator::Inner(c) => (JoinType::Inner, parse_on(c)?), + JoinOperator::Join(c) | JoinOperator::Inner(c) => { + (JoinType::Inner, convert_constraint(c)?) + } JoinOperator::Left(c) | JoinOperator::LeftOuter(c) => { - (JoinType::LeftOuter, parse_on(c)?) + (JoinType::LeftOuter, convert_constraint(c)?) } JoinOperator::Right(c) | JoinOperator::RightOuter(c) => { - (JoinType::RightOuter, parse_on(c)?) + (JoinType::RightOuter, convert_constraint(c)?) } - JoinOperator::FullOuter(c) => (JoinType::FullOuter, parse_on(c)?), + JoinOperator::FullOuter(c) => (JoinType::FullOuter, convert_constraint(c)?), + // `CROSS JOIN` is the cross product: INNER with an always-true + // ON. A constraint on a CROSS JOIN is non-standard, but if the + // parser handed us `USING` / `NATURAL` / `ON` we honor it + // rather than silently dropping it. + JoinOperator::CrossJoin(c) => (JoinType::Inner, convert_cross_constraint(c)?), other => { return Err(SQLRiteError::NotImplemented(format!( "join flavor {other:?} is not supported \ - (only INNER / LEFT OUTER / RIGHT OUTER / FULL OUTER with ON)" + (only INNER / LEFT OUTER / RIGHT OUTER / FULL OUTER / CROSS, \ + with ON / USING / NATURAL)" ))); } }; @@ -387,7 +422,7 @@ fn extract_from_clause( join_type, right_table, right_alias, - on: on_expr, + constraint, }); } @@ -417,21 +452,61 @@ fn extract_table_factor(tf: &TableFactor) -> Result<(String, Option<String>)> { } } -fn parse_on(constraint: &JoinConstraint) -> Result<Expr> { +/// Lower a `sqlparser` join constraint into our [`JoinConstraintKind`]. 
+/// `ON` passes through; `USING` is narrowed to a list of bare column +/// names; `NATURAL` defers to the executor. A constraint-less join +/// (`A JOIN B` with no `ON` / `USING`) is rejected — `CROSS JOIN` is +/// the supported way to ask for a cross product and is handled by +/// [`convert_cross_constraint`]. +fn convert_constraint(constraint: &JoinConstraint) -> Result<JoinConstraintKind> { match constraint { - JoinConstraint::On(expr) => Ok(expr.clone()), - JoinConstraint::Using(_) => Err(SQLRiteError::NotImplemented( - "JOIN ... USING (...) is not supported yet — use JOIN ... ON instead".to_string(), - )), - JoinConstraint::Natural => Err(SQLRiteError::NotImplemented( - "NATURAL JOIN is not supported".to_string(), - )), + JoinConstraint::On(expr) => Ok(JoinConstraintKind::On(Box::new(expr.clone()))), + JoinConstraint::Using(cols) => { + let names = cols + .iter() + .map(extract_using_column) + .collect::<Result<Vec<_>>>()?; + Ok(JoinConstraintKind::Using(names)) + } + JoinConstraint::Natural => Ok(JoinConstraintKind::Natural), JoinConstraint::None => Err(SQLRiteError::NotImplemented( - "JOIN without an ON condition is not supported (use INNER JOIN ... ON ...)".to_string(), + "JOIN without an ON / USING / NATURAL condition is not supported \ + (use `... ON ...`, `... USING (...)`, `NATURAL JOIN`, or `CROSS JOIN`)" + .to_string(), )), } } +/// Constraint handling for `CROSS JOIN`. The standard form carries no +/// constraint and means "cross product", which we express as `ON true` +/// so it flows through the same executor path as any other join. +fn convert_cross_constraint(constraint: &JoinConstraint) -> Result<JoinConstraintKind> { + match constraint { + JoinConstraint::None => Ok(JoinConstraintKind::On(Box::new(true_literal()))), + // Non-standard, but if a constraint was attached to a CROSS JOIN, + // honor it instead of dropping it on the floor. + other => convert_constraint(other), + } +} + +/// Pull a bare column name out of a `USING (...)` entry. 
`USING` +/// columns are always simple identifiers; anything qualified or +/// multi-part is rejected. +fn extract_using_column(name: &ObjectName) -> Result<String> { + match name.0.as_slice() { + [ObjectNamePart::Identifier(ident)] => Ok(ident.value.clone()), + _ => Err(SQLRiteError::NotImplemented(format!( + "USING column must be a simple column name, got {name}" + ))), + } +} + +/// An always-true boolean literal expression, used to rewrite +/// `CROSS JOIN` into `INNER JOIN ... ON true`. +fn true_literal() -> Expr { + Expr::Value(Value::Boolean(true).with_empty_span()) +} + fn parse_projection(items: &[SelectItem]) -> Result<Projection> { // Special-case `SELECT *`. if items.len() == 1 diff --git a/web/src/app/docs/page.tsx b/web/src/app/docs/page.tsx index d50af70..0511dda 100644 --- a/web/src/app/docs/page.tsx +++ b/web/src/app/docs/page.tsx @@ -310,10 +310,11 @@ export default function DocsPage() { The executor uses a plain nested-loop driver — adequate for an embedded learning database. Hash / merge joins on equi-join shapes are a future optimization.{" "} - CROSS JOIN, comma-FROMs, and{" "} - NATURAL JOIN /{" "} - JOIN ... USING (col) are not supported yet — write{" "} - INNER JOIN ... ON true instead. Aggregates /{" "} + ON, USING (col), NATURAL, and{" "} + CROSS JOIN are all supported (a USING /{" "} + NATURAL column shows once in SELECT *). + Comma-separated FROMs (FROM a, b) are not — use an + explicit JOIN / CROSS JOIN. Aggregates /{" "} GROUP BY over a join lands once subqueries do.
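Reviewer note: the `USING` / `NATURAL` lowering this diff adds to `src/sql/executor.rs` can be modelled without `sqlparser`. The following is a minimal sketch under stated assumptions, not the engine's code: `Table`, `using_predicate`, `natural_columns`, and `star_columns` are hypothetical names, and predicates are rendered as strings rather than `Expr` trees. It mirrors three rules described above: the left qualifier is the first left-side table that has the column, an empty column list lowers to `true` (a cross product), and `SELECT *` drops the right side's copy of each joined-on column.

```rust
// Hypothetical, string-based model of the USING / NATURAL lowering.
// None of these names exist in the engine; they only illustrate the rules.

struct Table<'a> {
    scope_name: &'a str,
    columns: Vec<&'a str>,
}

/// Build `left.col = right.col [AND ...]` for a USING column list.
/// `tables[..right_pos]` is the left scope; `tables[right_pos]` is the
/// table being joined. An empty list lowers to `true` (cross product).
fn using_predicate(cols: &[&str], tables: &[Table], right_pos: usize) -> Result<String, String> {
    let right = &tables[right_pos];
    let mut parts = Vec::new();
    for col in cols {
        // The column must exist on the right side ...
        if !right.columns.contains(col) {
            return Err(format!("USING column '{col}' is not on '{}'", right.scope_name));
        }
        // ... and on some left-side table; the first match supplies the qualifier.
        let left = tables[..right_pos]
            .iter()
            .find(|t| t.columns.contains(col))
            .ok_or_else(|| format!("USING column '{col}' is not on any left-side table"))?;
        parts.push(format!("{}.{col} = {}.{col}", left.scope_name, right.scope_name));
    }
    if parts.is_empty() {
        Ok("true".to_string())
    } else {
        Ok(parts.join(" AND "))
    }
}

/// NATURAL: the right table's columns that also appear on the left,
/// kept in the right table's column order.
fn natural_columns<'a>(tables: &[Table<'a>], right_pos: usize) -> Vec<&'a str> {
    tables[right_pos]
        .columns
        .iter()
        .copied()
        .filter(|c| tables[..right_pos].iter().any(|t| t.columns.contains(c)))
        .collect()
}

/// `SELECT *` header: every column in source order, minus the
/// right-side copy of each USING / NATURAL column.
fn star_columns<'a>(tables: &[Table<'a>], right_pos: usize, using: &[&str]) -> Vec<&'a str> {
    let mut out = Vec::new();
    for (i, t) in tables.iter().enumerate().take(right_pos + 1) {
        for c in &t.columns {
            if i == right_pos && using.contains(c) {
                continue; // the left side's copy is already in the header
            }
            out.push(*c);
        }
    }
    out
}
```

Fed the `l` / `r` schemas from `seed_natural_fixture`, `natural_columns` yields `k1, k2`, and `star_columns` yields the same six-column header that `select_star_natural_dedups_shared_columns` asserts. The real executor builds `Expr::BinaryOp` nodes instead of strings, so the sketch says nothing about evaluation or NULL handling.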