diff --git a/README.md b/README.md
index 2cdc1a2..b25777a 100644
--- a/README.md
+++ b/README.md
@@ -175,7 +175,7 @@ sqlrite> DELETE FROM users WHERE age < 30;
| `CREATE TABLE` | `PRIMARY KEY`, `UNIQUE`, `NOT NULL`; `IF NOT EXISTS` (idempotent re-create); duplicate-column detection; types `INTEGER`/`INT`/`BIGINT`/`SMALLINT`, `TEXT`/`VARCHAR`, `REAL`/`FLOAT`/`DOUBLE`/`DECIMAL`, `BOOLEAN`. Auto-creates `sqlrite_autoindex_
_` for every PK + UNIQUE column |
| `CREATE [UNIQUE] INDEX` | Single-column, named indexes; `IF NOT EXISTS`; persists as a dedicated cell-based B-Tree. INTEGER + TEXT columns only |
| `INSERT INTO` | Explicit column list required; auto-ROWID for `INTEGER PRIMARY KEY`; multi-row `VALUES (…), (…)`; UNIQUE enforcement; clean type errors (no panics); NULL padding for omitted columns |
-| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, col …]`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER] JOIN ... ON ...` with table aliases and qualified `t.col` references; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` |
+| `SELECT` | `*` or column list with optional `AS alias`; `WHERE`; `DISTINCT`; `GROUP BY col[, col …]`; aggregate projections `COUNT(*)` / `COUNT([DISTINCT] col)` / `SUM` / `AVG` / `MIN` / `MAX`; `[INNER\|LEFT OUTER\|RIGHT OUTER\|FULL OUTER\|CROSS] JOIN` with `ON ...` / `USING (...)` / `NATURAL` constraints, table aliases and qualified `t.col` references; single-column `ORDER BY [ASC\|DESC]` (also resolves alias and aggregate display names); `LIMIT n`. `WHERE col = literal` probes an index when one exists. Catalog introspection via `SELECT … FROM sqlrite_master` |
| `UPDATE` | Multi-column `SET`; `WHERE`; UNIQUE + type enforcement; arithmetic in assignments (`SET age = age + 1`) |
| `DELETE` | `WHERE` predicate or full-table delete |
| `BEGIN` / `COMMIT` / `ROLLBACK` | Real transactions, snapshot-based; WAL-backed commit; single-level (no savepoints); auto-rollback if `COMMIT`'s disk write fails |
@@ -193,7 +193,7 @@ Expressions in `WHERE` and `UPDATE`'s `SET` RHS:
- String concat — `||`
- Literals — integer + real numbers, `'single-quoted strings'`, `TRUE` / `FALSE`, `NULL`; parentheses for grouping
-**Not yet supported** (common ones): subqueries, CTEs, `HAVING`, `LIKE … ESCAPE ''`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, `JOIN ... USING`, `NATURAL JOIN`, `CROSS JOIN`, comma joins, aggregates / DISTINCT / GROUP BY *over* JOIN results. The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference.
+**Not yet supported** (common ones): subqueries, CTEs, `HAVING`, `LIKE … ESCAPE ''`, `IN (subquery)`, `DISTINCT` on `SUM`/`AVG`/`MIN`/`MAX`, GROUP BY on expressions, expressions in the projection list, `OFFSET`, multi-column `ORDER BY`, savepoints, comma joins (`FROM a, b`), aggregates / DISTINCT / GROUP BY *over* JOIN results. The [full list with context](docs/supported-sql.md#not-yet-supported) lives in the reference.
#### Meta commands
@@ -250,7 +250,7 @@ The project is staged in phases, each independently shippable. A finished phase
- [x] `CREATE TABLE` with `PRIMARY KEY`, `UNIQUE`, `NOT NULL`; duplicate-column detection; in-memory `BTreeMap` indexes on PK/UNIQUE columns
- [x] `INSERT` with auto-ROWID for `INTEGER PRIMARY KEY`, UNIQUE enforcement, NULL padding for missing columns
- [x] `SELECT` — projection, `WHERE`, `ORDER BY`, `LIMIT`
-- [x] `JOIN` — `INNER`, `LEFT OUTER`, `RIGHT OUTER`, `FULL OUTER` with `ON` (SQLR-5)
+- [x] `JOIN` — `INNER`, `LEFT OUTER`, `RIGHT OUTER`, `FULL OUTER`, `CROSS` with `ON` / `USING (...)` / `NATURAL` (SQLR-5)
- [x] `UPDATE ... SET ... WHERE ...` with type + UNIQUE enforcement at write time
- [x] `DELETE ... WHERE ...`
- [x] Expression evaluator: `=`/`<>`/`<`/`<=`/`>`/`>=`, `AND`/`OR`/`NOT`, arithmetic `+`/`-`/`*`/`/`/`%`, string concat `||`, NULL-as-false in `WHERE`
@@ -332,7 +332,6 @@ Lockstep versioning — one dispatch bumps every product to the same `vX.Y.Z`. T
- [ ] *(deferred to Phase 8)* Full-text search with BM25 + hybrid retrieval
**Possible extras** *(no committed phase)*
-- Joins (`INNER`, `LEFT OUTER`, `CROSS` — SQLite does not support `RIGHT`/`FULL OUTER`)
- `HAVING`, `IN (subquery)`, `BETWEEN`, `GLOB` / `REGEXP`, `GROUP_CONCAT`, window functions
- Composite and expression indexes (with cost analysis)
- Alternate storage engines — LSM/SSTable for write-heavy workloads alongside the B-Tree
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 6bbd9a5..8c95d6b 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -561,7 +561,7 @@ The biggest single SQL-surface jump in the project's history.
- Self-joins require an alias on at least one side.
- `WHERE` runs after joins (the standard `LEFT JOIN ... WHERE right.col IS NULL` anti-join idiom works).
-Not yet supported: `CROSS JOIN`, comma-separated FROMs, `NATURAL JOIN`, `JOIN ... USING (col)`, aggregates / `GROUP BY` / `DISTINCT` *over* a join, `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization.
+`ON`, `USING (...)`, `NATURAL`, and `CROSS JOIN` are all supported. Not yet supported: comma-separated FROMs (`FROM a, b`), aggregates / `GROUP BY` / `DISTINCT` *over* a join, `fts_match` / `bm25_score` inside a join expression. Algorithm: plain nested-loop, O(N×M) per level — hash / merge joins are a future optimization.
### ✅ Phase 9g — Prepared statements + parameter binding *(v0.9.0, SQLR-23)*
diff --git a/docs/sql-engine.md b/docs/sql-engine.md
index d534893..72dc8b8 100644
--- a/docs/sql-engine.md
+++ b/docs/sql-engine.md
@@ -53,7 +53,7 @@ The `sqlparser` AST is designed to cover every SQL dialect, so its types are hug
`SelectQuery::joins` (SQLR-5) is a `Vec<JoinClause>` evaluated left-to-right by `execute_select_rows_joined`. Each clause carries a `JoinType` (`Inner` / `LeftOuter` / `RightOuter` / `FullOuter`), the right-table name + optional alias, and a required `ON` expression. Empty = single-table SELECT, the existing fast path with HNSW / FTS / bounded-heap optimizations.
-Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — `JOIN ... USING`, `NATURAL JOIN`, `CROSS JOIN`, comma joins, aggregates / GROUP BY / DISTINCT over JOINs, `HAVING`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE ''`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there.
+Each parser module still rejects features we don't implement with `SQLRiteError::NotImplemented` — comma joins (`FROM a, b`), aggregates / GROUP BY / DISTINCT over JOINs, `HAVING`, `DISTINCT ON (...)`, `GROUP BY` on expressions, `LIKE … ESCAPE ''`, `IN (subquery)`, `OFFSET`, multi-table DELETE, tuple assignment targets, etc. These errors carry the feature name in the message so the user knows what isn't there. (`JOIN ... USING`, `NATURAL JOIN`, and `CROSS JOIN` are now supported — see [`supported-sql.md`](supported-sql.md#join-semantics-sqlr-5).)
## Statement dispatch
diff --git a/docs/supported-sql.md b/docs/supported-sql.md
index 8f1a9b5..abbd4a7 100644
--- a/docs/supported-sql.md
+++ b/docs/supported-sql.md
@@ -210,19 +210,27 @@ COUNT([DISTINCT] ) -- counts non-NULL values, option
### `JOIN` semantics (SQLR-5)
-Four flavors are supported, all with explicit `ON` conditions:
+Four flavors are supported, with `ON`, `USING (...)`, or `NATURAL` match
+conditions, plus `CROSS JOIN`:
| Flavor | Keeps unmatched rows from… |
|---|---|
-| `INNER JOIN` | …neither side. Only ON-matched pairs survive. |
+| `INNER JOIN` | …neither side. Only matched pairs survive. |
| `LEFT [OUTER] JOIN` | …the left side; right-side columns become `NULL` for unmatched left rows. |
| `RIGHT [OUTER] JOIN` | …the right side; left-side columns become `NULL` for unmatched right rows. |
| `FULL [OUTER] JOIN` | …both sides, NULL-padded on the unmatched side. |
+| `CROSS JOIN` | …n/a. There is no match condition, so nothing is unmatched: every left row is paired with every right row (cross product), and an empty side yields no rows. |
- **Engine choice:** SQLite ships only `INNER` and `LEFT OUTER`. SQLRite implements all four because the per-flavor differences boil down to NULL-padding policy on top of one shared nested-loop driver — adding `RIGHT` / `FULL` was effectively free once the executor had a multi-table scope. See [`docs/design-decisions.md`](design-decisions.md) for the rationale.
+- **Match conditions:**
+ - **`ON <expr>`** — any boolean expression over the in-scope tables.
+ - **`USING (col[, col…])`** — shorthand for `left.col = right.col` AND-chained over each named column. The column must exist on the right side and on some left-side table; in a chain (`A JOIN B USING(x) JOIN C USING(x)`) each `x` resolves against the first left table that has it.
+ - **`NATURAL`** — equivalent to `USING (<shared columns>)`, discovered automatically from the schemas. If the sides share no column names, a `NATURAL JOIN` degrades to a cross product (matching SQLite). Combines with a flavor: `NATURAL LEFT JOIN`.
+ - **`CROSS JOIN`** — the cross product; the engine treats it as `INNER JOIN ... ON true`.
+- **`SELECT *` with `USING` / `NATURAL`:** each joined-on column appears **once** (SQLite convention), taking the left side's value; the right side's duplicate is omitted. Plain `ON` joins keep both copies.
- **Aliases:** `FROM customers AS c INNER JOIN orders AS o ON c.id = o.customer_id`. When an alias is supplied the original table name leaves scope (SQL standard) — qualifier resolution uses the alias.
- **Qualified column references:** `.` and `.` resolve to that specific side. Bare `` references must resolve to exactly one in-scope table; ambiguous references error with a "qualify it as `.col`" hint.
-- **Output of `SELECT *`** over a join is every column of every in-scope table, in source order. Duplicate header names are permitted (SQLite-style). Disambiguate with explicit `SELECT t.col AS t_col, u.col AS u_col`.
+- **Output of `SELECT *`** over a join is every column of every in-scope table, in source order (minus `USING` / `NATURAL` duplicates, see above). Duplicate header names are otherwise permitted (SQLite-style). Disambiguate with explicit `SELECT t.col AS t_col, u.col AS u_col`.
- **Multi-join** chains left-fold: `A JOIN B ON ... JOIN C ON ...` evaluates as `(A ⨝ B) ⨝ C`. Each new clause sees every prior alias / table in its `ON` expression.
- **Self-joins** require an alias on at least one side: `FROM nodes AS p INNER JOIN nodes AS c ON p.id = c.parent_id`. Without one, you get a `duplicate table reference` error so qualifiers stay unambiguous.
- **`WHERE` runs after joins.** A `WHERE right.col IS NULL` filter on a `LEFT JOIN` correctly returns left rows with no match (the standard "anti-join via outer-join" idiom).
@@ -231,8 +239,7 @@ Four flavors are supported, all with explicit `ON` conditions:
#### What's not supported in JOINs
-- `JOIN ... USING (col)` and `NATURAL JOIN` — explicit `ON` only. (Both are deferred — `USING` is straightforward but adds a column-resolution rule we haven't needed yet.)
-- `CROSS JOIN` (write `INNER JOIN ... ON true` instead) and comma-separated FROM lists.
+- Comma-separated FROM lists (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN` instead.
- Aggregates / `GROUP BY` / `DISTINCT` *over* a join. The single-table aggregator is wired against one rowid stream; rewiring it for joined rows is a separate increment. Surfaces as a clean `NotImplemented` at parse time.
- `fts_match` / `bm25_score` inside a JOIN expression. They need to look up an FTS index by column, which is single-table-bound today. Use them on a single-table SELECT first, or fold the FTS lookup into the FROM side.
@@ -255,7 +262,7 @@ The executor includes a tiny optimizer: if the `WHERE` is exactly `
### What doesn't work
-- **`CROSS JOIN`**, **comma-separated FROM lists**, **`NATURAL JOIN`**, **`JOIN ... USING (col)`** — explicit `INNER` / `LEFT` / `RIGHT` / `FULL OUTER JOIN ... ON ...` only (see [JOIN semantics](#join-semantics-sqlr-5))
+- **Comma-separated FROM lists** (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN`. `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` with `ON` / `USING` / `NATURAL` are all supported (see [JOIN semantics](#join-semantics-sqlr-5))
- **Aggregates** / **`GROUP BY`** / **`DISTINCT`** over a JOIN — pipe through a subquery once subqueries land
- **Subqueries**, CTEs (`WITH`), views
- **`HAVING`** — pre-aggregation `WHERE` works; post-aggregation filtering does not yet
@@ -700,7 +707,7 @@ A REPL launched with `sqlrite --readonly foo.sqlrite` (or `sqlrite::open_databas
For context when you hit `NotImplemented`. See [Roadmap](roadmap.md) for when these land:
### Joins & composition
-- `CROSS JOIN`, comma joins, `NATURAL JOIN`, `JOIN ... USING` — explicit `INNER` / `LEFT` / `RIGHT` / `FULL OUTER JOIN ... ON ...` works (SQLR-5); the others don't
+- `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS JOIN` with `ON` / `USING (...)` / `NATURAL` all work (SQLR-5). Comma-separated FROM joins (`FROM a, b`) don't — use an explicit `JOIN` / `CROSS JOIN`
- Aggregates / `GROUP BY` / `DISTINCT` *over* a JOIN — pipe through a subquery once subqueries land
- `fts_match` / `bm25_score` inside a JOIN expression — single-table-bound today
- Subqueries (scalar, `IN (SELECT ...)`, correlated)
diff --git a/src/sql/executor.rs b/src/sql/executor.rs
index 99ccac3..6a65ab6 100644
--- a/src/sql/executor.rs
+++ b/src/sql/executor.rs
@@ -6,7 +6,7 @@ use std::cmp::Ordering;
use prettytable::{Cell as PrintCell, Row as PrintRow, Table as PrintTable};
use sqlparser::ast::{
AlterTable, AlterTableOperation, AssignmentTarget, BinaryOperator, CreateIndex, Delete, Expr,
- FromTable, FunctionArg, FunctionArgExpr, FunctionArguments, IndexType, ObjectName,
+ FromTable, FunctionArg, FunctionArgExpr, FunctionArguments, Ident, IndexType, ObjectName,
ObjectNamePart, RenameTableNameKind, Statement, TableFactor, TableWithJoins, UnaryOperator,
Update, Value as AstValue,
};
@@ -21,7 +21,8 @@ use crate::sql::db::table::{
use crate::sql::fts::{Bm25Params, PostingList};
use crate::sql::hnsw::{DistanceMetric, HnswIndex};
use crate::sql::parser::select::{
- AggregateArg, JoinType, OrderByClause, Projection, ProjectionItem, ProjectionKind, SelectQuery,
+ AggregateArg, JoinConstraintKind, JoinType, OrderByClause, Projection, ProjectionItem,
+ ProjectionKind, SelectQuery,
};
// -----------------------------------------------------------------
@@ -405,6 +406,121 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result,
+}
+
+/// Turn a [`JoinConstraintKind`] into the `ON` predicate the nested-loop
+/// driver evaluates. `tables[..right_pos]` are the tables in scope on
+/// the left of this join; `tables[right_pos]` is the table being joined.
+///
+/// - `On` passes its predicate through unchanged.
+/// - `Using(cols)` becomes `left.col = right.col` AND-chained over every
+/// named column. The left qualifier is the first in-scope table that
+/// actually has the column, so the rewrite is correct for join chains
+/// (`A JOIN B USING(x) JOIN C USING(x)` resolves both `x`es against
+/// `A`). A column missing from either side is an error.
+/// - `Natural` discovers the shared column names first (right table's
+/// columns that also appear somewhere on the left), then proceeds
+/// exactly like `Using`. No shared columns ⇒ an always-true predicate,
+/// i.e. a cross product, matching SQLite.
+fn resolve_join_constraint(
+ constraint: &JoinConstraintKind,
+ tables: &[JoinedTableRef<'_>],
+ right_pos: usize,
+) -> Result<ResolvedJoin> {
+ match constraint {
+ JoinConstraintKind::On(expr) => Ok(ResolvedJoin {
+ on: (**expr).clone(),
+ using_columns: Vec::new(),
+ }),
+ JoinConstraintKind::Using(cols) => build_using_join(cols, tables, right_pos),
+ JoinConstraintKind::Natural => {
+ // Shared columns = the right table's columns that also exist
+ // on some left table, preserving the right table's column
+ // order for determinism.
+ let shared: Vec<String> = tables[right_pos]
+ .table
+ .column_names()
+ .into_iter()
+ .filter(|c| {
+ tables[..right_pos]
+ .iter()
+ .any(|t| t.table.contains_column(c.clone()))
+ })
+ .collect();
+ build_using_join(&shared, tables, right_pos)
+ }
+ }
+}
+
+/// Shared lowering for `USING` and `NATURAL`: synthesize the AND-chain
+/// of `left.col = right.col` equalities and report the deduplicated
+/// columns. An empty `cols` (a `NATURAL` join with nothing in common)
+/// yields an always-true predicate and no dedup, i.e. a cross product.
+fn build_using_join(
+ cols: &[String],
+ tables: &[JoinedTableRef<'_>],
+ right_pos: usize,
+) -> Result<ResolvedJoin> {
+ let right = &tables[right_pos];
+ let mut predicate: Option<Expr> = None;
+ for col in cols {
+ // The named column must exist on the right side …
+ if !right.table.contains_column(col.clone()) {
+ return Err(SQLRiteError::Internal(format!(
+ "cannot join USING column '{col}' — it is not present on table '{}'",
+ right.scope_name
+ )));
+ }
+ // … and on at least one left-side table. Qualify the left
+ // reference with whichever table actually has it.
+ let left = tables[..right_pos]
+ .iter()
+ .find(|t| t.table.contains_column(col.clone()))
+ .ok_or_else(|| {
+ SQLRiteError::Internal(format!(
+ "cannot join USING column '{col}' — it is not present on any left-side table"
+ ))
+ })?;
+ let eq = col_eq(&left.scope_name, &right.scope_name, col);
+ predicate = Some(match predicate {
+ None => eq,
+ Some(prev) => Expr::BinaryOp {
+ left: Box::new(prev),
+ op: BinaryOperator::And,
+ right: Box::new(eq),
+ },
+ });
+ }
+ Ok(ResolvedJoin {
+ on: predicate
+ .unwrap_or_else(|| Expr::Value(sqlparser::ast::Value::Boolean(true).with_empty_span())),
+ using_columns: cols.to_vec(),
+ })
+}
+
+/// Build the `left_scope.col = right_scope.col` equality used to lower
+/// `USING` / `NATURAL` joins onto the existing `ON` evaluation path.
+fn col_eq(left_scope: &str, right_scope: &str, col: &str) -> Expr {
+ let col_ref = |scope: &str| {
+ Expr::CompoundIdentifier(vec![
+ Ident::new(scope.to_string()),
+ Ident::new(col.to_string()),
+ ])
+ };
+ Expr::BinaryOp {
+ left: Box::new(col_ref(left_scope)),
+ op: BinaryOperator::Eq,
+ right: Box::new(col_ref(right_scope)),
+ }
+}
+
// -----------------------------------------------------------------
// SQLR-5 — Joined SELECT execution
// -----------------------------------------------------------------
@@ -480,6 +596,20 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result = query
+ .joins
+ .iter()
+ .enumerate()
+ .map(|(j_idx, join)| resolve_join_constraint(&join.constraint, &joined_tables, j_idx + 1))
+ .collect::<Result<Vec<_>>>()?;
+
// Validate qualified projection column references against the
// table they qualify. Unqualified names are validated by the
// first scope lookup at row materialization — the runtime check
@@ -495,9 +625,25 @@ fn execute_select_rows_joined(query: SelectQuery, db: &Database) -> Result Result = using
+ .rows
+ .iter()
+ .map(|r| (r[0].to_display_string(), r[1].clone()))
+ .collect();
+ assert_eq!(pairs.len(), 3);
+ assert_eq!(
+ using.rows, on.rows,
+ "USING must mirror the explicit ON rows"
+ );
+ }
+
+ /// `SELECT *` over a USING join shows the joined-on column once
+ /// (SQLite convention), taking the left side's copy.
+ #[test]
+ fn select_star_using_dedups_joined_column() {
+ let db = seed_join_fixture();
+ let r = run_rows(&db, "SELECT * FROM customers INNER JOIN orders USING (id);");
+ // Without USING dedup this would be 5 columns (id,name,id,
+ // customer_id,amount). USING(id) collapses the duplicate `id`
+ // to one, leaving 4 in source order.
+ assert_eq!(
+ r.columns,
+ vec![
+ "id".to_string(),
+ "name".to_string(),
+ "customer_id".to_string(),
+ "amount".to_string(),
+ ]
+ );
+ assert_eq!(r.rows.len(), 3);
+ // Each surviving row's single `id` equals both sides' id (they
+ // were matched on equality), so the left copy is correct.
+ for row in &r.rows {
+ assert!(matches!(row[0], Value::Integer(_)));
+ }
+ }
+
+ fn seed_natural_fixture() -> Database {
+ let mut db = Database::new("t".to_string());
+ for sql in [
+ // Distinct PK names (lid / rid) so the *only* shared columns
+ // are k1 and k2 — NATURAL must match on both with AND.
+ "CREATE TABLE l (lid INTEGER PRIMARY KEY, k1 INTEGER, k2 INTEGER, v1 TEXT);",
+ "CREATE TABLE r (rid INTEGER PRIMARY KEY, k1 INTEGER, k2 INTEGER, v2 TEXT);",
+ "INSERT INTO l (k1, k2, v1) VALUES (1, 1, 'l-a');",
+ "INSERT INTO l (k1, k2, v1) VALUES (1, 2, 'l-b');",
+ "INSERT INTO l (k1, k2, v1) VALUES (2, 1, 'l-c');",
+ "INSERT INTO r (k1, k2, v2) VALUES (1, 1, 'r-a');",
+ "INSERT INTO r (k1, k2, v2) VALUES (1, 2, 'r-b');",
+ "INSERT INTO r (k1, k2, v2) VALUES (9, 9, 'r-z');",
+ ] {
+ crate::sql::process_command(sql, &mut db).unwrap();
+ }
+ db
+ }
+
+ /// NATURAL JOIN auto-discovers the shared columns (k1, k2) and
+ /// matches on both with AND.
+ #[test]
+ fn natural_join_matches_on_all_shared_columns() {
+ let db = seed_natural_fixture();
+ let natural = run_rows(&db, "SELECT v1, v2 FROM l NATURAL JOIN r ORDER BY v1;");
+ // (1,1)->l-a/r-a and (1,2)->l-b/r-b match. (2,1) and (9,9) don't.
+ let pairs: Vec<(String, String)> = natural
+ .rows
+ .iter()
+ .map(|r| (r[0].to_display_string(), r[1].to_display_string()))
+ .collect();
+ assert_eq!(
+ pairs,
+ vec![
+ ("l-a".to_string(), "r-a".to_string()),
+ ("l-b".to_string(), "r-b".to_string()),
+ ]
+ );
+ // Equivalent explicit form yields the same rows.
+ let explicit = run_rows(
+ &db,
+ "SELECT v1, v2 FROM l INNER JOIN r ON l.k1 = r.k1 AND l.k2 = r.k2 ORDER BY v1;",
+ );
+ assert_eq!(natural.rows, explicit.rows);
+ }
+
+ /// `SELECT *` over a NATURAL join shows each shared column once.
+ #[test]
+ fn select_star_natural_dedups_shared_columns() {
+ let db = seed_natural_fixture();
+ let r = run_rows(&db, "SELECT * FROM l NATURAL JOIN r;");
+ // Source order with k1,k2 taken from the left only:
+ // l: lid, k1, k2, v1 ; r: rid, v2 (k1,k2 dropped from r).
+ assert_eq!(
+ r.columns,
+ vec![
+ "lid".to_string(),
+ "k1".to_string(),
+ "k2".to_string(),
+ "v1".to_string(),
+ "rid".to_string(),
+ "v2".to_string(),
+ ]
+ );
+ assert_eq!(r.rows.len(), 2);
+ }
+
+ /// NATURAL JOIN between tables with no shared column names degrades
+ /// to a cross product, matching SQLite.
+ #[test]
+ fn natural_join_without_common_columns_is_cross_product() {
let mut db = Database::new("t".to_string());
- crate::sql::process_command("CREATE TABLE a (id INTEGER PRIMARY KEY);", &mut db).unwrap();
- crate::sql::process_command("CREATE TABLE b (id INTEGER PRIMARY KEY);", &mut db).unwrap();
- let err = crate::sql::process_command("SELECT * FROM a INNER JOIN b USING (id);", &mut db);
- assert!(err.is_err(), "USING is not yet supported");
+ for sql in [
+ "CREATE TABLE p (pid INTEGER PRIMARY KEY, pa TEXT);",
+ "CREATE TABLE q (qid INTEGER PRIMARY KEY, qb TEXT);",
+ "INSERT INTO p (pa) VALUES ('p1');",
+ "INSERT INTO p (pa) VALUES ('p2');",
+ "INSERT INTO q (qb) VALUES ('q1');",
+ "INSERT INTO q (qb) VALUES ('q2');",
+ "INSERT INTO q (qb) VALUES ('q3');",
+ ] {
+ crate::sql::process_command(sql, &mut db).unwrap();
+ }
+ let r = run_rows(&db, "SELECT p.pa, q.qb FROM p NATURAL JOIN q;");
+ assert_eq!(r.rows.len(), 2 * 3, "no shared columns ⇒ cross product");
+ }
+
+ /// CROSS JOIN produces the full cartesian product and is equivalent
+ /// to `INNER JOIN ... ON 1`.
+ #[test]
+ fn cross_join_produces_cartesian_product() {
+ let db = seed_join_fixture();
+ let cross = run_rows(
+ &db,
+ "SELECT customers.name, orders.amount FROM customers CROSS JOIN orders;",
+ );
+ // 3 customers × 4 orders = 12 rows.
+ assert_eq!(cross.rows.len(), 12);
+ let on_true = run_rows(
+ &db,
+ "SELECT customers.name, orders.amount FROM customers INNER JOIN orders ON 1;",
+ );
+ assert_eq!(cross.rows.len(), on_true.rows.len());
+ // SELECT * over a cross join keeps every column from both sides.
+ let star = run_rows(&db, "SELECT * FROM customers CROSS JOIN orders;");
+ assert_eq!(star.columns.len(), 5);
+ assert_eq!(star.rows.len(), 12);
+ }
+
+ /// A LEFT OUTER join expressed with USING still preserves unmatched
+ /// left rows (NULL-padding the right), and the deduplicated column
+ /// keeps the left side's value.
+ #[test]
+ fn left_outer_join_using_preserves_unmatched_left() {
+ let db = seed_join_fixture();
+ let r = run_rows(
+ &db,
+ "SELECT * FROM customers LEFT OUTER JOIN orders USING (id);",
+ );
+ // Customer ids 1, 2 and 3 each match an order id, so this fixture
+ // has no unmatched left rows. Check the dedup and the row count
+ // instead: 4 columns (one `id`) and 3 rows, one per customer.
+ assert_eq!(r.columns.len(), 4, "id is shown once");
+ assert_eq!(r.rows.len(), 3);
+ }
- let err = crate::sql::process_command("SELECT * FROM a NATURAL JOIN b;", &mut db);
- assert!(err.is_err(), "NATURAL is not supported");
+ /// USING a column that doesn't exist on one of the sides is a clean
+ /// error, not a silent empty result.
+ #[test]
+ fn using_unknown_column_errors() {
+ let db = seed_join_fixture();
+ let q = parse_select("SELECT * FROM customers INNER JOIN orders USING (nope);");
+ let res = execute_select_rows(q, &db);
+ assert!(res.is_err(), "USING (nope) must error — column absent");
}
#[test]
diff --git a/src/sql/parser/select.rs b/src/sql/parser/select.rs
index 988a16d..23d5c4a 100644
--- a/src/sql/parser/select.rs
+++ b/src/sql/parser/select.rs
@@ -1,7 +1,7 @@
use sqlparser::ast::{
DuplicateTreatment, Expr, FunctionArg, FunctionArgExpr, FunctionArguments, JoinConstraint,
- JoinOperator, LimitClause, OrderByKind, Query, Select, SelectItem, SetExpr, Statement,
- TableFactor, TableWithJoins,
+ JoinOperator, LimitClause, ObjectName, ObjectNamePart, OrderByKind, Query, Select, SelectItem,
+ SetExpr, Statement, TableFactor, TableWithJoins, Value,
};
use crate::error::{Result, SQLRiteError};
@@ -162,10 +162,38 @@ impl JoinType {
}
}
+/// How a JOIN matches rows. SQLR-5 originally shipped `ON` only; the
+/// USING / NATURAL increment adds the two name-based constraints.
+/// `ON` carries its predicate straight from the parser. `USING` and
+/// `NATURAL` defer their equality synthesis to the executor because
+/// they need table schemas (which column names exist, and — for
+/// `NATURAL` — which are shared) that the parser doesn't have. The
+/// executor turns both into the same `left.col = right.col [AND …]`
+/// predicate the `ON` path already evaluates. `CROSS JOIN` is rewritten
+/// to `ON true` at parse time (no schema needed) and so reuses the
+/// `On` variant directly.
+#[derive(Debug, Clone)]
+pub enum JoinConstraintKind {
+ /// `ON <expr>` (and the parse-time rewrite of `CROSS JOIN` to
+ /// `ON true`). Evaluated per-row over the multi-table scope. Boxed
+ /// to keep this enum small — `Expr` dwarfs the other variants.
+ On(Box<Expr>),
+ /// `USING (col[, col…])` — equality on each named column, plus the
+ /// SQLite convention that each named column appears once in
+ /// `SELECT *`. Columns are validated and the predicate is
+ /// synthesized at execution time.
+ Using(Vec<String>),
+ /// `NATURAL` — the shared column names of the two sides are
+ /// discovered at execution time, then treated exactly like
+ /// `USING (<shared columns>)`. No shared columns ⇒ a cross product.
+ Natural,
+}
+
/// One JOIN clause from the FROM list. Multi-join queries
/// (`A JOIN B ... JOIN C ...`) become a `Vec<JoinClause>` evaluated
-/// left-to-right against the accumulator. v1 requires an ON condition;
-/// USING / NATURAL / CROSS are deferred.
+/// left-to-right against the accumulator. The match condition is one
+/// of `ON` / `USING` / `NATURAL` (see [`JoinConstraintKind`]);
+/// `CROSS JOIN` arrives here already rewritten to `ON true`.
#[derive(Debug, Clone)]
pub struct JoinClause {
pub join_type: JoinType,
@@ -174,9 +202,8 @@ pub struct JoinClause {
/// from `right_table` so the executor can normalize on
/// `alias.unwrap_or(right_table)` for qualifier matching.
pub right_alias: Option<String>,
- /// `ON ` — required. Evaluated per-row by the executor over
- /// the multi-table scope.
- pub on: Expr,
+ /// What the join matches on. See [`JoinConstraintKind`].
+ pub constraint: JoinConstraintKind,
}
/// A parsed, simplified SELECT query.
@@ -342,11 +369,11 @@ impl SelectQuery {
}
/// Pull the leading FROM table (with optional alias) and any JOIN
-/// clauses out of the parsed FROM list. v1 supports a single base
-/// table plus zero or more INNER / LEFT / RIGHT / FULL OUTER joins
-/// with explicit `ON` conditions. Comma-separated FROM lists,
-/// USING / NATURAL constraints, and CROSS / SEMI / ANTI / ASOF joins
-/// surface as `NotImplemented`.
+/// clauses out of the parsed FROM list. Supports a single base table
+/// plus zero or more INNER / LEFT / RIGHT / FULL OUTER joins with an
+/// `ON`, `USING (...)`, or `NATURAL` constraint, and `CROSS JOIN`
+/// (rewritten to `INNER ... ON true`). Comma-separated FROM lists and
+/// SEMI / ANTI / ASOF / APPLY joins surface as `NotImplemented`.
fn extract_from_clause(
from: &[TableWithJoins],
) -> Result<(String, Option<String>, Vec<JoinClause>)> {
@@ -366,20 +393,28 @@ fn extract_from_clause(
let mut joins = Vec::with_capacity(twj.joins.len());
for j in &twj.joins {
let (right_table, right_alias) = extract_table_factor(&j.relation)?;
- let (join_type, on_expr) = match &j.join_operator {
+ let (join_type, constraint) = match &j.join_operator {
// Bare `JOIN` defaults to INNER per SQL standard.
- JoinOperator::Join(c) | JoinOperator::Inner(c) => (JoinType::Inner, parse_on(c)?),
+ JoinOperator::Join(c) | JoinOperator::Inner(c) => {
+ (JoinType::Inner, convert_constraint(c)?)
+ }
JoinOperator::Left(c) | JoinOperator::LeftOuter(c) => {
- (JoinType::LeftOuter, parse_on(c)?)
+ (JoinType::LeftOuter, convert_constraint(c)?)
}
JoinOperator::Right(c) | JoinOperator::RightOuter(c) => {
- (JoinType::RightOuter, parse_on(c)?)
+ (JoinType::RightOuter, convert_constraint(c)?)
}
- JoinOperator::FullOuter(c) => (JoinType::FullOuter, parse_on(c)?),
+ JoinOperator::FullOuter(c) => (JoinType::FullOuter, convert_constraint(c)?),
+ // `CROSS JOIN` is the cross product: INNER with an always-true
+ // ON. A constraint on a CROSS JOIN is non-standard, but if the
+ // parser handed us `USING` / `NATURAL` / `ON` we honor it
+ // rather than silently dropping it.
+ JoinOperator::CrossJoin(c) => (JoinType::Inner, convert_cross_constraint(c)?),
other => {
return Err(SQLRiteError::NotImplemented(format!(
"join flavor {other:?} is not supported \
- (only INNER / LEFT OUTER / RIGHT OUTER / FULL OUTER with ON)"
+ (only INNER / LEFT OUTER / RIGHT OUTER / FULL OUTER / CROSS, \
+ with ON / USING / NATURAL)"
)));
}
};
@@ -387,7 +422,7 @@ fn extract_from_clause(
join_type,
right_table,
right_alias,
- on: on_expr,
+ constraint,
});
}
@@ -417,21 +452,61 @@ fn extract_table_factor(tf: &TableFactor) -> Result<(String, Option)> {
}
}
-fn parse_on(constraint: &JoinConstraint) -> Result<Expr> {
+/// Lower a `sqlparser` join constraint into our [`JoinConstraintKind`].
+/// `ON` passes through; `USING` is narrowed to a list of bare column
+/// names; `NATURAL` defers to the executor. A constraint-less join
+/// (`A JOIN B` with no `ON` / `USING`) is rejected — `CROSS JOIN` is
+/// the supported way to ask for a cross product and is handled by
+/// [`convert_cross_constraint`].
+fn convert_constraint(constraint: &JoinConstraint) -> Result<JoinConstraintKind> {
match constraint {
- JoinConstraint::On(expr) => Ok(expr.clone()),
- JoinConstraint::Using(_) => Err(SQLRiteError::NotImplemented(
- "JOIN ... USING (...) is not supported yet — use JOIN ... ON instead".to_string(),
- )),
- JoinConstraint::Natural => Err(SQLRiteError::NotImplemented(
- "NATURAL JOIN is not supported".to_string(),
- )),
+ JoinConstraint::On(expr) => Ok(JoinConstraintKind::On(Box::new(expr.clone()))),
+ JoinConstraint::Using(cols) => {
+ let names = cols
+ .iter()
+ .map(extract_using_column)
+ .collect::<Result<Vec<_>>>()?;
+ Ok(JoinConstraintKind::Using(names))
+ }
+ JoinConstraint::Natural => Ok(JoinConstraintKind::Natural),
JoinConstraint::None => Err(SQLRiteError::NotImplemented(
- "JOIN without an ON condition is not supported (use INNER JOIN ... ON ...)".to_string(),
+ "JOIN without an ON / USING / NATURAL condition is not supported \
+ (use `... ON ...`, `... USING (...)`, `NATURAL JOIN`, or `CROSS JOIN`)"
+ .to_string(),
)),
}
}
+/// Constraint handling for `CROSS JOIN`. The standard form carries no
+/// constraint and means "cross product", which we express as `ON true`
+/// so it flows through the same executor path as any other join.
+fn convert_cross_constraint(constraint: &JoinConstraint) -> Result<JoinConstraintKind> {
+ match constraint {
+ JoinConstraint::None => Ok(JoinConstraintKind::On(Box::new(true_literal()))),
+ // Non-standard, but if a constraint was attached to a CROSS JOIN,
+ // honor it instead of dropping it on the floor.
+ other => convert_constraint(other),
+ }
+}
+
+/// Pull a bare column name out of a `USING (...)` entry. `USING`
+/// columns are always simple identifiers; anything qualified or
+/// multi-part is rejected.
+fn extract_using_column(name: &ObjectName) -> Result<String> {
+ match name.0.as_slice() {
+ [ObjectNamePart::Identifier(ident)] => Ok(ident.value.clone()),
+ _ => Err(SQLRiteError::NotImplemented(format!(
+ "USING column must be a simple column name, got {name}"
+ ))),
+ }
+}
+
+/// An always-true boolean literal expression, used to rewrite
+/// `CROSS JOIN` into `INNER JOIN ... ON true`.
+fn true_literal() -> Expr {
+ Expr::Value(Value::Boolean(true).with_empty_span())
+}
+
fn parse_projection(items: &[SelectItem]) -> Result {
// Special-case `SELECT *`.
if items.len() == 1
diff --git a/web/src/app/docs/page.tsx b/web/src/app/docs/page.tsx
index d50af70..0511dda 100644
--- a/web/src/app/docs/page.tsx
+++ b/web/src/app/docs/page.tsx
@@ -310,10 +310,11 @@ export default function DocsPage() {
The executor uses a plain nested-loop driver — adequate for an
embedded learning database. Hash / merge joins on equi-join shapes
are a future optimization.{" "}
- CROSS JOIN, comma-FROMs, and{" "}
- NATURAL JOIN /{" "}
- JOIN ... USING (col) are not supported yet — write{" "}
- INNER JOIN ... ON true instead. Aggregates /{" "}
+ ON, USING (col), NATURAL, and{" "}
+ CROSS JOIN are all supported (a USING /{" "}
+ NATURAL column shows once in SELECT *).
+ Comma-separated FROMs (FROM a, b) are not — use an
+ explicit JOIN / CROSS JOIN. Aggregates /{" "}
GROUP BY over a join lands once subqueries do.