Add primary key rules for relational operators to spec
- Document PK preservation for restriction, projection, aggregation
- Define A → B (A determines B) based on functional dependencies
- Specify join PK algorithm: PK(A) if A→B, PK(B) if B→A, union otherwise
- Explain predictability vs minimality tradeoff
- Document attribute ordering and non-commutativity
- Add test cases for join primary key determination
Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
The error message directs users to the explicit `.join()` method.

## Primary Key Rules in Relational Operators

In DataJoint, every query operator produces a valid **entity set** with a well-defined **entity type** and **primary key**. This section specifies how the primary key is determined for each relational operator.

### General Principle

The primary key of a query result identifies unique entities in that result. For most operators, the primary key is preserved from the left operand. For joins, the primary key depends on the functional dependencies between the operands.

### Notation

In the examples below, `*` marks primary key attributes:

- `A(x*, y*, z)` means A has primary key `{x, y}` and secondary attribute `z`
- `A → B` means "A determines B" (defined below)

### Rules by Operator

| Operator | Primary Key Rule |
|----------|------------------|
| `A & B` (restriction) | PK(A), preserved from left operand |
| `A - B` (anti-restriction) | PK(A), preserved from left operand |
| `A.proj(...)` (projection) | PK(A), preserved from left operand |
| `A.aggr(B, ...)` (aggregation) | PK(A), preserved from left operand |
| `A * B` (join) | Depends on functional dependencies (see below) |

### Join Primary Key Rule

The join operator requires special handling because it combines two entity sets. The primary key of `A * B` depends on the **functional dependency relationship** between the operands.

#### Definitions

**A determines B** (written `A → B`): Every attribute in PK(B) is either already in PK(A) or is a secondary attribute in A.

```
A → B  iff  ∀b ∈ PK(B): b ∈ PK(A) OR b ∈ secondary(A)
```

Intuitively, `A → B` means that knowing A's primary key is sufficient to determine B's primary key through functional dependencies.

**B determines A** (written `B → A`): Every attribute in PK(A) is either already in PK(B) or is a secondary attribute in B.

```
B → A  iff  ∀a ∈ PK(A): a ∈ PK(B) OR a ∈ secondary(B)
```
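
The two definitions are the same check applied in opposite directions, which a small sketch can make concrete. The `Heading` class and the `session`/`subject` attribute names below are hypothetical stand-ins for illustration only; they are not part of the DataJoint API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heading:
    """Hypothetical stand-in for a table heading (not a DataJoint class)."""
    primary_key: tuple
    secondary: tuple = ()


def determines(a: Heading, b: Heading) -> bool:
    """Return True if A → B: every attribute of PK(B) is in PK(A) or secondary(A)."""
    available = set(a.primary_key) | set(a.secondary)
    return all(attr in available for attr in b.primary_key)


# Illustrative headings: a session row carries subject_id as a secondary attribute.
session = Heading(primary_key=("session_id",), secondary=("subject_id", "date"))
subject = Heading(primary_key=("subject_id",), secondary=("species",))

print(determines(session, subject))  # True: subject_id is a secondary attribute of session
print(determines(subject, session))  # False: session_id does not appear in subject
```

Here `B → A` is simply `determines(b, a)`, so one function covers both definitions.
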

#### Join Primary Key Algorithm

For `A * B`:

| Condition | PK(A * B) | Attribute Order |
|-----------|-----------|-----------------|
| A → B | PK(A) | A's attributes first |
| B → A (and not A → B) | PK(B) | B's attributes first |

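
The selection rule can be sketched as a short function. This is a minimal illustration, not DataJoint's implementation: headings are modelled as plain dicts with hypothetical `"pk"` and `"secondary"` keys. The union fallback and the left preference when both directions hold follow the test cases listed for join primary key determination.

```python
def determines(a, b):
    """A → B: every attribute in PK(B) is in PK(A) or secondary(A)."""
    available = set(a["pk"]) | set(a["secondary"])
    return all(attr in available for attr in b["pk"])


def join_primary_key(a, b):
    """Primary key of A * B as an ordered list of attribute names."""
    if determines(a, b):   # A → B (also the case where both directions hold)
        return list(a["pk"])
    if determines(b, a):   # B → A and not A → B
        return list(b["pk"])
    # Neither determines the other: PK(A) followed by PK(B) − PK(A)
    return list(a["pk"]) + [attr for attr in b["pk"] if attr not in a["pk"]]


# Hypothetical headings: "pk" lists primary key attributes, "secondary" the rest.
A = {"pk": ["x", "y"], "secondary": ["z"]}
B = {"pk": ["x"], "secondary": ["w"]}
C = {"pk": ["z"], "secondary": ["x"]}
D = {"pk": ["x", "y"], "secondary": []}

print(join_primary_key(A, B))  # ['x', 'y']: A → B, so PK(A)
print(join_primary_key(B, A))  # ['x', 'y']: the right operand determines the left
print(join_primary_key(D, C))  # ['x', 'y', 'z']: neither direction, so the union
```
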
### Design Tradeoff: Predictability vs. Minimality

The join primary key rule prioritizes **predictability** over **minimality**. In some cases, the resulting primary key may not be minimal (i.e., it may contain functionally redundant attributes).

**Example of non-minimal result:**

```
A: x*, y*
B: z*, x    (x is secondary in B, so z → x)
```

The mathematically minimal primary key for `A * B` would be `{y, z}` because:

- `z → x` (from B's structure)
- `{y, z} → {x, y, z}` (z gives us x, and we have y)

However, `{y, z}` is problematic:

- It is **not the primary key of either operand** (A has `{x, y}`, B has `{z}`)
- It is **not the union** of the primary keys
- It represents a **novel entity type** that doesn't correspond to A, B, or their natural pairing

This creates confusion: what kind of entity does `{y, z}` identify?

**The simplified rule produces `{x, y, z}`** (the union), which:

- Is immediately recognizable as "one A entity paired with one B entity"
- Contains A's full primary key and B's full primary key
- May have redundancy (`x` is determined by `z`) but is semantically clear

**Rationale:** Users can always project away redundant attributes if they need the minimal key. But starting with a predictable, interpretable primary key reduces confusion and errors.
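
The claim that `{y, z}` is a key while `{x, y, z}` is merely redundant can be checked with a standard attribute-closure computation. The sketch below is illustrative only: `closure` and `is_key` are hypothetical helper names, and the single functional dependency `z → x` is the one read off B's structure in the example above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_key(attrs, all_attrs, fds):
    """A set of attributes is a key if its closure covers every attribute."""
    return closure(attrs, fds) >= all_attrs


all_attrs = {"x", "y", "z"}   # attributes of A * B in the example
fds = [({"z"}, {"x"})]        # z → x, from B: z*, x

union_key = {"x", "y", "z"}   # what the join rule produces
minimal_key = {"y", "z"}      # the mathematically minimal alternative

print(is_key(union_key, all_attrs, fds))    # True
print(is_key(minimal_key, all_attrs, fds))  # True
# No single attribute of {y, z} is a key on its own, so {y, z} is minimal.
print(any(is_key(set(c), all_attrs, fds) for c in combinations(sorted(minimal_key), 1)))  # False
```

Both sets identify rows uniquely; the rule simply prefers the one that reads as "PK(A) plus PK(B)".
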

### Attribute Ordering

The primary key attributes always appear **first** in the result's attribute list, followed by secondary attributes. When `B → A` (and not `A → B`), the join is conceptually reordered as `B * A` to maintain this invariant:

- If PK = PK(A): A's attributes appear first
- If PK = PK(B): B's attributes appear first
- If PK = PK(A) ∪ PK(B): PK(A) attributes first, then PK(B) − PK(A), then secondaries
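
The three ordering cases can be sketched as follows. This is an illustration under stated assumptions, not DataJoint's implementation: headings are plain dicts with hypothetical `"pk"` and `"secondary"` keys, and placing the leading operand's remaining attributes before the other operand's is an assumption where the rules above do not spell out the order of secondaries.

```python
def determines(a, b):
    """A → B: every attribute in PK(B) is in PK(A) or secondary(A)."""
    available = set(a["pk"]) | set(a["secondary"])
    return all(attr in available for attr in b["pk"])


def join_attribute_order(a, b):
    """Ordered attribute list of A * B, with primary key attributes first."""
    if determines(a, b):
        first, second, pk = a, b, list(a["pk"])
    elif determines(b, a):
        # Conceptually reordered as B * A
        first, second, pk = b, a, list(b["pk"])
    else:
        first, second = a, b
        pk = list(a["pk"]) + [attr for attr in b["pk"] if attr not in a["pk"]]

    ordered = list(pk)
    # Assumption: remaining attributes follow in operand order, without duplicates.
    for attr in first["pk"] + first["secondary"] + second["pk"] + second["secondary"]:
        if attr not in ordered:
            ordered.append(attr)
    return ordered


# Hypothetical headings for illustration.
A = {"pk": ["x", "y"], "secondary": ["a"]}
B = {"pk": ["z"], "secondary": ["x", "b"]}
P = {"pk": ["x"], "secondary": ["a"]}
Q = {"pk": ["y"], "secondary": ["x", "b"]}

print(join_attribute_order(A, B))  # ['x', 'y', 'z', 'a', 'b']: union case
print(join_attribute_order(P, Q))  # ['y', 'x', 'b', 'a']: Q → P, so Q's attributes lead
```
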

### Non-Commutativity

With these rules, join is **not commutative** in terms of:

1. **Primary key selection**: `A * B` may have a different PK than `B * A`. When both `A → B` and `B → A` hold, the left operand is preferred, so `A * B` has PK(A) while `B * A` has PK(B)
2. **Attribute ordering**: The left operand's attributes appear first (unless `B → A` and not `A → B`)

The **result set** (the actual rows returned) remains the same regardless of order, but the **schema** (primary key and attribute order) may differ.

## Universal Set `dj.U`
`dj.U()` or `dj.U('attr1', 'attr2', ...)` represents the universal set of all possible values and lineages.
@@ -537,6 +675,14 @@ Use .proj() to rename one of the attributes or .join(semantic_check=False) in a
   - `A.aggr(B)` raises an error when PK attributes have different lineage
   - `dj.U('a', 'b').aggr(B)` works when B has `a` and `b` attributes

6. **Join primary key determination**:
   - `A * B` where `A → B`: result has PK(A)
   - `A * B` where `B → A` (not `A → B`): result has PK(B), B's attributes first
   - `A * B` where both `A → B` and `B → A`: result has PK(A) (left preference)
   - `A * B` where neither direction: result has PK(A) ∪ PK(B)