Commit 3235bcb

Add D8: Primary key formation using functional dependencies
PK(A * B) is determined by functional dependency analysis:
- If PK(B) ⊆ J: result PK = PK(A)
- If PK(A) ⊆ J: result PK = PK(B)
- Otherwise: result PK = PK(A) ∪ PK(B)

When both conditions hold, left operand wins (non-commutative).
Based on Armstrong's axioms and transitivity of FDs.
1 parent 792ac93 commit 3235bcb

1 file changed

+58
-0
lines changed


docs/SPEC-semantic-matching.md

Lines changed: 58 additions & 0 deletions
@@ -95,6 +95,50 @@ Note: `A - B` is the negated form of restriction (equivalent to `A & ~B`), not a

**Note**: A warning may be raised for joins on unindexed attributes (performance consideration).

### Primary Key Formation in Joins

The primary key of `A * B` is determined by functional dependency analysis, not by a simple union of the operands' primary keys.

Let J be the set of join attributes (the homologous namesakes matched during the join).

**Rule**:
```
PK(A * B) =
    PK(A)           if PK(B) ⊆ J    -- B's entire PK is in the join
    PK(B)           if PK(A) ⊆ J    -- A's entire PK is in the join
    PK(A) ∪ PK(B)   otherwise       -- neither PK is fully covered
```
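
The rule above can be sketched as a small function. This is a minimal illustration only: the name `join_primary_key` and its signature are hypothetical and are not taken from the DataJoint codebase. Primary keys and join attributes are modeled as plain sets of attribute names.

```python
from typing import FrozenSet, Iterable


def join_primary_key(
    pk_a: Iterable[str],
    pk_b: Iterable[str],
    join_attrs: Iterable[str],
) -> FrozenSet[str]:
    """Return PK(A * B) for the rule above (hypothetical helper).

    pk_a, pk_b  -- primary key attribute names of the left and right operands
    join_attrs  -- J, the attributes matched during the join
    """
    pk_a, pk_b, j = frozenset(pk_a), frozenset(pk_b), frozenset(join_attrs)
    if pk_b <= j:
        # B's entire PK is in the join. Checking this case first also
        # means the left operand wins when both PKs are covered.
        return pk_a
    if pk_a <= j:
        # A's entire PK is in the join.
        return pk_b
    # Neither PK is fully covered by the join.
    return pk_a | pk_b
```

Testing the `PK(B) ⊆ J` case first is what encodes the left-operand preference, so no separate tie-breaking branch is needed in this sketch.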

**When both conditions hold** (both PKs are subsets of J), use **PK(A)**, the left operand's primary key. This makes the join non-commutative with respect to primary key formation.

**Rationale** (Armstrong's axioms):
- If PK(B) ⊆ J, then by reflexivity: J → PK(B)
- From A: PK(A) → J (PK(A) determines all of A's attributes, including the join attributes)
- By transitivity: PK(A) → J → PK(B) → all of B
- Therefore PK(A) alone determines all attributes in the result
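
The transitivity argument can be checked mechanically with the standard attribute-closure algorithm. The sketch below is illustrative and not part of the implementation. The function name `closure` and the secondary attributes `session_date` and `species` are assumptions introduced only for this example. The keys mirror the case where A has PK {session_id} and B has PK {subject_id}.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical headings: session_date and species are made-up secondary attributes.
fds = [
    ({"session_id"}, {"subject_id", "session_date"}),  # PK(A) -> all of A
    ({"subject_id"}, {"species"}),                     # PK(B) -> all of B
]
result_attrs = {"session_id", "subject_id", "session_date", "species"}

# PK(A) alone determines every attribute of A * B ...
pk_a_is_key = closure({"session_id"}, fds) == result_attrs
# ... whereas PK(B) alone does not reach session_id.
pk_b_is_key = closure({"subject_id"}, fds) == result_attrs
```

Under these assumed dependencies `pk_a_is_key` is true and `pk_b_is_key` is false, matching the claim that PK(A) suffices when PK(B) ⊆ J.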

**Examples**:

1. **B's PK covered by join**:
   - A: PK = {session_id}, secondary {subject_id}
   - B: PK = {subject_id}
   - J = {subject_id}, PK(B) ⊆ J ✓
   - Result PK = {session_id}

2. **Neither PK covered**:
   - A: PK = {a, b}
   - B: PK = {b, c}
   - J = {b}
   - PK(A) ⊄ J, PK(B) ⊄ J
   - Result PK = {a, b, c}

3. **Both PKs covered** (non-commutative case):
   - A: PK = {a}, secondary {b}
   - B: PK = {b}, secondary {a}
   - J = {a, b}
   - Both PK(A) ⊆ J and PK(B) ⊆ J
   - Result PK = PK(A) = {a} (left operand wins)
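
To see where operand order matters, the three examples can be evaluated in both orders. This is a sketch under the same rule; `join_pk` and the `EXAMPLES` table are hypothetical names used only for illustration.

```python
def join_pk(pk_a, pk_b, join_attrs):
    """Hypothetical helper: PK of `A * B` under the rule in this section."""
    pk_a, pk_b, j = frozenset(pk_a), frozenset(pk_b), frozenset(join_attrs)
    if pk_b <= j:      # right operand's PK covered (also the tie case)
        return pk_a
    if pk_a <= j:      # left operand's PK covered
        return pk_b
    return pk_a | pk_b


# (PK(A), PK(B), J) for the three examples
EXAMPLES = {
    "b_pk_covered": ({"session_id"}, {"subject_id"}, {"subject_id"}),
    "neither_covered": ({"a", "b"}, {"b", "c"}, {"b"}),
    "both_covered": ({"a"}, {"b"}, {"a", "b"}),
}

# Evaluate each example as A * B and as B * A.
results = {
    name: (join_pk(a, b, j), join_pk(b, a, j))
    for name, (a, b, j) in EXAMPLES.items()
}

for name, (forward, reverse) in results.items():
    print(name, sorted(forward), sorted(reverse), forward == reverse)
```

Only the case where both PKs are covered yields a different key when the operands are swapped. When exactly one PK is covered, or neither is, the rule gives the same result in either order.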
141+
98142
## Current Implementation Analysis
99143

100144
### Attribute Representation (`heading.py:48`)
@@ -598,6 +642,19 @@ WHERE c.contype = 'f'
AND c.conrelid = %s::regclass
```

### D8: Primary Key Formation Using Functional Dependencies

**Decision**: Use functional dependency analysis to form minimal primary keys in joins.

**Rule**: For `A * B` joining on attributes J:
- If PK(B) ⊆ J: result PK = PK(A)
- Else if PK(A) ⊆ J: result PK = PK(B)
- Else: result PK = PK(A) ∪ PK(B)

**Tie-breaker**: When both PK(A) ⊆ J and PK(B) ⊆ J, use PK(A) (left operand). This makes the join **non-commutative** with respect to primary key formation.

**Rationale**: Based on Armstrong's axioms. If PK(B) ⊆ J, then PK(A) → J → PK(B) by transitivity, so PK(A) alone determines all result attributes. The union rule is only needed when neither PK is fully covered by the join.

## Testing Strategy

1. **Unit tests** for lineage propagation through all query operations
@@ -642,6 +699,7 @@ Semantic matching is a significant change to DataJoint's join semantics that imp
| **D5**: Secondary attr restriction | Replaced by lineage rule - FK-inherited attrs have lineage, native secondary don't |
| **D6**: `@` operator | Deprecated - use `.join(semantic_check=False)` |
| **D7**: Migration | Utility function + automatic fallback computation |
| **D8**: PK formation | Functional dependency analysis; left operand wins ties; non-commutative |

### Compatibility
