Add D8: Primary key formation using functional dependencies
PK(A * B) is determined by functional dependency analysis:
- If PK(B) ⊆ J: result PK = PK(A)
- If PK(A) ⊆ J: result PK = PK(B)
- Otherwise: result PK = PK(A) ∪ PK(B)
When both conditions hold, left operand wins (non-commutative).
Based on Armstrong's axioms and transitivity of FDs.
docs/SPEC-semantic-matching.md
58 lines changed: 58 additions & 0 deletions
@@ -95,6 +95,50 @@ Note: `A - B` is the negated form of restriction (equivalent to `A & ~B`), not a
**Note**: A warning may be raised for joins on unindexed attributes (performance consideration).

### Primary Key Formation in Joins

The primary key of `A * B` is determined by functional dependency analysis, not simple union.

Let J = join attributes (homologous namesakes matched during the join).

**Rule**:
```
PK(A * B) =
    PK(A)          if PK(B) ⊆ J    -- B's entire PK is in the join
    PK(B)          if PK(A) ⊆ J    -- A's entire PK is in the join
    PK(A) ∪ PK(B)  otherwise       -- neither PK is fully covered
```

**When both conditions hold** (both PKs are subsets of J), the first case takes precedence and the result uses **PK(A)**, the left operand's primary key. This makes join non-commutative with respect to primary key formation.
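
The rule and its tie-breaker can be sketched as a small function over attribute-name sets. This is an illustrative sketch only: the name `join_primary_key` and its parameters are hypothetical and are not taken from the DataJoint codebase. Checking `PK(B) ⊆ J` first is what gives the left operand precedence when both conditions hold.

```python
from typing import AbstractSet, FrozenSet


def join_primary_key(
    pk_a: AbstractSet[str],
    pk_b: AbstractSet[str],
    join_attrs: AbstractSet[str],
) -> FrozenSet[str]:
    """Primary key of A * B under the rule above (hypothetical helper)."""
    if pk_b <= join_attrs:
        # B's entire PK is in the join. Testing this case first means
        # the left operand wins when both PKs are covered.
        return frozenset(pk_a)
    if pk_a <= join_attrs:
        # A's entire PK is in the join, but B's is not.
        return frozenset(pk_b)
    # Neither PK is fully covered by the join attributes.
    return frozenset(pk_a | pk_b)
```

Swapping the first two arguments changes the result only when both primary keys are covered by J, which is the non-commutative case.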

**Rationale** (Armstrong's axioms):
- If PK(B) ⊆ J, then by reflexivity: J → PK(B)
- In A: PK(A) → J (A's primary key determines all of A's attributes, including the join attributes)
- By transitivity: PK(A) → J → PK(B) → all attributes of B
- Therefore PK(A) alone determines all attributes in the result
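
The transitivity argument can be checked mechanically with the standard attribute-closure algorithm. The sketch below is not part of the spec: the `closure` helper is written for illustration, and the secondary attribute `species` on B is a hypothetical addition so that B has something for its primary key to determine.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A functional dependency as (determinant, dependent) attribute sets.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    fd_list = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fd_list:
            # Apply an FD once its whole determinant is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# A: PK = {session_id}, secondary {subject_id}
# B: PK = {subject_id}, secondary {species}  (species is hypothetical)
fds = [
    (frozenset({"session_id"}), frozenset({"subject_id"})),  # PK(A) -> J
    (frozenset({"subject_id"}), frozenset({"species"})),     # PK(B) -> rest of B
]
result_attrs = {"session_id", "subject_id", "species"}

# PK(A) determines every attribute of A * B, so it is a key of the result.
pk_a_is_key = closure({"session_id"}, fds) == result_attrs
# PK(B) does not determine session_id, so it cannot serve as the key here.
pk_b_is_key = closure({"subject_id"}, fds) == result_attrs
```

Under these assumed dependencies, `pk_a_is_key` is true and `pk_b_is_key` is false, matching the claim that PK(A) alone suffices when PK(B) ⊆ J.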

**Examples**:

1. **B's PK covered by join**:
   - A: PK = {session_id}, secondary {subject_id}
   - B: PK = {subject_id}
   - J = {subject_id}, PK(B) ⊆ J ✓
   - Result PK = {session_id}

2. **Neither PK covered**:
   - A: PK = {a, b}
   - B: PK = {b, c}
   - J = {b}
   - PK(A) ⊄ J, PK(B) ⊄ J
   - Result PK = {a, b, c}

3. **Both PKs covered** (non-commutative case):
   - A: PK = {a}, secondary {b}
   - B: PK = {b}, secondary {a}
   - J = {a, b}
   - Both PK(A) ⊆ J and PK(B) ⊆ J
   - Result PK = PK(A) = {a} (left operand wins)
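
To make the order dependence in example 3 concrete, here is a toy model of a heading with a `*` operator. `ToyHeading` is hypothetical and unrelated to DataJoint's own heading class. It also assumes that J is simply the set of shared attribute names, whereas the spec matches attributes by lineage.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ToyHeading:
    """Hypothetical heading: primary key plus secondary attribute names."""

    primary_key: FrozenSet[str]
    secondary: FrozenSet[str] = frozenset()

    @property
    def names(self) -> FrozenSet[str]:
        return self.primary_key | self.secondary

    def __mul__(self, other: "ToyHeading") -> "ToyHeading":
        # Simplifying assumption: every shared name is a join attribute.
        join_attrs = self.names & other.names
        if other.primary_key <= join_attrs:
            pk = self.primary_key  # right PK covered: left PK wins
        elif self.primary_key <= join_attrs:
            pk = other.primary_key  # only the left PK is covered
        else:
            pk = self.primary_key | other.primary_key
        # Everything outside the result PK becomes a secondary attribute.
        return ToyHeading(pk, (self.names | other.names) - pk)


# Example 3: both primary keys are covered by J = {a, b}.
a = ToyHeading(frozenset({"a"}), frozenset({"b"}))
b = ToyHeading(frozenset({"b"}), frozenset({"a"}))
ab = a * b  # primary key {a}, secondary {b}
ba = b * a  # primary key {b}, secondary {a}
```

Both products carry the same attribute names, but they differ in which attribute is the primary key, so `a * b` and `b * a` are not interchangeable in this model.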

## Current Implementation Analysis

### Attribute Representation (`heading.py:48`)
@@ -598,6 +642,19 @@ WHERE c.contype = 'f'
  AND c.conrelid = %s::regclass
```

### D8: Primary Key Formation Using Functional Dependencies

**Decision**: Use functional dependency analysis to form minimal primary keys in joins.

**Rule**: For `A * B` joining on attributes J:
- If PK(B) ⊆ J: result PK = PK(A)
- Else if PK(A) ⊆ J: result PK = PK(B)
- Else: result PK = PK(A) ∪ PK(B)

**Tie-breaker**: When both PK(A) ⊆ J and PK(B) ⊆ J, use PK(A) (left operand). This makes join **non-commutative** with respect to primary key formation.

**Rationale**: Based on Armstrong's axioms. If PK(B) ⊆ J, then PK(A) → J → PK(B) by transitivity, so PK(A) alone determines all result attributes. The union rule is only needed when neither PK is fully covered by the join.

## Testing Strategy

1. **Unit tests** for lineage propagation through all query operations
@@ -642,6 +699,7 @@ Semantic matching is a significant change to DataJoint's join semantics that imp
| **D5**: Secondary attr restriction | Replaced by lineage rule - FK-inherited attrs have lineage, native secondary don't |
| **D6**: `@` operator | Deprecated - use `.join(semantic_check=False)` |
| **D7**: Migration | Utility function + automatic fallback computation |