Fix: Secondary attributes CAN have lineage if inherited via FK
Corrected understanding of lineage rules:

1. FK-inherited attributes have lineage (regardless of PK/secondary status)
   - `-> Subject` in dependent section gives secondary `subject_id` WITH lineage

2. Native PK attributes have lineage (origin is this table)

3. Only NATIVE secondary attributes have no lineage
   - Defined directly in table, not via FK
   - These are table-specific data

Key implication: Two tables with `-> Subject` in dependent sections
CAN join on `subject_id` because both trace to Subject.subject_id
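
The three rules and the join consequence can be sketched in a few lines of Python. This is a minimal illustration, not DataJoint's implementation: `Attribute`, `native`, `inherited`, and `check_join` are hypothetical names, and the schema name `lab` and the key attributes `a_id`, `b_id`, `c_id`, `d_id` are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical model for illustration only; not DataJoint's internal API.
Lineage = Tuple[str, str, str]  # (schema, table, attribute)


@dataclass(frozen=True)
class Attribute:
    name: str
    in_primary_key: bool
    lineage: Optional[Lineage]


def native(schema: str, table: str, name: str, in_primary_key: bool) -> Attribute:
    """Attribute defined directly in a table: lineage only if it is in the primary key."""
    lineage = (schema, table, name) if in_primary_key else None
    return Attribute(name, in_primary_key, lineage)


def inherited(parent_attr: Attribute, in_primary_key: bool) -> Attribute:
    """Attribute introduced via a foreign key: keeps the origin lineage in either section."""
    return Attribute(parent_attr.name, in_primary_key, parent_attr.lineage)


def check_join(a: List[Attribute], b: List[Attribute]) -> List[str]:
    """Return the namesake attributes joined on; raise if any namesake is not homologous."""
    b_by_name = {attr.name: attr for attr in b}
    matched = []
    for attr in a:
        other = b_by_name.get(attr.name)
        if other is None:
            continue  # not a namesake, so no constraint
        # Covers all three cases: both None, one None, or differing origins.
        if attr.lineage is None or attr.lineage != other.lineage:
            raise ValueError(f"non-homologous namesake: {attr.name}")
        matched.append(attr.name)
    return matched


subject_id = native("lab", "Subject", "subject_id", in_primary_key=True)

# A and B both declare `-> Subject` in the dependent section.
table_a = [native("lab", "A", "a_id", True), inherited(subject_id, False)]
table_b = [native("lab", "B", "b_id", True), inherited(subject_id, False)]

# C and D each define their own native secondary `value`.
table_c = [native("lab", "C", "c_id", True), native("lab", "C", "value", False)]
table_d = [native("lab", "D", "d_id", True), native("lab", "D", "value", False)]
```

Under these assumptions, `check_join(table_a, table_b)` matches on `subject_id`, while `check_join(table_c, table_d)` raises because both `value` attributes have `lineage = None`.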
docs/SPEC-semantic-matching.md: 40 additions & 23 deletions
@@ -47,33 +47,39 @@ Homologous attributes are also called **semantically matched** attributes.
 
 ### Attribute Lineage
 
-Lineage applies **only to primary key attributes**:
+Lineage is determined by how an attribute is introduced:
 
-1. **Primary key attributes** have lineage:
-   - If native to the table: `lineage = (this_schema, this_table, attr_name)`
-   - If inherited via foreign key: `lineage = (origin_schema, origin_table, origin_attr)`
+1. **Attributes inherited via foreign key** have lineage:
+   - Whether they end up as primary or secondary in the referencing table
+   - `lineage = (origin_schema, origin_table, origin_attr)` traced to the original definition
+   - Example: `-> Subject` in the dependent section introduces `subject_id` as a secondary attribute WITH lineage
 
-2. **Secondary attributes** do NOT have lineage:
-   - `lineage = None` for all secondary (non-primary-key) attributes
-   - Secondary attributes are table-specific data, not entity identifiers
-   - Foreign keys can only reference primary keys, so secondary attributes cannot be inherited
+2. **Native primary key attributes** have lineage:
+   - `lineage = (this_schema, this_table, attr_name)` - the table where they are defined
+
+3. **Native secondary attributes** do NOT have lineage:
+   - `lineage = None` for secondary attributes defined directly (not via FK)
+   - These are table-specific data, not entity identifiers
 
 Lineage propagates through:
-- **Foreign key references**: when table B references table A, the inherited primary key attributes in B have the same lineage as their counterparts in A
-- **Query expressions**: projections preserve lineage for renamed PK attributes; computed attributes have no lineage
+- **Foreign key references**: inherited attributes retain their origin lineage regardless of PK/secondary status
+- **Query expressions**: projections preserve lineage for renamed attributes; computed attributes have no lineage
 
 ### Join Compatibility Rules
 
-For a join `A * B` to be valid:
-1. **Primary key namesakes** must be homologous (same lineage)
-2. **Secondary attribute namesakes** always collide (both have `lineage = None`)
+For a join `A * B` to be valid, all namesake attributes must be homologous (same lineage).
+
+**Cases**:
+1. **Both have lineage** → lineages must match (same origin)
+2. **Both have no lineage** → collision (both are native secondary attrs)
+3. **One has lineage, one doesn't** → collision (cannot be the same entity)
 
-If namesake attributes exist that are **not** homologous, an error should be raised (collision of non-homologous namesakes).
+If namesake attributes are **not** homologous, an error should be raised.
 
 **Implications**:
-- Two tables with the same secondary attribute name (e.g., both have `value`) cannot be joined directly - one must be renamed via `.proj()`
-- Primary key attributes can only match if they share lineage through the FK graph
-- This replaces the old heuristic (secondary attributes can't be join keys) with a principled rule (lineage must match)
+- FK-inherited attributes (PK or secondary) can match if they share lineage
+- Native secondary attributes with the same name always collide - one must be renamed via `.proj()`
+- This replaces the old heuristic with a principled rule: lineage must match
 
 **Note**: A warning may be raised for joins on unindexed attributes (performance consideration).
 
@@ -463,12 +469,23 @@ raise DataJointError(
 )
 ```
 
-**New behavior**: The restriction is now a consequence of lineage rules:
-- Secondary attributes have `lineage = None`
-- Two `None` lineages do not match (collision)
-- Therefore, secondary attribute namesakes still cause errors, but for the right reason
+**New behavior**: Lineage determines joinability:
+- Attributes with matching lineage can participate in joins (even if secondary)
+- Attributes with `lineage = None` (native secondary) always collide with namesakes
+- The key distinction is HOW the attribute was introduced, not WHERE it ends up
+
+**Key insight**: Secondary attributes introduced via foreign key DO have lineage and CAN participate in joins. Only native secondary attributes (defined directly in the table, not via FK) have no lineage.
 
-**Key insight**: Since foreign keys can only reference primary keys, secondary attributes cannot be inherited. They are always native to their table and have no lineage. The old heuristic was correct in effect, but the new rule is principled.
+**Example**:
+```python
+# Table A: -> Subject in dependent section gives secondary `subject_id` WITH lineage
+# Table B: -> Subject in dependent section gives secondary `subject_id` WITH lineage
+# A * B works! Both subject_id attributes trace to Subject.subject_id
+
+# Table C: has native secondary `value` (no lineage)
+# Table D: has native secondary `value` (no lineage)
+# C * D fails - collision, must rename one
+```
 
 **Error message change**:
 ```python
@@ -603,7 +620,7 @@ Semantic matching is a significant change to DataJoint's join semantics that imp
 | **D2**: Renamed attributes | Preserve original lineage |