Clarify lineage applies only to primary key attributes
Key clarification:
- Lineage starts with primary key attributes only
- Foreign keys can only reference primary keys
- Secondary attributes have lineage = None (not inherited)
Implications for join rules:
- PK namesakes must have matching lineage (homologous)
- Secondary namesakes always collide (both have None lineage)
- Old heuristic (no secondary joins) replaced with principled rule
This means the effective behavior for secondary attributes is the same,
but now based on the correct principle rather than a heuristic.
File: docs/SPEC-semantic-matching.md
Lines changed: 39 additions & 14 deletions
@@ -47,18 +47,35 @@ Homologous attributes are also called **semantically matched** attributes.
47
47
48
48
### Attribute Lineage
49
49
50
-
Every attribute has a **lineage** - a reference to its original definition. Lineage is propagated through:
51
-
- Foreign key references: when table B references table A, the inherited primary key attributes in B have the same lineage as in A
52
-
- Query expressions: projections, joins, and other operations preserve lineage
50
+
Lineage applies **only to primary key attributes**:
51
+
52
+
1. **Primary key attributes** have lineage:
53
+
- If native to the table: `lineage = (this_schema, this_table, attr_name)`
54
+
- If inherited via foreign key: `lineage = (origin_schema, origin_table, origin_attr)`
55
+
56
+
2. **Secondary attributes** do NOT have lineage:
57
+
- `lineage = None` for all secondary (non-primary-key) attributes
58
+
- Secondary attributes are table-specific data, not entity identifiers
59
+
- Foreign keys can only reference primary keys, so secondary attributes cannot be inherited
60
+
61
+
Lineage propagates through:
62
+
- **Foreign key references**: when table B references table A, the inherited primary key attributes in B have the same lineage as their counterparts in A
63
+
- **Query expressions**: projections preserve lineage for renamed PK attributes; computed attributes have no lineage
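
As an illustration of the lineage rules in this hunk, here is a minimal sketch of how lineage might be assigned to a single attribute. The names `Lineage` and `assign_lineage` are hypothetical and are not taken from the spec or from the DataJoint codebase; the sketch only assumes that lineage is a `(schema, table, attr)` triple for primary key attributes and `None` otherwise.

```python
from typing import NamedTuple, Optional


class Lineage(NamedTuple):
    """Hypothetical record of where a primary key attribute was first defined."""
    schema: str
    table: str
    attr: str


def assign_lineage(
    schema: str,
    table: str,
    attr: str,
    in_primary_key: bool,
    inherited_from: Optional[Lineage] = None,
) -> Optional[Lineage]:
    """Return the lineage of one attribute under the rules above (sketch only)."""
    if not in_primary_key:
        # Secondary attributes never carry lineage.
        return None
    if inherited_from is not None:
        # Inherited via foreign key: keep the origin's lineage unchanged.
        return inherited_from
    # Native primary key attribute: lineage points at this table.
    return Lineage(schema, table, attr)


# Native PK attribute in lab.Subject
subject_id = assign_lineage("lab", "Subject", "subject_id", in_primary_key=True)
# The same attribute inherited into lab.Session through a foreign key
session_subject_id = assign_lineage(
    "lab", "Session", "subject_id", in_primary_key=True, inherited_from=subject_id
)
# A secondary attribute has no lineage
value = assign_lineage("lab", "Session", "value", in_primary_key=False)
```

Here `session_subject_id` compares equal to `subject_id`, which is what makes the two attributes homologous, while `value` is `None`.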
53
64
54
65
### Join Compatibility Rules
55
66
56
67
For a join `A * B` to be valid:
57
-
1. All namesake attributes (same name in both) must be homologous (same lineage)
68
+
1. **Primary key namesakes** must be homologous (same lineage)
69
+
2. **Secondary attribute namesakes** always collide (both have `lineage = None`)
58
70
59
71
If namesake attributes exist that are **not** homologous, an error should be raised (collision of non-homologous namesakes).
60
72
61
-
**Note**: The current restriction that joins cannot be done on secondary attributes is **deprecated**. As long as attributes are homologous, they can participate in joins regardless of primary/secondary status. A warning may be raised for joins on unindexed attributes (performance consideration).
73
+
**Implications**:
74
+
- Two tables with the same secondary attribute name (e.g., both have `value`) cannot be joined directly - one must be renamed via `.proj()`
75
+
- Primary key attributes can only match if they share lineage through the FK graph
76
+
- This replaces the old heuristic (secondary attributes can't be join keys) with a principled rule (lineage must match)
77
+
78
+
**Note**: A warning may be raised for joins on unindexed attributes (performance consideration).
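
The join rules above could be checked along the following lines. This is a sketch under stated assumptions, not the implementation described by the spec: each operand is modeled as a plain dict from attribute name to lineage, and `check_join` and `JoinCollisionError` are hypothetical names. The sketch also assumes that a namesake with lineage on one side and `None` on the other counts as a collision, a case the rules do not spell out.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: lineage is a (schema, table, attr) triple, or None for secondary attributes.
Lineage = Optional[Tuple[str, str, str]]


class JoinCollisionError(Exception):
    """Hypothetical error for namesake attributes that are not homologous."""


def check_join(a: Dict[str, Lineage], b: Dict[str, Lineage]) -> List[str]:
    """Return the attributes a join A * B would match on, or raise on a collision."""
    join_attrs = []
    for name in sorted(a.keys() & b.keys()):  # namesakes: same name in both operands
        lineage_a, lineage_b = a[name], b[name]
        # None never matches anything, including another None.
        if lineage_a is None or lineage_b is None or lineage_a != lineage_b:
            raise JoinCollisionError(
                f"Cannot join: attribute {name!r} is not homologous in the two operands. "
                "Use .proj() to rename one."
            )
        join_attrs.append(name)
    return join_attrs


subject = {"subject_id": ("lab", "Subject", "subject_id"), "weight": None}
session = {
    "subject_id": ("lab", "Subject", "subject_id"),  # inherited via foreign key
    "session": ("lab", "Session", "session"),
    "note": None,
}
matched = check_join(subject, session)
```

With these operands the join matches on `subject_id` only. Adding a `weight` attribute to `session`, or giving it a `subject_id` whose lineage points at a different table, would make `check_join` raise.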
62
79
63
80
## Current Implementation Analysis
64
81
@@ -276,8 +293,8 @@ Update these methods to preserve lineage:
276
293
### Phase 5: Error Handling
277
294
278
295
1. **Clear error messages** for:
279
-
- Namesake collision: `"Cannot join: attribute 'name' exists in both operands with different lineages (Student.name vs Course.name). Use .proj() to rename one."`
280
-
- Non-PK homologous: `"Cannot join on secondary attribute 'value' - must be in primary key of at least one operand."`
296
+
- PK lineage mismatch: `"Cannot join: attribute 'subject_id' exists in both operands with different lineages (lab.Subject.subject_id vs other.Experiment.subject_id). Use .proj() to rename one."`
297
+
- Secondary attr collision: `"Cannot join: attribute 'value' has no lineage in both operands (secondary attributes). Use .proj() to rename one."`
281
298
282
299
2. **Resolution guidance** in error messages:
283
300
- Suggest specific projection syntax to resolve
@@ -434,9 +451,9 @@ This is intentional: a computed value is a new entity, not inherited from any so
434
451
435
452
`dj.U` promotes attributes to the primary key for grouping/aggregation purposes, but the semantic identity of the attributes remains unchanged.
**New behavior**: Any homologous attributes can participate in joins, regardless of primary/secondary status. The only requirement is matching lineage.
466
+
**New behavior**: The restriction is now a consequence of lineage rules:
467
+
- Secondary attributes have `lineage = None`
468
+
- Two `None` lineages do not match (collision)
469
+
- Therefore, secondary attribute namesakes still cause errors, but for the right reason
450
470
451
-
**Rationale**: The original restriction was a heuristic to prevent accidental joins on coincidentally-named attributes. With proper lineage tracking, this heuristic is no longer needed - lineage provides the authoritative answer.
471
+
**Key insight**: Since foreign keys can only reference primary keys, secondary attributes cannot be inherited. They are always native to their table and have no lineage. The old heuristic was correct in effect, but the new rule is principled.
472
+
473
+
**Error message change**:
474
+
```python
475
+
# Old: "Cannot join query expressions on dependent attribute `value`"
476
+
# New: "Cannot join: attribute 'value' has no lineage in both operands. Use .proj() to rename one."
477
+
```
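
One way the new messages might be derived from the two operand lineages is sketched below. `join_error_message` is a hypothetical helper, not part of the spec. It covers only the two cases the spec gives wording for (both lineages `None`, or two different lineages) and deliberately rejects the mixed case, for which the spec provides no message.

```python
from typing import Optional, Tuple

# Assumption: lineage is a (schema, table, attr) triple, or None for secondary attributes.
Lineage = Optional[Tuple[str, str, str]]


def join_error_message(name: str, lineage_a: Lineage, lineage_b: Lineage) -> Optional[str]:
    """Return the collision message for a namesake, or None if it is homologous."""
    if lineage_a is None and lineage_b is None:
        # Secondary attribute namesakes: neither side has lineage.
        return (
            f"Cannot join: attribute '{name}' has no lineage in both operands. "
            "Use .proj() to rename one."
        )
    if lineage_a is None or lineage_b is None:
        # The spec gives no wording for this case, so the sketch does not invent one.
        raise ValueError("mixed primary/secondary namesakes are not covered by this sketch")
    if lineage_a != lineage_b:
        # Primary key namesakes that originate in different tables.
        return (
            f"Cannot join: attribute '{name}' exists in both operands with different lineages "
            f"({'.'.join(lineage_a)} vs {'.'.join(lineage_b)}). Use .proj() to rename one."
        )
    return None  # same lineage: homologous, no error


secondary_msg = join_error_message("value", None, None)
mismatch_msg = join_error_message(
    "subject_id",
    ("lab", "Subject", "subject_id"),
    ("other", "Experiment", "subject_id"),
)
```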
452
478
453
-
**Performance warning**: Consider warning when joining on attributes that lack indexes in one or both tables:
479
+
**Performance warning**: Consider warning when joining on attributes that lack indexes: