Commit f096503

Revert to _allow_invalid_primary_key for left join bypass
The semantic_check parameter should only control homologous namesake validation, not the left join PK constraint. These are separate concerns:
- semantic_check: validates that namesakes have the same lineage
- _allow_invalid_primary_key: bypasses left join A → B constraint

Aggregation still performs the semantic check but allows invalid intermediate PKs (which are reset via GROUP BY).

Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
1 parent f5f25ac commit f096503
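
To illustrate the first of these concerns, here is a minimal sketch of what a homologous-namesake check might look like. It is not DataJoint's implementation: `ToyJoinError`, `check_namesakes`, and the `table.attribute` lineage strings are hypothetical names made up for this note, and the name-only matching under `semantic_check=False` is an assumption based on the "legacy compatibility" wording in the docstring.

```python
class ToyJoinError(Exception):
    """Stand-in for DataJointError in this sketch (hypothetical)."""


def check_namesakes(left_lineage, right_lineage, semantic_check=True):
    """Return the attribute names a toy join would match on.

    Each argument maps attribute name -> lineage string. The lineage
    format used here is an assumption made for illustration only.
    """
    # Namesakes: attributes that share a name across both operands.
    namesakes = [name for name in left_lineage if name in right_lineage]
    if not semantic_check:
        # Assumed legacy behavior: match on name alone, skip validation.
        return namesakes
    # Homologous namesakes must also share the same lineage.
    conflicts = [n for n in namesakes if left_lineage[n] != right_lineage[n]]
    if conflicts:
        raise ToyJoinError(f"Non-homologous namesakes: {conflicts}")
    return namesakes


session = {"session_id": "session.session_id", "date": "session.date"}
trial = {"session_id": "session.session_id", "trial_num": "trial.trial_num"}
subject = {"session_id": "subject.session_id"}  # same name, different lineage

matched = check_namesakes(session, trial)
legacy = check_namesakes(session, subject, semantic_check=False)
```

Nothing in this check looks at primary keys, which is the point of the commit: the left join A → B constraint is a separate question with its own flag.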

3 files changed, 26 additions and 32 deletions

docs/src/design/semantic-matching-spec.md

Lines changed: 5 additions & 15 deletions
@@ -338,19 +338,7 @@ Session.join(Trial, left=True) # OK: Session → Trial
 ```
 DataJointError: Left join requires the left operand to determine the right operand (A → B).
 The following attributes from the right operand's primary key are not determined by
-the left operand: ['z']. Use an inner join, restructure the query, or use semantic_check=False.
-```
-
-### Bypassing with `semantic_check=False`
-
-When `semantic_check=False` is used for a left join where A → B doesn't hold, the constraint is bypassed and **PK = PK(A) ∪ PK(B)** is used. This is useful when the caller will reset the primary key afterward (e.g., aggregation with GROUP BY).
-
-```python
-# Direct left join - normally blocked
-A.join(B, left=True)  # Error: A doesn't determine B
-
-# Bypass with semantic_check=False - produces PK(A) ∪ PK(B)
-A.join(B, left=True, semantic_check=False)  # Allowed, but PK may have NULLs
+the left operand: ['z']. Use an inner join or restructure the query.
 ```

 ### Aggregation Exception
@@ -361,10 +349,12 @@ This apparent contradiction is resolved by the `GROUP BY` clause:

 1. Aggregation requires B → A so that B can be grouped by A's primary key
 2. The intermediate left join `A LEFT JOIN B` would have an invalid PK under the normal left join rules
-3. Aggregation uses `semantic_check=False` for its internal join, producing PK(A) ∪ PK(B)
+3. Aggregation internally allows the invalid PK, producing PK(A) ∪ PK(B)
 4. The `GROUP BY PK(A)` clause then **resets** the primary key to PK(A)
 5. The final result has PK(A), which consists entirely of non-NULL values from A

+Note: The semantic check (homologous namesake validation) is still performed for aggregation's internal join. Only the primary key validity constraint is bypassed.
+
 **Example:**
 ```
 Session: session_id*, date
@@ -373,7 +363,7 @@ Trial: session_id*, trial_num*, response_time (references Session)
 # Aggregation with keep_all_rows=True
 Session.aggr(Trial, keep_all_rows=True, avg_rt='avg(response_time)')

-# Internally: Session LEFT JOIN Trial with semantic_check=False
+# Internally: Session LEFT JOIN Trial (with invalid PK allowed)
 # Intermediate PK would be {session_id} ∪ {session_id, trial_num} = {session_id, trial_num}
 # But GROUP BY session_id resets PK to {session_id}
 # Result: All sessions, with avg_rt=NULL for sessions without trials
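
The Session/Trial example in the spec can be reproduced at the SQL level with Python's standard `sqlite3` module. This is only a sketch of the underlying relational behavior, not of the SQL that DataJoint generates: the table definitions and sample rows below are assumptions chosen to mirror the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE session (session_id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE trial (
        session_id INTEGER REFERENCES session(session_id),
        trial_num INTEGER,
        response_time REAL,
        PRIMARY KEY (session_id, trial_num)
    );
    INSERT INTO session VALUES (1, '2024-01-01'), (2, '2024-01-02');
    INSERT INTO trial VALUES (1, 1, 0.5), (1, 2, 1.5);
    """
)

# Intermediate left join: rows would be identified by (session_id, trial_num),
# but trial_num is NULL for the session that has no trials.
joined = conn.execute(
    "SELECT s.session_id, t.trial_num FROM session s "
    "LEFT JOIN trial t ON s.session_id = t.session_id "
    "ORDER BY s.session_id, t.trial_num"
).fetchall()

# GROUP BY session_id collapses the rows back to one per session,
# so session_id alone identifies each result row again.
aggregated = conn.execute(
    "SELECT s.session_id, AVG(t.response_time) AS avg_rt FROM session s "
    "LEFT JOIN trial t ON s.session_id = t.session_id "
    "GROUP BY s.session_id ORDER BY s.session_id"
).fetchall()

print(joined)      # session 2 appears with trial_num = None
print(aggregated)  # one row per session; avg_rt is None for session 2
conn.close()
```

The NULL in the intermediate `trial_num` column is what makes the union primary key invalid, and the grouped result shows why that is harmless once the key is reset to `session_id`.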

src/datajoint/expression.py

Lines changed: 11 additions & 7 deletions
@@ -282,17 +282,19 @@ def __matmul__(self, other):
             "The @ operator has been removed in DataJoint 2.0. " "Use .join(other, semantic_check=False) for permissive joins."
         )

-    def join(self, other, semantic_check=True, left=False):
+    def join(self, other, semantic_check=True, left=False, _allow_invalid_primary_key=False):
         """
         Create the joined QueryExpression.

         Uses semantic matching: only attributes with the same name AND the same
         lineage (homologous namesakes) are used for joining.

         :param other: QueryExpression to join with
-        :param semantic_check: If True (default), raise error on non-homologous namesakes
-            and enforce left join A → B constraint. If False, bypass these checks.
+        :param semantic_check: If True (default), raise error on non-homologous namesakes.
+            If False, bypass semantic check (use for legacy compatibility).
         :param left: If True, perform a left join retaining all rows from self.
+        :param _allow_invalid_primary_key: Internal flag to allow invalid PK in left joins
+            (used by aggregation where GROUP BY resets the PK afterward).

         Examples:
             a * b is short for a.join(b)
@@ -336,10 +338,12 @@ def join(self, other, semantic_check=True, left=False):
         result._connection = self.connection
         result._support = self.support + other.support
         result._left = self._left + [left] + other._left
-        result._heading = self.heading.join(other.heading, left=left, semantic_check=semantic_check)
+        result._heading = self.heading.join(other.heading, left=left, allow_invalid_primary_key=_allow_invalid_primary_key)
         result._restriction = AndList(self.restriction)
         result._restriction.append(other.restriction)
-        result._original_heading = self.original_heading.join(other.original_heading, left=left, semantic_check=semantic_check)
+        result._original_heading = self.original_heading.join(
+            other.original_heading, left=left, allow_invalid_primary_key=_allow_invalid_primary_key
+        )
         assert len(result.support) == len(result._left) + 1
         return result

@@ -683,8 +687,8 @@ def create(cls, arg, group, keep_all_rows=False):

         if keep_all_rows and len(group.support) > 1 or group.heading.new_attributes:
             group = group.make_subquery()  # subquery if left joining a join
-        # Use semantic_check=False to bypass left join A → B validation (aggregation resets PK via GROUP BY)
-        join = arg.join(group, semantic_check=False, left=keep_all_rows)
+        # Allow invalid PK for left join (aggregation resets PK via GROUP BY afterward)
+        join = arg.join(group, left=keep_all_rows, _allow_invalid_primary_key=True)
         result = cls()
         result._connection = join.connection
         result._heading = join.heading.set_primary_key(arg.primary_key)  # use left operand's primary key

src/datajoint/heading.py

Lines changed: 10 additions & 10 deletions
@@ -468,7 +468,7 @@ def select(self, select_list, rename_map=None, compute_map=None):
         )
         return Heading(chain(copy_attrs, compute_attrs))

-    def join(self, other, left=False, semantic_check=True):
+    def join(self, other, left=False, allow_invalid_primary_key=False):
         """
         Join two headings into a new one.

@@ -486,16 +486,16 @@ def join(self, other, left=False, semantic_check=True):
         - If B → A or Neither, the PK would include B's attributes, which could be NULL
         - Only when A → B does PK(A) uniquely identify all result rows

-        When semantic_check=False for left joins where A → B doesn't hold, the constraint
-        is bypassed and PK = PK(A) ∪ PK(B) is used. This is useful for aggregation, where
-        the GROUP BY clause resets the primary key afterward.
+        When allow_invalid_primary_key=True for left joins where A → B doesn't hold,
+        the constraint is bypassed and PK = PK(A) ∪ PK(B) is used. This is useful for
+        aggregation, where the GROUP BY clause resets the primary key afterward.

         It assumes that self and other are headings that share no common dependent attributes.

         :param other: The other heading to join with
-        :param left: If True, this is a left join (requires A → B unless semantic_check=False)
-        :param semantic_check: If False, bypass left join A → B validation (PK becomes union)
-        :raises DataJointError: If left=True, semantic_check=True, and A does not determine B
+        :param left: If True, this is a left join (requires A → B unless allow_invalid_primary_key)
+        :param allow_invalid_primary_key: If True, bypass left join A → B validation (PK becomes union)
+        :raises DataJointError: If left=True and A does not determine B (unless allow_invalid_primary_key)
         """
         from .errors import DataJointError

@@ -507,9 +507,9 @@ def join(self, other, left=False, semantic_check=True):
             name in other.primary_key or name in other.secondary_attributes for name in self.primary_key
         )

-        # For left joins, require A → B unless semantic_check=False
+        # For left joins, require A → B unless allow_invalid_primary_key=True
         if left and not self_determines_other:
-            if semantic_check:
+            if not allow_invalid_primary_key:
                 missing = [
                     name
                     for name in other.primary_key
@@ -519,7 +519,7 @@ def join(self, other, left=False, semantic_check=True):
                     f"Left join requires the left operand to determine the right operand (A → B). "
                     f"The following attributes from the right operand's primary key are not "
                     f"determined by the left operand: {missing}. "
-                    f"Use an inner join, restructure the query, or use semantic_check=False."
+                    f"Use an inner join or restructure the query."
                 )
             else:
                 # Bypass: use union of PKs (will be reset by caller, e.g., aggregation)
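
The left join rule changed in this file can be sketched in isolation. The version below is a simplified model, not the real `Heading.join`: `ToyHeading`, `ToyJoinError`, and `left_join_primary_key` are hypothetical names, and the definition of A → B used here (every attribute of PK(B) appears among A's attributes) is an assumption inferred from the way `missing` is computed in the diff.

```python
from dataclasses import dataclass


class ToyJoinError(Exception):
    """Stand-in for DataJointError in this sketch (hypothetical)."""


@dataclass(frozen=True)
class ToyHeading:
    primary_key: tuple
    secondary_attributes: tuple = ()

    def attributes(self):
        return set(self.primary_key) | set(self.secondary_attributes)

    def determines(self, other):
        # Assumed meaning of A → B: all of PK(B) is found among A's attributes.
        return all(name in self.attributes() for name in other.primary_key)


def left_join_primary_key(a, b, allow_invalid_primary_key=False):
    """Primary key of `a LEFT JOIN b` under the toy rules above."""
    if a.determines(b):
        return a.primary_key  # PK(A) identifies every result row
    if not allow_invalid_primary_key:
        missing = [n for n in b.primary_key if n not in a.attributes()]
        raise ToyJoinError(
            f"Left operand does not determine the right operand: {missing}. "
            "Use an inner join or restructure the query."
        )
    # Bypass: PK(A) ∪ PK(B); the caller is expected to reset it afterward.
    extra = tuple(n for n in b.primary_key if n not in a.primary_key)
    return a.primary_key + extra


session = ToyHeading(("session_id",), ("date",))
trial = ToyHeading(("session_id", "trial_num"), ("response_time",))

allowed = left_join_primary_key(trial, session)
bypassed = left_join_primary_key(session, trial, allow_invalid_primary_key=True)
```

Calling `left_join_primary_key(session, trial)` without the flag raises `ToyJoinError`, because `trial_num` is not an attribute of `session`. In the commit, only aggregation sets the flag, and it immediately replaces the union key with the left operand's primary key.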
