Commit ca35b4b

Document allow_invalid_primary_key as public parameter
Make allow_invalid_primary_key a public parameter for join() so users can bypass the left join A → B constraint when they take responsibility for handling the potentially invalid primary key. This is useful when subsequent operations (like GROUP BY) will reset the primary key. Aggregation uses this internally for keep_all_rows=True.

Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
1 parent f096503 commit ca35b4b

2 files changed: +24 -6 lines changed

docs/src/design/semantic-matching-spec.md

Lines changed: 16 additions & 0 deletions
@@ -341,6 +341,22 @@ The following attributes from the right operand's primary key are not determined
 the left operand: ['z']. Use an inner join or restructure the query.
 ```

+### Bypassing the Left Join Constraint
+
+For special cases where the user takes responsibility for handling the potentially invalid primary key, the constraint can be bypassed using `allow_invalid_primary_key=True`:
+
+```python
+# Normally blocked - A does not determine B
+A.join(B, left=True)  # Error: A → B not satisfied
+
+# Bypass the constraint - user takes responsibility
+A.join(B, left=True, allow_invalid_primary_key=True)  # Allowed, PK = PK(A) ∪ PK(B)
+```
+
+When bypassed, the resulting primary key is the union of both operands' primary keys (PK(A) ∪ PK(B)). The user must ensure that subsequent operations (such as `GROUP BY` or projection) establish a valid primary key.
+
+This mechanism is used internally by aggregation (`aggr`) with `keep_all_rows=True`, which resets the primary key via the `GROUP BY` clause.
+
 ### Aggregation Exception

 `A.aggr(B, keep_all_rows=True)` uses a left join internally but has the **opposite requirement**: **B → A** (the group expression B must have all of A's primary key attributes).

src/datajoint/expression.py

Lines changed: 8 additions & 6 deletions
@@ -282,7 +282,7 @@ def __matmul__(self, other):
             "The @ operator has been removed in DataJoint 2.0. " "Use .join(other, semantic_check=False) for permissive joins."
         )

-    def join(self, other, semantic_check=True, left=False, _allow_invalid_primary_key=False):
+    def join(self, other, semantic_check=True, left=False, allow_invalid_primary_key=False):
         """
         Create the joined QueryExpression.

@@ -293,12 +293,14 @@ def join(self, other, semantic_check=True, left=False, _allow_invalid_primary_ke
         :param semantic_check: If True (default), raise error on non-homologous namesakes.
             If False, bypass semantic check (use for legacy compatibility).
         :param left: If True, perform a left join retaining all rows from self.
-        :param _allow_invalid_primary_key: Internal flag to allow invalid PK in left joins
-            (used by aggregation where GROUP BY resets the PK afterward).
+        :param allow_invalid_primary_key: If True, bypass the left join A → B constraint.
+            The resulting PK will be PK(A) ∪ PK(B), which may contain NULLs for unmatched rows.
+            Use when you will reset the PK afterward (e.g., via GROUP BY in aggregation).

         Examples:
             a * b is short for a.join(b)
             a.join(b, semantic_check=False) for permissive joins
+            a.join(b, left=True, allow_invalid_primary_key=True) for left join with invalid PK
         """
         # U joins are deprecated - raise error directing to use & instead
         if isinstance(other, U):
@@ -338,11 +340,11 @@ def join(self, other, semantic_check=True, left=False, _allow_invalid_primary_ke
         result._connection = self.connection
         result._support = self.support + other.support
         result._left = self._left + [left] + other._left
-        result._heading = self.heading.join(other.heading, left=left, allow_invalid_primary_key=_allow_invalid_primary_key)
+        result._heading = self.heading.join(other.heading, left=left, allow_invalid_primary_key=allow_invalid_primary_key)
         result._restriction = AndList(self.restriction)
         result._restriction.append(other.restriction)
         result._original_heading = self.original_heading.join(
-            other.original_heading, left=left, allow_invalid_primary_key=_allow_invalid_primary_key
+            other.original_heading, left=left, allow_invalid_primary_key=allow_invalid_primary_key
         )
         assert len(result.support) == len(result._left) + 1
         return result
@@ -688,7 +690,7 @@ def create(cls, arg, group, keep_all_rows=False):
         if keep_all_rows and len(group.support) > 1 or group.heading.new_attributes:
             group = group.make_subquery()  # subquery if left joining a join
         # Allow invalid PK for left join (aggregation resets PK via GROUP BY afterward)
-        join = arg.join(group, left=keep_all_rows, _allow_invalid_primary_key=True)
+        join = arg.join(group, left=keep_all_rows, allow_invalid_primary_key=True)
         result = cls()
         result._connection = join.connection
         result._heading = join.heading.set_primary_key(arg.primary_key)  # use left operand's primary key
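
Note: the reason PK(A) ∪ PK(B) can be invalid after a left join, and why a following GROUP BY repairs it, can be illustrated outside DataJoint. The sketch below uses Python's built-in `sqlite3` module, not the DataJoint API. The tables `a` and `b` and the columns `x` and `z` are hypothetical, chosen to mirror the `['z']` error message in the spec: `a` has primary key `(x)`, `b` has primary key `(x, z)`, so A does not determine B.

```python
import sqlite3

# Hypothetical schema (an assumption for illustration, not DataJoint tables):
# PK(a) = (x), PK(b) = (x, z), so z is not determined by a.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (x INTEGER PRIMARY KEY);
    CREATE TABLE b (x INTEGER, z INTEGER, PRIMARY KEY (x, z));
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10), (1, 11);
    """
)

# Left join keeps every row of a. The row x = 2 has no match in b,
# so z, which is part of the nominal key (x, z), comes back as NULL.
left_join_rows = conn.execute(
    "SELECT a.x, b.z FROM a LEFT JOIN b ON a.x = b.x ORDER BY a.x, b.z"
).fetchall()
print(left_join_rows)  # [(1, 10), (1, 11), (2, None)]

# Rows whose nominal primary key contains a NULL.
rows_with_null_key = [row for row in left_join_rows if None in row]
print(rows_with_null_key)  # [(2, None)]

# Grouping by a's primary key yields exactly one row per x,
# so (x) is again a valid key for the result.
grouped_rows = conn.execute(
    "SELECT a.x, COUNT(b.z) FROM a LEFT JOIN b ON a.x = b.x "
    "GROUP BY a.x ORDER BY a.x"
).fetchall()
print(grouped_rows)  # [(1, 2), (2, 0)]
```

This is the same pattern the commit describes for `aggr` with `keep_all_rows=True`: the intermediate left join may carry NULLs in part of its key, and the grouping step reduces the key to the left operand's primary key.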
