Commit 84af070

fix: Disable semantic_check on populate antijoin (parallels #1383)
The fix from #1383, which addressed the Job table's antijoin in refresh(), is now also applied to the antijoin in AutoPopulate._populate_direct and to the progress() fallback path.

The two-operand subtraction `key_source - self` triggers QueryExpression.__sub__, which calls .restrict(Not(...)) with semantic_check=True by default. The semantic check is wrong here: this antijoin is a plain set difference, not a join. We ask "which key_source rows aren't yet in self?" Whether the same-named PK attribute carries the same source-table lineage tag on both sides is irrelevant.

Where it bites: dj.Imported / dj.Computed tables whose primary key is fully inherited from a single FK, with no own-table PK attributes. On those, self.proj() returns the PK attribute with lineage=None (or pointing to self rather than the FK parent), while key_source's matching attribute carries the parent's lineage tag. The semantic check fails with:

    Cannot join on attribute 'X': different lineages (schema.parent.X vs None). Use .proj() to rename one of the attributes.

This pattern is legitimate ("one row downstream per parent row, no intermediate ID") but rare in typical Elements / SciOps pipelines, which extend the inherited PK with own-table attributes (trial_id, experiment_id, etc.) that anchor proj()'s lineage. That's why the existing #1405 test suite didn't surface it.

Changes:
- src/datajoint/autopopulate.py
  - Import Not from .condition at module top.
  - _populate_direct: replace `(LHS - self.proj())` with `LHS.restrict(Not(self.proj()), semantic_check=False)`.
  - progress(): same swap on the no-common-attrs fallback branch.
- tests/integration/test_autopopulate.py
  - New test_populate_antijoin_fk_inherited_pk regression test: Spec(Manual) -> Item(Imported with only -> Spec), the minimal shape that triggers the bug. Without the fix Item.populate() raises DataJointError; with the fix it populates correctly, progress() reports correct counts, and partial-then-full populate works.

Stacked on top of #1452 (the secrets-loading + dead-code fix); rebase to master after that lands.
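
To make the lineage argument concrete, here is a minimal pure-Python sketch of an antijoin with and without a lineage check. It is a toy model, not DataJoint's implementation: `Relation`, `LineageError`, and `antijoin` are hypothetical names introduced only for illustration, and lineage tags are modeled as plain strings or None.

```python
from dataclasses import dataclass


class LineageError(Exception):
    """Hypothetical stand-in for the lineage-mismatch DataJointError."""


@dataclass
class Relation:
    """Toy relation: a list of row dicts plus one lineage tag per attribute."""

    rows: list
    lineage: dict  # attribute name -> lineage tag (str) or None


def antijoin(lhs, rhs, semantic_check=True):
    """Return the rows of lhs that have no match in rhs on the shared attributes."""
    common = [name for name in lhs.lineage if name in rhs.lineage]
    if semantic_check:
        # Mimic the check: same-named attributes must carry the same lineage tag.
        for name in common:
            if lhs.lineage[name] != rhs.lineage[name]:
                raise LineageError(
                    f"Cannot join on attribute {name!r}: different lineages "
                    f"({lhs.lineage[name]} vs {rhs.lineage[name]})"
                )
    # The set difference itself only compares values, never lineage tags.
    taken = {tuple(row[name] for name in common) for row in rhs.rows}
    return [row for row in lhs.rows if tuple(row[name] for name in common) not in taken]


# key_source carries the parent's lineage tag; the populated side carries None.
key_source = Relation(
    rows=[{"spec_id": 1}, {"spec_id": 2}, {"spec_id": 3}],
    lineage={"spec_id": "schema.spec.spec_id"},
)
populated = Relation(rows=[{"spec_id": 1}, {"spec_id": 2}], lineage={"spec_id": None})

pending = antijoin(key_source, populated, semantic_check=False)
```

In this toy model, calling `antijoin(key_source, populated)` with the default check raises `LineageError`, even though the value comparison alone is enough to find the one pending key.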
1 parent 0b8e7f6 commit 84af070

2 files changed

Lines changed: 67 additions & 2 deletions

src/datajoint/autopopulate.py

Lines changed: 9 additions & 2 deletions
@@ -11,6 +11,7 @@
 import traceback
 from typing import TYPE_CHECKING, Any, Generator

+from .condition import Not
 from .errors import DataJointError, LostConnectionError
 from .expression import AndList, QueryExpression

@@ -401,7 +402,12 @@ def _populate_direct(
         """
         from tqdm import tqdm

-        keys = (self._jobs_to_do(restrictions) - self.proj()).keys()
+        # Disable semantic_check on the antijoin: when self has FK-inherited
+        # PK attributes, self.proj() may carry attribute lineages that don't
+        # match key_source's (same attribute, different source-table tag).
+        # The set-difference itself doesn't care about lineage — we just want
+        # rows in key_source that aren't yet in self.
+        keys = self._jobs_to_do(restrictions).restrict(Not(self.proj()), semantic_check=False).keys()

         logger.debug("Found %d keys to populate" % len(keys))

@@ -702,7 +708,8 @@ def progress(self, *restrictions: Any, display: bool = False) -> tuple[int, int]
         if not common_attrs:
             # No common attributes - fall back to two-query method
             total = len(todo)
-            remaining = len(todo - self.proj())
+            # Same lineage caveat as in _populate_direct — disable semantic_check.
+            remaining = len(todo.restrict(Not(self.proj()), semantic_check=False))
         else:
             # Build a single query that computes both total and remaining
             # Using LEFT JOIN with COUNT(DISTINCT) to handle 1:many relationships

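The two-query fallback in progress() reduces to two counts: the size of the to-do set, and the size of the to-do set minus the keys already populated. A rough sketch on plain Python key tuples, assuming unique keys in `todo_keys`; `progress_two_query` is a hypothetical helper name and does not exist in DataJoint:

```python
def progress_two_query(todo_keys, done_keys):
    """Return (remaining, total) from two separate counts over plain key tuples."""
    total = len(todo_keys)
    done = set(done_keys)
    # Plain set difference on key values; no lineage information is involved.
    remaining = sum(1 for key in todo_keys if key not in done)
    return remaining, total


# Three keys to do, two already populated (same shape as the regression test).
partial = progress_two_query([(1,), (2,), (3,)], [(1,), (2,)])
complete = progress_two_query([(1,), (2,), (3,)], [(1,), (2,), (3,)])
```

The return order (remaining, total) follows the unpacking used in the regression test, `remaining, total = Item.progress()`.
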
tests/integration/test_autopopulate.py

Lines changed: 58 additions & 0 deletions
@@ -236,6 +236,64 @@ def make(self, key):
         test_schema.drop(prompt=False)


+def test_populate_antijoin_fk_inherited_pk(prefix, connection_test):
+    """Regression test: populate antijoin on a table whose PK is fully FK-inherited.
+
+    Reproduces the lineage-mismatch failure that hits ``Imported`` or
+    ``Computed`` tables whose primary key consists entirely of attributes
+    inherited via a foreign key, with no own-table PK attributes.
+
+    Without the ``semantic_check=False`` on the populate antijoin, the
+    subtraction ``key_source - self.proj()`` raises::
+
+        DataJointError: Cannot join on attribute 'spec_id': different lineages
+        (schema.spec.spec_id vs None). Use .proj() to rename one of the attributes.
+
+    The set-difference doesn't actually need lineage matching — it just
+    asks which key_source rows aren't yet in ``self``.
+    """
+    test_schema = dj.Schema(f"{prefix}_antijoin_fk_pk", connection=connection_test)
+
+    @test_schema
+    class Spec(dj.Manual):
+        definition = """
+        spec_id : int32
+        ---
+        label : varchar(30)
+        """
+
+    @test_schema
+    class Item(dj.Imported):
+        definition = """
+        -> Spec
+        ---
+        payload : varchar(60)
+        """
+
+        def make(self, key):
+            label = (Spec & key).fetch1("label")
+            self.insert1(dict(key, payload=f"made:{label}"))
+
+    try:
+        Spec.insert([(1, "alpha"), (2, "beta"), (3, "gamma")])
+
+        # Before the fix this raised DataJointError on the antijoin.
+        Item.populate(max_calls=2)
+        assert len(Item) == 2
+
+        remaining, total = Item.progress()
+        assert total == 3
+        assert remaining == 1
+
+        Item.populate()
+        assert len(Item) == 3
+        remaining, total = Item.progress()
+        assert remaining == 0
+        assert total == 3
+    finally:
+        test_schema.drop(prompt=False)
+
+
 def test_load_dependencies(prefix, connection_test):
     schema = dj.Schema(f"{prefix}_load_dependencies_populate", connection=connection_test)
