Commit a81d4be

Update lineage table design in spec
- Only store attributes WITH lineage (native secondary attrs have no entry)
- Make lineage column NOT NULL
- Add cleanup on table creation (remove leftover entries)
- Add cleanup on table drop
1 parent b6be218 commit a81d4be

1 file changed: +25 -5 lines changed

docs/src/design/semantic-matching-spec.md

Lines changed: 25 additions & 5 deletions
@@ -198,6 +198,7 @@ For DataJoint-managed schemas:
 
 - Lineage is stored explicitly in a hidden table (`~lineage`) per schema
 - Populated at table declaration time by copying from parent tables
+- Only attributes WITH lineage are stored (native secondary attributes have no entry)
 - Fast O(1) lookup at query time
 - Authoritative source when present
 
@@ -206,11 +207,15 @@ For DataJoint-managed schemas:
 CREATE TABLE `schema_name`.`~lineage` (
     table_name VARCHAR(64) NOT NULL,
     attribute_name VARCHAR(64) NOT NULL,
-    lineage VARCHAR(255),  -- NULL for native secondary attrs
+    lineage VARCHAR(255) NOT NULL,
     PRIMARY KEY (table_name, attribute_name)
 );
 ```
 
+**Lifecycle**:
+- On table creation: delete any existing entries for that table, then insert new entries
+- On table drop: delete all entries for that table
+
 #### Method 2: Dependency Graph Traversal
 
 Fallback for non-DataJoint schemas or when `~lineage` doesn't exist:
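
The `~lineage` schema and lifecycle rules in the hunk above can be exercised with a minimal sketch. It assumes Python's built-in `sqlite3` as a stand-in for the real database backend, and the helper names `create_lineage_table`, `replace_lineage_entries`, and `lineage_rows` are hypothetical; only `delete_lineage_entries` is named in the spec, and its signature here is a guess.

```python
import sqlite3


def create_lineage_table(conn):
    # Same columns as the spec's DDL; SQLite is used here only for illustration.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "~lineage" ('
        " table_name VARCHAR(64) NOT NULL,"
        " attribute_name VARCHAR(64) NOT NULL,"
        " lineage VARCHAR(255) NOT NULL,"
        " PRIMARY KEY (table_name, attribute_name))"
    )


def delete_lineage_entries(conn, table_name):
    # Used on table drop, and as the first step of (re)declaration.
    conn.execute('DELETE FROM "~lineage" WHERE table_name = ?', (table_name,))


def replace_lineage_entries(conn, table_name, entries):
    # On table creation: remove leftovers, then insert the new entries.
    # `with conn` commits on success and rolls back if any insert fails.
    with conn:
        delete_lineage_entries(conn, table_name)
        conn.executemany(
            'INSERT INTO "~lineage" (table_name, attribute_name, lineage) '
            "VALUES (?, ?, ?)",
            [(table_name, attr, lineage) for attr, lineage in entries],
        )


def lineage_rows(conn, table_name):
    cur = conn.execute(
        'SELECT attribute_name, lineage FROM "~lineage" '
        "WHERE table_name = ? ORDER BY attribute_name",
        (table_name,),
    )
    return cur.fetchall()
```

Running the delete and the inserts in one transaction means a failed redeclaration (for example, a `None` lineage rejected by the `NOT NULL` constraint) leaves the previous entries in place rather than an empty or half-written set. Whether the real implementation wraps these steps in a transaction is not stated in the spec.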
@@ -247,6 +252,7 @@ These methods are **mutually exclusive**:
 ```python
 def get_lineage(schema, table, attribute):
     if lineage_table_exists(schema):
+        # Returns lineage string if entry exists, None otherwise
         return query_lineage_table(schema, table, attribute)
     else:
         return compute_from_dependencies(schema, table, attribute)
@@ -352,6 +358,9 @@ When a table is declared, populate the `~lineage` table:
 def declare_table(table_class, context):
     # ... parse definition ...
 
+    # Remove any leftover entries from previous declaration
+    delete_lineage_entries(schema, table_name)
+
     lineage_entries = []
 
     for attr in definition.attributes:
@@ -360,21 +369,32 @@ def declare_table(table_class, context):
             parent_lineage = get_lineage(
                 attr.fk_schema, attr.fk_table, attr.fk_attribute
             )
-            lineage_entries.append((table_name, attr.name, parent_lineage))
+            if parent_lineage:  # Only store if parent has lineage
+                lineage_entries.append((table_name, attr.name, parent_lineage))
         elif attr.in_key:
             # Native primary key: this table is the origin
             lineage_entries.append((
                 table_name, attr.name,
                 f"{schema}.{table_name}.{attr.name}"
             ))
-        else:
-            # Native secondary: no lineage
-            lineage_entries.append((table_name, attr.name, None))
+        # Native secondary attributes: no entry (no lineage)
 
     # Insert into ~lineage table
     insert_lineage_entries(schema, lineage_entries)
 ```
 
+### At Table Drop Time
+
+When a table is dropped, remove its lineage entries:
+
+```python
+def drop_table(table_class):
+    # ... drop the table ...
+
+    # Clean up lineage entries
+    delete_lineage_entries(schema, table_name)
+```
+
 ### Migration for Existing Tables
 
 For existing schemas without `~lineage` tables:
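
The declaration-time rule this commit introduces (store an entry only when the attribute has lineage) can be checked in isolation with a small sketch. The `Attr` dataclass, the `build_lineage_entries` function, and the `known_lineage` dictionary are hypothetical stand-ins for the parsed definition and for `get_lineage`; they are not part of the spec.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Attr:
    name: str
    in_key: bool = False
    # (schema, table, attribute) of the referenced parent attribute, if any
    fk: Optional[Tuple[str, str, str]] = None


def build_lineage_entries(schema, table_name, attributes, known_lineage):
    """Return (table_name, attribute_name, lineage) rows; never a None lineage."""
    entries = []
    for attr in attributes:
        if attr.fk is not None:
            # Inherited attribute: copy the parent's lineage, if it has one.
            parent_lineage = known_lineage.get(attr.fk)
            if parent_lineage:
                entries.append((table_name, attr.name, parent_lineage))
        elif attr.in_key:
            # Native primary key: this table is the origin.
            entries.append(
                (table_name, attr.name, f"{schema}.{table_name}.{attr.name}")
            )
        # Native secondary attributes: no entry (no lineage).
    return entries
```

For example, given a table `session` in schema `lab` with an inherited `subject_id`, a native key `session_id`, a native secondary `notes`, and a foreign-key attribute whose parent has no lineage, only the first two produce rows, so every row satisfies the `NOT NULL` constraint on `lineage`.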
