@@ -213,24 +213,14 @@ class SchemaGraph:
 
 ### Phase 1: Add Lineage Infrastructure
 
-1. **Add `lineage` and `lineage_hash` fields to `Attribute`** (`heading.py`)
+1. **Add `lineage` field to `Attribute`** (`heading.py`)
    - `lineage`: string `"schema.table.attribute"` or `None`
-   - `lineage_hash`: 8-byte hash for fast comparison
-   - Add both to `default_attribute_properties` with default `None`
-
-   ```python
-   def compute_lineage_hash(lineage):
-       """Compute a short hash for fast lineage comparison."""
-       if lineage is None:
-           return None
-       # Use first 8 bytes of SHA-256 for compact representation
-       return hashlib.sha256(lineage.encode()).digest()[:8]
-   ```
+   - Add to `default_attribute_properties` with default `None`
 
    **Comparison strategy**:
-   - Compare `lineage_hash` only (8-byte comparison)
-   - Hash collisions (1 in 2^64) are acceptable given the low probability and cost
-   - `None` lineage never matches anything
+   - Direct string comparison (simple equality check)
+   - Lineage strings are short (~50-100 chars) and comparisons short-circuit on first difference
+   - `None` lineage never matches anything (including other `None`)
 
 2. **Create `~lineage` table management** (new file: `datajoint/lineage.py`)
    - `LineageTable` class (similar to `ExternalTable`)
@@ -384,9 +374,7 @@ CREATE TABLE `~lineage` (
     table_name VARCHAR(64) NOT NULL,
     attribute_name VARCHAR(64) NOT NULL,
     lineage VARCHAR(200) NOT NULL,  -- "schema.table.attribute"
-    lineage_hash BINARY(8) NOT NULL,  -- fast comparison hash
-    PRIMARY KEY (table_name, attribute_name),
-    INDEX idx_lineage_hash (lineage_hash)
+    PRIMARY KEY (table_name, attribute_name)
 ) ENGINE=InnoDB;
 ```
 
@@ -396,10 +384,8 @@ CREATE TABLE "~lineage" (
     table_name VARCHAR(64) NOT NULL,
     attribute_name VARCHAR(64) NOT NULL,
     lineage VARCHAR(200) NOT NULL,  -- "schema.table.attribute"
-    lineage_hash BYTEA NOT NULL,  -- 8 bytes
     PRIMARY KEY (table_name, attribute_name)
 );
-CREATE INDEX idx_lineage_hash ON "~lineage" (lineage_hash);
 ```
 
 #### Lineage Lookup
@@ -410,7 +396,7 @@ When a `Heading` is initialized from a table, query the `~lineage` table:
 def _load_lineage(self, connection, database, table_name):
     """Load lineage information from the ~lineage metadata table."""
     query = """
-    SELECT attribute_name, lineage, lineage_hash
+    SELECT attribute_name, lineage
     FROM `{database}`.`~lineage`
     WHERE table_name = %s
     """.format(database=database)
@@ -591,19 +577,17 @@ WHERE c.contype = 'f'
 
 ## Performance Considerations
 
-1. **Memory**: Two additional fields per attribute
+1. **Memory**: One additional field per attribute
    - `lineage`: string `"schema.table.attribute"` (~50-100 bytes typical) or `None`
-   - `lineage_hash`: 8 bytes (fixed)
 
-2. **Comparison**: Hash-only comparison
-   - Compare 8-byte `lineage_hash` values (single integer comparison)
-   - No fallback verification needed - collision probability (1 in 2^64) is negligible
-   - `None` hashes never match
+2. **Comparison**: Direct string comparison
+   - Short strings (~50-100 chars) with early short-circuit on difference
+   - Only compared for namesake attributes (same name in both tables)
+   - `None` lineage never matches anything
 
 3. **Storage**: Small overhead in `~lineage` table
-   - ~150 bytes per attribute (table_name + attribute_name + lineage string + hash)
+   - ~130 bytes per attribute (table_name + attribute_name + lineage string)
    - Indexed by (table_name, attribute_name) for fast lookup
-   - Secondary index on `lineage_hash` for potential future optimizations
 
 4. **Dependency loading**: Required before joins
    - Already cached per connection (`connection.dependencies`)
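
Reviewer note: the comparison rule this change adopts (direct string equality, where a `None` lineage never matches anything, including another `None`) could be sketched roughly as below. The helper name `lineages_match` and the sample lineage strings are hypothetical and do not appear in the diff; this only illustrates the rule stated in the updated spec, not the project's actual implementation.

```python
from typing import Optional


def lineages_match(a: Optional[str], b: Optional[str]) -> bool:
    """Return True only when both lineages are set and identical.

    Hypothetical helper illustrating the spec's comparison strategy.
    """
    # A missing lineage never matches anything, including another None.
    if a is None or b is None:
        return False
    # Plain string equality on "schema.table.attribute" strings.
    return a == b
```

Under this sketch, two namesake attributes are treated as the same entity only when both carry the same `"schema.table.attribute"` string. Two attributes that both lack lineage are deliberately treated as non-matching.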