feat(snowflake)!: Transpilation support for HASH function #7385
fivetran-ashashankar wants to merge 1 commit into main
if any(isinstance(arg, exp.Star) for arg in expression.expressions):
    select = expression.find_ancestor(exp.Select)
    if not select:
        self.unsupported("HASH(*) requires a SELECT context")
        return self.func("HASH", *expression.expressions)

    from_clause = select.args.get("from_")
    if not from_clause:
        self.unsupported("HASH(*) requires a FROM clause")
        return self.func("HASH", *expression.expressions)

    table = from_clause.this
    table_alias = table.alias_or_name

    return f"HASH(UNPACK(COLUMNS({table_alias}.*)))"
Does this logic cover all the cases?
What happens when we have a join?
Also, there is no need to check for unsupported.
Did you take this case into account?
Snowflake:
SELECT HASH(x) FROM (
SELECT 2.0 AS x
UNION ALL
SELECT 2.0 AS x
);
> -3690131753453205264
-3690131753453205264
SELECT HASH(x) FROM (
SELECT 2 AS x
UNION ALL
SELECT 2 AS x
);
> -3690131753453205264
-3690131753453205264
Both queries return the same value in Snowflake.
======================================
DuckDB:
memory D SELECT HASH(x) FROM (
SELECT 2 AS x
UNION ALL
SELECT 2 AS x
);
┌─────────────────────┐
│ hash(x) │
│ uint64 │
├─────────────────────┤
│ 2060787363917578834 │
│ 2060787363917578834 │
└─────────────────────┘
memory D SELECT HASH(x) FROM (
SELECT 2.0 AS x
UNION ALL
SELECT 2.0 AS x
);
┌─────────────────────┐
│ hash(x) │
│ uint64 │
├─────────────────────┤
│ 8094069980479725634 │
│ 8094069980479725634 │
└─────────────────────┘
DuckDB, by contrast, hashes 2 and 2.0 to different values.
From the Snowflake docs (https://docs.snowflake.com/en/sql-reference/functions/hash#usage-notes):
Any two values of type NUMBER that compare equally will hash to the same hash value, even if the respective types have different precision and/or scale.
Any two values of type FLOAT that can be converted to NUMBER(38, 0) without loss of precision will hash to the same value. For example, the following all return the same hash value:
HASH(10::NUMBER(38,0))
HASH(10::NUMBER(5,3))
HASH(10::FLOAT)
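The rule quoted above (numerically equal values hash the same regardless of type) can be illustrated with a small standalone sketch. This is not Snowflake's or DuckDB's hashing algorithm: `canonical_number` and `stable_hash` are hypothetical helpers, SHA-256 is an arbitrary stand-in, and only finite values are assumed. The point is just that normalizing the value before hashing is what makes `2`, `2.0`, and `2.000` agree:

```python
import hashlib
from decimal import Decimal


def canonical_number(value):
    """Hypothetical helper: map numerically equal finite values to one text form."""
    # Go through str() for floats so 2.0 becomes Decimal("2.0"), not a binary expansion.
    d = Decimal(str(value)) if isinstance(value, float) else Decimal(value)
    if d == d.to_integral_value():
        return str(int(d))  # 2, 2.0 and Decimal("2.000") all become "2"
    return format(d.normalize(), "f")  # strip trailing zeros: 2.50 -> "2.5"


def stable_hash(value):
    """Hash the canonical text form; SHA-256 is an arbitrary choice for the sketch."""
    digest = hashlib.sha256(canonical_number(value).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# Values that compare equal produce the same hash once canonicalized.
print(stable_hash(2) == stable_hash(2.0) == stable_hash(Decimal("2.000")))
```

A transpiled query would need some equivalent normalization on the DuckDB side to reproduce this behavior; how best to express that in SQL is left open here.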
We should avoid generating raw SQL strings with f-strings. We have to build expression nodes and let the generator render them.
class Hash(Expression, Func):
    arg_types = {"expressions": True}
    is_var_len_args = True
Did you check whether HASH exists in other dialects?
from_clause = select.args.get("from_")
if not from_clause:
    self.unsupported("HASH(*) requires a FROM clause")
The HASH(*) implementation handles only the bare * and misses the other Snowflake variants from the docs:
(* ILIKE 'col1%') - (could be unsupported)
(* EXCLUDE col1)
(mytable.*)
sqlglot.transpile('SELECT HASH(t.*) FROM (SELECT 1 AS a, 2 AS b, 3 AS c) t', read='snowflake', write='duckdb')[0])" | duckdb
Binder Error:
No function matches the given name and argument types 'hash()'. You might need to add explicit type casts.
Candidate functions:
hash(ANY, [ANY...]) -> UBIGINT
LINE 1: SELECT HASH(t.*) FROM (SELECT 1 AS a, 2 AS b, 3 AS c) AS t
We could perhaps generate these forms:
HASH(UNPACK(COLUMNS(* EXCLUDE a)))
HASH(UNPACK(COLUMNS(t.*)))
instead of the hardcoded
f"HASH(UNPACK(COLUMNS({table_alias}.*)))"
I will close this for now; it's complicated and the solution isn't trivial.
No description provided.