
feat(snowflake)!: Transpilation support for HASH function #7385

Closed
fivetran-ashashankar wants to merge 1 commit into main from RD-1069387-transpile-HASH

Conversation

@fivetran-ashashankar
Collaborator

No description provided.

Comment on lines +4157 to +4171
if any(isinstance(arg, exp.Star) for arg in expression.expressions):
    select = expression.find_ancestor(exp.Select)
    if not select:
        self.unsupported("HASH(*) requires a SELECT context")
        return self.func("HASH", *expression.expressions)

    from_clause = select.args.get("from_")
    if not from_clause:
        self.unsupported("HASH(*) requires a FROM clause")
        return self.func("HASH", *expression.expressions)

    table = from_clause.this
    table_alias = table.alias_or_name

    return f"HASH(UNPACK(COLUMNS({table_alias}.*)))"
Collaborator

@geooo109 geooo109 Mar 26, 2026


Does this logic cover all the cases?

What happens when the query has a join? Only the first table in the FROM clause is read here.

Collaborator


Also there is no need to check for unsupported.

Collaborator


Did you take this case into account?

Snowflake:
  SELECT HASH(x) FROM (
      SELECT 2.0 AS x
      UNION ALL
      SELECT 2.0 AS x
  );
> -3690131753453205264
-3690131753453205264

SELECT HASH(x) FROM (
      SELECT 2 AS x
      UNION ALL
      SELECT 2 AS x
  );
> -3690131753453205264
-3690131753453205264

Both cases return the same value in Snowflake.
======================================
DuckDB:
memory D   SELECT HASH(x) FROM (
               SELECT 2 AS x
               UNION ALL
               SELECT 2 AS x
           );
┌─────────────────────┐
│       hash(x)       │
│       uint64        │
├─────────────────────┤
│ 2060787363917578834 │
│ 2060787363917578834 │
└─────────────────────┘
memory D SELECT HASH(x) FROM (
               SELECT 2.0 AS x
               UNION ALL
               SELECT 2.0 AS x
           );
┌─────────────────────┐
│       hash(x)       │
│       uint64        │
├─────────────────────┤
│ 8094069980479725634 │
│ 8094069980479725634 │
└─────────────────────┘

From the Snowflake docs (https://docs.snowflake.com/en/sql-reference/functions/hash#usage-notes):

Any two values of type NUMBER that compare equally will hash to the same hash value, even if the respective types have different precision and/or scale.

Any two values of type FLOAT that can be converted to NUMBER(38, 0) without loss of precision will hash to the same value. For example, the following all return the same hash value:

HASH(10::NUMBER(38,0))

HASH(10::NUMBER(5,3))

HASH(10::FLOAT)
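
The usage notes quoted above suggest that an emulation would need to bring numeric values to one canonical form before hashing, so that 2 and 2.0 end up with the same hash. A minimal Python sketch of that idea follows. The helper names `canonical_number` and `stand_in_hash` are hypothetical, and SHA-256 is only a stand-in: it is not Snowflake's algorithm and will not reproduce the values shown above.

```python
import hashlib
from decimal import Decimal


def canonical_number(value) -> str:
    """Hypothetical helper: render a number so that equal values share one text form."""
    d = Decimal(str(value))
    if d == 0:
        return "0"  # collapse 0, 0.0 and -0.0 into a single form
    # normalize() drops trailing zeros; the "f" format avoids exponent notation such as 1E+1
    return format(d.normalize(), "f")


def stand_in_hash(value) -> int:
    """Signed 64-bit hash of the canonical form (SHA-256 based, NOT Snowflake's algorithm)."""
    digest = hashlib.sha256(canonical_number(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# Integer, float and fixed-point spellings of the same value agree.
print(stand_in_hash(10) == stand_in_hash(10.0) == stand_in_hash(Decimal("10.000")))
```

DuckDB's hash() has no such step, which is consistent with the different results for 2 and 2.0 in the session above.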

Collaborator


We should avoid generating raw SQL strings with f-strings. We have to build expression nodes and let the generator render them.
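
To illustrate the difference, here is a toy sketch in plain Python. The classes `Identifier`, `QualifiedStar`, and `Func` and the function `hash_star` are hypothetical stand-ins written for this example; they are not sqlglot's classes or API. The point is only that a node tree lets the renderer decide on quoting, while an f-string pastes the alias in verbatim.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Identifier:
    name: str

    def sql(self) -> str:
        # Toy rule: quote anything that is not a plain identifier.
        if self.name.isidentifier():
            return self.name
        return '"' + self.name.replace('"', '""') + '"'


@dataclass(frozen=True)
class QualifiedStar:
    table: Identifier

    def sql(self) -> str:
        return f"{self.table.sql()}.*"


@dataclass(frozen=True)
class Func:
    name: str
    args: Tuple[object, ...]

    def sql(self) -> str:
        return f"{self.name}(" + ", ".join(a.sql() for a in self.args) + ")"


def hash_star(alias: str) -> Func:
    """Build HASH(UNPACK(COLUMNS(<alias>.*))) as nodes instead of a string."""
    star = QualifiedStar(Identifier(alias))
    return Func("HASH", (Func("UNPACK", (Func("COLUMNS", (star,)),)),))


print(hash_star("t").sql())         # HASH(UNPACK(COLUMNS(t.*)))
print(hash_star("my table").sql())  # HASH(UNPACK(COLUMNS("my table".*)))
```

With the f-string from the diff, the second alias would be emitted unquoted as `my table.*`, which is not valid SQL.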

Comment on lines +407 to +409
class Hash(Expression, Func):
    arg_types = {"expressions": True}
    is_var_len_args = True
Collaborator


Did you check whether HASH exists in other dialects?


from_clause = select.args.get("from_")
if not from_clause:
    self.unsupported("HASH(*) requires a FROM clause")
Collaborator


The HASH(*) implementation only handles a bare * and misses the other Snowflake variants from the docs:

(* ILIKE 'col1%') - (could be unsupported)
(* EXCLUDE col1)
(mytable.*)
sqlglot.transpile('SELECT HASH(t.*) FROM (SELECT 1 AS a, 2 AS b, 3 AS c) t', read='snowflake', write='duckdb')[0])" | duckdb
Binder Error:
No function matches the given name and argument types 'hash()'. You might need to add explicit type casts.
        Candidate functions:
        hash(ANY, [ANY...]) -> UBIGINT


LINE 1: SELECT HASH(t.*) FROM (SELECT 1 AS a, 2 AS b, 3 AS c) AS t

We could perhaps try these:
HASH(UNPACK(COLUMNS(* EXCLUDE a)))
HASH(UNPACK(COLUMNS(t.*)))
instead of
f"HASH(UNPACK(COLUMNS({table_alias}.*)))"

@geooo109
Collaborator

I will close this for now; it's complicated and the solution isn't trivial.

@geooo109 geooo109 closed this Mar 26, 2026
@georgesittas georgesittas deleted the RD-1069387-transpile-HASH branch March 26, 2026 18:55