-
Notifications
You must be signed in to change notification settings - Fork 12
Description
Is your feature request related to a problem? Please describe.
Hybrid tables enable us to build tiered storage between MergeTree and Iceberg. This works in general but there is a problem when MergeTree tables use datatypes or function transformations that are not supported in Iceberg schema. Here's an example:
CREATE TABLE test
(
arr Array(String)
) ENGINE = MergeTree()
ORDER BY (arr[1])
We can work around this problem as follow by adding an ALIAS that is generated in Parquet and allows us to assign a sort order on the Iceberg side.
CREATE TABLE test
(
arr Array(String),
_arr_1 String MATERIALIZED arr[1]
) ENGINE = MergeTree()
ORDER BY (_arr_1)
The problem: when we query over this with a Hybrid table while filtering using arr[1], we need a code change to query by _arr_1 instead of arr[1]. Otherwise ClickHouse can't see that it's following the sort order on the Iceberg side and the query results are extremely slow. This means we can't roll out tiered storage transparently to application code.
Describe the solution you'd like
Automatically restructure the query to use _arr_1 on the Iceberg side. The change should be fully transparent to calling applications.
It's OK to assume that users will modify the table to add an ALIAS or other properties because these change are only visible in the schema. The main caveat is that it must be possible to make these mods transparently and safely on production systems.
Describe alternatives you've considered
- Change application code. This prevents tiering from being used because it's not transparent. For some users it may take months to make these changes in production systems.
Additional context
This is a case of a more general problem, which is to map sort orders and partitioning from MergeTree to Iceberg. Here are the sticking points:
- Sort orders are limited to scalar values plus a limited set of transforms in Iceberg. ClickHouse can use array elements or expressions.
- Partition keys have a similar issue.
See the following links for more information: