-
Notifications
You must be signed in to change notification settings - Fork 18
Fix Iceberg ARRAY columns with dot-separated names returning empty lists #1894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
ab327d2
ef84f17
b0b731e
5f6b0cf
7524aed
def81c2
c7fd56b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ok |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| #!/usr/bin/env bash | ||
| # Regression test for #90731. | ||
| # ColumnsDescription::getAllRegisteredNames must include columns whose names | ||
| # contain a dot, so they appear in IHints suggestions after a typo. | ||
|
|
||
| CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) | ||
| # shellcheck source=../shell_config.sh | ||
| . "$CURDIR"/../shell_config.sh | ||
|
|
||
| $CLICKHOUSE_CLIENT -q " | ||
| CREATE TABLE t_dotted_hint (\`a.b\` Array(String)) | ||
| ENGINE = MergeTree ORDER BY tuple(); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same with this test. It's testing
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Valid point. The fix isn't in Iceberg code — the bug surfaced via Iceberg, but the root cause was in engine-agnostic machinery:
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This also tells the missing hole of Iceberg test suite I've shared with you earlier and I will share my findings into our team channel
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The fix isn't in Iceberg-specific code — the bug surfaced via Iceberg, but the root cause was in engine-agnostic machinery: the dot-name filter in |
||
| " | ||
|
|
||
| # Misspell the column name; the dotted name must appear in the IHints suggestion | ||
| # list ("Maybe you meant: [...]"), which is produced from | ||
| # ColumnsDescription::getAllRegisteredNames. If that method filtered out dotted | ||
| # names again, the suggestion list would be empty and this would fail. | ||
| $CLICKHOUSE_CLIENT -q "ALTER TABLE t_dotted_hint MODIFY COLUMN a_b Array(String);" 2>&1 \ | ||
| | grep -qF "Maybe you meant: ['a.b']" && echo "ok" || echo "FAIL" | ||
|
|
||
| $CLICKHOUSE_CLIENT -q "DROP TABLE t_dotted_hint;" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| ['x','y','z'] | ||
| ['x','y','z'] | ||
| ['a','b','c'] [1,2] ['p','q'] | ||
| ['a','b','c'] [1,2] ['p','q'] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| #!/usr/bin/env bash | ||
| # Tags: no-fasttest | ||
| # Regression test for https://github.com/ClickHouse/ClickHouse/issues/90731. | ||
| # An Iceberg ARRAY column whose name literally contains a dot (e.g. `a.b`) must | ||
| # read back its stored values, not an empty array. The mixed case also checks a | ||
| # lone dotted column next to several dotted columns that share a prefix | ||
| # (`c.x`, `c.y`): all of them must round-trip correctly. | ||
|
|
||
| CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) | ||
| # shellcheck source=../shell_config.sh | ||
| . "$CUR_DIR"/../shell_config.sh | ||
|
|
||
| LONE_PATH="${CLICKHOUSE_USER_FILES}/lakehouses/${CLICKHOUSE_DATABASE}_iceberg_dotted_lone" | ||
| MIXED_PATH="${CLICKHOUSE_USER_FILES}/lakehouses/${CLICKHOUSE_DATABASE}_iceberg_dotted_mixed" | ||
| rm -rf "${LONE_PATH}" "${MIXED_PATH}" | ||
|
|
||
| # 1) A lone dotted Array column must read its stored values, not []. | ||
| ${CLICKHOUSE_CLIENT} --query " | ||
| SET allow_experimental_insert_into_iceberg = 1; | ||
| DROP TABLE IF EXISTS t_iceberg_dotted_lone; | ||
| CREATE TABLE t_iceberg_dotted_lone (\`a.b\` Array(String)) ENGINE = IcebergLocal('${LONE_PATH}', 'Parquet'); | ||
| INSERT INTO t_iceberg_dotted_lone (\`a.b\`) SELECT ['x', 'y', 'z']; | ||
| " | ||
| # Read through the Iceberg engine table ... | ||
| ${CLICKHOUSE_CLIENT} --query "SELECT \`a.b\` FROM t_iceberg_dotted_lone" | ||
| # ... and through the icebergLocal table function with an explicitly-declared schema. | ||
| ${CLICKHOUSE_CLIENT} --query "SELECT \`a.b\` FROM icebergLocal('${LONE_PATH}', 'Parquet', '\`a.b\` Array(String)')" | ||
|
|
||
| # 2) A lone dotted column alongside dotted columns that share a prefix (c.x, c.y). | ||
| ${CLICKHOUSE_CLIENT} --query " | ||
| SET allow_experimental_insert_into_iceberg = 1; | ||
| DROP TABLE IF EXISTS t_iceberg_dotted_mixed; | ||
| CREATE TABLE t_iceberg_dotted_mixed (\`a.b\` Array(String), \`c.x\` Array(Int32), \`c.y\` Array(String)) ENGINE = IcebergLocal('${MIXED_PATH}', 'Parquet'); | ||
| INSERT INTO t_iceberg_dotted_mixed (\`a.b\`, \`c.x\`, \`c.y\`) SELECT ['a', 'b', 'c'], [1, 2], ['p', 'q']; | ||
| " | ||
| ${CLICKHOUSE_CLIENT} --query "SELECT \`a.b\`, \`c.x\`, \`c.y\` FROM t_iceberg_dotted_mixed" | ||
| ${CLICKHOUSE_CLIENT} --query "SELECT \`a.b\`, \`c.x\`, \`c.y\` FROM icebergLocal('${MIXED_PATH}', 'Parquet', '\`a.b\` Array(String), \`c.x\` Array(Int32), \`c.y\` Array(String)')" | ||
|
|
||
| # Cleanup | ||
| ${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS t_iceberg_dotted_lone" | ||
| ${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS t_iceberg_dotted_mixed" | ||
| rm -rf "${LONE_PATH}" "${MIXED_PATH}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now PR only remove this single line.
This code was added in ClickHouse#35852
Can this change break hints?
For example return all
database.tablecombinations instead of onlydatabase, or something like this?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this check was added, it is possible it was added by some reason in that PR. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked into both — why the filter was there, and whether removing it breaks hints. Short answer: it's safe, and the
database.tablecase can't arise here.Why ClickHouse#35852 added it. That PR introduced
ColumnsDescription::getAllRegisteredNamestogether with the dot-filter as part of a new feature (column-name hints). The filter was a design choice to keep flattened-Nested subcolumns (e.g.n.a) out of the suggestion list — not a guard added to fix a specific breakage. So removing it doesn't re-open a known bug; it only makes dotted / Nested-subcolumn names eligible as suggestions.It's column-scoped — so no
database.table. This method iterates one table'scolumnsand returns eachcolumn.name. It only ever returns column names of a single table, never database or table names. The.ina.bis part of a literal column name, not adb.tableseparator, so nodatabase.tablecombination can originate here.No consumer splits on
.. The list feeds only theIHintsprompter (NamePrompter::getHints), which is pure Levenshtein edit-distance over whole candidate strings — it never tokenizes on.. The consumers are the "no such column" /ALTER MODIFY|RENAME|DROPhint messages, plusStorageSystemZooKeeper(whose columns are hardcoded and contain no dot, so the change is a no-op there). A dotted name is just one more intact candidate; it appears only if it's an edit-distance near-miss of the typed name.Observed output (filter removed):
Multi-dot names stay whole; a far typo yields no suggestion. Removing the filter strictly adds correct near-match candidates and never malforms one. (I inferred the "before" from the single removed line plus the
xyzno-match behavior rather than building a separate base binary — happy to produce a literal side-by-side if you'd like one.)I also tightened
04260to assert the dotted name appears specifically inside theMaybe you meant: [...]list, so if the filter were ever re-added (emptying that list) the test fails directly.