Support commit ts in TiFlash#10723
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Review Completed ✅Status: 4 issues identified and posted as PR review comments Findings Summary:
Review Links:
Issues Found:
Recommendation: Address the P0 and P1 issues before merging to ensure the code builds successfully. |
📝 WalkthroughWalkthroughAdds support for TiDB's hidden extra commit_ts column (ColumnID -5) by aliasing it to the internal version column across storage, filter, remote/schema generation, and expression analysis; includes casts after table-scan when requested types differ and adds tests covering aliasing and casting. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Coprocessor as DAG/Coprocessor
participant Storage as DeltaMerge/Storage
participant Filter as FilterParser/RSOperator
Client->>Coprocessor: Request scan referencing extra_commit_ts (col -5)
Coprocessor->>Storage: Resolve columns for table scan (map -5 -> version)
Storage-->>Coprocessor: Return schema with version column
Coprocessor->>Coprocessor: genNamesAndTypes/GenSchema (emit version name)
Coprocessor->>Coprocessor: DAGExpressionAnalyzer (if requested type != actual) add cast after table-scan
Client->>Filter: Push-down filter uses col -5
Filter->>Storage: Remap -5 -> version_col_id for evaluation
Filter-->>Client: Filter applied using version column
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@dbms/src/Flash/Coprocessor/GenSchemaAndColumn.cpp`:
- Around line 136-138: The throw in the switch case for
MutSup::extra_commit_ts_col_id in GenSchemaAndColumn.cpp omits an ErrorCodes
value; update the throw to include a DB::ErrorCodes enum (e.g.,
ErrorCodes::NOT_IMPLEMENTED) so it follows the pattern throw Exception("Not
supported in disaggregated read now", ErrorCodes::NOT_IMPLEMENTED); — locate the
case handling MutSup::extra_commit_ts_col_id and replace the current throw
Exception(...) with one that passes the appropriate ErrorCodes::... constant.
In `@dbms/src/Storages/MutableSupport.h`:
- Line 46: Rename the snake_case constant extra_commit_ts_col_id to camelCase
extraCommitTsColId and update every usage and reference accordingly; locate the
declaration inline static constexpr ColumnID extra_commit_ts_col_id in
MutableSupport.h and change the identifier, then update all call sites, static
casts, comments, and any tests or documentation that use extra_commit_ts_col_id
so they reference extraCommitTsColId to comply with the project's naming
guideline.
In `@dbms/src/Storages/tests/gtest_hidden_commit_ts_column.cpp`:
- Line 46: Update the inline comment that reads "Maybe another test has already
registed, ignore exception here." to correct the typo: change "registed" to
"registered" so it reads "Maybe another test has already registered, ignore
exception here." — edit this comment in gtest_hidden_commit_ts_column.cpp (the
comment string above the exception-handling block) to fix the spelling only.
| case MutSup::extra_commit_ts_col_id: | ||
| throw Exception("Not supported in disaggregated read now"); | ||
| default: |
There was a problem hiding this comment.
Add an ErrorCodes value to this Exception.
The current throw omits an error code, which makes error handling inconsistent.
🔧 Suggested fix
- case MutSup::extra_commit_ts_col_id:
- throw Exception("Not supported in disaggregated read now");
+ case MutSup::extra_commit_ts_col_id:
+ throw Exception("Not supported in disaggregated read now", ErrorCodes::NOT_IMPLEMENTED);🤖 Prompt for AI Agents
In `@dbms/src/Flash/Coprocessor/GenSchemaAndColumn.cpp` around lines 136 - 138,
The throw in the switch case for MutSup::extra_commit_ts_col_id in
GenSchemaAndColumn.cpp omits an ErrorCodes value; update the throw to include a
DB::ErrorCodes enum (e.g., ErrorCodes::NOT_IMPLEMENTED) so it follows the
pattern throw Exception("Not supported in disaggregated read now",
ErrorCodes::NOT_IMPLEMENTED); — locate the case handling
MutSup::extra_commit_ts_col_id and replace the current throw Exception(...) with
one that passes the appropriate ErrorCodes::... constant.
|
|
||
| inline static constexpr ColumnID extra_handle_id = -1; | ||
| inline static constexpr ColumnID extra_table_id_col_id = -3; | ||
| inline static constexpr ColumnID extra_commit_ts_col_id = -5; |
There was a problem hiding this comment.
Rename to camelCase to match naming guideline.
extra_commit_ts_col_id introduces a new snake_case identifier.
🔧 Suggested rename (plus update all usages)
- inline static constexpr ColumnID extra_commit_ts_col_id = -5;
+ inline static constexpr ColumnID extraCommitTsColId = -5;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| inline static constexpr ColumnID extra_commit_ts_col_id = -5; | |
| inline static constexpr ColumnID extraCommitTsColId = -5; |
🤖 Prompt for AI Agents
In `@dbms/src/Storages/MutableSupport.h` at line 46, Rename the snake_case
constant extra_commit_ts_col_id to camelCase extraCommitTsColId and update every
usage and reference accordingly; locate the declaration inline static constexpr
ColumnID extra_commit_ts_col_id in MutableSupport.h and change the identifier,
then update all call sites, static casts, comments, and any tests or
documentation that use extra_commit_ts_col_id so they reference
extraCommitTsColId to comply with the project's naming guideline.
| } | ||
| catch (DB::Exception &) | ||
| { | ||
| // Maybe another test has already registed, ignore exception here. |
There was a problem hiding this comment.
Typo: "registed" → "registered".
- // Maybe another test has already registed, ignore exception here.
+ // Maybe another test has already registered, ignore exception here.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Maybe another test has already registed, ignore exception here. | |
| // Maybe another test has already registered, ignore exception here. |
🤖 Prompt for AI Agents
In `@dbms/src/Storages/tests/gtest_hidden_commit_ts_column.cpp` at line 46, Update
the inline comment that reads "Maybe another test has already registed, ignore
exception here." to correct the typo: change "registed" to "registered" so it
reads "Maybe another test has already registered, ignore exception here." — edit
this comment in gtest_hidden_commit_ts_column.cpp (the comment string above the
exception-handling block) to fix the spelling only.
| has_cast = true; | ||
| } | ||
|
|
||
| if (may_need_add_cast_column[i] && table_scan_columns[i].id == ExtraCommitTSColumnID) |
There was a problem hiding this comment.
P0 - Undefined ExtraCommitTSColumnID breaks build
ExtraCommitTSColumnID is referenced but not defined anywhere in the codebase, causing build failure.
Evidence: This file is part of the default flash_service library build. A repo-wide search shows only uses (no definition) at line 1261 and in test file.
Likely fix: Use MutSup::extra_commit_ts_col_id (defined in Storages/MutableSupport.h:46).
| ColumnID col_id = col_info.id; | ||
| // TiDB may request a hidden commit_ts column in TableScan with a special ColumnID. | ||
| // In TiFlash it is stored in `_INTERNAL_VERSION` (VersionColumnID), so create an alias mapping. | ||
| if (unlikely(col_id == MutSup::extra_commit_ts_col_id)) |
There was a problem hiding this comment.
P2 - Rough set filter commit_ts alias uses wrong map key
RSOperator::build intends to alias TiDB's hidden commit_ts ColumnID (-5) to TiFlash's internal version ColumnID (-1024), but only populates the map under key -1024. Later, FilterParser resolves ColumnRef to the original ColumnID (-5) and calls id_to_attr.at(-5), which will throw std::out_of_range if a filter references commit_ts.
Evidence: Line 76 rewrites col_id from -5 to -1024, but map insertion at line 89 uses column_id_to_attr[cd.id], creating key -1024 only. Later lookup at FilterParser.cpp:168 uses the original -5.
Suggested fix: Populate the map under both keys, or adjust the lookup to use the aliased key.
| #include <IO/Buffer/WriteBufferFromString.h> | ||
| #include <Interpreters/Context.h> | ||
| #include <Storages/DeltaMerge/DeltaMergeDefines.h> | ||
| #include <Storages/DeltaMerge/Filter/PushDownFilter.h> |
There was a problem hiding this comment.
P1 - New gtest does not compile (multiple issues)
This test file has several compilation errors that block the gtests_dbms build:
- Line 23: Includes non-existent header
<Storages/DeltaMerge/Filter/PushDownFilter.h>(directory containsPushDownExecutor.*instead) - Lines 63, 92, 94, 225: References undefined constants
ExtraCommitTSColumnID,VERSION_COLUMN_ID,VERSION_COLUMN_NAME,VERSION_COLUMN_TYPE, and non-existentDM::PushDownFilter::build(...)API - Line 215: Calls
DAGQueryInfoconstructor with wrong signature (missingfts_query_infoparameter)
Evidence: Current code uses DB::MutSup::{extra_commit_ts_col_id, version_col_id, version_column_name, getVersionColumnType()} and DM::getVersionColumnDefine() instead.
|
@pantheon-ai[bot]: adding LGTM is restricted to approvers and reviewers in OWNERS files. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| break; | ||
| } | ||
| case MutSup::extra_commit_ts_col_id: | ||
| throw Exception("Not supported in disaggregated read now"); |
There was a problem hiding this comment.
P2 - Disaggregated read rejects commit_ts
Disaggregated compute mode's TableScan path unconditionally throws when it encounters the hidden commit_ts ColumnID (MutSup::extra_commit_ts_col_id, -5), so any DAG request that includes that column will fail in disaggregated mode.
Evidence:
- Line 137:
case MutSup::extra_commit_ts_col_id: throw Exception(\"Not supported in disaggregated read now\"); - Called from
StorageDisaggregatedRemote.cpp:759and:833 - Non-disaggregated paths map -5 to
_INTERNAL_VERSIONinstead of rejecting it
Suggested fix: Implement the same aliasing behavior (-5 → version column) for disaggregated mode instead of throwing.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
dbms/src/Storages/tests/gtest_hidden_commit_ts_column.cpp (1)
62-245: Use camelCase for C++ local variables in this file.Several new locals are snake_case (for example
commit_ts_ci,table_scan_column_info,pushed_down_filters,runtime_filter_ids). Please align with repo C++ naming convention for consistency.As per coding guidelines: “Method and variable names in C++ must use camelCase (e.g.,
readBlock,totalBytes).”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbms/src/Storages/tests/gtest_hidden_commit_ts_column.cpp` around lines 62 - 245, The test uses snake_case local variable names; rename them to camelCase everywhere to follow C++ conventions—e.g., change commit_ts_ci -> commitTsCi, table_scan_column_info -> tableScanColumnInfo, pushed_down_filters -> pushedDownFilters, may_need_add_cast_column -> mayNeedAddCastColumn, casted_columns -> castedColumns, extra_cast -> extraCast, scan_column_infos -> scanColumnInfos, runtime_filter_ids -> runtimeFilterIds, table_column_defines -> tableColumnDefines (and any other snake_case locals) and update all references (calls to getDataTypeByColumnInfoForComputingLayer, PushDownExecutor::build, DM::RSOperator::build, DAGQueryInfo construction, analyzer.buildExtraCastsAfterTS, etc.) so identifiers and their usages remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@dbms/src/Storages/tests/gtest_hidden_commit_ts_column.cpp`:
- Around line 62-245: The test uses snake_case local variable names; rename them
to camelCase everywhere to follow C++ conventions—e.g., change commit_ts_ci ->
commitTsCi, table_scan_column_info -> tableScanColumnInfo, pushed_down_filters
-> pushedDownFilters, may_need_add_cast_column -> mayNeedAddCastColumn,
casted_columns -> castedColumns, extra_cast -> extraCast, scan_column_infos ->
scanColumnInfos, runtime_filter_ids -> runtimeFilterIds, table_column_defines ->
tableColumnDefines (and any other snake_case locals) and update all references
(calls to getDataTypeByColumnInfoForComputingLayer, PushDownExecutor::build,
DM::RSOperator::build, DAGQueryInfo construction,
analyzer.buildExtraCastsAfterTS, etc.) so identifiers and their usages remain
consistent.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
dbms/src/Flash/Coprocessor/DAGExpressionAnalyzer.cppdbms/src/Flash/Coprocessor/GenSchemaAndColumn.cppdbms/src/Storages/DeltaMerge/Filter/RSOperator.cppdbms/src/Storages/tests/gtest_hidden_commit_ts_column.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- dbms/src/Flash/Coprocessor/DAGExpressionAnalyzer.cpp
- dbms/src/Storages/DeltaMerge/Filter/RSOperator.cpp
What problem does this PR solve?
Issue Number: ref #10715
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
New Features
Tests