Skip to content

add schema change test for variant and inverted index#61746

Open
eldenmoon wants to merge 1 commit intoapache:masterfrom
eldenmoon:sc
Open

add schema change test for variant and inverted index#61746
eldenmoon wants to merge 1 commit intoapache:masterfrom
eldenmoon:sc

Conversation

@eldenmoon
Copy link
Member

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings March 26, 2026 04:38
@eldenmoon
Copy link
Member Author

run buildall

@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new cloud-only regression test to validate schema change behavior for VARIANT columns when multiple inverted indexes exist, and verifies that inverted index files and bloom filter settings remain usable after schema changes.

Changes:

  • Introduce test_variant_multi_index_schema_change regression suite (cloud mode only).
  • Create/load a VARIANT table with multiple inverted indexes, then run schema changes (MODIFY COLUMN nullability; SET bloom_filter_columns).
  • Assert query correctness (MATCH_PHRASE and bloom-filtered predicates) and verify nested index files via BE HTTP API.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

String backendId = tablets[0].BackendId
String ip = backendIdToBackendIP.get(backendId)
String port = backendIdToBackendHttpPort.get(backendId)
check_nested_index_file(ip, port, tabletId, 2, 3, "V2")
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_nested_index_file(...) hard-codes index storage format as "V2", but the table DDL does not set "inverted_index_storage_format" in PROPERTIES. This makes the assertion dependent on cluster defaults (and could flip to V3/V1), causing test flakiness. Consider setting "inverted_index_storage_format" = "V2" explicitly in the CREATE TABLE PROPERTIES (or loosening the assertion to accept the actual format returned by the API).

Suggested change
check_nested_index_file(ip, port, tabletId, 2, 3, "V2")
check_nested_index_file(ip, port, tabletId, 2, 3, null)

Copilot uses AI. Check for mistakes.
String backendId = tablets[0].BackendId
String ip = backendIdToBackendIP.get(backendId)
String port = backendIdToBackendHttpPort.get(backendId)
check_nested_index_file(ip, port, tabletId, 2, 3, "V2")
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_nested_index_file(ip, port, tabletId, 2, 3, ...) assumes there will be exactly 2 rowsets after trigger_and_wait_compaction(tableName, "cumulative"). Rowset count after compaction is environment/strategy dependent, and other regression tests compute activeRowsetCount from be_show_tablet_status rather than hard-coding it. To avoid flaky failures, consider deriving the expected rowset count dynamically (or asserting a range / invariant like "< before"), then pass that value into check_nested_index_file.

Suggested change
check_nested_index_file(ip, port, tabletId, 2, 3, "V2")
// Derive the active rowset count dynamically to avoid hard-coded assumptions
def tabletStatusList = be_show_tablet_status(tableName)
int activeRowsetCount = 0
for (status in tabletStatusList) {
if (status.TabletId == tabletId) {
// Use the rowset count reported by the backend for this tablet
activeRowsetCount = (status.RowsetCount as int)
break
}
}
assertTrue(activeRowsetCount > 0)
check_nested_index_file(ip, port, tabletId, activeRowsetCount, 3, "V2")

Copilot uses AI. Check for mistakes.
String backendId = tablets[0].BackendId
String ip = backendIdToBackendIP.get(backendId)
String port = backendIdToBackendHttpPort.get(backendId)
check_nested_index_file(ip, port, tabletId, 2, 3, "V2")
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected per-segment indices_count is set to 3, but the table defines two inverted indexes on var and the inserted JSON has two extracted paths (string, array_string). A very similar variant multi-index test asserts indices_count = 4 for this shape (2 indexes × 2 paths). Unless there is a documented reason this schema change produces only 3 indices, this expectation is likely incorrect and will fail. Consider aligning the expected indices_count with the actual number of built indices (e.g., 4) or deriving it from SHOW INDEX/returned JSON instead of hard-coding.

Suggested change
check_nested_index_file(ip, port, tabletId, 2, 3, "V2")
check_nested_index_file(ip, port, tabletId, 2, 4, "V2")

Copilot uses AI. Check for mistakes.
@eldenmoon
Copy link
Member Author

run buildall

@eldenmoon
Copy link
Member Author

run buildall

1 similar comment
@eldenmoon
Copy link
Member Author

run buildall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants