release-26.2: sql: fix partial index data loss / phantom rows during update #166325
Open
fqazi wants to merge 1 commit into cockroachdb:release-26.2
This commit fixes a bug on tables with multiple column families where a concurrent update that does not overlap with a partial index's column family could cause the partial index to write a NULL instead of the actual data, or incorrectly add phantom rows to a temporary index during a schema change backfill.

This bug was previously masked on default (single-column-family) tables because an update to any column causes the optimizer to conservatively fetch all columns in that family. However, with multiple column families, two normalization rules in the optimizer caused issues:

1. PruneMutationFetchCols: If an update does not change any column associated with an index, the optimizer avoids fetching those columns. This causes the execution layer to see NULLs for the unfetched columns.
2. SimplifyPartialIndexProjections: If an update does not change any column associated with a partial index, the optimizer simplifies the partial index predicate evaluation to FALSE. This causes the execution layer to skip writes to the index.

During a schema change backfill, the execution layer's updater must unconditionally write complete index entries to temporary (mutating) indexes for any concurrent update, even if the index's columns are unchanged. This ensures the backfill merger has a complete snapshot to correctly reconcile the final index. If columns are pruned (Rule 1) or writes are simplified away (Rule 2), the temporary index receives incomplete entries (NULLs) or misses the update entirely. Furthermore, missing columns can lead to phantom rows if the partial index predicate evaluates to TRUE when given a NULL value (e.g., WHERE val IS NULL).

This change ensures the optimizer always fetches the required columns and avoids simplifying predicate evaluation if the index is a mutation index, correctly propagating the full row state to the execution layer.

Fixes: cockroachdb#166122

Release note (bug fix): Fixed a bug where concurrent updates to a table using multiple column families during a partial index creation could result in data loss, incorrect NULL values, or validation failures in the resulting index.
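To make the two failure modes concrete, here is a minimal toy model in Python. It is only a sketch of the behavior described in the commit message, not CockroachDB's optimizer or execution code: the function names (`untouched`, `fetch_for_index`, `eval_predicate`), the `fixed` flag, and the sample row are all hypothetical, and SQL NULL is represented by `None`.

```python
def untouched(index_cols, changed_cols):
    """True when the UPDATE changes none of the index's columns."""
    return not (set(index_cols) & set(changed_cols))


def fetch_for_index(row, index_cols, changed_cols, is_mutation_index, fixed):
    """Rule 1 analogue: pruned (unfetched) columns are seen as None (NULL)."""
    pruned = untouched(index_cols, changed_cols) and not (fixed and is_mutation_index)
    return {col: (None if pruned else row[col]) for col in index_cols}


def eval_predicate(predicate, visible, index_cols, changed_cols, is_mutation_index, fixed):
    """Rule 2 analogue: the predicate is folded to False for untouched indexes."""
    if untouched(index_cols, changed_cols) and not (fixed and is_mutation_index):
        return False
    return predicate(visible)


def val_is_null(r):
    """Toy predicate for: WHERE val IS NULL."""
    return r["val"] is None


def val_is_positive(r):
    """Toy predicate for: WHERE val > 0 (NULL does not satisfy it)."""
    return r["val"] is not None and r["val"] > 0


# Hypothetical stored row: "val" and "other" live in different column families.
row = {"id": 1, "val": 7, "other": "a"}
changed = ["other"]    # the concurrent UPDATE only touches "other"
index_cols = ["val"]   # the partial index being backfilled covers "val"

# Rule 1 before the fix: "val" is not fetched, so the executor sees NULL and
# a "val IS NULL" predicate wrongly accepts the row (a phantom row).
seen_before = fetch_for_index(row, index_cols, changed, True, fixed=False)
phantom_before = val_is_null(seen_before)

# With the fix, a mutation index always gets the real column values.
seen_after = fetch_for_index(row, index_cols, changed, True, fixed=True)
phantom_after = val_is_null(seen_after)

# Rule 2 before the fix: the predicate is folded to False, so the write to
# the temporary index is skipped even though the row satisfies "val > 0".
skipped_before = not eval_predicate(
    val_is_positive, seen_after, index_cols, changed, True, fixed=False)
skipped_after = not eval_predicate(
    val_is_positive, seen_after, index_cols, changed, True, fixed=True)

print(seen_before, phantom_before)
print(seen_after, phantom_after)
print(skipped_before, skipped_after)
```

In this model the guard only disables pruning and predicate folding when `is_mutation_index` is true, so ordinary (non-mutating) indexes keep both optimizations, which matches the scope of the change as described above.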
2cbc49a to 923c26a
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.
michae2 approved these changes on Mar 20, 2026
@michae2 reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on mw5h and yuzefovich).
Backport 1/1 commits from #166123 on behalf of @fqazi.
Release justification: addresses a bug that can lead to partial indexes that are corrupt or fail to construct in the face of concurrent updates.