[fix](be) Avoid signaling unbuilt shared hash table#63345
Conversation
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: When a shared hash table builder exits before process_build_block() finishes, for example after an error on the build side, the hash table variant can remain monostate while the builder is not marked terminated. The close() path still set _signaled and woke non-builder tasks, which could mask the original error with a misleading shared hash table type mismatch.
This change only signals non-builder tasks after the shared hash table has actually been built, so failure paths keep _signaled false and non-builders return EOF instead of visiting a monostate hash table.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter=SharedHashTableSignalTest.UnbuiltHashTableDoesNotSignal -j 8
- build-support/check-format.sh
- build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (failed on pre-existing hashjoin_build_sink.cpp/hashjoin_build_sink_test.cpp complexity diagnostics and jni-util.h static assertions)
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
Fixes a misleading error from shared-hash-table joins where non-builder tasks were woken up even when the builder closed before constructing the hash table, causing a confusing "Hash table type mismatch" error instead of preserving the original build-side error.
Changes:
- Add a
hash_table_built()helper that checks whether the shared hash table variant is no longermonostate. - Gate
_signaled = trueon both!_terminatedandhash_table_built(_shared_state)so non-builders aren't woken into an unbuilt hash table. - Add unit test
UnbuiltHashTableDoesNotSignalverifying_signaledstays false and non-builders receive EOF in this scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| be/src/exec/operator/hashjoin_build_sink.cpp | Gate _signaled on actual hash-table-built state via new helper. |
| be/test/exec/operator/hashjoin_build_sink_test.cpp | New test covering the unbuilt-hash-table close path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/review |
There was a problem hiding this comment.
I found one blocking correctness issue in the shared hash table failure path.
Critical checkpoint conclusions:
- Goal/test: the PR addresses the monostate shared-hash-table wakeup case and adds a targeted unit test, but the current predicate still allows signaling from failed builder paths after method_variant has been initialized.
- Scope: the change is small and focused, but the readiness predicate is not precise enough for all failure windows.
- Concurrency: the dependency wakeup path remains lock-protected by the operator mutex, but waking non-builders after an unsuccessful builder close can expose partially initialized shared state.
- Lifecycle: no new static/global lifecycle issue found.
- Config/compatibility: no config, protocol, storage format, or rolling-upgrade compatibility concern found.
- Parallel paths: this applies to the shared broadcast hash table path; I did not find another required code path change.
- Special checks: the new check needs to distinguish fully successful build completion from merely non-monostate hash method initialization.
- Tests: the added test covers the monostate-at-close case, but not failure after _hash_table_init()/before full build-side completion.
- Observability/transactions/data writes/FE-BE variables: not applicable.
- Performance: no performance regression found.
User focus: no additional user-provided review focus was supplied.
| if (!_terminated) { | ||
| // Do not wake non-builders into a monostate hash table after early termination/errors. | ||
| const auto should_signal = !_terminated && hash_table_built(_shared_state); | ||
| if (should_signal) { |
There was a problem hiding this comment.
This still signals non-builder tasks when the builder is closing after a failed build path as long as _hash_table_init() has already changed the first variant away from std::monostate. For example, in shared broadcast mode _hash_table_init() initializes variants before process_build_block()/build_asof_index() fully complete; if a later variant initialization fails, or build_asof_index() returns an error after the hash table variant is initialized, close(exec_status) will run with _terminated == false and hash_table_built(_shared_state) == true, set _signaled, and wake non-builders to copy/probe shared state from a failed builder. That can still mask the original failure with the shared-table type mismatch or expose incomplete ASOF state. Please gate signaling on successful builder completion as well, for example by checking exec_status.ok() or by setting an explicit build_completed flag only after all build-side steps succeed.
TPC-H: Total hot run time: 31476 ms |
TPC-DS: Total hot run time: 170385 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary:
For shared hash table hash joins, the builder task may leave
process_build_block()before the hash table is actually built, for example when the build side returns an error before the EOS build step completes. In that state the shared hash table variant can still bemonostate.HashJoinBuildSinkLocalState::close()previously set_signaled = truewhenever the builder was not terminated. That wakes non-builder tasks even when the shared hash table was never built, so they can reach the shared hash table copy path and report a misleadingHash table type mismatch when share hash tableinstead of preserving the original build-side error.This PR gates
_signaledon the actual shared hash table state. Non-builder tasks are signaled only when the builder is not terminated and the shared hash table variant is no longermonostate. If the builder closes after an error before the hash table is built,_signaledstays false and non-builders take the existing EOF guard instead of visiting an unbuilt hash table.Release note
None
Check List (For Author)
./run-be-ut.sh --run --filter=SharedHashTableSignalTest.UnbuiltHashTableDoesNotSignal -j 8build-support/check-format.shbuild-support/run-clang-tidy.sh --build-dir be/ut_build_ASANwas attempted; it reports pre-existing complexity diagnostics in the touched hash join files andjni-util.hstatic assertions.