Server crash (SIGABRT) in Parquet V3 Reader applyPrewhere - Only Debug Builds #1603

@CarlosFelipeOR

I checked the Altinity Stable Builds lifecycle table, and the Altinity Stable Build version I'm using is still supported.

Type of problem

Bug report - something's broken

Describe the situation

The ClickHouse server crashes with SIGABRT (signal 6) on debug builds when the Parquet V3 Reader processes a query that uses PREWHERE on Iceberg tables. The crash occurs in addDummyColumnWithRowCount(), called from Reader::applyPrewhere().

This issue:


How to reproduce the behavior

Environment

  • Branch: mkmkme/antalya-26.1/iceberg-fix-prewhere
  • Commit: 7bce15324748014412db5c93e7775198e528c993
  • Build type: Debug (amd_debug)

Reproduction

The crash is triggered by any stateless test that reads Parquet files with PREWHERE on a debug build containing the backports of ClickHouse#95476, ClickHouse#98360, and ClickHouse#100361. It reproduces deterministically across all 3 amd_debug parallel jobs.


Stack trace

Error:
Received signal 6
---

Stack trace:
pthread_kill @ 0x00000000000969fd
raise @ 0x0000000000042476
__lgamma_r_finite @ 0x00000000000287f3
__yn_finite @ 0x000000000002871b
? @ 0x0000000000039e96
src/Core/Block.cpp:1027: DB::addDummyColumnWithRowCount(DB::Block&, unsigned long) @ 0x000000001b2386fa
src/Processors/Formats/Impl/Parquet/Reader.cpp:2067: DB::Parquet::Reader::applyPrewhere(DB::Parquet::Reader::RowSubgroup&, DB::Parquet::Reader::RowGroup const&) @ 0x0000000020962173
src/Processors/Formats/Impl/Parquet/ReadManager.cpp:350: DB::Parquet::ReadManager::finishRowSubgroupStage(unsigned long, unsigned long, DB::Parquet::ReadStage, DB::Parquet::MemoryUsageDiff&) @ 0x0000000020943918
src/Processors/Formats/Impl/Parquet/ReadManager.cpp:808: DB::Parquet::ReadManager::runTask(DB::Parquet::ReadManager::Task, bool, DB::Parquet::MemoryUsageDiff&) @ 0x000000002094693c
src/Processors/Formats/Impl/Parquet/ReadManager.cpp:710: DB::Parquet::ReadManager::runBatchOfTasks(std::vector<DB::Parquet::ReadManager::Task, std::allocator<DB::Parquet::ReadManager::Task>> const&) @ 0x00000000209462b3
src/Processors/Formats/Impl/Parquet/ReadManager.cpp:607: operator()
contrib/llvm-project/libcxx/include/__type_traits/invoke.h:87: std::__invoke_result_impl<void, DB::Parquet::ReadManager::scheduleTasksIfNeeded(DB::Parquet::ReadStage)::$_0&>::type std::__invoke[abi:se210105]<DB::Parquet::ReadManager::scheduleTasksIfNeeded(DB::Parquet::ReadStage)::$_0&>(DB::Parquet::ReadManager::scheduleTasksIfNeeded(DB::Parquet::ReadStage)::$_0&)
contrib/llvm-project/libcxx/include/__type_traits/invoke.h:342: void std::__invoke_void_return_wrapper<void, true>::__call[abi:se210105]<DB::Parquet::ReadManager::scheduleTasksIfNeeded(DB::Parquet::ReadStage)::$_0&>(DB::Parquet::ReadManager::scheduleTasksIfNeeded(DB::Parquet::ReadStage)::$_0&)
contrib/llvm-project/libcxx/include/__type_traits/invoke.h:348: void std::__invoke_r[abi:se210105]<void, DB::Parquet::ReadManager::scheduleTasksIfNeeded(DB::Parquet::ReadStage)::$_0&>(DB::Parquet::ReadManager::scheduleTasksIfNeeded(DB::Parquet::ReadStage)::$_0&)
contrib/llvm-project/libcxx/include/__functional/function.h:450: ? @ 0x0000000020947cf1
contrib/llvm-project/libcxx/include/__functional/function.h:508: ?
contrib/llvm-project/libcxx/include/__functional/function.h:772: ?
src/Common/threadPoolCallbackRunner.cpp:224: DB::ThreadPoolCallbackRunnerFast::threadFunction() @ 0x0000000017ddddff
contrib/llvm-project/libcxx/include/__functional/function.h:508: ?
contrib/llvm-project/libcxx/include/__functional/function.h:772: ?
src/Common/ThreadPool.cpp:801: ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x0000000016472df9
contrib/llvm-project/libcxx/include/__type_traits/invoke.h:0: std::__invoke_result_impl<void, void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&>::type std::__invoke[abi:se210105]<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&)
contrib/llvm-project/libcxx/include/tuple:1380: decltype(auto) std::__apply_tuple_impl[abi:se210105]<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), std::tuple<ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>&, 0ul>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), std::tuple<ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>&, std::__tuple_indices<0ul>)
contrib/llvm-project/libcxx/include/tuple:1384: decltype(auto) std::apply[abi:se210105]<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), std::tuple<ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>&>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), std::tuple<ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>&)
src/Common/ThreadPool.h:312: ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'()::operator()() @ 0x00000000164791e8
contrib/llvm-project/libcxx/include/__functional/function.h:508: ?
contrib/llvm-project/libcxx/include/__functional/function.h:772: ?
src/Common/ThreadPool.cpp:811: ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x0000000016470303
contrib/llvm-project/libcxx/include/__type_traits/invoke.h:0: std::__invoke_result_impl<void, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>::type std::__invoke[abi:se210105]<void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>(void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*&&)
contrib/llvm-project/libcxx/include/__thread/thread.h:159: void std::__thread_execute[abi:se210105]<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*, 2ul>(std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>&, std::__tuple_indices<2ul>)
contrib/llvm-project/libcxx/include/__thread/thread.h:168: void* std::__thread_proxy[abi:se210105]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x0000000016476e8e
? @ 0x0000000000094ac3
? @ 0x00000000001268c0


Root cause analysis

Note: This analysis was generated with AI assistance and has not been fully verified at the code level.

The crash results from the interaction of three backported upstream PRs:

  1. "enable prewhere for iceberg" (ClickHouse/ClickHouse#95476): enabled supportsPrewhere() for Iceberg tables
  2. "Fix exception in Parquet PREWHERE when column is not in file" (ClickHouse/ClickHouse#98360): modified Reader.cpp (Parquet V3) to handle missing columns in PREWHERE
  3. "Fix exception in updateFormatPrewhereInfo when only row-level filter is set" (ClickHouse/ClickHouse#100361): fixed updateFormatPrewhereInfo() to pass row_level_filter through to the reader

Before ClickHouse#100361, the row_level_filter was silently dropped and never reached the Parquet reader, which caused issue #1595 (row policies ignored). After ClickHouse#100361, the row_level_filter is passed through to the reader, but the Parquet V3 Reader's applyPrewhere() in the antalya-26.1 codebase does not handle it correctly, and an assertion fails in addDummyColumnWithRowCount().

This crash does not appear upstream, likely because the upstream Parquet V3 Reader has diverged significantly from the antalya-26.1 version.
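
To make the suspected failure mode concrete, here is a minimal C++ sketch. It is not the ClickHouse code: ToyColumn, ToyBlock, addDummyColumnChecked, and the "dummy" column name are hypothetical stand-ins. The invariant it checks (an existing column's row count must equal the requested row count) is only an assumption about what the assertion at src/Core/Block.cpp:1027 verifies; this report has not confirmed it. The function returns false at the point where a debug-only assertion would abort.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a column: only its name and row count matter here.
struct ToyColumn
{
    std::string name;
    std::size_t rows;
};

// Hypothetical stand-in for a block of columns.
struct ToyBlock
{
    std::vector<ToyColumn> columns;
};

// Ensures the block carries a row count, adding a placeholder column if it is empty.
// Returns false where a debug-only assertion would raise SIGABRT.
bool addDummyColumnChecked(ToyBlock & block, std::size_t num_rows)
{
    if (!block.columns.empty())
    {
        // Assumed invariant: an existing column must already have num_rows rows.
        return block.columns.front().rows == num_rows;
    }

    // No columns yet: add a placeholder so the block still reports num_rows rows.
    block.columns.push_back(ToyColumn{"dummy", num_rows});
    return true;
}
```

If the real check is a standard assert, it is compiled out when NDEBUG is defined, which would be consistent with the crash appearing only on debug builds. A release build would then continue past the mismatch without aborting, so the underlying inconsistency may still be worth investigating there.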


Additional context

CI evidence

Crash frequency (from CI database)

Related
