Describe the bug
Running TPC-DS SF=1 using queries-spark/q23.sql in datafusion-benchmarks fails after #1605 is merged. The exception is raised by the native side:
org.apache.comet.CometNativeException: range end index 18446744072743568078 out of range for slice of length 0
at comet::errors::init::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/errors.rs:151)
at <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/boxed.rs:2007)
at std::panicking::rust_panic_with_hook(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:836)
at std::panicking::begin_panic_handler::{{closure}}(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:701)
at std::sys::backtrace::__rust_end_short_backtrace(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/sys/backtrace.rs:168)
at rust_begin_unwind(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:692)
at core::panicking::panic_fmt(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panicking.rs:75)
at core::slice::index::slice_end_index_len_fail::do_panic::runtime(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panic.rs:218)
at core::slice::index::slice_end_index_len_fail::do_panic(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/intrinsics/mod.rs:3869)
at core::slice::index::slice_end_index_len_fail(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/panic.rs:223)
at <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/slice/index.rs:437)
at core::slice::index::<impl core::ops::index::Index<I> for [T]>::index(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/slice/index.rs:16)
at arrow_data::transform::variable_size::extend_offset_values(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-data-54.2.1/src/transform/variable_size.rs:38)
at arrow_data::transform::variable_size::build_extend::{{closure}}(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-data-54.2.1/src/transform/variable_size.rs:57)
at <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/boxed.rs:2007)
at arrow_data::transform::MutableArrayData::extend(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-data-54.2.1/src/transform/mod.rs:722)
at comet::execution::operators::copy::copy_array(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:233)
at comet::execution::operators::copy::copy_or_unpack_array(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:280)
at comet::execution::operators::copy::CopyStream::copy::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:196)
at core::iter::adapters::map::map_try_fold::{{closure}}(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/map.rs:95)
at core::iter::traits::iterator::Iterator::try_fold(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/traits/iterator.rs:2370)
at <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/map.rs:121)
at <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/mod.rs:191)
at core::iter::traits::iterator::Iterator::try_for_each(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/traits/iterator.rs:2431)
at <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/mod.rs:174)
at alloc::vec::Vec<T,A>::extend_desugared(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/mod.rs:3535)
at <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/spec_extend.rs:19)
at <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/spec_from_iter_nested.rs:42)
at <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/spec_from_iter.rs:34)
at <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/alloc/src/vec/mod.rs:3427)
at core::iter::traits::iterator::Iterator::collect(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/traits/iterator.rs:1971)
at <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/result.rs:1985)
at core::iter::adapters::try_process(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/adapters/mod.rs:160)
at <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/result.rs:1985)
at core::iter::traits::iterator::Iterator::collect(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/iter/traits/iterator.rs:1971)
at comet::execution::operators::copy::CopyStream::copy(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:193)
at <comet::execution::operators::copy::CopyStream as futures_core::stream::Stream>::poll_next::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:214)
at core::task::poll::Poll<T>::map(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/task/poll.rs:54)
at <comet::execution::operators::copy::CopyStream as futures_core::stream::Stream>::poll_next(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:213)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at <S as futures_core::stream::TryStream>::try_poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:206)
at <futures_util::stream::try_stream::try_fold::TryFold<St,Fut,T,F> as core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/try_stream/try_fold.rs:81)
at datafusion_physical_plan::joins::hash_join::collect_left_input::{{closure}}(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:960)
at <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/map.rs:55)
at <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/lib.rs:86)
at <core::pin::Pin<P> as core::future::future::Future>::poll(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/core/src/future/future.rs:124)
at <futures_util::future::future::shared::Shared<Fut> as core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/shared.rs:322)
at futures_util::future::future::FutureExt::poll_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/mod.rs:558)
at datafusion_physical_plan::joins::utils::OnceFut<T>::get_shared(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/utils.rs:1091)
at datafusion_physical_plan::joins::hash_join::HashJoinStream::collect_build_side(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1406)
at datafusion_physical_plan::joins::hash_join::HashJoinStream::poll_next_impl(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1381)
at <datafusion_physical_plan::joins::hash_join::HashJoinStream as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1628)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
at <datafusion_physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/projection.rs:354)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
at <datafusion_physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/projection.rs:354)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
at <comet::execution::operators::copy::CopyStream as futures_core::stream::Stream>::poll_next(/home/wherobots/datafusion-comet/native/core/src/execution/operators/copy.rs:213)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
at datafusion_physical_plan::joins::hash_join::HashJoinStream::fetch_probe_batch(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1427)
at datafusion_physical_plan::joins::hash_join::HashJoinStream::poll_next_impl(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1384)
at <datafusion_physical_plan::joins::hash_join::HashJoinStream as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/joins/hash_join.rs:1628)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
at <datafusion_physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/projection.rs:354)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
at <datafusion_physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/projection.rs:354)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
at <datafusion_physical_plan::aggregates::row_hash::GroupedHashAggregateStream as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-plan-46.0.0/src/aggregates/row_hash.rs:647)
at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
at futures_util::stream::stream::StreamExt::poll_next_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
at <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/next.rs:32)
at futures_util::future::future::FutureExt::poll_unpin(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/mod.rs:558)
at <futures_util::async_await::poll::PollOnce<F> as core::future::future::Future>::poll(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/async_await/poll.rs:37)
at comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/execution/jni_api.rs:544)
at tokio::runtime::park::CachedParkThread::block_on::{{closure}}(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/park.rs:284)
at tokio::task::coop::with_budget(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/coop/mod.rs:167)
at tokio::task::coop::budget(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/coop/mod.rs:133)
at tokio::runtime::park::CachedParkThread::block_on(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/park.rs:284)
at tokio::runtime::context::blocking::BlockingRegionGuard::block_on(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context/blocking.rs:66)
at tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/multi_thread/mod.rs:87)
at tokio::runtime::context::runtime::enter_runtime(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context/runtime.rs:65)
at tokio::runtime::scheduler::multi_thread::MultiThread::block_on(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/multi_thread/mod.rs:86)
at tokio::runtime::runtime::Runtime::block_on_inner(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/runtime.rs:370)
at tokio::runtime::runtime::Runtime::block_on(/home/wherobots/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/runtime.rs:342)
at comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/execution/jni_api.rs:544)
at comet::errors::curry::{{closure}}(/home/wherobots/datafusion-comet/native/core/src/errors.rs:485)
at std::panicking::try::do_call(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:584)
at __rust_try(__internal__:0)
at std::panicking::try(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panicking.rs:547)
at std::panic::catch_unwind(/rustc/4eb161250e340c8f48f66e2b929ef4a5bed7c181/library/std/src/panic.rs:358)
at comet::errors::try_unwrap_or_throw(/home/wherobots/datafusion-comet/native/core/src/errors.rs:499)
at Java_org_apache_comet_Native_executePlan(/home/wherobots/datafusion-comet/native/core/src/execution/jni_api.rs:498)
at <unknown>(__internal__:0)
at org.apache.comet.Native.executePlan(Native Method)
at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:137)
at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1$adapted(CometExecIterator.scala:135)
at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:135)
at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:156)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.comet.CometBatchIterator.hasNext(CometBatchIterator.java:50)
at org.apache.comet.Native.executePlan(Native Method)
at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:137)
at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1$adapted(CometExecIterator.scala:135)
at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:135)
at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:156)
at org.apache.spark.sql.comet.execution.shuffle.CometNativeShuffleWriter.write(CometNativeShuffleWriter.scala:101)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1589)
This is caused by auto-broadcasting the smaller side, which contains empty record batches. The empty StringArrays in those batches were not correctly exported through the Arrow C Data Interface. The very large value 18446744072743568078 in the error message is the first value in the offset buffer. It should be 0 when the array is empty (see the Arrow Columnar Format Spec for details), but it turns out to be garbage data.
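As a quick sanity check on that value, the sketch below reinterprets the reported number as a signed 64-bit integer. It assumes the offsets are 32-bit (as in a StringArray) and that the native side widened the offset to usize with a sign-extending cast. Both are assumptions about the code path, not something the stack trace states directly.

```python
import struct

# Value copied from the panic message.
REPORTED_END_INDEX = 18446744072743568078


def reinterpret_u64_as_i64(value: int) -> int:
    """Reinterpret the bits of an unsigned 64-bit integer as a signed one."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


signed = reinterpret_u64_as_i64(REPORTED_END_INDEX)
# True when the signed value is representable as an int32 offset.
fits_in_i32 = -(2**31) <= signed < 2**31

print(signed)        # -965983538
print(fits_in_i32)   # True
```

Under those assumptions, the reported index corresponds to a negative int32, which is what one would expect from reading an uninitialized offset slot rather than a real offset.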
There have been past efforts to fix problems with exporting empty variable-sized binary arrays: apache/arrow#40038 and the corresponding PR apache/arrow#40043, which exports a non-null offset buffer for empty arrays. However, that fix still has one problem: the newly allocated offset buffer is not properly initialized, which leaves a garbage value in the offset buffer and produces this failure.
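The difference between a zero-initialized and an uninitialized offset buffer can be illustrated with a small simulation. This is only a sketch: `slice_values` is a hypothetical stand-in for the native copy step, the offsets are assumed to be little-endian int32, and the garbage value is the one decoded from the panic message rather than real uninitialized memory.

```python
import struct


def slice_values(offsets_buf: bytes, values: bytes, start: int, length: int) -> bytes:
    """Return the value bytes covered by `length` elements starting at `start`.

    Hypothetical stand-in for the native copy: reads two int32 offsets and
    slices the values buffer, failing if the offsets are out of range.
    """
    first = struct.unpack_from("<i", offsets_buf, 4 * start)[0]
    last = struct.unpack_from("<i", offsets_buf, 4 * (start + length))[0]
    if not (0 <= first <= last <= len(values)):
        raise IndexError(f"offsets {first}..{last} out of range for {len(values)} value bytes")
    return values[first:last]


empty_values = b""

# An empty array still carries one offset entry; zero-initialized, it is valid.
zeroed_offsets = bytes(4)
print(slice_values(zeroed_offsets, empty_values, 0, 0))  # b''

# The same buffer holding leftover garbage makes the copy fail.
garbage_offsets = struct.pack("<i", -965983538)
try:
    slice_values(garbage_offsets, empty_values, 0, 0)
except IndexError as exc:
    print("copy failed:", exc)
```

The empty array has no value bytes, so any non-zero first offset is out of range, which matches the "slice of length 0" wording in the panic.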
This problem cannot be reproduced on recent versions of macOS, because macOS fills freed memory blocks with 0, which happens to be the valid initial value for the offset buffer and so masks the problem.
Steps to reproduce
Run the TPC-DS benchmark on Linux using https://github.com/apache/datafusion-benchmarks:
spark-submit \
--master local[8] \
--conf spark.driver.memory=3g \
--conf spark.memory.offHeap.enabled=true \
--conf spark.memory.offHeap.size=16g \
--conf spark.jars=$COMET_JAR_PR \
--conf spark.driver.extraClassPath=$COMET_JAR_PR \
--conf spark.executor.extraClassPath=$COMET_JAR_PR \
--conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
--conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
--conf spark.comet.enabled=true \
--conf spark.comet.exec.shuffle.enabled=true \
--conf spark.comet.exec.shuffle.mode=auto \
--conf spark.comet.exec.shuffle.compression.codec=lz4 \
--conf spark.comet.exec.replaceSortMergeJoin=false \
--conf spark.comet.exec.sortMergeJoinWithJoinFilter.enabled=false \
--conf spark.comet.cast.allowIncompatible=true \
--conf spark.comet.exec.shuffle.fallbackToColumnar=true \
tpcbench.py \
--benchmark tpcds \
--data $TPCDS_DATA \
--queries ../../tpcds/queries-spark \
--output tpc-results
It will fail at the second query in Q23.
Expected behavior
TPC-DS Q23 should finish successfully.
Additional context
No response