
Conversation

@davidbuniat (Member)

No description provided.

@claude (bot) commented Jan 30, 2026

Claude finished @davidbuniat's task in 58s.


PR Review Complete

I've reviewed the TPCH optimization changes focusing on the new zero-copy string handling logic. Found 6 significant issues related to memory safety and concurrency:

Issues Found:

  • Buffer overrun risks (4 locations): Missing bounds checks when accessing offsets[local_idx + 1] in string processing loops
  • Use-after-free risk: Potential lifetime issue with nd::array(samples) copy in DeeplakeStringBuffer
  • Race condition: Double-checked locking pattern may allow concurrent batch initialization

All issues have been commented inline with specific recommendations.

Note:

The zero-copy optimization is well-documented and the performance intent is clear. These issues are critical for correctness and should be addressed before merge.


const auto local_idx = range_offsets_[0] + batch_start;
const auto* offsets = offsets_cache_[0];
const auto start = offsets[local_idx];
const auto end = offsets[local_idx + count];

Potential buffer overrun: Accessing offsets[local_idx + count] without bounds checking. If local_idx + count exceeds the valid offset array size, this causes undefined behavior.

Add bounds validation: if (local_idx + count >= offset_array_size) return 0; before the access.
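A minimal sketch of that guard in context; offset_array_size is a hypothetical name for the number of entries in the offsets array and would need to be wired to the real size:

const auto local_idx = range_offsets_[0] + batch_start;
const auto* offsets = offsets_cache_[0];
// Hypothetical guard: never read past the end of the offsets array.
if (local_idx + count >= offset_array_size) {
    return 0;
}
const auto start = offsets[local_idx];
const auto end = offsets[local_idx + count];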

for (duckdb::idx_t row_in_batch = 0; row_in_batch < batch_size; ++row_in_batch) {
    const auto local_idx = start_idx + row_in_batch;
    const auto str_start = offsets[local_idx] - base_offset;
    const auto str_end = offsets[local_idx + 1] - base_offset;

Potential buffer overrun: Accessing offsets[local_idx + 1] without validating that local_idx < offset_array_size - 1. When row_in_batch reaches batch_size - 1, local_idx + 1 may exceed array bounds.

Add an assertion or bounds check before the loop: ASSERT(start_idx + batch_size < offset_array_length);
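A minimal sketch of that pre-loop check, using DuckDB's D_ASSERT debug macro in place of the generic ASSERT; offset_array_length is a hypothetical name for the number of offset entries:

// The loop reads offsets[local_idx + 1]; its largest index is
// start_idx + batch_size, which must stay inside the offsets array.
D_ASSERT(start_idx + batch_size < offset_array_length);
for (duckdb::idx_t row_in_batch = 0; row_in_batch < batch_size; ++row_in_batch) {
    const auto local_idx = start_idx + row_in_batch;
    const auto str_start = offsets[local_idx] - base_offset;
    const auto str_end = offsets[local_idx + 1] - base_offset;
    // ... build string_t values as before ...
}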

for (duckdb::idx_t row_in_batch = 0; row_in_batch < batch_size; ++row_in_batch) {
    const auto local_idx = start_idx + row_in_batch;
    const auto str_start = offsets[local_idx] - base_offset;
    const auto str_end = offsets[local_idx + 1] - base_offset;

Potential buffer overrun (duplicate): Same issue as line 582 - accessing offsets[local_idx + 1] without bounds validation when iterating through batch_size elements.

Add a bounds check or assertion before the loop, as in the sketch above.

const auto* offsets = offsets_cache_[0];
const auto start_offset = offsets[0];
const auto str_start = offsets[local_idx] - start_offset;
const auto str_end = offsets[local_idx + 1] - start_offset;

Potential buffer overrun: Accessing offsets[local_idx + 1] without bounds checking. The function requires is_single_chunk() == true but doesn't validate that local_idx + 1 is within the valid offset array range.

Add bounds validation before access to prevent undefined behavior.
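The same guard works for the single-chunk path; num_offsets is a hypothetical name for the entry count of offsets_cache_[0]:

const auto* offsets = offsets_cache_[0];
// Hypothetical guard for the single-chunk fast path.
D_ASSERT(local_idx + 1 < num_offsets);
const auto start_offset = offsets[0];
const auto str_start = offsets[local_idx] - start_offset;
const auto str_end = offsets[local_idx + 1] - start_offset;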

// This allows string_t to point directly into our buffer.
duckdb::StringVector::AddBuffer(
    output_vector,
    duckdb::make_buffer<DeeplakeStringBuffer>(nd::array(samples)));

Potential use-after-free risk: nd::array(samples) creates a copy that is moved into DeeplakeStringBuffer, while the original samples parameter is an rvalue reference. If the caller does not keep the original array alive, the buffer may be freed while DuckDB still references it.

Consider moving samples directly: duckdb::make_buffer<DeeplakeStringBuffer>(std::move(samples)) to ensure proper ownership transfer.
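A minimal sketch of that transfer, assuming the enclosing function takes samples as nd::array&&; moving instead of copying leaves the buffer as the single owner, with its lifetime managed by DuckDB:

// This allows string_t to point directly into our buffer.
// std::move hands ownership of samples to DeeplakeStringBuffer, so no
// second copy can be freed out from under DuckDB.
duckdb::StringVector::AddBuffer(
    output_vector,
    duckdb::make_buffer<DeeplakeStringBuffer>(std::move(samples)));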

if (!batch.initialized_.load(std::memory_order_acquire)) [[unlikely]] {
    std::lock_guard lock(col_data.mutex_);
    for (int64_t i = 0; i <= batch_index; ++i) {
        if (!col_data.batches[i].initialized_.load(std::memory_order_relaxed)) {

Race condition risk: The double-checked locking pattern initializes multiple batches (loop from 0 to batch_index) while only checking batch.initialized_ for the target batch. If two threads request different batches concurrently, both may enter the lock and initialize overlapping ranges, potentially causing data races on batch.owner_ and batch.holder_.

Consider checking and setting the initialized_ flag for each batch i inside the loop, as sketched below, to prevent concurrent initialization of the same batch.
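A minimal sketch of that per-batch pattern; init_batch is a hypothetical stand-in for whatever code currently populates owner_ and holder_:

if (!batch.initialized_.load(std::memory_order_acquire)) [[unlikely]] {
    std::lock_guard lock(col_data.mutex_);
    for (int64_t i = 0; i <= batch_index; ++i) {
        auto& b = col_data.batches[i];
        // Re-check under the lock so a batch that another thread already
        // initialized is never set up twice.
        if (!b.initialized_.load(std::memory_order_relaxed)) {
            init_batch(b); // hypothetical: populates b.owner_ / b.holder_
            // The release store publishes fully-initialized batch state to
            // the acquire load on the fast path above.
            b.initialized_.store(true, std::memory_order_release);
        }
    }
}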
