Search before asking
Version
doris: 4.0.3
adbc client: github.com/apache/arrow-go/v18
What's Wrong?
Description:
When fetching large JSON string columns using Arrow Flight SQL, the returned RecordBatch is internally inconsistent. Specifically, the offsets buffer references byte positions up to 1251873, but the data buffer is reported as having size 0.
Evidence:
- When accessing the last row of a batch:
  - Total Length: 1283
  - Offsets Buffer Length: 1284
  - Last 10 Offsets: [... 1251873, 0] (non-monotonic: the final offset resets to 0, and the full dump under "Anything Else?" shows that every second entry is 0)
  - Data Buffer Total Size: 0 bytes
This suggests that during serialization of the RecordBatch in the Doris BE, the data buffer was either truncated or released prematurely, and the offsets were not updated to match.
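The inconsistency above can be detected on the client before any value is read. The following is a minimal sketch, assuming 32-bit offsets as in a regular Arrow string array; `validateOffsets` is a hypothetical helper written for this report, not part of arrow-go, and it takes the raw offsets and data buffer size as plain Go values.

```go
package main

import "fmt"

// validateOffsets is a hypothetical helper (not an arrow-go API). It checks the
// invariants a string array with 32-bit offsets is expected to satisfy:
// length+1 offsets, no decreasing offsets, and no offset beyond the data buffer.
func validateOffsets(offsets []int32, length, dataLen int) error {
	if len(offsets) != length+1 {
		return fmt.Errorf("expected %d offsets, got %d", length+1, len(offsets))
	}
	for i, off := range offsets {
		if off < 0 || int(off) > dataLen {
			return fmt.Errorf("offset[%d]=%d is outside a data buffer of %d bytes", i, off, dataLen)
		}
		if i > 0 && off < offsets[i-1] {
			return fmt.Errorf("offset[%d]=%d is smaller than offset[%d]=%d", i, off, i-1, offsets[i-1])
		}
	}
	return nil
}

func main() {
	// Well-formed example: the values "ab", "" and "cde" in a 5-byte data buffer.
	fmt.Println(validateOffsets([]int32{0, 2, 2, 5}, 3, 5))

	// Tail of the offsets from this report, checked against the reported 0-byte buffer.
	tail := []int32{1244061, 0, 1246014, 0, 1247967, 0, 1249920, 0, 1251873, 0}
	fmt.Println(validateOffsets(tail, len(tail)-1, 0))
}
```

Running a check like this on each batch before reading values turns the panic into a reportable error that names the first offending offset.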
Impact:
Client-side libraries such as apache/arrow-go panic with "slice bounds out of range" when accessing the string value, because they slice a 0-length data buffer using offsets that lie outside it.
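The failure mode can be reproduced without a Doris cluster. The sketch below is a simplified stand-in for how a reader slices the data buffer with two adjacent offsets; `valueAt` and `safeValueAt` are hypothetical names, and this is not arrow-go's actual implementation. It uses the start and end offsets reported for row 1282.

```go
package main

import "fmt"

// valueAt mimics slicing the data buffer with a pair of adjacent offsets.
// Simplified illustration only, not arrow-go's code.
func valueAt(data []byte, offsets []int32, i int) string {
	return string(data[offsets[i]:offsets[i+1]])
}

// safeValueAt wraps valueAt and converts a runtime panic into an error,
// so the caller can log the offending row instead of crashing.
func safeValueAt(data []byte, offsets []int32, i int) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("row %d: %v", i, r)
		}
	}()
	return valueAt(data, offsets, i), nil
}

func main() {
	// Reported state for row 1282: StartOffset 1251873, EndOffset 0, empty data buffer.
	_, err := safeValueAt(nil, []int32{1251873, 0}, 0)
	fmt.Println(err)
}
```

Recovering from the panic is only a diagnostic aid on the client; the batch itself is still unusable.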
What You Expected?
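Every RecordBatch should carry a data buffer whose size matches its offsets, so that string values can be read without a panic. Please investigate the ArrowFlightStream serialization logic in the BE, particularly how StringArray offsets are calculated and how the data buffer lifecycle is managed during stream fragmentation.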
How to Reproduce?
- Execute a query via Arrow Flight SQL that returns a large number of rows (e.g., > 2000 rows) and includes a column of JSON string data (average length ~1KB per row).
- Use reader.Next() to iterate through the RecordBatches.
- Access the last few rows of an affected RecordBatch using array.String.Value(i); this is where the panic occurs.
Anything Else?
--- [DEBUG] Arrow Array Memory State Dump ---
Array Type: *array.String
Total Length: 1283
Null Count: 0
Offsets Buffer Length: 1284
Last 10 Offsets: [1244061 0 1246014 0 1247967 0 1249920 0 1251873 0]
Data Buffer Total Size (Bytes): 0
Target Row: 1282 | StartOffset: 1251873, EndOffset: 0
Are you willing to submit PR?
Code of Conduct