Commit 1c67e4c

feat: eliminate GenericDatum in Avro reader for performance (#374)
Replace GenericDatum intermediate layer with direct Avro decoder access to improve manifest I/O performance.

Changes:
- Add avro_direct_decoder_internal.h with DecodeAvroToBuilder API
- Add avro_direct_decoder.cc implementing direct Avro→Arrow decoding
  - Primitive types: bool, int, long, float, double, string, binary, fixed
  - Temporal types: date, time, timestamp
  - Logical types: uuid, decimal
  - Nested types: struct, list, map
  - Union type handling for nullable fields
- Modify avro_reader.cc to use DataFileReaderBase with direct decoder
  - Replace DataFileReader<GenericDatum> with DataFileReaderBase
  - Use decoder.decodeInt(), decodeLong(), etc. directly
  - Remove GenericDatum allocation and extraction overhead
- Update CMakeLists.txt to include new decoder source

Performance improvement:
- Before: Avro binary → GenericDatum → Extract → Arrow (3 steps)
- After: Avro binary → decoder.decodeInt() → Arrow (2 steps)
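
For illustration only, the before/after paths described in the commit message can be sketched without the real libraries. Everything below is hypothetical and is not code from this commit: ByteDecoder stands in for an Avro binary decoder, Int64Column stands in for an Arrow builder, and the column is assumed to have the nullable schema ["null", "long"] with null as branch 0. The only facts relied on are from the Avro specification: int and long use zigzag variable-length encoding, and a union value is written as a branch index followed by the value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Avro binary decoder (not the avro-cpp API).
class ByteDecoder {
 public:
  explicit ByteDecoder(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

  // Avro stores int and long as zigzag-encoded variable-length integers.
  std::int64_t DecodeLong() {
    std::uint64_t raw = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      if (pos_ >= bytes_.size()) throw std::runtime_error("unexpected end of input");
      const std::uint8_t b = bytes_[pos_++];
      raw |= static_cast<std::uint64_t>(b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        // Undo zigzag: 0, 1, 2, 3, ... maps back to 0, -1, 1, -2, ...
        return static_cast<std::int64_t>(raw >> 1) ^ -static_cast<std::int64_t>(raw & 1);
      }
    }
    throw std::runtime_error("varint longer than 10 bytes");
  }

  bool AtEnd() const { return pos_ == bytes_.size(); }

 private:
  std::vector<std::uint8_t> bytes_;
  std::size_t pos_ = 0;
};

// Hypothetical stand-in for an Arrow int64 builder.
struct Int64Column {
  std::vector<std::optional<std::int64_t>> values;
  void Append(std::int64_t v) { values.emplace_back(v); }
  void AppendNull() { values.emplace_back(std::nullopt); }
};

// Hypothetical stand-in for a generic intermediate value.
struct Datum {
  bool is_null;
  std::int64_t value;
};

// "Before": decode into an intermediate Datum, then extract into the column.
void DecodeViaDatum(ByteDecoder& decoder, std::size_t rows, Int64Column& out) {
  const std::size_t start = out.values.size();
  for (std::size_t i = 0; i < rows; ++i) {
    const std::int64_t branch = decoder.DecodeLong();  // union branch index
    if (branch != 0 && branch != 1) throw std::runtime_error("bad union branch");
    const Datum datum = (branch == 0) ? Datum{true, 0} : Datum{false, decoder.DecodeLong()};
    if (datum.is_null) out.AppendNull(); else out.Append(datum.value);
  }
  assert(out.values.size() == start + rows);
}

// "After": read the branch index and append straight to the column.
void DecodeDirect(ByteDecoder& decoder, std::size_t rows, Int64Column& out) {
  const std::size_t start = out.values.size();
  for (std::size_t i = 0; i < rows; ++i) {
    const std::int64_t branch = decoder.DecodeLong();  // union branch index
    if (branch == 0) {
      out.AppendNull();
    } else if (branch == 1) {
      out.Append(decoder.DecodeLong());
    } else {
      throw std::runtime_error("bad union branch");
    }
  }
  assert(out.values.size() == start + rows);
}
```

Both functions consume the same bytes and fill the column identically; the direct version simply skips the per-value intermediate object. In this toy the Datum is a cheap struct, so the sketch shows only the shape of the change, not the allocation cost the commit message attributes to GenericDatum.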
1 parent d8ad925 commit 1c67e4c

9 files changed: +1275 -39 lines changed

src/iceberg/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -148,6 +148,7 @@ if(ICEBERG_BUILD_BUNDLE)
   set(ICEBERG_BUNDLE_SOURCES
       arrow/arrow_fs_file_io.cc
       avro/avro_data_util.cc
+      avro/avro_direct_decoder.cc
       avro/avro_reader.cc
       avro/avro_writer.cc
       avro/avro_register.cc

src/iceberg/avro/CMakeLists.txt

Lines changed: 6 additions & 0 deletions
@@ -16,3 +16,9 @@
 # under the License.
 
 iceberg_install_all_headers(iceberg/avro)
+
+# avro_scan benchmark executable
+add_executable(avro_scan avro_scan.cc)
+target_link_libraries(avro_scan PRIVATE iceberg_bundle_static)
+set_target_properties(avro_scan PROPERTIES RUNTIME_OUTPUT_DIRECTORY
+                      "${CMAKE_BINARY_DIR}/src/iceberg/avro")
