Commit 1c67e4c
feat: eliminate GenericDatum in Avro reader for performance (#374)
Replace GenericDatum intermediate layer with direct Avro decoder access
to improve manifest I/O performance.
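What "direct decoder access" means at the byte level: Avro writes int and long values as zigzag-encoded varints, so a reader can turn the raw bytes into column values in one pass without building an intermediate object per value. The following is a self-contained sketch, not the code in avro_direct_decoder.cc. `ByteReader`, `DecodeLong`, `Int64Column` and `DecodeLongsToColumn` are hypothetical names, and `Int64Column` only stands in for an Arrow Int64 builder. In the real reader, avro-cpp's `avro::Decoder::decodeLong()` performs the varint step.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an Avro input stream: a cursor over raw bytes.
struct ByteReader {
  const std::vector<uint8_t>& buf;
  std::size_t pos = 0;

  uint8_t Next() {
    if (pos >= buf.size()) throw std::runtime_error("unexpected end of Avro data");
    return buf[pos++];
  }
};

// Avro int/long: zigzag varint, 7 payload bits per byte, low group first,
// high bit set on every byte except the last.
int64_t DecodeLong(ByteReader& in) {
  uint64_t raw = 0;
  for (int shift = 0;; shift += 7) {
    if (shift > 63) throw std::runtime_error("varint too long");
    uint8_t b = in.Next();
    raw |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) break;
  }
  // Undo zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
  return static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
}

// Hypothetical column builder standing in for an Arrow Int64 builder.
struct Int64Column {
  std::vector<int64_t> values;
  void Append(int64_t v) { values.push_back(v); }
};

// Direct path: each decoded long is appended to the column immediately,
// with no per-value intermediate object in between.
void DecodeLongsToColumn(const std::vector<uint8_t>& bytes, std::size_t count,
                         Int64Column& out) {
  ByteReader in{bytes};
  for (std::size_t i = 0; i < count; ++i) out.Append(DecodeLong(in));
}
```

For example, the bytes `00 01 02 80 01 7F` decode to 0, -1, 1, 64 and -64.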
Changes:
- Add avro_direct_decoder_internal.h with the DecodeAvroToBuilder API
- Add avro_direct_decoder.cc implementing direct Avro→Arrow decoding for:
  - Primitive types: bool, int, long, float, double, string, binary, fixed
  - Temporal types: date, time, timestamp
  - Logical types: uuid, decimal
  - Nested types: struct, list, map
  - Union types, used for nullable fields
- Modify avro_reader.cc to use DataFileReaderBase with the direct decoder:
  - Replace DataFileReader<GenericDatum> with DataFileReaderBase
  - Call decoder.decodeInt(), decodeLong(), etc. directly
  - Remove the GenericDatum allocation and extraction overhead
- Update CMakeLists.txt to include the new decoder source
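The union and nested-type items above follow Avro's binary encoding: a union value starts with its branch index written as a long, and an array is written as blocks (a count, that many items, and a final block with count 0; a negative count means the block's size in bytes follows). The sketch below decodes the nullable field `["null", {"type": "array", "items": "long"}]` straight into an Arrow-style list layout (offsets, flat values, validity). It is an illustration under those assumptions, not the commit's implementation; every type and function name is hypothetical, and the varint helper is repeated so the block compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct ByteReader {
  const std::vector<uint8_t>& buf;
  std::size_t pos = 0;
  uint8_t Next() {
    if (pos >= buf.size()) throw std::runtime_error("unexpected end of Avro data");
    return buf[pos++];
  }
};

// Zigzag varint, used for longs, union indexes and array block counts.
int64_t DecodeLong(ByteReader& in) {
  uint64_t raw = 0;
  for (int shift = 0;; shift += 7) {
    if (shift > 63) throw std::runtime_error("varint too long");
    uint8_t b = in.Next();
    raw |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) break;
  }
  return static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
}

// Hypothetical Arrow-like list<int64> column: offsets, flat values, validity.
struct ListOfInt64Column {
  std::vector<int32_t> offsets{0};
  std::vector<int64_t> values;
  std::vector<bool> valid;

  void AppendNull() {
    int32_t end = offsets.back();
    valid.push_back(false);
    offsets.push_back(end);  // null slot has zero length
  }
};

// Decodes one value of the union ["null", array<long>].
// null_branch is the position of "null" in the union (0 for this schema);
// any other index is treated as the array branch (two-branch unions only).
void DecodeNullableLongList(ByteReader& in, int64_t null_branch,
                            ListOfInt64Column& out) {
  int64_t branch = DecodeLong(in);  // union index comes first
  if (branch == null_branch) {
    out.AppendNull();
    return;
  }
  // Read blocks until a block with count 0 terminates the array.
  for (int64_t count = DecodeLong(in); count != 0; count = DecodeLong(in)) {
    if (count < 0) {
      count = -count;
      DecodeLong(in);  // block size in bytes; only needed for skipping
    }
    for (int64_t i = 0; i < count; ++i) out.values.push_back(DecodeLong(in));
  }
  out.valid.push_back(true);
  out.offsets.push_back(static_cast<int32_t>(out.values.size()));
}
```

Decoding the rows `[1, -2]`, `null` and `[3]` (the last written as a negative-count block) yields offsets `0, 2, 2, 3`, values `1, -2, 3` and validity `true, false, true`. With avro-cpp the same traversal would likely go through `decodeUnionIndex()`, `arrayStart()` and `arrayNext()`, which handle the block counts internally.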
Performance improvement:
- Before: Avro binary → GenericDatum → Extract → Arrow (3 steps)
- After: Avro binary → decoder.decodeInt() → Arrow (2 steps)

Parent commit: d8ad925
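The Before/After pipelines listed under "Performance improvement" can be compared in miniature. This sketch uses booleans, which Avro writes as a single 0 or 1 byte, so the only difference between the two functions is whether each value is first materialized as a type-erased object. `Datum` is a hypothetical `std::variant` stand-in for `avro::GenericDatum`, and `std::vector<bool>` stands in for an Arrow boolean builder. The commit message reports no timings, and this sketch does not measure any.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Avro writes a boolean as a single byte: 0 = false, 1 = true.
bool DecodeBool(const std::vector<uint8_t>& buf, std::size_t& pos) {
  if (pos >= buf.size()) throw std::runtime_error("unexpected end of Avro data");
  uint8_t b = buf[pos++];
  if (b > 1) throw std::runtime_error("invalid Avro boolean");
  return b == 1;
}

// Hypothetical stand-in for avro::GenericDatum: a type-erased value
// materialized for every field before its contents are copied out.
using Datum = std::variant<std::monostate, bool, int64_t, double>;

// Before (3 steps): Avro binary -> Datum -> extract -> column.
std::vector<bool> DecodeViaDatum(const std::vector<uint8_t>& buf, std::size_t n) {
  std::size_t pos = 0;
  std::vector<Datum> datums;
  datums.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    datums.emplace_back(DecodeBool(buf, pos));  // 1. decode into a Datum
  }
  std::vector<bool> column;
  column.reserve(n);
  for (const Datum& d : datums) {
    bool v = std::get<bool>(d);  // 2. extract the typed value
    column.push_back(v);         // 3. append to the column
  }
  return column;
}

// After (2 steps): Avro binary -> decode -> column, no intermediate object.
std::vector<bool> DecodeDirect(const std::vector<uint8_t>& buf, std::size_t n) {
  std::size_t pos = 0;
  std::vector<bool> column;
  column.reserve(n);
  for (std::size_t i = 0; i < n; ++i) column.push_back(DecodeBool(buf, pos));
  return column;
}
```

Both functions produce the same column; the direct version skips the per-value `Datum` construction and the `std::get` extraction, which corresponds to the GenericDatum overhead the commit removes.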
9 files changed (+1275, -39 lines) in src/iceberg, including the avro and test subdirectories