Commit e9c1cbc

feat: implement direct Avro encoder for performance
Implement direct Avro encoder to eliminate GenericDatum intermediate layer, matching the decoder approach for better performance.

Implementation:
- Add avro_direct_encoder_internal.h with EncodeArrowToAvro API
- Add avro_direct_encoder.cc implementing direct Arrow→Avro encoding
  - All primitive types: bool, int, long, float, double, string, binary
  - Temporal types: date, time, timestamp
  - Logical types: uuid, decimal
  - Nested types: struct, list, map (both string and non-string keys)
  - Union type handling for optional fields
- Modify avro_writer.cc to use DataFileWriterBase with direct encoder
- Add EncodeContext to reuse scratch buffers and avoid allocations

This matches Java Iceberg implementation using Encoder interface directly, avoiding intermediate object allocation overhead.
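
For readers unfamiliar with what "direct" encoding means here, the following is a minimal sketch of writing Avro binary values straight into a byte buffer, with no intermediate datum object. It is not the code from avro_direct_encoder.cc: the names Buffer, EncodeLong, EncodeString and EncodeOptionalLong are hypothetical, and the only thing taken as given is the Avro binary format (zigzag varint longs, length-prefixed strings, branch-index-prefixed unions). It also assumes the optional field uses the union ["null", "long"], with null as branch 0.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical names for illustration; not the API added by this commit.
using Buffer = std::vector<uint8_t>;

// Avro int/long: zigzag-encode the value, then emit it 7 bits at a time,
// least significant group first, setting the high bit on all but the last byte.
inline void EncodeLong(int64_t value, Buffer& out) {
  uint64_t zz = (static_cast<uint64_t>(value) << 1) ^
                static_cast<uint64_t>(value >> 63);
  while (zz >= 0x80) {
    out.push_back(static_cast<uint8_t>(zz | 0x80));
    zz >>= 7;
  }
  out.push_back(static_cast<uint8_t>(zz));
}

// Avro string: byte length as a long, followed by the UTF-8 bytes.
inline void EncodeString(std::string_view s, Buffer& out) {
  EncodeLong(static_cast<int64_t>(s.size()), out);
  out.insert(out.end(), s.begin(), s.end());
}

// Optional field, assuming the union ["null", "long"]:
// branch index as a long, then the value (nothing follows for null).
inline void EncodeOptionalLong(const std::optional<int64_t>& v, Buffer& out) {
  if (!v.has_value()) {
    EncodeLong(0, out);  // branch 0: null
    return;
  }
  EncodeLong(1, out);    // branch 1: long
  EncodeLong(*v, out);
}
```

Appending to a caller-owned buffer is what lets a scratch buffer be reused across rows, which is the role the commit message assigns to EncodeContext.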
1 parent 61a7de5 commit e9c1cbc

6 files changed (+982, -18 lines)

src/iceberg/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -153,6 +153,7 @@ if(ICEBERG_BUILD_BUNDLE)
 arrow/arrow_fs_file_io.cc
 avro/avro_data_util.cc
 avro/avro_direct_decoder.cc
+avro/avro_direct_encoder.cc
 avro/avro_reader.cc
 avro/avro_writer.cc
 avro/avro_register.cc
