
perf: Add ReflectionCache for Iceberg serialization optimization [iceberg]#3558

Open
Shekharrajak wants to merge 2 commits into apache:main from
Shekharrajak:feature/iceberg-serialization-optimizations-3456

@Shekharrajak (Contributor)

Which issue does this PR close?

Closes #3456.

Rationale for this change

PR #3298 added reflection caching optimizations for Iceberg serialization, but these were lost during subsequent refactoring in #3349 and #3443. The current code performs redundant Class.forName() and getMethod() calls for every task (tens of thousands of times for large tables), causing significant serialization overhead.
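The following minimal Java sketch makes the per-task cost concrete. It contrasts a lookup performed inside the task loop with one hoisted out of it. It is not the Comet code, which is Scala (the PR adds a case class). Here `java.lang.String` and its `length` method stand in for the Iceberg classes and accessors that the real serialization path resolves. `LookupHoisting`, `resolve`, and `call` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: java.lang.String and "length" stand in for the
// Iceberg classes and accessors resolved during serialization.
final class LookupHoisting {
    // Number of Class.forName() + getMethod() pairs performed.
    static int lookups = 0;

    static Method resolve(String className, String methodName) {
        lookups++;
        try {
            return Class.forName(className).getMethod(methodName);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot resolve " + className + "." + methodName, e);
        }
    }

    static Object call(Method m, Object target) {
        try {
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Uncached: every task repeats the lookup.
    static List<Object> perTask(List<String> tasks) {
        List<Object> out = new ArrayList<>();
        for (String task : tasks) {
            Method m = resolve("java.lang.String", "length");
            out.add(call(m, task));
        }
        return out;
    }

    // Hoisted: one lookup, reused by every task.
    static List<Object> hoisted(List<String> tasks) {
        Method m = resolve("java.lang.String", "length");
        List<Object> out = new ArrayList<>();
        for (String task : tasks) {
            out.add(call(m, task));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tasks = Arrays.asList("a", "bb", "ccc");
        lookups = 0;
        System.out.println(perTask(tasks) + " lookups=" + lookups); // [1, 2, 3] lookups=3
        lookups = 0;
        System.out.println(hoisted(tasks) + " lookups=" + lookups); // [1, 2, 3] lookups=1
    }
}
```

With N tasks, the first pattern performs N `Class.forName()`/`getMethod()` pairs, while the second performs exactly one.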

What changes are included in this PR?

  • Add a ReflectionCache case class
  • Update serializePartitions() to create the cache once and pass it to the helper methods
  • Update extractDeleteFilesList() and serializePartitionData() to use the cached methods
  • Add a field ID mapping cache to avoid redundant per-task buildFieldIdMapping() calls
  • Add CometIcebergSerializationBenchmark to measure serialization performance
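The changes listed above follow a "resolve once, pass down" pattern, shown here as a minimal Java sketch under stated assumptions. The PR's ReflectionCache is a Scala case class holding Iceberg classes and methods. In this sketch, a final Java class caches two `java.lang.String` methods as stand-ins. `PartitionSerializer`, `serializeTask`, the `schemaId` cache key, and the positional body of `buildFieldIdMapping` are all hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartitionSerializer {
    // Counts how often the (hypothetically expensive) mapping is built.
    static int mappingBuilds = 0;

    // Hypothetical stand-in for buildFieldIdMapping(): assigns IDs by position.
    static Map<String, Integer> buildFieldIdMapping(List<String> fields) {
        mappingBuilds++;
        Map<String, Integer> ids = new HashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            ids.put(fields.get(i), i + 1);
        }
        return ids;
    }

    // Builds both caches once, then hands them to the per-task helper.
    static List<String> serializePartitions(List<String> tasks, int schemaId, List<String> fields) {
        ReflectionCache cache = ReflectionCache.create();
        Map<Integer, Map<String, Integer>> mappingCache = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String task : tasks) {
            Map<String, Integer> ids =
                mappingCache.computeIfAbsent(schemaId, id -> buildFieldIdMapping(fields));
            out.add(serializeTask(task, cache, ids));
        }
        return out;
    }

    // Per-task helper: invokes cached Methods, performs no new lookups.
    static String serializeTask(String task, ReflectionCache cache, Map<String, Integer> ids) {
        try {
            Object upper = cache.upperMethod.invoke(task);
            Object length = cache.lengthMethod.invoke(task);
            return upper + ":" + length + ":" + ids.size();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> out = serializePartitions(
            Arrays.asList("a", "bb", "ccc"), 7, Arrays.asList("id", "name"));
        System.out.println(out + " mappingBuilds=" + mappingBuilds); // [A:1:2, BB:2:2, CCC:3:2] mappingBuilds=1
    }
}

// Hypothetical Java analogue of the PR's Scala ReflectionCache case class.
// java.lang.String methods stand in for the Iceberg methods the real code caches.
final class ReflectionCache {
    final Method lengthMethod;
    final Method upperMethod;

    private ReflectionCache(Method lengthMethod, Method upperMethod) {
        this.lengthMethod = lengthMethod;
        this.upperMethod = upperMethod;
    }

    // Resolve every Class and Method exactly once.
    static ReflectionCache create() {
        try {
            Class<?> c = Class.forName("java.lang.String");
            return new ReflectionCache(c.getMethod("length"), c.getMethod("toUpperCase"));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflection setup failed", e);
        }
    }
}
```

Keying the mapping cache by schema ID lets tasks that share a schema reuse a single mapping. The key the PR actually uses may differ.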

How are these changes tested?

  • Existing Iceberg integration tests ensure correctness is preserved

Benchmark:

Metric                        Before      After       Improvement
serializePartitions()         7,235 ms    5,211 ms    ~28% less time
Class.forName() (per call)    233.5 ns    ~0 ns       cached
getMethod() (per call)        18.2 ns     ~0 ns       cached
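The per-call lookup cost can be roughly observed with a loop like the following. This is a hypothetical Java sketch, not the PR's CometIcebergSerializationBenchmark. A `System.nanoTime()` loop is only indicative, and JMH is the usual tool for trustworthy JVM micro-benchmarks. `LookupTiming` and `nsPerOp` are illustrative names.

```java
import java.lang.reflect.Method;
import java.util.function.IntToLongFunction;

// Rough, hypothetical timing loop; not the PR's CometIcebergSerializationBenchmark.
// System.nanoTime() loops are only indicative; prefer JMH for real measurements.
final class LookupTiming {
    // Repeats Class.forName() + getMethod() on every iteration.
    static long uncached(int iterations) {
        long sum = 0;
        try {
            for (int i = 0; i < iterations; i++) {
                Method m = Class.forName("java.lang.String").getMethod("length");
                sum += (Integer) m.invoke("abc");
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }

    // Resolves the Method once and reuses it.
    static long cached(int iterations) {
        long sum = 0;
        try {
            Method m = Class.forName("java.lang.String").getMethod("length");
            for (int i = 0; i < iterations; i++) {
                sum += (Integer) m.invoke("abc");
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }

    // Runs once to warm up, then times a second run and returns ns per iteration.
    static double nsPerOp(IntToLongFunction body, int iterations) {
        body.applyAsLong(iterations);
        long start = System.nanoTime();
        long result = body.applyAsLong(iterations);
        long elapsed = System.nanoTime() - start;
        // Checking the result helps prevent the JIT from discarding the loop.
        if (result != 3L * iterations) {
            throw new IllegalStateException("unexpected result " + result);
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.printf("uncached: %.1f ns/op%n", nsPerOp(LookupTiming::uncached, n));
        System.out.printf("cached:   %.1f ns/op%n", nsPerOp(LookupTiming::cached, n));
    }
}
```

Both loops include the `invoke` cost, so the gap between them approximates the lookup overhead. Absolute numbers will vary by JVM and hardware.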

Shekharrajak changed the title "Add ReflectionCache for Iceberg serialization optimization (#3456)" to "Add ReflectionCache for Iceberg serialization optimization" on Feb 20, 2026
Shekharrajak changed the title "Add ReflectionCache for Iceberg serialization optimization" to "perf: Add ReflectionCache for Iceberg serialization optimization" on Feb 20, 2026
mbutrovich changed the title "perf: Add ReflectionCache for Iceberg serialization optimization" to "perf: Add ReflectionCache for Iceberg serialization optimization [iceberg]" on Feb 20, 2026
mbutrovich self-requested a review on February 20, 2026 at 20:13

Development

Successfully merging this pull request may close these issues.

perf: Optimize CometIcebergNativeScan serialization
