feat(core): introduce ColumnarRowRef with shared batch context #120
Open
xylaaaaa wants to merge 6 commits into alibaba:main
Pull request overview
This PR introduces a performance optimization for columnar data access by replacing per-row ColumnarRow construction with a shared batch context approach. The main goal is to reduce per-row overhead in KeyValueDataFileRecordReader by avoiding repeated construction of row-level column views.
Changes:
- Introduced `ColumnarBatchContext` struct to hold batch-level shared state (struct array, array vector, memory pool)
- Added `ColumnarRowRef` class as a lightweight row view backed by shared batch context
- Refactored `KeyValueDataFileRecordReader` to use `ColumnarRowRef` instead of `ColumnarRow` for key/value materialization
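The shared-context idea in the changes above can be illustrated with a minimal, self-contained sketch. It does not reproduce the PR's code: `BatchContext`, `RowRef`, and `MakeRows` are hypothetical stand-ins for `ColumnarBatchContext` and `ColumnarRowRef`, and plain `std::vector<int64_t>` columns stand in for the struct array and array vector. The only point is that per-batch state is built once and each row view holds just a shared pointer and a row index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for ColumnarBatchContext: state that is built once
// per batch and shared by every row view of that batch.
struct BatchContext {
    std::vector<std::vector<int64_t>> int_columns;  // one vector per column
    std::size_t num_rows = 0;
};

// Hypothetical stand-in for ColumnarRowRef: a shared pointer plus a row index.
// Constructing one copies no column data and builds no per-row column views.
class RowRef {
 public:
    RowRef(std::shared_ptr<const BatchContext> ctx, std::size_t row_id)
        : ctx_(std::move(ctx)), row_id_(row_id) {
        assert(ctx_ != nullptr && row_id_ < ctx_->num_rows);
    }

    std::size_t FieldCount() const { return ctx_->int_columns.size(); }

    // Reads column `pos` at this row; `at` throws on an invalid column index.
    int64_t GetLong(std::size_t pos) const {
        return ctx_->int_columns.at(pos)[row_id_];
    }

 private:
    std::shared_ptr<const BatchContext> ctx_;
    std::size_t row_id_;
};

// Builds the context once, then hands out one cheap view per row.
// Assumes all columns have the same length.
inline std::vector<RowRef> MakeRows(std::vector<std::vector<int64_t>> columns) {
    auto ctx = std::make_shared<BatchContext>();
    ctx->num_rows = columns.empty() ? 0 : columns.front().size();
    ctx->int_columns = std::move(columns);

    std::vector<RowRef> rows;
    rows.reserve(ctx->num_rows);
    for (std::size_t i = 0; i < ctx->num_rows; ++i) {
        rows.emplace_back(ctx, i);
    }
    return rows;
}
```

In this sketch the per-row cost is one reference-count increment and one index store, regardless of how many columns the batch has.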
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/paimon/common/data/columnar/columnar_batch_context.h | New struct to hold batch-level shared state (struct array, array vector holder, and memory pool) |
| src/paimon/common/data/columnar/columnar_row_ref.h | New lightweight row view class that references shared batch context instead of maintaining per-row copies |
| src/paimon/common/data/columnar/columnar_row_ref.cpp | Implementation of ColumnarRowRef methods for accessing nested types (decimal, timestamp, row, array, map) |
| src/paimon/core/io/key_value_data_file_record_reader.h | Added forward declaration and member variables for key_ctx_ and value_ctx_, removed unused columnar_row.h include |
| src/paimon/core/io/key_value_data_file_record_reader.cpp | Updated to use ColumnarRowRef with shared contexts; added context creation and reset handling |
| src/paimon/common/data/columnar/columnar_row_test.cpp | Added basic unit test for ColumnarRowRef covering field access and RowKind operations |
| src/paimon/CMakeLists.txt | Registered columnar_row_ref.cpp in the build configuration |
lszskye
reviewed
Feb 6, 2026
lxy-9602
reviewed
Feb 6, 2026
lxy-9602
reviewed
Feb 6, 2026
lxy-9602
reviewed
Feb 6, 2026
lxy-9602
reviewed
Feb 6, 2026
Add ColumnarBatchContext and ColumnarRowRef to reduce per-row construction overhead in KeyValueDataFileRecordReader. Switch key/value row construction in KeyValueDataFileRecordReader to ColumnarRowRef and manage batch-level contexts through reader lifecycle. Add unit coverage for ColumnarRowRef basic access and RowKind behavior.
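One way to read "manage batch-level contexts through reader lifecycle" in the commit message above is sketched below. This is an assumption about the general pattern, not the PR's implementation: `SketchReader`, `Batch`, `ColumnContext`, and `KeyValueRef` are hypothetical names, and single `int64_t` columns stand in for the key and value struct arrays. The sketch requires C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical input: one key column and one value column per batch.
struct Batch {
    std::vector<int64_t> keys;
    std::vector<int64_t> values;
};

// Hypothetical batch-level context shared by all rows of one batch.
struct ColumnContext {
    std::vector<int64_t> column;
};

// A key/value pair expressed as two shared contexts plus a row index.
struct KeyValueRef {
    std::shared_ptr<const ColumnContext> key_ctx;
    std::shared_ptr<const ColumnContext> value_ctx;
    std::size_t row_id;

    int64_t Key() const { return key_ctx->column[row_id]; }
    int64_t Value() const { return value_ctx->column[row_id]; }
};

class SketchReader {
 public:
    explicit SketchReader(std::vector<Batch> batches)
        : batches_(std::move(batches)) {}

    // Returns the next row, loading a new batch when the current one is
    // drained, or std::nullopt once every batch has been consumed.
    std::optional<KeyValueRef> Next() {
        while (key_ctx_ == nullptr || row_id_ >= key_ctx_->column.size()) {
            if (!LoadNextBatch()) return std::nullopt;
        }
        return KeyValueRef{key_ctx_, value_ctx_, row_id_++};
    }

    bool HasContexts() const {
        return key_ctx_ != nullptr || value_ctx_ != nullptr;
    }

 private:
    // Drops the old contexts, then builds new ones once for the next batch.
    bool LoadNextBatch() {
        Reset();
        if (next_batch_ >= batches_.size()) return false;
        Batch& batch = batches_[next_batch_++];
        auto key = std::make_shared<ColumnContext>();
        key->column = std::move(batch.keys);
        auto value = std::make_shared<ColumnContext>();
        value->column = std::move(batch.values);
        key_ctx_ = std::move(key);
        value_ctx_ = std::move(value);
        return true;
    }

    // Reset handling: the reader releases its references and row cursor.
    void Reset() {
        key_ctx_.reset();
        value_ctx_.reset();
        row_id_ = 0;
    }

    std::vector<Batch> batches_;
    std::size_t next_batch_ = 0;
    std::size_t row_id_ = 0;
    std::shared_ptr<const ColumnContext> key_ctx_;
    std::shared_ptr<const ColumnContext> value_ctx_;
};
```

Because each `KeyValueRef` in this sketch holds its own `std::shared_ptr`, rows handed out from an earlier batch stay readable after the reader resets its contexts. Whether the PR gives its row views the same ownership semantics is not stated in the text above.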
48b8cbb to
50fd013
lxy-9602
reviewed
Feb 20, 2026
Background
`KeyValueDataFileRecordReader` constructs `ColumnarRow` objects per row, which repeatedly rebuilds row-level column views and introduces avoidable per-row overhead.
Changes
- Add `ColumnarBatchContext` to hold batch-level shared state (`StructArray`, `ArrayVector`, `MemoryPool`).
- Add `ColumnarRowRef` as a row view implementation backed by shared batch context.
- Switch `KeyValueDataFileRecordReader` from `ColumnarRow` to `ColumnarRowRef` for key/value row materialization.
- Add `key_ctx_`/`value_ctx_` lifecycle management and reset handling in reader state.
- Add unit coverage for `ColumnarRowRef` basic field access and `RowKind` behavior.
- Register `columnar_row_ref.cpp` in `src/paimon/CMakeLists.txt`.
Tests
- `cmake --build build --target paimon-common-test paimon-core-test -j8`
- `./build/relwithdebinfo/paimon-common-test --gtest_filter="ColumnarRowTest.*:ColumnarRowRefTest.*"`
- `./build/relwithdebinfo/paimon-core-test --gtest_filter="KeyValueDataFileRecordReaderTest.*"`
- `./build/relwithdebinfo/paimon-core-test --gtest_filter="*KeyValueProjectionReaderTest*"`
All tests passed locally.
Risk / Compatibility