Skip to content

Comments

feat(core): introduce ColumnarRowRef with shared batch context#120

Open
xylaaaaa wants to merge 6 commits intoalibaba:mainfrom
xylaaaaa:feature/columnar-row-ref
Open

feat(core): introduce ColumnarRowRef with shared batch context#120
xylaaaaa wants to merge 6 commits intoalibaba:mainfrom
xylaaaaa:feature/columnar-row-ref

Conversation

@xylaaaaa
Copy link
Contributor

@xylaaaaa xylaaaaa commented Feb 6, 2026

Background

KeyValueDataFileRecordReader constructs ColumnarRow objects per row, which repeatedly rebuilds row-level column views and introduces avoidable per-row overhead.

Changes

  • Added ColumnarBatchContext to hold batch-level shared state (StructArray, ArrayVector, MemoryPool).
  • Added ColumnarRowRef as a row view implementation backed by shared batch context.
  • Switched KeyValueDataFileRecordReader from ColumnarRow to ColumnarRowRef for key/value row materialization.
  • Added key_ctx_ / value_ctx_ lifecycle management and reset handling in reader state.
  • Added unit test coverage for ColumnarRowRef basic field access and RowKind behavior.
  • Registered columnar_row_ref.cpp in src/paimon/CMakeLists.txt.

Tests

  • cmake --build build --target paimon-common-test paimon-core-test -j8
  • ./build/relwithdebinfo/paimon-common-test --gtest_filter="ColumnarRowTest.*:ColumnarRowRefTest.*"
  • ./build/relwithdebinfo/paimon-core-test --gtest_filter="KeyValueDataFileRecordReaderTest.*"
  • ./build/relwithdebinfo/paimon-core-test --gtest_filter="*KeyValueProjectionReaderTest*"

All tests passed locally.

Risk / Compatibility

  • No external interface changes.
  • Main risk is row-view lifecycle/thread-safety in projection paths; covered by the related reader/projection tests above.

Copilot AI review requested due to automatic review settings February 6, 2026 06:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a performance optimization for columnar data access by replacing per-row ColumnarRow construction with a shared batch context approach. The main goal is to reduce per-row overhead in KeyValueDataFileRecordReader by avoiding repeated construction of row-level column views.

Changes:

  • Introduced ColumnarBatchContext struct to hold batch-level shared state (struct array, array vector, memory pool)
  • Added ColumnarRowRef class as a lightweight row view backed by shared batch context
  • Refactored KeyValueDataFileRecordReader to use ColumnarRowRef instead of ColumnarRow for key/value materialization

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/paimon/common/data/columnar/columnar_batch_context.h New struct to hold batch-level shared state (struct array, array vector holder, and memory pool)
src/paimon/common/data/columnar/columnar_row_ref.h New lightweight row view class that references shared batch context instead of maintaining per-row copies
src/paimon/common/data/columnar/columnar_row_ref.cpp Implementation of ColumnarRowRef methods for accessing nested types (decimal, timestamp, row, array, map)
src/paimon/core/io/key_value_data_file_record_reader.h Added forward declaration and member variables for key_ctx_ and value_ctx_, removed unused columnar_row.h include
src/paimon/core/io/key_value_data_file_record_reader.cpp Updated to use ColumnarRowRef with shared contexts; added context creation and reset handling
src/paimon/common/data/columnar/columnar_row_test.cpp Added basic unit test for ColumnarRowRef covering field access and RowKind operations
src/paimon/CMakeLists.txt Registered columnar_row_ref.cpp in the build configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Add ColumnarBatchContext and ColumnarRowRef to reduce per-row construction overhead in KeyValueDataFileRecordReader.

Switch key/value row construction in KeyValueDataFileRecordReader to ColumnarRowRef and manage batch-level contexts through reader lifecycle.

Add unit coverage for ColumnarRowRef basic access and RowKind behavior.
@xylaaaaa xylaaaaa force-pushed the feature/columnar-row-ref branch from 48b8cbb to 50fd013 Compare February 13, 2026 06:08
Copy link
Collaborator

@lxy-9602 lxy-9602 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants