Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Jan 2, 2026

Refactored the sort-merge join implementation to improve code organization by extracting all filter-related logic into a dedicated filter.rs module.

Changes:

  • Created new filter.rs module (~576 lines) containing:

    • Filter metadata tracking (FilterMetadata struct)
    • Deferred filtering decision logic (needs_deferred_filtering)
    • Filter mask correction for different join types (get_corrected_filter_mask)
    • Filter application with null-joined row handling (filter_record_batch_by_join_type)
    • Helper functions for filter column extraction and batch filtering
  • Updated stream.rs:

    • Removed ~450 lines of filter-specific code
    • Now delegates to filter module functions
    • Simplified main join logic to focus on stream processing
  • Updated tests.rs:

    • Updated imports to use new filter module
    • Changed test code to use FilterMetadata struct
    • All 47 sort-merge join tests passing

🤖 Generated with Claude Code

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jan 2, 2026
@viirya viirya force-pushed the refactor-smj-filter-module branch from 87bff13 to 4d6f0dd Compare January 2, 2026 22:09
Refactored the sort-merge join implementation to improve code organization
by extracting all filter-related logic into a dedicated filter.rs module.

Changes:
- Created new filter.rs module (~576 lines) containing:
  - Filter metadata tracking (FilterMetadata struct)
  - Deferred filtering decision logic (needs_deferred_filtering)
  - Filter mask correction for different join types (get_corrected_filter_mask)
  - Filter application with null-joined row handling (filter_record_batch_by_join_type)
  - Helper functions for filter column extraction and batch filtering

- Updated stream.rs:
  - Removed ~450 lines of filter-specific code
  - Now delegates to filter module functions
  - Simplified main join logic to focus on stream processing

- Updated tests.rs:
  - Updated imports to use new filter module
  - Changed test code to use FilterMetadata struct
  - All 47 sort-merge join tests passing

The refactoring maintains all existing functionality with no behavior changes.
Null-joined batch creation for outer joins with different column counts is
handled correctly by:
- Properly extracting and replacing columns based on join type and batch organization
- Using RecordBatchOptions to bypass strict nullable field validation in outer joins

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@viirya viirya force-pushed the refactor-smj-filter-module branch from 4d6f0dd to 348f56c Compare January 2, 2026 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant