[SPARK-55336][PYTHON] Let createDF use create_batch logic for decoupling #54111

Closed
Yicong-Huang wants to merge 2 commits into apache:master from
Yicong-Huang:SPARK-55336/refactor/factor-out-create-batch-logic

@Yicong-Huang
Contributor

What changes were proposed in this pull request?

This PR factors the pandas-to-Arrow batch conversion logic out of ArrowStreamPandasSerializer into two standalone functions, decoupling it from the serializer class:

  • create_arrow_array_from_pandas() - converts a pandas Series to Arrow Array
  • create_arrow_batch_from_pandas() - converts a list of (series, spark_type) tuples to Arrow RecordBatch

Both _create_from_pandas_with_arrow (classic Spark) and createDataFrame (Spark Connect) now use these standalone functions directly with ArrowStreamSerializer, instead of depending on ArrowStreamPandasSerializer.

Why are the changes needed?

For better decoupling. Previously, createDataFrame had to instantiate ArrowStreamPandasSerializer just to call its _create_batch method.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions

github-actions bot commented Feb 3, 2026

JIRA Issue Information

=== Sub-task SPARK-55336 ===
Summary: Factor out ArrowStreamPandasSerializer._create_batch logic for createDataFrame
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

Contributor

@zhengruifeng zhengruifeng left a comment

LGTM pending CI

@HyukjinKwon
Member

Merged to master.
