Draft
42 commits
c358c44
[SPARK-55162][PYTHON] Extract transformers from ArrowStreamUDFSerializer
Yicong-Huang Jan 24, 2026
d0c2644
fix: format
Yicong-Huang Jan 24, 2026
fc23683
refactor: simplify
Yicong-Huang Jan 24, 2026
1464cfb
refactor: use function and apply maps
Yicong-Huang Jan 24, 2026
5f9627b
refactor: move to conversion.py
Yicong-Huang Jan 24, 2026
b396e10
fix: format
Yicong-Huang Jan 24, 2026
f6dbc95
fix: keep wrapper
Yicong-Huang Jan 24, 2026
535ad45
refactor: use transformer for GroupArrowUDFSerializer
Yicong-Huang Jan 24, 2026
26b0a70
refactor: use flatten_struct in ArrowStreamArrowUDTFSerializer
Yicong-Huang Jan 24, 2026
8467a59
fix: import
Yicong-Huang Jan 24, 2026
31370e2
refactor: extract converter logic out to Conversion
Yicong-Huang Jan 25, 2026
71c5e49
fix: format
Yicong-Huang Jan 25, 2026
84bdb21
refactor: simplify and add comments
Yicong-Huang Jan 25, 2026
7ce137a
fix: type annotation
Yicong-Huang Jan 25, 2026
aa33b98
Merge remote-tracking branch 'origin/SPARK-55169/refactor/arrow-udtf-…
Yicong-Huang Jan 26, 2026
6a0e897
refactor: extract to_pandas transformer
Yicong-Huang Jan 26, 2026
ba75137
Merge upstream master into SPARK-55176
Yicong-Huang Jan 28, 2026
06aeec7
refactor: simplify
Yicong-Huang Jan 28, 2026
56f6a37
fix: revert changes
Yicong-Huang Jan 28, 2026
4bdf46c
revert: bring back convert_legacy
Yicong-Huang Jan 28, 2026
eecdd13
fix: comments
Yicong-Huang Jan 28, 2026
9efc623
fix: type
Yicong-Huang Jan 28, 2026
e5c6ad1
fix: unused import
Yicong-Huang Jan 28, 2026
1c7c9ed
refactor: use spark_type from callsite
Yicong-Huang Jan 29, 2026
cbb3a90
fix: use classmethod
Yicong-Huang Jan 29, 2026
bec3f44
fix: simplify
Yicong-Huang Jan 29, 2026
cf51876
fix: import and doc
Yicong-Huang Jan 29, 2026
78f2920
refactor: extract `to_pandas` transformer
Yicong-Huang Jan 29, 2026
2ca4b27
refactor: simplify serializers and move data transformations to wrapp…
Yicong-Huang Jan 30, 2026
f359724
refactor: simplify ArrowStreamArrowUDTFSerializer using zip_batches
Yicong-Huang Jan 30, 2026
c0f084e
fix: correct pandas UDF struct handling and parameter initialization
Yicong-Huang Jan 31, 2026
bf3ba3f
refactor: remove unused ArrowStreamGroupSerializer parameters
Yicong-Huang Jan 31, 2026
9b34b0b
refactor: extract common verification functions and centralize Arrow-…
Yicong-Huang Jan 31, 2026
afa0a25
refactor: simplify UDF wrappers and data flow
Yicong-Huang Jan 31, 2026
63b6fbe
fix: remove unused imports to fix linting errors
Yicong-Huang Jan 31, 2026
79d7c41
fix: correct SQL_BATCHED_UDF and legacy Arrow UDF handling
Yicong-Huang Jan 31, 2026
ba21e3e
Merge remote-tracking branch 'upstream/master' into SPARK-55175/poc/s…
Yicong-Huang Jan 31, 2026
78728c2
wip
Yicong-Huang Jan 31, 2026
9cbf250
fix: simplify ArrowStreamArrowUDTFSerializer type coercion
Yicong-Huang Jan 31, 2026
70c0eb5
wip: remove ArrowStreamArrowUDTFSerializer
Yicong-Huang Jan 31, 2026
60ffb0c
refactor: remove TransformWithState PySpark Row serializers
Yicong-Huang Feb 1, 2026
cabf94c
refactor: merge to `enforece_schema` transform
Yicong-Huang Feb 2, 2026
5 changes: 5 additions & 0 deletions python/pyspark/errors/error-conditions.json
@@ -380,6 +380,11 @@
       "<arg_name> index out of range, got '<index>'."
     ]
   },
+  "INVALID_ARROW_BATCH_ZIP": {
+    "message": [
+      "Cannot zip Arrow batches/arrays: <reason>."
+    ]
+  },
   "INVALID_ARROW_UDTF_RETURN_TYPE": {
     "message": [
       "The return type of the arrow-optimized Python UDTF should be of type 'pandas.DataFrame', but the '<func>' method returned a value of type <return_type> with value: <value>."
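
For orientation, this new error condition would be raised from Python code that tries to zip Arrow batches or arrays of incompatible shape. A minimal sketch of such a guard follows; the helper name is hypothetical, and the keyword names accepted by PySparkValueError (errorClass/messageParameters here) are an assumption that should be checked against pyspark.errors in the targeted Spark version.

from pyspark.errors import PySparkValueError

def _check_zippable(left_arrays, right_arrays):
    # Hypothetical guard: refuse to zip Arrow array lists of mismatched length.
    if len(left_arrays) != len(right_arrays):
        raise PySparkValueError(
            errorClass="INVALID_ARROW_BATCH_ZIP",  # keyword names assumed, not verified
            messageParameters={
                "reason": "left has %d arrays, right has %d arrays"
                % (len(left_arrays), len(right_arrays))
            },
        )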
10 changes: 5 additions & 5 deletions python/pyspark/sql/connect/session.py
@@ -69,7 +69,7 @@
 from pyspark.sql.connect.readwriter import DataFrameReader
 from pyspark.sql.connect.streaming.readwriter import DataStreamReader
 from pyspark.sql.connect.streaming.query import StreamingQueryManager
-from pyspark.sql.pandas.serializers import ArrowStreamPandasSerializer
+from pyspark.sql.conversion import PandasBatchTransformer
 from pyspark.sql.pandas.types import (
     to_arrow_schema,
     to_arrow_type,
@@ -630,15 +630,15 @@ def createDataFrame(
 
             safecheck = configs["spark.sql.execution.pandas.convertToArrowArraySafely"]
 
-            ser = ArrowStreamPandasSerializer(cast(str, timezone), safecheck == "true", False)
-
             _table = pa.Table.from_batches(
                 [
-                    ser._create_batch(
+                    PandasBatchTransformer.to_arrow(
                         [
                             (c, at, st)
                             for (_, c), at, st in zip(data.items(), arrow_types, spark_types)
-                        ]
+                        ],
+                        timezone=cast(str, timezone),
+                        safecheck=safecheck == "true",
                     )
                 ]
             )
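
For context, the triples above pair each pandas column with its Arrow type and Spark type, and the new call hands them to the transformer together with the timezone and safe-cast flag instead of going through a serializer instance. A rough illustration of the underlying conversion using plain pyarrow (this is not the actual PandasBatchTransformer implementation; the helper below is illustrative only and omits timestamp/timezone handling):

import pandas as pd
import pyarrow as pa

def _columns_to_record_batch(cols, safecheck=True):
    # cols: list of (pandas.Series, pyarrow.DataType) pairs.
    # safe=True makes pyarrow raise on lossy conversions instead of silently truncating.
    arrays = [
        pa.Array.from_pandas(series, type=arrow_type, safe=safecheck)
        for series, arrow_type in cols
    ]
    names = ["_%d" % i for i in range(len(arrays))]
    return pa.RecordBatch.from_arrays(arrays, names)

# Usage mirroring the call site above: build one batch, then a Table from it.
batch = _columns_to_record_batch(
    [(pd.Series([1, 2, 3]), pa.int64()), (pd.Series(["a", "b", "c"]), pa.string())]
)
table = pa.Table.from_batches([batch])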