[SPARK-55264] Add ExecuteOutput command to Spark Connect pipelines proto #54104

aditya-nambiar wants to merge 6 commits into apache:master
Conversation
JIRA Issue Information: SPARK-55264 (Improvement). This comment was automatically generated by GitHub Actions.
sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto
// Storage location for pipeline checkpoints and metadata.
optional string storage = 8;

google.protobuf.Any extension = 999;

It's conventional to make this repeated in case there are multiple extensions.
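A minimal sketch of what the repeated variant suggested above might look like. The message name, field names, and tag numbers here are illustrative assumptions, not the PR's actual definition:

```proto
syntax = "proto3";

import "google/protobuf/any.proto";

// Sketch only: a message carrying a repeated extension slot, per the
// reviewer's suggestion. Names and field numbers are hypothetical.
message PipelineCommandSketch {
  // Storage location for pipeline checkpoints and metadata.
  optional string storage = 8;

  // Repeated so a client can attach more than one extension payload.
  repeated google.protobuf.Any extensions = 999;
}
```

With a singular field, a second extension would silently overwrite the first; a repeated field keeps the wire format forward-compatible if multiple extensions are ever needed.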
sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto
Thanks for submitting this @aditya-nambiar – I just left a couple comments. Are you also able to fill out the description?
sryza left a comment:
LGTM! It looks like there are some irrelevant changes to python/test_support/sql/orc_partitioned/_SUCCESS?
}

// Request to execute all flows for a single output (dataset or sink) remotely.
message ExecuteOutput {
I would probably be a little more explicit and name this something like ExecuteFlowsPerOutput, but no strong opinions here; fine to leave as-is.
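For context, a hedged sketch of what such a message might contain. The field names and numbers below are assumptions for illustration only; the PR's actual definition may differ:

```proto
// Illustrative sketch only — not the PR's exact definition.
// Request to execute all flows for a single output (dataset or sink) remotely.
message ExecuteOutput {
  // Hypothetical: fully qualified name of the output (dataset or sink)
  // whose flows should be executed.
  optional string output_name = 1;

  // Hypothetical: identifier of the dataflow graph the output belongs to.
  optional string dataflow_graph_id = 2;
}
```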
}
}

// Request to execute all flows for a single output (dataset or sink) remotely.
You'll probably need to also regenerate the Python proto wrappers, which you can do with
Force-pushed from 5dfcbe9 to 6f14ef8
Force-pushed from 6f14ef8 to e0920e3
hvanhovell left a comment:

@aditya-nambiar Please generate the python files!
Change itself LGTM
What changes were proposed in this pull request?
This pull request adds a new ExecuteOutput command to the Spark Connect pipelines protobuf definition. The new command lets clients directly execute all flows that write to a single output (dataset or sink).
Why are the changes needed?
Required to enable standalone Python MV/ST
Does this PR introduce any user-facing change?
No
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?