
[SPARK-55264] Add ExecuteOutput command to Spark Connect pipelines proto #54104

Open

aditya-nambiar wants to merge 6 commits into apache:master from aditya-nambiar:SPARK-55264

Conversation


@aditya-nambiar aditya-nambiar commented Feb 3, 2026

What changes were proposed in this pull request?

This pull request adds a new ExecuteOutput command to the Spark Connect pipelines protobuf definition. The new command lets clients directly execute all of the flows that write to a single output (dataset or sink).
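A rough sketch of what the new message could look like (the message name and its comment come from the diff below; the field name and number are illustrative assumptions, not taken from the actual change):

```proto
// Request to execute all flows for a single output (dataset or sink) remotely.
message ExecuteOutput {
  // Hypothetical field: fully qualified name of the target output.
  // Name and field number are illustrative only.
  optional string output_name = 1;
}
```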

Why are the changes needed?

Required to enable standalone Python MV/ST.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?


github-actions bot commented Feb 3, 2026

JIRA Issue Information

=== Improvement SPARK-55264 ===
Summary: Add Spark Connect command for directly executing flows
Assignee: None
Status: Open
Affected: ["4.1.1"]


This comment was automatically generated by GitHub Actions

// Storage location for pipeline checkpoints and metadata.
optional string storage = 8;

google.protobuf.Any extension = 999;
Contributor

It's conventional to make this repeated in case there are multiple extensions.
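Applied, the reviewer's suggestion would look something like this (a sketch: the field number 999 and the google.protobuf.Any type come from the diff; the pluralized field name is an assumption):

```proto
// `repeated` allows a message to carry more than one extension,
// which is the convention the reviewer refers to.
repeated google.protobuf.Any extensions = 999;
```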


sryza commented Feb 4, 2026

Thanks for submitting this @aditya-nambiar – I just left a couple comments. Are you also able to fill out the description?

@sryza sryza left a comment (Contributor)

LGTM! It looks like there are some irrelevant changes to python/test_support/sql/orc_partitioned/_SUCCESS?

}

// Request to execute all flows for a single output (dataset or sink) remotely.
message ExecuteOutput {
Contributor

I would probably be a little more explicit and name this something like ExecuteFlowsPerOutput, but no strong opinions here; fine to leave as-is.

}
}

// Request to execute all flows for a single output (dataset or sink) remotely.
Contributor

Suggested change (the before/after lines render identically here; the difference was likely whitespace-only and did not survive extraction):
// Request to execute all flows for a single output (dataset or sink) remotely.


sryza commented Feb 4, 2026

You'll probably need to also regenerate the Python proto wrappers, which you can do with dev/connect-gen-protos.sh
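The regeneration step sryza mentions, run from the root of an apache/spark checkout (the script path is the one named in the comment above; this is a command fragment, not a self-contained example):

```
# Regenerate the Python protobuf wrappers after editing the .proto files.
./dev/connect-gen-protos.sh
```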

@aditya-nambiar aditya-nambiar force-pushed the SPARK-55264 branch 3 times, most recently from 5dfcbe9 to 6f14ef8 on February 5, 2026 at 04:40
@hvanhovell hvanhovell left a comment (Contributor)

@aditya-nambiar Please generate the python files!

Change itself LGTM


3 participants