
[SPARK-55264] Add ExecuteOutput command to Spark Connect pipelines proto #54104

Open

aditya-nambiar wants to merge 6 commits into apache:master from aditya-nambiar:SPARK-55264

Conversation


@aditya-nambiar aditya-nambiar commented Feb 3, 2026

What changes were proposed in this pull request?

This pull request adds a new ExecuteOutput command to the Spark Connect pipelines protobuf definition. The new command lets clients directly execute all of the flows that write to a single output (dataset or sink).
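A rough sketch of what the new message could look like (the message name and its comment come from the diff below; the field name and number are illustrative assumptions, not taken from the actual change):

```proto
// Request to execute all flows for a single output (dataset or sink) remotely.
message ExecuteOutput {
  // Hypothetical field: fully qualified name of the target output.
  // Name and field number are illustrative only.
  optional string output_name = 1;
}
```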

Why are the changes needed?

Required to enable standalone Python MV/ST.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?


github-actions bot commented Feb 3, 2026

JIRA Issue Information

=== Improvement SPARK-55264 ===
Summary: Add Spark Connect command for directly executing flows
Assignee: None
Status: Open
Affected: ["4.1.1"]


This comment was automatically generated by GitHub Actions

// Storage location for pipeline checkpoints and metadata.
optional string storage = 8;

google.protobuf.Any extension = 999;
Contributor

It's conventional to make this repeated in case there are multiple extensions.
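Applied, the reviewer's suggestion would look something like this (a sketch: the field number 999 and the google.protobuf.Any type come from the diff; the pluralized field name is an assumption):

```proto
// `repeated` allows a message to carry more than one extension,
// which is the convention the reviewer refers to.
repeated google.protobuf.Any extensions = 999;
```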


sryza commented Feb 4, 2026

Thanks for submitting this @aditya-nambiar – I just left a couple comments. Are you also able to fill out the description?

@sryza sryza left a comment (Contributor)

LGTM! It looks like there are some irrelevant changes to python/test_support/sql/orc_partitioned/_SUCCESS?

}

// Request to execute all flows for a single output (dataset or sink) remotely.
message ExecuteOutput {
Contributor

I would probably be a little more explicit and name this something like ExecuteFlowsPerOutput, but no strong opinions here; fine to leave as-is.

}
}

// Request to execute all flows for a single output (dataset or sink) remotely.
Contributor

Suggested change (the before/after lines render identically here; the difference was likely whitespace-only and did not survive extraction):
// Request to execute all flows for a single output (dataset or sink) remotely.


sryza commented Feb 4, 2026

You'll probably need to also regenerate the Python proto wrappers, which you can do with dev/connect-gen-protos.sh
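The regeneration step sryza mentions, run from the root of an apache/spark checkout (the script path is the one named in the comment above; this is a command fragment, not a self-contained example):

```
# Regenerate the Python protobuf wrappers after editing the .proto files.
./dev/connect-gen-protos.sh
```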

@aditya-nambiar aditya-nambiar force-pushed the SPARK-55264 branch 3 times, most recently from 5dfcbe9 to 6f14ef8 on February 5, 2026 at 04:40
@hvanhovell hvanhovell left a comment (Contributor)

@aditya-nambiar Please generate the python files!

Change itself LGTM


3 participants