ci: run TPC-H benchmarks on a Kind Kubernetes cluster #3549

Closed
Shekharrajak wants to merge 7 commits into apache:main from Shekharrajak:feature/3537-k8s-benchmark-ci

@Shekharrajak
Contributor

Which issue does this PR close?

Adds a GitHub CI workflow that runs TPC-H benchmarks on a Kind Kubernetes cluster and checks that Comet achieves at least a 1.1x speedup over the Spark baseline.

Closes #3537

Rationale for this change

The workflow will:

  • Run the Spark baseline benchmark
  • Run the Comet benchmark
  • Validate that the speedup is at least 1.1x (a 10% improvement)

What changes are included in this PR?

The workflow is triggered by pull requests that modify:

  • native/**/*.rs
  • spark/**/*.scala
  • spark/**/*.java
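For illustration, here is a rough sketch of how those path globs decide whether a change triggers the workflow. It only approximates GitHub's filter semantics (assumed here: "**/" matches zero or more directories and "*" does not cross "/"); it is not GitHub's actual matcher, and TRIGGER_PATHS, glob_to_regex and should_trigger are hypothetical names.

```python
import re

# Trigger globs copied from the workflow's pull_request paths filter.
TRIGGER_PATHS = ["native/**/*.rs", "spark/**/*.scala", "spark/**/*.java"]


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path glob into an anchored regex.

    Approximation: "**/" matches zero or more directories, a bare "**"
    matches anything, and "*" matches any run of characters except "/".
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def should_trigger(changed_files: list[str]) -> bool:
    """Return True if any changed file matches one of the trigger globs."""
    regexes = [glob_to_regex(p) for p in TRIGGER_PATHS]
    return any(r.match(f) for r in regexes for f in changed_files)
```

Under this sketch, a PR that only touches files under docs/ would skip the benchmark job, while one editing a Rust file anywhere under native/ would run it.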

How are these changes tested?

Local Testing

# Setup cluster
./hack/k8s-benchmark-setup.sh
 
# Run benchmarks
./benchmarks/scripts/run-k8s-benchmark.sh spark q1
./benchmarks/scripts/run-k8s-benchmark.sh comet q1
 
# Compare results
python3 benchmarks/scripts/compare-results.py \
    --spark /tmp/comet-bench-results/spark_q1_result.json \
    --comet /tmp/comet-bench-results/comet_q1_result.json \
    --min-speedup 1.1
 
# Cleanup
./hack/k8s-benchmark-setup.sh --delete
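The comparison step can be sketched as below. This is a minimal, hypothetical stand-in for benchmarks/scripts/compare-results.py: the --spark, --comet and --min-speedup flags come from the command above, but the "elapsed_seconds" field is an assumed JSON schema (the real result files may differ), and speedup is taken as Spark time divided by Comet time.

```python
import argparse
import json


def load_elapsed(path: str) -> float:
    """Read the elapsed time from a result file.

    Assumption: the JSON has a top-level "elapsed_seconds" field; the real
    schema written by run-k8s-benchmark.sh may differ.
    """
    with open(path) as f:
        return float(json.load(f)["elapsed_seconds"])


def check_speedup(spark_seconds: float, comet_seconds: float, min_speedup: float):
    """Return (speedup, passed), where speedup = Spark time / Comet time."""
    if comet_seconds <= 0:
        raise ValueError("Comet time must be positive")
    speedup = spark_seconds / comet_seconds
    return speedup, speedup >= min_speedup


def main(argv=None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--spark", required=True)
    parser.add_argument("--comet", required=True)
    # 1.1 mirrors the threshold stated in this PR.
    parser.add_argument("--min-speedup", type=float, default=1.1)
    args = parser.parse_args(argv)

    speedup, passed = check_speedup(
        load_elapsed(args.spark), load_elapsed(args.comet), args.min_speedup
    )
    print(f"Speedup:  {speedup:.2f}x")
    print(f"Required: {args.min_speedup:.2f}x")
    print("PASS" if passed else "FAIL")
    # A non-zero return value, passed to sys.exit(), fails the CI step.
    return 0 if passed else 1
```

In a script, main() would be wired up with sys.exit(main()) so that a missed threshold fails the job.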

Shekharrajak force-pushed the feature/3537-k8s-benchmark-ci branch from 9f78801 to 0b44de0 on February 19, 2026 08:14
on:
  pull_request:
    paths:
      - "native/**/*.rs"
Contributor Author

Only code changes will trigger this workflow.

env:
  RUST_VERSION: stable
  K8S_VERSION: "1.32.0"
  SPARK_VERSION: "3.5"
Contributor Author

We can expand this to cover other Spark versions.


- name: Install K8s tools
  run: |
    curl -Lo ./kind "https://kind.sigs.k8s.io/dl/v0.26.0/kind-linux-amd64"
Contributor Author

This uses a stable Kind release (v0.26.0) to create the Kubernetes cluster.

print(f"Required: {min_speedup:.2f}x")
print("-" * 50)

if speedup >= min_speedup:
Contributor Author

This check ensures that we do not regress performance. We can extend it to cover a range of queries, including joins, reads, and writes.
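If the check grows to cover many queries, one option (an assumption, not something this PR implements) is to compute a per-query speedup and summarize with a geometric mean, which treats a 2x gain and a 2x loss symmetrically. The function names and the dictionary-of-seconds input format below are hypothetical.

```python
import math


def geometric_mean_speedup(spark_times: dict[str, float],
                           comet_times: dict[str, float]) -> float:
    """Geometric mean of per-query speedups (Spark time / Comet time).

    Only queries present in both runs are compared.
    """
    common = sorted(spark_times.keys() & comet_times.keys())
    if not common:
        raise ValueError("no queries in common")
    log_sum = sum(math.log(spark_times[q] / comet_times[q]) for q in common)
    return math.exp(log_sum / len(common))


def regressions(spark_times: dict[str, float],
                comet_times: dict[str, float],
                threshold: float = 1.0) -> list[str]:
    """Queries whose speedup falls below `threshold`, sorted by name."""
    common = sorted(spark_times.keys() & comet_times.keys())
    return [q for q in common if spark_times[q] / comet_times[q] < threshold]
```

Reporting the individual regressions alongside the aggregate keeps a single slow query from hiding behind a good average.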

# See the License for the specific language governing permissions and
# limitations under the License.

FROM apache/spark:3.5.8 AS builder
Contributor Author

We will extend this to support multiple Spark versions.


---
apiVersion: v1
kind: ServiceAccount
Contributor Author

A service account and RBAC rules are needed so that the driver and executors can access the Kubernetes API.

helm repo add spark-operator https://kubeflow.github.io/spark-operator 2>/dev/null || true
helm repo update

if helm list -n spark-operator 2>/dev/null | grep -q spark-operator; then
Contributor Author

We are using the kubeflow/spark-operator.

Contributor Author

We could extend this to apache-spark-operator as well, but under the hood both run Spark components in a similar way.


kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
Contributor Author

Worker node configuration for the Kind Kubernetes cluster.
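For reference, a Kind config with one control-plane node and several workers has the shape rendered below. This is a sketch only; the worker count is a hypothetical parameter, since the PR's actual node list is not shown here.

```python
def kind_config(num_workers: int = 2) -> str:
    """Render a Kind cluster config: one control-plane node plus N workers."""
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    lines = [
        "kind: Cluster",
        "apiVersion: kind.x-k8s.io/v1alpha4",
        "nodes:",
        "- role: control-plane",
    ]
    # One entry per worker node; Spark executors would be scheduled on these.
    lines += ["- role: worker"] * num_workers
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(kind_config(2), end="")
```

The output could be saved to a file and passed to kind create cluster --config <file>.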

@Shekharrajak
Contributor Author

@andygrove, can I get access to trigger the workflow? This would help me validate changes more quickly.

Shekharrajak force-pushed the feature/3537-k8s-benchmark-ci branch 2 times, most recently from 945c8c4 to 348ea4a on February 19, 2026 12:15

on:
  pull_request:
    # paths:
Contributor Author

I commented this out so that the CI check gets triggered.


COPY --from=builder /comet/spark/target/comet-spark-spark${SPARK_VERSION}_${SCALA_VERSION}-*.jar $SPARK_HOME/jars/
ARG COMET_JAR
COPY ${COMET_JAR} $SPARK_HOME/jars/
Contributor Author

Building from source in the Dockerfile takes a long time, so the Comet JAR is passed in through the COMET_JAR build argument instead.
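A rough sketch of the prebuilt-JAR approach: locate the JAR using the naming pattern from the removed COPY line, then pass it to docker build with --build-arg COMET_JAR=.... The helper names, image tag and Dockerfile path are hypothetical; only the JAR naming pattern and the COMET_JAR argument come from the diff. Because Docker's COPY can only read files inside the build context, the JAR path is made relative to that context.

```python
import glob
import os


def find_comet_jar(target_dir: str, spark_version: str, scala_version: str) -> str:
    """Locate the prebuilt Comet JAR using the Dockerfile's naming pattern."""
    pattern = os.path.join(
        target_dir, f"comet-spark-spark{spark_version}_{scala_version}-*.jar"
    )
    # Skip auxiliary artifacts that would also match the glob.
    matches = sorted(
        p for p in glob.glob(pattern)
        if not p.endswith(("-sources.jar", "-tests.jar", "-javadoc.jar"))
    )
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one JAR matching {pattern}, found {len(matches)}"
        )
    return matches[0]


def docker_build_cmd(jar_path: str, context: str, tag: str, dockerfile: str) -> list[str]:
    """Build the docker build argv, passing the JAR via the COMET_JAR build arg."""
    rel = os.path.relpath(jar_path, context)
    if rel.startswith(".."):
        raise ValueError("JAR must be inside the Docker build context")
    return [
        "docker", "build",
        "--build-arg", f"COMET_JAR={rel}",
        "-f", dockerfile,
        "-t", tag,
        context,
    ]
```

The resulting list could be run with subprocess.run(cmd, check=True) after the Maven build has produced the JAR.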

Shekharrajak force-pushed the feature/3537-k8s-benchmark-ci branch from ac4eed3 to d65fffd on February 19, 2026 17:42
@andygrove
Member

@andygrove, can I get access to trigger the workflow? This would help me validate changes more quickly.

I don't have a way to do that, I'm afraid.

@andygrove
Member

@Shekharrajak Thanks for looking at this, but I have some concerns about this approach. CI takes way too long already and adding this workflow seems like it would add even more overhead. I am not sure that GitHub runners will provide consistent performance for benchmarking, and we really need to be testing with large data sets for meaningful results.

Committers already have the ability to trigger TPC-H benchmarks @ 100GB by commenting on PRs. This is quite new and experimental, so it hasn't been documented yet. These benchmarks run on dedicated hardware.

@Shekharrajak
Contributor Author

So would it be useful to have a script that helps run the benchmarks on a local Kubernetes cluster?

@andygrove
Member

The benchmarks do run in k8s already, but using local mode rather than truly distributed. I am planning on making that change, and I also need to align this with the benchmark scripts currently in the repo, which I am working on refactoring in #3538 and #3539. I planned on supporting k8s as a future step after the docker-compose one gets merged.

@Shekharrajak
Contributor Author

Thanks for sharing. In that case, I think no more work is required for #3537 until those PRs are merged. We can close this PR for now.

mbutrovich closed this on Feb 20, 2026

Development

Successfully merging this pull request may close these issues.

Add Kubernetes support to unified benchmark runner

3 participants