Skip to content

Conversation

@dsmiley
Copy link
Contributor

@dsmiley dsmiley commented Sep 26, 2025

I spent a bunch of time with a GitHub Copilot Agent to develop a new BATS test to test a rolling upgrade of SolrCloud.
It ignores the current project/build since it runs Docker containers of Solr.

I think this test should not be run by default; it's a slow test. Not sure yet on the best way to segment this test and maybe other slow ones. At least another directory.

It could be improved to build the local project in Docker and use that image as the upgrade destination target.

Separately, I want to use this as a basis to check for Overseer disablement settings, but that's not present here. Maybe I'll add it in this PR.

Disclaimer: Bash is not my comfort zone, but AI churned this out. IntelliJ Junie was also super helpful.

Copilot AI and others added 21 commits September 23, 2025 04:09
…eper

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…r checking

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…, clean up comments

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…ssert, eliminate tedious patterns

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…rade

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…, reverse upgrade order, skip intermediate verification

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…iners

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…er ownership

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…ted working

Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
@janhoy
Copy link
Contributor

janhoy commented Oct 16, 2025

Idea: Soon, solr will have Testcontainers as a dependency (#3670), and Testcontainers is ideal for orchestrating and managing containers like this. I suppose you can write a JUnit test that sets up a docker network, then adds 3 Solr containers in cloud mode, then write a loop where you swap one after the other with a newer version and inspect logs or what you need.

It makes sense for such a test to be disabled by default or included in nightly.

Here's Claude's draft stab at a rolling test: https://claude.ai/public/artifacts/c4d09d7b-df54-419a-b660-169e22191a26

@dsmiley
Copy link
Contributor Author

dsmiley commented Oct 16, 2025

Idea: Soon, solr will have Testcontainers as a dependency (#3670), and Testcontainers is ideal for orchestrating and managing containers like this.

Thank you for #3670 but you say having a dependency (on Testcontainers) as if we've achieved something as a project. It's a few lines of configuration to depend on it. An achievement would be https://issues.apache.org/jira/browse/SOLR-17653 albeit that particular issue wouldn't be helpful for the upgrade test.

I suppose you can write a JUnit test that sets up a docker network, then adds 3 Solr containers in cloud mode, then write a loop where you swap one after the other with a newer version and inspect logs or what you need.

It makes sense for such a test to be disabled by default or included in nightly.

Here's Claude's draft stab at a rolling test: https://claude.ai/public/artifacts/c4d09d7b-df54-419a-b660-169e22191a26

I am a huge fan of Testcontainers; I filed SOLR-17653 after all. I would have personally enjoyed writing/working with Java code to manipulate containers instead of Bash/BATS. I could have used my desire for SOLR-17653 as an excuse here to do it. I was awefully tempted. I contemplated doing that very thing and I had a 1:1 call with Eric Pugh about this rolling upgrade test to discuss how to go about it. I chose Bash/BATS in spite of disliking the framework. I am no good at this framework; GitHub Copilot did the critical 1st draft and most improvements afterwards. The rolling upgrade test scenario doesn't speak to the strengths of Testcontainers specifically (JUnit wrapper for Docker). It's to the strenghts of Docker, yes, put not specifically Testcontainers/JUnit. This is just orchestration here (Not SolrJ integration which definitely requires SOLR-17653), which is basically what our pile of Bash/BATS scripts do. We have a pile of similar Docker tests in Bash too, albeit not using BATS. Together (:solr:package:integrationTests & :solr:docker:testDocker) are high level integration tests, which is what the test here definitely is.

Put differently, if this specific test were to exist as a JUnit/Testcontainers based test, then it really calls into question why we have any Bash/BATS scripts at all -- why would/shouldn't they also be TestContainers based? "When in Rome, do as the romans do" -- I respected our status quo because it works, even though I'm not productive in it.

@janhoy
Copy link
Contributor

janhoy commented Oct 21, 2025

Did our Solr 10 base image change and remove curl?

We changed base image in #3782, and I can see that it does not have cURL. Perhaps we should edit the Dockerfile to add it?

@epugh
Copy link
Contributor

epugh commented Oct 21, 2025

I think so... I've logged onto a Docker container and done "curl localhost" a million times to confirm the server was running, and I wasn't binding to 0.0.0.0 so many times. Did it the other day with Solr..

For this PR, I think we actually can run curl locally, and I want to move it towards more use of standard bats patterns...

@epugh
Copy link
Contributor

epugh commented Oct 23, 2025

Will retest after #3803 get's merged!

TEST_STARTED_AT_ISO=$(date -Iseconds)
export TEST_STARTED_AT_ISO

# Persist artifacts under Gradle’s test-output (fallback to Bats temp dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this? AI likes ot have fallbacks and null checks, but it makes things more verbose. Let's just pick one or the other. I actually like having it under test-output as i find the bats temp dir hard to find when I need to debug ;-). maybe we should put it under test-output?

docker run --name solr-node1 -d \
--network solrcloud-test \
--memory=400m \
--platform linux/amd64 \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why specify the platform?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise it fails on my mac m4... I was seeing some sort of alarming alert about this.. I will check again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the nightly docker images are not multiarch...

https://hub.docker.com/r/apache/solr-nightly/tags

that should be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed... I don't think I have the docker skills to fix this, but would love to review!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @janhoy for the multi arch build!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm... I went back and retested and it looks like I was a bit premature on removing the platform tag, so restored it.

@epugh
Copy link
Contributor

epugh commented Dec 16, 2025

Did our Solr 10 base image change and remove curl?

We changed base image in #3782, and I can see that it does not have cURL. Perhaps we should edit the Dockerfile to add it?

i thought we changed out back to having curl in it?

@epugh
Copy link
Contributor

epugh commented Dec 16, 2025

Did our Solr 10 base image change and remove curl?

We changed base image in #3782, and I can see that it does not have cURL. Perhaps we should edit the Dockerfile to add it?

i thought we changed out back to having curl in it?

#3803 was where we restored curl.

# which means you don't get an error message for passing a start arg, like --jvm-opts to a stop commmand.

# Pre-check
timeout || skip "timeout utility is not available"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use the new wait_for method but didn't have any luck. So instead, let's just have better handling for when timeout isn't available.

local resp code body
sleep 5
resp=$(curl -s -S -w "\n%{http_code}" -X POST -H 'Content-type:application/json' -d "$json" "$url")
code="${resp##*$'\n'}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be using a run and a assert_success type bats calls?

@epugh
Copy link
Contributor

epugh commented Dec 17, 2025

I've been doing some testing of Solr 9.7 to 9.10 wiht changing luceneMatchVersion and schema.xml versions, and using a variant of this test. I'm not sure when I have taken this one test too far.....

What would we want it to have to be considred worth merging? SSL setup? oauth based security? Other complex things?

@dsmiley
Copy link
Contributor Author

dsmiley commented Dec 17, 2025

What would we want it to have to be considred worth merging? SSL setup? oauth based security? Other complex things?

Gosh; nothing complex to get this useful test merged! I think the leading concern I have is that this test is slow and just different... I would like it to somehow get run separately from a normal integration test run. Like this test should perhaps be opt-in, and we do the opt-in on Jenkins CI but not elsewhere.

@epugh
Copy link
Contributor

epugh commented Jan 5, 2026

@dsmiley I think the one change I want to make is rename it test_rolling_upgrade.bats. The fact that it uses Docker isn't important, and we don't label other tests, like the extraction ones, with the word 'docker'.... WDYT?

I'd love to get this in to use in testing some other scenarios.

epugh added 3 commits January 5, 2026 06:48
Tried to make it much closer to test_docker_solrcloud style, but the combination of setup and setup_file + the tear downs defeated me.
Copy link
Contributor Author

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_rolling_upgrade.bats is a much better name.

What keeps this heavy test from the normal rotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change this test in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to get both of them using better docker patterns, I'll back it out...

@epugh
Copy link
Contributor

epugh commented Jan 5, 2026

test_rolling_upgrade.bats is a much better name.

What keeps this heavy test from the normal rotation?

Will rename. As far as heavy test form normal rotation, it honestly isn't that much heavier than some other tests that run. I think in a seperate PR would should come up with criteria for what is considered heavy, and deal with it there...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants