-
Notifications
You must be signed in to change notification settings - Fork 800
Rolling upgrade test (BATS, Docker) #3706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Rolling upgrade test (BATS, Docker) #3706
Conversation
…eper Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…r checking Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…, clean up comments Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…ssert, eliminate tedious patterns Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…rade Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…, reverse upgrade order, skip intermediate verification Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…iners Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…er ownership Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
…set proper ownership" This reverts commit 769a8e4.
…ted working Co-authored-by: dsmiley <377295+dsmiley@users.noreply.github.com>
|
Idea: Soon, solr will have Testcontainers as a dependency (#3670), and Testcontainers is ideal for orchestrating and managing containers like this. I suppose you can write a JUnit test that sets up a docker network, then adds 3 Solr containers in cloud mode, then write a loop where you swap one after the other with a newer version and inspect logs or what you need. It makes sense for such a test to be disabled by default or included in nightly. Here's Claude's draft stab at a rolling test: https://claude.ai/public/artifacts/c4d09d7b-df54-419a-b660-169e22191a26 |
Thank you for #3670 but you say having a dependency (on Testcontainers) as if we've achieved something as a project. It's a few lines of configuration to depend on it. An achievement would be https://issues.apache.org/jira/browse/SOLR-17653 albeit that particular issue wouldn't be helpful for the upgrade test.
I am a huge fan of Testcontainers; I filed SOLR-17653 after all. I would have personally enjoyed writing/working with Java code to manipulate containers instead of Bash/BATS. I could have used my desire for SOLR-17653 as an excuse here to do it. I was awefully tempted. I contemplated doing that very thing and I had a 1:1 call with Eric Pugh about this rolling upgrade test to discuss how to go about it. I chose Bash/BATS in spite of disliking the framework. I am no good at this framework; GitHub Copilot did the critical 1st draft and most improvements afterwards. The rolling upgrade test scenario doesn't speak to the strengths of Testcontainers specifically (JUnit wrapper for Docker). It's to the strenghts of Docker, yes, put not specifically Testcontainers/JUnit. This is just orchestration here (Not SolrJ integration which definitely requires SOLR-17653), which is basically what our pile of Bash/BATS scripts do. We have a pile of similar Docker tests in Bash too, albeit not using BATS. Together ( Put differently, if this specific test were to exist as a JUnit/Testcontainers based test, then it really calls into question why we have any Bash/BATS scripts at all -- why would/shouldn't they also be TestContainers based? "When in Rome, do as the romans do" -- I respected our status quo because it works, even though I'm not productive in it. |
We changed base image in #3782, and I can see that it does not have cURL. Perhaps we should edit the Dockerfile to add it? |
|
I think so... I've logged onto a Docker container and done "curl localhost" a million times to confirm the server was running, and I wasn't binding to 0.0.0.0 so many times. Did it the other day with Solr.. For this PR, I think we actually can run curl locally, and I want to move it towards more use of standard bats patterns... |
|
Will retest after #3803 get's merged! |
| TEST_STARTED_AT_ISO=$(date -Iseconds) | ||
| export TEST_STARTED_AT_ISO | ||
|
|
||
| # Persist artifacts under Gradle’s test-output (fallback to Bats temp dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this? AI likes ot have fallbacks and null checks, but it makes things more verbose. Let's just pick one or the other. I actually like having it under test-output as i find the bats temp dir hard to find when I need to debug ;-). maybe we should put it under test-output?
| docker run --name solr-node1 -d \ | ||
| --network solrcloud-test \ | ||
| --memory=400m \ | ||
| --platform linux/amd64 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why specify the platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise it fails on my mac m4... I was seeing some sort of alarming alert about this.. I will check again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly the nightly docker images are not multiarch...
https://hub.docker.com/r/apache/solr-nightly/tags
that should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed... I don't think I have the docker skills to fix this, but would love to review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @janhoy for the multi arch build!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm... I went back and retested and it looks like I was a bit premature on removing the platform tag, so restored it.
i thought we changed out back to having curl in it? |
| # which means you don't get an error message for passing a start arg, like --jvm-opts to a stop commmand. | ||
|
|
||
| # Pre-check | ||
| timeout || skip "timeout utility is not available" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to use the new wait_for method but didn't have any luck. So instead, let's just have better handling for when timeout isn't available.
| local resp code body | ||
| sleep 5 | ||
| resp=$(curl -s -S -w "\n%{http_code}" -X POST -H 'Content-type:application/json' -d "$json" "$url") | ||
| code="${resp##*$'\n'}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be using a run and a assert_success type bats calls?
|
I've been doing some testing of Solr 9.7 to 9.10 wiht changing What would we want it to have to be considred worth merging? SSL setup? oauth based security? Other complex things? |
Gosh; nothing complex to get this useful test merged! I think the leading concern I have is that this test is slow and just different... I would like it to somehow get run separately from a normal integration test run. Like this test should perhaps be opt-in, and we do the opt-in on Jenkins CI but not elsewhere. |
|
@dsmiley I think the one change I want to make is rename it I'd love to get this in to use in testing some other scenarios. |
This reverts commit 4bebd3c.
Tried to make it much closer to test_docker_solrcloud style, but the combination of setup and setup_file + the tear downs defeated me.
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_rolling_upgrade.bats is a much better name.
What keeps this heavy test from the normal rotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this test in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to get both of them using better docker patterns, I'll back it out...
Will rename. As far as heavy test form normal rotation, it honestly isn't that much heavier than some other tests that run. I think in a seperate PR would should come up with criteria for what is considered heavy, and deal with it there... |
I spent a bunch of time with a GitHub Copilot Agent to develop a new BATS test to test a rolling upgrade of SolrCloud.
It ignores the current project/build since it runs Docker containers of Solr.
I think this test should not be run by default; it's a slow test. Not sure yet on the best way to segment this test and maybe other slow ones. At least another directory.
It could be improved to build the local project in Docker and use that image as the upgrade destination target.
Separately, I want to use this as a basis to check for Overseer disablement settings, but that's not present here. Maybe I'll add it in this PR.
Disclaimer: Bash is not my comfort zone, but AI churned this out. IntelliJ Junie was also super helpful.