Skip to content

Conversation

@epugh
Copy link
Contributor

@epugh epugh commented Nov 30, 2025

https://issues.apache.org/jira/browse/SOLR-18008

Description

I've added two tests that simulate what happens if you have remnant core files that collide with new core creation.

@test "create replicated collections when core remnants exist" demonstrates when you create a collection and one of the cores exists.

I am adding @test "add replica when core remnants exist" that does the same for creating a replica.

I am thinking of adding @test "delete failed replica when replica create fails due to remnants exist" as I've seen that scenario also fail.

Solution

Look at what changes to existing logic are required to deal with remnants existing. I'm not trying to figure out why we get remnants, as I don't have visibility into how it happens at this time.

Two scenarios that I've seen are adding a new collection where a core remnants already exists and adding a replica where the core remnants already exists.

Tests

Added a Junit test that demonstrates with and without the parameter. I wanted both in case at some point in the future the default behavior changes when it comes to remnants. I will delete the BATS test before merging.

@epugh epugh changed the title bats test to simulate remnant core SOLR-18008: bats test to simulate various issues with remnant core Dec 1, 2025
@epugh epugh changed the title SOLR-18008: bats test to simulate various issues with remnant core SOLR-18008: Remnant core files prevent Collections for working Dec 1, 2025
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not see this solution...

preExistingZkEntry = getZkController().checkIfCoreNodeNameAlreadyExists(cd);
}

if (Files.exists(cd.getInstanceDir())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error you are trying to avoid, coming out of coresLocate.create, seems like a relatively fine protection in case a valid core is there. Curious; does calling a core admin level API to delete the core work? I don't mean replica deletion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will take a looksee.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still need to check on this.

@@ -0,0 +1,62 @@
#!/usr/bin/env bats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think we should add a bats test for matters that JUnit could easily handle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to move it to junit. My challenge with the junit tests, and it might just be my knowledge, is that i sturggle to really feel like I understand how these junit tests really map to the real world. Whereas the bats tests map much closer. Maybe because I think from outside solr looking in, versus as a java coder, inside looking out.

For example, the logic of creating a fake directory to trigger the failure case. I get it in bats, and don't have a good sense of doing that in junit test.

@epugh epugh marked this pull request as ready for review December 31, 2025 15:07
@github-actions github-actions bot added documentation Improvements or additions to documentation client:solrj labels Dec 31, 2025
@epugh
Copy link
Contributor Author

epugh commented Dec 31, 2025

Okay, two things.

  1. If the JUnit test looks good, then I'll delete the bats test.

  2. I'm thinking that maybe we want to slip into the current 10x release process a update to the system property name. In this PR i removed the .startup. from the name because it's not a startup ONLY thing. However in release 10.0 we moved the name from solr.deleteUnknownCores to solr.cloud.startup.delete.unknown.cores.enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:cloud client:solrj documentation Improvements or additions to documentation tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants