Reduce thread contention by shuffling rfile order #5998

ddanielr · 2025-12-02T04:05:55Z

Scans can hang on the MultiIterator.seek method due to iterators waiting for the block cache lock for a given rfile.
Shuffling the rfile opening order should reduce the frequency of this cache locking problem.
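
A minimal sketch of the idea, assuming the files for a scan are available as a list of names. `OpenOrder` and `shuffledOpenOrder` are hypothetical names used only for illustration and are not Accumulo APIs; `Collections.shuffle` is the only real library call doing the work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class OpenOrder {
  // Returns a shuffled copy so the caller's list is left untouched.
  static List<String> shuffledOpenOrder(List<String> files, Random rnd) {
    List<String> copy = new ArrayList<>(files);
    Collections.shuffle(copy, rnd);
    return copy;
  }

  public static void main(String[] args) {
    List<String> files = List.of("A.rf", "B.rf", "C.rf", "D.rf");
    // Each simulated scan gets its own independently shuffled order.
    System.out.println(shuffledOpenOrder(files, new Random()));
    System.out.println(shuffledOpenOrder(files, new Random()));
  }
}
```

Shuffling a copy rather than the input keeps the caller's view of the file list stable, which matters if the same list is shared between scans.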

ddanielr · 2025-12-02T04:06:31Z

Need to add test cases.

ddanielr · 2025-12-02T04:13:16Z

core/src/main/java/org/apache/accumulo/core/conf/Property.java

      "The maximum amount of memory that will be used to cache results of a client query/scan. "
          + "Once this limit is reached, the buffered data is sent to the client.",
      "1.3.5"),
+  TABLE_SHUFFLE_SOURCES("table.shuffle.sources", "false", PropertyType.BOOLEAN,


@ctubbsii Not sure if this should get the experimental tag or not.
I wanted to make sure this was a configurable property, as shuffling should increase HDFS read load and that might not be desired.

However, if you think it makes sense to change the default behavior then I can just remove the property changes.

dlmarion

Made some comments on the code, but can you point to where the contention is happening?

core/src/main/java/org/apache/accumulo/core/client/rfile/RFileScannerBuilder.java

keith-turner · 2025-12-02T16:39:01Z

This PR modifies the client-side code for reading rfiles. We would also want to modify the server-side code for reading rfiles. Maybe we could shuffle the filesToOpen list in the following code.

accumulo/server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java

Line 301 in 11a9bd9

for (String file : filesToOpen) {

Found this code by following ScanDataSource.createIterator() in the tserver code that creates server-side scan iterators.

Would there be any reason to shuffle inputs for compactions?

dlmarion · 2025-12-05T18:26:28Z

server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java

+    List<InterruptibleIterator> mapfiles =
        fileManager.openFiles(files, scanParams.isIsolated(), samplerConfig);
+    // Randomize the ordering of files to avoid block cache contention on seeks
+    Collections.shuffle(mapfiles);


I think you can use tablet.getTableConfiguration() to see if a shuffle property is set on the table. Another option, which is not currently wired up, would be to use an execution hint on a per-scan basis so that not all scans on the table are affected.
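
A rough sketch of how the two options could combine, under stated assumptions: plain `Map`s stand in for the table configuration and the scan's execution hints, the hint key `shuffle_sources` is hypothetical (the comment above notes hints are not currently wired up for this), and letting the hint override the table property is one possible policy rather than anything decided in this PR.

```java
import java.util.Map;

final class ShuffleDecision {
  // Property name taken from the Property.java diff in this PR.
  static final String TABLE_PROP = "table.shuffle.sources";
  // Hypothetical per-scan hint key, for illustration only.
  static final String HINT_KEY = "shuffle_sources";

  // A per-scan hint, when present, wins over the table-level property.
  static boolean shouldShuffle(Map<String,String> tableProps, Map<String,String> scanHints) {
    String hint = scanHints.get(HINT_KEY);
    if (hint != null) {
      return Boolean.parseBoolean(hint);
    }
    // Default to "false" to match the property's default in the diff.
    return Boolean.parseBoolean(tableProps.getOrDefault(TABLE_PROP, "false"));
  }

  public static void main(String[] args) {
    System.out.println(shouldShuffle(Map.of(TABLE_PROP, "true"), Map.of()));
    System.out.println(shouldShuffle(Map.of(TABLE_PROP, "true"), Map.of(HINT_KEY, "false")));
  }
}
```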

ddanielr · 2025-12-05T22:13:39Z

Worked through this with @keith-turner and found that the shuffling could be done lower in the MultiIterator before things are added to the priorityQueue.

Realized we had to do it in two places because of deepCopy. I added a test for deepCopy since that didn't exist.

I'm not sure whether we need to gate this change behind a property, so I've removed that code for now.
The MultiIterator is a system iterator, so it's not part of the public API.

ddanielr · 2025-12-05T23:48:47Z

Ran the sunny-day ITs locally and they passed.

dlmarion

I'm not sure we should make this change in a patch release without making it configurable and defaulting to off. I think this thread contention issue - where you have multiple scans hitting the same tablet at the same time resulting in the same set of files in the same order - likely only happens in a few cases:

 - a table with very few tablets
 - a case where you have a thundering herd of like queries starting at the same time
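
To illustrate the "same set of files in the same order" case described above, here is a small simulation sketch. It only counts which file each simulated scan would touch first; it does not model the block cache or real locking, and `FirstFileHistogram` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class FirstFileHistogram {
  // Counts, per file, how many simulated scans would touch that file first.
  static Map<String,Integer> firstFileCounts(List<String> files, int scans, boolean shuffle,
      Random rnd) {
    Map<String,Integer> counts = new HashMap<>();
    for (int i = 0; i < scans; i++) {
      List<String> order = new ArrayList<>(files);
      if (shuffle) {
        Collections.shuffle(order, rnd);
      }
      counts.merge(order.get(0), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> files = List.of("A.rf", "B.rf", "C.rf", "D.rf");
    System.out.println("fixed:    " + firstFileCounts(files, 100, false, new Random()));
    System.out.println("shuffled: " + firstFileCounts(files, 100, true, new Random()));
  }
}
```

With a fixed order every scan starts on the same file, so all of them would queue on that file's resources first; with shuffling the starting points are spread across the files.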

core/src/test/java/org/apache/accumulo/core/iterators/system/MultiIteratorTest.java

keith-turner · 2025-12-16T23:39:47Z

core/src/main/java/org/apache/accumulo/core/iteratorsImpl/system/MultiIterator.java

    super(other.iters.size());
    this.iters = new ArrayList<>();
    this.fence = other.fence;
+    Collections.shuffle(other.iters);


These shuffles will make seeks happen in a different order. There is still the case of initially opening the RFiles in FileManager that we could also shuffle.

I believe I handled the FileManager case in bb6102a

ddanielr · 2025-12-17T13:49:12Z

I'm not sure we should make this change in a patch release without making it configurable and defaulting to off. I think this thread contention issue - where you have multiple scans hitting the same tablet at the same time resulting in the same set of files in the same order - likely only happens in a few cases:
 - a table with very few tablets
 - a case where you have a thundering herd of like queries starting at the same time

Changed the code to move the shuffle to a different iterator class and handle the property lookup at the table level.
Added back in the property and set it to false for a default value.

Moving the functionality to a separate class was helpful as I could better track uses of MultiIterator to see where shuffling might be utilized.
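
The shape of that change could look roughly like the sketch below. The classes `Merge` and `ShuffledMerge` are simplified stand-ins for MultiIterator and MultiShuffledIterator (strings replace real source iterators), and `create` is a hypothetical factory representing the selection done in ScanDataSource; none of this is the actual Accumulo code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SourceSelection {
  // Stand-in for a merging iterator: keeps its sources in the order given.
  static class Merge {
    final List<String> sources;

    Merge(List<String> sources) {
      this.sources = new ArrayList<>(sources);
    }
  }

  // Variant that randomizes its own source order at construction time.
  static final class ShuffledMerge extends Merge {
    ShuffledMerge(List<String> sources) {
      super(sources);
      Collections.shuffle(this.sources);
    }
  }

  // The caller picks the class from a boolean read from table configuration.
  static Merge create(List<String> sources, boolean shuffle) {
    return shuffle ? new ShuffledMerge(sources) : new Merge(sources);
  }

  public static void main(String[] args) {
    List<String> files = List.of("A.rf", "B.rf", "C.rf");
    System.out.println(create(files, false).sources);
    System.out.println(create(files, true).sources);
  }
}
```

Keeping the shuffle in its own class means the unshuffled path is untouched when the property is off, and every use of the shuffling variant is easy to find by searching for the class name.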

keith-turner · 2025-12-17T14:35:34Z

core/src/main/java/org/apache/accumulo/core/iteratorsImpl/system/MultiShuffledIterator.java

+    Collections.shuffle(other.iters);
+    for (SortedKeyValueIterator<Key,Value> iter : other.iters) {
+      iters.add(iter.deepCopy(env));
+    }


This will shuffle the iterators the deep copy is created from and make the deep copy and source have the same order. Moving the shuffle after the loop independently shuffles the deep copy.

Suggested change

-    Collections.shuffle(other.iters);
-    for (SortedKeyValueIterator<Key,Value> iter : other.iters) {
-      iters.add(iter.deepCopy(env));
-    }
+    for (SortedKeyValueIterator<Key,Value> iter : other.iters) {
+      iters.add(iter.deepCopy(env));
+    }
+    Collections.shuffle(this.iters);
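
The difference between the two orderings can be shown with a simplified sketch. Here strings stand in for the source iterators, copying the list stands in for calling deepCopy on each source, and `DeepCopyShuffle` with its two static methods is a hypothetical illustration rather than the real MultiShuffledIterator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class DeepCopyShuffle {
  final List<String> iters;

  DeepCopyShuffle(List<String> sources) {
    this.iters = new ArrayList<>(sources);
  }

  // Shuffle-first: reorders the source's list, then copies it in that order.
  static DeepCopyShuffle copyShuffleFirst(DeepCopyShuffle other, Random rnd) {
    Collections.shuffle(other.iters, rnd);
    return new DeepCopyShuffle(other.iters);
  }

  // Shuffle-after: copies first, then reorders only the copy's list.
  static DeepCopyShuffle copyShuffleAfter(DeepCopyShuffle other, Random rnd) {
    DeepCopyShuffle copy = new DeepCopyShuffle(other.iters);
    Collections.shuffle(copy.iters, rnd);
    return copy;
  }

  public static void main(String[] args) {
    DeepCopyShuffle a = new DeepCopyShuffle(List.of("A", "B", "C", "D"));
    DeepCopyShuffle copyA = copyShuffleFirst(a, new Random());
    System.out.println("shuffle first: source=" + a.iters + " copy=" + copyA.iters);

    DeepCopyShuffle b = new DeepCopyShuffle(List.of("A", "B", "C", "D"));
    DeepCopyShuffle copyB = copyShuffleAfter(b, new Random());
    System.out.println("shuffle after: source=" + b.iters + " copy=" + copyB.iters);
  }
}
```

In the shuffle-first variant the source and the copy always end up in the same order and the source is mutated as a side effect; in the shuffle-after variant the source keeps its order and the copy is ordered independently.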

core/src/main/java/org/apache/accumulo/core/iteratorsImpl/system/MultiShuffledIterator.java

core/src/test/java/org/apache/accumulo/core/iterators/system/MultiShuffledIteratorTest.java

keith-turner

This looks good to me. This is still in draft; was there anything else that you wanted to do?

ddanielr added this to the 2.1.5 milestone Dec 2, 2025

ddanielr requested review from ctubbsii and dlmarion December 2, 2025 04:06

dlmarion reviewed Dec 2, 2025

Shuffle the files for ServerSide iterators

9223fc9

Shuffle the mapFile iterators before the MultiIterator is created to avoid block cache contention.

ddanielr force-pushed the bugfix/reduce-thread-contention branch from 7f3c66a to 9223fc9 Compare December 5, 2025 17:17

dlmarion reviewed Dec 5, 2025

ddanielr added 2 commits December 5, 2025 21:52

Modified MultiIterator instead of ScanDataSource

28c8f09

Realized the changes could be moved down to MultiIterator so the behavior was similar for deepCopies as well as constructed MultiIterators. Added a test for deepCopy since one did not exist.

formatting change

c67d7a0

ddanielr requested a review from keith-turner December 5, 2025 23:27

dlmarion reviewed Dec 6, 2025

keith-turner reviewed Dec 16, 2025

ddanielr and others added 2 commits December 16, 2025 20:46

Update core/src/test/java/org/apache/accumulo/core/iterators/system/M…

f3438ce

…ultiIteratorTest.java Co-authored-by: Keith Turner <kturner@apache.org>

Adds shuffle prop and fileManager shuffling

bb6102a

Adds the table property for shuffling files. Adds shuffling for files in the FileManager. Moves the shuffling logic into a separate Iterator class and changes the ScanDataSource code to select the specific iterator class.

keith-turner reviewed Dec 17, 2025

ddanielr added 2 commits December 17, 2025 19:24

Clean up constructors for MuliIterator

fa0e2b6

Support shuffled iterators in GeneratedSplits and OfflineIterator

fix formatting

147a980

keith-turner reviewed Dec 23, 2025

Remove duplicate test code

ed88068

Removed the duplicate test code by extending the original test class

keith-turner approved these changes Dec 23, 2025
