Skip to content

Conversation

@ddanielr
Copy link
Contributor

@ddanielr ddanielr commented Dec 2, 2025

Scans can hang on the MultiIterator.seek method due to iterators waiting for the block cache lock for a given rfile.
Shuffling the rfile opening order should reduce the frequency of this cache locking problem.

@ddanielr ddanielr added this to the 2.1.5 milestone Dec 2, 2025
@ddanielr ddanielr requested review from ctubbsii and dlmarion December 2, 2025 04:06
@ddanielr
Copy link
Contributor Author

ddanielr commented Dec 2, 2025

Need to add test cases.

"The maximum amount of memory that will be used to cache results of a client query/scan. "
+ "Once this limit is reached, the buffered data is sent to the client.",
"1.3.5"),
TABLE_SHUFFLE_SOURCES("table.shuffle.sources", "false", PropertyType.BOOLEAN,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctubbsii Not sure if this should get the experimental tag or not.
I wanted to make sure this was a configurable property as shuffling should increase hdfs read load and that might not be desired.

However, if you think it makes sense to change the default behavior then I can just remove the property changes.

Copy link
Contributor

@dlmarion dlmarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some comments on the code, but can you point to where the contention is happening?

@keith-turner
Copy link
Contributor

keith-turner commented Dec 2, 2025

This PR modifies the the client side code for reading rfiles. Would want to also modify the server side code for reading rfiles. Maybe could shuffle the filesToOpen list in the following code.

Found this code by following ScanDataSource.createIterator() in the tserver code that creates servers side scan iterators.

Would there be any reason to shuffle inputs for compactions?

Shuffle the mapFile iterators before the MultiIterator is created to
avoid block cache contention.
@ddanielr ddanielr force-pushed the bugfix/reduce-thread-contention branch from 7f3c66a to 9223fc9 Compare December 5, 2025 17:17
List<InterruptibleIterator> mapfiles =
fileManager.openFiles(files, scanParams.isIsolated(), samplerConfig);
// Randomize the ordering of files to avoid block cache contention on seeks
Collections.shuffle(mapfiles);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use tablet.getTableConfiguration() to see if a shuffle property is set on the table. Another option, which is not currently wired up, would be to use an execution hint on a per-scan basis so that not all scans on the table are affected.

Realized the changes could be moved down to MultiIterator so the
behavior was similiar for deepCopies as well as constructed
MultiIterators.

Added a test for deepCopy since one did not exist
@ddanielr
Copy link
Contributor Author

ddanielr commented Dec 5, 2025

Worked through this with @keith-turner and found that the shuffling could be done lower in the MultiIterator before things are added to the priorityQueue.

Realized we had to do it in two places because of deepCopy. I added a test for deepCopy since that didn't exist.

I'm not sure if we need to gate this change behind a property or not so I've removed that code for now.
The MultiIterator is a system iterator so it's not part of the public API

@ddanielr ddanielr requested a review from keith-turner December 5, 2025 23:27
@ddanielr
Copy link
Contributor Author

ddanielr commented Dec 5, 2025

Ran the sunny ITs locally and this passed.

Copy link
Contributor

@dlmarion dlmarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should make this change in a patch release without making it configurable and defaulting to off. I think this thread contention issue - where you have multiple scans hitting the same tablet at the same time resulting in the same set of files in the same order - likely only happens in a few cases:

 - a table with very few tablets
 - a case where you have a thundering herd of like queries starting at the same time

super(other.iters.size());
this.iters = new ArrayList<>();
this.fence = other.fence;
Collections.shuffle(other.iters);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These shuffles will make seeks happen in different order. There is still the case of initially opening the RFiles in FileManager that we could also shuffle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I handled the FileManager case in bb6102a

ddanielr and others added 2 commits December 16, 2025 20:46
…ultiIteratorTest.java

Co-authored-by: Keith Turner <kturner@apache.org>
Adds the table property for shuffling files.
Adds shuffling for files in the FileManager.

Moves the shuffling logic into a separate Iterator class and changes the
ScanDataSource code to select the specific iterator class.
@ddanielr
Copy link
Contributor Author

I'm not sure we should make this change in a patch release without making it configurable and defaulting to off. I think this thread contention issue - where you have multiple scans hitting the same tablet at the same time resulting in the same set of files in the same order - likely only happens in a few cases:

 - a table with very few tablets
 - a case where you have a thundering herd of like queries starting at the same time

Changed the code to move the shuffle to a different iterator class and handle the property lookup at the table level.
Added back in the property and set it to false for a default value.

Moving the functionality to a separate class was helpful as I could better track uses of MultiIterator to see where shuffling might be utilized.

Comment on lines 55 to 58
Collections.shuffle(other.iters);
for (SortedKeyValueIterator<Key,Value> iter : other.iters) {
iters.add(iter.deepCopy(env));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will shuffle the iterators the deep copy is created from and make the deep copy and source have the same order. Moving the shuffle after the loop independently shuffles the deep copy.

Suggested change
Collections.shuffle(other.iters);
for (SortedKeyValueIterator<Key,Value> iter : other.iters) {
iters.add(iter.deepCopy(env));
}
for (SortedKeyValueIterator<Key,Value> iter : other.iters) {
iters.add(iter.deepCopy(env));
}
Collections.shuffle(this.iters);

Support shuffled iterators in GeneratedSplits and OfflineIterator
Removed the duplicate test code by extending the original test class
Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. This is still in draft, was there anything else that you wanted to do?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants