-
Notifications
You must be signed in to change notification settings - Fork 471
Reduce thread contention by shuffling rfile order #5998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 2.1
Are you sure you want to change the base?
Conversation
|
Need to add test cases. |
| "The maximum amount of memory that will be used to cache results of a client query/scan. " | ||
| + "Once this limit is reached, the buffered data is sent to the client.", | ||
| "1.3.5"), | ||
| TABLE_SHUFFLE_SOURCES("table.shuffle.sources", "false", PropertyType.BOOLEAN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ctubbsii Not sure if this should get the experimental tag or not.
I wanted to make sure this was a configurable property as shuffling should increase hdfs read load and that might not be desired.
However, if you think it makes sense to change the default behavior then I can just remove the property changes.
dlmarion
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some comments on the code, but can you point to where the contention is happening?
core/src/main/java/org/apache/accumulo/core/client/rfile/RFileScannerBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/client/rfile/RFileScannerBuilder.java
Outdated
Show resolved
Hide resolved
|
This PR modifies the the client side code for reading rfiles. Would want to also modify the server side code for reading rfiles. Maybe could shuffle the accumulo/server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java Line 301 in 11a9bd9
Found this code by following ScanDataSource.createIterator() in the tserver code that creates servers side scan iterators. Would there be any reason to shuffle inputs for compactions? |
Shuffle the mapFile iterators before the MultiIterator is created to avoid block cache contention.
7f3c66a to
9223fc9
Compare
| List<InterruptibleIterator> mapfiles = | ||
| fileManager.openFiles(files, scanParams.isIsolated(), samplerConfig); | ||
| // Randomize the ordering of files to avoid block cache contention on seeks | ||
| Collections.shuffle(mapfiles); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use tablet.getTableConfiguration() to see if a shuffle property is set on the table. Another option, which is not currently wired up, would be to use an execution hint on a per-scan basis so that not all scans on the table are affected.
Realized the changes could be moved down to MultiIterator so the behavior was similiar for deepCopies as well as constructed MultiIterators. Added a test for deepCopy since one did not exist
|
Worked through this with @keith-turner and found that the shuffling could be done lower in the MultiIterator before things are added to the priorityQueue. Realized we had to do it in two places because of deepCopy. I added a test for deepCopy since that didn't exist. I'm not sure if we need to gate this change behind a property or not so I've removed that code for now. |
|
Ran the sunny ITs locally and this passed. |
dlmarion
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should make this change in a patch release without making it configurable and defaulting to off. I think this thread contention issue - where you have multiple scans hitting the same tablet at the same time resulting in the same set of files in the same order - likely only happens in a few cases:
- a table with very few tablets
- a case where you have a thundering herd of like queries starting at the same time
core/src/test/java/org/apache/accumulo/core/iterators/system/MultiIteratorTest.java
Show resolved
Hide resolved
| super(other.iters.size()); | ||
| this.iters = new ArrayList<>(); | ||
| this.fence = other.fence; | ||
| Collections.shuffle(other.iters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These shuffles will make seeks happen in different order. There is still the case of initially opening the RFiles in FileManager that we could also shuffle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I handled the FileManager case in bb6102a
…ultiIteratorTest.java Co-authored-by: Keith Turner <kturner@apache.org>
Adds the table property for shuffling files. Adds shuffling for files in the FileManager. Moves the shuffling logic into a separate Iterator class and changes the ScanDataSource code to select the specific iterator class.
Changed the code to move the shuffle to a different iterator class and handle the property lookup at the table level. Moving the functionality to a separate class was helpful as I could better track uses of MultiIterator to see where shuffling might be utilized. |
| Collections.shuffle(other.iters); | ||
| for (SortedKeyValueIterator<Key,Value> iter : other.iters) { | ||
| iters.add(iter.deepCopy(env)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will shuffle the iterators the deep copy is created from and make the deep copy and source have the same order. Moving the shuffle after the loop independently shuffles the deep copy.
| Collections.shuffle(other.iters); | |
| for (SortedKeyValueIterator<Key,Value> iter : other.iters) { | |
| iters.add(iter.deepCopy(env)); | |
| } | |
| for (SortedKeyValueIterator<Key,Value> iter : other.iters) { | |
| iters.add(iter.deepCopy(env)); | |
| } | |
| Collections.shuffle(this.iters); |
core/src/main/java/org/apache/accumulo/core/iteratorsImpl/system/MultiShuffledIterator.java
Outdated
Show resolved
Hide resolved
Support shuffled iterators in GeneratedSplits and OfflineIterator
core/src/test/java/org/apache/accumulo/core/iterators/system/MultiShuffledIteratorTest.java
Outdated
Show resolved
Hide resolved
Removed the duplicate test code by extending the original test class
keith-turner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. This is still in draft, was there anything else that you wanted to do?
Scans can hang on the MultiIterator.seek method due to iterators waiting for the block cache lock for a given rfile.
Shuffling the rfile opening order should reduce the frequency of this cache locking problem.