WIP - Preserve duplicate keys for scan resumptions #5957
base: 2.1
Conversation
There may be two scenarios where this will not work as expected.
First, when duplicate keys are read from multiple files they may not always come back in the same order. For example, if 100 duplicate keys are spread across 3 files and 50 are read, then skipping 50 entries on resume may not skip the same 50 values. The keys from multiple files are placed in a priority queue, which gives no guaranteed order for items with the same priority. If files are added or removed between scan batches, that would likely greatly increase the chance of the duplicate keys coming back in a different order.
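A minimal, self-contained sketch of that ordering concern (generic Java types, not Accumulo's actual merge code): a heap ordered only by the key makes no promise about the relative order of entries whose keys compare equal, so the same data can come back with its duplicates in a different order, especially if the set of source files changes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch only: a heap ordered solely by key gives no guarantee about the
// relative order of entries whose keys compare equal.
public class DuplicateKeyHeapOrder {

    // Hypothetical entry: the key plus the file it came from.
    record Entry(String key, String sourceFile) {}

    public static void main(String[] args) {
        // Ordered by key only; sourceFile plays no part in the comparison.
        PriorityQueue<Entry> heap = new PriorityQueue<>(Comparator.comparing(Entry::key));

        // Three files each contribute an entry with the identical key "row1".
        heap.addAll(List.of(
                new Entry("row1", "f1"),
                new Entry("row1", "f2"),
                new Entry("row1", "f3")));

        // The heap only promises non-decreasing keys; which file's copy of
        // "row1" is polled first is an implementation detail and can change
        // if the set of files (and therefore the insertion order) changes.
        while (!heap.isEmpty()) {
            System.out.println(heap.poll());
        }
    }
}
```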
Second, this change stores state on the server side. If a scan reads 50 of 100 duplicate keys from tserver A and then tserver A fails, the scan will skip the other 50 when it resumes on another tserver.
One possible solution is to change server-side scan batching so that it never ends a batch on a duplicate key, or the scan fails if it would have to. For example, the batch could be allowed to grow up to 2x or 3x the batch size (configurable via a new setting) when duplicate keys are present, and if it still fills up the scan fails with an error. That way the scan either returns all the duplicate keys or fails, instead of sometimes returning incorrect data for this case.
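A rough sketch of that first idea, with hypothetical names and limits (`batchSize` and `maxGrowthFactor` are illustrative, not existing Accumulo properties): the batcher refuses to end a batch in the middle of a run of duplicate keys, lets the batch grow up to a configured multiple of the normal size, and throws if that still is not enough.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not Accumulo's batching code.
public class DuplicateAwareBatcher<K, V> {

    private final Iterator<Map.Entry<K, V>> source;
    private Map.Entry<K, V> lookahead; // next entry not yet placed in a batch

    public DuplicateAwareBatcher(Iterator<Map.Entry<K, V>> source) {
        this.source = source;
        this.lookahead = source.hasNext() ? source.next() : null;
    }

    public List<Map.Entry<K, V>> nextBatch(int batchSize, int maxGrowthFactor) {
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        int hardLimit = batchSize * maxGrowthFactor;

        while (lookahead != null) {
            boolean batchFull = batch.size() >= batchSize;
            boolean sameAsLast = !batch.isEmpty()
                    && batch.get(batch.size() - 1).getKey().equals(lookahead.getKey());

            // End the batch once it is full, but only on a key boundary so a
            // run of duplicate keys is never split across batches.
            if (batchFull && !sameAsLast) {
                return batch;
            }
            // The run of duplicates would exceed even the expanded limit:
            // fail rather than silently splitting it.
            if (batch.size() >= hardLimit) {
                throw new IllegalStateException(
                        "run of duplicate keys exceeds " + hardLimit + " entries");
            }
            batch.add(lookahead);
            lookahead = source.hasNext() ? source.next() : null;
        }
        return batch;
    }
}
```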
Another possible solution is to sort data on the value when the keys are equal. Then scans could continue based on key+value instead of just the key. However, persisted data is not currently sorted this way, so this would be a very big change to code, data, and how users create rfiles. It has the advantage of avoiding buffering all duplicate keys on the server side.
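A sketch of that second idea using generic placeholder types rather than Accumulo's `Key`/`Value` classes: order entries by key first and by value only when the keys are equal, which would let a scan resume from an exact (key, value) pair.

```java
import java.util.Comparator;
import java.util.Map;

// Sketch only: key-then-value ordering with generic placeholder types.
public class KeyThenValueOrder {
    public static <K extends Comparable<? super K>, V extends Comparable<? super V>>
            Comparator<Map.Entry<K, V>> comparator() {
        // Compare on the key; fall back to the value only when keys are equal.
        return Map.Entry.<K, V>comparingByKey()
                .thenComparing(Map.Entry.<K, V>comparingByValue());
    }
}
```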
Another thing to consider is that not all keys on the server side are MemKeys. Persisted data is not stored as MemKeys; only recently written data held in in-memory maps is. The purpose of the count on MemKeys is to avoid reading data written after a scan started.
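A simplified illustration of that point, not Accumulo's real MemKey class: entries in the in-memory map carry a count assigned at write time, a scan remembers the count in effect when it started and ignores anything written later, and persisted entries read from rfiles carry no such count.

```java
// Hypothetical stand-in for an in-memory map entry; not the real MemKey.
public class InMemoryEntry {
    final String key;   // stand-in for Accumulo's Key
    final byte[] value;
    final int kvCount;  // assigned when the mutation was applied

    InMemoryEntry(String key, byte[] value, int kvCount) {
        this.key = key;
        this.value = value;
        this.kvCount = kvCount;
    }

    // A scan that started when the map held maxKvCountAtScanStart entries
    // should not see anything written afterwards.
    boolean visibleTo(int maxKvCountAtScanStart) {
        return kvCount <= maxKvCountAtScanStart;
    }
}
```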
Fixes #5951
When a scan contained multiple duplicate keys and the scan broke or cut before returning all of those duplicates, the remainder of those duplicate keys were dropped and not returned when the scan resumed. This happened because when the scan cut, we did not store the `MemKey`, which contains an additional field, `kvCount`, that distinguishes otherwise duplicate keys. When we resumed the scan without these MemKey details, the scan picked back up after the logical key we left off on, which incorrectly skipped the remaining duplicate keys in the scan. We now store the full MemKey, and when the scan resumes on the same key it skips only the one entry we already returned, so the other duplicate keys are now correctly returned.

The new test case fails without the rest of the changes and passes with them.
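A hedged sketch of that resume behavior, with hypothetical types rather than Accumulo's real iterator stack: the scan state keeps the full identity of the last entry returned, including its count, and on resume only that single entry is skipped, so the remaining duplicates of the same key are still emitted.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative sketch of resuming past exactly one already-returned entry.
public class ScanResume {

    // Stand-in for an in-memory entry; kvCount distinguishes duplicate keys.
    record Entry(String key, int kvCount) {}

    static Iterator<Entry> resume(Iterator<Entry> positioned, Entry lastReturned) {
        // 'positioned' is assumed to already be seeked to the first entry
        // whose key is >= lastReturned.key().
        return new Iterator<>() {
            Entry buffered = advance();

            private Entry advance() {
                while (positioned.hasNext()) {
                    Entry e = positioned.next();
                    // Skip only the single entry already sent to the client;
                    // other duplicates of the same key are still emitted.
                    if (e.key().equals(lastReturned.key())
                            && e.kvCount() == lastReturned.kvCount()) {
                        continue;
                    }
                    return e;
                }
                return null;
            }

            @Override
            public boolean hasNext() {
                return buffered != null;
            }

            @Override
            public Entry next() {
                if (buffered == null) {
                    throw new NoSuchElementException();
                }
                Entry result = buffered;
                buffered = advance();
                return result;
            }
        };
    }
}
```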