
Conversation

@DomGarguilo
Member

Fixes #5951

When a scan contained multiple duplicate keys and the scan broke or was cut before returning all of those duplicates, the remaining duplicate keys were dropped and never returned when the scan resumed. This happened because, when the scan was cut, we did not store the MemKey, which contains an additional field, kvCount, that distinguishes otherwise identical keys. When the scan resumed without these MemKey details, it picked back up after the logical key it left off on, which incorrectly skipped the remaining duplicate keys. We now store the full MemKey, and when the scan resumes on the same key it skips only the one entry already returned, so the remaining duplicate keys are now returned correctly.
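
For illustration only, a minimal toy sketch of the difference (the `LogicalKey`, `Entry`, and `resume*` names are hypothetical, not the actual Accumulo classes, and the kvCount semantics are simplified): resuming strictly after the last logical key drops the remaining duplicates, while resuming with the stored per-key count skips only what was already returned.

```java
import java.util.List;

public class ResumeSketch {
    // Hypothetical stand-ins: a logical key and an entry that also carries the
    // insertion count that distinguishes otherwise identical keys.
    record LogicalKey(String key) {}
    record Entry(LogicalKey key, int kvCount, String value) {}

    // Resuming strictly after the logical key skips every remaining duplicate.
    static List<Entry> resumeAfterLogicalKey(List<Entry> data, LogicalKey lastReturned) {
        return data.stream()
            .filter(e -> e.key().key().compareTo(lastReturned.key()) > 0)
            .toList();
    }

    // Resuming with the stored count skips only the entries already returned.
    static List<Entry> resumeAfterMemKey(List<Entry> data, LogicalKey lastReturned, int lastKvCount) {
        return data.stream()
            .filter(e -> {
                int cmp = e.key().key().compareTo(lastReturned.key());
                return cmp > 0 || (cmp == 0 && e.kvCount() > lastKvCount);
            })
            .toList();
    }

    public static void main(String[] args) {
        var k = new LogicalKey("row1");
        var data = List.of(new Entry(k, 1, "a"), new Entry(k, 2, "b"), new Entry(k, 3, "c"));
        // Suppose the first batch returned only ("row1", kvCount=1, "a").
        System.out.println(resumeAfterLogicalKey(data, k)); // empty: "b" and "c" are lost
        System.out.println(resumeAfterMemKey(data, k, 1));  // "b" and "c" are preserved
    }
}
```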

The new test case fails without the rest of the changes and passes with them.

@DomGarguilo DomGarguilo added this to the 2.1.5 milestone Oct 10, 2025
@DomGarguilo DomGarguilo self-assigned this Oct 10, 2025
@DomGarguilo DomGarguilo added blocker This issue blocks any release version labeled on it. bug This issue has been verified to be a bug. labels Oct 10, 2025
@DomGarguilo DomGarguilo linked an issue Oct 20, 2025 that may be closed by this pull request
Contributor

@keith-turner keith-turner left a comment


There may be two scenarios where this will not work as expected.

First, when reading duplicate keys from multiple files, they may not always come back in the same order. If 100 duplicate keys are spread across 3 files and 50 are read, then coming back and skipping 50 may not skip the same 50 values. The keys from multiple files are placed in a priority queue, which has no guaranteed order for items with the same priority. If files are added or removed between scan batches, that would probably greatly increase the chance of the duplicate keys having a different scan order.
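
A toy illustration of that ordering concern (hypothetical `Entry` record, not the tserver merge code): a `PriorityQueue` makes no ordering promise for elements that compare as equal, so "skip the first N duplicates of this key" is not well defined across separate merge passes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class EqualKeyMergeOrder {
    record Entry(String key, String sourceFile) {}

    public static void main(String[] args) {
        // The comparator only looks at the key, so all three entries are "equal"
        // and the queue may emit them in any order.
        var cmp = Comparator.comparing(Entry::key);

        var pass1 = new PriorityQueue<>(cmp);
        pass1.addAll(List.of(new Entry("row1", "f1"), new Entry("row1", "f2"), new Entry("row1", "f3")));

        var pass2 = new PriorityQueue<>(cmp);
        pass2.addAll(List.of(new Entry("row1", "f3"), new Entry("row1", "f1"), new Entry("row1", "f2")));

        // Draining each queue may yield the duplicates in different orders, so
        // "resume after the 2nd duplicate" can skip different values each time.
        while (!pass1.isEmpty()) System.out.print(pass1.poll().sourceFile() + " ");
        System.out.println();
        while (!pass2.isEmpty()) System.out.print(pass2.poll().sourceFile() + " ");
    }
}
```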

Second, this change stores state on the server side. If a scan reads 50 of 100 duplicate keys from tserver A and tserver A then fails, the scan will skip the other 50 when it resumes on another tserver.

One possible solution is to change server-side scan batching so that it never ends a batch on a duplicate key, or the scan fails if it would have to. For example, we could let the batch grow up to 2x or 3x the batch size when duplicate keys are present (configurable via a new setting), and if it still fills up the scan fails with an error. This way the scan either gets all of the duplicate keys or it fails, instead of sometimes returning incorrect data for this case.
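
A rough sketch of that first idea, under made-up names (`nextBatch`, `maxGrowthFactor`, `BatchOverflowException`) and a list-backed source for simplicity; this is not the actual server-side batching code:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DuplicateAwareBatcher {

    static class BatchOverflowException extends RuntimeException {
        BatchOverflowException(String msg) { super(msg); }
    }

    /**
     * Returns the next batch starting at 'start'. The batch normally holds batchSize
     * entries, but it keeps growing, up to batchSize * maxGrowthFactor, rather than
     * cut in the middle of a run of entries that share one key. If a run of
     * duplicates still does not fit, fail instead of silently dropping data later.
     */
    static List<Map.Entry<String, String>> nextBatch(List<Map.Entry<String, String>> sorted,
            int start, int batchSize, int maxGrowthFactor) {
        int hardLimit = batchSize * maxGrowthFactor;
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        for (int i = start; i < sorted.size(); i++) {
            Map.Entry<String, String> e = sorted.get(i);
            boolean continuesRun = !batch.isEmpty()
                && batch.get(batch.size() - 1).getKey().equals(e.getKey());
            if (batch.size() >= batchSize && !continuesRun) {
                return batch; // safe cut point: the next entry starts a new key
            }
            if (batch.size() >= hardLimit) {
                throw new BatchOverflowException(
                    "run of duplicate keys exceeds " + hardLimit + " entries");
            }
            batch.add(e);
        }
        return batch;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> data = new ArrayList<>();
        for (int i = 0; i < 5; i++) data.add(new SimpleEntry<>("row1", "v" + i));
        data.add(new SimpleEntry<>("row2", "v5"));
        // batchSize=2 would normally cut inside the "row1" run; the batch grows to 5.
        System.out.println(nextBatch(data, 0, 2, 3).size());
        // With a hard limit of 4 the duplicate run does not fit, so the scan fails.
        try {
            nextBatch(data, 0, 2, 2);
        } catch (BatchOverflowException ex) {
            System.out.println("failed as intended: " + ex.getMessage());
        }
    }
}
```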

Another possible solution is to sort data on the value when everything in the keys is equal. Then scans could continue based on key+value instead of just key. However, persisted data is not currently sorted this way, so this would be a really big change to code, data, and user creation of rfiles. It offers the advantage of not having to buffer all duplicate keys on the server side.
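
For illustration of the key+value idea (toy types, not persisted RFile ordering): if entries were ordered by key and then value, a resume point could be expressed as a key+value pair, so skipping past the last returned entry would not discard other duplicates of that key.

```java
import java.util.Comparator;
import java.util.List;

public class KeyValueOrdering {
    record Entry(String key, String value) {}

    // Order by key, then by value, so duplicates of a key have a deterministic order.
    static final Comparator<Entry> ORDER =
        Comparator.comparing(Entry::key).thenComparing(Entry::value);

    // Resume strictly after the last returned (key, value) pair.
    static List<Entry> resumeAfter(List<Entry> sorted, Entry lastReturned) {
        return sorted.stream().filter(e -> ORDER.compare(e, lastReturned) > 0).toList();
    }

    public static void main(String[] args) {
        var sorted = List.of(new Entry("row1", "a"), new Entry("row1", "b"), new Entry("row1", "c"));
        // Resuming after ("row1", "a") keeps the remaining duplicates of "row1".
        System.out.println(resumeAfter(sorted, new Entry("row1", "a")));
    }
}
```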

@keith-turner
Contributor

Another thing to consider is that not all keys on the server side are MemKeys. Persisted data is not made of MemKeys; only recently written data stored in the in-memory maps is. The purpose of the count on MemKeys is to avoid reading data written after a scan started.
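
A toy sketch of that purpose (hypothetical names, not the real in-memory map or MemKey classes): each write gets a monotonically increasing count, and a scan only sees entries whose count is at or below the count captured when the scan started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class KvCountSketch {
    // A key plus the count assigned when it was written (roughly the role kvCount plays).
    record CountedKey(String key, String value, int kvCount) {}

    static final AtomicInteger nextCount = new AtomicInteger();
    static final List<CountedKey> map = new ArrayList<>();

    static void write(String key, String value) {
        map.add(new CountedKey(key, value, nextCount.incrementAndGet()));
    }

    // A scan captures the current count when it starts and ignores later writes.
    static List<CountedKey> scan(int startCount) {
        return map.stream().filter(e -> e.kvCount() <= startCount).toList();
    }

    public static void main(String[] args) {
        write("row1", "a");
        write("row1", "b");          // duplicate key, distinguishable only by its count
        int scanStart = nextCount.get();
        write("row1", "c");          // written after the scan started
        System.out.println(scan(scanStart)); // sees a and b, not c
    }
}
```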

@DomGarguilo DomGarguilo changed the title Preserve duplicate keys for scan resumptions WIP - Preserve duplicate keys for scan resumptions Dec 10, 2025


Development

Successfully merging this pull request may close these issues.

Scanning a table without the versioning iterator can drop keys
