Skip to content

Commit 6495c32

Browse files
authored
fix(container-runtime): Distinguish "Duplicate DataStore Id" errors between permanently corrupted op stream v. single-client one-time failure (#26021)
Today we throw a DataCorruption error with message "Duplicate DataStore created with existing id" if we process a DataStore attach op but already have that ID (not counting when it's our own attach op). However, if we have that ID for an unbound DataStore - meaning, we created it locally and it doesn't exist anywhere else - then as long as we crash the container without trying to attach our copy, this case will not result in a corrupted container. We have seen this play out in cases where a partner had a bug that violated protocol and returned an existing container when the client thought it was creating a new container. In that case (or any such case), we want to properly represent the impact of this error case - it's not permanent DataCorruption, subsequent sessions succeed.
1 parent 9daa8f8 commit 6495c32

File tree

4 files changed

+316
-16
lines changed

4 files changed

+316
-16
lines changed

packages/runtime/container-runtime/src/channelCollection.ts

Lines changed: 26 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -492,9 +492,33 @@ export class ChannelCollection
492492
continue;
493493
}
494494

495-
// If a non-local operation then go and create the object, otherwise mark it as officially attached.
495+
// Check for collision with local (not yet live / known to other clients) DataStore
496+
// This is not a DataCorruption case if we crash the container before the DataStore becomes visible to others (it's a DataProcessingError instead)
497+
//
498+
// POSSIBLE CAUSES:
499+
// - Something with ID creation, e.g. a bug in shortID logic, or somehow a generated ID matches an existing alias.
500+
// - An invalid operation by the application or service where an existing container is returned to a new container attach call,
501+
// resulting in duplicate accounting for objects that were supposed to be local-only. e.g. if the application patches in custom
502+
// logic not supported by Fluid's API.
503+
if (this.contexts.getUnbound(attachMessage.id) !== undefined) {
504+
const error = DataProcessingError.create(
505+
"Local DataStore matches remote DataStore id",
506+
"DataStoreAttach",
507+
envelope,
508+
{ ...tagCodeArtifacts({ dataStoreId: attachMessage.id }) },
509+
);
510+
throw error;
511+
}
512+
513+
// Check for collision with already processed (attaching/attached or aliased) DataStore
514+
// This is presumed to indicate a corrupted op stream, where we'd expect all future sessions to fail here too.
515+
//
516+
// POSSIBLE CAUSES:
517+
// - A bug in the service or driver that results in ops being duplicated
518+
// - Similar to above, an existing container being returned to a new container attach call,
519+
// where the DataStore in question was already made locally visible before container attach.
520+
// (Perhaps future sessions would not fail in this case, but it's hypothetical and hard to differentiate)
496521
if (this.alreadyProcessed(attachMessage.id)) {
497-
// TODO: dataStoreId may require a different tag from PackageData #7488
498522
const error = new DataCorruptionError(
499523
// pre-0.58 error message: duplicateDataStoreCreatedWithExistingId
500524
"Duplicate DataStore created with existing id",

packages/runtime/container-runtime/src/dataStoreContexts.ts

Lines changed: 56 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -13,27 +13,41 @@ import {
1313
import type { FluidDataStoreContext, LocalFluidDataStoreContext } from "./dataStoreContext.js";
1414

1515
/**
16+
* Manages the collection of data store contexts, tracking their bound/unbound state.
17+
*
18+
* @remarks
19+
* A context is "unbound" when it's created locally but not yet made visible (reachable from root).
20+
* A context is "bound" once it's made locally visible, regardless of the Container's attach state.
21+
* In attached containers, binding a context immediately sends an attach op and transitions it to Attaching state.
22+
*
1623
* @internal
1724
*/
1825
export class DataStoreContexts
1926
implements Iterable<[string, FluidDataStoreContext]>, IDisposable
2027
{
28+
/**
29+
* Set of IDs for contexts that are unbound (not yet made locally visible).
30+
* These contexts exist locally but aren't known to other clients (even in an attached container).
31+
*/
2132
private readonly notBoundContexts = new Set<string>();
2233

2334
/**
24-
* Attached and loaded context proxies
35+
* Map of all data store contexts (both bound and unbound).
2536
*/
2637
private readonly _contexts = new Map<string, FluidDataStoreContext>();
2738

2839
/**
2940
* List of pending context waiting either to be bound or to arrive from another client.
3041
* This covers the case where a local context has been created but not yet bound,
31-
* or the case where a client knows a store will exist and is waiting on its creation,
42+
* or the case where a client knows a store will exist (e.g. by alias) and is waiting on its creation,
3243
* so that a caller may await the deferred's promise until such a time as the context is fully ready.
3344
* This is a superset of _contexts, since contexts remain here once the Deferred resolves.
3445
*/
3546
private readonly deferredContexts = new Map<string, Deferred<FluidDataStoreContext>>();
3647

48+
/**
49+
* Lazy disposal logic that disposes all contexts when called.
50+
*/
3751
// eslint-disable-next-line unicorn/consistent-function-scoping -- Property is defined once; no need to extract inner lambda
3852
private readonly disposeOnce = new Lazy<void>(() => {
3953
// close/stop all store contexts
@@ -73,22 +87,39 @@ export class DataStoreContexts
7387
}
7488
public readonly dispose = (): void => this.disposeOnce.value;
7589

90+
/**
91+
* Returns the count of unbound contexts (i.e. local-only on this client)
92+
*/
7693
public notBoundLength(): number {
7794
return this.notBoundContexts.size;
7895
}
7996

97+
/**
98+
* Returns true if the given ID corresponds to an unbound context. (i.e. local-only on this client)
99+
*/
80100
public isNotBound(id: string): boolean {
81101
return this.notBoundContexts.has(id);
82102
}
83103

104+
/**
105+
* Returns true if a context with the given ID exists (bound or unbound).
106+
*/
84107
public has(id: string): boolean {
85108
return this._contexts.has(id);
86109
}
87110

111+
/**
112+
* Returns the context with the given ID, or undefined if not found.
113+
* This returns both bound and unbound contexts.
114+
*/
88115
public get(id: string): FluidDataStoreContext | undefined {
89116
return this._contexts.get(id);
90117
}
91118

119+
/**
120+
* Deletes the context with the given ID from all internal maps.
121+
* @returns True if the context was found and deleted, false otherwise.
122+
*/
92123
public delete(id: string): boolean {
93124
this.deferredContexts.delete(id);
94125
this.notBoundContexts.delete(id);
@@ -100,16 +131,24 @@ export class DataStoreContexts
100131
return this._contexts.delete(id);
101132
}
102133

134+
/**
135+
* Map of recently deleted contexts for diagnostic purposes for GC.
136+
* Allows retrieval of context information even after deletion for logging/telemetry.
137+
*/
103138
private readonly _recentlyDeletedContexts: Map<string, FluidDataStoreContext | undefined> =
104139
new Map();
105140

141+
/**
142+
* Returns a recently deleted context by ID, or undefined if not found.
143+
* Used for diagnostic logging for GC, when a deleted context is referenced.
144+
*/
106145
public getRecentlyDeletedContext(id: string): FluidDataStoreContext | undefined {
107146
return this._recentlyDeletedContexts.get(id);
108147
}
109148

110149
/**
111-
* Return the unbound local context with the given id,
112-
* or undefined if it's not found or not unbound.
150+
* Returns the unbound local context with the given ID.
151+
* @returns The unbound context, or undefined if not found or not unbound.
113152
*/
114153
public getUnbound(id: string): LocalFluidDataStoreContext | undefined {
115154
const context = this._contexts.get(id);
@@ -121,7 +160,8 @@ export class DataStoreContexts
121160
}
122161

123162
/**
124-
* Add the given context, marking it as to-be-bound
163+
* Adds the given context to the collection, marking it as unbound (not yet locally visible).
164+
* Asserts that no context with this ID already exists.
125165
*/
126166
public addUnbound(context: LocalFluidDataStoreContext): void {
127167
const id = context.id;
@@ -152,6 +192,10 @@ export class DataStoreContexts
152192
return deferredContext.promise;
153193
}
154194

195+
/**
196+
* Gets or creates a deferred promise for the given context ID.
197+
* Used to allow waiting for contexts that don't exist yet.
198+
*/
155199
private ensureDeferred(id: string): Deferred<FluidDataStoreContext> {
156200
const deferred = this.deferredContexts.get(id);
157201
if (deferred) {
@@ -164,7 +208,8 @@ export class DataStoreContexts
164208
}
165209

166210
/**
167-
* Update this context as bound
211+
* Marks the context with the given ID as bound (locally visible).
212+
* Removes it from the unbound set and resolves its deferred promise.
168213
*/
169214
public bind(id: string): void {
170215
const removed: boolean = this.notBoundContexts.delete(id);
@@ -191,9 +236,11 @@ export class DataStoreContexts
191236
}
192237

193238
/**
194-
* Add the given context, marking it as not local-only.
195-
* This could be because it's a local context that's been bound, or because it's a remote context.
196-
* @param context - The context to add
239+
* Adds the given context to the collection as already bound or from a remote client.
240+
* This is used when:
241+
* - Adding a local context that's already been bound via the bind() method, OR
242+
* - Adding a remote context that was created by another client.
243+
* The context's deferred promise is resolved immediately.
197244
*/
198245
public addBoundOrRemoted(context: FluidDataStoreContext): void {
199246
const id = context.id;

0 commit comments

Comments
 (0)