-
Notifications
You must be signed in to change notification settings - Fork 941
Define an Entity Merge algorithm (from OTEP #264) #4768
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| - Remove any attribute from `Attributes` which exists in either the | ||
| description or identity of an entity in `E`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This step is unclear to me. Does it mean that any attributes of the new_entity can potentially drop attributes from other existing entities in E? Shouldn't we make resolving any cross-entity attribute conflicts a requirement for adding an entity to a resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more that they override them.
Remember, in the datamodel, attributes are a separate thing from entities vs. in OTLP. So what this says, is in the datamodel, you remove it from the "loose" attributes, because it exists in the new entity's attributes. The new entity will provide a new value that lands in OTLP resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this deserves a clarifying note, as I think most readers (for better or worse) are more familiar with OTLP than the data model document, and assume data model == OTLP. Consider:
Note: For cases where Entity Attributes are encoded as references to Resource Attributes, as is the case for OTLP, this implies Entity Attributes take precedence over any Resource Attributes which are not associated with an Entity.
Even though entities in the data model have their own set of attributes, I still agree that entity conflict resolution needs to be a part of the merge process. E.g. if two entities claim to have detected different values for the same key, it seems like that still needs to be resolved when merging entities into a resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dashpole Ah, yes, we need to add a new mechanism to deconflict entities that have the same attributes. We don't expect this to be allowed in semconv, but it's possible if this is open.
Happy to add a clarifying note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion from SIG -
Principle - preserve as much information as possible
Actions
- Define "keep highest priority entity on key conflicts, drop entity ref of lower priority entity (if identity attribute), just the attribute (if descriptive)"
- Define
dropped_attributes_countonEntityRef - Define
dropped_entities_countonResource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've updated the specification here. I think dropped_*_count is a follow PR or set of actions.
| resource. | ||
|
|
||
| - Construct a set of existing entities on the resource, `E`. | ||
| - For each entity, `new_entity`, in priority order (highest first), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does an SDK determine the relative priority of entities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resource detection configuration
specification/resource/data-model.md
Outdated
| - For each entity, `new_entity`, in priority order (highest first), | ||
| do one of the following: | ||
| - If an entity `e'` exsits in `E` with the same entity type as `new_entity`: | ||
| - Perform a [Entity DataModel Merge](../entities/data-model.md#merging-of-entities), if applicable, otherwise ignore `new_entity`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if applicable
nit: maybe rephrase to "if they are mergeable"? If applicable confused me here.
otherwise ignore
new_entity.
Do we consider this an error (e.g. surfaced using the SDK error handler or logged as a warning in the collector)? This makes it sound like these cases should be silently ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is considered to be an error case (or at least a misconfiguration). I think the term "ignore" just relates to what the SDK should construct in order to emit telemetry. Logging an error is also acceptable
| - Remove any attribute from `Attributes` which exists in either the | ||
| description or identity of an entity in `E`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this deserves a clarifying note, as I think most readers (for better or worse) are more familiar with OTLP than the data model document, and assume data model == OTLP. Consider:
Note: For cases where Entity Attributes are encoded as references to Resource Attributes, as is the case for OTLP, this implies Entity Attributes take precedence over any Resource Attributes which are not associated with an Entity.
Even though entities in the data model have their own set of attributes, I still agree that entity conflict resolution needs to be a part of the merge process. E.g. if two entities claim to have detected different values for the same key, it seems like that still needs to be resolved when merging entities into a resource.
3374dd3 to
de9dd47
Compare
| [Attribute Referencing Model](../entities/data-model.md#attribute-referencing-model)). | ||
| - If, for all entities, there are now overlapping attribute keys, then nothing | ||
| is needed. | ||
| - If there is a conflict where two entities use the same attribute key, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From Entities SIG - Should this be a failure?
We could require attributes can only be owned by one entity and make this be a runtime failure in SDKs + Collector to prevent this type of modelling.
Changes
Adds the merge algorithm specified in OTEP#264 to the Resource + Entity data model specification (both still in development).
This merge algorithm has been used in Entity prototypes for the past year and has not seen any change. It's been adaptable enough to handle several re-architectures of SDKs through prototyping, so we're confident in pushing it forward into the specification.
A few caveats:
Link to prototypes: