Skip to content

Conversation

@samikshya-db
Copy link
Collaborator

@samikshya-db samikshya-db commented Dec 29, 2025

Description

  • 4 things that is improved with respect to telemetry :
    • Common object mapper across telemetry use-case (This is already thread safe and is expensive to create, i.e., good tor re-use)
    • MakeflushIntervalMillis config same across both telemetry clients (un-auth and auth)
    • Clear connection param cache when connection is closed : this was a memory leak before
    • Rather than creating a scheduledExecutor for each telemetry client, we share it across a factory.

Testing

  • unit tests

Additional Notes to the Reviewer

  • When the LAST connection to a host is closed, all pending telemetry events for that host are flushed across all prior connections (since they all shared the same TelemetryClient). i.e., If you have 5 connections to host-A, closing connections 1-4 does nothing (just decrements refCount). Only when you close connection 5 (the last one) does the flush occur, sending all accumulated telemetry from all 5 connections.

NO_CHANGELOG=true

@samikshya-db samikshya-db changed the title [PECOBLR-1408] [PECOBLR-1407] Address telemetry code audit comments- part1 [PECOBLR-1408] [PECOBLR-1407] [PECOBLR-1409] Address telemetry code audit comments- part1 Dec 29, 2025
@samikshya-db samikshya-db changed the title [PECOBLR-1408] [PECOBLR-1407] [PECOBLR-1409] Address telemetry code audit comments- part1 [PECOBLR-1408] [PECOBLR-1407] [PECOBLR-1409][PECOBLR-1411] Address telemetry code audit comments- part1 Dec 30, 2025
Copy link
Collaborator

@tejassp-db tejassp-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connection ref count never decrements to zero. Should be fixed.

*/
public TelemetryCollector getOrCreateCollector(IDatabricksConnectionContext context) {
String key =
(context != null && context.getConnectionUuid() != null)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for the context or connection uuid to be null? If both are always non-null, then we should enforce that in the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am skeptical of removing this because of ThreadLocal usage. Check the internal outage here : [Link]

this.executorService = executorService;
this.scheduledExecutorService =
Executors.newSingleThreadScheduledExecutor(createSchedulerThreadFactory());
this.scheduledExecutorService = scheduledExecutorService;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a flush is slow, does it block flush for other statements?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding :
I schedule periodic flushes in telemetry client : this is shared across all connections in the process. This submits flush task to executor service (Added a 10 thread case - link).

This should not be blocking other statements.

@samikshya-db
Copy link
Collaborator Author

@tejassp-db - the fixes for Connection ref count never decrements to zero is merged too. thanks for the review.

key,
(k, holder) -> {
if (holder.refCount.get() <= 1) {
holder.connectionUuids.remove(connectionUuid);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this scenario probable

  1. ThreadA - It removes a connection and if branch goes through with empty condition being true.
  2. ThreadB - Adds a new connection ConnectionNew to the host
  3. ThreadA - Now operates under the incorrect assumption that there are no connections to the host. And will publish incomplete ConnectionNew details and truncate it as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[We have already discussed this offline, but adding the pointers for future reference]

The above scenario is not possible. create a connection-> every kind of export will be from TelemetryHelper -> which will in turn get the telemetry client from TelemetryClientFactory.

telemetryClientHolders is a ConcurrentHashMap in the telemetry client factory
ThreadA - It removes a connection and if branch goes through with empty condition being true. -> closes client in sync fashion
ThreadB add a new connection - telemetryClientHolders.compute will not start as threadA has the lock.

@tejassp-db has suggested that we document this behavior in comments. Will add comments to clarify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants