-
Notifications
You must be signed in to change notification settings - Fork 27
[PECOBLR-1408] [PECOBLR-1407] [PECOBLR-1409][PECOBLR-1411] Address telemetry code audit comments- part1 #1163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[PECOBLR-1408] [PECOBLR-1407] [PECOBLR-1409][PECOBLR-1411] Address telemetry code audit comments- part1 #1163
Conversation
tejassp-db
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connection ref count never decrements to zero. Should be fixed.
src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowResultChunk.java
Outdated
Show resolved
Hide resolved
| */ | ||
| public TelemetryCollector getOrCreateCollector(IDatabricksConnectionContext context) { | ||
| String key = | ||
| (context != null && context.getConnectionUuid() != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for the context or connection uuid to be null? If both are always non-null, then we should enforce that in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am skeptical of removing this because of ThreadLocal usage. Check the internal outage here : [Link]
| this.executorService = executorService; | ||
| this.scheduledExecutorService = | ||
| Executors.newSingleThreadScheduledExecutor(createSchedulerThreadFactory()); | ||
| this.scheduledExecutorService = scheduledExecutorService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a flush is slow, does it block flush for other statements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding :
I schedule periodic flushes in telemetry client : this is shared across all connections in the process. This submits flush task to executor service (Added a 10 thread case - link).
This should not be blocking other statements.
|
@tejassp-db - the fixes for |
| key, | ||
| (k, holder) -> { | ||
| if (holder.refCount.get() <= 1) { | ||
| holder.connectionUuids.remove(connectionUuid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this scenario probable
- ThreadA - It removes a connection and if branch goes through with empty condition being true.
- ThreadB - Adds a new connection ConnectionNew to the host
- ThreadA - Now operates under the incorrect assumption that there are no connections to the host. And will publish incomplete ConnectionNew details and truncate it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[We have already discussed this offline, but adding the pointers for future reference]
The above scenario is not possible. create a connection-> every kind of export will be from TelemetryHelper -> which will in turn get the telemetry client from TelemetryClientFactory.
telemetryClientHolders is a ConcurrentHashMap in the telemetry client factory
ThreadA - It removes a connection and if branch goes through with empty condition being true. -> closes client in sync fashion
ThreadB add a new connection - telemetryClientHolders.compute will not start as threadA has the lock.
@tejassp-db has suggested that we document this behavior in comments. Will add comments to clarify.
Description
flushIntervalMillisconfig same across both telemetry clients (un-auth and auth)Testing
Additional Notes to the Reviewer
host-A, closing connections 1-4 does nothing (just decrements refCount). Only when you close connection 5 (the last one) does the flush occur, sending all accumulated telemetry from all 5 connections.NO_CHANGELOG=true