[telemetry] Add retry logic for partial telemetry push failures #1224

Open
akgitrepos wants to merge 3 commits into databricks:main from akgitrepos:telemetry-retry-fix

@akgitrepos

Description

Implements retry logic for partial telemetry push failures in TelemetryPushClient. When the telemetry service returns a partial success response (numProtoSuccess < totalEvents), the client now automatically retries the push up to 3 times with exponential backoff (1s → 2s → 4s, capped at 10s).

This addresses the TODO comment in TelemetryPushClient.java.
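The backoff schedule described above can be computed as in the following minimal sketch. It is not code from this PR: `TelemetryBackoff`, `delayMillis`, and `schedule` are hypothetical names, and the sketch assumes the delay before the n-th retry (0-based) is 1s × 2^n, capped at 10s.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the PR): exponential backoff starting at 1s,
// doubling before each retry, capped at 10s.
final class TelemetryBackoff {
    static final long INITIAL_DELAY_MS = 1_000L;
    static final long MAX_DELAY_MS = 10_000L;

    private TelemetryBackoff() {}

    /** Delay in milliseconds before retry number {@code retry} (0-based). */
    static long delayMillis(int retry) {
        if (retry < 0) {
            throw new IllegalArgumentException("retry must be >= 0");
        }
        // Clamp the shift so large retry counts cannot overflow the long.
        int shift = Math.min(retry, 20);
        return Math.min(INITIAL_DELAY_MS << shift, MAX_DELAY_MS);
    }

    /** All delays used for a given maxRetries, in order. */
    static List<Long> schedule(int maxRetries) {
        List<Long> delays = new ArrayList<>();
        for (int retry = 0; retry < maxRetries; retry++) {
            delays.add(delayMillis(retry));
        }
        return delays;
    }
}
```

With the default of 3 retries the cap is never reached (1s, 2s, 4s); it only takes effect if maxRetries is configured higher, e.g. `schedule(5)` gives 1s, 2s, 4s, 8s, 10s.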

Changes Made

  • Added configurable maxRetries parameter (default: 3)
  • Implemented exponential backoff strategy (aligned with DatabricksHttpRetryHandler)
  • Added unit tests for retry behavior
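A minimal sketch of the retry loop these changes describe, assuming each push attempt reports numProtoSuccess and the total event count. `PartialSuccessRetry`, `PushResult`, `Pusher`, and the injectable sleeper are hypothetical names for illustration; the actual TelemetryPushClient code is not shown in this PR.

```java
import java.util.function.LongConsumer;

// Hypothetical sketch (not the PR's code): retry a telemetry push while the
// service reports partial success, i.e. numProtoSuccess < totalEvents.
final class PartialSuccessRetry {

    /** Outcome of one push attempt. */
    static final class PushResult {
        final int numProtoSuccess;
        final int totalEvents;

        PushResult(int numProtoSuccess, int totalEvents) {
            this.numProtoSuccess = numProtoSuccess;
            this.totalEvents = totalEvents;
        }

        boolean isPartial() {
            return numProtoSuccess < totalEvents;
        }
    }

    /** Sends the whole batch once and reports how many events succeeded. */
    interface Pusher {
        PushResult push();
    }

    private final int maxRetries;
    private final LongConsumer sleeper; // injectable so tests avoid real sleeps

    PartialSuccessRetry(int maxRetries, LongConsumer sleeper) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.maxRetries = maxRetries;
        this.sleeper = sleeper;
    }

    /** One initial push, then up to maxRetries retries while the result is partial. */
    PushResult pushWithRetry(Pusher pusher) {
        PushResult result = pusher.push();
        for (int retry = 0; retry < maxRetries && result.isPartial(); retry++) {
            // 1s, 2s, 4s, ... capped at 10s
            long delayMs = Math.min(1_000L << Math.min(retry, 20), 10_000L);
            sleeper.accept(delayMs);
            result = pusher.push(); // resends every event, not only the failed ones
        }
        return result;
    }
}
```

With maxRetries = 3, a batch that stays partial is pushed at most 4 times (1 initial attempt plus 3 retries). Injecting the sleeper lets tests in the spirit of pushEvent_noRetryOnFullSuccess and pushEvent_retriesOnPartialSuccess run without real delays; whether the PR's own tests do this is not stated.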

Testing

  • Added 2 new unit tests:
    • pushEvent_noRetryOnFullSuccess - Verifies no retry when all events succeed
    • pushEvent_retriesOnPartialSuccess - Verifies retry is triggered on partial failure
  • All 14 TelemetryPushClient tests pass

Additional Notes

  • Backward compatible: the existing constructor still works and uses the default retry count
  • Idempotent: each retry resends all events, including ones that already succeeded (safe for duplicates)
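Backward compatibility of this kind is usually achieved by keeping the old constructor and having it delegate to a new overload that takes the retry count. The sketch below assumes that pattern; `TelemetryPushClientSketch` and its `endpoint` argument are hypothetical placeholders, since the real constructor's parameters are not listed in this PR.

```java
// Hypothetical sketch of backward-compatible constructor overloading.
final class TelemetryPushClientSketch {
    static final int DEFAULT_MAX_RETRIES = 3;

    private final String endpoint; // placeholder for the real constructor's arguments
    private final int maxRetries;

    /** Existing signature: callers that do not pass maxRetries get the default. */
    TelemetryPushClientSketch(String endpoint) {
        this(endpoint, DEFAULT_MAX_RETRIES);
    }

    /** New overload that makes the retry count configurable. */
    TelemetryPushClientSketch(String endpoint, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.endpoint = endpoint;
        this.maxRetries = maxRetries;
    }

    String endpoint() { return endpoint; }

    int maxRetries() { return maxRetries; }
}
```

In this sketch, maxRetries = 0 means a single attempt with no retries, and existing call sites compile unchanged because the one-argument constructor is still present.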

Signed-off-by: Akshey Sigdel <sigdelakshey@gmail.com>

@akgitrepos (Author)

@gopalldb @vikrantpuppala Hi, I've implemented retry logic for partial telemetry push failures, which was one of the TODO items. Would appreciate a review. Thanks!
