Conversation

@yuanjieding-db
Collaborator

What changes are proposed in this pull request?

WHAT

  • Extend the retry function with a new max_attempts parameter, allowing clients to stop retrying and fail after a set number of attempts
  • Remove 500 from the FilesExt retryable status codes
  • Add a new config option to set the retry attempt limit for FilesExt
  • Update the FilesExt retry logic to fail after that many attempts

WHY

  • 500 errors shouldn't be retried
  • FilesExt should always prioritize fallback over retries to avoid regressions
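
The combined limit logic can be sketched as follows. This is illustrative only: the decorator shape, backoff parameters, and `is_retryable` hook are assumptions rather than the SDK's actual implementation; only the `max_attempts` parameter and the `TimeoutError` behavior come from this PR.

```python
import functools
import random
import time


def retried(*, timeout=300.0, max_attempts=None, is_retryable=None):
    """Sketch of a retry decorator that stops at whichever limit is hit
    first: the wall-clock timeout or the max_attempts budget."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            attempt = 0
            last_err = None
            while time.monotonic() < deadline:
                attempt += 1
                try:
                    return func(*args, **kwargs)
                except Exception as err:
                    if is_retryable is not None and not is_retryable(err):
                        raise  # non-retryable errors propagate immediately
                    last_err = err
                # Stop once the attempt budget is exhausted.
                if max_attempts is not None and attempt >= max_attempts:
                    raise TimeoutError(
                        f"Exceeded max retry attempts ({max_attempts})"
                    ) from last_err
                # Exponential backoff with jitter, capped at 10 seconds
                # (illustrative values).
                time.sleep(min(10.0, 2.0 ** (attempt - 1)) + random.random())
            raise TimeoutError(f"Timed out after {timeout}s") from last_err

        return wrapper

    return decorator
```

With `max_attempts=None` the decorator behaves as before (timeout only), which is why the new parameter is backward compatible.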

How is this tested?

Unit tests were updated to reflect the change.

Comment on lines 236 to 239
# Maximum number of retry attempts for FilesExt cloud API operations.
# This works in conjunction with retry_timeout_seconds - whichever limit
# is hit first will stop the retry loop.
files_ext_cloud_api_max_retries: int = 3
Contributor

Maybe we can use this as a temporary parameter, since our end goal is not to fall back.

Suggested change
# Maximum number of retry attempts for FilesExt cloud API operations.
# This works in conjunction with retry_timeout_seconds - whichever limit
# is hit first will stop the retry loop.
files_ext_cloud_api_max_retries: int = 3
# Maximum number of retry attempts for FilesExt cloud API operations.
# This works in conjunction with retry_timeout_seconds - whichever limit
# is hit first will stop the retry loop.
experimental_files_ext_cloud_api_max_retries: int = 3

Collaborator Author

Updated


# Determine which limit was hit
if max_attempts is not None and attempt > max_attempts:
    raise TimeoutError(f"Exceeded max retry attempts ({max_attempts})") from last_err
Contributor

Do we have a better error to represent this error? TimeoutError feels a bit odd for this case, as the function is not actually timed out.

Collaborator Author

Ideally we would use a custom type RetryError, with TimeoutError and MaxRetryExceededError as its derived types, so that users who don't care why retries were exhausted can catch RetryError while the detail is preserved.
However, since we have been using the built-in TimeoutError and users may already depend on this behavior, it is risky to change it to a different error.
If we introduced a new error type for the max-retry-exceeded case, it would be harder for the upper layers to handle retry errors: they would need to catch both errors manually.

I don't see a better solution here, unless we rewrite the retry logic completely or make FilesExt use a different retry library.
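
For reference, the hierarchy discussed here could look like the sketch below (hypothetical, not part of this PR). Deriving the timeout variant from the built-in TimeoutError would even keep existing `except TimeoutError:` handlers working:

```python
class RetryError(Exception):
    """Base class for retry exhaustion: catch this if you don't care
    why retries stopped."""


class RetryTimeoutError(RetryError, TimeoutError):
    """Retry loop exceeded the wall-clock timeout.

    Also derives from the built-in TimeoutError, so existing callers
    that catch TimeoutError keep working unchanged.
    """


class MaxRetryExceededError(RetryError):
    """Retry loop exceeded the max_attempts budget."""
```

The trade-off remains as described above: callers that want to distinguish the two cases must still know both concrete types.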

Comment on lines 334 to 337
def failing_function():
    nonlocal call_count
    call_count += 1
    raise ValueError("test error")
Contributor

Can we make this failing function sleep for a second? I am a bit worried that this test will become flaky, because it is possible to retry 100 times in 2 seconds.

Collaborator Author

It wouldn't because of the backoff logic, right?
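
As a sanity check on that claim: assuming exponential backoff that starts at 1 s and doubles per attempt (illustrative parameters; the SDK's actual backoff may differ), the minimum total sleep grows fast enough that 100 attempts in 2 seconds is impossible:

```python
def min_total_sleep(attempts, cap=10.0):
    """Minimum seconds slept before the given attempt count is reached,
    assuming delays of 1, 2, 4, ... seconds capped at `cap`."""
    return sum(min(cap, 2.0 ** i) for i in range(attempts - 1))


print(min_total_sleep(3))    # 3.0   (1s + 2s)
print(min_total_sleep(5))    # 15.0  (1 + 2 + 4 + 8)
print(min_total_sleep(100))  # 965.0 once the 10s cap dominates
```

Even five attempts already require at least 15 s of sleep under these assumptions, so the 2-second scenario cannot occur with backoff in place.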

Comment on lines 372 to 373
def test_max_attempts_none_preserves_backward_compatibility():
    """Test that max_attempts=None only uses timeout (backward compatibility)."""
Contributor

Suggested change
def test_max_attempts_none_preserves_backward_compatibility():
    """Test that max_attempts=None only uses timeout (backward compatibility)."""
def test_max_attempts_none():
    """Test that max_attempts=None only uses timeout."""

I think we don't need to mention that this test is for backward compatibility because, almost always, a unit test is for regression catching.

Collaborator Author

Updated

    assert call_count == attempts


def test_max_attempts_respected():
Contributor

Can we make these a table test to simplify the tests?

with pytest.raises(TimeoutError) as exc_info:
    failing_function()

# Should have attempted 3 times (initial + 2 retries)
Contributor

Suggested change
# Should have attempted 3 times (initial + 2 retries)
# Should have attempted 3 times (initial + 2 retries).

Period after a sentence, ditto all.

Collaborator Author

Updated.

@github-actions

github-actions bot commented Feb 2, 2026

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1211
  • Commit SHA: 2cfa627d0cb5ba41bb17e33fb23985d1dda4cd01

Checks will be approved automatically on success.
