fix: honoring read_timeout at the cosmos client level #44472
Conversation
/azp run python - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR implements client-level read timeout configuration for Azure Cosmos DB operations. The fix ensures that when a read timeout is specified at the client level, it automatically applies to all queries and operations unless explicitly overridden at the request level; a usage sketch follows the key changes below.
Key Changes
- Added read timeout handling in CosmosClient initialization to propagate client-level timeouts to the connection policy
- Extended container property getters to pass through read_timeout options from request level
- Added comprehensive test coverage for client-level, request-level, and policy-level timeout behaviors
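As a quick illustration of the intended behavior, here is a minimal usage sketch. The endpoint and key values are placeholders, and the client-level read_timeout keyword is the one this PR introduces:

```python
from azure.cosmos import CosmosClient

# Client-level read timeout (in seconds) becomes the default for all operations.
client = CosmosClient(
    url="https://myaccount.documents.azure.com:443/",  # placeholder endpoint
    credential="<account-key>",                        # placeholder credential
    read_timeout=5,
)
container = client.get_database_client("db").get_container_client("items")

# A request-level read_timeout overrides the client-level value for this call only.
item = container.read_item(item="item-id", partition_key="pk-value", read_timeout=30)
```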
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/azure/cosmos/cosmos_client.py | Added logic to extract and apply client-level read_timeout to ConnectionPolicy |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client.py | Added async version of client-level read_timeout handling |
| sdk/cosmos/azure-cosmos/azure/cosmos/container.py | Extended _get_properties_with_options to propagate read_timeout from options |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py | Extended async _get_properties_with_options to propagate read_timeout |
| sdk/cosmos/azure-cosmos/tests/test_crud.py | Added three test methods covering request-level override, client-level timeout, and policy-level timeout scenarios |
| sdk/cosmos/azure-cosmos/tests/test_crud_async.py | Added async versions of the three timeout test methods |
| print(f"test_crud_async got the client") | ||
| database = timeout_client.get_database_client(self.database_for_test.id) | ||
| container = database.get_container_client(normal_container.id) | ||
| print(f"test_crud_async about to execute read operation") |
Copilot AI (Dec 18, 2025):
Debug print statement should be removed before merging. This statement appears to be leftover debugging code.
| print(f"test_crud_async got the client") | |
| database = timeout_client.get_database_client(self.database_for_test.id) | |
| container = database.get_container_client(normal_container.id) | |
| print(f"test_crud_async about to execute read operation") | |
| database = timeout_client.get_database_client(self.database_for_test.id) | |
| container = database.get_container_client(normal_container.id) |
```python
timeout_client = cosmos_client.CosmosClient(
    url=self.host,
    credential=self.masterKey,
    read_timeout=0.001  # Very short timeout to force failure
)
database = timeout_client.get_database_client(self.databaseForTest.id)
container = database.get_container_client(normal_container.id)

# Test 1: Point read operation should time out
with self.assertRaises((exceptions.CosmosClientTimeoutError, ServiceResponseError)):
    container.read_item(
        item='test_item_0',
        partition_key='partition0'
    )

# Test 2: Query operation should time out
with self.assertRaises((exceptions.CosmosClientTimeoutError, ServiceResponseError)):
    list(container.query_items(
        query="SELECT * FROM c WHERE c.pk = @pk",
        parameters=[{"name": "@pk", "value": "partition0"}]
    ))
```
Copilot AI (Dec 18, 2025):
The timeout_client is not being properly closed, and the container is not being cleaned up. This could lead to resource leaks. Consider wrapping the client in a try-finally block to ensure proper cleanup, and add container deletion at the end.
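A minimal sketch of the cleanup this comment asks for, reusing the names from the quoted test above (the fixture API is assumed from that code, not confirmed elsewhere):

```python
timeout_client = cosmos_client.CosmosClient(
    url=self.host,
    credential=self.masterKey,
    read_timeout=0.001
)
try:
    # ... the timeout assertions from the test body go here ...
    pass
finally:
    # Best-effort cleanup so a failed assertion does not leak resources.
    try:
        self.databaseForTest.delete_container(normal_container)
    except Exception:
        pass
    timeout_client.close()
```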
```python
# Create client with very short read timeout
connection_policy = documents.ConnectionPolicy()
connection_policy.read_timeout = 0.001
```
Copilot AI (Dec 18, 2025):
The attribute name 'read_timeout' should be 'ReadTimeout' to match the naming convention used in line 1246 of test_crud_async.py and align with the ConnectionPolicy's property naming convention (ReadTimeout with capital R and T).
Suggested change:
```diff
-connection_policy.read_timeout = 0.001
+connection_policy.ReadTimeout = 0.001
```
| print(f"test_crud got the client") | ||
| database = timeout_client.get_database_client(self.databaseForTest.id) | ||
| container = database.get_container_client(normal_container.id) | ||
| print(f"test_crud about to execute read operation") |
Copilot AI (Dec 18, 2025):
Debug print statement should be removed before merging. This statement appears to be leftover debugging code.
| print(f"test_crud got the client") | |
| database = timeout_client.get_database_client(self.databaseForTest.id) | |
| container = database.get_container_client(normal_container.id) | |
| print(f"test_crud about to execute read operation") | |
| database = timeout_client.get_database_client(self.databaseForTest.id) | |
| container = database.get_container_client(normal_container.id) |
| print(f"test_crud got the client") | ||
| database = timeout_client.get_database_client(self.databaseForTest.id) | ||
| container = database.get_container_client(normal_container.id) | ||
| print(f"test_crud about to execute read operation") |
Copilot AI (Dec 18, 2025):
Debug print statement should be removed before merging. This statement appears to be leftover debugging code.
| print(f"test_crud got the client") | |
| database = timeout_client.get_database_client(self.databaseForTest.id) | |
| container = database.get_container_client(normal_container.id) | |
| print(f"test_crud about to execute read operation") | |
| database = timeout_client.get_database_client(self.databaseForTest.id) | |
| container = database.get_container_client(normal_container.id) |
| print(f"test_crud_async got the client") | ||
| database = timeout_client.get_database_client(self.database_for_test.id) | ||
| container = database.get_container_client(normal_container.id) | ||
| print(f"test_crud_async about to execute read operation") |
Copilot AI (Dec 18, 2025):
Debug print statement should be removed before merging. This statement appears to be leftover debugging code.
| print(f"test_crud_async got the client") | |
| database = timeout_client.get_database_client(self.database_for_test.id) | |
| container = database.get_container_client(normal_container.id) | |
| print(f"test_crud_async about to execute read operation") | |
| database = timeout_client.get_database_client(self.database_for_test.id) | |
| container = database.get_container_client(normal_container.id) |
```python
# Check if read_timeout is explicitly provided in kwargs (client-level)
if 'read_timeout' in kwargs:
    policy.ReadTimeout = kwargs.pop('read_timeout')
# Otherwise, check if policy has the new read_timeout property
```
Copilot AI (Dec 18, 2025):
Comment is misplaced. This comment belongs before the elif statement on line 90, not after the if block on line 88. It should be moved to maintain code readability.
Suggested change (relocating the comment so it directly precedes the elif):
```diff
-# Otherwise, check if policy has the new read_timeout property
+# Otherwise, check if policy has the new read_timeout property
```
```python
timeout_client = cosmos_client.CosmosClient(
    url=self.host,
    credential=self.masterKey,
    read_timeout=0.001  # Very short timeout that would normally fail
)
print(f"test_crud got the client")
database = timeout_client.get_database_client(self.databaseForTest.id)
container = database.get_container_client(normal_container.id)
print(f"test_crud about to execute read operation")
# Test 1: Point read with request-level timeout should succeed (overrides client timeout)
result = container.read_item(
    item='test_item_0',
    partition_key='partition0',
    read_timeout=30  # Higher timeout at request level
)
self.assertEqual(result['id'], 'test_item_0')

# Test 2: Query with request-level timeout should succeed (overrides client timeout)
results = list(container.query_items(
    query="SELECT * FROM c WHERE c.pk = @pk",
    parameters=[{"name": "@pk", "value": "partition1"}],
    read_timeout=30  # Higher timeout at request level
))
self.assertEqual(len(results), 1)
self.assertEqual(results[0]['id'], 'test_item_1')

# Test 3: Upsert (write) with request-level timeout should succeed
upsert_item = {
    'id': 'test_item_0',
    'pk': 'partition0',
    'data': 'updated_data'
}
result = container.upsert_item(
    body=upsert_item,
    read_timeout=30  # Higher timeout at request level
)
self.assertEqual(result['data'], 'updated_data')

# Test 4: Create (write) with request-level timeout should succeed
new_item = {
    'id': 'new_test_item',
    'pk': 'new_partition',
    'data': 'new_data'
}
result = container.create_item(
    body=new_item,
    read_timeout=30  # Higher timeout at request level
)
self.assertEqual(result['id'], 'new_test_item')
```
Copilot AI (Dec 18, 2025):
The timeout_client is not being properly closed. This could lead to resource leaks. Consider wrapping it in a try-finally block or using a context manager to ensure proper cleanup, similar to the async version which uses 'async with'.
Suggested change (replacing the block quoted above with a context-manager version):
```python
with cosmos_client.CosmosClient(
    url=self.host,
    credential=self.masterKey,
    read_timeout=0.001  # Very short timeout that would normally fail
) as timeout_client:
    print(f"test_crud got the client")
    database = timeout_client.get_database_client(self.databaseForTest.id)
    container = database.get_container_client(normal_container.id)
    print(f"test_crud about to execute read operation")
    # Test 1: Point read with request-level timeout should succeed (overrides client timeout)
    result = container.read_item(
        item='test_item_0',
        partition_key='partition0',
        read_timeout=30  # Higher timeout at request level
    )
    self.assertEqual(result['id'], 'test_item_0')
    # Test 2: Query with request-level timeout should succeed (overrides client timeout)
    results = list(container.query_items(
        query="SELECT * FROM c WHERE c.pk = @pk",
        parameters=[{"name": "@pk", "value": "partition1"}],
        read_timeout=30  # Higher timeout at request level
    ))
    self.assertEqual(len(results), 1)
    self.assertEqual(results[0]['id'], 'test_item_1')
    # Test 3: Upsert (write) with request-level timeout should succeed
    upsert_item = {
        'id': 'test_item_0',
        'pk': 'partition0',
        'data': 'updated_data'
    }
    result = container.upsert_item(
        body=upsert_item,
        read_timeout=30  # Higher timeout at request level
    )
    self.assertEqual(result['data'], 'updated_data')
    # Test 4: Create (write) with request-level timeout should succeed
    new_item = {
        'id': 'new_test_item',
        'pk': 'new_partition',
        'data': 'new_data'
    }
    result = container.create_item(
        body=new_item,
        read_timeout=30  # Higher timeout at request level
    )
    self.assertEqual(result['id'], 'new_test_item')
```
```python
    container.read_item(
        item='test_item_0',
        partition_key='partition0'
    )
except Exception as e:
    print(f"Exception is {e}")

# Test 1: Point read operation should time out
with self.assertRaises((exceptions.CosmosClientTimeoutError, ServiceResponseError)):
    container.read_item(
        item='test_item_0',
        partition_key='partition0'
    )

# Test 2: Query operation should time out
with self.assertRaises((exceptions.CosmosClientTimeoutError, ServiceResponseError)):
    list(container.query_items(
        query="SELECT * FROM c WHERE c.pk = @pk",
        parameters=[{"name": "@pk", "value": "partition0"}]
    ))
```
Copilot AI (Dec 18, 2025):
The timeout_client is not being properly closed, and the container is not being cleaned up. This could lead to resource leaks. Consider wrapping the client in a try-finally block to ensure proper cleanup, and add container deletion at the end.
Suggested change (wrapping the quoted block in a try/finally; applied under the enclosing try from the existing test, indentation reconstructed):
```python
try:
    try:
        container.read_item(
            item='test_item_0',
            partition_key='partition0'
        )
    except Exception as e:
        print(f"Exception is {e}")
    # Test 1: Point read operation should time out
    with self.assertRaises((exceptions.CosmosClientTimeoutError, ServiceResponseError)):
        container.read_item(
            item='test_item_0',
            partition_key='partition0'
        )
    # Test 2: Query operation should time out
    with self.assertRaises((exceptions.CosmosClientTimeoutError, ServiceResponseError)):
        list(container.query_items(
            query="SELECT * FROM c WHERE c.pk = @pk",
            parameters=[{"name": "@pk", "value": "partition0"}]
        ))
finally:
    # Ensure resources are cleaned up
    try:
        self.databaseForTest.delete_container(normal_container)
    except Exception:
        # Best-effort cleanup; ignore failures to avoid masking test errors
        pass
    try:
        timeout_client.close()
    except Exception:
        pass
```
```python
try:
    container.read_item(
        item='test_item_0',
        partition_key='partition0'
    )
except Exception as e:
    print(f"Exception is {e}")
```
Copilot AI (Dec 18, 2025):
This try-except block that catches and prints the exception serves no testing purpose and should be removed. The actual test assertions are on lines 1644-1656, making this block redundant.
Suggested change:
```diff
-try:
-    container.read_item(
-        item='test_item_0',
-        partition_key='partition0'
-    )
-except Exception as e:
-    print(f"Exception is {e}")
```
```python
    credential=self.masterKey,
    connection_policy=policy
)
await timeout_client.__aenter__()
```
Shouldn't it have timed out at this point? This causes get-database-account calls.
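If the client-level timeout is indeed expected to apply during initialization, the test could assert the failure at context entry. A sketch of that idea, reusing the exception types from the other tests in this PR:

```python
# If the 0.001s read timeout already applies to the get-database-account calls
# made on __aenter__, entering the client context should itself raise.
with self.assertRaises((exceptions.CosmosClientTimeoutError, ServiceResponseError)):
    await timeout_client.__aenter__()
```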
```python
# Check if read_timeout is explicitly provided in kwargs (client-level)
if 'read_timeout' in kwargs:
    policy.ReadTimeout = kwargs.pop('read_timeout')
```
Should it be popped? Wouldn't popping prevent calls made during initialization from having the timeout? Same for the existing ones above.
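One reading of the concern: pop removes the kwarg before the pipeline ever sees it, so requests issued during client initialization would not carry the timeout. A sketch of the non-destructive alternative the question implies (whether the pipeline then honors the leftover kwarg is an assumption):

```python
# Read the client-level value without removing it from kwargs, so calls made
# during initialization can still see 'read_timeout' as a request option.
if 'read_timeout' in kwargs:
    policy.ReadTimeout = kwargs['read_timeout']  # plain lookup instead of kwargs.pop(...)
```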
Shouldn't we just use the timeout values in ConnectionPolicy instead of the one in kwargs? It looks like we are passing the ConnectionPolicy into later workflows.
```python
if 'read_timeout' in kwargs:
    policy.ReadTimeout = kwargs.pop('read_timeout')
# Otherwise, check if policy has the new read_timeout property
elif hasattr(policy, 'read_timeout') and policy.read_timeout is not None:
```
Sorry, I don't understand what scenario this is targeting. Is this for the case where a customer adds a field called read_timeout to the connection policy, so we would then grab it from there?
I agree with Tomas: if read_timeout was set in the policy by mistake, it is a user error and we shouldn't use it.
Also, what is the proper behavior when users set ReadTimeout in the policy and also provide read_timeout in kwargs? With the current change, we overwrite the one in the policy with the one in kwargs. Have you confirmed this is the expected behavior?
```python
database = timeout_client.get_database_client(self.databaseForTest.id)
container = database.get_container_client(normal_container.id)
print(f"test_crud about to execute read operation")
# Test 1: Point read with request-level timeout should succeed (overrides client timeout)
```
Shouldn't we also add a case where a request fails with the short client-level timeout? That would show that the other requests succeeded because of the request-level timeouts.
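A sketch of that missing negative case, mirroring the assertions used elsewhere in these tests:

```python
# Without a request-level override, the 0.001s client-level timeout
# should make the same point read fail.
with self.assertRaises((exceptions.CosmosClientTimeoutError, ServiceResponseError)):
    container.read_item(
        item='test_item_0',
        partition_key='partition0'
    )
```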
```diff
@@ -107,6 +107,8 @@ async def _get_properties_with_options(self, options: Optional[dict[str, Any]] =
     kwargs[Constants.OperationStartTime] = options[Constants.OperationStartTime]
     if "timeout" in options:
         kwargs['timeout'] = options['timeout']
+    if "read_timeout" in options:
+        kwargs['read_timeout'] = options['read_timeout']
```
Please use a constant like Constants.OperationStartTime instead of a hard-coded string here.
Ideally, all other options should be replaced with constants as well; we can do that in a separate PR.
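A sketch of what the constant-based lookup could look like; note that Constants.ReadTimeout is hypothetical here, as only Constants.OperationStartTime is known to exist from the diff above:

```python
# Hypothetical constant replacing the hard-coded "read_timeout" string;
# Constants.ReadTimeout would need to be defined next to OperationStartTime.
if Constants.ReadTimeout in options:
    kwargs[Constants.ReadTimeout] = options[Constants.ReadTimeout]
```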
@dibahlfi Another comment not directly related to this PR. Shouldn't Also, please make both
```diff
@@ -107,6 +107,8 @@ async def _get_properties_with_options(self, options: Optional[dict[str, Any]] =
     kwargs[Constants.OperationStartTime] = options[Constants.OperationStartTime]
     if "timeout" in options:
         kwargs['timeout'] = options['timeout']
+    if "read_timeout" in options:
```
We should add a changelog entry. The fix enables client-level read timeout configuration that automatically applies to all queries unless explicitly overridden at the request level.