Description
- Package Name: azure-data-tables
- Package Version: 12.7.0
- Operating System: Linux Mint 22.2 (First encountered through Azure Databricks)
- Python Version: 3.12.3
Describe the bug
Azure Table Storage returns entities when queried with a filter that requests timestamps strictly greater than the maximum timestamp previously returned by the service. In other words, a query of the form “Timestamp greater than the latest observed Timestamp” still yields rows whose timestamps are less than or equal to that value. This breaks expected ordering guarantees relied on by incremental and watermark-based workflows.
To Reproduce
Steps to reproduce the behavior:
- Insert four entities into an Azure table.
- Find the minimum and maximum metadata timestamps (min_ts and max_ts) from the list of entities.
- Call query_entities with query_filter=f"Timestamp gt datetime'{max_ts}'". It still returns the last record, whereas an empty list is expected.
- Similarly, querying with min_ts returns all four records instead of only the last three.
Please see the screenshots below for clear examples.
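The steps above can be sketched roughly as the script below. It is a minimal sketch, not the exact code used in the report: the helper names (build_timestamp_filter, reproduce), the table name, the partition key, and the Value property are hypothetical, and the filter is formatted as an ISO 8601 UTC string instead of interpolating the datetime object directly. It assumes a valid storage account connection string is supplied by the caller.

```python
import time
from datetime import datetime, timezone


def build_timestamp_filter(ts: datetime, op: str = "gt") -> str:
    """Build an OData filter comparing the system Timestamp property to ts.

    Hypothetical helper. ts is expected to be timezone-aware; a naive
    datetime would be interpreted as local time by astimezone().
    """
    utc = ts.astimezone(timezone.utc)
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S.%f")
    return f"Timestamp {op} datetime'{stamp}Z'"


def reproduce(connection_string: str, table_name: str = "timestampgtrepro") -> dict:
    """Insert four entities, then query with gt on the max and min timestamps."""
    # Imported lazily so build_timestamp_filter works without the SDK installed.
    from azure.data.tables import TableServiceClient

    service = TableServiceClient.from_connection_string(connection_string)
    table = service.create_table_if_not_exists(table_name)

    for i in range(4):
        table.upsert_entity({"PartitionKey": "repro", "RowKey": str(i), "Value": i})
        time.sleep(1)  # spread the service-side timestamps apart

    # The service-assigned Timestamp is exposed through the entity metadata.
    stamps = [e.metadata["timestamp"] for e in table.list_entities()]
    min_ts, max_ts = min(stamps), max(stamps)

    after_max = list(table.query_entities(query_filter=build_timestamp_filter(max_ts)))
    after_min = list(table.query_entities(query_filter=build_timestamp_filter(min_ts)))

    # Per the report: 0 and 3 are expected, but 1 and 4 are observed.
    return {"after_max": len(after_max), "after_min": len(after_min)}
```

Calling reproduce("<connection string>") against a standard storage account and comparing the two counts with the expected 0 and 3 should show whether the behavior is present in a given environment.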
Expected behavior
When an explicit filter is passed with Timestamp gt datetime'{max_ts}', no records should be returned.
In short, the gt operator currently behaves more like a ge operator.
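To make the gt versus ge distinction concrete, here is a small local model of the two comparisons over four timestamps. It does not call the service; expected_matches is a hypothetical helper that only shows which result counts correspond to strict and inclusive comparison.

```python
from datetime import datetime, timedelta, timezone


def expected_matches(timestamps, bound, op="gt"):
    """Return the timestamps a correct filter should match (local model only)."""
    if op == "gt":
        return [t for t in timestamps if t > bound]
    if op == "ge":
        return [t for t in timestamps if t >= bound]
    raise ValueError(f"unsupported operator: {op}")


# Four illustrative timestamps, one second apart.
base = datetime(2024, 1, 1, tzinfo=timezone.utc)
stamps = [base + timedelta(seconds=i) for i in range(4)]

strict_after_max = expected_matches(stamps, max(stamps), "gt")     # expected result
inclusive_after_max = expected_matches(stamps, max(stamps), "ge")  # matches what is observed
```

With these inputs, the strict comparison against the maximum matches nothing and against the minimum matches three entries, while the inclusive comparison matches one and four respectively, which are the counts reported above.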
Screenshots
Additional context
This behavior is observed against Azure Table Storage (standard storage account), not Cosmos DB Table. The issue was first observed in an Azure Databricks environment and was later reproduced consistently on a local machine to rule out environment-specific factors.