test: Enable test_snapshot_pruning_removes_outdated_records#1735
test: Enable test_snapshot_pruning_removes_outdated_records#1735
test_snapshot_pruning_removes_outdated_records#1735Conversation
… expected source of flaky behavior
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1735 +/- ##
==========================================
+ Coverage 92.41% 92.48% +0.06%
==========================================
Files 156 156
Lines 10628 10628
==========================================
+ Hits 9822 9829 +7
+ Misses 806 799 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
vdusek
left a comment
There was a problem hiding this comment.
FAILED tests/unit/_autoscaling/test_snapshotter.py::test_snapshot_pruning_removes_outdated_records - assert 3 == 2
- where 3 = len([CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 7, 39, 32, 789908, tzinfo=datetime.timezone.utc)), CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 9, 39, 32, 789908, tzinfo=datetime.timezone.utc)), CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=datetime.datetime(2026, 2, 11, 10, 39, 32, 789908, tzinfo=datetime.timezone.utc))])
============ 1 failed, 1615 passed, 8 skipped in 345.52s (0:05:45) =============
Error: Process completed with exit code 1.
d7783c5 to
63e8468
Compare
|
@Mantisus , @vdusek, please check the root cause in the issue description. The suggested fix should be ok, but I fear it is too implicit and relies on |
The root cause hypothesis seems incorrect. If the cause was an error processing in the 2-hour snapshot, then there would be 1 snapshot in the failed tests, not 3. It might be worth trying for event_data in events_data:
event_manager.emit(event=Event.SYSTEM_INFO, event_data=event_data)
await event_manager.wait_for_all_listeners_to_complete()Perhaps the reason is that |
Yes, the hypothesis in the issue description seems incorrect. I was referring to the root cause description in this PR |
I'd leave it up to you. If you believe the current approach is stable, I'm okay with it. Otherwise, let's implement the thread locks. |
Got it. Perhaps the race condition also affects this. I hypothesize that when |
Different insertion order is not a problem, as it should be handled by using |
The snapshot is added to Therefore, it is important which snapshot will be processed last. In the next test, 0 snapshots will be deleted. This is because the last snapshot processed is async def test_without_pruning(
snapshotter: Snapshotter, event_manager: LocalEventManager, default_memory_info: MemoryInfo
) -> None:
snapshotter._SNAPSHOT_HISTORY = timedelta(hours=2)
now = datetime.now(timezone.utc)
snapshots = [
CpuSnapshot(used_ratio=0.5, max_used_ratio=0.95, created_at=now - timedelta(hours=delta)) for delta in [5, 2, 0]
]
for snapshot in snapshots: # create `SortedSnapshotList` with 3 elems
snapshotter._cpu_snapshots.add(snapshot)
event_manager.emit(event=Event.SYSTEM_INFO, event_data=EventSystemInfoData(
cpu_info=CpuInfo(used_ratio=0.5, created_at=now - timedelta(hours=3)),
memory_info=default_memory_info,
))
await event_manager.wait_for_all_listeners_to_complete()
cpu_snapshots = cast('list[CpuSnapshot]', snapshotter.get_cpu_sample())
assert len(cpu_snapshots) == 4 # Not one was removed |
|
@Mantisus good point! To ensure that the |
| cpu_info=CpuInfo(used_ratio=0.5, created_at=now - timedelta(hours=delta)), | ||
| memory_info=default_memory_info, | ||
| ) | ||
| for delta in [3, 2, 5, 0] # Out of order timestamps. Snapshotter can not rely on natural ordering. |
There was a problem hiding this comment.
@Mantisus I meant that this order should not matter
Description:
test_snapshot_pruning_removes_outdated_recordsto remove the expected source of flaky behaviorevent_manager.on(event=Event.SYSTEM_INFO, listener=self._snapshot_cpu)addsself._snapshot_cpulistener, but event manager runs sync listeners throughasyncio.to_thread.self._snapshot_cpumodifies in-place instance list (for exampleself._cpu_snapshots) - it doesbisect.insortanddeloperations on the same list from several threads, which creates oportunity for a race condition.Issues:
Closes: #1734