PYTHON-5053 - AsyncMongoClient.close() should await all background tasks #2127
NoahStapp merged 8 commits into mongodb:master
Conversation
I scheduled the Windows standalone jobs to see if this fixes the "OSError: [WinError 10038] An operation was attempted on something that is not a socket" errors.
| await self._executor.join() | ||
| await self._rtt_monitor.close() | ||
| # Increment the generation and maybe close the socket. If the executor | ||
| # thread has the socket checked out, it will be closed when checked in. |
I think moving the join to happen after _reset_connection will result in faster close() in some cases.
pymongo/asynchronous/monitor.py
Outdated
| """ | ||
| self.gc_safe_close() | ||
| if not _IS_SYNC: | ||
| await self._executor.join() |
Currently, it is not safe to call join() in close(). The problem is that there is at least one case where the Monitor task itself calls close(). That would attempt to join() itself, which will hang forever (or perhaps Python detects that case and raises an error; either way it's a problem).
So we either need to remove self calls to close() or move the join() logic to another method.
Good catch. It makes more sense to me to have a separate Monitor.join() method that we call in the async API whenever we call Monitor.close() from a non-monitor task.
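A minimal sketch of that split, using an illustrative class (these names are not pymongo's actual API): close() only signals shutdown, so the monitor task can safely call it on itself, while join() awaits the background task and must only be called from a different task.

```python
import asyncio

# Hypothetical sketch (names are illustrative, not pymongo's actual classes):
# close() only signals shutdown; join() awaits the task and must be called
# from a non-monitor task.
class PeriodicMonitor:
    def __init__(self):
        self._stopped = asyncio.Event()
        self._task = None

    def open(self):
        self._task = asyncio.create_task(self._run())

    async def _run(self):
        while not self._stopped.is_set():
            await asyncio.sleep(0.01)  # stand-in for a periodic server check

    def close(self):
        # Signal-only: safe even when called from the monitor task itself.
        self._stopped.set()

    async def join(self):
        # Await the background task; call from a non-monitor task only.
        if self._task is not None:
            await self._task

async def main():
    monitor = PeriodicMonitor()
    monitor.open()
    monitor.close()
    await monitor.join()
    return monitor._task.done()

print(asyncio.run(main()))  # -> True
```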
pymongo/asynchronous/monitor.py
Outdated
| self._rtt_monitor.gc_safe_close() | ||
| self.cancel_check() | ||
|
|
||
| async def join(self, timeout: Optional[int] = None) -> None: |
Do we ever pass a timeout here?
Not currently, but we likely should once we productionize the async API for release. Currently there's too much variability between platforms and test suites to pick a good timeout number.
Let's remove the timeout parameter since it's not used.
pymongo/asynchronous/topology.py
Outdated
| ): | ||
| await self._srv_monitor.close() | ||
| if not _IS_SYNC: | ||
| await self._srv_monitor.join() |
I think we should have a separate code path for join() all the way down. That way we first signal everything to shut down, then we wait for everything to exit. Joining each task individually will slow down close().
I meant to put this comment on Topology.close().
I think this can deadlock. It's not safe to call join() while holding the topology lock because the task we're attempting to join() may be blocking on acquiring the same lock.
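The deadlock can be illustrated with a minimal standalone example (not pymongo code): join() awaits a task that is itself blocked acquiring the lock that the joining coroutine holds, so neither can make progress.

```python
import asyncio

# Minimal illustration (not pymongo code) of the deadlock: join() awaits a
# task that is blocked acquiring the lock the joining coroutine holds.
async def main():
    lock = asyncio.Lock()

    async def monitor():
        async with lock:  # blocks: main() is holding the lock
            pass

    async with lock:
        task = asyncio.create_task(monitor())
        await asyncio.sleep(0)  # let the monitor start and block on the lock
        try:
            # The join()-under-lock mistake, bounded by a timeout for the demo:
            await asyncio.wait_for(asyncio.shield(task), timeout=0.1)
            result = "joined"
        except asyncio.TimeoutError:
            result = "deadlock"
    await task  # lock released; let the monitor finish cleanly
    return result

print(asyncio.run(main()))  # -> deadlock
```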
You're suggesting that we make executor.close() not call task.cancel() and we do it inside the join() instead?
No I'm suggesting two changes.
- We call close like we do currently. Then after everything is closed, we call join on everything.
- We never call join() while holding a lock.
tasks = []
with self.lock:
    for s in servers:
        await s.close()
        tasks.append(s)
...
# Only join after releasing the lock
await asyncio.gather(*(t.join() for t in tasks))

The current approach is slow because of 1) and risky because of 2). The slowness comes from joining each task inline after close(), which essentially serializes the shutdown process: with 50 servers and 10ms to join each task, it takes 500ms altogether. With my suggestion it only takes ~10ms total, since all the tasks can exit concurrently.
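A runnable sketch of this close-first, join-concurrently ordering (the names here are illustrative, not pymongo's): all 50 tasks are signaled to stop up front, then joined together, so the total wait is roughly one task's cleanup time rather than the sum.

```python
import asyncio
import time

# Sketch of the suggested shutdown ordering: signal everything to stop
# first, then join all tasks concurrently. Illustrative names only.
async def main():
    stop = asyncio.Event()

    async def server_task():
        await stop.wait()
        await asyncio.sleep(0.01)  # simulated per-task cleanup after close()

    tasks = [asyncio.create_task(server_task()) for _ in range(50)]
    start = time.perf_counter()
    stop.set()  # "close" all 50 first...
    await asyncio.gather(*tasks)  # ...then join them all concurrently
    return time.perf_counter() - start

# Concurrent joins finish in ~10ms rather than the ~500ms a serialized
# close-then-join-per-task loop would take.
elapsed = asyncio.run(main())
print(elapsed < 0.3)  # -> True
```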
Ah, something like this inside Topology?
async def join(self):
    # Join all monitors
    ...
And then we call topology.join() inside AsyncMongoClient.close()?
task.cancel() might prevent the deadlock scenario in the async path.
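A small standalone illustration of that point (not pymongo code): cancelling the task wakes it out of its lock wait, so a subsequent await on the task completes instead of hanging.

```python
import asyncio

# Illustration (not pymongo code): task.cancel() wakes a task out of its
# lock wait, so a later join() completes instead of deadlocking.
async def main():
    lock = asyncio.Lock()
    async with lock:
        async def monitor():
            async with lock:  # blocks: main() holds the lock
                pass

        task = asyncio.create_task(monitor())
        await asyncio.sleep(0)  # monitor is now waiting on the lock
        task.cancel()
        try:
            await task  # this "join" now finishes promptly
        except asyncio.CancelledError:
            return "cancelled"
    return "joined"

print(asyncio.run(main()))  # -> cancelled
```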
Also we should make sure that when a server is removed from the Topology for other reasons (besides close) that we also join() it properly. For example, when a server is no longer found in the replica set config.
pymongo/asynchronous/mongo_client.py
Outdated
| self._closed = True | ||
| if not _IS_SYNC: | ||
| self._topology._monitor_tasks.append(self._kill_cursors_executor) # type: ignore[arg-type] | ||
| join_tasks = [t.join() for t in self._topology._monitor_tasks] # type: ignore[func-returns-value] |
Reading _monitor_tasks like this is not thread safe. Anytime we iterate it we need to guard against the list being mutated from another thread, something like:
tasks = []
try:
    while self._topology._monitor_tasks:
        tasks.append(self._topology._monitor_tasks.pop())
except IndexError:
    pass
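A runnable version of this drain pattern, applied to a plain list of tasks (the helper name is illustrative): pop everything off the shared list, with the IndexError guard tolerating concurrent pops, then join the drained tasks outside any lock.

```python
import asyncio

# Runnable sketch of the drain pattern above: pop until the shared list is
# empty (the IndexError guard tolerates a concurrent pop), then join the
# drained tasks outside any lock.
async def drain_and_join(monitor_tasks):
    tasks = []
    try:
        while monitor_tasks:
            tasks.append(monitor_tasks.pop())
    except IndexError:
        pass
    await asyncio.gather(*tasks, return_exceptions=True)
    return len(tasks)

async def main():
    shared = [asyncio.create_task(asyncio.sleep(0)) for _ in range(3)]
    drained = await drain_and_join(shared)
    return drained, len(shared)

print(asyncio.run(main()))  # -> (3, 0)
```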
Are we supporting multithreaded async workloads? My understanding was that we are explicitly not supporting such use cases and assume that all AsyncMongoClient operations will take place on a single thread.
Or is this a futureproofing suggestion for when we do the same joining process for synchronous tasks?
Made this change in the interest of covering our bases and reducing future changes for the sync API.
Oh sorry, I let my sync brain bleed into the async code. Yeah, async is single-threaded, so it's safe to iterate the list as long as there are no yield points.
Any objection to doing your suggested change anyway for the reasons I stated above?
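The "no yield points" caveat can be demonstrated in a few lines: in single-threaded asyncio, shared state cannot change under you between awaits, but any await is a window for another task to mutate it.

```python
import asyncio

# Demo of the "no yield points" caveat: another task can only mutate shared
# state at an await, never between two plain statements.
async def main():
    items = [1, 2, 3]

    async def mutator():
        items.append(4)

    asyncio.create_task(mutator())
    before = list(items)    # no await yet, so the mutator has not run
    await asyncio.sleep(0)  # yield point: the mutator runs here
    after = list(items)
    return before, after

print(asyncio.run(main()))  # -> ([1, 2, 3], [1, 2, 3, 4])
```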
pymongo/synchronous/topology.py
Outdated
| # Close servers and clear the pools. | ||
| for server in self._servers.values(): | ||
| server.close() | ||
| self._monitor_tasks.append(server._monitor) |
I believe we only want to record these tasks on async. Otherwise we'll have an unbounded list of threads in the sync version.
Good catch, missed this one.
pymongo/asynchronous/mongo_client.py
Outdated
| await self._encrypter.close() | ||
| self._closed = True | ||
| if not _IS_SYNC: | ||
| self._topology._monitor_tasks.append(self._kill_cursors_executor) # type: ignore[arg-type] |
Let's avoid appending to the topology's private state here.
pymongo/asynchronous/monitor.py
Outdated
|
|
||
| async def join(self) -> None: | ||
| await self._executor.join() | ||
| await self._rtt_monitor.join() |
This should use gather too right?
pymongo/asynchronous/topology.py
Outdated
| cast(Server, self.get_server_by_address(sd.address)) for sd in server_descriptions | ||
| ] | ||
|
|
||
| if not _IS_SYNC and self._monitor_tasks: |
Worth putting a comment here to explain why this code is here. Also this should happen before selecting the server. Doing it after will increase the risk of returning stale information.
The risk being the delay added by the cleanup between selecting the server and actually returning it? Makes sense.
pymongo/asynchronous/topology.py
Outdated
| except IndexError: | ||
| pass | ||
| join_tasks = [t.join() for t in join_tasks] # type: ignore[func-returns-value] | ||
| await asyncio.gather(*join_tasks) |
Could you refactor this into a method that MongoClient.close() can call too?
| await self._encrypter.close() | ||
| self._closed = True | ||
| if not _IS_SYNC: | ||
| await asyncio.gather( |
I believe we should be using return_exceptions=True on all these gather() calls:
If return_exceptions is True, exceptions are treated the same as successful results, and aggregated in the result list.
https://docs.python.org/3/library/asyncio-task.html#asyncio.gather
Otherwise we may accidentally propagate an exception we don't care about.
At this point should we be using asyncio.wait() instead of gather()?
Good point. I think we can safely use asyncio.wait() instead here, since we don't need the aggregated result returned by gather.
Oh wait, asyncio.wait() explicitly does not support waiting for coroutines, only Task or Future objects. Sticking with gather seems a little simpler, even with return_exceptions=True.
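The behavior at stake can be shown directly: with return_exceptions=True, gather collects exceptions from the awaited coroutines into the result list instead of letting the first one propagate out of close().

```python
import asyncio

# Why return_exceptions=True matters during close(): exceptions from the
# joined coroutines are returned in the result list, not raised.
async def ok():
    return "ok"

async def boom():
    raise OSError("socket already closed")

async def main():
    results = await asyncio.gather(ok(), boom(), return_exceptions=True)
    return results[0], type(results[1]).__name__

print(asyncio.run(main()))  # -> ('ok', 'OSError')
```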
pymongo/asynchronous/mongo_client.py
Outdated
| self._closed = True | ||
| if not _IS_SYNC: | ||
| await asyncio.gather( | ||
| *[self._topology.cleanup_monitors(), self._kill_cursors_executor.join()], # type: ignore[func-returns-value] |
One last nit: could you remove the *[]?
pymongo/asynchronous/monitor.py
Outdated
|
|
||
| async def join(self) -> None: | ||
| await asyncio.gather( | ||
| *[self._executor.join(), self._rtt_monitor.join()], return_exceptions=True |
One last nit: could you remove the *[]?