PYTHON-5144 - Add async performance benchmarks #2188
NoahStapp merged 8 commits into mongodb:master
Conversation
.evergreen/config.yml (Outdated)

```yaml
# Platform notes
```
I was going to ask if this was an intentional change, but then I noticed all these platform notes are long out of date. Let's remove them?
test/performance/async_perf_test.py (Outdated)

```python
    async def do_task(self):
        command = self.client.perftest.command
        await asyncio.gather(*[command("hello", True) for _ in range(NUM_DOCS)])
```
For this and the other asyncio.gather calls, can we replace them with the concurrent() helper?
In what way? Currently each test calls concurrent(self.n_tasks, self.do_task), which results in self.n_tasks calls of self.do_task, each of which has a gather call for NUM_DOCS coroutines.
Oh, I see the real problem. We're running NUM_DOCS concurrent tasks here even though this benchmark is supposed to be serial. We need to change this to:

```python
async def do_task(self):
    command = self.client.perftest.command
    for _ in range(NUM_DOCS):
        await command("hello", True)
```

Only the concurrent tests should be using asyncio.gather.
Ah, so we need to have separate serial benchmarks and concurrent ones using the same tests? So only the benchmarks with n_tasks > 1 should use gather?
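For illustration, a minimal sketch of that split, assuming a run() wrapper and a client created in the test's setup (both assumptions; the real harness may wire this up differently):

```python
import asyncio

NUM_DOCS = 1000  # assumed value, for illustration only


async def concurrent(n_tasks, func):
    # Fan-out helper: the only place gather() should appear.
    await asyncio.gather(*[func() for _ in range(n_tasks)])


class TestRunCommand:
    n_tasks = 1  # serial baseline

    async def do_task(self):
        # Serial inner loop: one command at a time, NUM_DOCS times.
        command = self.client.perftest.command  # client comes from test setup
        for _ in range(NUM_DOCS):
            await command("hello", True)

    async def run(self):
        # With n_tasks == 1 this stays serial; subclasses override n_tasks.
        await concurrent(self.n_tasks, self.do_task)


class TestRunCommand80Tasks(TestRunCommand):
    # Concurrent variant: 80 tasks, each looping NUM_DOCS operations serially.
    n_tasks = 80
```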
test/performance/async_perf_test.py (Outdated)

```python
# class TestRunCommand800Tasks(TestRunCommand):
#     n_tasks = 800
```
Any reason these are commented out?
Whoops, these were benchmarks to measure extremely concurrent workloads; they take too long to run regularly, imo.
```python
async def concurrent(n_tasks, func):
    tasks = [func() for _ in range(n_tasks)]
    await asyncio.gather(*tasks)
```
In the threaded version of the tests we run X threads, each running a loop of operations, so we're comparing different things here. Perhaps we should change the sync version to use a ThreadPoolExecutor and farm out the operations one by one. I'll create a ticket for it.
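A rough sketch of what that sync version might look like, assuming the client and task count are passed in (the max_workers choice here is a guess, not the ticketed design):

```python
from concurrent.futures import ThreadPoolExecutor

NUM_DOCS = 1000  # assumed value, for illustration only


def do_task(client, n_tasks):
    # One future per operation, mirroring one coroutine per operation in the
    # async benchmarks, rather than one long-lived loop per thread.
    command = client.perftest.command
    with ThreadPoolExecutor(max_workers=n_tasks) as executor:
        futures = [executor.submit(command, "hello", True) for _ in range(NUM_DOCS)]
        for future in futures:
            future.result()  # surface any exception raised in a worker
```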
The test failure is also present in master, tracked in jira.mongodb.org/browse/PYTHON-5198.
A run of the latest changes: https://spruce.mongodb.com/version/67d0840be58c830007b75115/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC
I did some wrangling and here's the comparison on the last patch (8.0 non-ssl):

For the "async/sync" column, >1 means async throughput is higher and <1 means sync is higher. What I did was use the Download CSV link on the EVG perf panel, import both files into Sheets, sort by test name, delete all columns but "patch_result" and "test", then combine the sheets. Is there an easier way to compare the results? I'd like to see the same for the 6.0 ssl tasks.
test/performance/async_perf_test.py (Outdated)

```python
    async def do_task(self):
        command = self.client.perftest.command
        await asyncio.gather(*[command("hello", True) for _ in range(NUM_DOCS)])
```
We're still accidentally using gather() here and below. concurrent() should be the only place gather() is used.
Using asyncio.gather() here spawns a new task for each command() call. Are you picturing a different model of concurrency than the one in use here?
Right, but we don't want to spawn concurrent tasks in any benchmark besides the XXTasks ones, and in those we only want to spawn XX concurrent tasks, not NUM_DOCS concurrent tasks.
But the XXTask benchmarks will then only be testing small concurrent workloads that contain lots of looping. Wouldn't we get a better measurement of concurrent performance with less looping and more concurrent tasks?
Sure, but that's what we want. Currently the benchmark says "FindOneByID80Tasks" but it's not running 80 tasks, it's running 80*NUM_DOCS tasks. That's a bug.
Is that what we want? We want some benchmarks that run lots of concurrent tasks to more accurately represent a highly concurrent workload. If the issue here is just the naming, then I can add those benchmarks separately.
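To make the distinction concrete, a sketch of the two models being discussed (find_one and NUM_DOCS are stand-ins borrowed from the snippets above):

```python
import asyncio

NUM_DOCS = 1000  # assumed value, for illustration only


async def model_the_name_implies(find_one):
    # Exactly 80 concurrent tasks, each running NUM_DOCS operations serially.
    async def worker():
        for _ in range(NUM_DOCS):
            await find_one({"_id": 1})

    await asyncio.gather(*[worker() for _ in range(80)])


async def model_the_code_currently_runs(find_one):
    # 80 * NUM_DOCS concurrent tasks, because each of the 80 workers fans
    # out again with its own gather().
    async def worker():
        await asyncio.gather(*[find_one({"_id": 1}) for _ in range(NUM_DOCS)])

    await asyncio.gather(*[worker() for _ in range(80)])
```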
Writing a script to ingest the raw JSON from the EVG perf panel and spit out this comparison should be pretty straightforward. Let me take a stab at it.
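As a starting point, something like this could work (a sketch, not the gist linked below; it assumes each export is a JSON list of records with "test" and "patch_result" fields, which may not match the real EVG schema):

```python
import json
import sys


def load_results(path):
    # Map test name -> measured throughput for one run (assumed JSON shape).
    with open(path) as f:
        return {r["test"]: r["patch_result"] for r in json.load(f)}


def main(async_path, sync_path):
    async_results = load_results(async_path)
    sync_results = load_results(sync_path)
    # >1 means async throughput is higher, <1 means sync is higher.
    for test in sorted(async_results.keys() & sync_results.keys()):
        print(f"{test}\t{async_results[test] / sync_results[test]:.2f}")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```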
Using https://gist.github.com/NoahStapp/16aa44b865d009261c564c841a0685ad, here are the results from a run of the latest changes. The run itself: https://spruce.mongodb.com/version/67d0840be58c830007b75115/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC
Updated 8.0 comparison:

6.0 with ssl:
```python
class TestSmallDocInsertOne(SmallDocInsertTest, AsyncPyMongoTestCase):
    async def do_task(self):
        insert_one = self.corpus.insert_one
        await asyncio.gather(*[insert_one(doc) for doc in self.documents])
```
Still need to remove this asyncio.gather
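Presumably the fix follows the same pattern as the run-command change above, i.e. a serial loop (a sketch of the suggested change, not the committed code):

```python
async def do_task(self):
    # Serial: await each insert before issuing the next.
    insert_one = self.corpus.insert_one
    for doc in self.documents:
        await insert_one(doc)
```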
```python
class TestLargeDocInsertOne(LargeDocInsertTest, AsyncPyMongoTestCase):
    async def do_task(self):
        insert_one = self.corpus.insert_one
        await asyncio.gather(*[insert_one(doc) for doc in self.documents])
```
Still need to remove this asyncio.gather
test/performance/async_perf_test.py (Outdated)

```python
    n_tasks = 80


class TestRunCommandManyTasks(TestRunCommand):
```
Should we call these "UnlimitedTasks" rather than "Many"?
test/performance/async_perf_test.py (Outdated)

```python
class TestFindManyAndEmptyCursorManyTasks(TestFindManyAndEmptyCursor):
    n_tasks = 1000
```
Should we remove the n_tasks=1000 benchmarks? We already have 80, do we need 1000?
Agreed. This benchmark doesn't meaningfully scale with concurrency.
First draft.