[WIP] BP5 Put performance optimization #1756
franzpoeschel wants to merge 2 commits into openPMD:dev
I checked it. Works!!
engine.Put(var, ptr);
auto do_defer =
    ba.m_is_bp5 ? adios2::Mode::Sync : adios2::Mode::Deferred;
engine.Put(var, ptr, do_defer);
This incurs an overhead in EndStep() / PerformDataWrite(). Use Async in those cases.
@pnorbert @eisenhauer can you take a look at this? @franzpoeschel and I wonder if this is a performance bug that rather should be fixed in ADIOS2.
IMHO, PerformPuts was always kind of an odd thing. Semantically it's the equivalent of "Remember when I did Put deferred earlier? Forget about that, I really meant Put sync, so copy the data now so I can reuse those buffers". BP4 is always going to copy the data into internal buffers at some point, so it didn't necessarily matter much whether that happened in Put, PerformPuts, or EndStep (which itself calls PerformPuts in BP4).

On the other hand, BP5 at least has the ability to not copy the data at all, i.e. if you do Put deferred, we just keep that pointer and EndStep can write the data from application memory directly to disk without any copies. Note that I say "can". For smaller data blocks, BP5 copies at the time of Put whether you say deferred or sync. Some aggregators may also end up copying the data into a contiguous block. There are various other reasons why zero-copy I/O might not happen, but at least for BP5, Put deferred and never calling PerformPuts gives you a chance of zero copy.

However, if you need to reuse buffers before EndStep, you've got to force the copy sometime. I'm not sure it would make much of a difference whether you do that in Put sync or in PerformPuts.
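To make the semantics described above concrete, here is a minimal toy model in plain C++. It is not ADIOS2 code: `ToyEngine`, `Mode`, `Result` and `writeThenReuse` are hypothetical names made up for illustration, and the model ignores the cases mentioned above where BP5 copies anyway (small blocks, aggregators). It only shows why a deferred Put without PerformPuts can avoid a copy, and why reusing the buffer before EndStep then becomes unsafe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two Put modes; NOT adios2::Mode.
enum class Mode { Sync, Deferred };

// Hypothetical toy engine that only models "copy now" vs. "keep the pointer".
class ToyEngine {
public:
    // Sync copies immediately; Deferred only remembers the user pointer.
    void Put(const double *data, std::size_t n, Mode mode) {
        if (mode == Mode::Sync) {
            copyIntoStaging(data, n);
        } else {
            pending_.push_back({data, n});
        }
    }

    // Turns every pending deferred block into an internal copy,
    // after which the caller may reuse its buffers.
    void PerformPuts() {
        for (const auto &p : pending_) {
            copyIntoStaging(p.data, p.n);
        }
        pending_.clear();
    }

    // "Writes" the step: staged copies first, then any still-pending
    // blocks are read directly from user memory (the zero-copy path).
    std::vector<double> EndStep() {
        std::vector<double> out;
        for (const auto &block : staged_) {
            out.insert(out.end(), block.begin(), block.end());
        }
        for (const auto &p : pending_) {
            out.insert(out.end(), p.data, p.data + p.n);
        }
        staged_.clear();
        pending_.clear();
        return out;
    }

    std::size_t copies() const { return copies_; }

private:
    struct Pending {
        const double *data;
        std::size_t n;
    };

    void copyIntoStaging(const double *data, std::size_t n) {
        staged_.emplace_back(data, data + n);
        ++copies_;
    }

    std::vector<std::vector<double>> staged_;
    std::vector<Pending> pending_;
    std::size_t copies_ = 0;
};

struct Result {
    std::vector<double> written; // what the toy EndStep "wrote"
    std::size_t copies;          // number of internal copies made
};

// Puts a buffer holding 1.0, optionally calls PerformPuts, then
// overwrites the buffer with 2.0 before EndStep.
Result writeThenReuse(Mode mode, bool callPerformPuts) {
    ToyEngine engine;
    std::vector<double> buffer{1.0};
    engine.Put(buffer.data(), buffer.size(), mode);
    if (callPerformPuts) {
        engine.PerformPuts();
    }
    buffer[0] = 2.0; // buffer reuse before EndStep
    Result r;
    r.written = engine.EndStep();
    r.copies = engine.copies();
    return r;
}
```

In this model, `writeThenReuse(Mode::Sync, false)` and `writeThenReuse(Mode::Deferred, true)` both write the original value at the cost of one copy, which mirrors the point that it may not matter much where the forced copy happens. `writeThenReuse(Mode::Deferred, false)` makes no copy but writes the overwritten value, so the zero-copy path is only usable when buffers stay untouched until EndStep.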
According to #1751, BP5 performance takes a hit by calling PerformPuts(). Instead, for BP5, always use Put(Sync), so that we can skip PerformPuts().
@guj Can you check if this really improves performance?