Conversation
Yep, I see the hangup, too. Will have a look.
Hey Junmin, the issue here is that the Span-based storeChunk API is collective, which was not properly documented so far. Fixing that with #1794. For your code, this means that all ranks must call storeChunk() when using the Span API. The following patch should do that:

diff --git a/examples/16_btd_write_parallel.cpp b/examples/16_btd_write_parallel.cpp
index a7b420ba2..65a602d94 100644
--- a/examples/16_btd_write_parallel.cpp
+++ b/examples/16_btd_write_parallel.cpp
@@ -204,6 +204,10 @@ void doWork(
input.get(), input.get() + numElements, spanBuffer.data());
}
}
+ if (m_span)
+ {
+ mymesh.storeChunk<double>({0, 0, 0}, {0, 0, 0}).currentBuffer();
+ }
}
int main(int argc, char *argv[])
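For clarity, this is roughly what the patch above amounts to, written out as a standalone sketch (rc, rankHasData, chunk_offset, chunk_extent, input and numElements are illustrative names, not the exact code of the example):

// Sketch: with a collective Span-based storeChunk, every rank makes the call;
// ranks that contribute no data pass a zero offset and a zero extent.
if (rankHasData)
{
    // request a writable buffer from openPMD and fill it directly
    auto dynamicView = rc.storeChunk<double>(chunk_offset, chunk_extent);
    auto span = dynamicView.currentBuffer();
    std::copy(input.get(), input.get() + numElements, span.data());
}
else
{
    // participate in the collective call with an empty chunk
    rc.storeChunk<double>({0, 0, 0}, {0, 0, 0}).currentBuffer();
}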
Oh, I did not know that.
Okay, I take it back, the Span API can be used non-collectively, but there is currently a bug in
I just checked, and it works. Will you merge your fix?
I have merged it now, you can rebase this branch.
@ax3l @franzpoeschel can we merge this one?
@franzpoeschel I adjusted the examples, and it looks like it is worth trying out the following command:
mpirun -n 4 ./bin/16_btd_write_parallel -s -v -d 0
It will finish successfully (h5 or bp); however, if you try "h5ls -r" on each of the h5 files, not all of them show datasets, and those files are not readable by h5dump. To reduce run time, you can change line 65 to use one field "B" instead of three fields "B" "j" "E".
std::string options = "";
if (m_adiosFlattenSteps)
    options = R"(adios2.engine.parameters.FlattenSteps = "on")";
std::unique_ptr<Series> series = std::make_unique<openPMD::Series>(
You could store the series just as a plain variable and pass it around by reference, but this is ok as well if you are trying to emulate some pattern we use in the codes?
As it gets created and destroyed in main(), this way is cleaner to me.
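For reference, a minimal sketch of the pass-by-reference alternative discussed here (filename, options and the doWork signature are illustrative assumptions, not the example's actual code):

// hold the Series as a plain object in main() ...
openPMD::Series series(
    filename, openPMD::Access::CREATE, MPI_COMM_WORLD, options);
// ... and let worker functions take it by reference,
// e.g. void doWork(openPMD::Series &series, ...);
doWork(series);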
auto meshes = series->iterations[w.whichSnapshot].meshes;

// is this the trouble maker?
series->iterations[w.whichSnapshot].open();
@franzpoeschel should we use
- series->iterations[w.whichSnapshot].open();
+ series->snapshots()[w.whichSnapshot].open();
now?
For writing, isn't it more straightforward to use iterations[]?
@franzpoeschel can you please take a look at the flush call issues described by @guj?
    << comp_name << std::endl;
}

auto numElements = size_t(m_blockX) * m_blockY * m_blockZ;
@franzpoeschel should we add a helper function to calculate the number of elements from chunk_offset and chunk_extent (as used in storeChunk below)?
Something like
std::size_t numElements = openPMD::count_chunk_elements(chunk_offset, chunk_extent);
This probably depends on chunk_extent only? But yeah, I have manually written that often enough, a helper would be nice.
Hm, not sure what I meant here, this is independent of the offset...
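Such a helper does not yet exist in openPMD-api; as noted, it would only depend on the extent. A minimal sketch of what it could look like (the name count_chunk_elements is just the proposal from above, not an existing API):

#include <cstddef>
#include <functional>
#include <numeric>
#include <openPMD/openPMD.hpp>

// Multiply all entries of the extent together to get the element count;
// the chunk offset does not enter the calculation.
std::size_t count_chunk_elements(openPMD::Extent const &chunk_extent)
{
    return std::accumulate(
        chunk_extent.begin(),
        chunk_extent.end(),
        std::size_t{1},
        std::multiplies<std::size_t>());
}

// usage, matching the line quoted above (minus the unused offset):
// std::size_t numElements = count_chunk_elements(chunk_extent);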
Just verified that this bug is gone.
Also, I retried this one.
The branch is up to date, and you can reproduce the h5 issue with "mpirun -n 4 ./bin/16_btd_write_parallel -v -s", which invokes the span path; files 0 and 2 are good, but files 1 and 3 are not. "mpirun -n 3 ./bin/16_btd_write_parallel -v -s" will be ok, as well as "mpirun -n 4 ./bin/16_btd_write_parallel -v -s -t 3". In the latter two cases, rank 0 processed the first buffer of all files. Not sure why that makes a difference in the span call for h5 files. If needed, I am happy to restrict this test to use span only with bp. What do you think?
This is a simple version of BTD. While writing it, something unexpected was observed.
To reproduce:
@franzpoeschel if you have time, please verify.