Offline generation of routehandles to have bfb with online generation #615
Replies: 5 comments 1 reply
-
Thank you for your detailed report. I am going to transfer this to a discussion, which seems more appropriate here. I'll add some comments there.
-
I hoped to be able to give you some help here, but reading back through your description, it looks like you have already dug as far as – or farther than – I can get quickly with what I know about the relevant parts of CMEPS. So I'll reach out to others to see if they can help.

I looked at the ESMF release notes to see if there may have been any issues fixed recently that would help with this. I do see that there was a fix for second-order conservative remapping, but my sense is that it probably doesn't apply in this case. And, based on your findings – particularly, trying to write your own offline generator – it seems like this issue indeed has some CMEPS-specific behavior tied into it, and doesn't seem to simply be a general ESMF issue. However, I will mention this to the ESMF team in case there are some performance improvements that could be applied to the vector remapping.

I can give you some general answers to some of your questions: One of the big advantages that came with CMEPS over our previous coupling infrastructure (in CESM) was that it did away with the need for offline generation of mapping files. As such, I don't think we (in CESM, anyway) have plans for any kind of official support for offline map generation. However, there have periodically been considerations of adding a capability to write the RouteHandle (mapping) information from a run so that it can be read in future runs rather than being regenerated each time. See #335 for additional thoughts on that feature. So far we haven't seen a use case where this has felt important enough to be worth prioritizing, but if you would find that feature useful, then we would welcome contributions to add it. I'm not sure that we (again, referring to CESM) can justify developing that feature within our near-term development priorities, but (pending some more discussion here) we may be able to at least support you or someone from your team in its development.
-
@billsacks We (UFS) do have a branch that contains the "write/read" RH feature already. We had it prepared for one of the operational implementations, where the layout is fixed and we know it won't be changing. However, I think some on the ESMF team are aware of some weird issues that came along with trying to use that feature (specifically, measurably slowing the post-ice and post-ocn phases), and so we didn't end up using it. They're trying to figure out why...

However, for other operational implementations (such as the new DATM + 1/12 MOM/CICE6), the initial hope is that mapfiles will be usable. The initialization cost, when you're just making 9-day runs that have to run within a certain operational window, makes any improvement in the start-up cost worth the effort. But we're talking fractions of minutes, not hours (i.e. a few minutes vs. <1 min) for the overall RH creation step in DataInitialize.

Back to mapfiles though: for our config, ocean and ice are always on the same grid (same mesh file), so we won't be generating separate "A->O" and "A->I" mapfiles. The mapfiles contain just the weights (Gerhard/Bob explained this in a meeting, which is why they can be layout-independent), so if the mesh is the same, there wouldn't be any need for two mapfiles, right?

EDIT: re-reading now, I see that your meshes are the same.
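To illustrate the layout-independence point above: a map file stores only sparse-matrix factors (source index, destination index, weight), so applying it is a plain sparse matrix-vector product over global indices, independent of any PE decomposition. A minimal sketch (the `col`/`row`/`S` names follow the ESMF weight-file convention; the toy data is invented for illustration):

```python
def apply_weights(col, row, S, src, n_dst):
    """Apply sparse regridding weights: dst[r] += S[j] * src[c] per factor."""
    dst = [0.0] * n_dst
    for c, r, s in zip(col, row, S):
        # ESMF weight files store 1-based global indices.
        dst[r - 1] += s * src[c - 1]
    return dst

# Toy example: average two source cells onto one destination cell.
col = [1, 2]      # source indices (1-based)
row = [1, 1]      # destination indices (1-based)
S   = [0.5, 0.5]  # weights
print(apply_weights(col, row, S, [2.0, 4.0], n_dst=1))  # -> [3.0]
```

Because only these factors matter, the same file works for any layout as long as the global index space (the mesh) is unchanged.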
-
I guess I would first try updating ESMF to the latest – 8.9.0 is available – and if you can point to a scaling performance issue in ESMF or in CMEPS, I think we should address it; there is no reason the online regridding should not scale.
-
Thanks all for your comments! For anyone interested, @DeniseWorthen and I also have some related discussion here: ACCESS-NRI/om3-scripts#91.

@jedwards4b This thread ACCESS-NRI/access-om3-configs#334 (comment) includes a preliminary scaling study for our global 25 km ACCESS-OM3 configuration. In the first figure (black solid line), you can see that once the mediator CPU core count exceeds 144, the initialisation time increases sharply. The two additional plots show the corresponding scaling and efficiency results as a function of MED core counts.

Thanks @jedwards4b, I'll upgrade to ESMF 8.9.0, re-run the tests, and see how it goes. Edit: the plots differ slightly because the screenshots come from our more recent scaling study.
-
TL;DR:
Our goal is to fully replicate CMEPS online routehandle generation in an offline workflow using ESMF mesh files, while remaining bit-for-bit (bfb) with the online result. This is to avoid very expensive mediator initialisation at high MED core counts. Any guidance or example code on how to do this would be greatly appreciated!
Context
We are using CMEPS as the coupler for our ACCESS-OM3 MOM6-CICE6 configuration, with DATM and DROF as data components, and are running into an initialisation cost issue – the online generation of routehandles – when scaling up the mediator core counts.

It runs well at moderate MED core counts. However, as we increase `cpl_ntasks`, the online routehandle generation becomes expensive, especially the vector mapping step (`mappatch_uv3d`) when `mapuv_with_cart3d = .true.`. This becomes a major bottleneck for higher-resolution configurations. Further evidence can be found here for a moderate-resolution configuration.

Below is what we've been doing so far and where we're stuck.
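For context on the `mapuv_with_cart3d` step: our understanding is that the vector mapping rotates (u, v) into 3D Cartesian components, regrids each component with the patch method, and rotates back, which avoids pole singularities. A self-contained sketch of just the rotation and its inverse (the regrid step in between is omitted; the function names are ours, not CMEPS's):

```python
import math

def uv_to_cart3d(u, v, lon_deg, lat_deg):
    """Rotate an eastward/northward vector into 3D Cartesian components."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    ux = -math.sin(lon) * u - math.sin(lat) * math.cos(lon) * v
    uy =  math.cos(lon) * u - math.sin(lat) * math.sin(lon) * v
    uz =  math.cos(lat) * v
    return ux, uy, uz

def cart3d_to_uv(ux, uy, uz, lon_deg, lat_deg):
    """Inverse rotation: project Cartesian components back onto east/north."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    u = -math.sin(lon) * ux + math.cos(lon) * uy
    v = (-math.sin(lat) * math.cos(lon) * ux
         - math.sin(lat) * math.sin(lon) * uy
         + math.cos(lat) * uz)
    return u, v

# Round trip at an arbitrary point recovers the original vector
# (up to floating-point rounding).
u, v = cart3d_to_uv(*uv_to_cart3d(1.0, 2.0, 30.0, 45.0), 30.0, 45.0)
```

Since three scalar regrids are performed per vector pair, this step costs roughly three times a scalar patch mapping, which is consistent with it dominating initialisation at high core counts.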
Current configuration:
What we already tried offline
1. Use `ESMF_RegridWeightGen` directly

We first tried generating weight files with `ESMF_RegridWeightGen` using the same ESMF mesh files (e.g. for atm->ocn), and wired them into `nuopc.runconfig`. (I made some changes to the source code and included `*_smapname` / `*_fmapname` / `*_vmapname` attributes.)
Here is what we included in `nuopc.runconfig`.
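For concreteness, the wiring looked something like the fragment below. Note the paths and file names are illustrative, and these `*mapname` attributes come from our patched CMEPS rather than stock:

```
MED_attributes::
     atm2ocn_smapname = ./INPUT/map_atm2ocn_bilinear.nc
     atm2ocn_fmapname = ./INPUT/map_atm2ocn_conserve.nc
     atm2ocn_vmapname = ./INPUT/map_atm2ocn_patch.nc
     atm2ice_smapname = ./INPUT/map_atm2ice_bilinear.nc
     atm2ice_fmapname = ./INPUT/map_atm2ice_conserve.nc
     atm2ice_vmapname = ./INPUT/map_atm2ice_patch.nc
::
```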
Results: with `bilinear` and `conserve` weight files for the scalar mappings (e.g. `atm2ocn_smapname`, `atm2ocn_fmapname`, and similarly for ICE), we get bfb-identical results compared to the CMEPS online `mapbilnr` and `mapconsf` mappings. The remaining mismatch is in the `mappatch` / `mappatch_uv3d` mapping.

2. Offline generator mirroring CMEPS logic
I then wrote a small standalone program that tries to re-implement the CMEPS `mappatch_uv3d` generation using only the mesh files, `ESMF_MeshCreate`, and `ESMF_RouteHandleWrite`.

Results:
The test configuration is run in serial mode – the ocn and ice core counts are the same, and we use the same ocn and ice ESMF meshes – so this offline program produces two identical routehandle files.

In contrast, the RH files produced online by CMEPS are clearly different, and swapping them in `atm2ocn_vmapname` / `atm2ice_vmapname` changes the model results, confirming that each component genuinely has a distinct mapping. This suggests that the online generation uses additional information beyond the mesh plus scalar srcMaskValues/dstMaskValues. Does CMEPS incorporate additional component-specific masking or mediator state?

Current workaround
We can run a short case to allow CMEPS to generate all required routehandles online. We then copy the routehandles into `INPUT` and reference them in `nuopc.runconfig` for production runs. This is bfb with a fully online workflow.

Request
We are seeking help on: