PDLIB Block Explicit scheme: Fix MPI restart reproducibility #1134
Comments
hi @aronroland I'm working on resolving b4b differences in the restart files for PDLIB block-explicit, and one thing I noticed is that those file sizes will vary when you vary the MPI count. In this case I am not doing a warm start run using a 03Z restart file from a separate run, like described above. Here, I'm only doing two separate runs of the
Looking into this I found something in Lines 630 to 663 in f66b6d4
thanks |
I just checked on my side and I can confirm that it is not b4b, which is quite strange. Our commit from a long time ago with the fixes for the block explicit passed the ufs testcase and was b4b at that time. As for the restart issues, I need to look at them in more detail. It has been a very long time since I worked on this part of the code. |
@aronroland everything but the restart file itself reproduces. I believe this has always been the case with the block-explicit. Should we be calling UNST_PDLIB_WRITE_TO_FILE when writing restart files for the block explicit case? |
@aronroland please give me some time to look at it ... |
Hello, I have reviewed the code and this is intentional: we basically tried to reproduce the behavior of the structured scheme within the unstructured schemes, as defined by IOSTYP = 0, 1. IOSTYP 2 and 3 were not streamlined to the unstructured DD part. As for the problem above, I would follow the path of IOSTYP = 0. Is it b4b for the simple explicit schemes? There should not be any difference between the block_explicit and the other explicit schemes as far as the writing of the restart files is concerned. If all other files for the block_explicit are b4b, which is the case as I understand it, there is only little room left. It needs to be traced from w3_wave down to the call that writes the restart file, and there should be no difference between block_explicit and explicit. The call to UNST_PDLIB_WRITE_TO_FILE is done for IOSTYP = 1. |
I have tested the different output methods from above and found that IOSTYP = 0 results in about a 10% improvement in runtime (the testcase was tp2.6). |
Hello @aronroland, thank you for looking into this and offering insight. I will do some more checking on my end with what you've said. I have a quick follow-up, though. Regarding your question of whether it is b4b for the simple explicit scheme: I tested this scheme (grid a) with different MPI counts, and it is not b4b. Could I ask one clarifying question? You suggested we use IOSTYP=0. So in this case UNST_PDLIB_WRITE_TO_FILE will not be called, and it will only write the spectra directly from w3iors (structured)? As I said, I will do some tests on my side and will let you know what I find. |
Yes, as I see it, IOSTYP = 0 does a direct write and IOSTYP = 1 is analogous to the ww3 logic for the structured grids
Hi @aronroland, When IOSTYP=0, this block of code gets executed for writing spectra: Lines 630 to 639 in f66b6d4
Though when IOSTYP=1, the Lines 654 to 663 in f66b6d4
I did runs for each IOSTYP, for two different arbitrary choices for MPI task counts: 40, 20. When I display the sizes of the restart files, the restarts generated from the runs with IOSTYP=0 have varying file sizes based on number of MPI tasks.
Is it expected that you would get different sized restart files for different MPI task counts, when using IOSTYP=0? |
Hi @MatthewMasarik-NOAA, indeed this is strange, and I wonder whether both restart files work. Did you check this? In any case, it is something that needs to be cleaned up. |
Hi @aronroland, I have already done some different types of checking on these restart files. To answer whether both restarts work? They both warm start a run and finish without issue, but the output fields from those runs aren't reproducible. I can confirm that in either case, IOSTYP=0,1, the restart files themselves are not b4b for differing MPI counts. The out_grd from warm starting with a differing MPI count is also not b4b. It's concerning that for IOSTYP=0 the restart file size will change with MPI count. Do you still think the best path is going with IOSTYP=0? |
@MatthewMasarik-NOAA can you share the runs where the output fields are not reproducible? This is not what I was expecting or have seen in my testing when IOSTYP=0 (I would have to double check that though). |
@JessicaMeixner-NOAA, please clarify what you mean by 'runs where the output fields of the runs are not reproducible.' I qualified those as warm-start runs from a restart file from a run with a different MPI count. Was that your interpretation? |
I was referring to the situation that this issue was started for:
|
I thought the other output fields (gridded, point), i.e. everything except the restart file itself, replicate. |
They do not. The test.comp output is in the header listing those. |
Hi @MatthewMasarik-NOAA, @JessicaMeixner-NOAA, at the time when @aliabdolali and I worked on it, it was b4b; otherwise it could not have been merged. As for the restart file size, I think that this should not be different for a different number of tasks. This does not look right. |
Back then, it was tested by the emc code managers and merged after review and approval, and from what I recall, all of the ufs_1.1, ufs1.1_unstr, ufs1.2 and ufs1.3 tests have mpi and restart reproducibility checks. If not, I think you just need to activate them in matrix.base. Let me know if you need me to show you where I added them. @DeniseWorthen also did some checks in the coupled application. |
@aliabdolali I don't recall ever trying a case where I generated a restart file running w/ a certain decomp, used it to restart a different decomp and compared answers. If I understand correctly, that is the issue being discussed here. |
I'm going to run some tests and will report back tomorrow. |
I do not remember exactly, Denise, but I remember you were checking the ghost nodes for b4b reproducibility; it might have been for mpi and not restart, or vice versa. It has been a long time since we did it. |
@aliabdolali I set up a test case for the ATM-WAV RT test in UFS using the small mesh we did a lot of initial testing on. I ran out 2 hours on 10 PEs, and used the restart after 1 hour to start up either a 10 PE case or a 14 PE case. All three cases have identical ATM and mediator restart files at the end of 2 hours. I did run in debug mode, which might prevent some optimizations (i.e., order of operations) which could conceivably change answers. But this basic test appears to pass for me. |
Super @DeniseWorthen, it helps us a lot. Let's see the outcome of the standalone test. We can compare and see the differences; it might be in the configuration. |
As for the future, I suggest that we add sanity tests for the source term and solver part; in this way we will be able to find out more easily where the issue occurs. As for the UG part of the code, we have "solver_coherence" introduced within a cpp pragma. I will do some checks based on the ideas of @MathieuDutSik |
Hi Everyone - Thanks for your patience as I ran extra tests. I did not repeat the coupled tests, as I got the same results as @DeniseWorthen before and she confirmed that, so I focused on standalone tests. I do see what @MatthewMasarik-NOAA saw: the out_grd had b4b differences in the standalone set-up. The differences are limited to the initial time step for the restart and were in wind, current, Charnock and Ustar. For wind we have a known issue, #218, that explains why this is different. I suspect this is also the reason for current. As for Charnock and Ustar, I'll be honest, I don't remember whether those had differences in the structured case; this is something that we could check. They are both written to the restart file, so this could be somewhere to look into further for answer changes. But again, all other fields at the initial time are the same in the gridded output and all fields at all other times are the same, so I think we are getting the same results as we were before; I think we just didn't look at the first time step before due to the wind issue. @aronroland additional sanity checks are always great. My suspicion is that something is being written to the restart file that is not actually being used and that is causing the differences. They're just really hard to track down in the binary file output. |
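Since differences in the binary restart output are hard to track down by eye, one possible starting point is to list the byte ranges where two files disagree. The following is only a minimal sketch in Python, not part of WW3 or its regtest tooling; the function names `diff_ranges` and `diff_files` are hypothetical, and mapping a byte offset back to a restart record would still require knowledge of the file layout.

```python
from pathlib import Path


def diff_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges where a and b differ.

    Bytes are compared position by position over the common length.
    If the lengths differ, one extra range covering the surplus bytes
    of the longer input is appended at the end.
    """
    ranges = []
    start = None  # start offset of the differing run currently open, if any
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges


def diff_files(path_a: str, path_b: str) -> list[tuple[int, int]]:
    """Convenience wrapper: compare two files read fully into memory."""
    return diff_ranges(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

A few long contiguous ranges would hint at whole records differing, while many short scattered ranges would look more like round-off level differences in individual values.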
Describe the bug
When using the Block-explicit solver, the runs are not MPI restart reproducible. Meaning, if you perform a run using MPI tasks=X, and use one of those restart files to initiate a separate run, which uses MPI tasks=Y, the runs are not reproducible.
To Reproduce
You can create this situation by starting with the regtest ww3_ufs1.1/unstr, and turning the Block explicit scheme on by selecting grid b. From here you can set up the restart functionality by using the ww3_ufs1.2 03Z restart test as a template. Here's how the jobcard commands could look for differing MPI counts, <X> and <Y>:
Expected behavior
The expected behavior is that the results should be identical between these runs (running test.comp shows that ALL files are identical). However, when running test.comp for different values of <X> and <Y>, the results are not identical (out_grd, out_pnt, and restarts are different).
Screenshots
test.comp output for <X>=16, <Y>=11:
Additional context
N/A.
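For readers unfamiliar with what a b4b (bit-for-bit) comparison between two runs involves, here is a rough sketch in Python. It is not the test.comp script from the regtests, only an illustration under stated assumptions: the function name `compare_runs` and the idea of comparing two flat work directories by file name are hypothetical. It reports size mismatches separately, since restart files whose size changes with the MPI count can never be b4b.

```python
import filecmp
from pathlib import Path


def compare_runs(dir_x: str, dir_y: str) -> dict[str, str]:
    """Classify every file name found in either run directory.

    Possible labels: "identical", "content differs", "size differs",
    or "missing" (the file exists in only one of the two directories).
    """
    x, y = Path(dir_x), Path(dir_y)
    names = sorted(
        {p.name for p in x.iterdir() if p.is_file()}
        | {p.name for p in y.iterdir() if p.is_file()}
    )
    report = {}
    for name in names:
        fx, fy = x / name, y / name
        if not fx.is_file() or not fy.is_file():
            report[name] = "missing"
        elif fx.stat().st_size != fy.stat().st_size:
            report[name] = "size differs"
        elif filecmp.cmp(fx, fy, shallow=False):  # shallow=False compares contents
            report[name] = "identical"
        else:
            report[name] = "content differs"
    return report
```

Calling `compare_runs` on the work directories of the <X> and <Y> runs would return one label per file name, so output files and restart files can be checked in a single pass.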