Memory leak issue running GCHP14.3.1 on AWS #466

Open
YanshunLi-washu opened this issue Jan 1, 2025 · 2 comments
Assignees: lizziel
Labels: category: Question (Further information is requested); topic: Performance (Related to GCHP model speed and/or memory)

Comments

@YanshunLi-washu

Your name

Yanshun Li

Your affiliation

WashU

Please provide a clear and concise description of your question or discussion topic.

Hi Team,

I'm running a GCHP 14.3.1 C180 simulation on AWS and have run into what looks like a memory leak. Memory usage grows until the run aborts, within about 24 hours of model time:

AGCM Date: 2020/12/31 Time: 23:10:00 Throughput(days/day)[Avg Tot Run]: 0.7 0.7 23.7 TimeRemaining(Est) 135:06:43 73.9% : 64.3% Mem Comm:Used
Mem/Swap Used (MB) at MAPL_Cap:TimeLoop= 1.248E+05 0.000E+00
...
AGCM Date: 2021/01/01 Time: 23:00:00 Throughput(days/day)[Avg Tot Run]: 40.6 83.2 137.2 TimeRemaining(Est) 001:46:22 108.3% : 98.1% Mem Comm:Used
Mem/Swap Used (MB) at MAPL_Cap:TimeLoop= 1.848E+05 0.000E+00
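
To put a number on the growth, the two excerpts above can be turned into a rate. This is a minimal sketch, assuming the log keeps the `AGCM Date:` and `Mem/Swap Used (MB) at MAPL_Cap:TimeLoop=` line formats shown above; the helper names `memory_series` and `growth_mb_per_model_hour` are hypothetical and not part of GCHP or MAPL.

```python
import re
from datetime import datetime

# Patterns for the two line types quoted in this issue (format assumed from the excerpts).
DATE_RE = re.compile(r"AGCM Date:\s*(\d{4}/\d{2}/\d{2})\s+Time:\s*(\d{2}:\d{2}:\d{2})")
MEM_RE = re.compile(r"Mem/Swap Used \(MB\) at MAPL_Cap:TimeLoop=\s*([0-9.Ee+-]+)")


def memory_series(log_text):
    """Pair each memory line with the most recent AGCM Date line before it."""
    series = []
    current_time = None
    for line in log_text.splitlines():
        date_match = DATE_RE.search(line)
        if date_match:
            stamp = f"{date_match.group(1)} {date_match.group(2)}"
            current_time = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
            continue
        mem_match = MEM_RE.search(line)
        if mem_match and current_time is not None:
            # The first number on the line is memory used; the second (swap) is ignored.
            series.append((current_time, float(mem_match.group(1))))
    return series


def growth_mb_per_model_hour(series):
    """Average memory growth between the first and last samples, per model hour."""
    (t0, m0), (t1, m1) = series[0], series[-1]
    hours = (t1 - t0).total_seconds() / 3600.0
    return (m1 - m0) / hours


# The two samples from this issue, with the AGCM lines shortened to the parsed prefix.
sample = """\
AGCM Date: 2020/12/31 Time: 23:10:00
Mem/Swap Used (MB) at MAPL_Cap:TimeLoop= 1.248E+05 0.000E+00
AGCM Date: 2021/01/01 Time: 23:00:00
Mem/Swap Used (MB) at MAPL_Cap:TimeLoop= 1.848E+05 0.000E+00
"""

series = memory_series(sample)
rate = growth_mb_per_model_hour(series)
print(f"{rate:.0f} MB per model hour")  # prints "2517 MB per model hour"
```

On these two samples the increase is about 60,000 MB over 23 h 50 min of model time, roughly 2.5 GB per model hour. Running the same parser over the full log for each configuration (hourly vs. monthly emissions, checkpoints on vs. off) would show how much each setting contributes to the growth.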

For this run I was using hourly anthropogenic emission inventories and had enabled checkpoint output. When I use only monthly emission inventories and turn off checkpoint writing, the simulation can run for a full month.

My intuition is that the memory leak is related to netCDF reading and writing, but I did not see this issue when running on NASA Pleiades or WashU Compute1. I have attached the build log for the GCHP executable to give more information about the Linux environment.

I would appreciate any suggestions or solutions.
ecbuild.log

Yanshun

YanshunLi-washu added the "category: Question" label on Jan 1, 2025
@lizziel
Contributor

lizziel commented Jan 8, 2025

Hi @YanshunLi-washu, have you checked in with other GCHP users at WashU who use AWS? I think @BettyCroft has been doing runs with at least that version. If she has not seen this memory leak then I wonder if it has something to do with your settings. You can try to narrow down the issue by systematically reverting your changes back to default, such as turning off mid-run checkpoints, and then turning off hourly emissions.

lizziel self-assigned this on Jan 8, 2025
lizziel added the "topic: Performance" label on Jan 8, 2025
@YanshunLi-washu
Author

Hi @lizziel, I've confirmed that using hourly emissions and writing mid-run checkpoints are both factors. With both turned off, the C180 simulation can run for a full month.
