Comments (14)

sit23 commented on June 15, 2024

Hi @eocene. I would imagine that the new nodes are using a different version of MPI, or something like that. So that I can help more, could you tell us what kind of model you are running? I.e. are you using grey radiation / RRTM / Held-Suarez etc.? I would say that your error here is almost certainly not a problem with the interpolation routine, but a symptom of some other problem. (I would also advise against doing too much digging in the interpolation routine. It's very long and not that easy to read!)

from isca.

eocene commented on June 15, 2024

Hi Stephen, thank you very much indeed for the (quick) response!

I've checked the MPI versions and apparently they are all the same. I absolutely agree about the symptom-not-cause diagnosis. For completeness, I have run various test_cases and, mysteriously, they all fail differently. All of these have been recompiled and tried with various PE numbers etc. (Another thing I've tried is increasing "num_iters" in horiz_interp_conserve.F90, based on reading the code/docs, but to no avail.)

I'm sure that there is something amiss/stupid/negligent on my end, but since the root cause seems to be a bit obscure at the moment, any hints would be very much appreciated! Thank you very much indeed.

axisymmetric fails exactly like my personal configuration:
2019-01-29 12:34:06,399 - isca - DEBUG - NOTE from PE 0: MPP_IO_SET_STACK_SIZE: stack size set to 131072.
2019-01-29 12:34:06,402 - isca - DEBUG - NOTE from PE 0: MPP_DOMAINS_SET_STACK_SIZE: stack size set to 600000.
2019-01-29 12:34:06,410 - isca - DEBUG - starting 1 OpenMP threads per MPI-task
2019-01-29 12:34:06,410 - isca - DEBUG - ATMOS MODEL DOMAIN DECOMPOSITION
2019-01-29 12:34:06,410 - isca - DEBUG - X-AXIS = 128
2019-01-29 12:34:06,411 - isca - DEBUG - Y-AXIS = 8 8 8 8 8 8 8 8
2019-01-29 12:34:06,425 - isca - DEBUG - mean surface pressure= NaN mb
2019-01-29 12:34:06,437 - isca - DEBUG - NOTE from PE 0: idealized_moist_phys: Using Frierson Quasi-Equilibrium convection scheme.
2019-01-29 12:34:06,445 - isca - DEBUG - NOTE from PE 0: interpolator_mod :sn_1.000_sst.nc is a year-independent climatology file
2019-01-29 12:34:06,446 - isca - DEBUG -
2019-01-29 12:34:06,446 - isca - DEBUG - FATAL from PE 1: horiz_interp_conserve_mod:no latitude index found: n,sph= 1 NaN
2019-01-29 12:34:06,446 - isca - DEBUG -

Held-Suarez fails on a segmentation fault:
2019-01-29 12:07:35,307 - isca - DEBUG - /
2019-01-29 12:07:35,308 - isca - DEBUG - NOTE: MPP_IO_SET_STACK_SIZE: stack size set to 131072.
2019-01-29 12:07:35,310 - isca - DEBUG - NOTE: MPP_DOMAINS_SET_STACK_SIZE: stack size set to 600000.
2019-01-29 12:07:35,316 - isca - DEBUG - starting 1 OpenMP threads per MPI-task
2019-01-29 12:07:35,316 - isca - DEBUG - ATMOS MODEL DOMAIN DECOMPOSITION
2019-01-29 12:07:35,316 - isca - DEBUG - X-AXIS = 128
2019-01-29 12:07:35,316 - isca - DEBUG - Y-AXIS = 64
2019-01-29 12:07:35,376 - isca - DEBUG - mean surface pressure= NaN mb
2019-01-29 12:07:35,528 - isca - DEBUG - forrtl: severe (174): SIGSEGV, segmentation fault occurred
2019-01-29 12:07:35,528 - isca - DEBUG - Image PC Routine Line Source
2019-01-29 12:07:35,528 - isca - DEBUG - libintlc.so.5 00002AB0523DABF1 tbk_trace_stack_i Unknown Unknown
2019-01-29 12:07:35,528 - isca - DEBUG - libintlc.so.5 00002AB0523D8D2B tbk_string_stack_ Unknown Unknown
2019-01-29 12:07:35,528 - isca - DEBUG - libifcoremt.so.5 00002AB050A22AC2 Unknown Unknown Unknown
2019-01-29 12:07:35,528 - isca - DEBUG - libifcoremt.so.5 00002AB050A22916 tbk_stack_trace Unknown Unknown
2019-01-29 12:07:35,528 - isca - DEBUG - libifcoremt.so.5 00002AB05097BAB0 for__issue_diagno Unknown Unknown
2019-01-29 12:07:35,528 - isca - DEBUG - libifcoremt.so.5 00002AB05098D658 for__signal_handl Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - libpthread-2.17.s 00002AB0505005E0 Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 00000000006C4EEC Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 00000000006BFBA7 Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 00000000006BD426 Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 000000000045197C Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 0000000000411B40 Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 0000000000468D75 Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 0000000000907BEF Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 000000000040520E Unknown Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - libc-2.17.so 00002AB05264FC05 __libc_start_main Unknown Unknown
2019-01-29 12:07:35,529 - isca - DEBUG - held_suarez.x 0000000000405119 Unknown Unknown Unknown

and Realistic-Continents fails on 'regularize: Failure to converge'
2019-01-29 12:24:46,273 - isca - DEBUG - NOTE from PE 0: MPP_IO_SET_STACK_SIZE: stack size set to 131072.
2019-01-29 12:24:46,277 - isca - DEBUG - NOTE from PE 0: MPP_DOMAINS_SET_STACK_SIZE: stack size set to 600000.
2019-01-29 12:24:46,286 - isca - DEBUG - starting 1 OpenMP threads per MPI-task
2019-01-29 12:24:46,286 - isca - DEBUG - ATMOS MODEL DOMAIN DECOMPOSITION
2019-01-29 12:24:46,286 - isca - DEBUG - X-AXIS = 128
2019-01-29 12:24:46,287 - isca - DEBUG - Y-AXIS = 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
2019-01-29 12:24:46,459 - isca - DEBUG -
2019-01-29 12:24:46,460 - isca - DEBUG - FATAL from PE 1: regularize: Failure to converge
2019-01-29 12:24:46,460 - isca - DEBUG -
2019-01-29 12:24:46,460 - isca - DEBUG - application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
2019-01-29 12:24:46,460 - isca - DEBUG -
2019-01-29 12:24:46,460 - isca - DEBUG - FATAL from PE 2: regularize: Failure to converge
2019-01-29 12:24:46,460 - isca - DEBUG -
2019-01-29 12:24:46,460 - isca - DEBUG - application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
2019-01-29 12:24:46,460 - isca - DEBUG -
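
For anyone triaging logs like these: the axisymmetric and Held-Suarez runs both print "mean surface pressure= NaN mb" before the fatal error, which fits the symptom-not-cause reading. The following is a minimal Python sketch, not part of Isca, that reports the first log entry mentioning NaN or FATAL. The helper name first_suspect and the regular expression are assumptions based only on the line format shown in this thread, and the sample text is an excerpt of the axisymmetric log.

```python
import re

# Assumed line layout, taken from the logs in this thread:
# "YYYY-MM-DD HH:MM:SS,mmm - isca - LEVEL - message"
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - isca - (\w+) -\s?(.*)$"
)


def first_suspect(log_text):
    """Return (timestamp, message) of the first entry mentioning NaN or FATAL.

    Hypothetical helper for illustration; returns None if nothing matches.
    """
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        timestamp, _level, message = match.groups()
        if "NaN" in message or "FATAL" in message:
            return timestamp, message
    return None


sample_log = """\
2019-01-29 12:34:06,410 - isca - DEBUG - X-AXIS = 128
2019-01-29 12:34:06,411 - isca - DEBUG - Y-AXIS = 8 8 8 8 8 8 8 8
2019-01-29 12:34:06,425 - isca - DEBUG - mean surface pressure= NaN mb
2019-01-29 12:34:06,446 - isca - DEBUG -
2019-01-29 12:34:06,446 - isca - DEBUG - FATAL from PE 1: horiz_interp_conserve_mod:no latitude index found: n,sph= 1 NaN
"""

suspect = first_suspect(sample_log)
print(suspect)
```

On this excerpt the first hit is the surface pressure line rather than the interpolation error, so the NaN is already present before horiz_interp_conserve_mod is reached.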

AlexAudette commented on June 15, 2024

I obtained a similar error using the realistic-continents case. The result is a crash with the message "regularize: Failure to converge".

sit23 commented on June 15, 2024

Strange - @eocene did you end up getting a handle on this problem?

sit23 commented on June 15, 2024

@AlexAudette - do you also find other test cases to be failing, or is it just the realistic continents one?

AlexAudette commented on June 15, 2024

@sit23 So far it is only with the realistic continents. I am able to run my simulation at T42 using the era_land_T42.nc land mask file, but when I create my own at T85, I get the same error as eocene.

sit23 commented on June 15, 2024

@AlexAudette OK - that's a slightly different problem, which we have encountered ourselves. The background is that when you put data like topography into the spectral dynamical core, the spikiness of the data and the finite number of Fourier modes mean that you get Gibbs ripples in the topography. To help counter this, the model automatically smooths the incoming topography, which reduces the size of the ripples. The degree of smoothing is controlled by the parameter ocean_topog_smoothing in spectral_dynamics_nml. The parameter is a measure of the smoothness of the topography, with higher values meaning smoother topography, and a smoothing method is applied recursively until the incoming topography is as smooth as the parameter dictates. When you change resolution, though, it's possible that the algorithm cannot smooth the topography enough to meet the parameter's target, so it fails to converge, as per the error message you have. To sort this out, you can reduce the ocean_topog_smoothing parameter. That way you should find that the regularisation converges and the model stops giving you that error message.
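
To illustrate the Gibbs ripples described above, here is a minimal one-dimensional Python sketch. It is not Isca's regularisation algorithm: it truncates the Fourier series of a unit square wave (standing in for a sharp coastline or mountain edge) and then applies a simple linear taper (Fejer weights) as a generic example of spectral smoothing. The function names and the choice of taper are assumptions made for this illustration only.

```python
import math


def square_wave_partial_sum(x, max_wavenumber, weight=None):
    """Fourier partial sum of a unit square wave, truncated at max_wavenumber."""
    total = 0.0
    for k in range(1, max_wavenumber + 1, 2):  # only odd harmonics are non-zero
        w = 1.0 if weight is None else weight(k)
        total += w * (4.0 / math.pi) * math.sin(k * x) / k
    return total


def fejer_weight(max_wavenumber):
    """Linear spectral taper (Fejer weights); a generic stand-in, NOT Isca's scheme."""
    return lambda k: 1.0 - k / (max_wavenumber + 1)


def peak(max_wavenumber, weight=None, samples=4000):
    """Largest value of the partial sum on (0, pi); the true function equals 1 there."""
    xs = (math.pi * i / samples for i in range(1, samples))
    return max(square_wave_partial_sum(x, max_wavenumber, weight) for x in xs)


raw_42 = peak(42)                        # truncation at wavenumber 42
raw_85 = peak(85)                        # truncation at wavenumber 85
tapered_85 = peak(85, fejer_weight(85))  # wavenumber 85 with the linear taper

print(f"wavenumber 42, no taper: peak = {raw_42:.3f}")
print(f"wavenumber 85, no taper: peak = {raw_85:.3f}")
print(f"wavenumber 85, tapered:  peak = {tapered_85:.3f}")
```

The untapered sums overshoot the true value of 1 by a similar amount at both truncations, which is why raising the resolution alone does not remove the ripples. The tapered sum does not overshoot, at the cost of a more gradual edge.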

AlexAudette commented on June 15, 2024

@sit23 Thanks for your answer. I tried reducing the ocean_topog_smoothing parameter from 0.8 to 0.05 in steps of 0.2, but without success. I still get the same error:

2020-03-10 09:28:42,216 - isca - INFO - process running as 110162
2020-03-10 09:28:42,386 - isca - DEBUG - loadmodules for niagara machines
2020-03-10 09:28:42,470 - isca - DEBUG - The following modules were not unloaded:
2020-03-10 09:28:42,470 - isca - DEBUG - (Use "module --force purge" to unload all):
2020-03-10 09:28:42,470 - isca - DEBUG -
2020-03-10 09:28:42,470 - isca - DEBUG - 1) NiaEnv/2018a
2020-03-10 09:28:43,401 - isca - DEBUG - NOTE from PE 0: MPP_DOMAINS_SET_STACK_SIZE: stack size set to 32768.
2020-03-10 09:28:43,401 - isca - DEBUG - &MPP_IO_NML
2020-03-10 09:28:43,401 - isca - DEBUG - HEADER_BUFFER_VAL = 16384,
2020-03-10 09:28:43,401 - isca - DEBUG - GLOBAL_FIELD_ON_ROOT_PE = T,
2020-03-10 09:28:43,401 - isca - DEBUG - IO_CLOCKS_ON = F,
2020-03-10 09:28:43,401 - isca - DEBUG - SHUFFLE = 0,
2020-03-10 09:28:43,401 - isca - DEBUG - DEFLATE_LEVEL = -1
2020-03-10 09:28:43,401 - isca - DEBUG - /
2020-03-10 09:28:43,405 - isca - DEBUG - NOTE from PE 0: MPP_IO_SET_STACK_SIZE: stack size set to 131072.
2020-03-10 09:28:43,407 - isca - DEBUG - NOTE from PE 0: MPP_DOMAINS_SET_STACK_SIZE: stack size set to 600000.
2020-03-10 09:28:43,411 - isca - DEBUG - starting 1 OpenMP threads per MPI-task
2020-03-10 09:28:43,412 - isca - DEBUG - ATMOS MODEL DOMAIN DECOMPOSITION
2020-03-10 09:28:43,412 - isca - DEBUG - X-AXIS = 256
2020-03-10 09:28:43,412 - isca - DEBUG - Y-AXIS = 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
2020-03-10 09:28:43,910 - isca - DEBUG -
2020-03-10 09:28:43,911 - isca - DEBUG - FATAL from PE 15: regularize: Failure to converge
2020-03-10 09:28:43,911 - isca - DEBUG -
...
2020-03-10 09:28:43,912 - isca - DEBUG -
2020-03-10 09:28:43,912 - isca - DEBUG - FATAL from PE 0: regularize: Failure to converge
2020-03-10 09:28:43,912 - isca - DEBUG -
2020-03-10 09:28:43,912 - isca - DEBUG - --------------------------------------------------------------------------
2020-03-10 09:28:43,912 - isca - DEBUG - MPI_ABORT was invoked on rank 14 in communicator MPI_COMM_WORLD
2020-03-10 09:28:43,912 - isca - DEBUG - with errorcode 1.
2020-03-10 09:28:43,912 - isca - DEBUG -
2020-03-10 09:28:43,912 - isca - DEBUG - NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
2020-03-10 09:28:43,912 - isca - DEBUG - You may or may not see output from other processes, depending on
2020-03-10 09:28:43,913 - isca - DEBUG - exactly when Open MPI kills them.
2020-03-10 09:28:43,913 - isca - DEBUG - --------------------------------------------------------------------------

sit23 commented on June 15, 2024

OK - could you try setting it to 0? That should turn off the regularisation, and we can see if it runs then or not. You could also try increasing the parameter, just in case I've mis-remembered the way you need to go!
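
As a rough sketch of what that setting looks like, the snippet below holds the override in a plain Python dictionary and renders it as Fortran namelist text. Only the group name spectral_dynamics_nml and the parameter ocean_topog_smoothing come from this thread; render_namelist is a hypothetical helper written for this example, not an Isca function, and how the override reaches the model depends on your own experiment script.

```python
def render_namelist(groups):
    """Render {group: {name: value}} as Fortran namelist text.

    Hypothetical helper for illustration; it is not part of Isca.
    """
    lines = []
    for group, params in groups.items():
        lines.append(f"&{group}")
        for name, value in params.items():
            if isinstance(value, bool):
                text = ".true." if value else ".false."
            elif isinstance(value, str):
                text = f"'{value}'"
            else:
                text = repr(value)
            lines.append(f"    {name} = {text},")
        lines.append("/")
    return "\n".join(lines)


# Assumption from the thread: 0.0 should switch the regularisation off.
overrides = {"spectral_dynamics_nml": {"ocean_topog_smoothing": 0.0}}

namelist_text = render_namelist(overrides)
print(namelist_text)
```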

AlexAudette commented on June 15, 2024

So it runs now with the parameter set to 0, thank you very much. I also tried increasing the parameter to 0.96 and it still crashed at the same place. I will keep an eye out for truncation effects. Thanks again!

sit23 commented on June 15, 2024

OK - you will probably find that the Gibbs ripples are significant without any smoothing. You'll particularly see them in the vertical velocity and the precipitation. When we've run with topography at T85, we have managed to run the smoothing, but I can't quite lay my hands on the smoothing parameter we used. I'll let you know if I find it. We are working on alternatives to this smoothing algorithm, which should be available soon.

sit23 commented on June 15, 2024

Just found it - looks like I tried 0.85 for the smoothing parameter and it worked with T85 topography.

AlexAudette commented on June 15, 2024

Interesting, I just tried with this same value and it still fails to regularize. Did you do anything special with your topography file?

sit23 commented on June 15, 2024

Well, you're welcome to try the T85 topography file that I used and see if it works for you. You can find it here:
https://drive.google.com/file/d/1lsYsVE1pIDxOC_CV4SDJu8oxUxmgQ0za/view?usp=sharing
