Issues on juwels #679

Open
patrickscholz opened this issue Jan 29, 2025 · 1 comment
@patrickscholz
Contributor

I'm currently experimenting on JUWELS with extended GNU compiler options, because of some issues that occur with Vasco's mesh.

elseif(${CMAKE_Fortran_COMPILER_ID} STREQUAL  GNU )
#    target_compile_options(${PROJECT_NAME} PRIVATE -O3 -finit-local-zero  -finline-functions -fimplicit-none  -fdefault-real-8 -ffree-line-length-none)
   if(${FESOM_PLATFORM_STRATEGY} STREQUAL ubuntu )
      message(STATUS "Allowing type mismatches on Ubuntu for CI Testing" )  # NOTE(PG): Would be nicer to grab the CI=True from the env variable
      target_compile_options(${PROJECT_NAME} PRIVATE -O2 -g -fbacktrace -ffloat-store -finit-local-zero  -finline-functions -fimplicit-none  -fdefault-real-8 -fdefault-double-8 -ffree-line-length-none -fallow-argument-mismatch)
   else()
      target_compile_options(${PROJECT_NAME} PRIVATE -O2 -g -fbacktrace -ffloat-store -finit-local-zero  -finline-functions -fimplicit-none  -fdefault-real-8 -fdefault-double-8 -ffree-line-length-none)
   endif()
   if(CMAKE_Fortran_COMPILER_VERSION VERSION_GREATER_EQUAL 10 )
      target_compile_options(${PROJECT_NAME} PRIVATE -fallow-argument-mismatch) # gfortran v10 is strict about erroneous API calls: "Rank mismatch between actual argument at (1) and actual argument at (2) (scalar and rank-1)"
   endif()
   
   # extended debugging flags for GNU compiler
   target_compile_options(${PROJECT_NAME} PRIVATE 
                           -ffloat-store 
                           -finit-local-zero  
                           -finline-functions 
                           -fimplicit-none  
                           -fdefault-real-8 
                           -fdefault-double-8 
                           -ffree-line-length-none 
                           -fallow-argument-mismatch 
                           -fno-fast-math -fno-signed-zeros 
                           -freciprocal-math 
                           -mveclibabi=svml 
                           -flto
                           -pg 
                           -fbacktrace 
                           -fcheck=all,bounds 
                           -finit-real=snan 
                           -finit-integer=-9999 
                           -fsanitize=undefined,address
                        )
   # if use -fsanitize=undefined,address you also need ... PRIVATE -lubsan)           
   target_link_libraries(${PROJECT_NAME} PRIVATE -lubsan)
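A possible alternative to naming `-lubsan` by hand: GCC expects the `-fsanitize=` options at link time as well, and the compiler driver then adds the matching sanitizer runtimes itself. A minimal sketch, assuming CMake 3.13 or newer (needed for `target_link_options`) and the same target name as above:

```cmake
# Sketch only: pass the sanitizer flags to the link step too, so the
# gfortran/gcc driver pulls in the matching runtimes (libasan, libubsan).
# Assumes CMake >= 3.13, which introduced target_link_options().
target_link_options(${PROJECT_NAME} PRIVATE -fsanitize=undefined,address)
```

Whether this behaves identically with the toolchain modules on JUWELS has not been checked here.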

... I have already managed to fix a couple of issues. The model seems to run fully MPI-based, but occasionally at runtime, at different places, I receive the warning/error message

At line 1798 of file /p/home/jusers/scholz6/juwels/fesom2/src/gen_halo_exchange.F90
Fortran runtime warning: An array temporary was created for argument 'buf' of procedure 'mpi_send'
At line 1798 of file /p/home/jusers/scholz6/juwels/fesom2/src/gen_halo_exchange.F90
Fortran runtime warning: An array temporary was created for argument 'buf' of procedure 'mpi_send'
At line 1798 of file /p/home/jusers/scholz6/juwels/fesom2/src/gen_halo_exchange.F90

... ChatGPT says that this message arises because MPI does not recognize that the array that is sent to MPI_SEND is contiguous, and therefore creates a copy of it to make it contiguous. The problem is that this seems to occur at some arbitrary place and time during the run, after this point in the timestep loop has already been passed several times!

 --> call compute_diagnostics(1)
  --> call output (n)
 FESOM =======================================================
 FESOM step:          58  day:           1  year:        1958

  --> call compute_vel_nodes
  --> call ocean2ice(n)
  --> call update_atm_forcing(n)
  --> call ice_timestep(n)
      --> call EVPdynamics_m...
      --> call ice_TG_rhs_div...
      --> call ice_fct_solve...
      --> call ice_update_for_div...
      --> call cut_off...
      --> call thermodynamics...
 ___ICE STEP EXECUTION TIMES____________________________
        Ice Dyn.        : 2.889E-01
        Ice Advect.     : 6.359E-03
        Ice Thermodyn.  : 7.875E-03
    _______________________________
        Ice TOTAL       : 3.031E-01

  --> call oce_fluxes_mom...
  --> call oce_timestep_ale
      --> call pressure_bv
      --> call pressure_force_4_...
      --> call oce_mixing_KPP
      --> call compute_vel_rhs
At line 1798 of file /p/home/jusers/scholz6/juwels/fesom2/src/gen_halo_exchange.F90
Fortran runtime warning: An array temporary was created for argument 'buf' of procedure 'mpi_send'
At line 1798 of file /p/home/jusers/scholz6/juwels/fesom2/src/gen_halo_exchange.F90
....

... I tried to follow ChatGPT's suggestion to force the array to be contiguous by using
call MPI_SEND( arr2D(1:myDim_nod2D), myDim_nod2D, MPI_DOUBLE_PRECISION, 0, 2, MPI_COMM_FESOM, MPIerr )
instead of
call MPI_SEND( arr2D, myDim_nod2D, MPI_DOUBLE_PRECISION, 0, 2, MPI_COMM_FESOM, MPIerr )
but the problem is still there!
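One thing that may be worth checking: according to the gfortran manual, `-fcheck=all` includes the `array-temps` check, which prints this kind of runtime warning whenever the compiler has to copy an actual argument into a contiguous temporary. The message therefore comes from the gfortran runtime library rather than from MPI itself, and it is a warning, not an error. If `arr2D` is an assumed-shape dummy argument that sometimes receives a strided section from its caller (an assumption, since the calling code is not shown here), then `arr2D(1:myDim_nod2D)` would still be strided, which would be consistent with the warning appearing only for some calls. A minimal sketch that keeps all other runtime checks but silences this one, assuming it replaces the `-fcheck=all,bounds` entry in the flags above:

```cmake
# Sketch only: enable every -fcheck test except the array-temporary warning.
# The "all,no-array-temps" form is the one given in the gfortran manual.
# This is meant to replace the existing -fcheck=all,bounds entry.
target_compile_options(${PROJECT_NAME} PRIVATE
                        -fbacktrace
                        -fcheck=all,no-array-temps
                      )
```

This only hides the message. If the copies matter for performance, the call sites that pass non-contiguous sections would still need to be found, for example with the compile-time option `-Warray-temporaries`.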

Do you have any ideas or suggestions about this?

@patrickscholz
Contributor Author

At the moment I also get a not very informative error message for:

/p/home/jusers/scholz6/juwels/fesom2/src/io_meandata.F90 NetCDF: HDF error

after which the model stops. Any idea about that?


3 participants