I am doing a parallel write of a netCDF4-format file with 1 task and running into a write error in the HDF5 layer.
Debugging shows the error occurs at line 1620 of H5FDmpio.c. The code there, which confirms the number of bytes written, is purely diagnostic, so I was able to comment out the section from line 1600 to line 1620. After doing so I confirmed that the file is written correctly, so the diagnostic that generates the error is reporting a false failure.
I've confirmed that the bug is in the MPI layer by building and running with openmpi/4.1.4, which succeeds. When using mpt, the third call to MPI_Get_elements_x returns a count of 9 instead of the expected byte count.
I added a print statement at line 1620.
For the successful openmpi case I see:
39: bytes_written 6270 io_size 6270
39: bytes_written 18874368 io_size 18874368
39: bytes_written 18874368 io_size 18874368
39: bytes_written 3548 io_size 3548
39: bytes_written 48 io_size 48
For the failing mpt case I see:
39: bytes_written 6270 io_size 6270
39: bytes_written 18874368 io_size 18874368
39: bytes_written 9 io_size 18874368
39: ERROR: 0 NetCDF: HDF error err_num = -101 fname = /glade/u/home/jedwards/sandboxes/pio/src/clib/pio_darray_int.c line = 463
This is a bug in the MPI library mpt/2.25.
Environment:
- mpt/2.25
- intel/19.1.1
- hdf5 1.12.2
- netcdf-c: https://github.com/jedwards4b/netcdf-c/tree/jedwards/add_udf2 (a recent branch of netcdf-c main)