Chunks #10

Open
wants to merge 2 commits into base: master

Conversation

kostrzewa
Member

Attempt at resolving #9. I'm not sure how to do this for the NonBlocking functions.

@kostrzewa
Member Author

kostrzewa commented May 27, 2023

More things go wrong when working with very large files and too few MPI tasks, but these seem to depend on the specifics of the underlying MPI-I/O implementation.

This seems to work in lemon_benchmark but I'm actually not 100% sure whether the modification is correct.

It doesn't seem to solve the problem at hand, though: tmLQCD simply appears to get stuck on LUMI when attempting to read a gauge configuration with more than 2GB per MPI task.
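
For context on the 2GB figure: the MPI-3 file read and write routines take their element count as a C `int`, so a single call that counts in bytes cannot describe more than `INT_MAX` bytes. The sketch below shows one way to split a larger per-task transfer into pieces that each fit. The helpers `chunk_count` and `chunk_size` are hypothetical names used for illustration and are not taken from LEMON or from this pull request.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not part of LEMON: number of calls needed to move
 * total_bytes when a single call may carry at most max_chunk bytes. */
static uint64_t chunk_count(uint64_t total_bytes, uint64_t max_chunk)
{
    assert(max_chunk > 0 && max_chunk <= (uint64_t)INT_MAX);
    /* Integer ceiling division without risking overflow in the numerator. */
    return total_bytes / max_chunk + (total_bytes % max_chunk != 0);
}

/* Hypothetical helper: size in bytes of chunk number index (0-based).
 * Every chunk is max_chunk bytes except possibly the last one, so the
 * result always fits in an int. */
static int chunk_size(uint64_t total_bytes, uint64_t max_chunk, uint64_t index)
{
    assert(index < chunk_count(total_bytes, max_chunk));
    uint64_t done = index * max_chunk;      /* bytes covered by earlier chunks */
    uint64_t left = total_bytes - done;     /* bytes still to be moved */
    return (int)(left < max_chunk ? left : max_chunk);
}
```

A caller would loop `index` from 0 to `chunk_count(...) - 1` and issue one I/O call per chunk, passing `chunk_size(...)` as the count. All bookkeeping is done in `uint64_t` so that only the final per-call count is narrowed to `int`.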

@kostrzewa
Member Author

It seems to work with the implementation in OpenMPI 4.1.4, however, which did not work previously because various counters in LEMON are integers.

There are still some strange things which happen, however:

# Writing gauge field to .conf.t00000.tmp.
# Constructing LEMON writer for file .conf.t00000.tmp for append = 0
# Time spent writing 19.3 Gb was 268 s.
# Writing speed: 72.1 Mb/s (18.0 Mb/s per MPI process).
# Scidac checksums for gaugefield .conf.t00000.tmp:
#   Calculated            : A = 0x06c6ce67 B = 0xd68e9c01.
# Write completed, verifying write...
# Constructing LEMON reader for file .conf.t00000.tmp ...
found header xlf-info, will now read the message
found header ildg-format, will now read the message
found header ildg-binary-data, will now read the message
# Time spent reading 19.3 Gb was 164 s.
# Reading speed: 118 Mb/s (29.4 Mb/s per MPI process).
found header scidac-checksum, will now read the message
[LEMON] Node 0 reports in readAndParseHeader:
        read 0 for magic number, expected 456789ab.
ReaderNextRecord returned status -3.
[LEMON] Node 1 reports in readAndParseHeader:
        read 0 for magic number, expected 456789ab.
ReaderNextRecord returned status -3.
[LEMON] Node 3 reports in readAndParseHeader:
        read 0 for magic number, expected 456789ab.
ReaderNextRecord returned status -3.
[LEMON] Node 2 reports in readAndParseHeader:
        read 0 for magic number, expected 456789ab.
ReaderNextRecord returned status -3.
# Scidac checksums for gaugefield .conf.t00000.tmp:
#   Calculated            : A = 0x06c6ce67 B = 0xd68e9c01.
#   Read from LIME headers: A = 0x06c6ce67 B = 0xd68e9c01.
# Reading ildg-format record:
#   Precision = 64 bits (double).
#   Lattice size: LX = 64, LY = 64, LZ = 64, LT = 128.
# Input parameters:
#   Precision = 64 bits (double).
#   Lattice size: LX = 64, LY = 64, LZ = 64, LT = 128.
# Write successfully verified.
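
As a sanity check on the sizes in the log above, the record size can be recomputed from the lattice dimensions. This is a sketch that assumes the `ildg-binary-data` record holds the usual 4 link matrices per site, each a 3x3 complex matrix. The function names are hypothetical, and the task count of 4 is inferred from the node numbers 0 to 3 in the log.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bytes in a gauge field record: 4 link matrices per site, each a 3x3
 * complex matrix, stored with bytes_per_real bytes per real number. */
static uint64_t gauge_field_bytes(uint64_t lx, uint64_t ly, uint64_t lz,
                                  uint64_t lt, uint64_t bytes_per_real)
{
    assert(bytes_per_real == 4 || bytes_per_real == 8);
    const uint64_t reals_per_site = 4 * 3 * 3 * 2; /* links * rows * cols * (re, im) */
    return lx * ly * lz * lt * reals_per_site * bytes_per_real;
}

/* Returns 1 if an even share of total_bytes across ntasks tasks no longer
 * fits in a signed int byte count, 0 otherwise. */
static int share_exceeds_int(uint64_t total_bytes, uint64_t ntasks)
{
    assert(ntasks > 0);
    return total_bytes / ntasks > (uint64_t)INT_MAX;
}
```

For LX = LY = LZ = 64, LT = 128 in double precision, `gauge_field_bytes(64, 64, 64, 128, 8)` gives 19,327,352,832 bytes, which matches the 19.3 Gb reported in the log. Split evenly over 4 tasks, that is about 4.8 GB each, more than twice `INT_MAX` bytes.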

@kostrzewa
Member Author

Yeah, as I suspected, this needs some more work, as the individual offsets must be set at each read and write. It shouldn't be too hard to do, though.
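
To illustrate what setting the offset at each read and write could involve, here is a minimal sketch of the offset arithmetic for a chunked transfer. It assumes a deliberately simplified layout in which each task owns one contiguous slice of the record. That layout is an assumption made for this example and is not a description of how LEMON maps lattice data to the file. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: the record starts at byte
 * record_start, each task owns one contiguous slice of bytes_per_task
 * bytes, and a slice is moved in chunks of chunk_bytes bytes.
 * Returns the absolute file offset at which the given chunk begins. */
static uint64_t chunk_file_offset(uint64_t record_start, uint64_t rank,
                                  uint64_t bytes_per_task, uint64_t chunk_bytes,
                                  uint64_t chunk_index)
{
    assert(chunk_bytes > 0);
    /* The chunk must start inside this task's slice. */
    assert(chunk_index * chunk_bytes < bytes_per_task);
    return record_start + rank * bytes_per_task + chunk_index * chunk_bytes;
}
```

The offset is kept in a 64-bit type throughout, in the same spirit as `MPI_Offset` in MPI-IO, so that it can address positions beyond 2GB even when each individual chunk count is an `int`.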
