Memory Usage #13

Open
philip-davis opened this issue Mar 18, 2021 · 0 comments

Pradeep was seeing SIGKILL (signal 9) terminations of the servers on Summit when running with 64 GB per timestep per node. The kill appeared to happen at the third timestep, so at either 192 GB or 256 GB of staged data. That is earlier than expected, since Summit has 512 GB of memory per node. This might be a system limitation, but we should confirm that we aren't doing something silly, such as allocating 2x the memory or failing to signal libfabric or Margo to release buffers.
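One way to check for a 2x allocation would be to sample the server's resident memory as timesteps arrive and compare the growth per timestep with the 64 GB that is expected. The helper below is only a sketch: it assumes a Linux `/proc` filesystem on the compute node, and the `rss_kib` name and the commented usage loop are hypothetical, not part of DataSpaces.

```shell
# Print the resident set size (in KiB) of a process, read from /proc (Linux only).
# rss_kib is a hypothetical helper name used for illustration.
rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Hypothetical usage, run on the node that hosts the server:
# pid=$(pgrep -n dspaces_server)
# while kill -0 "$pid" 2>/dev/null; do
#     echo "$(date +%s) $(rss_kib "$pid")"
#     sleep 1
# done
```

If the resident size grows by roughly 128 GB per timestep rather than 64 GB, that would point at a duplicate allocation or at buffers that are never released.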

Useful job script excerpt:

# disable MR cache in libfabric; still problematic as of libfabric 1.4.1
export FI_MR_CACHE_MAX_COUNT=0
# use shared recv context in RXM; should improve scalability
export FI_OFI_RXM_USE_SRX=1

rm -rf conf.ds
## Create dataspaces configuration file
echo "## Config file for DataSpaces
ndim = 3
dims = 32768, 32768, 32768
max_versions = 4
max_readers = 128
lock_type = 2
num_apps = 2
" > dataspaces.conf
TD=32768
NS=2
NW=64
NR=4
let "DR=TD/NR"
let "DW=TD/NW"
# Note that we explicitly specify the libfabric domain of "mlx5_0" on
# Summit. Otherwise libfabric and/or libibverbs may select a default port
# that does not work out of the box.
jsrun -n $NS -a 1 -r $NS ./dspaces_server verbs://mlx5_0 >& server_"$NS"_"$NW"_"$NR".log &
serverproc=$!
jsrun -n $NW -a 1 -r 32 ./test_writer 3 1 1 $NW 512 512 $DW 4 -s 8 -m server >& writer_"$NS"_"$NW"_"$NR".log
writerproc=$!
jsrun -n $NR -a 1 -r $NR ./test_reader 3 1 1 $NR 512 512 $DR 4 -s 8 -t >& reader_"$NS"_"$NW"_"$NR".log
wait $writerproc
wait $serverproc
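
For reference, the expected footprint can be worked out from the script parameters. This is a rough sketch that assumes the `test_writer` arguments mean a 1 x 1 x NW writer layout, a per-writer block of 512 x 512 x DW elements, 4 timesteps, and 8 bytes per element (`-s 8`). That reading is an assumption, not something confirmed in this issue.

```shell
# Parameters copied from the job script above.
TD=32768
NW=64
# Assumed meaning of the test_writer arguments: "-s 8" is the element size
# in bytes and the trailing "4" is the number of timesteps.
ELEM_SIZE=8
TIMESTEPS=4

DW=$((TD / NW))                                   # extent of each writer's block in the split dimension
bytes_per_writer=$((512 * 512 * DW * ELEM_SIZE))  # one writer, one timestep
bytes_per_ts=$((bytes_per_writer * NW))           # all writers, one timestep

gib=$((1024 * 1024 * 1024))
per_ts_gib=$((bytes_per_ts / gib))
total_gib=$((per_ts_gib * TIMESTEPS))

echo "per timestep: ${per_ts_gib} GiB, after ${TIMESTEPS} timesteps: ${total_gib} GiB"
```

Under these assumptions the servers receive 64 GiB per timestep in total. With `max_versions = 4`, all four timesteps may be held at once, which gives 256 GiB. If both server resource sets land on one node, as `-r $NS` suggests, that is still half of the node's 512 GB.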