Commit 41f0cde

Apply performance tuning when mounting FSx for Lustre
We have to create a custom mount helper to better accommodate parameter persistence across instance reboots and head node instance type updates. lctl set_param commands have to be executed after the file systems are mounted, and their effect does not persist across instance reboots. The FSx documentation advises using a boot cron job to set the parameters after reboot. However, a boot cron job is not compatible with our use case because FSx for Lustre file systems are mounted upon first access rather than at instance boot (see code (https://github.com/aws/aws-parallelcluster-cookbook/blob/develop/cookbooks/aws-parallelcluster-config/resources/manage_fsx.rb#L60)). Therefore, we have to create a custom mount helper (see mount man page (https://linux.die.net/man/8/mount)).

Q: Do these operations affect the client FSx configuration or the server configuration?
A: Client only.

Q: How will it work if a customer mounts FSx manually?
A: If they use lustre as the mount type, the performance tuning will not be applied.

Q: How do customers know they have to use the mount helper?
A: They will have to read the official ParallelCluster documentation.

Signed-off-by: Hanwen <[email protected]>
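The helper mechanism the message relies on can be sketched as follows. According to the mount man page, `mount -t <type>` looks for an external helper named `/sbin/mount.<type>`. The function `helper_path_for_fstype` is a hypothetical name used only to illustrate that naming convention; it is not part of this commit.

```shell
#!/bin/bash
# Hypothetical illustration: print the path of the external helper that
# mount(8) looks for when it is called with `-t <fstype>`.
helper_path_for_fstype() {
    local fstype="$1"
    printf '/sbin/mount.%s\n' "$fstype"
}

# The file system type introduced by this commit maps to the helper
# that the cookbook installs from its template.
helper_path_for_fstype lustre_with_performance_tuning

# The plain `lustre` type maps to a different helper, which is why a manual
# mount with `-t lustre` does not run the tuning commands.
helper_path_for_fstype lustre
```

This is only a naming sketch: the real lookup happens inside `mount` itself, and the helper then receives the device, the mount point, and the options as arguments.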
1 parent 8b1a85a commit 41f0cde

2 files changed: 44 additions, 2 deletions

cookbooks/aws-parallelcluster-config/resources/manage_fsx.rb

+6 -2
@@ -23,6 +23,10 @@
 default_action :mount
 
 action :mount do
+  template "/sbin/mount.lustre_with_performance_tuning" do
+    source 'shared_storages/mount.lustre_with_performance_tuning.erb'
+    mode '0755'
+  end
   fsx_fs_id_array = new_resource.fsx_fs_id_array.dup
   fsx_fs_type_array = new_resource.fsx_fs_type_array.dup
   fsx_shared_dir_array = new_resource.fsx_shared_dir_array.dup
@@ -63,7 +67,7 @@
 
     mount fsx_shared_dir do
       device "#{dns_name}@tcp:/#{mount_name}"
-      fstype 'lustre'
+      fstype 'lustre_with_performance_tuning'
       dump 0
       pass 0
       options mount_options
@@ -75,7 +79,7 @@
 
     mount fsx_shared_dir do
       device "#{dns_name}@tcp:/#{mount_name}"
-      fstype 'lustre'
+      fstype 'lustre_with_performance_tuning'
       dump 0
       pass 0
       options mount_options
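
For a manual mount, the resource above corresponds roughly to the command assembled below. This is a sketch under assumptions: `build_mount_command` is a hypothetical function that only prints the command instead of running it, and the DNS name, mount name, directory, and options are made-up placeholder values, since `mount_options` is not shown in this diff.

```shell
#!/bin/bash
# Hypothetical illustration: assemble (but do not run) a manual mount command
# that uses the file system type introduced by this commit.
build_mount_command() {
    local dns_name="$1" mount_name="$2" shared_dir="$3" mount_options="$4"
    # The device string follows the same "<dns_name>@tcp:/<mount_name>" shape
    # as the `device` property in the resource.
    printf 'mount -t lustre_with_performance_tuning %s@tcp:/%s %s -o %s\n' \
        "$dns_name" "$mount_name" "$shared_dir" "$mount_options"
}

# Placeholder values for illustration only.
build_mount_command "fs-example.fsx.example.com" "examplemnt" "/fsx" "noatime,flock"
```

Replacing `lustre_with_performance_tuning` with `lustre` in such a command would bypass the helper, which matches the answer given in the commit message.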
@@ -0,0 +1,38 @@
+#!/bin/bash
+
+# Copyright 2013-2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the
+# License. A copy of the License is located at
+#
+# http://aws.amazon.com/apache2.0/
+#
+# or in the "LICENSE.txt" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
+# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and
+# limitations under the License.
+
+set -ex
+
+if [ $(ohai cpu/total) -gt 64 ] && (! (/sbin/lsmod | grep -q ^lustre) || [ $(/sbin/lsmod | grep ^lustre | awk '{print $3}') -eq 0 ]); then
+  modprobe_conf_path="/etc/modprobe.d/modprobe.conf"
+  ptlrpcd_per_cpt_max="options ptlrpc ptlrpcd_per_cpt_max"
+  ksocklnd_credits="options ksocklnd credits"
+  if ! grep -q "$ptlrpcd_per_cpt_max" "$modprobe_conf_path" && ! grep -q "$ksocklnd_credits" "$modprobe_conf_path"; then
+    sudo sh -c "echo $ptlrpcd_per_cpt_max=32 >> /etc/modprobe.d/modprobe.conf"
+    sudo sh -c "echo $ksocklnd_credits=2560 >> /etc/modprobe.d/modprobe.conf"
+    # Reload Lustre kernel module to apply the above two settings
+    sudo lustre_rmmod && sudo modprobe lustre
+  fi
+fi
+
+sudo mount -t lustre "$@"
+
+if [ $(ohai cpu/total) -gt 64 ]; then
+  sudo lctl set_param osc.*OST*.max_rpcs_in_flight=32
+  sudo lctl set_param mdc.*.max_rpcs_in_flight=64
+  sudo lctl set_param mdc.*.max_mod_rpcs_in_flight=50
+fi
+total_memory_kb=$(cat /proc/meminfo | grep MemTotal | awk '{print $2}')
+if [ "$total_memory_kb" -gt 274877907 ]; then
+  sudo lctl set_param llite.*.max_cached_mb=$((total_memory_kb/10000)) # set this value to be 10% of customer client instance physical memory in mb
+fi
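
Two decisions in the script can be tried in isolation with the sketch below: whether the lustre module can be reloaded, and the value passed to `max_cached_mb`. The function names are hypothetical and not part of the commit. `lustre_module_idle` reads `lsmod`-style text from standard input and matches the module name exactly, which is slightly stricter than the `grep ^lustre` used in the script.

```shell
#!/bin/bash
# Hypothetical illustration: succeed when the lsmod output on stdin shows the
# lustre module as not loaded, or loaded with a use count (third column) of 0.
lustre_module_idle() {
    local used_by
    used_by=$(awk '$1 == "lustre" {print $3}')
    [ -z "$used_by" ] || [ "$used_by" -eq 0 ]
}

# Same integer arithmetic as the script: MemTotal in kB divided by 10000.
max_cached_mb_for() {
    local total_memory_kb="$1"
    echo $((total_memory_kb / 10000))
}

# For comparison only: 10% of memory using 1024 kB per MB.
exact_tenth_mb_for() {
    local total_memory_kb="$1"
    echo $((total_memory_kb / 1024 / 10))
}

# Sample lsmod text with a use count of 0, so a reload would be allowed.
if printf 'Module Size Used by\nlustre 1040384 0\n' | lustre_module_idle; then
    echo "lustre module is idle or not loaded"
fi

# Sample MemTotal of 300000000 kB, which is above the script's threshold.
max_cached_mb_for 300000000
exact_tenth_mb_for 300000000
```

The division by 10000 is a close approximation of 10% of memory in MB; on the sample value the two results differ by a few percent.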
