FAQs for HKUST HPC clusters (SuperPOD, HPC4)
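Q: Why did I get a "QOSMinGRES" error?
A: When allocating GPU nodes, use --gres, --gpus-per-node, or --gpus-per-task to specify the number of GPUs requested. The error appears when a job requests fewer GPUs (GRES) than the minimum its QOS requires, for example when no GPU is requested at all.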
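As an illustration, a minimal batch script that requests one GPU might look like the sketch below. The job name, time limit, and GPU count are placeholder assumptions, not site requirements; the account, partition, and QOS values that apply to you are not shown and may need to be added. The script also assumes that Slurm exports CUDA_VISIBLE_DEVICES for GPU allocations, which is common but depends on site configuration.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-demo      # hypothetical job name
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1        # explicit GPU request; --gres=gpu:1 is an alternative
#SBATCH --time=00:10:00          # placeholder time limit

# Slurm usually sets CUDA_VISIBLE_DEVICES for GPU jobs (site-dependent assumption).
# Outside a GPU allocation the variable is unset, so fall back to "none".
gpu_list="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${gpu_list}"
```

Saved under a hypothetical name such as gpu-demo.sh, the script would be submitted with sbatch gpu-demo.sh. For an interactive session, the same flag can be passed to srun, e.g. srun --gpus-per-node=1 --pty bash.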
Q: Why does my SSH session frequently disconnect during long operations?
A: SSH sessions can time out because of network settings or inactivity. The following measures help keep your connection and your work alive:
- Configure SSH keep-alive settings. Add these settings to your ~/.ssh/config file:
    Host ust-hpc4-login
        HostName hpc4.ust.hk
        User YOUR_USERNAME
        ServerAliveInterval 30
        ServerAliveCountMax 3
        TCPKeepAlive yes
- Use a terminal multiplexer, which keeps your session running on the server when the connection drops:
  - GNU Screen:
      # Start a new session
      screen
      # Reconnect
      screen -r
  - Tmux:
      # Start a new session
      tmux
      # Reconnect
      tmux attach
- For long operations:
  - Use batch jobs instead of interactive sessions
  - Always run important processes in Screen/Tmux
  - Consider using nohup for background processes
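The nohup suggestion above can be sketched as follows. The command being run (a short sleep followed by an echo) and the log file name demo.log are hypothetical stand-ins for your own long-running program and output file.

```shell
# Start a stand-in "long" command that survives a hangup (SIGHUP).
# stdout and stderr are redirected to a log so nothing is lost on disconnect.
nohup sh -c 'sleep 1; echo finished' > demo.log 2>&1 &
job_pid=$!
echo "started background process ${job_pid}"

# For demonstration only: wait for the process, then show its output.
# In real use you would log out and inspect the log file later.
wait "${job_pid}"
cat demo.log
```

Note that nohup only protects the process from the hangup signal; on a shared login node, anything compute-heavy is still better submitted as a batch job.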
To run an overlapping job step inside an existing job allocation (for example, to open a shell on the job's node):
srun -A <account> --overlap --jobid <jobid> --pty bash