split worker for reduced concurrency #1393
Comments
No matter what the prevention solution is -- fixing cutechess or making workarounds inside the worker -- fishtest should be able to manage tasks with time losses separately from high-residual tasks (e.g. rejecting them, purging them, etc.).
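As a rough illustration of the server-side handling proposed above, a task with a disproportionate number of time losses could be quarantined instead of being merged into the run statistics. This is a hypothetical sketch: the field names (`games`, `time_losses`) and thresholds are assumptions, not fishtest's actual schema.

```python
# Hypothetical server-side screening of finished tasks.
# Field names and thresholds are illustrative assumptions.

def classify_task(task, max_timeloss_ratio=0.01, min_games=20):
    """Return 'accept', or 'quarantine' for tasks dominated by time losses."""
    games = task.get("games", 0)
    time_losses = task.get("time_losses", 0)
    if games >= min_games and time_losses / games > max_timeloss_ratio:
        return "quarantine"
    return "accept"

tasks = [
    {"games": 100, "time_losses": 0},  # healthy task
    {"games": 100, "time_losses": 7},  # 7% time losses: quarantine
]
labels = [classify_task(t) for t in tasks]
print(labels)  # → ['accept', 'quarantine']
```

Quarantined tasks could then be excluded from SPRT statistics or purged, as suggested in the comment above.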
technologov-28cores-r345 has lots of issues. See e.g. #1360.
I've pinged him on Discord. But there are at least two other workers with a lot of losses.
Marked as draft because it needs review/testing, but in principle it should be merged as soon as possible. This addresses server-side filtering of problematic tasks caused by official-stockfish#1393. Obviously we should also prevent this worker-side problem in the first place, but in the short run this should at least keep fishtests from being polluted by garbage, as they currently are.
Script for Linux:
#!/bin/bash
# setup_worker.sh
# to set up a fishtest worker on Ubuntu 20.04, simply run:
# sudo bash setup_worker.sh 2>&1 | tee setup_worker.sh.log
# print CPU information
cpu_model=$(grep "^model name" /proc/cpuinfo | sort | uniq | cut -d ':' -f 2)
n_cpus=$( grep "^physical id" /proc/cpuinfo | sort | uniq | wc -l)
online_cores=$(grep "^bogo" /proc/cpuinfo | wc -l)
n_siblings=$(grep "^siblings" /proc/cpuinfo | sort | uniq | cut -d ':' -f 2)
n_cpu_cores=$(grep "^cpu cores" /proc/cpuinfo | sort | uniq | cut -d ':' -f 2)
total_siblings=$((${n_cpus} * ${n_siblings}))
total_cpu_cores=$((${n_cpus} * ${n_cpu_cores}))
printf "CPU model : ${cpu_model}\n"
printf "CPU : %3d - Online cores : %3d\n" ${n_cpus} ${online_cores}
printf "Siblings : %3d - Total siblings : %3d\n" ${n_siblings} ${total_siblings}
printf "CPU cores : %3d - Total CPU cores : %3d\n" ${n_cpu_cores} ${total_cpu_cores}
# read the fishtest credentials and the number of cores to be contributed
echo
echo "Write your fishtest username:"
read usr_name
echo "Write your fishtest password:"
read -s usr_pwd
echo "Write the number of cores to be contributed to fishtest:"
echo "(max suggested 'Total CPU cores - 1')"
read n_cores
# install required packages
apt update && apt full-upgrade -y && apt autoremove -y && apt clean
apt install -y python3 python3-venv git build-essential
# new linux account used to run the worker
worker_user='fishtest'
# create user for fishtest
useradd -m -s /bin/bash ${worker_user}
# add the bash variable for the python virtual env
sudo -i -u ${worker_user} << 'EOF'
echo export VENV=${HOME}/fishtest/worker/env >> .profile
EOF
# download fishtest
sudo -i -u ${worker_user} << EOF
git clone --single-branch --branch master https://github.com/glinscott/fishtest.git
cd fishtest
git config user.email "[email protected]"
git config user.name "your_name"
EOF
# fishtest worker setup and first start only to write the "fishtest.cfg" configuration file
sudo -i -u ${worker_user} << EOF
python3 -m venv \${VENV}
\${VENV}/bin/python3 -m pip install --upgrade pip setuptools wheel
\${VENV}/bin/python3 -m pip install requests
\${VENV}/bin/python3 \${HOME}/fishtest/worker/worker.py --concurrency ${n_cores} ${usr_name} ${usr_pwd} --only_config True && echo "concurrency successfully set" || echo "Restart the script using a proper concurrency value"
EOF
# copy the worker directory N=5 times (change according to your needs)
sudo -i -u ${worker_user} << 'EOF'
cd fishtest
for ((k=0; k<=4; k++)); do
cp -r worker worker${k}
done
EOF
echo
echo "Setup fishtest-worker as a service"
read -p "Press <Enter> to continue or <CTRL+C> to exit ..."
# install fishtest-worker as systemd service
# start/stop the worker with:
# sudo systemctl start fishtest-worker@{0..4}
# sudo systemctl stop fishtest-worker@{0..4}
# check the log with:
# sudo journalctl -u fishtest-worker@0.service
# the service uses the worker configuration file "fishtest.cfg"
# get the worker_user $HOME
worker_user_home=$(sudo -i -u ${worker_user} << 'EOF'
echo ${HOME}
EOF
)
cat << EOF > /etc/systemd/system/fishtest-worker@.service
[Unit]
Description=Fishtest worker %i
After=multi-user.target
[Service]
Type=simple
StandardOutput=file:${worker_user_home}/fishtest/worker%i/worker.log
StandardError=inherit
ExecStart=${worker_user_home}/fishtest/worker%i/env/bin/python3 ${worker_user_home}/fishtest/worker%i/worker.py
User=${worker_user}
WorkingDirectory=${worker_user_home}/fishtest/worker%i
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
echo
echo "Start fishtest-worker service"
read -p "Press <Enter> to continue or <CTRL+C> to exit ..."
systemctl start fishtest-worker@{0..4}.service
echo
echo "Enable fishtest-worker service auto start"
read -p "Press <Enter> to continue or <CTRL+C> to exit ..."
systemctl enable fishtest-worker@{0..4}.service
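For clarity, the `fishtest-worker@{0..4}.service` pattern in the commands above is bash brace expansion over instances of the `fishtest-worker@.service` template unit; systemd replaces `%i` in the unit file with the text after the `@`. A small sketch of the names involved (the home directory path is an assumption matching the `fishtest` user created above):

```python
# Illustrate how the systemd template unit expands into instances.
# The home path is assumed from the 'fishtest' user created by the script.
n_workers = 5
units = [f"fishtest-worker@{k}.service" for k in range(n_workers)]
workdirs = [f"/home/fishtest/fishtest/worker{k}" for k in range(n_workers)]
print(units)
print(workdirs[0])  # → /home/fishtest/fishtest/worker0
```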
I think the cleanest way to achieve splitting is to create a … If … For each clone there should probably be a controlling thread in the master to manage its life cycle… I am a bit worried about …

Clone workers do not upgrade. If the master upgrades, then the clone workers are stopped and deleted. They will be recreated when the master restarts.

The main reason for doing it this way would be to keep the error handling manageable. If instead we were starting multiple copies of cutechess within a single worker, the error handling would be a nightmare, I think.
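The "one controlling thread per clone" idea above can be sketched minimally as follows. This is a hypothetical outline, not fishtest code: the clone process here is a stand-in (a short sleep) for launching `worker.py` in a clone's directory, and real life-cycle management (creation, deletion on upgrade, restart) would hang off these threads.

```python
# Minimal sketch: one controlling thread per clone worker process.
# The subprocess command is a stand-in for running worker.py in a clone dir.
import subprocess
import sys
import threading

def supervise(clone_id, results):
    """Start a clone process, wait for it, and record its exit code."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.1)"]
    )
    proc.wait()
    results[clone_id] = proc.returncode

results = {}
threads = [threading.Thread(target=supervise, args=(k, results)) for k in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Keeping errors confined to one thread/process pair per clone is what makes the error handling manageable, as argued above.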
Yes, I agree that this could be done at a higher level, like you describe.
Allowing the clone workers to update would lead to pretty bad race conditions. So I adapted the proposal accordingly. |
Problems:
I was thinking that, from the point of view of the user, the result of … The two solutions (…)
We can set the default value of pool to ceil(concurrency/32). That way nothing would change for workers with <= 32 cores; a 33-core worker would split up into a 16-core worker and a 17-core worker.
It seems like the time-loss problem is getting more acute with multiple large-core workers.
See e.g. https://tests.stockfishchess.org/tests/view/62e523e2b383a712b1386193
We know this is probably due to cutechess not being able to deal with a large concurrency, and probably our best workaround is to split the worker internally (so not visible from the user side), to have multiple cutechess processes each with reduced concurrency.