
backup downloading was fixed for node helm chart #283

Merged
kogeler merged 2 commits into main from fix-node-backup-downloading on Aug 11, 2023

Conversation

@kogeler (Contributor) commented on Aug 9, 2023

R2 backups and rclone have some issues in the paritydb case. The main reason is that paritydb uses really big files (up to 40+ GB). I spent a couple of days finding all of them. :)

Issues:

  • for big files, the R2 backend can return a 206 (Partial Content) HTTP response even when the client did not send a range request, which is against the protocol specification. Normally we shouldn't see this on the client side when we use the Cloudflare edge cache: the edge cache is supposed to terminate such a response on its side, cache the whole file, and then send it to the client. For some reason, the edge cache sometimes passes these 206 responses from R2 through to clients and caches them, so the next time the client tries to download the file it receives the cached 206 response from the edge cache instead of a fresh answer from R2, which breaks downloads. 206 responses are now excluded from the edge cache by a cache rule
  • since version 1.63.0, rclone uses the copy strategy by default: it downloads files to a temporary location, then copies them into place and removes the temporary files at the end. In the paritydb case, where we download big files in parallel, this strategy consumes extra space (100+ GB), and if there isn't enough free space we end up in an endless download loop. The --inplace flag fixes it
  • if we use a list of HTTP links to download files, we have no information about the files' real modification times. When rclone hits an error, it re-checks all already-downloaded files before the next attempt, and during this check it decides to download the files again because for some reason it sees a modification-time difference (files dated from the 1970s). We end up in an endless download loop. The --size-only flag fixes it
  • sometimes rclone retries because of file-size-difference errors, which matters when we have to download big files. The --no-gzip-encoding flag can possibly reduce the number of these errors; at least it was mentioned as a workaround in some GitHub issues. Also, DB files can't be compressed very effectively, so it doesn't make sense to use compression at the HTTP level, and the flag saves CPU resources. (All three flags appear together in the sketch after this list.)
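
For reference, here is a minimal sketch of what a download invocation with these flags could look like. The URL, file list, destination path, and transfer count are placeholders rather than the chart's actual values; the real command lives in the node chart's init-container script.

```bash
# Illustrative sketch only: URL, file list and destination are placeholders.
# Download the paritydb snapshot from a list of HTTP links in parallel:
#   --inplace           write directly into the destination files instead of
#                       copying from a temporary location (avoids the extra 100+ GB)
#   --size-only         compare files by size only; plain HTTP links carry no
#                       reliable modification time, so mtime checks trigger re-downloads
#   --no-gzip-encoding  don't request gzip transfer encoding; DB files barely
#                       compress, and skipping it saves CPU
rclone copy \
  --http-url "https://backups.example.com/paritydb" \
  --files-from /tmp/backup-files.txt \
  --transfers 4 \
  --inplace \
  --size-only \
  --no-gzip-encoding \
  --progress \
  :http: /data/chains/polkadot/db/full
```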

Other changes:

  • it doesn't make sense to use high-level retries when we download a single archived file to a pipe, because a high-level retry would break the pipe. rclone uses 3 high-level retries by default, so I disabled them. If the download fails, the partially unarchived files have to be removed by the script, and the next download attempt happens on the next run of the init container. rclone still performs its 10 low-level (HTTP) retries. (A sketch of this path follows below.)
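
A rough sketch of that path, assuming a configured backups remote, a tar archive, and a placeholder data directory (none of these names are the chart's actual values):

```bash
#!/usr/bin/env bash
# Illustrative sketch only: remote name, archive path and data directory are placeholders.
set -euo pipefail

DATA_DIR=/data/chains/polkadot

# --retries 1 disables rclone's high-level retries (default 3), since restarting the
# transfer mid-stream would break the pipe into tar. rclone still performs its default
# 10 low-level (HTTP) retries per request.
if ! rclone cat --retries 1 "backups:polkadot/snapshot.tar" | tar -xf - -C "$DATA_DIR"; then
  echo "backup download failed, removing partially unarchived data"
  rm -rf "${DATA_DIR:?}"/*
  exit 1
fi
```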

kogeler merged commit 7624824 into main on Aug 11, 2023
1 check passed
kogeler deleted the fix-node-backup-downloading branch on August 11, 2023 at 08:50