http backup restoring was fixed for node role #42

Merged
merged 3 commits into main from node-role-fix-http-backup-restoring on Aug 14, 2023

Conversation

kogeler
Contributor

@kogeler kogeler commented Aug 11, 2023

R2 backups and rclone have some issues in the paritydb case. The main reason is that paritydb uses really big files (up to 40+ GB). I spent a couple of days finding all of them. :)

Issues:

  • for big files the R2 backend can return a 206 (Partial Content) HTTP code even though the client did not send a range request, which is against the protocol specification. Normally we shouldn't see this on the client side when we use the Cloudflare edge cache: the edge cache is supposed to terminate such a response on its side, cache the whole file, and then serve it to the client. For some reason, the edge cache sometimes passes the 206 responses from R2 through to clients and also caches them. As a result, the next time the client tries to download the file, it receives the cached 206 response from the edge cache instead of a fresh answer from the R2 backend, which breaks downloads. 206 responses are excluded from the edge cache for now by a cache rule (see the curl check after this list)
  • since version 1.63.0 rclone uses the copy strategy by default: it downloads files into a temporary location, then copies them to the destination and removes the temporary files at the end. In the paritydb case, when we download big files in parallel, this strategy needs extra space (100+ GB). If we don't have enough free space, we end up in an endless download loop. The --inplace flag fixes it
  • if we use a list of HTTP links to download files, we don't have information about the real modification time of the files. When rclone hits an error, it re-checks all already downloaded files before the next attempt, and during this check it decides to download the files again because it sees a modification time difference (the files appear to date from 1970, i.e. the Unix epoch) for some reason. We end up in an endless download loop. The --size-only flag fixes it
  • sometimes rclone retries because of file size difference errors, which is costly when big files have to be re-downloaded. The --no-gzip-encoding flag can possibly reduce the number of these errors; at least it was mentioned as a workaround in some GitHub issues. Also, DB files can't be compressed very effectively, so it doesn't make sense to use compression at the HTTP level, and the flag saves some CPU. The resulting flags are shown in the rclone sketch after this list.
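A quick way to reproduce the first issue is to request a file with a plain GET (no Range header) and look at the status line. The hostname and path below are placeholders, not the real backup endpoint:

```bash
# A plain GET without a Range header should return 200.
# A 206 here means the edge cache served a cached partial response.
curl -s -D - -o /dev/null https://backups.example.com/paritydb/some-big-file.bin | head -n 1
```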
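Putting the flags together, the restore call ends up looking roughly like the sketch below. This is not the exact command from the scripts; the URL, file list, destination path and transfer count are placeholders:

```bash
# Download the backup files listed in files-to-download.txt from an HTTP endpoint.
# --inplace          write directly to the destination file instead of temp copy + rename,
#                    so big paritydb files don't need 100+ GB of extra space
# --size-only        compare by size only; the HTTP listing has no usable modification time
# --no-gzip-encoding skip Accept-Encoding: gzip; DB files barely compress anyway
rclone copy \
  --http-url "https://backups.example.com/paritydb" :http: /data/restore \
  --files-from files-to-download.txt \
  --inplace --size-only --no-gzip-encoding \
  --transfers 4 --progress
```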

Other changes:

  • it doesn't make sense to use high-level retries when we download a single archived file into a pipe, because a high-level retry would break the pipe. rclone uses 3 high-level retries by default, so I disabled them. If the download fails, the partially unarchived files have to be removed by the script, and the next download attempt is performed on the next run of the init container. rclone still performs its 10 low-level (HTTP) retries anyway (see the sketch below).
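A minimal sketch of the pipe case, assuming a placeholder URL, archive name and target directory; the PR doesn't quote the exact command, and --retries 1 is presumably how the high-level retries are disabled:

```bash
# Stream a single archive through a pipe. A high-level retry would restart the
# stream and break the consumer, so only one attempt is made at that level,
# while the default 10 low-level (HTTP) retries still cover transient errors.
rclone cat --retries 1 --low-level-retries 10 \
  --http-url "https://backups.example.com" :http:snapshot.tar.zst \
  | tar --zstd -xf - -C /data/chain \
  || rm -rf /data/chain/*   # clean up partially unpacked files; the next init-container run retries
```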

kogeler added 2 commits August 11, 2023 13:07
fix
Signed-off-by: kogeler <[email protected]>
@kogeler kogeler merged commit 329d3c7 into main Aug 14, 2023
2 checks passed
@BulatSaif BulatSaif deleted the node-role-fix-http-backup-restoring branch August 15, 2023 07:01