improve files cache #8385

Closed
ThomasWaldmann opened this issue Sep 17, 2024 · 2 comments · Fixed by #8389

ThomasWaldmann commented Sep 17, 2024

Once we have the "backup series" feature (i.e. after #8379 is merged), we can build the files cache by reading the "previous" archive from the repo, see:

#7930 (comment)

Pros:

  • saves the local disk space used by the files cache
  • no need to "persist the files cache" (the "previous archive" is persisted anyway)
  • the archive always has mtime AND ctime (our files cache implementation only stores one of them)
  • we do not need the exclusive lock on the cache, because there is no read-modify-write on the files cache anymore. That would give us parallel borg create by default (without the BORG_CACHE_IMPL=adhoc hack).
  • we can get rid of the parts of the files cache code related to locking and persistence

Cons:

  • querying the previous archive's metadata stream from the repo and building an "in-memory files cache" from it is likely slower than loading the files cache from disk
  • supporting "generations" / "TTL" would even mean querying multiple "previous archives" from the repo. Guess that isn't worth the effort, so we'll just try with the latest one.
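
To make the idea concrete, here is a minimal sketch of building an in-memory files cache from the items of a previous archive. This is not borg's actual code: `ArchivedItem`, `CacheEntry`, `build_files_cache` and `is_unchanged` are hypothetical names, and comparing size, mtime, ctime and inode together is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


@dataclass
class ArchivedItem:
    """Hypothetical stand-in for one regular-file item from an archive's metadata stream."""
    path: str
    size: int
    mtime_ns: int
    ctime_ns: int
    inode: int
    chunk_ids: list


class CacheEntry(NamedTuple):
    """What the in-memory files cache remembers per file (assumed layout)."""
    size: int
    mtime_ns: int
    ctime_ns: int
    inode: int
    chunk_ids: tuple


def build_files_cache(items: Iterable[ArchivedItem]) -> Dict[str, CacheEntry]:
    """Build a path -> entry mapping from the previous archive's items."""
    cache = {}
    for item in items:
        cache[item.path] = CacheEntry(
            item.size, item.mtime_ns, item.ctime_ns, item.inode, tuple(item.chunk_ids)
        )
    return cache


def is_unchanged(cache, path, size, mtime_ns, ctime_ns, inode) -> bool:
    """True if the file on disk matches the cached entry, so its chunks can be reused."""
    entry = cache.get(path)
    if entry is None:
        return False
    return (entry.size, entry.mtime_ns, entry.ctime_ns, entry.inode) == (
        size, mtime_ns, ctime_ns, inode
    )
```

A file for which `is_unchanged` returns True would not need to be re-read, chunked and hashed; its chunk list could be taken from the cache entry instead.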

ThomasWaldmann commented Sep 17, 2024

@intelfx noted on IRC that this might be quite a lot of traffic if the archive has e.g. 5M files.

True, so guess this will need some optimization afterwards, e.g. persisting the files cache or the archive metadata stream locally.


ThomasWaldmann commented Sep 18, 2024

Guess we could do it like this:

  • before create: load the files cache for this archive series from local cache
  • if that fails (because there is no local cache), try rebuilding it from previous archive
  • end of create: save the files cache for this archive series to local cache
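
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that was merged: `load_or_rebuild` and `save_files_cache` are hypothetical names, JSON is used as the on-disk format purely for simplicity, and the `rebuild` callback stands in for "read the previous archive's metadata from the repo".

```python
import json
import os
from pathlib import Path


def load_or_rebuild(cache_path, rebuild):
    """Return (cache, source): load the local files cache, else rebuild it.

    `rebuild` is a caller-supplied function (hypothetical) that builds the
    cache from the previous archive of the same series.
    """
    try:
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f), "local"
    except (FileNotFoundError, json.JSONDecodeError):
        # No usable local cache: fall back to the (slower) rebuild from the repo.
        return rebuild(), "rebuilt"


def save_files_cache(cache_path, cache):
    """Write the cache to a temp file, then rename it over the final name."""
    cache_path = Path(cache_path)
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    os.replace(tmp_path, cache_path)
```

Writing to a temporary name and renaming means a crashed run leaves either the old cache or the new one on disk, never a half-written file.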

Update: Done!

Now there is usually no transfer from the repo needed. But it can be done in case the local files cache is lost (still better than re-reading/chunking/hashing everything).

The local "files cache" filename suffix is determined automatically from the archive (series) name, or set manually by the user via the env var. mtime AND ctime are now both stored in the files cache.
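
A rough sketch of how such a per-series filename could be derived. The hash function (SHA-256, truncated to 16 hex digits) and the env var name `EXAMPLE_FILES_CACHE_SUFFIX` are assumptions chosen for illustration; they are not taken from borg.

```python
import hashlib
import os

# Hypothetical env var name, used only in this sketch.
SUFFIX_ENV_VAR = "EXAMPLE_FILES_CACHE_SUFFIX"


def files_cache_filename(archive_name, environ=None):
    """Return the files cache filename for an archive series.

    A non-empty suffix from the environment wins; otherwise the suffix
    is derived from a hash of the archive (series) name.
    """
    environ = os.environ if environ is None else environ
    suffix = environ.get(SUFFIX_ENV_VAR)
    if not suffix:
        digest = hashlib.sha256(archive_name.encode("utf-8")).hexdigest()
        suffix = digest[:16]  # truncation length is an arbitrary choice here
    return "files." + suffix
```

Because each series maps to its own file, two backup runs for different series never touch the same files cache.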

ThomasWaldmann changed the title from 'replace locally persisted "files cache" by "previous archive"' to 'improve files cache' on Sep 19, 2024
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Sep 19, 2024
- changes to locally stored files cache:

  - store as files.<H(archive_name)>
  - user can manually control suffix via env var
  - if local files cache is not found, build from previous archive.
- enable rebuilding the files cache via loading the previous
  archive's metadata from the repo (better than starting with
  empty files cache and needing to read/chunk/hash all files).
  previous archive == same archive name, latest timestamp in repo.
- remove AdHocCache (not needed any more, slow)
- remove BORG_CACHE_IMPL, we only have one
- remove cache lock (this was blocking parallel backups to same
  repo from same machine/user).

Cache entries now have ctime AND mtime.

Note: TTL and age still needed for discarding removed files.
      But due to the separate files caches per series, the TTL
      was lowered to 2 (from 20).
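
As an illustration of why TTL and age are still needed, here is a minimal sketch of age-based pruning. The exact semantics are an assumption (entries seen in the current run get age 0, unseen entries age by one per run, and entries reaching the TTL are dropped), and `age_and_prune` is a hypothetical name.

```python
def age_and_prune(cache, seen_paths, ttl=2):
    """Return a new cache with ages updated and expired entries dropped.

    cache: dict mapping path -> dict that contains at least an "age" key.
    seen_paths: set of paths encountered during the current backup run.
    """
    kept = {}
    for path, entry in cache.items():
        # Seen files are fresh again; unseen files get one run older.
        age = 0 if path in seen_paths else entry["age"] + 1
        if age < ttl:
            kept[path] = {**entry, "age": age}
    return kept
```

With one files cache per series, a file missing from a run of that series is most likely really gone, which fits the commit's note that a small TTL is sufficient.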
ThomasWaldmann self-assigned this Sep 19, 2024