use borgstore and other big changes #8332

Merged (79 commits) on Sep 8, 2024

ThomasWaldmann (Member) commented Aug 14, 2024

Fixes #8330 (no cache sync anymore).

Fixes #8325 (no segments, no replaying of segments anymore).

Fixes #7377, fixes #7379 (no transactions, no refcounting anymore).

Fixes #7154 (new repo locking code).

Fixes #6983 (no repo index any more).

Fixes #6899 (no compact segments any more).

Fixes #6121, fixes #7278 (no cache sync any more, no archives fetching).

Fixes #6567 (not needed any more).

Fixes #6331 (hints file not used anymore).

Fixes #6291, fixes #6289 (no segment files any more, no DEL tags).

Fixes #6288 in a different way (borg rspace command to reserve disk space that can be freed in case of emergencies).

Fixes #5654, fixes #6057, fixes #6094, fixes #7154 (new lock implementation for borgstore).

Fixes #5514 (new locking system allows shared locks for most commands).

Fixes #5261 (bypass-lock was removed).

Fixes #5050 (no hints any more).

Fixes #4827 (no persistent chunks cache, no cache sync anymore).

Fixes #4438 (not possible, delete/prune just kill the root reference now and don't look at anything else).

Fixes #4428 (separate objects in config/*).

Fixes #4004 - most borg commands now can use a shared repository lock (exceptions: borg check and compact).

Fixes #3128. Fixes #3196. (no cache sync anymore)

Fixes #2454, fixes #2398, fixes #3036 (no commits, no transactions any more, no log-like/append-only segments, new check implementation).

Fixes #2681, fixes #2571 (no cache sync, no chunks.archive.d).

Fixes #2454 (no commits, no transactions anymore).

Fixes #2444 (new borg check implementation, no LoggedIO code used any more).

Fixes #1293 (solved in a different way, delete/prune are super fast now).

Fixes #1244 (no transactions anymore).

Fixes #916, fixes #474, fixes #1766 (no LocalCache any more, no cache transaction any more).

Fixes #768 - most borg commands now can use a shared repository lock (exceptions: borg check and compact).

Maybe builds the foundation to solve / work on:

new repository based on borgstore project

stores chunks into separate files (not: segment/pack files, at least for now).

borgstore has a very simple api that makes implementing backends easy.

in borg 2.0, this will primarily use the "file:" backend from borgstore to implement file: and ssh: repositories, but long term we might go away from the borg.remote code (RPC api via ssh) and just use a "remote" borgstore.

there is also a sftp: repository now implemented via the respective borgstore backend. more might be coming, even cloud stuff should be easily possible with that (PRs welcome!).
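
To illustrate why a small key/value api keeps backends easy to write, here is a minimal sketch of a directory-backed store. The class name `DirStore` and its methods (`store`, `load`, `delete`, `list`) are assumptions made up for this example; they are not borgstore's actual interface.

```python
from pathlib import Path


class DirStore:
    """Hypothetical key/value backend: one file per key below a root directory."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, name):
        return self.root / name

    def store(self, name, value: bytes):
        path = self._path(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(value)
        tmp.replace(path)  # rename into place so readers never see a partial file

    def load(self, name) -> bytes:
        return self._path(name).read_bytes()

    def delete(self, name):
        self._path(name).unlink()

    def list(self, prefix):
        base = self._path(prefix)
        if not base.is_dir():
            return []
        # skip temporary files of writes that are still in progress
        return sorted(p.name for p in base.iterdir() if not p.name.endswith(".tmp"))
```

A backend for another storage type would only have to provide the same handful of operations.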

repository: convergence rather than transactions

borg 1.x used the segment files in a log-like way (only appending new stuff at the end) and implemented transactions via a COMMIT tag - if the transaction was not completed (no COMMIT at the end), it rolled back the incomplete transaction to the last commit.

the code implementing transactions was rather complex and required an exclusive lock on the repo for correct operations.

borg2 now just adds repo objects in the right order, first pushing referenced objects, then the references to them. even if an operation is interrupted, nothing bad happens.
there might be some unreferenced objects for a while, but they will get referenced if the operation is retried later and completes. borg compact will deal with anything not needed.
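
The ordering rule can be sketched with a plain dict standing in for the repository. The key names (`data/<id>`, `archives/<name>`), the `Interrupted` exception and both helper functions are hypothetical and only serve the illustration; borg's real object layout is more involved.

```python
import hashlib


class Interrupted(Exception):
    """Simulated crash or Ctrl-C (hypothetical, for illustration only)."""


def write_archive(store, name, chunks, fail_at=None):
    """Store all chunks first, then the archive entry that references them."""
    ids = []
    for n, data in enumerate(chunks):
        if n == fail_at:
            raise Interrupted(f"stopped before chunk {n}")
        cid = hashlib.sha256(data).hexdigest()
        store["data/" + cid] = data  # referenced object goes in first
        ids.append(cid)
    store["archives/" + name] = ids  # the reference is written last


def dangling_references(store):
    """Return chunk ids that an archive references but that are not stored."""
    return [
        cid
        for key, ids in store.items()
        if key.startswith("archives/")
        for cid in ids
        if "data/" + cid not in store
    ]
```

Because the reference is written last, an interruption can leave unreferenced chunks behind, but never an archive entry that points to missing chunks.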

no checkpoint archives, no .borg_part files anymore

saving them was only needed due to the transactional/rollback behavior of borg1.

borg2 does not do that rollback any more, so the checkpoints are not needed.
the user can just re-run the interrupted command and it will notice that some stuff is already present in the repo and only transfer new stuff.
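
A rough sketch of why a re-run only transfers new stuff: chunks are keyed by a hash of their content, so anything already in the repo is skipped. The function `backup`, its `stop_after` parameter and the dict-based repo are assumptions for this example, not borg code.

```python
import hashlib


def backup(repo, chunks, stop_after=None):
    """Upload chunks that are missing from repo; return how many were transferred."""
    transferred = 0
    for data in chunks:
        if stop_after is not None and transferred >= stop_after:
            raise InterruptedError("simulated interruption")
        key = hashlib.sha256(data).hexdigest()
        if key not in repo:  # already present: nothing to transfer
            repo[key] = data
            transferred += 1
    return transferred
```

If a run over four distinct chunks is interrupted after two transfers, re-running the same call transfers only the remaining two.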

new borg compact doing garbage collection

borg compact is still needed to free space in the repo, but it doesn't really need to "compact segments" as there are no segment files anymore. so it will do less I/O to move stuff around in the repo.

but maybe some sort of segment/pack files will come back later, so we'll just keep the command name.

borg compact is now doing more work that was previously done by borg delete and borg check: it determines which chunks are not used anymore (and removes them). because it needs to read the archives for that, borg compact now needs the borg key.
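
The garbage collection idea is a mark-and-sweep pass, sketched below on a dict. This is a simplification under stated assumptions: here each hypothetical `archives/*` entry lists its chunk ids directly, whereas borg has to read (and decrypt) the archive metadata to find them, which is why the key is needed.

```python
def compact(store):
    """Delete every data/* object that no archives/* entry references."""
    used = set()
    for key, chunk_ids in store.items():
        if key.startswith("archives/"):
            used.update(chunk_ids)  # mark: everything an archive still needs
    unused = sorted(
        key
        for key in store
        if key.startswith("data/") and key[len("data/"):] not in used
    )
    for key in unused:  # sweep: drop what nothing references
        del store[key]
    return unused
```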

super fast borg delete and borg prune

as borg does not do precise refcounting any more, delete now just kills the archive from archives/* (removing the reference to its root) and lets borg compact clean up all the now unreferenced chunks.
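
As a sketch of why this is fast: deleting touches one entry, no matter how many chunks the archive has. The dict layout and the function name `delete_archive` are hypothetical.

```python
def delete_archive(store, name):
    """Remove only the archives/* entry; chunk data is left for compact."""
    del store["archives/" + name]


store = {
    "archives/monday": ["c1", "c2"],
    "archives/tuesday": ["c2", "c3"],
    "data/c1": b"1",
    "data/c2": b"2",
    "data/c3": b"3",
}
delete_archive(store, "monday")
# data/c1 is now unreferenced but still present until a later compact run
```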

new repo locking code

repository locking code using borgstore locks/*.

locks auto-expire and get deleted if they don't get refreshed regularly (this is good to clean up stale locks of now-dead remote borg processes).

locks also get deleted if their owner process is dead (this is good to clean up stale locks of now-dead local borg processes).

most borg commands now can use a shared repository lock (exceptions: borg check and compact, which must use an exclusive lock).
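
The expiry and shared/exclusive rules can be sketched with an in-memory table. Everything here (`LockTable`, `stale_after`, the injectable `clock`) is an assumption for illustration; it is not the borgstore locking code and it ignores the races a real store-based lock has to handle.

```python
import time


class LockTable:
    """Hypothetical lock table: locks expire unless their owner refreshes them."""

    def __init__(self, stale_after=60.0, clock=time.monotonic):
        self.stale_after = stale_after
        self.clock = clock
        self.locks = {}  # owner -> (exclusive, time of last refresh)

    def _expire(self):
        now = self.clock()
        self.locks = {
            owner: lock
            for owner, lock in self.locks.items()
            if now - lock[1] < self.stale_after
        }

    def acquire(self, owner, exclusive=False):
        self._expire()  # stale locks of dead processes disappear here
        others = [excl for o, (excl, _) in self.locks.items() if o != owner]
        if exclusive and others:
            return False  # exclusive needs the repo for itself
        if any(others):
            return False  # someone else holds an exclusive lock
        self.locks[owner] = (exclusive, self.clock())
        return True

    def refresh(self, owner):
        exclusive, _ = self.locks[owner]
        self.locks[owner] = (exclusive, self.clock())

    def release(self, owner):
        self.locks.pop(owner, None)
```

Several shared holders can coexist, while an exclusive request only succeeds once all other locks are released or have expired.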

new repo config / repo key storage

Config is now stored in separate files under config/*, which means less risk (e.g. for the repokey) when other config items need updating.

The repokey is now stored in keys/* (only 1 key for now).

new manifest storage

the manifest chunk that also had the list of archives inside was split into config/manifest and separate files archives/*.

some features might come back later (stats, quota, append-only, ...)

@ThomasWaldmann ThomasWaldmann force-pushed the use-borgstore branch 5 times, most recently from 91b7337 to dd57cb3 Compare August 15, 2024 10:52
codecov-commenter commented Aug 15, 2024

Codecov Report

Attention: Patch coverage is 73.71212% with 694 lines in your changes missing coverage. Please review.

Project coverage is 81.73%. Comparing base (3a5ee93) to head (3794e32).
Report is 87 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/borg/legacyremote.py | 50.65% | 260 Missing and 40 partials ⚠️ |
| src/borg/legacyrepository.py | 79.63% | 163 Missing and 72 partials ⚠️ |
| src/borg/archive.py | 81.57% | 19 Missing and 9 partials ⚠️ |
| src/borg/archiver/rspace_cmd.py | 40.00% | 27 Missing ⚠️ |
| src/borg/manifest.py | 78.15% | 18 Missing and 8 partials ⚠️ |
| src/borg/archiver/compact_cmd.py | 83.62% | 13 Missing and 6 partials ⚠️ |
| src/borg/storelocking.py | 91.83% | 7 Missing and 5 partials ⚠️ |
| src/borg/archiver/delete_cmd.py | 57.14% | 6 Missing and 3 partials ⚠️ |
| src/borg/archiver/debug_cmd.py | 73.33% | 7 Missing and 1 partial ⚠️ |
| src/borg/cache.py | 91.35% | 6 Missing and 1 partial ⚠️ |

... and 9 more

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8332      +/-   ##
==========================================
+ Coverage   81.63%   81.73%   +0.10%     
==========================================
  Files          67       70       +3     
  Lines       12158    12648     +490     
  Branches     2194     2287      +93     
==========================================
+ Hits         9925    10338     +413     
- Misses       1647     1665      +18     
- Partials      586      645      +59     

ThomasWaldmann (Member, Author) commented

Guess this needs some review. Anybody?

Simplify the repository a lot:

No repository transactions, no log-like appending, no append-only, no segments,
just using a key/value store for the individual chunks.

No locking yet.

Also:

mypy: ignore missing import
there are no library stubs for borgstore yet, so mypy errors without that option.

pyproject.toml: install borgstore directly from github
There is no pypi release yet.

use pip install -e . rather than python setup.py develop
The latter is deprecated and had issues installing the "borgstore from github" dependency.
The repository check uses xxh64 hashes of the meta and data parts to verify their validity.
On a server with borg, this can be done server-side without the borg key.

The new RepoObj header has meta_size, data_size, meta_hash and data_hash.
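
A sketch of how such a header allows a check without the borg key. The byte layout (`<II8s8s`) and the function names are assumptions, and `blake2b` with an 8-byte digest is used only as a stdlib stand-in for xxh64, which is not in the Python standard library.

```python
import hashlib
import struct

# Assumed layout: meta_size, data_size (uint32), meta_hash, data_hash (8 bytes each)
HEADER = struct.Struct("<II8s8s")


def hash64(payload: bytes) -> bytes:
    """64-bit stand-in hash (NOT xxh64; used here to stay stdlib-only)."""
    return hashlib.blake2b(payload, digest_size=8).digest()


def pack_obj(meta: bytes, data: bytes) -> bytes:
    header = HEADER.pack(len(meta), len(data), hash64(meta), hash64(data))
    return header + meta + data


def verify_obj(blob: bytes) -> bool:
    """Check sizes and hashes only; no decryption key is involved."""
    if len(blob) < HEADER.size:
        return False
    meta_size, data_size, meta_hash, data_hash = HEADER.unpack_from(blob)
    body = blob[HEADER.size:]
    if len(body) != meta_size + data_size:
        return False
    meta, data = body[:meta_size], body[meta_size:]
    return hash64(meta) == meta_hash and hash64(data) == data_hash
```

Since the hashes cover the stored (possibly encrypted) bytes, a server can detect truncated or corrupted objects without being able to read them.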
@ThomasWaldmann ThomasWaldmann force-pushed the use-borgstore branch 2 times, most recently from ac47878 to 4ae1842 Compare September 7, 2024 21:08
Archives was built with a dictionary-like api,
but in the future we want to move away from a
read-modify-write archives list.
previously, borg always read all archives entries, modified the
list in memory, and wrote it back to the repository (similar to
what borg 1.x did).

now borg works directly with archives/* in the borgstore.
old borg just didn't commit the transaction and
thus caused a transaction rollback if not in
repair mode.

we can't do that anymore, thus we must avoid
modifying the repo if not in repair mode.
shared locks are not used for check and compact, these need an exclusive lock.

to try parallel repo access on same machine, same user,
one needs to use a non-locking cache implementation:

export BORG_CACHE_IMPL=adhoc

this is slow due to the missing files cache in that implementation,
but unproblematic because no caches/indexes are persisted.
if the manifest file is missing, check generated *.1 *.2 ... archives although an entry for the correct name and id was already
present. BUG!

this is because if the manifest is lost, that does not imply
anymore that the complete archives directory is also lost, as it
did in borg 1.x.

Also improved log messages a bit.
borg delete and borg prune do a quick and dirty archive deletion,
just removing the archives directory entry for them.

--undelete-archives can still find the archive metadata objects
by completely scanning the repository and re-create missing
archives directory entries.

but only until borg compact would remove all unused data.
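
The scan-and-recreate idea can be sketched as follows. The object layout (dict values with a `type` field, archive entries keyed by object id) and the function name `undelete_archives` are hypothetical; the real check has to read and decrypt objects to recognize archive metadata.

```python
def undelete_archives(store):
    """Recreate archives/* entries for archive metadata objects still in the store."""
    restored = []
    for key, obj in list(store.items()):
        if not key.startswith("data/"):
            continue
        if isinstance(obj, dict) and obj.get("type") == "archive":
            entry = "archives/" + key[len("data/"):]
            if entry not in store:  # directory entry was removed by delete/prune
                store[entry] = {"name": obj["name"]}
                restored.append(obj["name"])
    return sorted(restored)
```

Once compact has removed the unreferenced metadata object, the scan has nothing left to find.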

if only the manifest is missing or corrupted, do not run that
scan, it is not required for the manifest anymore.
ThomasWaldmann (Member, Author) commented

I'll merge this now. b10 tomorrow and I also might work on some other stuff.