use borgstore and other big changes #8332

Merged: 79 commits, merged Sep 8, 2024

Commits on Aug 23, 2024

  1. Repository3 / RemoteRepository3: implement a borgstore based repository

    Simplify the repository a lot:
    
    No repository transactions, no log-like appending, no append-only, no segments,
    just using a key/value store for the individual chunks.
    
    No locking yet.
    
    Also:
    
    mypy: ignore missing import
    there are no library stubs for borgstore yet, so mypy fails without that option.
    
    pyproject.toml: install borgstore directly from github
    There is no pypi release yet.
    
    use pip install -e . rather than python setup.py develop
    The latter is deprecated and had issues installing the "borgstore from github" dependency.
    ThomasWaldmann committed Aug 23, 2024 (d30d5f4)
  2. implement Repository3.check

    It uses xxh64 hashes of the meta and data parts to verify their validity.
    On a server with borg, this can be done server-side without the borg key.
    
    The new RepoObj header has meta_size, data_size, meta_hash and data_hash.
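    The server-side check can be sketched as follows. This is a minimal sketch, assuming a hypothetical header layout of two 32-bit sizes followed by two 8-byte hashes (the field names come from the commit message, the byte layout does not). To stay within the standard library it uses an 8-byte BLAKE2b digest as a stand-in for xxh64. The point is that the hashes cover the stored (encrypted) bytes, so verifying them needs no key.

```python
import hashlib
import struct

# Hypothetical layout: meta_size, data_size (uint32), meta_hash, data_hash (8 bytes each).
HEADER = struct.Struct("<II8s8s")


def h64(data: bytes) -> bytes:
    # Stand-in for xxh64: an 8-byte BLAKE2b digest from the stdlib.
    return hashlib.blake2b(data, digest_size=8).digest()


def pack_obj(meta: bytes, data: bytes) -> bytes:
    """Prefix the (already encrypted) meta and data parts with the header."""
    hdr = HEADER.pack(len(meta), len(data), h64(meta), h64(data))
    return hdr + meta + data


def check_obj(obj: bytes) -> bool:
    """Verify sizes and hashes of a stored object without any key."""
    if len(obj) < HEADER.size:
        return False
    meta_size, data_size, meta_hash, data_hash = HEADER.unpack_from(obj)
    meta = obj[HEADER.size:HEADER.size + meta_size]
    data = obj[HEADER.size + meta_size:]
    if len(meta) != meta_size or len(data) != data_size:
        return False
    return h64(meta) == meta_hash and h64(data) == data_hash
```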
    ThomasWaldmann committed Aug 23, 2024 (d95cacd)

Commits on Sep 7, 2024

  1. transfer: fix upgrades from borg 1.x by adding a --from-borg1 option

    borg transfer is primarily a general purpose archive transfer function
    from borg2 to related borg2 repos.
    
    but for upgrades from borg 1.x, we also need to support:
    - rcreate with a borg 1.x "other repo"
    - transfer with a borg 1.x "other repo"
    ThomasWaldmann committed Sep 7, 2024 (c740fd7)
  2. locking3: store-based repo locking

    Features:
    - exclusive and non-exclusive locks
    - acquire timeout
    - lock auto-expiry (after 30 minutes of inactivity), lock refresh
    - use tz-aware datetimes (in utc timezone) in locks
    
    Also:
    - document lock acquisition rules in the src
    - increased default BORG_LOCK_WAIT to 10s
    - better document with-lock test
    
    Stale locks are ignored and automatically deleted.
    Default: stale == 30 Minutes old.
    
    lock.refresh() can be called frequently to keep an acquired lock from becoming stale.
    It does little work if the last real refresh was recent.
    After stale/2 time it checks and refreshes the locks in the store.
    
    Update the repository3 code to call refresh frequently:
    - get/put/list/scan
    - inside check loop
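    The staleness and refresh rules can be sketched as below. Assumptions: a dict stands in for the store, each lock is an entry with an `exclusive` flag and a UTC timestamp, and the acquisition rule (an exclusive lock needs no other live locks, a non-exclusive one only needs no exclusive lock) is the usual reader/writer convention, not copied from borg's source. `StoreLock`, `live_locks` and `can_acquire` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

STALE = timedelta(minutes=30)  # default stale age from the commit message


def _utcnow(now=None) -> datetime:
    # tz-aware UTC timestamps; `now` can be injected for testing
    return now if now is not None else datetime.now(timezone.utc)


class StoreLock:
    """Hypothetical lock entry kept in a dict that stands in for the store."""

    def __init__(self, store: dict, key: str, exclusive: bool, now=None):
        self.store, self.key, self.exclusive = store, key, exclusive
        self._write(_utcnow(now))

    def _write(self, now: datetime) -> None:
        self.store[self.key] = {"exclusive": self.exclusive, "time": now}
        self.last_refresh = now

    def refresh(self, now=None) -> bool:
        """Cheap to call often: only rewrites the entry after STALE / 2."""
        now = _utcnow(now)
        if now - self.last_refresh > STALE / 2:
            self._write(now)
            return True
        return False


def live_locks(store: dict, now=None) -> dict:
    """Delete stale entries (older than STALE) and return the remaining ones."""
    now = _utcnow(now)
    for key in [k for k, v in store.items() if now - v["time"] > STALE]:
        del store[key]
    return dict(store)


def can_acquire(store: dict, exclusive: bool, now=None) -> bool:
    """Exclusive needs no live locks; non-exclusive only needs no exclusive one."""
    locks = live_locks(store, now)
    if exclusive:
        return not locks
    return not any(v["exclusive"] for v in locks.values())
```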
    ThomasWaldmann committed Sep 7, 2024 (72d0cae)
  3. manifest: store archives separately one-by-one into archives/*

    repository:
    - api/rpc support for get/put manifest
    - api/rpc support to access the store
    ThomasWaldmann committed Sep 7, 2024 (8b9c052)
  4. b637542
  5. c292ee2
  6. compact: remove "borg compact", not needed any more

    All chunks are separate objects in borgstore.
    ThomasWaldmann committed Sep 7, 2024 (8c2cbdb)
  7. compact: reimplement "borg compact" as garbage collection

    It also outputs some statistics and warns about missing/reappeared chunks.
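    Garbage collection here amounts to a mark-and-sweep pass. A minimal sketch, assuming the repository is a dict of chunk id to stored bytes and each archive is reduced to the list of chunk ids it references (both hypothetical simplifications). It deletes unreferenced chunks and reports chunks that are referenced but missing.

```python
def compact(repo: dict, archives: dict) -> dict:
    """Mark and sweep over a dict-based repository.

    repo: chunk id -> stored bytes
    archives: archive name -> list of referenced chunk ids
    """
    # mark: collect every chunk id referenced by any archive
    referenced = set()
    for chunk_ids in archives.values():
        referenced.update(chunk_ids)
    present = set(repo)
    missing = referenced - present  # referenced, but not in the repo
    unused = present - referenced  # in the repo, but referenced by nothing
    # sweep: delete the unused chunks
    for chunk_id in unused:
        del repo[chunk_id]
    return {"deleted": len(unused), "missing": sorted(missing)}
```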
    ThomasWaldmann committed Sep 7, 2024 (8ef5171)
  8. check: remove orphan chunks detection/cleanup

    This is now done in borg compact, so borg check does not need to care.
    ThomasWaldmann committed Sep 7, 2024 (17ea118)
  9. delete: just remove archive from manifest, let borg compact clean up later.
    
    much faster and easier now, similar to what borg delete --force --force used to do.
    
    considering that speed, no need for checkpointing anymore.
    
    --stats does not work that way, thus it was removed. borg compact now shows some stats.
    ThomasWaldmann committed Sep 7, 2024 (4c052cd)
  10. remove LocalCache

    Note: this was the default cache implementation in borg 1.x.
    It worked well, but there were some issues:

    - if the local chunks cache got out of sync with the repository,
      it needed an expensive rebuild from the infos in all archives.
    - to speed up that rebuild, a local chunks.archive.d cache was used,
      but at the price of quite significant space needs.

    AdhocCacheWithFiles replaced this with a non-persistent chunks cache,
    requesting all chunkids from the repository to initialize a simplified
    non-persistent chunks index, which does not do real refcounting and also
    initially does not have size information for pre-existing chunks.
    
    We want to move away from precise refcounting, LocalCache needs to die.
    ThomasWaldmann committed Sep 7, 2024 (d6a70f4)
  11. 7a93890
  12. get rid of the CacheSynchronizer

    Lots of low-level code written back then to optimize runtime of some
    functions.
    
    We'll solve this differently by computing fewer stats, especially those that are expensive to compute.
    ThomasWaldmann committed Sep 7, 2024 (0306ba9)
  13. cache: replace .stats() by a dummy

    The dummy returns all-zero stats from that call.

    The problem was that these values can't be computed from the chunks cache
    anymore: no correct refcounts, often no size information.
    
    Also removed hashindex.ChunkIndex.summarize (previously used by the above mentioned
    .stats() call) and .stats_against (unused) for same reason.
    ThomasWaldmann committed Sep 7, 2024 (fc6d459)
  14. dcde484
  15. d59306f
  16. blacken the code

    ThomasWaldmann committed Sep 7, 2024 (1231c96)
  17. make ruff happy

    ThomasWaldmann committed Sep 7, 2024 (3e7a4cd)
  18. cb9ff3b
  19. repository3.check: implement --repair

    Tests were a bit tricky as there is validation on 2 layers now:
    - repository3 does an xxh64 check, finds most corruptions already
    - on the archives level, borg also does an even stronger cryptographic check
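    The two validation layers can be illustrated with a minimal sketch. Assumptions: an unkeyed 8-byte BLAKE2b digest stands in for the repository-level xxh64 check, and HMAC-SHA256 stands in for borg's keyed cryptographic check; neither is borg's actual code. It shows why the tests needed care: a corruption that also updates the unkeyed hash passes layer 1 but is still caught by layer 2.

```python
import hashlib
import hmac


def layer1_ok(stored: bytes, stored_hash: bytes) -> bool:
    # Repository level: unkeyed hash over the stored bytes (stand-in for xxh64).
    return hashlib.blake2b(stored, digest_size=8).digest() == stored_hash


def layer2_ok(key: bytes, chunk_id: bytes, stored: bytes, tag: bytes) -> bool:
    # Archive level: keyed authentication bound to the chunk id
    # (HMAC-SHA256 as a stand-in for borg's authenticated encryption).
    expected = hmac.new(key, chunk_id + stored, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```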
    ThomasWaldmann committed Sep 7, 2024 (bfbf3ba)
  20. debug dump-repo-objs: remove --ghost

    This was used for an implementation detail of the borg 1.x
    repository code, dumping uncommitted objects. Not needed any more.
    
    Also remove local repository method scan_low_level, it was only used by --ghost.
    ThomasWaldmann committed Sep 7, 2024 (1189fc3)
  21. repository/repository3: remove .scan method

    This was an implementation specific "in on-disk order" list method that made sense
    with borg 1.x log-like segment files only.
    
    But we now store objects separately, so there is no "in on-disk order" anymore.
    ThomasWaldmann committed Sep 7, 2024 (60edc82)
  22. remove the repository.flags call / feature

    this heavily depended on having a repository index where the flags get stored.
    
    we don't have that with borgstore.
    ThomasWaldmann committed Sep 7, 2024 (6605f58)
  23. cache: add log msg to _load_chunks_from_repo

    For big repos, this might take a while, so at least have messages on debug level.
    ThomasWaldmann committed Sep 7, 2024 (68e64ad)
  24. 5c325e3
  25. docs: update the repository filesystem docs

    In the end, it will all depend on the borgstore backend that will be used,
    so we better point to the borgstore project for details.
    ThomasWaldmann committed Sep 7, 2024 (c2890ef)
  26. remove archive checkpointing

    borg1 needed this due to its transactional / rollback behaviour:
    if there was uncommitted stuff in the repo, next repo opening automatically
    rolled back to last commit. thus we needed checkpoint archives to reference
    chunks and commit the repo.
    
    borg2 does not do that anymore, unused chunks are only removed when the
    user invokes borg compact.
    
    thus, if a borg create gets interrupted, the user can just run borg create
    again and it will find some chunks are already in the repo, making progress
    even if borg create gets frequently interrupted.
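    Resuming an interrupted borg create then follows from deduplication alone. A minimal sketch, assuming a dict as the repository, plain SHA-256 as a stand-in chunk id function (borg uses a keyed id hash) and a hypothetical `fail_after` knob to simulate the interruption:

```python
import hashlib


class Interrupted(Exception):
    """Simulates borg create being interrupted (e.g. by a network failure)."""


def chunk_id(data: bytes) -> bytes:
    # Stand-in id function: plain SHA-256, not borg's keyed id hash.
    return hashlib.sha256(data).digest()


def create(chunks, repo: dict, fail_after=None) -> int:
    """Store every chunk the repository does not have yet; return how many were written.

    fail_after: hypothetical knob that raises Interrupted after that many writes.
    """
    written = 0
    for data in chunks:
        cid = chunk_id(data)
        if cid in repo:
            continue  # already present from an earlier, possibly interrupted run
        if fail_after is not None and written == fail_after:
            raise Interrupted
        repo[cid] = data
        written += 1
    return written
```

    A second run after an interruption only writes the chunks the first run did not get to; no checkpoint archive is needed to keep them.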
    ThomasWaldmann committed Sep 7, 2024 (5e3f2c0)
  27. remove Repository3.commit

    didn't do anything anyway in this implementation.
    ThomasWaldmann committed Sep 7, 2024 (e23231b)
  28. d9f24de
  29. 2be98c7
  30. debug: remove refcount-obj command

    borg doesn't do precise refcounting anymore, so this is pretty useless.
    ThomasWaldmann committed Sep 7, 2024 (20c180c)
  31. c5023da
  32. parseformat: remove dsize and unique_chunks placeholder

    We don't have precise refcounts, thus we can't compute these.
    ThomasWaldmann committed Sep 7, 2024 (0b85b1a)
  33. info: do not output deduplicated_size

    No precise refcounting, can't compute that inexpensively.
    ThomasWaldmann committed Sep 7, 2024 (8455c95)
  34. rcompress: fix help and comments

    no "on-disk order" anymore.
    ThomasWaldmann committed Sep 7, 2024 (15e759c)
  35. 84bd2b2
  36. refactor: rename repository/locking classes/modules

    Repository -> LegacyRepository
    RemoteRepository -> LegacyRemoteRepository
    borg.repository -> borg.legacyrepository
    borg.remote -> borg.legacyremote
    
    Repository3 -> Repository
    RemoteRepository3 -> RemoteRepository
    borg.repository3 -> borg.repository
    borg.remote3 -> borg.remote
    
    borg.locking -> borg.fslocking
    borg.locking3 -> borg.storelocking
    ThomasWaldmann committed Sep 7, 2024 (05739aa)
  37. 7714b65
  38. ec8a127
  39. 22b68b0
  40. 3408e94
  41. 1a382a8
  42. a15cd1e
  43. Repository.list: return [(id, stored_size), ...]

    Note: LegacyRepository still returns [id, ...] and so does RemoteRepository.list,
    if the remote repo is a LegacyRepository.
    
    also: use LIST_SCAN_LIMIT
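    A paginated listing that yields `(id, stored_size)` pairs could look like the sketch below. The `limit`/`marker` paging scheme, the tiny `LIST_SCAN_LIMIT` value and the dict-backed store are assumptions for illustration, not borg's actual implementation.

```python
LIST_SCAN_LIMIT = 3  # hypothetical small page size for the example


def repo_list(store: dict, limit: int, marker=None) -> list:
    """Return up to `limit` (id, stored_size) pairs after `marker`, in id order."""
    ids = sorted(store)
    if marker is not None:
        ids = [i for i in ids if i > marker]
    return [(i, len(store[i])) for i in ids[:limit]]


def list_all(store: dict):
    """Page through the whole repository, LIST_SCAN_LIMIT entries at a time."""
    marker = None
    while True:
        page = repo_list(store, LIST_SCAN_LIMIT, marker)
        if not page:
            return
        yield from page
        marker = page[-1][0]  # continue after the last id of this page
```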
    ThomasWaldmann committed Sep 7, 2024 (c67cf07)
  44. compact: better stats

    - compression factor
    - dedup factor
    - repo size
    
    All values are approx. values without considering overheads.
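    One plausible way to derive these numbers, assuming the compact pass has an (original size, stored size) pair per chunk and a count of how often archives reference each chunk. The formulas (compression factor = unique original size / stored size, dedup factor = total referenced size / unique original size) are assumptions, not necessarily borg's exact definitions, and like the stats described above they ignore overheads.

```python
def compact_stats(chunks: dict, refs: dict) -> dict:
    """Approximate repository stats.

    chunks: chunk id -> (original size, stored size)
    refs: chunk id -> number of references from archives (ids must be in chunks)
    """
    unique_size = sum(size for size, _ in chunks.values())
    stored_size = sum(stored for _, stored in chunks.values())
    # count each chunk once per reference, as if there were no deduplication
    total_size = sum(chunks[cid][0] * n for cid, n in refs.items())
    return {
        "repo_size": stored_size,
        "compression_factor": unique_size / stored_size,
        "dedup_factor": total_size / unique_size,
    }
```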
    ThomasWaldmann committed Sep 7, 2024 (ec1d89f)
  45. blacken the code

    ThomasWaldmann committed Sep 7, 2024 (a40978a)
  46. upgrade black to 24.x

    ThomasWaldmann committed Sep 7, 2024 (5726890)
  47. d27b7a7
  48. cache/hashindex: remove decref method, don't try to remove chunks on exceptions
    
    When the AdhocCache(WithFiles) queries chunk IDs from the repo to build the chunks
    index, it won't know their refcount and thus all chunks in the index have their
    refcount at the MAX_VALUE (representing "infinite") and that would never decrease
    nor could that ever reach zero and get the chunk deleted from the repo.
    
    Only completely new chunks first written in the current borg run have a valid
    refcount.
    
    In some exception handlers, borg tried to clean up chunks that won't be used
    by an item by decref'ing them. That is either:
    - pointless due to refcount being at MAX_VALUE
    - inefficient, because the user might retry the backup and would need to
      transmit these chunks to the repo again.
    
    We'll just rely on borg compact ONLY to clean up any unused/orphan chunks.
    ThomasWaldmann committed Sep 7, 2024 (ef47666)
  49. ArchiveChecker.verify_data: simplify / optimize

    .init_chunks has just built self.chunks using repository.list(), so don't
    call that again, but just iterate over self.chunks.
    
    also some other changes, making the code much simpler.
    ThomasWaldmann committed Sep 7, 2024 (bafbf62)
  50. ArchiveChecker: remove unused possibly_superseded code

    We don't care about unused or superseded repo objects any more here,
    borg compact will deal with them.
    ThomasWaldmann committed Sep 7, 2024 (266e6ca)
  51. e9c42a7
  52. ArchiveChecker: don't do precise refcounting here

    That's the job of borg compact and not needed inside borg check.
    check only needs to know if a chunk is present in the repo.
    ThomasWaldmann committed Sep 7, 2024 (f9d2e68)
  53. cache: renamed .chunk_incref -> .reuse_chunk, boolean .seen_chunk

    reuse_chunk is the complement of add_chunk for already existing chunks.
    
    It doesn't do refcounting anymore.
    
    .seen_chunk does not return the refcount anymore, but just whether the chunk exists.
    
    If we add a new chunk, it immediately sets its refcount to MAX_VALUE, so
    there is no difference anymore between previously existing chunks and new
    chunks added. This makes the stats even more useless, but we have less complexity.
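    The resulting cache behaviour can be sketched like this. The class, the `MAX_VALUE` sentinel value and the `(refcount, size)` entries are hypothetical; only the method names `add_chunk`, `reuse_chunk` and `seen_chunk` come from the commit message. Pre-existing chunks come from the ids the repository lists and have no size; new chunks also get `MAX_VALUE`, so no refcount ever drops to zero and only borg compact removes chunks.

```python
MAX_VALUE = 2**32 - 1  # hypothetical "infinite" refcount sentinel


class ChunksCache:
    """Hypothetical sketch of a chunks cache without precise refcounting."""

    def __init__(self, existing_ids=()):
        # pre-existing chunks: refcount pinned at MAX_VALUE, size unknown
        self.chunks = {cid: (MAX_VALUE, None) for cid in existing_ids}
        self.repo_writes = []  # ids actually written to the repository

    def seen_chunk(self, cid) -> bool:
        # just "does it exist", no refcount returned
        return cid in self.chunks

    def reuse_chunk(self, cid, size):
        # complement of add_chunk for existing chunks: no incref, entry unchanged
        return cid, size

    def add_chunk(self, cid, data: bytes):
        if self.seen_chunk(cid):
            return self.reuse_chunk(cid, len(data))
        self.repo_writes.append(cid)
        self.chunks[cid] = (MAX_VALUE, len(data))  # new chunks get MAX_VALUE too
        return cid, len(data)
```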
    ThomasWaldmann committed Sep 7, 2024 (ccc84c7)
  54. ddf6812
  55. ChunkIndex: remove unused .merge method

    LocalCache used this to assemble a new overall chunks index from multiple
    chunks.archive.d's single-archive chunks indexes.
    ThomasWaldmann committed Sep 7, 2024 (15c7039)
  56. 07ab6e0
  57. e2aa9d5
  58. 551834a
  59. 86dc673
  60. dc9fff9
  61. with-lock: refresh repo lock while subprocess is running, fixes borgbackup#8347
    
    otherwise the lock might become stale and could get
    killed by any other borg process.
    
    note: ThreadRunner class written by PyCharm AI and
    only needed small enhancements. nice.
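    Refreshing the lock while the subprocess runs can be sketched with a background thread. This is not borg's ThreadRunner; `run_with_lock`, the `lock_refresh` callable and the `interval` parameter are hypothetical. Since lock.refresh() only does real work after stale/2, calling it every second or so is cheap.

```python
import subprocess
import sys
import threading


def run_with_lock(lock_refresh, cmd, interval=1.0) -> int:
    """Run cmd while a background thread keeps calling lock_refresh()."""
    stop = threading.Event()

    def refresher():
        # Event.wait returns False on timeout and True once stop is set
        while not stop.wait(interval):
            lock_refresh()

    thread = threading.Thread(target=refresher, daemon=True)
    thread.start()
    try:
        return subprocess.run(cmd).returncode
    finally:
        stop.set()
        thread.join()


if __name__ == "__main__":
    # usage: a no-op refresh around a trivial child process
    print(run_with_lock(lambda: None, [sys.executable, "-c", "print('hello')"]))
```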
    ThomasWaldmann committed Sep 7, 2024 (60a592d)
  62. 7bf0f47
  63. 1cd2f4d
  64. b14c050

Commits on Sep 8, 2024

  1. ace97fa
  2. 60e88ef
  3. b82ced2
  4. manifest.archives: refactor api

    Archives was built with a dictionary-like api,
    but in the future we want to move away from a
    read-modify-write archives list.
    ThomasWaldmann committed Sep 8, 2024 (b56c81b)
  5. manifest: no read-modify-write for borgstore archives list

    previously, borg always read all archives entries, modified the
    list in memory and wrote it back to the repository (similar to
    what borg 1.x did).
    
    now borg works directly with archives/* in the borgstore.
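    Storing one object per archive under archives/* removes the shared list that had to be read, modified and written back. A minimal sketch, assuming a dict as the store and a JSON entry holding name and timestamp (the entry format and the key naming by archive id are hypothetical). It also shows the quick deletion done by borg delete: only the directory entry goes away, the data stays until borg compact.

```python
import json


def add_archive(store: dict, archive_id: str, name: str, ts: str) -> None:
    # one store object per archive: no read-modify-write of a shared list
    store[f"archives/{archive_id}"] = json.dumps({"name": name, "time": ts}).encode()


def delete_archive(store: dict, archive_id: str) -> None:
    # quick deletion: only the directory entry is removed
    del store[f"archives/{archive_id}"]


def list_archives(store: dict) -> list:
    return sorted(
        json.loads(store[key])["name"]
        for key in store
        if key.startswith("archives/")
    )
```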
    ThomasWaldmann committed Sep 8, 2024 (ef7dd76)
  6. check: only write to repo if --repair is given

    old borg just didn't commit the transaction and
    thus caused a transaction rollback if not in
    repair mode.
    
    we can't do that anymore, thus we must avoid
    modifying the repo if not in repair mode.
    ThomasWaldmann committed Sep 8, 2024 (8412168)
  7. shared locking for many borg commands

    not for check and compact, these need an exclusive lock.
    
    to try parallel repo access on same machine, same user,
    one needs to use a non-locking cache implementation:
    
    export BORG_CACHE_IMPL=adhoc
    
    this is slow due to the missing files cache in that implementation,
    but unproblematic because no caches/indexes are persisted.
    ThomasWaldmann committed Sep 8, 2024 (0e183b2)
  8. a509a0c
  9. check: do not create addtl. archives dir entries if we already have one

    if the manifest file was missing, check generated *.1, *.2, ... archives although
    an entry for the correct name and id was already present. BUG!
    
    this is because if the manifest is lost, that does not imply
    anymore that the complete archives directory is also lost, as it
    did in borg 1.x.
    
    Also improved log messages a bit.
    ThomasWaldmann committed Sep 8, 2024 (bc1f90b)
  10. check --repair --undelete-archives: bring archives back from the dead

    borg delete and borg prune do a quick and dirty archive deletion,
    just removing the archives directory entry for them.
    
    --undelete-archives can still find the archive metadata objects
    by completely scanning the repository and re-create missing
    archives directory entries.
    
    but only until borg compact removes the unused data.
    
    if only the manifest is missing or corrupted, do not run that
    scan, it is not required for the manifest anymore.
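    The undelete scan can be sketched as follows, assuming a dict of repository objects where a hypothetical `type` field marks archive metadata objects. It only adds entries that are missing, so it never creates *.1, *.2 duplicates for archives that still have a directory entry.

```python
def undelete_archives(repo_objects: dict, archives_dir: dict) -> list:
    """Re-create archives directory entries for archive metadata found by a full scan.

    repo_objects: object id -> object dict (hypothetical "type" and "name" fields)
    archives_dir: object id -> archive name
    """
    restored = []
    for obj_id, obj in repo_objects.items():
        if obj.get("type") != "archive":
            continue  # ordinary data chunk
        if obj_id not in archives_dir:  # entry was removed by delete/prune
            archives_dir[obj_id] = obj["name"]
            restored.append(obj["name"])
    return restored
```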
    ThomasWaldmann committed Sep 8, 2024 (682aedb)
  11. update CHANGES

    ThomasWaldmann committed Sep 8, 2024 (7442cbf)
  12. b50ed04
  13. 3794e32