
borg2: borg {compact,create,..} fails with "Failed to release the lock" #8390

Open
mirko opened this issue Sep 18, 2024 · 22 comments · Fixed by #8394

@mirko commented Sep 18, 2024

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG / QUESTION

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg2 2.0.0b11.dev3+gc278a8e8.d20240915 with latest master/HEAD of borgstore

Full borg commandline that led to the problem (leave out excludes and passwords)

  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 create [..]
[..]
d rootfs
d .
terminating with success status, rc 0
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 prune -v --list --keep-daily=30 --keep-weekly= --keep-monthly=1 --keep-yearly=1
usage: borg prune [-h] [--critical] [--error] [--warning] [--info] [--debug] [--debug-topic TOPIC] [-p] [--iec] [--log-json]
                  [--lock-wait SECONDS] [--show-version] [--show-rc] [--umask M] [--remote-path PATH] [--upload-ratelimit RATE]
                  [--upload-buffer UPLOAD_BUFFER] [--debug-profile FILE] [--rsh RSH] [--socket [PATH]] [-r REPO] [-n] [--list]
                  [--short] [--list-pruned] [--list-kept] [--format FORMAT] [--keep-within INTERVAL] [--keep-last SECONDLY]
                  [--keep-minutely MINUTELY] [-H HOURLY] [-d DAILY] [-w WEEKLY] [-m MONTHLY] [-y YEARLY] [-a PATTERN]
                  [--oldest TIMESPAN | --newest TIMESPAN] [--older TIMESPAN | --newer TIMESPAN]
borg prune: error: argument -w/--keep-weekly: invalid int value: ''
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 compact
Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Error:

NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Describe the problem you're observing.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I scripted around borg (without checking exit codes) and had a typo in the borg prune line, causing it to fail due to the malformed argument.

The next command to be executed was borg compact, which then surprisingly failed with this error:

NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Not sure what's going on here - borg compact failing to release a lock feels rather strange.

Include any warning/errors/backtraces from the system logs

(Same prune/compact output and error as shown above.)

If reporting bugs, please include the following:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/compact_cmd.py", line 156, in do_compact
    ArchiveGarbageCollector(repository, manifest).garbage_collect()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/compact_cmd.py", line 40, in garbage_collect
    self.repository_chunks = self.get_repository_chunks()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/compact_cmd.py", line 53, in get_repository_chunks
    result = self.repository.list(limit=LIST_SCAN_LIMIT, marker=marker)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 339, in list
    self._lock_refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 163, in _lock_refresh
    self.lock.refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 237, in refresh
    raise LockTimeout(str(self.store))  # our lock was killed, there is no safe way to continue.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
borg.storelocking.LockTimeout: Failed to create/acquire the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (timeout).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 622, in main
    exit_code = archiver.run(args)
                ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 516, in run
    rc = func(args)
         ^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 151, in wrapper
    with repository:
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 138, in __exit__
    self.close()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 204, in close
    self.lock.release()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 194, in release
    raise NotLocked(str(self.store))
borg.storelocking.NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Platform: Linux thar 6.10.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.9-1 (2024-09-08) x86_64
Linux: Unknown Linux
Borg: 2.0.0b11.dev3+gc278a8e8.d20240915  Python: CPython 3.11.9 msgpack: 1.1.0 fuse: None [pyfuse3,llfuse]
PID: 356139  CWD: /tmp/.lxc-backups/gsmk
sys.argv: ['/usr/local/bin/borg', '-r', 'sftp://HOST//storage/backup_mirko/borg2', 'compact']
SSH_ORIGINAL_COMMAND: None
@ThomasWaldmann (Member) commented Sep 18, 2024

The main issue here seems to be this happening while it was doing repository.list():

    raise LockTimeout(str(self.store))  # our lock was killed, there is no safe way to continue.

That other locking error is just a follow-up, triggered when it __exit__s the repository context manager.

So, it HAD the lock (and regularly refreshed it while doing repository.list()), but noticed that the lock vanished.

So, did you use break-lock in parallel?

Did your machine go to sleep while doing repository.list?
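
For context, the two exceptions in the traceback above correspond to the two failure paths of a TTL-style store lock. A minimal sketch of that behavior, with hypothetical names (this is not borgstore's actual API):

```python
import time

# Illustrative only: the holder stores a timestamped lock object and must
# re-stamp it regularly; a lock whose timestamp gets too old is considered
# stale and may be deleted by any process, after which the holder's next
# refresh() must fail hard.
STALE_SECONDS = 30 * 60  # assumed staleness threshold

class TTLLock:
    def __init__(self, store, key):
        self.store = store            # assumed to offer put/exists/delete
        self.key = f"locks/{key}"

    def acquire(self):
        self.store.put(self.key, {"time": time.time()})

    def refresh(self):
        if not self.store.exists(self.key):
            # lock vanished (killed as stale): no safe way to continue
            raise RuntimeError("failed to create/acquire the lock (timeout)")
        self.store.put(self.key, {"time": time.time()})   # re-stamp it

    def release(self):
        if not self.store.exists(self.key):
            raise RuntimeError("failed to release the lock (was not locked)")
        self.store.delete(self.key)
```

In this model, the LockTimeout comes from refresh() noticing the lock object is gone, and the NotLocked comes from release() being called afterwards on the already-deleted lock.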

@mirko (Author) commented Sep 18, 2024

> So, it HAD the lock (and regularly refreshed it while doing repository.list()), but noticed that the lock vanished.
>
> So, did you use break-lock in parallel?
>
> Did your machine go to sleep while doing repository.list?

Nope and nope.

Thinking about it, though, another possibility is that at that moment I had two instances of borg running, though definitely against different repos (one SFTP, the other local). Is that something I shouldn't do? From the error, the lock appears to be repo-specific, not global.

@ThomasWaldmann (Member):

borgs dealing with different repos don't influence each other (except maybe for shared resources: cpu, ram, i/o). with current borg2, a lot even works in parallel using the same repo.

i'll have a look at the locking code later; the storelocking code is still rather fresh, maybe there is a bug.

ThomasWaldmann added this to the 2.0.0b11 milestone Sep 18, 2024
@mirko (Author) commented Sep 18, 2024

OK, it's at least not about the failing borg prune line, nor is it about borg compact:

BACKUP CONTAINER: android
>>> CONTAINER: (android) - METHOD: borg2
  --- SRC_DIR:  /tmp/.lxc-backups/android
  --- DST_URL:  sftp://HOST//storage/backup_mirko/borg2
  --- ARCHIVE:  vm:android_2024-09-18T15:07:34+02:00
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 prune -v --list --keep-last=14 --keep-within=2w --keep-daily=30 --keep-weekly=8 --keep-monthly=1 --keep-yearly=1 -a sh:vm:android_* --dry-run
Connected (version 2.0, client dropbear)
Authentication (publickey) successful!
[chan 0] Opened sftp connection (server version 3)
[chan 0] sftp session closed.
  --- EXEC: borg -r sftp://HOST//storage/backup_mirko/borg2 create --show-rc vm:android_2024-09-18T15:07:34+02:00 . 
Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Error:

NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

If reporting bugs, please include the following:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 268, in do_create
    create_inner(archive, cache, fso)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 169, in create_inner
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 467, in _rec_walk
    status = self._process_any(
             ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 284, in _process_any
    return fso.process_file(
           ^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 1359, in process_file
    if not cache.seen_chunk(chunk.id):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 614, in seen_chunk
    entry = self.chunks.get(id, ChunkIndexEntry(0, None))
            ^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 610, in chunks
    self._chunks = build_chunkindex_from_repo(self.repository)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 576, in build_chunkindex_from_repo
    result = repository.list(limit=LIST_SCAN_LIMIT, marker=marker)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 339, in list
    self._lock_refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 163, in _lock_refresh
    self.lock.refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 237, in refresh
    raise LockTimeout(str(self.store))  # our lock was killed, there is no safe way to continue.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
borg.storelocking.LockTimeout: Failed to create/acquire the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (timeout).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 622, in main
    exit_code = archiver.run(args)
                ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 516, in run
    rc = func(args)
         ^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 151, in wrapper
    with repository:
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 138, in __exit__
    self.close()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 204, in close
    self.lock.release()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 194, in release
    raise NotLocked(str(self.store))
borg.storelocking.NotLocked: Failed to release the lock <Store(url='sftp://HOST//storage/backup_mirko/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Platform: Linux thar 6.10.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.9-1 (2024-09-08) x86_64
Linux: Unknown Linux
Borg: 2.0.0b11.dev3+gc278a8e8.d20240915  Python: CPython 3.11.9 msgpack: 1.1.0 fuse: None [pyfuse3,llfuse]
PID: 466264  CWD: /tmp/.lxc-backups/android
sys.argv: ['/usr/local/bin/borg', '-r', 'sftp://HOST//storage/backup_mirko/borg2', 'create', '--show-rc', 'vm:android_2024-09-18T15:07:34+02:00', '.']
SSH_ORIGINAL_COMMAND: None


terminating with error status, rc 74

@ThomasWaldmann (Member) commented Sep 18, 2024

Can you try this and see if it fixes it?

diff --git a/src/borg/repository.py b/src/borg/repository.py
index 23c3b150..66db295c 100644
--- a/src/borg/repository.py
+++ b/src/borg/repository.py
@@ -336,11 +336,11 @@ def list(self, limit=None, marker=None):
         list <limit> infos starting from after id <marker>.
         each info is a tuple (id, storage_size).
         """
-        self._lock_refresh()
         collect = True if marker is None else False
         result = []
         infos = self.store.list("data")  # generator yielding ItemInfos
         while True:
+            self._lock_refresh()
             try:
                 info = next(infos)
             except StoreObjectNotFound:

I suspect that if the backend is rather slow, the lock might time out while fetching LIST_SCAN_LIMIT (100000) items from store.list(). That must never happen, because once the lock has timed out, any other borg process might kill it.
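
In other words, the refresh has to happen inside the long-running loop, so the interval between refreshes is bounded by one iteration rather than by the duration of the whole listing. A condensed sketch of the pattern the diff applies (store and lock here are hypothetical stand-ins, not borg's actual objects):

```python
def list_all(store, lock):
    """List all items while keeping the TTL lock alive throughout."""
    infos = store.list("data")  # generator; may run for hours on slow backends
    result = []
    while True:
        lock.refresh()          # re-stamp the lock on every iteration
        try:
            info = next(infos)
        except StopIteration:
            break
        result.append(info)
    return result
```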

ThomasWaldmann changed the title: After failing "borg prune", "borg compact" also fails with "Failed to release the lock" with SFTP backend → borg2 sftp: "borg compact" fails with "Failed to release the lock" (Sep 18, 2024)
@mirko (Author) commented Sep 19, 2024

Backup completed overnight without any error, including the change proposed above. However, performance is unfortunately devastating: backing up 64 GB of data took almost 8 h, with no compression options passed, so defaults. This might very well be paramiko, but I also noticed borg exhausting my CPU when I checked occasionally.
Transferring data via e.g. dd if=/dev/zero bs=64M count=1024 | ssh HOST "( dd of=/storage/backup_mirko/borg2/NULL.img bs=64M oflag=direct )" passes through at ~100 Mbps.

@ThomasWaldmann (Member):

Oh, that's slow.

Maybe add the sftp-related insights there: borgbackup/borgstore#44

Also add the connection speed and the round-trip time (ping) of that connection there.

Default compression is lz4.

Exhausting the CPU sounds interesting - can you determine with what?
I'd have rather expected it to idle most of the time if the store is so slow.

@ThomasWaldmann (Member):

You could also try a 64 MiB sftp upload on that connection for comparison (not an ssh pipe, which works differently).
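
Since borg2's sftp backend goes through paramiko (the `[chan 0]` lines in the logs are paramiko's), a like-for-like test is timing a paramiko upload rather than an ssh pipe. A rough sketch; HOST, the auth setup, and the remote path are placeholders:

```python
import time
import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()
client.connect("HOST")                     # placeholder; assumes key auth works
sftp = client.open_sftp()

with open("/tmp/test64m", "wb") as f:      # create a 64 MiB test file
    f.write(b"\0" * (64 * 1024 * 1024))

t0 = time.monotonic()
sftp.put("/tmp/test64m", "/storage/backup_mirko/test64m")  # placeholder path
print(f"{64 / (time.monotonic() - t0):.1f} MiB/s")

sftp.close()
client.close()
```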

@mirko (Author) commented Sep 19, 2024

First, back to the original issue: I hit it again, this time with the local file backend:

>>> ROOT - METHOD: borg2
  --- SRC_DIR:  /
  --- DST_URL:  /mnt/borg2/
  --- ARCHIVE:  root_2024-09-19T09:01:22+02:00
  --- EXEC: borg -r /mnt/borg2/ prune -v --list --keep-last=14 --keep-within=2w --keep-daily=30 --keep-weekly=8 --keep-monthly=1 --keep-yearly=1 -a 'sh:root_*' --dry-run
  --- EXEC: borg -r /mnt/borg2/ create --show-rc 'sys:root_2024-09-19T09:01:22+02:00' /
/home/mirko/.cache/mozilla/firefox/oxpi9a9p.XXXX/cache2/entries/4826AC1B36DDDE16CF4CEABA5279381E0DE30B1A: stat: [Errno 2] No such file or directory: '4826AC1B36DDDE16CF4CEABA5279381E0DE30B1A'
Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Error:

NotLocked: Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

If reporting bugs, please include the following:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 268, in do_create
    create_inner(archive, cache, fso)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 169, in create_inner
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 527, in _rec_walk
    self._rec_walk(
  [Previous line repeated 6 more times]
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 467, in _rec_walk
    status = self._process_any(
             ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 284, in _process_any
    return fso.process_file(
           ^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 1377, in process_file
    self.process_file_chunks(
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 1139, in process_file_chunks
    chunk_entry = chunk_processor(chunk)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 1129, in chunk_processor
    chunk_entry = cache.add_chunk(chunk_id, {}, data, stats=stats, wait=False, ro_type=ROBJ_FILE_STREAM)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 654, in add_chunk
    self.repository.put(id, cdata, wait=wait)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 400, in put
    self._lock_refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 163, in _lock_refresh
    self.lock.refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 237, in refresh
    raise LockTimeout(str(self.store))  # our lock was killed, there is no safe way to continue.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
borg.storelocking.LockTimeout: Failed to create/acquire the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (timeout).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 622, in main
    exit_code = archiver.run(args)
                ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 516, in run
    rc = func(args)
         ^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 151, in wrapper
    with repository:
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 138, in __exit__
    self.close()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 204, in close
    self.lock.release()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 194, in release
    raise NotLocked(str(self.store))
borg.storelocking.NotLocked: Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Platform: Linux thar 6.10.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.9-1 (2024-09-08) x86_64
Linux: Unknown Linux
Borg: 2.0.0b11.dev3+gc278a8e8.d20240915  Python: CPython 3.11.9 msgpack: 1.1.0 fuse: None [pyfuse3,llfuse]
PID: 896117  CWD: /home/mirko
sys.argv: ['/usr/local/bin/borg', '-r', '/mnt/borg2/', 'create', '--show-rc', 'sys:root_2024-09-19T09:01:22+02:00', '/']
SSH_ORIGINAL_COMMAND: None


terminating with error status, rc 74

mirko changed the title: borg2 sftp: "borg compact" fails with "Failed to release the lock" → borg2: borg {compact,create,..] fails with "Failed to release the lock" (Sep 19, 2024)
mirko changed the title: borg2: borg {compact,create,..] fails with "Failed to release the lock" → borg2: borg {compact,create,..} fails with "Failed to release the lock" (Sep 19, 2024)
@ThomasWaldmann (Member) commented Sep 19, 2024

That's unexpected. But maybe update your local master branch, so that my recent repository.list fix gets in there.

Can you say how long it ran before crashing within that borg invocation?

@mirko (Author) commented Sep 19, 2024

> That's unexpected. But maybe update your local master branch, so that my recent repository.list fix gets in there.

2a20ebe does not seem to differ from what you proposed earlier in this ticket, which I applied manually (and which is still applied).

> Can you say how long it ran before crashing within that borg invocation?

real 67m17.522s
user 42m30.524s
sys 7m44.778s

@mirko (Author) commented Sep 19, 2024

> You could also try a 64 MiB sftp upload on that connection for comparison (not an ssh pipe, which works differently).

11.9 MB/s via sftp put

@ThomasWaldmann (Member):

#8396 should help debugging the locking issue, use --debug.

@mirko (Author) commented Sep 19, 2024

> #8396 should help debugging the locking issue, use --debug.

Ongoing. What I also noticed when using the SFTP backend, and now see with --debug enabled: literally hours are spent on listdir():

[..]
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/ce')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/cf')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d0')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d1')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d2')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d3')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d4')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d5')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d6')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d7')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d8')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/d9')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/da')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/db')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/dc')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/dd')
[chan 0] listdir(b'/storage/backup_mirko/borg2/data/0d/de')
[..]

That at least explains why locks could time out when they are not refreshed within the respective loop.

@ThomasWaldmann (Member) commented Sep 19, 2024

@mirko that's when it queries all the chunk ids from remote. there is no cache for that yet.

it distributes the objects using 2 levels of directories (e.g. 12/34/ contains all chunks that have an ID starting with 1234), so that is up to 65536+256+1 dirs.

could be configured to only use 1 level of directories (that would be 256+1 dirs), but that would potentially put a lot of objects into a single directory, so i thought 2 levels are more appropriate for that kind of storage.
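
To illustrate the layout (a sketch, not borgstore's actual code): a chunk whose hex ID starts with `1234` lives under `data/12/34/`, so a full listing has to visit every one of those leaf directories:

```python
def chunk_path(chunk_id_hex: str, levels: int = 2) -> str:
    # nest by the first `levels` byte-pairs of the hex ID
    parts = [chunk_id_hex[2 * i : 2 * i + 2] for i in range(levels)]
    return "/".join(["data", *parts, chunk_id_hex])

print(chunk_path("1234deadbeef"))  # -> data/12/34/1234deadbeef (ID shortened)
# a full listing visits 256*256 leaf dirs + 256 intermediate dirs
# + the top-level data/ dir = 65536 + 256 + 1
```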

@mirko (Author) commented Sep 19, 2024

> @mirko that's when it queries all the chunk ids from remote. there is no cache for that yet.

Fair enough - it's still listing, though, and it had already been running for 2h+ when I wrote the last comment. Mind, we're talking about a repo on an SFTP backend, which for some reason already appears to be really slow (with paramiko and/or borg).

So we're now at 6h+ and the backup hasn't even started :)

EDIT:

[chan 0] listdir(b'/storage/backup_mirko/borg2/data/ff/ff')
queried 1679538 chunk IDs in 25815.41996549 s (18 requests), ~2.21 kB/s

Now borg has been doing something at 100% CPU for a while without printing any (debug) statement.

EDIT2: Started backing up the next VM into the same repo. Again, it goes through that listdir() loop. Does this happen for every borg create command? If so: the previous round took 7+ hours at a speed of 2.2 kB/s. Suboptimal, to say the least.

@ThomasWaldmann (Member):

Yeah, that is unbearable. Yes, it currently always builds the chunks index ad hoc, in memory only (after querying all object IDs from the repo); there is no local cache yet.

I have some ideas on how to optimize this and will do so after finishing the files cache work.
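
One conceivable direction (purely illustrative, not necessarily what borg will do): persist the queried index to a local cache file so a subsequent run can skip the full remote listing. Names and the simplified repository.list() signature are hypothetical:

```python
import os
import pickle

CACHE = os.path.expanduser("~/.cache/example-chunkindex")  # hypothetical path

def load_or_build_chunkindex(repository):
    """Reuse a locally cached chunk index instead of re-listing the repo.

    Sketch only: a real cache would also need to be invalidated or
    incrementally updated whenever the repository contents change.
    """
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            return pickle.load(f)                  # fast path: no listing
    index = {cid: size for cid, size in repository.list()}  # slow path
    with open(CACHE, "wb") as f:
        pickle.dump(index, f)
    return index
```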

@mirko (Author) commented Sep 20, 2024

  --- SRC_DIR:  /tmp/.lxc-backups/VM
  --- DST_URL:  /mnt/borg2
  --- ARCHIVE:  vm:VM_2024-09-20T02:19:32+02:00
  --- EXEC: borg -r /mnt/borg2 prune -v --list --keep-last=14 --keep-within=2w --keep-daily=30 --keep-weekly=8 --keep-monthly=1 --keep-yearly=1 -a 'sh:vm:VM_*' --dry-run
Keeping archive (rule: within #1):           vm:VM_2024-09-18T12:52:56+02:00    Wed, 2024-09-18 21:03:12 +0200 [fe86b11528cbb7bd79e0e03b95c8ae1ec064bc72ed0a392f4e94342ec1da76ec]
  --- EXEC: borg --debug -r /mnt/borg2 create --show-rc 'vm:VM_2024-09-20T02:19:32+02:00' .
using builtin fallback logging configuration
19 self tests completed in 0.05 seconds
LOCK-CREATE: creating lock. key: e6883cbecdc69dbe lock: {'exclusive': False, 'hostid': 'thar@92448960952742', 'processid': 1106305, 'threadid': 0, 'time': '2024-09-20T01:35:04.030+00:00'}
security: read previous location '/mnt/borg2'
security: read manifest timestamp '2024-09-20T01:35:00.750948+00:00'
security: determined newest manifest timestamp as 2024-09-20T01:35:00.750948+00:00
security: repository checks ok, allowing access
Creating archive at "/mnt/borg2"
Using /root/.cache/borg/9aa48dc40275451b76cea15a6540dc6701e9b3427aafa37e09b0fcd8becf6833 as cache
security: read previous location '/mnt/borg2'
security: read manifest timestamp '2024-09-20T01:35:00.750948+00:00'
security: determined newest manifest timestamp as 2024-09-20T01:35:00.750948+00:00
security: repository checks ok, allowing access
Processing files ...
Reading files cache ...
Verified integrity of /root/.cache/borg/9aa48dc40275451b76cea15a6540dc6701e9b3427aafa37e09b0fcd8becf6833/files
querying the chunk IDs list from the repo...
queried 3531510 chunk IDs in 785.4829800549778 s (37 requests), ~152.86 kB/s
writing item metadata stream chunk d494fab4cb6661aefffde0440c97bd9b5ee6355f5f1eb84c45490cf747e53f78
writing item metadata stream chunk 06a0ab67456a3ad4712d94c0f7935c4c0f3e4175faf221586b089861c4f2f84d
writing item metadata stream chunk 3de8d4f0f08963bb6cb0de74f4cf5c9589b5f0b092c820731d85862bed2555a3
[..]
writing item_ptrs chunk fa2269ac5f0706c473c42508837693cc79b402fe461bd7a61e661acf9de44bb1
writing item_ptrs chunk f64fb9292138782c10bf59e833bdfc982a573fee630b665fcd637ebe9791df3f
writing item_ptrs chunk 413324e0c9367dea5aace1418bd6c18f560d6aa93db2d5d143eabf06a1fb9f60
LOCK-STALE: lock is too old, it was not refreshed. lock: {'exclusive': False, 'hostid': 'thar@92448960952742', 'processid': 1106305, 'threadid': 0, 'time': '2024-09-20T01:35:04.030+00:00', 'key': 'e6883cbecdc69dbe', 'dt': datetime.datetime(2024, 9, 20, 1, 35, 4, 30000, tzinfo=datetime.timezone.utc)}
LOCK-DELETE: key: e6883cbecdc69dbe
LOCK-REFRESH: our lock was killed, there is no safe way to continue.
security: saving state for 9aa48dc40275451b76cea15a6540dc6701e9b3427aafa37e09b0fcd8becf6833 to /root/.local/share/borg/security/9aa48dc40275451b76cea15a6540dc6701e9b3427aafa37e09b0fcd8becf6833
security: current location   /mnt/borg2
security: key type           64
security: manifest timestamp 2024-09-20T01:35:00.750948+00:00
Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Error:

NotLocked: Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

If reporting bugs, please include the following:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 268, in do_create
    create_inner(archive, cache, fso)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/create_cmd.py", line 202, in create_inner
    archive.save(comment=args.comment, timestamp=args.timestamp)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archive.py", line 650, in save
    self.cache.add_chunk(self.id, {}, data, stats=self.stats, ro_type=ROBJ_ARCHIVE_META)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/cache.py", line 654, in add_chunk
    self.repository.put(id, cdata, wait=wait)
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 400, in put
    self._lock_refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 163, in _lock_refresh
    self.lock.refresh()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 242, in refresh
    raise LockTimeout(str(self.store))
borg.storelocking.LockTimeout: Failed to create/acquire the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (timeout).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 622, in main
    exit_code = archiver.run(args)
                ^^^^^^^^^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/__init__.py", line 516, in run
    rc = func(args)
         ^^^^^^^^^^
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/archiver/_common.py", line 151, in wrapper
    with repository:
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 138, in __exit__
    self.close()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/repository.py", line 204, in close
    self.lock.release()
  File "/opt/pipx/venvs/borgbackup/lib/python3.11/site-packages/borg/storelocking.py", line 196, in release
    raise NotLocked(str(self.store))
borg.storelocking.NotLocked: Failed to release the lock <Store(url='file:///mnt/borg2', levels=[('config/', [0]), ('data/', [2])])> (was not locked).

Platform: Linux thar 6.10.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.9-1 (2024-09-08) x86_64
Linux: Unknown Linux  
Borg: 2.0.0b11.dev3+gc278a8e8.d20240915  Python: CPython 3.11.9 msgpack: 1.1.0 fuse: None [pyfuse3,llfuse]
PID: 1106305  CWD: /tmp/.lxc-backups/VM
sys.argv: ['/usr/local/bin/borg', '--debug', '-r', '/mnt/borg2', 'create', '--show-rc', 'vm:VM_2024-09-20T02:19:32+02:00', '.']
SSH_ORIGINAL_COMMAND: None


terminating with error status, rc 74

@ThomasWaldmann (Member) commented Sep 20, 2024

writing item_ptrs chunk 413324e0c9367dea5aace1418bd6c18f560d6aa93db2d5d143eabf06a1fb9f60

notable: ^^^ that may call repository.put(), which refreshes the lock - except if the chunk is already in the repo, in which case nothing is written and no refresh happens.

LOCK-STALE: lock is too old, it was not refreshed. lock: {'exclusive': False, 'hostid': 'thar@92448960952742', 'processid': 1106305, 'threadid': 0, 'time': '2024-09-20T01:35:04.030+00:00', 'key': 'e6883cbecdc69dbe', 'dt': datetime.datetime(2024, 9, 20, 1, 35, 4, 30000, tzinfo=datetime.timezone.utc)}

hmm, a logging formatter with timestamps to the left of all log output would have been useful here.

@ThomasWaldmann (Member):

LOCK-STALE: lock is too old, it was not refreshed. ...
LOCK-DELETE: key: e6883cbecdc69dbe
LOCK-REFRESH: our lock was killed, there is no safe way to continue.

BTW, that looks a bit silly (like it's shooting itself in the foot by killing its own lock), but it is actually useful for finding locking issues, because the lock could equally have been killed by some other borg process, due to the same root cause.
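
For clarity, the sweep behind those LOCK-STALE / LOCK-DELETE lines works roughly like this (a sketch with hypothetical names; the threshold value is an assumption, not borg's actual one):

```python
from datetime import datetime, timedelta, timezone

STALE = timedelta(minutes=30)  # assumed threshold

def sweep_stale_locks(store):
    now = datetime.now(timezone.utc)
    for key, lock in store.list_locks():   # hypothetical iterator of lock objects
        if now - lock["dt"] > STALE:
            # any borg process may delete a lock it considers stale --
            # including, as seen above, the very process that owns it,
            # if that owner failed to refresh in time
            store.delete_lock(key)
```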

@ThomasWaldmann (Member):

LOCK-CREATE: creating lock. key: e6883cbecdc69dbe lock: {'time': '2024-09-20T01:35:04.030+00:00', ...}
...
... (there is not one LOCK-REFRESH here!)
...
Reading files cache ...
...
querying the chunk IDs list from the repo...
!!! should have called lock.refresh() when using current master branch code / the patch.
...
writing item metadata stream chunk d494fab4...
!!! should have called lock.refresh() in repository.put() or be fast when reusing a chunk.
writing item_ptrs chunk fa2269ac5f0706c473c42508837693cc79b402fe461bd7a61e661acf9de44bb1
!!! same here
LOCK-STALE: lock is too old, it was not refreshed. lock: {'time': '2024-09-20T01:35:04.030+00:00', ...}
!!! it is still the timestamp from LOCK-CREATE, thus it did not really refresh the lock.

@ThomasWaldmann (Member):

maybe #8399 helps.
