Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.6.x doesn't start on armv6 anymore #541

Open
arminus opened this issue Nov 7, 2024 · 12 comments
Open

1.6.x doesn't start on armv6 anymore #541

arminus opened this issue Nov 7, 2024 · 12 comments
Labels
bug Something isn't working p1

Comments

@arminus
Copy link
Contributor

arminus commented Nov 7, 2024

Describe the bug
Starting with 1.6.0, backrest doesn't work on a Pi-Zero anymore, 1.5.1 works fine

2024-11-07T20:55:41.834+0100 �[34mINFO�[0m writing logs to: /root/.local/share/backrest/processlogs
unexpected fault address 0xb6c75fff
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0xb6c75fff pc=0x6822c0]

To Reproduce
See the attached log
out.log

Platform Info
Raspbian GNU/Linux 9.13 (stretch)
Linux raspberrypi 4.19.66+ #1253 Thu Aug 15 11:37:30 BST 2019 armv6l GNU/Linux
Backrest 1.6.1 (https://github.com/garethgeorge/backrest/releases/download/v1.6.1/backrest_Linux_armv6.tar.gz)

@arminus arminus added the bug Something isn't working label Nov 7, 2024
@garethgeorge
Copy link
Owner

garethgeorge commented Nov 9, 2024

Stack trace is very helpful, looks like this is related to the new dependency on sqlite introduced in 1.6.0. Thanks for reporting this.

I'm wondering -- does this happen first run on a fresh install? Or is it only on subsequent runs? It looks like the failure is coming from some bug in the WAL implementation that's hitting specifically on arm.

It could be related to running a few unnecessary pragmas in the init block each time I open the database / trying to change the page size here: https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L67-L68 . It could also be a fundamental issue with WAL.

If you have an ability to build the binary yourself, you might try deleting those PRAGMA lines from the init block or I can provide you with a custom build to try out. I find it very suspicious that it's a WAL related crash that's triggering WHEN the database first tries to execute a query that changes mode and page size settings.

@arminus
Copy link
Contributor Author

arminus commented Nov 9, 2024

Trying ...

pi@raspberrypi:~/backrest/cmd/backrest $ go build .
warning: GOPATH set to GOROOT (/home/pi/go) has no effect
../../webui/webuinix.go:15:12: pattern dist: no matching files found

Does the backend have a dependency on the webui? I'm pressed for space on that Pi-Zero, not sure if I manage to build the UI as well with all the npm dependencies...

@garethgeorge
Copy link
Owner

Ah, it does but can be “faked” by simply creating that directory it’s looking for and placing an empty file in it. Sorry you’re running into that!

@arminus
Copy link
Contributor Author

arminus commented Nov 9, 2024

Ok, managed to compile it with your change, still crashes:

out2.log

@garethgeorge
Copy link
Owner

garethgeorge commented Nov 12, 2024

Thanks for giving this a go -- confirmed with another raspberry pi user in #536 that this does work on their raspberry pi device which means the crash is inconsistent. It's reproducible for you though and raspberry pi zero is a device that I'd expect to work. Unfortunately fixing this will be tricky as I don't have test hardware.

I'm using https://pkg.go.dev/modernc.org/sqlite which seems to be where the crash comes from. The package is pure go and is intended to be very easy to cross compile. It's also supposedly getting mature -- but it feels like we may be hitting an edge case on a less common architecture here.

The alternative is to look at switching to https://github.com/mattn/go-sqlite3 which depends on "C" sqlite , and introduce a zig toolchain to build it a bit like what this article describes: https://zig.news/kristoff/building-sqlite-with-cgo-for-every-os-4cic

Unfortunately not an easy fix, but perhaps worthwhile if I can't trust the stability of modernc sqlite.

In terms of remaining debugging steps, if you have the bandwidth I'd be very appreciative if you could try:

  • If you have other raspberry pi zero devices, I'd be very interested to know what hardware this does / does not reproduce on. If you're willing to fiddle with this a bit, you may also try changing.
  • If you're able to fiddle with the binary a bit more, I think it would be worth trying different ways of initializing the sqlite driver. If this is an issue specific to WAL mode, it can be disabled e.g. by fiddling with the initialization flags here https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L38 . Try dropping the WAL flag and ensure that it doesn't open a -wal file when accessing the database.

Note, if you only have bandwidth to try one more thing, I'd say try deleting the OpenWAL flag on the linked line above. I'm realizing that this is probably still causing the problem I was worried about with the pragma's I had you delete. If that works, I can publish a new release that disables WAL for 32 bit devices.

@arminus
Copy link
Contributor Author

arminus commented Nov 12, 2024

Thanks for giving this a go -- confirmed with another raspberry pi user in #536 that this does work on their raspberry pi device which means the crash is inconsistent.

I have Pi4, it works without a problem there as well.

Also tested it on another Zero with a newer OS: Raspbian GNU/Linux 11 (bullseye), Linux raspberrypi 5.10.103-v7+ #1529 SMP Tue Mar 8 12:21:37 GMT 2022 armv7l GNU/Linux - same problem as on the first Zero. So it's not limited to just one OS/kernel on the Zero.

I changed L38 like so (for now on the same Pi Zero)

                Flags:    sqlite.OpenReadWrite | sqlite.OpenCreate,

Result:

2024-11-12T17:39:20.431+0100    INFO    writing logs to: /home/pi/.local/share/backrest/processlogs
2024-11-12T17:39:20.479+0100    WARN    operation log may be corrupted, if errors recur delete the file "/home/pi/.local/share/backrest/oplog.sqlite" and restart. Your backups stored in your repos are safe.
2024-11-12T17:39:20.483+0100    FATAL   error creating oplog: init sqlite: sqlite: step: SQL logic error: cannot change into wal mode from within a transaction

(of course I deleted the file as suggested and restarted, same issue)

but there's another

PRAGMA journal_mode=WAL;

here: https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L67 - maybe that also needs to be set to something other than WAL?

@garethgeorge
Copy link
Owner

garethgeorge commented Nov 13, 2024

Thanks for digging into this and trying on another device, appreciate the time you're spending looking at it.

It's very encouraging that it's getting further with WAL mode disabled on open. I think you should remove both of https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L38 and also the pragma you linked on https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L67. I'm fairly optimistic that'll get backrest to a place where it's starting up successfully.

In general, any reference to WAL in the codebase will probably need to be disabled. You may run into a similar problem with the log store so you'll want to disable WAL here https://github.com/garethgeorge/backrest/blob/main/internal/logstore/logstore.go#L46 and here https://github.com/garethgeorge/backrest/blob/main/internal/logstore/logstore.go#L80 (anywhere it's referenced in the code).

If WAL is the root cause I'll likely go ahead and open an issue on https://pkg.go.dev/modernc.org/sqlite as well. If the fixes work and you can link me your fork, I'll also make sure I include a patch for this going forward with the next release.

@arminus
Copy link
Contributor Author

arminus commented Nov 13, 2024

I now removed both pragmans in L67-68 as well as the | sqlite.OpenWAL in L38, no improvement:
out3.log

@garethgeorge
Copy link
Owner

garethgeorge commented Nov 14, 2024

Hey, would you mind uploading your fork and possibly linking your fork so I can inspect the changes and possibly PR any additional fixes that are needed to get this working in your fork? It looks like the stack trace is still coming from executing the script that contains the pragma PRAGMA journal_mode=WAL; block which is confusing / it looks like the DB is still trying to use WAL mode.

I've gone ahead and opened an issue with your stack trace on the underlying database driver: https://gitlab.com/cznic/sqlite/-/issues/197

arminus added a commit to arminus/backrest that referenced this issue Nov 14, 2024
@arminus
Copy link
Contributor Author

arminus commented Nov 14, 2024

Here you go: main...arminus:backrest:main

@garethgeorge
Copy link
Owner

garethgeorge commented Nov 15, 2024

Thanks for uploading that -- just PR'd another fix arminus/backrest@main...garethgeorge:backrest:patch-1 based on some feedback I got from the sqlite driver developer in https://gitlab.com/cznic/sqlite/-/issues/197 .

If you can give yet another build with that dependency change a go, I'm fairly optimistic that a libc version mismatch would explain this.

@arminus
Copy link
Contributor Author

arminus commented Nov 15, 2024

The downgrade fixes the problem, see #559 for the full PR (WAL flag and pragma removal rollback)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working p1
Projects
None yet
Development

No branches or pull requests

2 participants