Bug | flatpak-managed-install.service Fails to Start When Not Connected to the Internet #45

Open
ReedClanton opened this issue Feb 17, 2024 · 23 comments
Labels
documentation Improvements or additions to documentation


@ReedClanton
Contributor

Description

When a host NixOS machine rebuilds a system that includes nix-flatpak while not connected to the internet, flatpak-managed-install.service fails to start with the error message provided below.

reloading user units for reedclanton...
setting up tmpfiles
restarting the following units: wpa_supplicant-wlp4s6.service
warning: the following units failed: flatpak-managed-install.service

× flatpak-managed-install.service
     Loaded: loaded (/etc/systemd/system/flatpak-managed-install.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Fri 2024-02-16 17:37:06 CST; 299ms ago
TriggeredBy: ● flatpak-managed-install.timer
    Process: 37129 ExecStart=/nix/store/9ra2bx3n45rznggnmjwr8dl55w86dali-flatpak-managed-install (code=exited, status=1/FAILURE)
   Main PID: 37129 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
        CPU: 28ms

Feb 16 17:37:06 nixos-desktop-gnome systemd[1]: Starting flatpak-managed-install.service...
Feb 16 17:37:06 nixos-desktop-gnome 9ra2bx3n45rznggnmjwr8dl55w86dali-flatpak-managed-install[37132]: error: Can't load uri https://flathub.org/beta-repo/flathub-beta.flatpakrepo: While fetching https://flathub.org/beta-repo/flathub-beta.flatpakrepo: [6] Couldn't resolve host name
Feb 16 17:37:06 nixos-desktop-gnome systemd[1]: flatpak-managed-install.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 17:37:06 nixos-desktop-gnome systemd[1]: flatpak-managed-install.service: Failed with result 'exit-code'.
Feb 16 17:37:06 nixos-desktop-gnome systemd[1]: Failed to start flatpak-managed-install.service.
warning: error(s) occurred while switching to the new configuration

Additional Information

This error occurs on the latest version of main as well as on commits 6079344 and 6622918, and presumably on most or all others. This is worth pointing out because it means the issue was neither caused nor solved by #30.

Once this issue is encountered, the user may reboot without issue. In other words, it doesn't cause any failures during boot.

I tested this very thing here and didn't see this issue. This could be caused by:

  • Sloppy/poor testing on my part.
  • I updated my flakes with nix flake update for the first time since installing NixOS (about a month ago). Something could have changed outside of nix-flatpak.

Environment

This issue occurs on a laptop and desktop. The configuration uses flakes and installs a single flatpak via the nix-flatpak module and many flatpaks via the Home Manager module.

@gmodena
Owner

gmodena commented Feb 17, 2024

When a host NixOS machine rebuilds a system that includes nix-flatpak while not connected to the internet, flatpak-managed-install.service fails to start with the error message provided below.

That is expected behavior if you build a system with services.flatpak.update.onActivation = true, and you have no connectivity.

Once this issue is encountered, the user may reboot without issue. In other words, it doesn't cause any failures during boot.

There's a chance that by the time you rebooted, your network connection was back up. The flatpak-managed-install.service unit is started only after multi-user.target is reached (meaning that networking and connectivity are expected to be up).

@gmodena gmodena added the documentation Improvements or additions to documentation label Feb 17, 2024
@mrnetlex

The same happens to me - flatpak-managed-install.service fails with same way.
I have nix-flatpak.url = "github:gmodena/nix-flatpak"; in my flake.nix, so presumably I use latest version.
services.flatpak.update.onActivation isn't specified, so it should default to false.

(I don't know if it's related, but after every reboot I get an alert from nextcloud-client saying it couldn't connect. I would assume it should start well after the system gets a network connection, so maybe there's a common cause in the way services try to connect too early, but this reasoning is probably too far-fetched.)

@cig0

cig0 commented May 8, 2024

Hi,

I'm experiencing the same issue.

When wired to my router, the service behaves as expected; however, it will fail when relying on the WiFi connection (I'm using the NetworkManager service here).

I tried all sorts of combinations to make the service start correctly at boot time (by editing the nixos.nix file), i.e.:

  • Adding a Wants=network-online.target statement, as explained in man 7 systemd.special
  • Wants= and Requires= with network.target, network-online.target, and NetworkManager.service

In all cases, flatpak-managed-install.service fails at start-up. After scratching my head for a reasonable amount of time, my hunch is that the issue has to do with how NixOS handles networking (I am new to NixOS; I've been around for less than two weeks, so I can't affirm anything!).

In my case, something that would help me avoid having my system tainted (as shown with # systemctl list-machines) right off the bat on a fresh boot would be to disable the service from automatically starting at boot time and let the timer trigger it. I looked at the code but couldn't find a way to have the service created with a disabled state.
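One way to get that behaviour without patching the module might be to clear the unit's `wantedBy` list from your own configuration, so that only the timer starts it. A minimal sketch, assuming the NixOS (system-level) module and assuming the timer exists, which it should with `update.auto.enable = true` as in the configuration below:

```nix
{ lib, ... }:
{
  # Drop every WantedBy= entry so the unit is no longer pulled in at boot or on
  # switch; flatpak-managed-install.timer can still trigger it on schedule.
  systemd.services."flatpak-managed-install".wantedBy = lib.mkForce [ ];
}
```

The likely trade-off: packages newly added to `services.flatpak.packages` would then be installed when the timer next fires (or after a manual `systemctl start flatpak-managed-install`), rather than during `nixos-rebuild switch`.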

My configuration is as follows:

/etc/nixos/modules/flatpak.nix

{
  services.flatpak = {
    enable = true;
    update = {
      auto = {
        enable = true;
        onCalendar = "weekly"; # Default value
      };
      onActivation = false;
    };

    uninstallUnmanaged = true;
    packages = [
    ...
    ...
    ];
  };
}

/etc/nixos/configuration.nix

{
  imports =
    [
      ...
      # Flatpak
      ./modules/flatpak.nix
      ./modules/nix-flatpak/modules/nixos.nix
      ...
    ];
}

On a side note, I love this development; it makes using NixOS even more enjoyable. Thank you! 🙌
I'd be more than glad to chip in some money if you set up a sponsor link :)

@cig0

cig0 commented May 8, 2024

Systemd's documentation on network configuration: https://systemd.io/NETWORK_ONLINE/

@gmodena
Owner

gmodena commented May 8, 2024

Hey @mrnetlex @cig0

I have been trying to replicate this issue, but so far no luck. Is there a chance you could share your network config? In my experience these things tend to be a bit flaky.

FWIW: my baseline env can be found under testing-base; with that setup, I was not able to replicate the issue. I never noticed any issue switching to/from wired/wifi connections on my daily driver either, but I don't reboot that often.

@mrnetlex you are right - if services.flatpak.update.onActivation is not specified, it should default to false, and not try to download flatpaks at boot. This is what I would expect the unit status to look like with default settings:

[antani@nixos:~]$ systemctl --user status flatpak-managed-install
○ flatpak-managed-install.service
     Loaded: loaded (/home/antani/.config/systemd/user/flatpak-managed-install.servic>
     Active: inactive (dead) since Wed 2024-05-08 18:40:03 UTC; 329ms ago
    Process: 1524 ExecStart=/nix/store/930iss74cxw2ailj31bjjfkhi6dvhmi7-flatpak-manag>
   Main PID: 1524 (code=exited, status=0/SUCCESS)
        CPU: 288ms

May 08 18:40:01 nixos systemd[1514]: Starting flatpak-managed-install.service...
May 08 18:40:03 nixos systemd[1514]: Finished flatpak-managed-install.service.

Do you (still) experience a different behavior?

@cig0

When wired to my router, the service behaves as expected; however, it will fail when relying on the WiFi connection (I'm using the NetworkManager service here).

I tried all sorts of combinations to make the service start correctly at boot time (by editing the nixos.nix file), i.e.:

* Adding a `Wants=network-online.target` statement, as explained in `man 7 systemd.special`

* `Wants=` and `Requires=` with `network.target`, `network-online.target`, and `NetworkManager.service`

The systemd unit that nix-flatpak installs (flatpak-managed-install) should start after systemd's multi-user.target, and is wanted by default.target. My understanding (which could be wrong) is that the unit would not try to kick off a download until the GUI and network are up and running.

Could you paste the output of systemctl status flatpak-managed-install (with --user if you installed it as a Home Manager module) after startup? Does journalctl report any useful info?

In my case, something that would help me avoid having my system tainted (as shown with # systemctl list-machines) right off the bat on a fresh boot would be to disable the service from automatically starting at boot time and let the timer trigger it. I looked at the code but couldn't find a way to have the service created with a disabled state.

Ah! I wonder if setting services.flatpak.update.auto.enable = true is triggering the download attempt at boot (thus overriding services.flatpak.update.onActivation = false). This could happen because:
If a timer had expired while a machine was off/asleep, it will fire upon resume. See https://wiki.archlinux.org/title/systemd/Timers for details.
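If that is what is happening, one way to test the hypothesis is to turn off catch-up runs for the timer. A sketch, assuming the system-level timer is named `flatpak-managed-install` (as in OP's `TriggeredBy:` line) and that it is currently persistent, which this thread suspects but has not confirmed:

```nix
{ lib, ... }:
{
  # Persistent=false: when the timer is activated (e.g. at boot), do not fire
  # immediately to make up for OnCalendar= runs missed while it was inactive.
  systemd.timers."flatpak-managed-install".timerConfig.Persistent = lib.mkForce false;
}
```

With the Home Manager module, the equivalent setting would presumably live under `systemd.user.timers."flatpak-managed-install".Timer`.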

Just realized that OP also reports a TriggeredBy: ● flatpak-managed-install.timer in the error message.

FWIW services.flatpak.update.auto.enable is kinda documented (in the module's options docs), but in hindsight it might be a bit counterintuitive/unclear. Tbh I need to triple-check this code path (it has been a while); I'll f/up in this thread.

On a side note, I love this development; it makes using NixOS even more enjoyable. Thank you! 🙌 I'd be more than glad to chip in some money if you set up a sponsor link :)

Thanks for the kind words! Happy to hear you find this project useful.
I appreciate the offer to sponsor, but right now I don't have a significant amount of resources invested in this project. Any help in the form of bug reports (like this one!), feature discussions and doc improvements is very much welcome & appreciated :)

@gmodena gmodena self-assigned this May 8, 2024
@cig0

cig0 commented May 8, 2024

@gmodena I'm afraid there's not much information in the service logs, only the mention it can't fetch the remote object:

~ λ journalctl -b -u flatpak-managed-install.service
May 08 17:46:08 perrrkele systemd[1]: Starting flatpak-managed-install.service...
May 08 17:46:08 perrrkele vyb3rjlabp427icswxan9fhz3dpxqgwm-flatpak-managed-install[1313]: error: Can't load uri https://dl.flathub.org/repo/flathub.flatpakrepo: While fetching https://dl.flathub.org/repo/flathub.flatpakrepo: [6] Couldn't resolve host name
May 08 17:46:08 perrrkele systemd[1]: flatpak-managed-install.service: Main process exited, code=exited, status=1/FAILURE
May 08 17:46:08 perrrkele systemd[1]: flatpak-managed-install.service: Failed with result 'exit-code'.
May 08 17:46:08 perrrkele systemd[1]: Failed to start flatpak-managed-install.service.

The endpoint is perfectly reachable otherwise:

~ λ curl -I https://dl.flathub.org/repo/flathub.flatpakrepo
HTTP/2 200 
server: nginx/1.18.0 (Ubuntu)
content-type: application/octet-stream
last-modified: Fri, 12 Jan 2018 12:24:05 GMT
etag: "5a58a8e5-fc8"
expires: Thu, 15 Feb 2024 12:30:10 GMT
cache-control: max-age=3600, public
backend-name: 3DxooTFj8SlVTdJ0UTX8Jd--F_front_hex2
via: 1.1 varnish, 1.1 varnish
accept-ranges: bytes
date: Wed, 08 May 2024 20:58:12 GMT
age: 1707
x-served-by: cache-lhr7381-LHR, cache-gru-sbgr1930057-GRU
x-cache: HIT, HIT
x-cache-hits: 52515, 1
x-timer: S1715201893.865167,VS0,VE2
strict-transport-security: max-age=31557600
alt-svc: h3=":443";ma=86400,h3-29=":443";ma=86400,h3-27=":443";ma=86400
content-length: 4040

@cig0

cig0 commented May 8, 2024

By the way, this is my DNS resolver configuration, in case you're wondering whether the issue could be related to it:

/etc/nixos/configuration.nix

  # Enable networking
  networking = {
    hostName = "perrrkele";
    # networking.wireless.enable = true;  # Enables wireless support via wpa_supplicant.
    networkmanager = {
      enable = true;
      dns = "systemd-resolved";
    };
  };

(Module)

{
  services.resolved = {
    enable = true;
    fallbackDns = [
      "82.96.65.2" "94.140.14.14" "1.1.1.1"
    ];
  };
}

@gmodena
Owner

gmodena commented May 8, 2024

@cig0 ack - thanks for info.

Just to be sure: are restarts after boot (systemctl restart flatpak-managed-install) successful?

A workaround I can think of would be testing whether domains can be resolved in the installer script, and retrying if not. But I am not super fond of introducing a busy wait at boot (or of forcing a success for a failing service).
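For anyone who wants that trade-off locally anyway, a bounded wait (rather than an open-ended loop) could be added as an `ExecStartPre=` from your own configuration instead of in the installer script. A minimal sketch, assuming the system-level unit, `dl.flathub.org` as the host to probe, an arbitrary 30 × 1 s budget, and that `pkgs.getent` provides the `getent` binary; `wait-for-flathub` is a hypothetical script name:

```nix
{ pkgs, ... }:
let
  # Hypothetical helper: poll DNS resolution of the Flathub host for up to ~30 s.
  waitForFlathub = pkgs.writeShellScript "wait-for-flathub" ''
    for attempt in $(${pkgs.coreutils}/bin/seq 1 30); do
      if ${pkgs.getent}/bin/getent hosts dl.flathub.org > /dev/null; then
        exit 0
      fi
      echo "dl.flathub.org not resolvable yet (attempt $attempt/30)" >&2
      ${pkgs.coreutils}/bin/sleep 1
    done
    echo "giving up: dl.flathub.org is still not resolvable" >&2
    exit 1
  '';
in
{
  # A list value is appended to any ExecStartPre= the module already defines.
  systemd.services."flatpak-managed-install".serviceConfig.ExecStartPre = [ "${waitForFlathub}" ];
}
```

If the probe never succeeds, `ExecStartPre=` exits non-zero and the unit still fails, just later, so the error is not masked.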

@cig0

cig0 commented May 8, 2024

@gmodena Manually restarting the service works as expected -- take a look at this beautiful output:
[Screenshot_20240508_190715: output of the manual restart]

Yeah, I'm not fond of introducing dirty workarounds or obfuscating a system's behavior either, unless it's absolutely necessary, which IMO is not the case here.

I will continue digging here. I want to understand why, if the service behaves correctly on your end, it is failing on my side, especially considering this is a fairly fresh NixOS installation—it isn't even two weeks old.

I'll get back to you on Discourse once I have a first draft ready 👍

@cig0

cig0 commented May 8, 2024

By the way, what kind of networking setup do you have, @gmodena? Are you also using NetworkManager?

@gmodena
Owner

gmodena commented May 9, 2024

I am also using NetworkManager (networking.networkmanager.enable = true;).

@cig0

cig0 commented May 9, 2024

This makes this issue even more interesting! Given Nix(OS)'s very nature, I wonder what settings are introducing noise for OP and me, making the service fail 🤔

@cig0

cig0 commented May 10, 2024

@ReedClanton @gmodena @mrnetlex I'm happy to inform you that I've found the root cause of the issue, which can be solved with a tiny change: #67

@io12

io12 commented May 16, 2024

This command waits 60 seconds for an internet connection.

''
  ${pkgs.networkmanager}/bin/nm-online --quiet --timeout 60
''
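If nix-flatpak is installed as a Home Manager module (a user unit, as in the `systemctl --user` output earlier in this thread), the same command could be hooked in as an `ExecStartPre=`. A sketch, assuming NetworkManager manages the system's network and that Home Manager merges this partial definition with the unit nix-flatpak generates:

```nix
{ pkgs, ... }:
{
  # Home Manager user unit: block for up to 60 s until NetworkManager reports a
  # connection; nm-online exits non-zero on timeout, which fails the unit.
  systemd.user.services."flatpak-managed-install".Service.ExecStartPre = [
    "${pkgs.networkmanager}/bin/nm-online --quiet --timeout 60"
  ];
}
```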

https://forum.manjaro.org/t/for-those-who-use-systemd-services-that-rely-on-a-network-connection/83626/1

@gmodena
Owner

gmodena commented May 18, 2024

Thanks for the pointer @io12, that thread contained a lot of useful info.

The command you shared would work for NetworkManager users, but I would not want to enforce a dependency on NetworkManager on every system. FWIW NetworkManager ships with a service to address the problem discussed in this issue: https://man.archlinux.org/man/NetworkManager-wait-online.service.8.en (under the hood it runs: nm-online -s -q).

I was hoping that an explicit Wants= on network-online.target would help, but there is no guarantee of what "online" means.

From systemd's doc
[...]
Units that strictly require a configured network connection should pull in network-online.target (via a Wants= type dependency) and order themselves after it. This target unit is intended to pull in a service that delays further execution until the network is sufficiently set up. What precisely this requires is left to the implementation of the network managing service.
[...]
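Following that documentation literally, the unit would both pull in `network-online.target` and be ordered after it. A sketch for the NixOS module; cig0 reported earlier in this thread that `Wants=`/`Requires=` on these targets alone did not fix the failure, so treat this as a building block rather than a fix:

```nix
{
  systemd.services."flatpak-managed-install" = {
    # Pull in network-online.target and start only after it has been reached.
    # With NetworkManager, that target is normally delayed by
    # NetworkManager-wait-online.service (nm-online -s -q), as described above.
    wants = [ "network-online.target" ];
    after = [ "network-online.target" ];
  };
}
```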

@gmodena
Owner

gmodena commented May 18, 2024

@ReedClanton @gmodena @mrnetlex I'm happy to inform you that I've found the root cause of the issue, which can be solved with a tiny change: #67

Hey @cig0, I did not have a chance to f/up on the PR before you closed it. Sorry about that.

Don't know if you already came across this, but there's no need to modify upstream to alter a systemd unit. You should be able to add a sleep to the flatpak-managed-install service by adding something like this to your config (not tested):

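  # Inside a NixOS module where `pkgs` is in scope (e.g. configuration.nix).
  systemd.services."flatpak-managed-install" = {
    serviceConfig = {
      # Sleep 5 seconds before the installer runs, giving the network time to come up.
      ExecStartPre = "${pkgs.coreutils}/bin/sleep 5";
    };
  };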

Hope this helps.

@cig0

cig0 commented May 23, 2024


This is pretty cool! Yesterday, I was thinking about a similar approach using an overlay (I started learning about them), but your solution is much simpler. K.I.S.S. FTW 🚀

@dezren39 dezren39 mentioned this issue Nov 3, 2024
@dezren39

dezren39 commented Nov 3, 2024

I am actually in offline mode and have encountered this on activation. I had auto-update and onActivation enabled, but I also encountered this when rebuilding with just the flake module imported. If the apps list doesn't add any new programs, I would like the service to exit 0 or something similar.

(I don't use NetworkManager; I use wpa_supplicant by way of networking.wireless.)

@gmodena
Owner

gmodena commented Nov 3, 2024

Hey @dezren39, setting update.onActivation=true assumes that connectivity is available during activation. Otherwise, the module should support the offline mode you described in #92.

but I also encountered this when rebuild with just the flake module imported

Ack. This sounds like unwanted / buggy behavior. I need to verify that I did not introduce a regression.

Would you mind sharing the following information?

  • Did you install nix-flatpak as a Home Manager module or as a NixOS module?
  • Are you tracking a released version or the main branch of this repository?

Thanks!

@gmodena
Owner

gmodena commented Nov 3, 2024

FWIW, I just tried an offline build (I switched off networking) with nix-flatpak installed as a home-manager module, and services.flatpak.update.onActivation=false (default value).

The system built, and this is the status of the systemd unit post activation:

○ flatpak-managed-install.service
     Loaded: loaded (/home/gmodena/.config/systemd/user/flatpak-managed-install.service; enabled; preset: enabled)
     Active: inactive (dead) since Sun 2024-11-03 19:40:13 CET; 3h 24min ago
   Main PID: 3198 (code=exited, status=0/SUCCESS)
        CPU: 4.872s

nov 03 19:40:11 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3285]: Skipping: com.logseq.Logseq/x86_64/stable is already installed
nov 03 19:40:11 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3291]: Skipping: com.jetbrains.IntelliJ-IDEA-Community/x86_64/stable is already insta>
nov 03 19:40:11 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3297]: Skipping: com.jetbrains.PyCharm-Community/x86_64/stable is already installed
nov 03 19:40:12 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3326]: Skipping: org.signal.Signal/x86_64/stable is already installed
nov 03 19:40:12 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3361]: Skipping: io.typora.Typora/x86_64/stable is already installed
nov 03 19:40:12 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3457]: Skipping: net.ankiweb.Anki/x86_64/stable is already installed
nov 03 19:40:13 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3489]: Skipping: com.visualstudio.code/x86_64/stable is already installed
nov 03 19:40:13 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3519]: Skipping: io.github.zen_browser.zen/x86_64/stable is already installed
nov 03 19:40:13 framework-nixos-1 systemd[3185]: Finished flatpak-managed-install.service.
nov 03 19:40:13 framework-nixos-1 systemd[3185]: flatpak-managed-install.service: Consumed 4.872s CPU time.

The timestamps are consistent with the timer's execution schedule.

Did I understand correctly that you had offline activations fail with services.flatpak.update.onActivation = false? If that's the case, I wonder if it could be a side effect of how systemd timers are persisted. Could you try explicitly setting services.flatpak.update.auto.enable = false between activations?

In the meanwhile, I'll try to repro with nix-flatpak installed as a nixos module too.

@Nick1296

I think I have an update on this. I am also suffering from the service failing right after boot when I am not connected to the internet and update.onActivation=false.
I have a few ideas that might help:

While there is no way to prevent this if update.onActivation=true there could be a not-too-bad workaround in this case:
The module could check whether NetworkManager is enabled as a NixOS option and either use nm-online as suggested before, or fall back to a sleep like in #67 to wait a bit longer after reaching multi-user.target. This can all go in an ExecStartPre option in the systemd service.
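A rough sketch of that idea, written from the user side for the NixOS module and keyed on the real option `networking.networkmanager.enable`. The 60 s timeout and 5 s delay are arbitrary, and `startDelaySeconds` is a hypothetical stand-in for a configurable sleep, not a nix-flatpak option:

```nix
{ config, pkgs, ... }:
let
  startDelaySeconds = 5; # hypothetical knob, not a nix-flatpak option
  waitCommand =
    if config.networking.networkmanager.enable
    # NetworkManager present: wait (up to 60 s) for an actual connection.
    then "${pkgs.networkmanager}/bin/nm-online --quiet --timeout 60"
    # Otherwise fall back to a fixed delay, as proposed in #67.
    else "${pkgs.coreutils}/bin/sleep ${toString startDelaySeconds}";
in
{
  systemd.services."flatpak-managed-install".serviceConfig.ExecStartPre = [ waitCommand ];
}
```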

In case update.onActivation=false instead, I think there is a slight issue. Systemd services that are only activated by a timer should not have a WantedBy option, otherwise they will get added as dependencies for the items listed in the option (see systemd.unit(5)).
flatpak-managed-install.service should then be wanted by default.target only if update.onActivation=true, since the timer can activate the target anyway, and like this the service will not be activated at every boot. This is also how nixpkgs handles automatic updates, I took that service as a reference before starting to talk about things I am not super familiar with.
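Inside the module, the change being described might look roughly like this. It is hypothetical module code, not nix-flatpak's actual implementation: `cfg` is assumed to be bound to `config.services.flatpak`, and only the `wantedBy` line is shown.

```nix
{ config, lib, ... }:
let
  cfg = config.services.flatpak;
in
{
  systemd.services."flatpak-managed-install" = {
    # Start at boot/activation only when updates on activation were requested;
    # otherwise leave scheduling entirely to flatpak-managed-install.timer.
    wantedBy = lib.optionals cfg.update.onActivation [ "default.target" ];
  };
}
```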

I am happy to provide a pull request with these changes if you want (I would like a bit of guidance on how to test my code in that case) and thanks a lot for this project!

@gmodena
Copy link
Owner

gmodena commented Dec 20, 2024

I think I have an update on this. I am also suffering from the service failing right after boot when I am not connected to the internet and update.onActivation=false. I have a few ideas that might help:

Hi @Nick1296 thank you so much for looking into this.
I believe the root cause of this issue should be addressed in #110. I hope to get to it during the upcoming holiday season.

That said, I’d like to make this module more reliable under network issues, and I think your suggestions would be helpful in achieving that.

While there is no way to prevent this if update.onActivation=true there could be a not-too-bad workaround in this case: The module could check if NetworkManager is enabled as a nixos option and either use nm-online as suggested before or fallback to maybe using a sleep like in #67 to wait a bit more after reaching multi-user.target. This can all be in an ExecStartPre option in the systemd service.

Let me think about it. On the one hand, I don’t like having an implicit (if optional) dependency on a system-enabled NetworkManager. On the other hand, it might be the pragmatic way to go.

Re sleep: sounds good to me, as long as we can make the sleep option configurable.

In case update.onActivation=false instead, I think there is a slight issue.

Yep. And nix-flatpak's behavior is buggy to begin with, since it tries to look up a repository even if no install is required (#110).

Systemd services that are only activated by a timer should not have a WantedBy option, otherwise they will get added as dependencies for the items listed in the option (see systemd.unit(5)). flatpak-managed-install.service should then be wanted by default.target only if update.onActivation=true, since the timer can activate the target anyway, and like this the service will not be activated at every boot. This is also how nixpkgs handles automatic updates, I took that service as a reference before starting to talk about things I am not super familiar with.

In general, I'm very much keen on adopting conventions from nixpkgs. Let me review, but in principle what you propose makes sense to me.

I am happy to provide a pull request with these changes if you want (I would like a bit of guidance on how to test my code in that case) and thanks a lot for this project!

Yes please! A contribution with the changes you propose would be terrific.

I don't have an automated way to test these types of changes; usually, I just run (manual) integration tests in a VM where I can shut down or throttle networking. Did you have a look at the flake in testing-base? If you happen to be on IRC (Libera) or Matrix, we can sync up there for quicker response times :)

To keep testing simple, I’d suggest having dedicated, separate PRs for the update.onActivation=false and update.onActivation=true scenarios if feasible.

@Nick1296

Thanks! I will try to make a contribution when I get a chance!

7 participants