nvidia-video-G06/nvidia-compute-G06: Configure services according to presets #50

Merged
merged 6 commits into from
Nov 30, 2024


sndirsch
Collaborator

No description provided.

Although the service had already been moved to the nvidia-persistenced package,
which is now required by the nvidia-compute-G06 package, the preset
for this service was still in nvidia-compute-G06. Fixed this now
by moving the preset to that separate package as well.
Not only define presets for the services, but also apply these settings.
- nvidia-hibernate.service
- nvidia-powerd.service
- nvidia-resume.service
- nvidia-suspend.service
@sndirsch
Collaborator Author

@scaronni-nvidia JFYI ...

@scaronni
Contributor

Hi, a couple of points.

I'm surprised about the preset handling. Presets are a built-in systemd mechanism, so you should not need to do anything to enable the units. One of the reasons for adding the preset was actually to trim down the SPEC file.

Regarding moving the preset, that's on purpose in the main CUDA driver package part, as you don't want to have the persistence daemon starting (and failing) if someone just installs it standalone. I think you should revert that.

@scaronni
Contributor

Another example (fedora): https://src.fedoraproject.org/rpms/fedora-release/tree/rawhide

Depending on the release (workstation, server, etc.), a different subset of services gets started, and that is done only with the preset file.

@sndirsch
Collaborator Author

I'm surprised about the preset handling. Presets are a built-in systemd mechanism, so you should not need to do anything to enable the units. One of the reasons for adding the preset was actually to trim down the SPEC file.

I'm definitely not an expert in this area, but I recently noticed that defining a preset to enable a service by default does not actually enable it. So I looked into what needs to be done and found this.

https://chatgpt.com/share/673f0048-b398-800c-9bf5-2cb166898de8

I tried this and noticed that %systemd_post does not enable the service either, because this macro expands to

# Preset the service to follow the system's policy
:
if [ -x /usr/bin/systemctl ]; then
        test -n "$FIRST_ARG" || FIRST_ARG="$1"
        [ -d /var/lib/systemd/migrated ] || mkdir -p /var/lib/systemd/migrated || :

        if [ "$YAST_IS_RUNNING" != "instsys" ]; then
                /usr/bin/systemctl daemon-reload || :
        fi
        for service in load-nvidia-drm-${flavor}.service ; do
                sysv_service=${service%.*}

                if [ -e /run/systemd/rpm/needs-preset/$service ]; then
                        /usr/bin/systemctl preset $service || :
                        rm "/run/systemd/rpm/needs-preset/$service" || :

                elif [ -e /run/systemd/rpm/needs-sysv-convert/$service ]; then
                        /usr/sbin/systemd-sysv-convert --apply $sysv_service || :
                        rm "/run/systemd/rpm/needs-sysv-convert/$service" || :
                        touch /var/lib/systemd/migrated/$sysv_service || :
                fi
        done
fi

And since nothing touches /run/systemd/rpm/needs-preset/$service before this macro runs, it won't enable the service either. So I added my workaround, /usr/bin/systemctl preset nvidia-suspend.service, since it felt wrong to touch that file myself. I mean, why use a macro if you need to know the details of its implementation?
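
The branch of the expanded macro that matters here can be isolated in a small sketch. It assumes the marker directory is passed in as a parameter (the macro itself hard-codes /run/systemd/rpm/needs-preset) and it only reports what would happen instead of calling systemctl. The function name post_would_preset is made up for illustration.

```shell
#!/bin/sh
# Sketch only: mirrors the needs-preset check from the expanded macro above.
# post_would_preset is a hypothetical name; marker_dir stands in for
# /run/systemd/rpm/needs-preset so the logic can be tried without root.
post_would_preset() {
    marker_dir=$1
    service=$2
    if [ -e "$marker_dir/$service" ]; then
        echo "preset $service"   # the macro would run: systemctl preset $service
    else
        echo "skip $service"     # no needs-preset marker: the preset branch is skipped
    fi
}

# Demo: without a marker the preset step is skipped; with one, it is applied.
demo_dir=$(mktemp -d)
post_would_preset "$demo_dir" nvidia-suspend.service
touch "$demo_dir/nvidia-suspend.service"
post_would_preset "$demo_dir" nvidia-suspend.service
rm -rf "$demo_dir"
```

The point of the sketch is that the macro alone never applies the preset unless something created the marker file beforehand.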

Regarding moving the preset, that's on purpose in the main CUDA driver package part, as you don't want to have the persistence daemon starting (and failing) if someone just installs it standalone. I think you should revert that.

Ok. I thought you just forgot to move it to the external package. Well, I don't think anyone is going to use the nvidia-persistenced package without the nvidia driver though. I'm not a big fan of referencing filenames from other packages and having to keep them in sync. But I guess it can't be done differently.

@sndirsch
Collaborator Author

sndirsch commented Nov 21, 2024

Another example (fedora): https://src.fedoraproject.org/rpms/fedora-release/tree/rawhide

Depending on the release (workstation, server, etc.), a different subset of services gets started, and that is done only with the preset file.

Interesting. Maybe things work differently on RH/Fedora and SUSE. As I said, I'm not an expert here.

@scaronni
Contributor

I've made a test on a VM and it looks fine:

# systemctl status nvidia-persistenced.service | grep Loaded
     Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; disabled; vendor preset: enabled)

It's enabled. Not the moment you install it, of course, but you're supposed to reboot.

@scaronni
Contributor

The command systemctl preset <unit> resets the status (enabled, disabled, dependencies, etc.) according to the preset, so basically the commands you added are a no-op...
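
To see which policy a preset file assigns to a given unit, the lookup can be approximated in a few lines of shell. This is a rough sketch, assuming a single preset file with "enable <pattern>" and "disable <pattern>" lines where the first matching line wins. preset_policy is a hypothetical helper, not a systemd command, and it ignores multiple preset directories and other details of the real lookup.

```shell
#!/bin/sh
# Sketch only. preset_policy is a hypothetical helper: it prints the action of
# the first line in preset_file whose glob pattern matches the unit name, or
# "no-match" if no line applies.
preset_policy() {
    preset_file=$1
    unit=$2
    while read -r action pattern _rest || [ -n "$action" ]; do
        case $action in
            enable|disable) ;;
            *) continue ;;       # skip comments, blank lines, anything else
        esac
        # $pattern is left unquoted so that case treats it as a glob.
        case $unit in
            $pattern) echo "$action"; return 0 ;;
        esac
    done < "$preset_file"
    echo "no-match"
}

# Demo with a one-line preset file like the ones discussed in this PR.
demo=$(mktemp)
printf '%s\n' 'enable nvidia-persistenced.service' > "$demo"
preset_policy "$demo" nvidia-persistenced.service
rm -f "$demo"
```

Running systemctl preset on a unit then enables or disables it according to that policy, which is why it only changes anything when the current state differs from the preset.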

@sndirsch
Collaborator Author

I don't know what the expected behaviour should be, but I can't reproduce this on sle15-sp6-aarch64. It's a different driver package, but it is also enabled via presets.

Right after installation

$ systemctl status load-nvidia-drm-default
○ load-nvidia-drm-default.service - Load nvidia-drm
     Loaded: loaded (/usr/lib/systemd/system/load-nvidia-drm-default.service; disabled; preset: enabled)
     Active: inactive (dead)

Reboot and then still the same.

$ systemctl status load-nvidia-drm-default
○ load-nvidia-drm-default.service - Load nvidia-drm
     Loaded: loaded (/usr/lib/systemd/system/load-nvidia-drm-default.service; disabled; preset: enabled)
     Active: inactive (dead)

Nobody enables that. Apparently there is no such magic during the boot process.

@sndirsch
Collaborator Author

I'm afraid I need a working solution, not something that is officially correct and looks clean. :-(

@scaronni
Contributor

According to the openSUSE documentation it should work: https://en.opensuse.org/openSUSE:Systemd_packaging_guidelines#Enabling_units

Also there are other presets already installed on the system...

@scaronni
Contributor

I'll make some tests.

@scaronni
Contributor

Just installed on a freshly installed opensuse VM:

○ nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/usr/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

Note that the output is slightly different from the one in my comment #50 (comment)

@scaronni
Contributor

And all the other nvidia units attempt to start at boot as well. Please note that they appear as failed while the system is booting, but somehow on SUSE they are not reported as failed once the system is up; they still show as inactive.

Are you by chance testing these things on a VM without an nvidia GPU?

@scaronni
Contributor

systemctl list-unit-files --type=target

@sndirsch
Collaborator Author

And all the other nvidia units attempt to start at boot as well. Please note that they appear as failed while the system is booting, but somehow on SUSE they are not reported as failed once the system is up; they still show as inactive.

Are you by chance testing these things on a VM without an nvidia GPU?

Nope. It's an NVIDIA Jetson Orin AGX (aarch64) running SLE15-SP6. I'm wondering which system you're using.

/usr/lib/systemd/system/load-nvidia-drm-default.service

[Unit]
Description=Load nvidia-drm
Before=xdm.service
After=systemd-user-sessions.service systemd-logind.service

[Service]
Type=oneshot
ExecStart=/sbin/modprobe nvidia-drm
RemainAfterExit=yes

[Install]
WantedBy=graphical.target

/usr/lib/systemd/system-preset/70-nvidia-jetson-36_4-kmp-default.preset

enable load-nvidia-drm-default.service
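
For a unit like this one, the WantedBy=graphical.target line only takes effect once the unit is enabled, which amounts to a symlink in the target's .wants directory under /etc/systemd/system. A minimal sketch for checking that, assuming the standard directory layout. unit_wanted_by is a hypothetical helper, and the root parameter exists only so the check can run against a scratch tree.

```shell
#!/bin/sh
# Sketch only. unit_wanted_by is a hypothetical helper; "root" lets the check
# run against a scratch tree instead of the live system (pass "" for "/").
unit_wanted_by() {
    root=$1
    unit=$2
    target=$3
    [ -L "$root/etc/systemd/system/$target.wants/$unit" ]
}

# Demo on a scratch tree: the unit counts as enabled only once the symlink exists.
scratch=$(mktemp -d)
wants="$scratch/etc/systemd/system/graphical.target.wants"
mkdir -p "$wants"
unit_wanted_by "$scratch" load-nvidia-drm-default.service graphical.target \
    && echo enabled || echo disabled
ln -s /usr/lib/systemd/system/load-nvidia-drm-default.service "$wants/"
unit_wanted_by "$scratch" load-nvidia-drm-default.service graphical.target \
    && echo enabled || echo disabled
rm -rf "$scratch"
```

A status of "disabled; preset: enabled" would correspond to the first case: the preset file asks for the unit to be enabled, but no symlink has been created yet.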

@sndirsch
Collaborator Author

systemctl list-unit-files --type=target

The service file doesn't appear in this list at all, not even after I enabled and started it manually.

@scaronni
Contributor

Sorry, my command above was also wrong, it was supposed to be:

systemctl list-unit-files --type=service | grep nvidia

I think I found something: if the system has been upgraded from a driver version where the systemd units were customized by the user or by the %post* scripts, the "customization" is retained when the package gets upgraded.

Was that your case? I still believe you shouldn't do anything, as all the tests I've done result in correctly enabled services. At most, a systemctl preset <unit> would restore them all to an enabled state.

@scaronni
Contributor

scaronni commented Nov 22, 2024

On a freshly installed system:

localhost:~ # hostnamectl | grep Operating
  Operating System: openSUSE Leap 15.5
localhost:~ # systemctl list-unit-files --type=service | grep nvidia
nvidia-hibernate.service                     enabled         enabled
nvidia-persistenced.service                  enabled         enabled
nvidia-powerd.service                        enabled         enabled
nvidia-resume.service                        enabled         enabled
nvidia-suspend.service                       enabled         enabled
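
Comparing the two columns by eye works for five units, but the same check can be scripted. A small sketch, assuming rows in the three-column "unit, state, preset" layout shown above. preset_mismatches is a hypothetical helper, and it only considers rows where both columns are either enabled or disabled.

```shell
#!/bin/sh
# Sketch only. preset_mismatches is a hypothetical helper: it reads
# "UNIT STATE PRESET" rows on stdin and prints the service units whose
# enablement state differs from their preset.
preset_mismatches() {
    awk '
        NF == 3 && $1 ~ /\.service$/ &&
        ($2 == "enabled" || $2 == "disabled") &&
        ($3 == "enabled" || $3 == "disabled") &&
        $2 != $3 { print $1 ": " $2 " (preset: " $3 ")" }
    '
}

# Demo with sample rows; on a live system the input could instead come from
#   systemctl list-unit-files --type=service | grep nvidia
printf '%s\n' \
    'nvidia-powerd.service    enabled   enabled' \
    'nvidia-suspend.service   disabled  enabled' | preset_mismatches
```

An empty result means every listed unit already follows its preset.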

@sndirsch
Collaborator Author

Ok. I will check on a Leap 15.5/15.6 x86_64 system as soon as possible. It could be that the behaviour on sle15-sp6-aarch64 is just broken.

…nced"

This reverts commit 83d344f. This
was just wrong. For more details see
#50 (comment)
Not only define the preset for the service, but also apply this setting.
- nvidia-persistenced.service
@sndirsch sndirsch changed the title from "move also preset of nvidia-persistenced service to nvidia-persistenced package/nvidia-video-G06: Configure services according to presets" to "nvidia-video-G06/nvidia-compute-G06: Configure services according to presets" on Nov 29, 2024
@sndirsch
Collaborator Author

sndirsch commented Nov 29, 2024

I reverted the move of the nvidia-persistenced service to the nvidia-persistenced package.

I see different results on a freshly installed Leap 15.6. Every service is disabled (preset: enabled) unless I add a systemctl preset <service> line. This is with the %systemd_post <service> line already present in the %post sections. A reboot does NOT change that.
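
The workaround described here boils down to calling systemctl preset for each unit from the scriptlet. A rough sketch of that step as plain shell, using the unit names listed in this PR. The apply_presets function and the overridable SYSTEMCTL variable are assumptions for illustration, and the "|| :" guard is borrowed from the macro expansion quoted earlier in this thread, not from the actual commit.

```shell
#!/bin/sh
# Sketch only. This is a dry run by default: SYSTEMCTL=echo just prints the
# commands. In a real scriptlet it would be /usr/bin/systemctl.
SYSTEMCTL=${SYSTEMCTL:-echo}

# apply_presets is a hypothetical helper: it applies the preset policy to each
# unit given as an argument and never fails the scriptlet.
apply_presets() {
    for unit in "$@"; do
        "$SYSTEMCTL" preset "$unit" || :
    done
}

apply_presets \
    nvidia-hibernate.service \
    nvidia-powerd.service \
    nvidia-resume.service \
    nvidia-suspend.service \
    nvidia-persistenced.service
```

Keeping the call in one helper would make it easy to drop again once the macro handles presets on its own.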

@sndirsch
Collaborator Author

Let's accept this now. I opened issue #51 to track that we would like to get rid of this systemctl preset <service> workaround.

@sndirsch sndirsch merged commit 43ada7e into openSUSE:main Nov 30, 2024