Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCI devices are assigned in random order => random issues with Wi-Fi connection #6587

Closed
v6ak opened this issue May 7, 2021 · 9 comments
Closed
Labels
C: other eol-4.0 Closed because Qubes 4.0 has reached end-of-life (EOL) P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. pr submitted A pull request has been submitted for this issue. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@v6ak
Copy link

v6ak commented May 7, 2021

Qubes OS version
R4.0

Affected component(s) or functionality

Assigning PCI devices to qubes.

Brief summary
PCI devices are assigned in random order. This causes the interface names to vary across reboots (e.g., wls6 vs. wls7). Network Manager uses this identifier to identify the network card, so “randomly“ (=depending on the interface name) ignores saved Wi-Fi networks.

How Reproducible

Randomly – something like 50 % chance of different order with two PCI devices.

To Reproduce

Steps to reproduce the behavior:

  1. Assign at least two PCI devices to a qube (e.g. Wi-Fi + Ethernet to sys-net)
  2. Run lspci in the qube and save the content.
  3. Reboot computer. (Maybe restart of qubesd would be enough; Restarting just the single qube is not enough.)
  4. Run lspci in the qube and compare with the previous output.

Expected behavior
The order of those devices is always the same.

Actual behavior
The order varies randomly.

Additional context

IIUC, qubes.devices.PersistentCollection stores the identifiers of PCI devices in a Dict. Dictionaries in Python don't guarrant the order of keys (try running python3 -c 'print({"1":None, "2":None})' several times…), IIUC, it has some per-process random nonce in order to prevent collision-based DoS attack. It seems that the order is propagated to the affected DomU. I can see varying order of pcidevs in /var/log/libvirt/libxl/<qube-name>.log.

In my opinion, keeping a stable order might be also useful for some other kind of persistently attachable devices, so maybe the problem should be solved there.

Solutions you've tried

  • Workaround for NetworkManager: For each Wi-Fi network, add two (or more, depending on the number of PCI devices) connections – one for each possible interface name.
  • Workaround for NetworkManager: When Wi-Fi does not connect, edit the network connection and adjust the Device identifier.
  • [Not yet tried] Workaround for network interface names: Add some kernel parameter that affects the network interface naming to be based on something other than on order of PCI devices.
@v6ak v6ak added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels May 7, 2021
@andrewdavidwong andrewdavidwong added C: other needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels May 7, 2021
@andrewdavidwong andrewdavidwong added this to the Release 4.0 updates milestone May 7, 2021
@rustybird
Copy link

rustybird commented May 7, 2021

IIUC, qubes.devices.PersistentCollection stores the identifiers of PCI devices in a Dict. Dictionaries in Python don't guarrant the order of keys (try running python3 -c 'print({"1":None, "2":None})' several times…)

R4.1 might not have this bug then, because Python 3.7+ (de facto CPython 3.6+) dicts do preserve insertion order.

(Still worth fixing for R4.0, of course.)

@v6ak
Copy link
Author

v6ak commented May 8, 2021

Good point. I am not sure if the deterministic order is enough. Maybe it is, maybe it isn't.

  • It surely prevents shuffling the devices in PersistentCollection.
  • I am rather convinced that the deviced are not shuffled after enumerating PersistentCollection.
  • But I am not sure if the devices are put to PersistentCollection in a deterministic order. I don't fully understand the storage mechanism of qubes.xml.

Nevertheless, it seems to be worth trying. I've adjusted devices.py in my Qubes 4.0 installation – just changed {} to OrderedDict() and added from collections import OrderedDict. We'll see what will happen. If I don't write any outcome, remind me please.

@fepitre
Copy link
Member

fepitre commented May 9, 2021

@v6ak I've already experimented this issue a while ago and also other people I know (possibly R4.1 too, I don't remember for sure). There is indeed some ordering to fix for PCI devices. Never had the time to do it so thank you.

@v6ak
Copy link
Author

v6ak commented May 10, 2021

Well, it isn't enough. It seems to be also shuffled elsewhere. From what I've seen, I guess it is shuffled rather before it goes to PersistentCollection than after it goes there.

So, I've created a simple script that enumerates the devices and see if it changes. If it does, I'll probably adjust __iter__ to sort the devices.

@v6ak
Copy link
Author

v6ak commented May 11, 2021

Now, the order of the devices in sys-net is swapped (compared to the previous boot).

The order of devices written by the script is still the same. However, my the script seems to be inconclusive, because qubes.devices.DeviceCollection.assignments uses set, which shuffles the result.

Another finding: qubes.vm.BaseVM.__xml__, which seems to handle XML serialization, calls self.devices[devclass].assignments(persistent=True). While not 100% sure, it seems to call qubes.devices.DeviceCollection.assignments. This would mean the order would be shuffled after every serialization to qubes.xml. The order would be probably the same until qubesd is restarted (=> usually until the computer is rebooted), but not after reboot. I am not sure when qubes.xml is updated, maybe it is not updated when no change to VM config is made.

This would also explain why this may happen on 4.1 (as @fepitre seems to remember). While Python 3.7 (or CPython 3.6) made all dicts ordered (and lowered the difference between dict and OrderedDict), this does not apply to sets.

How to fix it?

a. Sort all items in PersistentCollection.__iter__. I am not 100% sure about the types there, though. It's tempting to replace return self._dict.values().__iter__() by return map(lambda x: x[1], sorted(self._dict.items(), key=lambda x: x[0])) in PersistentCollection.__iter__. Drawback: Adding a device could reorder other devices.
b. Adjust qubes.devices.DeviceCollection.assignments to return list instead of set. I find this solution more appropriate (after all, we would probably want to keep the order), but also more invasive to all related code..

@DemiMarie
Copy link

@v6ak The second approach (using list instead of set) is the one I recommend.

@v6ak
Copy link
Author

v6ak commented May 18, 2021

PR draft: QubesOS/qubes-core-admin#407

Maybe I should look for the other occurrences of DeviceCollection.assignments calls, especially those that are PCI-speciffic.

@andrewdavidwong andrewdavidwong added the pr submitted A pull request has been submitted for this issue. label May 18, 2021
@sajkbflksadbfkasdjf
Copy link

Can confirm that this also happens on Qubes 4.1

@github-actions
Copy link

github-actions bot commented Aug 5, 2023

This issue is being closed because:

If anyone believes that this issue should be reopened and reassigned to an active milestone, please leave a brief comment.
(For example, if a bug still affects Qubes OS 4.1, then the comment "Affects 4.1" will suffice.)

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 5, 2023
@andrewdavidwong andrewdavidwong removed the needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. label Aug 5, 2023
marmarek pushed a commit to marmarek/qubes-core-admin that referenced this issue Oct 9, 2023
marmarek pushed a commit to marmarek/qubes-core-admin that referenced this issue Oct 11, 2023
See QubesOS/qubes-issues#6587.

Attempt to fix the rest of PCI device reordering
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: other eol-4.0 Closed because Qubes 4.0 has reached end-of-life (EOL) P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. pr submitted A pull request has been submitted for this issue. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
Projects
None yet
Development

No branches or pull requests

6 participants