Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong PCIe devices get passed through to wrong qubes after adding new PCIe cards #7792

Open
ddevz opened this issue Sep 25, 2022 · 28 comments
Open
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: core hardware support needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. security This issue pertains to the security of Qubes OS. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.

Comments

@ddevz
Copy link

ddevz commented Sep 25, 2022

After installing qubes and setting up your passthrough PCI devices, if you add a new PCIe card, it can change the device numbers of the old cards. This will cause qubes to pass through the wrong devices to the wrong HVM qubes.

The device renumbering is obviously a hardware thing, and preferably there would be a way to make persistent names similar to how linux gives ethernet cards persistent names. If you remove eth0 from a linux system and put another ethernet card in, linux starts calling that new card eth1, not eth0. Linux probably determines uniqueness by MAC address, so there might not be any way to uniquely identify PCIe cards like that.

I expect this to become a bigger problem in the future when people are using sys-gui-gpu and video cards getting a new PCIe number means they will no longer pass through to sys-gui-gpu qube.

(this is related to, but different then #6587 )

@ddevz ddevz added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality. labels Sep 25, 2022
@DemiMarie
Copy link

Ouch. This is actually a security concern: passing e.g. your primary SSD to sys-net would be bad.

The only solution I can think of is to identify devices by PCI bus path, which should be deterministic and enforced by hardware.

@DemiMarie DemiMarie added the security This issue pertains to the security of Qubes OS. label Sep 25, 2022
@andrewdavidwong andrewdavidwong added C: core hardware support T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. and removed T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality. labels Sep 26, 2022
@andrewdavidwong andrewdavidwong added this to the Release 4.1 updates milestone Sep 26, 2022
@unman
Copy link
Member

unman commented Sep 26, 2022 via email

@DemiMarie
Copy link

@unman is there any sort of identifier associated with a physical slot on the motherboard somewhere, and which cannot change?

@unman
Copy link
Member

unman commented Sep 28, 2022 via email

@Garbage4F
Copy link

I've experienced this issue whilst trying to add a 3rd GPU to this motherboard https://www.msi.com/Motherboard/X399-SLI-PLUS/Specification - in particular when 2 cards were nvidia and 1 ati all the pci IDs changed compared to a known good configuration with 1 less GPU. However going from 1 -> 2 GPUs with them in the white slots, they kept their IDs

@DemiMarie
Copy link

@unman friendly ping

@unman
Copy link
Member

unman commented Nov 18, 2022 via email

@brendanhoar
Copy link

It can be reduced (not eliminated but reduced) by validating the pci vendor id and pci device id match what they were during qube assignments.

If they change block qube auto-start.

Special case perhaps to have sys-gui also auto-assign based on these ids if the current device doesn't match (woe to multi-card systems).

B

@unman
Copy link
Member

unman commented Nov 18, 2022 via email

@jamke
Copy link

jamke commented Nov 20, 2022

For people who are interested: the issue #7892 has some possible security-breaching cases and workarounds for users about this issue.

@3hhh
Copy link

3hhh commented Nov 20, 2022

It can be reduced (not eliminated but reduced) by validating the pci vendor id and pci device id match what they were during qube assignments.

If they change block qube auto-start.

Special case perhaps to have sys-gui also auto-assign based on these ids if the current device doesn't match (woe to multi-card systems).

B

I think Qubes OS would need a more wholesome approach for PCI devices:
One idea would be to maintain a list of all PCI devices that were found at first boot and inform users that they'll have to edit that list from some rescue boot mode whenever they modify devices. Refuse to boot in normal mode, if any changes are detected. Similar as heads does it whenever it detects changes to /boot/.

This would also resolve problems with PCI devices suddenly attached to dom0 (e.g. #7886).

@brendanhoar
Copy link

...and inform users that they'll have to edit that list from some rescue boot mode whenever they modify devices.

This would be rather user-unfriendly on thunderbolt systems.

B

@3hhh
Copy link

3hhh commented Nov 20, 2022 via email

@marmarek
Copy link
Member

It can be reduced (not eliminated but reduced) by validating the pci vendor id and pci device id match what they were during qube assignments.

This is IMO a good idea. To validate VID+PID in addition to device bus-device-function (BDF). If it changes, refuse to start, and request the user to detach the device and attach it again. It wouldn't automatically try to find where the device moved to (which, as suggested later, could result in attaching wrong device, if there are multiple matches).

@brendanhoar
Copy link

It can be reduced (not eliminated but reduced) by validating the pci vendor id and pci device id match what they were during qube assignments.
This is IMO a good idea. To validate VID+PID in addition to device bus-device-function (BDF).

Thanks.

This gets a bit more annoying for sys-gui (w/o dom0 fallback) and sys-usb (w/o ps2 mouse/keyboard), but...well we already have those problems today. :)

B

@marmarek
Copy link
Member

There was a complementary feature considered for portable cases (Qubes installed on external disk, plugged into different machines) - assignment based not on specific devices, but all devices of a given class: #214
But that's separate topic we can revise, not a fix for the current per-device assignment.

@DemiMarie
Copy link

@brendanhoar: I don’t know if sys-gui w/o dom0 fallback is going to happen. For one, it would likely block a port to Apple Silicon, as the GPU is not behind an SMMU (ARM IOMMU) on that platform.

@DemiMarie
Copy link

@marmarek: do you know what the PCIe Location Path that Microsoft refers to is? @brendanhoar On a Thunderbolt system, plugging or unplugging a Thunderbolt device should not cause the numbers of other devices to change. If it does, that is a bug in the hardware or firmware, IMO.

@marmarek
Copy link
Member

For one, it would likely block a port to Apple Silicon, as the GPU is not behind an SMMU (ARM IOMMU) on that platform.

That's irrelevant now, since ARM isn't supported by Qubes yet. I can totally see, if/when we'll be doing ARM port, deciding to require IOMMU properly put in front of all relevant devices (including GPU). Anyway, that's some years in the future, hardware may look different at that time.

@marmarek: do you know what the PCIe Location Path that Microsoft refers to is?

Seems to be BDF, based on a quick google: https://superuser.com/a/1202958 lists "PCI bus 3, device 0, function 0" as "LocationInfo"

@DemiMarie
Copy link

For one, it would likely block a port to Apple Silicon, as the GPU is not behind an SMMU (ARM IOMMU) on that platform.

That's irrelevant now, since ARM isn't supported by Qubes yet. I can totally see, if/when we'll be doing ARM port, deciding to require IOMMU properly put in front of all relevant devices (including GPU). Anyway, that's some years in the future, hardware may look different at that time.

That is fair. If we were to support ARM now, then Apple Silicon would be an obvious target, which would make such a requirement difficult to meet. But this might change in the future.

@marmarek: do you know what the PCIe Location Path that Microsoft refers to is?

Seems to be BDF, based on a quick google: https://superuser.com/a/1202958 lists "PCI bus 3, device 0, function 0" as "LocationInfo"

Interesting. Is BDF what is used for switching PCIe packets “on-the-wire”?

@marmarek

This comment was marked as off-topic.

@marmarek
Copy link
Member

Interesting. Is BDF what is used for switching PCIe packets “on-the-wire”?

Yes, I think so. It is at least used to access config space.

@DemiMarie
Copy link

Is it also used when programming the IOMMU?

@marmarek
Copy link
Member

Short answer: yes. At least for VT-d, DMAR ACPI table describes mapping BDF to IOMMU engines.

@DemiMarie
Copy link

Short answer: yes. At least for VT-d, DMAR ACPI table describes mapping BDF to IOMMU engines.

When BDF changes due to adding or removing a card, does the DMAR table also change in such a way that the IOMMU engine number of existing devices does not change? If this is always the case, then the IOMMU engine number could be used as an additional key.

@marmarek
Copy link
Member

I don't think that's useful in practice. In common desktop systems you have 1 or 2 IOMMU engines, so this "additional key" would be the same for all devices in most cases anyway.

@DemiMarie
Copy link

Ah, good point.

3hhh added a commit to 3hhh/qubes-core-admin-linux that referenced this issue Nov 22, 2022
This feature can be used by advanced users to assign devices to pciback
in a policy-like manner based on various PCI device attributes.

References QubesOS/qubes-issues#7886 QubesOS/qubes-issues#7792
3hhh added a commit to 3hhh/qubes-core-admin-linux that referenced this issue Nov 26, 2022
This feature can be used by advanced users to assign devices
to pciback in a policy-like manner based on various PCI device
attributes.

References
QubesOS/qubes-issues#7886
QubesOS/qubes-issues#7792
@ddevz
Copy link
Author

ddevz commented Nov 28, 2022

This is IMO a good idea. To validate VID+PID in addition to device bus-device-function (BDF).
This gets a bit more annoying for sys-gui (w/o dom0 fallback) and sys-usb (w/o ps2 mouse/keyboard), but...well we already have those problems today. :)

Depends at what stage of the boot process it happened at. For example, if the "The PCI-ID of your USB controller changed, what do you want to do?" happened during the grub bootloader stage before xen loaded, then it wouldn't be a problem.
Of course we would need to consider the security implications of passing qube configuration change decisions into dom0.

(Also, note: I believe there is supposedly some information that can be gathered before xen loads that cannot be gathered afterwards. (I read something about software tools that can't run under xen (as dom0), which can find out which PCI devices need to be passed as part of the same "group" (can't seem to find the article right now)))

Also, dom0 would either need to "publish" the VM to PCI device map for that to be accessible by grub, or grub would need to be able to read the luks volumes.

Anyway, I'm not pushing for this, but though I'd mention it as a possibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-4.1 This issue affects Qubes OS 4.1. C: core hardware support needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. security This issue pertains to the security of Qubes OS. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists.
Projects
Status: PCI pain
Development

No branches or pull requests

9 participants