Wipe RAM of VM when it shuts down or is killed, and during memory balancing #4488

Closed
SuzanneSoy opened this issue Nov 7, 2018 · 12 comments
Labels
C: core; P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.); security (This issue pertains to the security of Qubes OS.)

Comments

@SuzanneSoy

Qubes OS version:

4.0

Affected component(s):

Domains


Steps to reproduce the behavior:

  1. Close or pause all untrusted VMs
  2. Read a file containing some secret information (cryptography key, etc.) in a VM
  3. Shut down the VM
  4. Start an untrusted VM which uses some attack to read the contents of the RAM
  5. I could not find any documentation indicating that the RAM that was occupied by the "secret" VM is wiped on shutdown (or when memory balancing deallocates a chunk of RAM). If that's the case, then simply shutting down a VM is not enough to protect in-RAM secrets from an attack that happens after the VM halts.

Expected behavior:

When a chunk of RAM is released during memory balancing, normal shutdown or kill, it should be zeroed out. If that's already the case, I'll make a PR to add this information to the docs.

Actual behavior:

I assume that the contents of the RAM remain accessible to an attacker. Otherwise, good news, but the feature is undocumented.

General notes:


Related issues:

These issues seem to be about clearing the RAM of the host when it shuts down, to prevent local forensics (e.g. taking the RAM out and plugging it into another computer). In this issue, I'm talking about the RAM of a VM being accessed by another VM (e.g. via a future attack similar to Meltdown) after the first was halted.

#1562
#1563

@andrewdavidwong andrewdavidwong added enhancement C: core security This issue pertains to the security of Qubes OS. labels Nov 8, 2018
@andrewdavidwong andrewdavidwong added this to the Far in the future milestone Nov 8, 2018
@andrewdavidwong
Member

This would be helpful to protect against Spectre/Meltdown-style attacks.

@rustybird

I'd love to see this too. It would go a long way towards fixing the one aspect (RAM forensics) where Split dm-crypt is less secure than directly attaching encrypted block devices via qvm-block.

@awokd

awokd commented Sep 13, 2019

Isn't this covered by #1562 (comment)?

@andrewdavidwong
Member

> Isn't this covered by #1562 (comment)?

Yes, it appears so. Closing as a duplicate of #1562.

@andrewdavidwong andrewdavidwong added the R: duplicate Resolution: Another issue exists that is very similar to or subsumes this one. label Sep 14, 2019
@andrewdavidwong andrewdavidwong removed this from the Release TBD milestone Jul 10, 2023
@andrewdavidwong
Member

In #1562 (comment), @adrelanos wrote:

Therefore, I am reopening this issue.

@andrewdavidwong andrewdavidwong added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. and removed R: duplicate Resolution: Another issue exists that is very similar to or subsumes this one. labels Jan 13, 2024
@tasket

tasket commented Jun 30, 2024

You should look into Xen's scrub-domheap option, which is being discussed in #1562.

@marmarek
Member

marmarek commented Jul 1, 2024

> You should look into Xen's scrub-domheap option, which is being discussed in #1562.

That's literally a no-op if you run any non-ancient Linux (newer than 3.0, I believe) in a VM.
The option is only about scrubbing pages on balloon, but Linux already does that before giving them back to Xen.
On VM shutdown (graceful or not), Xen scrubs the VM's memory before using it for anything else (including for another VM), regardless of the scrub-domheap option.

See documentation of the option:

### scrub-domheap
> `= <boolean>`

> Default: `false`

Scrub domains' freed pages. This is a safety net against a (buggy) domain
accidentally leaking secrets by releasing pages without proper sanitization.

And for anybody curious, here is the relevant implementation (xen/common/page_alloc.c):

            /*
             * Normally we expect a domain to clear pages before freeing them,
             * if it cares about the secrecy of their contents. However, after
             * a domain has died we assume responsibility for erasure. We do
             * scrub regardless if option scrub_domheap is set.
             */
            scrub = d->is_dying || scrub_debug || opt_scrub_domheap;
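
The decision in the quoted line can be restated as a small standalone function. This is only a sketch: `struct toy_domain` and `must_scrub()` are hypothetical names invented for illustration, and the two globals merely mirror the identifiers in the quoted line. It is not Xen's actual code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one field the quoted line reads;
 * this is NOT Xen's real struct domain. */
struct toy_domain {
    bool is_dying;
};

/* Hypothetical globals mirroring the names in the quoted line. */
bool scrub_debug = false;
bool opt_scrub_domheap = false;   /* the scrub-domheap option, default false */

/* Returns true when a page freed by domain d must be scrubbed
 * by the hypervisor rather than trusted to the guest. */
bool must_scrub(const struct toy_domain *d)
{
    return d->is_dying || scrub_debug || opt_scrub_domheap;
}
```

With the option left at its default, pages freed by a live domain (for example via ballooning) are not scrubbed by the hypervisor in this model, while pages freed by a dying domain always are.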

This concludes the question in this ticket.

@marmarek marmarek closed this as completed Jul 1, 2024
@tasket

tasket commented Jul 1, 2024

This issue's threat model is defined by the possibility of an exploit against the hypervisor (which IMO includes physical interventions such as coldboot as well). In the moments just after an attack, having freed memory pages (from non-ballooned memory) queued for eventual scrubbing is of little help.

Additionally, a non-Linux guest might have been run, so whatever scrubbing we expect from Linux won't be in effect.

The strongly implied goal here is to have a free memory heap that is kept clean at any given moment (as #1562 states, not having the opportunity to shut down is a problem). I think it's a reasonable goal, so I suggest re-opening this issue.


Looking through the Xen source code, there is a scrub_free_pages() function that is called from Xen's idle_loop(). If the scrub function could also be called synchronously (i.e. completed) on each domain termination then we might have the desired solution of quickly clearing out statically (de)allocated memory.
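
To make the difference between idle-time and synchronous scrubbing concrete, here is a toy model. Everything in it (`toy_heap`, `toy_domain_destroy()`, `toy_scrub_some()`, `toy_scrub_all_sync()`, the page and heap sizes) is hypothetical and invented for illustration. Xen's real scrub_free_pages() is considerably more involved, so treat this as a sketch of the idea only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64   /* tiny pages, for illustration only */
#define TOY_NR_PAGES  8

/* Hypothetical page descriptor; not Xen's struct page_info. */
struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    bool free;
    bool need_scrub;
};

struct toy_page toy_heap[TOY_NR_PAGES];

/* Model a domain dying: its pages return to the free heap with their
 * contents intact and are only marked as needing a scrub. */
void toy_domain_destroy(void)
{
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        memset(toy_heap[i].data, 0xAA, TOY_PAGE_SIZE); /* pretend secrets */
        toy_heap[i].free = true;
        toy_heap[i].need_scrub = true;
    }
}

/* Idle-loop style: scrub at most `budget` dirty pages per call.
 * Returns the number of free pages that are still dirty afterwards. */
size_t toy_scrub_some(size_t budget)
{
    size_t remaining = 0;

    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        if (!toy_heap[i].free || !toy_heap[i].need_scrub)
            continue;
        if (budget > 0) {
            memset(toy_heap[i].data, 0, TOY_PAGE_SIZE);
            toy_heap[i].need_scrub = false;
            budget--;
        } else {
            remaining++;
        }
    }
    return remaining;
}

/* Synchronous style: do not return until no dirty free page is left. */
void toy_scrub_all_sync(void)
{
    while (toy_scrub_some(TOY_NR_PAGES) != 0)
        ;
}
```

In this model, if the idle path never runs (a `budget` of 0), every freed page keeps its old contents, whereas a call to `toy_scrub_all_sync()` at destroy time leaves the free heap clean before anything else proceeds.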

@marmarek
Member

marmarek commented Jul 1, 2024

> Additionally, a non-Linux guest might have been run, so whatever scrubbing we expect from Linux won't be in effect.

An OS that supports a balloon driver is expected to clear secrets from pages before returning them. It can also choose not to support a balloon driver at all if it's not willing to clear pages itself. The balloon driver submitted upstream to Linux in 2008 already did scrubbing. The option exists as a workaround for really ancient kernels (from when Xen support wasn't even upstream yet and existed only as third-party patches). If your threat model involves such systems, you can also enable this option (but IMO in that case, you have bigger concerns...).
OTOH, enabling the scrub-domheap option effectively clears each page twice on every balloon operation, which will make memory balancing noticeably slower. That would be a user-visible regression, especially on systems with less total memory.

Anyway, the option has nothing to do with scrubbing on shutdown (clean or not), where Xen scrubs the pages in any case.

> This issue's threat model is defined by the possibility of an exploit against the hypervisor (which IMO includes physical interventions such as coldboot as well). In the moments just after an attack, having freed memory pages (from non-ballooned memory) queued for eventual scrubbing is of little help.

So, you mean the threat model is an attacker gaining access to your system just after you shut down some VM? Like, within (milli)seconds? If that's the threat you worry about, synchronous scrubbing likely won't save you either (the only difference is that, if you are lucky, you may see that scrubbing really hasn't completed yet). On the systems Qubes OS is designed for, this memory scrubbing is really quite a quick operation (even via idle_loop). It takes longer if you run a server with several TB of RAM (in a single VM), but that's not our target audience.

Anyway, if somebody is willing to contribute a synchronous scrubbing feature to Xen (upstream) and it gets accepted there, we can consider enabling it (and maybe also backporting the relevant patches to the Xen version we use at that time). In that case, the issue may warrant being re-opened. Otherwise, I'd rather not have an open issue with questionable gain to real users and a negligible chance of ever being done.

The initial ticket was opened based on speculation that released memory is not scrubbed at all. That's not true, so the original point is moot.

@tasket

tasket commented Jul 2, 2024

> So, you mean the threat model is an attacker gaining access to your system just after you shut down some VM? Like, within (milli)seconds?

I think it could be seconds or minutes (or longer), if the user has the misfortune of running large CPU-intensive jobs, preventing the idle loop from completing a scrub. Perhaps a corner case, but not a small one. It becomes even larger if the attack originates in a domU and all they have to do to prevent idle scrubbing is busy-wait or do other heavy processing.

(Also, systems tend to be quite busy while shutting down. So then what happens to dom0's released static memory? Is the idle loop allowed to complete? A synchronous hypervisor scrub would address this, too.)

The Xen documentation leaves much to be desired, with few details about a domain's lifecycle (for instance, does a memory-balanced domain start with a static allocation, or a bundle of balloon allocations?). We have experienced Qubes users and contributors concurring with the OP because we're not getting a clear picture of a satisfactory process.

@tasket

tasket commented Jul 3, 2024

Informative link to a patch that led to the current idle-only scrubbing (when pages are not quickly re-allocated to a new VM). So older versions of Qubes probably do not have this issue's (assumed) vulnerability.

7 participants