DMA hint calculation in OpenHCL bootshim and fallback mem allocator for NVMe #1190

Open · wants to merge 18 commits into main

Conversation

@yupavlen-ms (Contributor):

To get some early feedback before creating a proper PR.

There was a comment suggesting that we should not create a big lookup table and should always use heuristics instead. The problem with that: the heuristics can change significantly depending on which device types are assigned to this specific VM (MFND, ASAP, MANA), so if we really want heuristics, some host changes are needed:

  1. Host (worker process) must check if at least one MFND device is assigned to VTL2 and then append "nvme" flag.
  2. WP must check if at least one ASAP device is assigned to VTL2 and then append "asap" flag.
  3. WP must check if at least one MANA device is assigned to VTL2 and then append "mana" flag.

Currently, in AH2025+, the "nvme" flag is always present and the distinction is made by a non-zero dma-pages property. We may need to rework that so the "nvme" flag is present only when MFND devices are detected.
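For illustration, a minimal sketch of the decision the bootshim could make from the host-provided flags and dma-pages property. All names here (`should_use_dma_heuristic`, `host_flags`, `host_dma_pages`) are hypothetical, not the PR's actual identifiers:

```rust
// Hypothetical sketch: apply the bootshim's own DMA-hint heuristic only when
// the host did not supply a page count and the "nvme" flag is present.
fn should_use_dma_heuristic(host_flags: &[&str], host_dma_pages: u64) -> bool {
    // A non-zero host-provided value always wins over the heuristic.
    host_dma_pages == 0 && host_flags.contains(&"nvme")
}
```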

@yupavlen-ms (Contributor Author):

@chris-oo @justus-camp-microsoft please read the PR description.

dma_hint_4k = f.dma_hint_mb as u64 * 1048576 / PAGE_SIZE_4K;
break;
} else {
// Prepare for possible extrapolation.
Member:

Should we just instead round to the next bucket, aka be pessimistic?

Contributor Author:

Possible, but it will also reserve more than needed. If VTL2 Linux is okay with that (it could be), then we can simplify it. Some thorough testing is needed, e.g. creating all Underhill VM sizes on TiP.
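The "round up to the next bucket" alternative from this thread could look like the following sketch. The bucket values are illustrative, not the PR's real table:

```rust
// Hypothetical (vp_count, dma_hint_mb) buckets, sorted by vp_count.
const BUCKETS: &[(u64, u64)] = &[(2, 2), (4, 4), (8, 6), (16, 10)];

/// Returns the DMA hint (in MiB) for `vp_count`, rounding up to the next
/// bucket when there is no exact match (pessimistic: may reserve more than
/// strictly needed). `None` means the VM is larger than any known bucket.
fn dma_hint_pessimistic_mb(vp_count: u64) -> Option<u64> {
    BUCKETS
        .iter()
        .find(|(vps, _)| *vps >= vp_count)
        .map(|(_, mb)| *mb)
}
```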

},
];

/// Returns calculated DMA hint value, in 4k pages.
Member:

I'd probably want some more rationale on what we should do if we don't match one of these lookup tables exactly.
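One possible policy when the VP count falls between table entries is linear interpolation between the two nearest buckets. A hypothetical sketch, not the PR's implementation (the table values are made up; only `PAGE_SIZE_4K` and the MiB-to-pages conversion mirror the diff above):

```rust
const PAGE_SIZE_4K: u64 = 4096;

/// Interpolates a DMA hint (in 4k pages) for `vp_count` from a sorted
/// `(vp_count, dma_hint_mb)` table. Returns `None` when `vp_count` is
/// outside the table's range (extrapolation would need a separate policy).
fn interpolate_hint_4k(table: &[(u64, u64)], vp_count: u64) -> Option<u64> {
    let below = table.iter().rev().find(|(v, _)| *v <= vp_count)?;
    let above = table.iter().find(|(v, _)| *v >= vp_count)?;
    let mb = if above.0 == below.0 {
        below.1 // Exact match.
    } else {
        // Linear interpolation between the surrounding buckets.
        below.1 + (above.1 - below.1) * (vp_count - below.0) / (above.0 - below.0)
    };
    Some(mb * 1_048_576 / PAGE_SIZE_4K)
}
```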

@@ -463,7 +467,17 @@ impl PartitionInfo {
crate::cmdline::parse_boot_command_line(storage.cmdline.as_str())
.enable_vtl2_gpa_pool;

max(dt_page_count.unwrap_or(0), cmdline_page_count.unwrap_or(0))
let hostval = max(dt_page_count.unwrap_or(0), cmdline_page_count.unwrap_or(0));
if hostval == 0 &&
Member:

I wonder if we should also make some indication to usermode that we calculated dma memory ourselves based on heuristics, instead of the host providing it. And make that available via inspection and log it?

Contributor Author:

That would be nice. It sounds like you mean another vmbus protocol chunk, plus some logging later?

Member:

i don't know if we need to do anything other than report something more in the device tree we pass to the linux kernel (which i think we already do today?)

Contributor Author:

We do it today, I haven't found a case where we need to change anything there yet.

Member:

Ah, I mean that we should report some indication (that we can log) so we know that we didn't get a hint from the host and calculated it ourselves. I don't think we have that part today.
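As a sketch of that indication, the source of the DMA size could be recorded alongside the value so usermode can log and inspect it. The types and names below are hypothetical, not the PR's actual code:

```rust
/// Where the VTL2 GPA pool size came from (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DmaHintSource {
    Host,
    Heuristic,
}

/// Picks the host-provided page count when present; otherwise falls back to
/// the bootshim heuristic, tagging the result so it can be reported later.
fn resolve_dma_pages(host_pages: u64, heuristic_pages: u64) -> (u64, DmaHintSource) {
    if host_pages != 0 {
        (host_pages, DmaHintSource::Host)
    } else {
        (heuristic_pages, DmaHintSource::Heuristic)
    }
}
```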

@yupavlen-ms yupavlen-ms changed the title [DRAFT] [Sneak Peek] DMA hint calculation in OpenHCL bootshim [DRAFT] DMA hint calculation in OpenHCL bootshim Apr 18, 2025
@yupavlen-ms yupavlen-ms marked this pull request as ready for review April 24, 2025 18:37
@yupavlen-ms yupavlen-ms requested review from a team as code owners April 24, 2025 18:37
@yupavlen-ms yupavlen-ms changed the title [DRAFT] DMA hint calculation in OpenHCL bootshim DMA hint calculation in OpenHCL bootshim Apr 24, 2025
@yupavlen-ms yupavlen-ms changed the title DMA hint calculation in OpenHCL bootshim DMA hint calculation in OpenHCL bootshim and fallback mem allocator for NVMe Apr 24, 2025
let hostval = max(dt_page_count.unwrap_or(0), cmdline_page_count.unwrap_or(0));
if hostval == 0
&& parsed.nvme_keepalive
&& params.isolation_type == IsolationType::None
Contributor Author:

Based on other discussion, this may need to be revisited.

}

/// Not supported for this allocator.
fn fallback_alloc_size(&self) -> u64 {
Member:

While this value is useful, I'm wondering if is_persistent is enough here? If you query at save time "are all the allocations tied to this client persistent?" and the answer is no, then that's good enough right?

How does this PR handle if a given allocation has persistent and non-persistent allocations in a client? Do we expect clients to do full teardown before we save/restore dma_manager state? Or are we expecting dma_manager to know that the "allocations" tied to a client that did fallback are actually free ranges?

Contributor Author:

The idea of is_persistent() is to report the allocator's default behavior; it is informational only. To detect that fallback was used, fallback_alloc_size() must return non-zero, which triggers a runtime disable of persistent state save.

So it sounds like you want a function that just tells us whether fallback happened?
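A sketch of that simpler boolean query: derive "did fallback happen" from the existing byte count, so callers that only care about persistence get a yes/no answer. The trait and type names are illustrative, not the PR's API:

```rust
// Hypothetical trait shape combining the two queries discussed above.
trait DmaClientStats {
    /// Bytes allocated through the non-persistent fallback allocator.
    fn fallback_alloc_size(&self) -> u64;

    /// True when every allocation made by this client can be saved/restored,
    /// i.e. no fallback allocation ever happened.
    fn is_fully_persistent(&self) -> bool {
        self.fallback_alloc_size() == 0
    }
}

// Minimal stand-in implementation for demonstration.
struct StatsExample {
    fallback: u64,
}

impl DmaClientStats for StatsExample {
    fn fallback_alloc_size(&self) -> u64 {
        self.fallback
    }
}
```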

@@ -161,6 +162,8 @@ enum NvmeWorkerRequest {
CreateIssuer(Rpc<u32, ()>),
/// Save worker state.
Save(Rpc<(), anyhow::Result<NvmeDriverWorkerSavedState>>),
/// Query how much memory was allocated with fallback allocator.
QueryAllocatorStats(Rpc<(), DmaClientAllocStats>),
Member:

If we keep a clone of the client in nvme_manager, do we even need this? Since we have an Arc<dyn DmaClient> couldn't we just ask directly without needing to rpc to the driver?

Contributor Author:

We keep the dma manager in the nvme manager, but we don't know which clients were spawned for the individual devices.
I was looking into keeping everything in the nvme manager, but I don't see an easy connection between the manager and get_namespace / get_driver, where we allocate a new client.

@@ -341,6 +414,45 @@ impl NvmeManagerWorker {
.map_err(|source| InnerError::Namespace { nsid, source })
}

/// Copy of the code from get_driver.
fn get_dma_client(&self, pci_id: String) -> Result<Arc<dyn DmaClient>, InnerError> {
Member:

prefer fn names without get. In this case, it's actually a new operation right?

Contributor Author:

ok

Context: I would like to use this function in both places where it is needed, get_driver and restore. But get_driver uses &mut self, and I cannot call it from there.

Open for suggestions.
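One common workaround for the &mut self conflict described above is to make the helper an associated function that takes only the fields it needs, so it can run while self.devices is mutably borrowed through the entry API. A hypothetical sketch (Worker, dma_client_for, and the String stand-in for a DMA client are all illustrative, not the PR's types):

```rust
use std::collections::HashMap;

struct Worker {
    devices: HashMap<String, String>,
    dma_mode: bool,
}

impl Worker {
    // Associated fn: takes no `&self`, so it cannot conflict with the
    // mutable borrow of `self.devices` held by the entry below.
    fn dma_client_for(dma_mode: bool, pci_id: &str) -> String {
        format!("client_{}_{}", pci_id, dma_mode)
    }

    fn get_driver(&mut self, pci_id: &str) -> &String {
        // Copy the needed field out first, then borrow `devices` mutably.
        let dma_mode = self.dma_mode;
        self.devices
            .entry(pci_id.to_owned())
            .or_insert_with(|| Self::dma_client_for(dma_mode, pci_id))
    }
}
```

Because `dma_client_for` never touches `self`, `restore` could call it through `Self::` as well.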

@@ -295,18 +346,40 @@ impl NvmeManagerWorker {
let driver = match self.devices.entry(pci_id.to_owned()) {
hash_map::Entry::Occupied(entry) => entry.into_mut(),
hash_map::Entry::Vacant(entry) => {
let device_name = format!("nvme_{}", pci_id);
let lower_vtl_policy = LowerVtlPermissionPolicy::Any;
Member:

why is this block copy pasted if we have the new_dma_client fn below?

Contributor Author:

This block:

let driver = match self.devices.entry(pci_id.to_owned()) {

prevents me from calling it (a mutable vs. immutable borrow error). If there is a solution, please advise.

{
Ok(s) => {
if s.stats.fallback_alloc > 0 {
tracing::warn!(
Member:

I think we could log this within dma_manager itself? why do it here?

Contributor Author:

Ok, let me check.

The idea is that we need to know the final amount of fallback memory allocations at the moment of servicing. We could log every fallback allocation instead, but I think the final number is more useful.
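A sketch of that "final number at servicing time" idea: sum the per-client stats once, just before save. The struct reuses the DmaClientAllocStats name and fallback_alloc field from the diff above, but its exact shape here is assumed:

```rust
// Assumed shape of the per-client stats returned by QueryAllocatorStats.
struct DmaClientAllocStats {
    fallback_alloc: u64,
}

/// Total bytes allocated via the fallback allocator across all clients;
/// a single non-zero result is what would be logged at servicing time.
fn total_fallback(stats: &[DmaClientAllocStats]) -> u64 {
    stats.iter().map(|s| s.fallback_alloc).sum()
}
```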

@chris-oo (Member):

We discussed splitting the dma calculation and fallback code into different PRs, because they are logically distinct. Are you doing that?
