
openhcl: refactor shared pool to general purpose page pool #260


Merged
3 commits merged into microsoft:main on Dec 2, 2024

Conversation

@chris-oo (Member) commented Nov 7, 2024

Refactor the shared visibility pool to a more general purpose page pool. This is in preparation for additional changes to support save restore, and private allocations.

Add a new `new_private_pool` method to support future private memory page pools.

Add additional tracking of allocations with device_ids and device names, which will be used for save restore. This also helps when inspecting current allocations. Update `NvmeManager` to hand out per-device allocators, which makes it easier to attribute allocations to each device.

In the future, it would be good to also update the VfioDmaBuffer trait to support additional tagging, as today all allocations are just tagged as "vfio dma", which isn't very helpful when inspecting.
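
To make the described shape concrete, here is a minimal, self-contained sketch of the idea: a pool that hands out per-device allocators and tags every allocation with the owning device's name so it can be inspected and later saved/restored. All names here (PagePool, PagePoolAllocator, alloc) are illustrative and do not match the real crate's API, and the real pool tracks and reuses freed ranges rather than bump-allocating.

    // Illustrative sketch only: approximates the behavior described above, not
    // the actual pool implementation in this PR.
    use std::sync::{Arc, Mutex};

    #[derive(Debug)]
    struct Allocation {
        base_pfn: u64,
        page_count: u64,
        device_name: String, // tag used for inspect and save/restore
    }

    struct PoolInner {
        next_free_pfn: u64,
        end_pfn: u64,
        allocations: Vec<Allocation>,
        device_names: Vec<String>,
    }

    #[derive(Clone)]
    struct PagePool {
        inner: Arc<Mutex<PoolInner>>,
    }

    struct PagePoolAllocator {
        pool: PagePool,
        device_name: String,
    }

    impl PagePool {
        /// Build a pool over `[base_pfn, base_pfn + page_count)`. The real pool
        /// is built from shared-visibility or (with new_private_pool) private
        /// memory ranges.
        fn new(base_pfn: u64, page_count: u64) -> Self {
            PagePool {
                inner: Arc::new(Mutex::new(PoolInner {
                    next_free_pfn: base_pfn,
                    end_pfn: base_pfn + page_count,
                    allocations: Vec::new(),
                    device_names: Vec::new(),
                })),
            }
        }

        /// Hand out an allocator tied to `device_name`, refusing duplicates.
        fn allocator(&self, device_name: String) -> Result<PagePoolAllocator, String> {
            let mut inner = self.inner.lock().unwrap();
            if inner.device_names.contains(&device_name) {
                return Err(format!("device name {device_name} already in use"));
            }
            inner.device_names.push(device_name.clone());
            Ok(PagePoolAllocator {
                pool: self.clone(),
                device_name,
            })
        }
    }

    impl PagePoolAllocator {
        /// Allocate `page_count` contiguous pages, tagged with this device's
        /// name. (Bump allocation only; the real pool reuses freed ranges.)
        fn alloc(&self, page_count: u64) -> Option<u64> {
            let mut inner = self.pool.inner.lock().unwrap();
            if inner.next_free_pfn + page_count > inner.end_pfn {
                return None;
            }
            let base_pfn = inner.next_free_pfn;
            inner.next_free_pfn += page_count;
            inner.allocations.push(Allocation {
                base_pfn,
                page_count,
                device_name: self.device_name.clone(),
            });
            Some(base_pfn)
        }
    }

    fn main() {
        let pool = PagePool::new(0x1000, 64);
        let nvme = pool.allocator("nvme:0".to_string()).unwrap();
        let base = nvme.alloc(4).unwrap();
        println!("nvme:0 got 4 pages starting at pfn {base:#x}");
        println!("{:?}", pool.inner.lock().unwrap().allocations);
    }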

@chris-oo chris-oo requested review from a team as code owners November 7, 2024 18:11
@chris-oo chris-oo marked this pull request as ready for review November 15, 2024 19:09
.as_ref()
.map(|p| p.allocator_spawner());

let vfio_dma_buffer_spawner =
Member

I think we'll ultimately want a trait for this so it's clearer what we're passing around. But this is fine for now.

Member Author

Agreed. I think this whole thing will get another refactor with the central DMA management code coming soon, so I don't want to over-engineer it now...
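
For context, the value being passed around is the boxed closure that appears later in the diff, Box<dyn Fn(String) -> anyhow::Result<Arc<dyn VfioDmaBuffer>> + Send>. A hypothetical trait along the lines suggested above (not something this PR adds, and the name is made up) might look like:

    // Hypothetical only, not added by this PR: name the spawner shape so call
    // sites aren't passing a bare boxed closure around. VfioDmaBuffer is the
    // existing project trait; anyhow is the error crate already in use.
    use std::sync::Arc;

    trait DmaBufferSpawner: Send {
        /// Create a VfioDmaBuffer allocator whose allocations are tagged with
        /// `device_name`.
        fn spawn_dma_buffer(&self, device_name: String) -> anyhow::Result<Arc<dyn VfioDmaBuffer>>;
    }

    // Blanket impl so existing closures keep working unchanged.
    impl<F> DmaBufferSpawner for F
    where
        F: Fn(String) -> anyhow::Result<Arc<dyn VfioDmaBuffer>> + Send,
    {
        fn spawn_dma_buffer(&self, device_name: String) -> anyhow::Result<Arc<dyn VfioDmaBuffer>> {
            self(device_name)
        }
    }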

device_ids: Vec<String>,
}

// Manually implement inspect so device_ids can be rendered as strings, not
Member

Another way to have done this might have been to make a StateView<'a> type that mirrors State but has a resolved &str device ID, then derive inspect on that. And then in the inspect impl, just .map State to StateView<'_>.

Member Author

Interesting. Do we do this pattern anywhere else? I hate losing #[derive(inspect)] because it's so nice.
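
For readers following along, here is a minimal sketch of the StateView pattern being described, with hypothetical field names and the per-allocation state simplified to a struct (the real State is an enum); the pool's Inspect implementation would then map each State to a StateView before inspecting it.

    // Hypothetical sketch of the StateView pattern; not the PR's actual types.
    use inspect::Inspect;

    struct State {
        base_pfn: u64,
        size_pages: u64,
        device_id: usize, // index into the pool's device_ids table
    }

    // Mirror of `State` with the numeric id resolved to the device's name, so
    // the derive can still be used and the id renders as a string.
    #[derive(Inspect)]
    struct StateView<'a> {
        base_pfn: u64,
        size_pages: u64,
        device_id: &'a str,
    }

    impl State {
        fn view<'a>(&self, device_ids: &'a [String]) -> StateView<'a> {
            StateView {
                base_pfn: self.base_pfn,
                size_pages: self.size_pages,
                device_id: device_ids[self.device_id].as_str(),
            }
        }
    }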

})
.expect("must find allocation");

inner.state[index] = State::Free {
Member

So we still don't do any merging on free.

I guess that's OK in practice since we just allocate during startup and don't really free.
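
For illustration only, since the pool deliberately does not do this today: coalescing on free, over an assumed free list of (base_pfn, page_count) pairs kept sorted by base_pfn, would look roughly like this.

    // Sketch only: not the pool's current behavior. Assumes a free list kept
    // sorted by base_pfn with no overlapping ranges.
    fn free_and_merge(free_list: &mut Vec<(u64, u64)>, base_pfn: u64, page_count: u64) {
        // Insert the freed range, keeping the list sorted.
        let idx = free_list.partition_point(|&(base, _)| base < base_pfn);
        free_list.insert(idx, (base_pfn, page_count));

        // Coalesce with the following range if they are adjacent.
        if idx + 1 < free_list.len() && free_list[idx].0 + free_list[idx].1 == free_list[idx + 1].0 {
            free_list[idx].1 += free_list[idx + 1].1;
            free_list.remove(idx + 1);
        }
        // Coalesce with the preceding range if they are adjacent.
        if idx > 0 && free_list[idx - 1].0 + free_list[idx - 1].1 == free_list[idx].0 {
            free_list[idx - 1].1 += free_list[idx].1;
            free_list.remove(idx);
        }
    }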

Member

On the flip side, I wonder if we should stop handing out contiguous allocations and just always hand out a list of pages. Do any callers rely on contiguous pages?

(Don't need to change anything for this PR, but I'd like to understand the requirements.)

Member Author

IIRC all the usage by devices expects contiguous pages, for queues and bounce buffers (unless I'm wrong and they're all just 4k each?). A lot of other callers just need a single 4K page, but I'd have to double-check them all.

And yeah, for free we don't really expect any runtime allocations (the GET is the only one that might free but... we should probably fix that).

Contributor

NVMe queues are 4k each and this probably will never change. NVMe bounce buffers are bigger - 512k as of today - and should be reduced once we investigate that.

"Do any callers rely on contiguous pages?"

Yes, as discussed in NVMe save/restore, the assumption will be that pages are contiguous.

anyhow::bail!("device name {device_name} already in use");
}

inner.device_ids.push(device_name.clone());
Member

We never free this.

Member Author

I don't understand this comment - the clone is required because we also store the string version of the id in the allocator itself.

Member Author

I've actually removed this in the save-restore PR coming next. It's useless as it turns out. I can remove it here if you want, or it's already gone in my tip of tree.

Member

No, you're pushing onto the device_ids array but you never free.

Member Author (@chris-oo, Nov 26, 2024)

Ah, I see. If you drop the allocator, we should remove you from the array. But the policy is a bit weird - do we allow you to get your previous ID if you create an allocator with the same state? You may have pending allocations that are tracked by a different handle.

My proposal is this:

  1. The device_id is always tied to this device, but if you free the allocator, you can create it again. We'll track Used/Unassigned state internally within the allocator. This is because, with the scheme we have (storing indices inside the allocations themselves), I don't think there is any sane/safe way to remove the device_id without walking every allocation.
  2. Outstanding allocations are allowed to stay live, and are still tied to the allocator, which may be marked as Unassigned. We only use this for device_id lookups for inspect anyway. If you create a new allocator with the same device_name, we link it to the existing id, as you should be the same device (see the sketch below).
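
A rough sketch of the bookkeeping that proposal implies (hypothetical names and types, not the code in this PR):

    // Hypothetical sketch of the proposal above, not the code in this PR.
    // Dropping an allocator would flip its slot to Unassigned rather than
    // removing it, so indices stored inside live allocations stay valid.
    enum DeviceId {
        Used(String),
        Unassigned(String),
    }

    fn attach_device(device_ids: &mut Vec<DeviceId>, name: &str) -> Result<usize, String> {
        // An allocator with this name is still live: reject the duplicate.
        if device_ids
            .iter()
            .any(|id| matches!(id, DeviceId::Used(n) if n == name))
        {
            return Err(format!("device name {name} already in use"));
        }
        // The same device was seen before and its allocator was dropped:
        // re-link it to its previous id.
        if let Some(idx) = device_ids
            .iter()
            .position(|id| matches!(id, DeviceId::Unassigned(n) if n == name))
        {
            device_ids[idx] = DeviceId::Used(name.to_string());
            return Ok(idx);
        }
        // First time this device name has been seen.
        device_ids.push(DeviceId::Used(name.to_string()));
        Ok(device_ids.len() - 1)
    }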

@jstarks (Member) left a comment

A few nits

Commits pushed to this pull request:
- renames
- rename crate
- use device id for uniqueness
- save
- spawner
- tag nvme allocations per device
- new lock
- thiserror lock
- actually fix thiserror
- undo cargo.toml
- implement friendly inspect
- fix macos
- actually fix macos
- boxed spawner

@@ -82,15 +84,15 @@ impl NvmeManager
     pub fn new(
         driver_source: &VmTaskDriverSource,
         vp_count: u32,
-        dma_buffer: Arc<dyn VfioDmaBuffer>,
+        dma_buffer_spawner: Box<dyn Fn(String) -> anyhow::Result<Arc<dyn VfioDmaBuffer>> + Send>,
Contributor

Can we define a type for this?

Member Author

We can, yeah. Let's take it as a follow-up.
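
As a possible follow-up shape (hypothetical, not added by this PR), even a plain type alias for the closure in the signature above would keep call sites readable:

    // Hypothetical follow-up: give the boxed closure a name. This matches the
    // shape of the dma_buffer_spawner parameter in the diff above; VfioDmaBuffer
    // is the existing project trait and Arc is std::sync::Arc.
    pub type DmaBufferSpawner =
        Box<dyn Fn(String) -> anyhow::Result<Arc<dyn VfioDmaBuffer>> + Send>;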

@chris-oo chris-oo merged commit ea5de5e into microsoft:main Dec 2, 2024
24 checks passed
@chris-oo chris-oo added the backport_2411 Change should be backported to the release/2411 branch label Dec 2, 2024
chris-oo added a commit to chris-oo/openvmm that referenced this pull request Dec 12, 2024
…#260)

Refactor the shared visibility pool to a more general purpose page pool.
This is in preparation for additional changes to support save restore,
and private allocations.

Add a new `new_private_pool` method to support future private memory
page pools.

Add additional tracking of allocations with device_ids and device names,
which will be used for save restore. This can also help track current
allocations. Update `NvmeManager` to hand out per-device allocators,
which better helps allocations per device.

In the future, it would be good to additionally update the VfioDmaBuffer
trait to have additional tagging, as today all allocations are just
tagged as "vfio dma" which isn't very helpful when inspecting.
chris-oo added a commit that referenced this pull request Dec 13, 2024

Backport of #260
@jstarks (Member) commented Feb 6, 2025

Backported in #474

@jstarks jstarks added backported_2411 PR that has been backported to release/2411 and removed backport_2411 Change should be backported to the release/2411 branch labels Feb 6, 2025