openhcl: refactor shared pool to general purpose page pool #260
Conversation
Force-pushed from be6a2d7 to 3259bdc
Force-pushed from 131b81d to 149ae0b
Force-pushed from 96e6fdc to 488801c
openhcl/underhill_core/src/worker.rs
Outdated
.as_ref()
.map(|p| p.allocator_spawner());

let vfio_dma_buffer_spawner =
I think we'll ultimately want a trait for this so it's clearer what we're passing around. But this is fine for now.
Agreed. I think this whole thing will get another refactor with the central DMA management code coming soon, so I don't want to over-engineer it...
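For reference, a minimal sketch of what such a trait could look like, assuming the spawner keeps the same shape as the current boxed closure; `DmaBufferSpawner` and the stand-in `VfioDmaBuffer` below are illustrative, not the actual interface in the tree:

```rust
use std::sync::Arc;

// Stand-in for the real VfioDmaBuffer trait; only here so the sketch is self-contained.
pub trait VfioDmaBuffer: Send + Sync {}

/// Hypothetical trait replacing the boxed closure, so call sites name what
/// they are passing around: something that can mint a DMA buffer allocator
/// tagged with a device name.
pub trait DmaBufferSpawner: Send {
    fn create_dma_buffer_allocator(
        &self,
        device_name: String,
    ) -> anyhow::Result<Arc<dyn VfioDmaBuffer>>;
}
```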
device_ids: Vec<String>,
}

// Manually implement inspect so device_ids can be rendered as strings, not
Another way to have done this might have been to make a `StateView<'a>` type that mirrors `State` but has a resolved `&str` device ID, then derive inspect on that. And then in the inspect impl, just `.map` `State` to `StateView<'_>`.
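A minimal sketch of that pattern, assuming the `inspect` crate's derive; the `State` fields here are invented for illustration and don't match the real pool state:

```rust
use inspect::Inspect;

// Invented stand-in for the pool's per-allocation state.
struct State {
    device_id: usize,
    len_pages: u64,
}

// Mirror of `State` with the device id resolved to a display string, so the
// derive can render it directly.
#[derive(Inspect)]
struct StateView<'a> {
    device_name: &'a str,
    len_pages: u64,
}

impl State {
    // The manual inspect impl on the pool would `.map` each `State` through
    // this and let the derived impl on `StateView` do the rest.
    fn view<'a>(&self, device_names: &'a [String]) -> StateView<'a> {
        StateView {
            device_name: &device_names[self.device_id],
            len_pages: self.len_pages,
        }
    }
}
```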
Interesting. Do we do this pattern anywhere else? I hate losing `#[derive(Inspect)]` because it's so nice.
openhcl/page_pool_alloc/src/lib.rs
Outdated
})
.expect("must find allocation");

inner.state[index] = State::Free {
So we still don't do any merging on free.
I guess that's OK in practice since we just allocate during startup and don't really free.
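For context, merging on free could look roughly like the sketch below; the `Slot` layout is invented for illustration and doesn't match the pool's real `State` enum:

```rust
// Invented slot representation; the pool's real state tracks more than this.
enum Slot {
    Free { base_pfn: u64, size_pages: u64 },
    Allocated { base_pfn: u64, size_pages: u64 },
}

/// Merge adjacent free ranges so a later large contiguous allocation can succeed.
fn coalesce_free(slots: &mut Vec<Slot>) {
    // Sort by base pfn so physically adjacent ranges become neighbors in the vec.
    slots.sort_by_key(|s| match s {
        Slot::Free { base_pfn, .. } | Slot::Allocated { base_pfn, .. } => *base_pfn,
    });
    let mut merged: Vec<Slot> = Vec::with_capacity(slots.len());
    for slot in slots.drain(..) {
        // If this range is free and starts exactly where the previous free
        // range ends, fold it into the previous entry instead of pushing a new one.
        let folded = match (merged.last_mut(), &slot) {
            (
                Some(Slot::Free { base_pfn, size_pages }),
                Slot::Free { base_pfn: next_base, size_pages: next_size },
            ) if *base_pfn + *size_pages == *next_base => {
                *size_pages += *next_size;
                true
            }
            _ => false,
        };
        if !folded {
            merged.push(slot);
        }
    }
    *slots = merged;
}
```

In practice a pass like this would only matter if the pool saw runtime alloc/free churn, which as noted above it mostly doesn't today.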
On the flip side, I wonder if we should stop handing out contiguous allocations and just always hand out a list of pages. Do any callers rely on contiguous pages?
(Don't need to change anything for this PR, but I'd like to understand the requirements.)
IIRC all the device usage expects contiguous pages, for queues and bounce buffers (unless I'm wrong and they're all just 4k each?). A lot of other callers just need a single 4K page, but I'd have to double check them all.
And yeah, for free: we don't really expect any at runtime (the GET is the only one that might free, but... we should probably fix that).
NVMe queues are 4k each and this probably will never change. NVMe bounce buffers are bigger (512k as of today); that should be reduced once we investigate it.

> Do any callers rely on contiguous pages

Yes, as discussed in NVMe save/restore, the assumption will be that pages are contiguous.
openhcl/page_pool_alloc/src/lib.rs
Outdated
anyhow::bail!("device name {device_name} already in use");
}

inner.device_ids.push(device_name.clone());
We never free this.
I don't understand this comment - the clone is required because we also store the string version of the id in the allocator itself.
I've actually removed this in the save-restore PR coming next. It's useless as it turns out. I can remove it here if you want, or it's already gone in my tip of tree.
No, you're pushing onto the `device_ids` array but you never free.
Ah I see. If you drop the allocator, we should remove you from the array. But the policy is a bit weird - do we allow you to get your previous ID if you create an allocator with the same state? You may have pending allocations that are tracked by a different handle.
My proposal is this (sketched after the list):
- The device_id is always tied to this device, but if you free the allocator, you can create it again. We'll track Used/Unassigned state internally within the allocator. This is because I do not think, with the scheme we have (storing indices inside the allocations themselves), there is any sane/safe way to remove the device_id without walking every allocation.
- Outstanding allocations are allowed to stay live, and are still tied to the allocator that may be marked as Unassigned. We only use this for device_id lookups for inspect anyway. If you create a new allocator with the same device_name, we link it to the existing id, since you should be the same device.
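A minimal sketch of that bookkeeping, assuming a slot table inside the pool; `DeviceId`, `PoolInner`, and the method names are illustrative, not the code in this PR:

```rust
/// Invented slot state for the sketch: the name stays reserved forever, and
/// the variant says whether a live allocator currently owns it.
enum DeviceId {
    Used(String),
    Unassigned(String),
}

struct PoolInner {
    device_ids: Vec<DeviceId>,
}

impl PoolInner {
    /// Find or create the slot for `device_name`, failing if a live allocator
    /// already owns that name.
    fn assign_device_id(&mut self, device_name: &str) -> anyhow::Result<usize> {
        if self
            .device_ids
            .iter()
            .any(|id| matches!(id, DeviceId::Used(name) if name == device_name))
        {
            anyhow::bail!("device name {device_name} already in use");
        }
        // Same device coming back after its allocator was dropped: relink it
        // to its old index so outstanding allocations stay consistent.
        if let Some(index) = self
            .device_ids
            .iter()
            .position(|id| matches!(id, DeviceId::Unassigned(name) if name == device_name))
        {
            self.device_ids[index] = DeviceId::Used(device_name.to_string());
            return Ok(index);
        }
        // First time this name is seen: append a new slot.
        self.device_ids.push(DeviceId::Used(device_name.to_string()));
        Ok(self.device_ids.len() - 1)
    }

    /// Called when an allocator is dropped: keep the name, mark the slot free.
    fn unassign_device_id(&mut self, index: usize) {
        let slot = &mut self.device_ids[index];
        if let DeviceId::Used(name) = slot {
            *slot = DeviceId::Unassigned(std::mem::take(name));
        }
    }
}
```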
A few nits
Squashed commit messages: renames, rename crate, use device id for uniqueness, save spawner, tag nvme allocations per device, new lock, thiserror lock, actually fix thiserror, undo cargo.toml, implement friendly inspect, fix macos, actually fix macos, boxed spawner
Force-pushed from e92bab3 to fa3ac76
@@ -82,15 +84,15 @@ impl NvmeManager {
     pub fn new(
         driver_source: &VmTaskDriverSource,
         vp_count: u32,
-        dma_buffer: Arc<dyn VfioDmaBuffer>,
+        dma_buffer_spawner: Box<dyn Fn(String) -> anyhow::Result<Arc<dyn VfioDmaBuffer>> + Send>,
Can we define a type for this?
We can, yeah. Let's take it as a follow-up.
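As a sketch of that follow-up, even just a type alias would make the signature readable; `DmaBufferSpawner` is a hypothetical name, and the `VfioDmaBuffer` below is a stand-in so the alias is self-contained:

```rust
use std::sync::Arc;

// Stand-in for the real VfioDmaBuffer trait; only here to make the alias compile on its own.
pub trait VfioDmaBuffer: Send + Sync {}

/// Closure that creates a DMA buffer allocator tagged with a device name.
pub type DmaBufferSpawner =
    Box<dyn Fn(String) -> anyhow::Result<Arc<dyn VfioDmaBuffer>> + Send>;
```

The constructor could then take `dma_buffer_spawner: DmaBufferSpawner` instead of spelling out the boxed closure type inline.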
Backported in #474
Refactor the shared visibility pool to a more general purpose page pool. This is in preparation for additional changes to support save restore, and private allocations.

Add a new `new_private_pool` method to support future private memory page pools.

Add additional tracking of allocations with device_ids and device names, which will be used for save restore. This can also help track current allocations.

Update `NvmeManager` to hand out per-device allocators, which makes it easier to track allocations per device.

In the future, it would be good to additionally update the `VfioDmaBuffer` trait to have additional tagging, as today all allocations are just tagged as "vfio dma", which isn't very helpful when inspecting.
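As a rough illustration of the tagging direction mentioned in the last paragraph (not the current trait): one option would be to thread a per-allocation tag through the DMA buffer interface. The trait name, the `MemoryBlock` stand-in, and the method signature below are all assumptions for the sketch.

```rust
// Stand-in for the real DMA memory block type returned by the allocator.
pub struct MemoryBlock;

/// Hypothetical tagged variant of the VfioDmaBuffer idea: callers describe
/// what each allocation is for, so inspect output can attribute memory to
/// "nvme admin queue", "bounce buffer", etc. instead of a generic "vfio dma".
pub trait TaggedDmaBuffer: Send + Sync {
    fn create_dma_buffer(&self, len: usize, tag: &str) -> anyhow::Result<MemoryBlock>;
}
```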