Skip to content

propolis-server: collect and report VSS minus VM mappings #889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Mar 26, 2025

collect VSS from /proc/self/psinfo and report it as an Oximeter metric for the VM. VSS is maintained as mappings are adjusted so it's about as close to free as a syscall can get. Propolis' VSS includes guest-mapped regions; this change subtracts out the size of a guest's ASpace so we only get the amount Propolis has a say in. no RSS measurement here, but that's really only useful to tell us if the virtual mappings are overly-large (like, do we have a 128 MiB heap when we're only using 4 MiB?) - that'd be nice to know in the limit, but somewhat less interesting now. most of our VSS comes from propolis-server itself anyway.

the original approach here was walking through /proc/{getpid()}/xmap to get both RSS and VSS, and letting us exclude mappings that are part of the guest's accessible memory - in some sense this is aesthetically pleasant, but collecting and iterating the thousands of mappings meant this took up to seconds(!) of time that potentially could block the guest VM (!) if it needed the process lock at any point while we were measuring.

github-actions[bot]

This comment was marked as resolved.


// Safety: getpid() does not alter any state, or even read mutable state.
let mypid = unsafe { libc::getpid() };
let xmap_path = format!("/proc/{}/xmap", mypid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could just be "/proc/self/xmap" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i hoped and thought so, but i don't have a /proc/self* at all locally. i was kinda curious about it: if you have it maybe it's added by something downstream of illumos, and helios doesn't get the fun?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pmooney informs me that /proc/self/ in fact there and just not in ls /proc (or, consequently, tab completion for /proc/s :) )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the s-e-l-f is silent

iximeow and others added 4 commits April 12, 2025 00:33
not suitable for.. a few reasons. but this does get reasonable numbers!
`propolis-server` memory use is in no small part... the binary itself.

given a 32 GiB VM with a few cores, this takes almost five seconds on
my workstation to get through all of /proc/{getpid()}/xmap. given
there being no I/O here, that's five seconds of CPU time to calculate
memory stats. not good, not something i'd want to merge.

additionally, while walking /proc/{getpid()}/xmap we at various points
hold the process' address space lock (for writing!), which can be very
much observed by the guest VM itself. i happened to log into the
above-mentioned VM while memory stat collection was going on, and login
hung for two seconds while that finished. steady-state behavior is less
bad (`while true; do date; sleep 0.2; done` does not show any
disruption for example), but, again, not shippable behavior without at
*least* a feature flag.

using `pmap -x <pid>` does similar operations with respect to the
system, opening `xmap` and reading entries and on, but it seems somewhat
faster than the code here. hundreds of milliseconds instead of plural
seconds. i'm not sure what's different, but the service-interrupting
nature of the metric collection makes it kind of a showstopper either
way.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@iximeow
Copy link
Member Author

iximeow commented Apr 12, 2025

this now collects useful information with at most 30 microseconds holding locks we'd care about - i'm not certain it would get through to oximeter yet (at least i haven't seen it work locally) but this is not outright harmful anymore!

…l rev

this obviously wouldn't be for landing, but it's demonstrative. to get
an updated Oximeter schema here, we'd need to bump Omicron. then
Propolis depends on one Omicron while Crucible depends on another, or
this is sequenced like:

* merge Propolis schema change in Omicron
* update Omicron in Crucible
* update Omicron and Crucible in Propolis
* happy

... conveniently i already want to do this for Progenitor anyway.
Copy link
Member Author

@iximeow iximeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Omicron change this points to is oxidecomputer/omicron#8071.

i've left a comment on the Cargo.toml diff but i do not intend for the Cargo.toml or most of the Cargo.lock changes to land here. the CI failure is due to three tests that seem downstream of the version bumping, but i... really do not understand why.

Cargo.toml Outdated
Comment on lines 183 to 195
[patch."https://github.com/oxidecomputer/omicron"]
#internal-dns = { path = "../omicron/internal-dns" }
nexus-client = { git = "https://github.com/oxidecomputer//omicron.git", rev = "3950a44ba51c1696b19392d20fd2cbb9457a000d" }
omicron-uuid-kinds = { git = "https://github.com/oxidecomputer//omicron.git", rev = "3950a44ba51c1696b19392d20fd2cbb9457a000d" }
nexus-types = { git = "https://github.com/oxidecomputer//omicron.git", rev = "3950a44ba51c1696b19392d20fd2cbb9457a000d" }
illumos-utils = { git = "https://github.com/oxidecomputer//omicron.git", rev = "3950a44ba51c1696b19392d20fd2cbb9457a000d" }
omicron-common = { git = "https://github.com/oxidecomputer//omicron.git", rev = "3950a44ba51c1696b19392d20fd2cbb9457a000d" }
oximeter-instruments = { git = "https://github.com/oxidecomputer//omicron.git", rev = "3950a44ba51c1696b19392d20fd2cbb9457a000d" }
oximeter-producer = { git = "https://github.com/oxidecomputer//omicron.git", rev = "3950a44ba51c1696b19392d20fd2cbb9457a000d" }
oximeter = { git = "https://github.com/oxidecomputer//omicron.git", rev = "3950a44ba51c1696b19392d20fd2cbb9457a000d" }

# i've just picked what Omicron happens to currently point to.
tufaceous-artifact = { git = "https://github.com/oxidecomputer//tufaceous.git", rev = "04681f26ba09e144e5467dd6bd22c4887692a670" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this all specifically is not intended to merge. to get a new Oximeter metric here, that's a change the the schema (and bump of oximeter). having Oximeter at one rev while Omicron is at other ones got me to a state where Propolis wouldn't build. bumping all of Omicron here left Propolis and Crucible using different Omicrons which again did not build because Cargo.toml had an old tufaceous-artifact.

this has somehow ended up with the three test failures in the last build, which all look like:

  phd-runner: [INSTANCE_ENSURE_INTERNAL - EVENT] permanent error from instance_spec_ensure 
    e = Error Response: status: 400 Bad Request; headers: {
      "content-type": "application/json", "x-request-id": "58e1ca30-797a-48bb-9191-a36b956e243b", 
      "content-length": "533", "date": "Wed, 30 Apr 2025 08:59:40 GMT"
    }; value: Error {
      error_code: None, 
      message: "unable to parse JSON body: init.value.spec.components.boot-disk.type: unknown variant `NvmeDisk`,
      expected one of `virtio_disk`, `nvme_disk`, `virtio_nic`, `serial_port`, `pci_pci_bridge`, `qemu_pvpanic`,
        `boot_settings`, `soft_npu_pci_port`, `soft_npu_port`, `soft_npu_p9`, `p9fs`, `migration_failure_injector`,
        `crucible_storage_backend`, `file_storage_backend`, `blob_storage_backend`, `virtio_network_backend`,
        `dlpi_network_backend` at line 1 column 180",
      request_id: "58e1ca30-797a-48bb-9191-a36b956e243b"
    } 

and frankly i'm at a loss about what changed the casing of variants here. it is presumably related to my wanton version bumping.

anyway, this PR is at "i think it's good, would folks +1 the new metric?" - if others agree i'll go PR the schema change in Omicron and figure out getting that through to Propolis.

@iximeow iximeow changed the title collect memory stats, but it's not really suitable propolis-server: collect and report VSS minus VM mappings Apr 30, 2025
@iximeow iximeow marked this pull request as ready for review April 30, 2025 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants