
iowait consuming too much cpu resources #30

Open
renweihang opened this issue Jan 28, 2024 · 7 comments

renweihang commented Jan 28, 2024

After a normal BMC startup, I checked the CPU usage:
[screenshot: CPU usage after startup, showing high iowait]
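
A rough sketch of how this kind of CPU usage can be sampled from the BMC console; the exact commands are an assumption, since the original output was attached as screenshots (busybox top is normally present in OpenBMC images):

```sh
# Snapshot of overall CPU usage; iowait shows up as "wa" (procps top) or "io" (busybox top).
top -bn1 | head -n 5

# Or sample the raw counters: the fifth number after "cpu" in /proc/stat is iowait ticks.
grep '^cpu ' /proc/stat; sleep 5; grep '^cpu ' /proc/stat
```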

Then I stopped all of the sensor services with the following commands:

```
systemctl stop xyz.openbmc_project.hwmontempsensor.service
systemctl stop xyz.openbmc_project.fansensor.service
systemctl stop xyz.openbmc_project........service
......
```

Checking the CPU usage again:
[screenshot: CPU usage after stopping the sensor services]

Even if I start only one hwmon sensor service (xyz.openbmc_project.hwmontempsensor.service), with no sensors configured, the issue is still there:
[screenshot: CPU usage with only hwmontempsensor running]

The following shows the situation before and after stopping the hwmon service:
[screenshot: CPU usage before and after stopping the hwmon service]

renweihang (Author) commented Jan 28, 2024

It seems to be related to sdbusplus; the problem occurs whenever sdbusplus is used.
The following is the debugging code.
Using sdbusplus:
[screenshot: debugging test code using sdbusplus]

Not using sdbusplus:
[screenshot: debugging test code without sdbusplus]

PS: The OpenBMC commit I am using is 6fddef299932b1270a799e78566e25daa911f742

So, I opened a new issue at openbmc/sdbusplus#92.
Hoping to get your help, thanks a lot!

edtanous (Contributor):

What platform was this tested on?

renweihang (Author):

> What platform was this tested on?

meta-g220a

I also tried a newer commit, and the issue is still there:
OpenBMC commit: 1f0056e138d1eb872784fc20c21e1e340d64a74c (Fri Dec 15 17:20:20 2023 -0600)
dbus-sensors commit: 28b8823

edtanous (Contributor) commented Jan 29, 2024

Looking at https://github.com/openbmc/meta-bytedance/blob/master/meta-g220a/recipes-phosphor/configuration/entity-manager/g220a_baseboard.json

First off, this file shouldn't be in the meta layer. Issues with this file would've been caught earlier by CI if it had been put in the right place.

I see a large number of sensors that are very "expensive" to read. Considering this is an AST2500, it seems very likely that the IO load you're seeing is real, and a result of too much IO being done on that platform with that configuration. I also see a number of config stanzas that are simply unsupported upstream (like pmem). How certain are you that you tested this on an upstream build?

To triage, I would start by removing the various config types until you find the one causing the most contention, then look at what you can do to improve the performance of those sensor types. It's very likely that you just need to tune your platform's read rates to account for the bandwidth of your I2C lanes, especially for PMBus devices, which are non-trivial to read.
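
One hedged way to approximate that triage on a running system is sketched below: stop the dbus-sensors daemons one at a time and watch how iowait changes. The service names are examples, and stopping a whole daemon is only a rough proxy for removing its config stanzas from the entity-manager JSON:

```sh
# Service names below are examples; list the ones on your system with:
#   systemctl list-units 'xyz.openbmc_project.*sensor*'
for svc in xyz.openbmc_project.psusensor.service \
           xyz.openbmc_project.hwmontempsensor.service \
           xyz.openbmc_project.fansensor.service; do
    systemctl stop "$svc"
    sleep 30                    # let the remaining daemons settle
    echo "=== without $svc ==="
    top -bn1 | head -n 3        # compare the iowait ("wa"/"io") figure
    systemctl start "$svc"
done
```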

Note that a high iowait percentage is not a bug in itself. It's likely that in the past this platform was just blocking in userspace, and sensors were scanning more slowly than specified in the config file. Now that we have moved to uring, that same contention shows up as iowait instead of happening silently in userspace. This doesn't mean that the actual sensor scan rates are any worse than they were before; in fact, they're likely better because of uring, which just makes the problem more apparent.
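
One way to check whether the real I/O volume changed, rather than just the accounting, is to compare the daemon's per-process I/O counters before and after a change. A rough sketch; the binary name is an assumption, and the counters need task I/O accounting enabled in the kernel:

```sh
# dbus-sensors installs the hwmon temperature daemon as "hwmontempsensor" in current
# trees; adjust the name for the daemon you are investigating.
pid=$(pidof hwmontempsensor)

# Sample the per-process I/O counters over a fixed window. Compare the rchar/syscr
# deltas: sysfs and i2c reads do not show up in read_bytes, which only counts
# block-layer I/O.
cat /proc/"$pid"/io
sleep 10
cat /proc/"$pid"/io
```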

Good luck with your debug. Let us know what your findings are, and whether we can transfer this bug to be g220a-specific.

renweihang (Author) commented Feb 2, 2024

> Looking at https://github.com/openbmc/meta-bytedance/blob/master/meta-g220a/recipes-phosphor/configuration/entity-manager/g220a_baseboard.json
>
> First off, this file shouldn't be in the meta layer. Issues with this file would've been caught earlier by CI if it had been put in the right place.
>
> I see a large number of sensors that are very "expensive" to read. Considering this is an AST2500, it seems very likely that the IO load you're seeing is real, and a result of too much IO being done on that platform with that configuration. I also see a number of config stanzas that are simply unsupported upstream (like pmem). How certain are you that you tested this on an upstream build?
>
> To triage, I would start by removing the various config types until you find the one causing the most contention, then look at what you can do to improve the performance of those sensor types. It's very likely that you just need to tune your platform's read rates to account for the bandwidth of your I2C lanes, especially for PMBus devices, which are non-trivial to read.
>
> Note that a high iowait percentage is not a bug in itself. It's likely that in the past this platform was just blocking in userspace, and sensors were scanning more slowly than specified in the config file. Now that we have moved to uring, that same contention shows up as iowait instead of happening silently in userspace. This doesn't mean that the actual sensor scan rates are any worse than they were before; in fact, they're likely better because of uring, which just makes the problem more apparent.
>
> Good luck with your debug. Let us know what your findings are, and whether we can transfer this bug to be g220a-specific.

Thanks a lot, it is indeed related to io_uring:
[screenshot: debugging output confirming the iowait comes from io_uring]

So, as you said, this is not a bug, just a characteristic of io_uring? Do we still need to pay attention to this issue?
If it is left alone, will the low CPU idle affect the normal operation of other processes?

y11627 commented Apr 20, 2024

iowait drops when this kernel commit is reverted: "io_uring: Use io_schedule* in cqring"
openbmc/linux@f32dfc8
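
A rough sketch of how that comparison might be reproduced, assuming a local checkout of the openbmc/linux tree used by your build (for measurement only, not a proposed fix):

```sh
# Path is hypothetical; use your own checkout of the openbmc/linux tree.
cd ~/src/openbmc-linux
git revert f32dfc8        # "io_uring: Use io_schedule* in cqring"
# Rebuild and deploy the BMC kernel, then compare the iowait figures.
```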

amboar (Member) commented Apr 21, 2024

The linked patch is a change to accounting more than anything else. I don't think it's particularly concerning?

https://lore.kernel.org/lkml/[email protected]/
