Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[thermalctld] Add check for psu presence to thermalctld when updating PSU thermals #387

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions sonic-thermalctld/scripts/thermalctld
Original file line number Diff line number Diff line change
Expand Up @@ -592,6 +592,8 @@ class TemperatureUpdater(logger.Logger):
self._refresh_temperature_status(CHASSIS_INFO_KEY, thermal, index)

for psu_index, psu in enumerate(self.chassis.get_all_psus()):
if not psu.get_presence():
Copy link
Collaborator

@Junchao-Mellanox Junchao-Mellanox Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that we should check presence here. "N/A" is good enough to tell user that the PSU thermal is not available at the moment, no matter it is because of PSU absence or hardware error. Maybe they should fix their test expectation. But if you insist the change, please consider following items:

  1. Do we need check psu.get_power_good_status here? If PSU is present but not powered on, thermal sensor should not be available.
  2. Do we need remove previous DB entry here?
  3. For show platform fan and show platform psu, non presence PSU will be displayed, should we align it with this design?
  4. For existing sonic-mgmt test, can we guarantee no test is affected?
  5. For other vendor, who may have private test based on existing design might run into issue
  6. Unit test is needed to cover the new change.

Copy link
Collaborator

@prgeor prgeor Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cytsao1 @Junchao-Mellanox how about get_chassis.all_psus() return only those PSUs which are present? In that case, "N/A" would indicate some issue with physical present PSU?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only returning present PSUs from get_all_psus would cause issues with inconsistent indexing throughout the code; when enumerate is used, numbering will always be sequential from 0, and inserting a new PSU may cause numbering of existing PSUs to change.

continue
parent_name = 'PSU {}'.format(psu_index + 1)
for thermal_index, thermal in enumerate(psu.get_all_thermals()):
if self.task_stopping_event.is_set():
Expand Down