-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor DPU Tools into a uniform interface #27
base: main
Are you sure you want to change the base?
Conversation
SamD2021
commented
Nov 8, 2024
- All scripts have been either turned into modules, or been housed in existing modules.
- Merged Dockerfiles together.
- Added a new DPU Types enum class.
- Added DPU Hardware Detection.
- Refactored DPU Tools script into a class, since there were many variables that should be shared between functions
- Transformed pxeboot script into a module and refactored it into a class.
- Added firmware class for Bluefields into fwutils module.
- Made capture output the default for the run function in the common module.
- Most tools can be utilized with auto hardware detection
- Added Console check for DPU Hardware Type
This is in preparation for making a unified interface for dpu tools that is able to detected the hardware its running on an call its respective tools ipu: Remove IPU dir
Remove redundant fwversion Remove redundant fwup Remove redundant fwdefaults Remove listbf script The list bf script has been integrated into the list dpus command and function. Now we can use this information to auto detect hardware and have the tool behave accordingly depending on its hardware
Make the imports match the new structure
Many of these functions aren't necessarily tied to the ipu so moving it to the common module makes more sense
Matching imports to new structure
Since we want to unify the scripts behind dpu-tools we should also build the tools using the same Dockerfile
This enum keeps us with a structured way to track different DPU types. Previously we would just use hardcoded strings, but with this class we can even validate if the DPU Type is within our tracked types
Since DPU tools often relies passing the args to each function, it would be nice to have a data structure to bundle all of this in. Likewise it would be good if we could also keep track of what DPU type are working with.
This script is useful because it lets rshim run in the background, which the Bluefields DPUs need to console into them.
By making this class we can keep together related logic to reset, upgrade and version firmware across dpu types
Many of the Bluefield scripts requires capturing the output. It would be better to have it default to capturing output and have an option to turn it off when we think its not useful
Here we are using a single mode subcommand that can be used to retrieve the mode and set if passed the set-mode flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I am happy to merge if @bn222 approves.
From a user-perspective, I feel everything is working and intuitive (i.e. help for each subcommand provides good output).
From a maintainers persepctive, this could use a little more work to make it easy to extend this in the future for whatever DPUs we add support for moving forward.
``` | ||
|
||
## Tools | ||
|
||
All the tools can directly be ran inside the container. All the tools automatically find and act on the first BF in the system. | ||
All the tools can directly be interacted with through the dpu-tools interface. All the tools automatically find and act on the first DPU in the system. One small difference is that using the console subcommand requires specifying what DPU type you are working with. Check out its help page by passing the `-h` flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"one small difference" awkward wording, maybe something like "Note* specific subcommands require knowledge of the DPU type in question. For example, DPU-type is required when running the console command..."
| `get_mode` | Gets the BF mode. | | ||
| `cx_fwup` | Upgrades the firmware on a CX | | ||
| `mode` | Gets the BF mode. Use `--set-mode` to change the mode to either dpu or nic | | ||
| `utils` | Access common or non-dpu specific utilities. {cw_fwup, bfb} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, we just need to make sure to make an update accordingly in https://github.com/bn222/cluster-deployment-automation/blob/main/host.py#L428.
This applies to all the commands called from CDA.
|
||
def reset(self) -> None: | ||
if self.dpu_type == DPUType.IPU.name: | ||
run("ssh [email protected] sudo reboot") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update this to use the IMC address, this should be a required argument for any IPU commands that need connection to the IMC
logger = logging.getLogger(__name__) | ||
|
||
|
||
class DPUTools: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to make this an abstract-base-class that each DPU derives from. Imagine you are adding a new DPU. You don't want to have to reach through every single line and add a bunch of new "if DPUType.my_new_dpu"... statements.
Might be nicer to do this in one place, and create a class object that inherits from DPUTools for each of these.
That said, this PR works and has been tested, so fine with adding this in a future PR.
|
||
|
||
def main() -> None: | ||
detected = detect_dpu_type() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should take the DPU type from the user if they provide it, otherwise we use the auto-detected dpu type.
We could even through a warning if the users dpu type and the autodetected do not match.
) | ||
bfb_parser.set_defaults(subcommand="bfb") | ||
# Parse arguments and initialize DPUTools | ||
args = parser.parse_args() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to have this main() function call out to another function to set up all the above arguments. The way it is now makes the code a bit harder to read / harder to track where subsequent changes need to be added.
So main would essentially do:
- set up arguments
- set up logging
- create dpuTools
- dispatch dpuTools
Any then the more granular argument organization could go in the "set up arguments" function.
|
||
|
||
@dataclasses.dataclass(frozen=True) | ||
class Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just take this from common.py?
I'm also happy with the changes. We will need to specify which dpu type we're using since we can't detect it. |