Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor DPU Tools into a uniform interface #27

Open
wants to merge 50 commits into
base: main
Choose a base branch
from

Conversation

SamD2021
Copy link
Contributor

@SamD2021 SamD2021 commented Nov 8, 2024

  • All scripts have been either turned into modules, or been housed in existing modules.
  • Merged Dockerfiles together.
  • Added a new DPU Types enum class.
  • Added DPU Hardware Detection.
  • Refactored DPU Tools script into a class, since there were many variables that should be shared between functions
  • Transformed pxeboot script into a module and refactored it into a class.
  • Added firmware class for Bluefields into fwutils module.
  • Made capture output the default for the run function in the common module.
  • Most tools can be utilized with auto hardware detection
  • Added Console check for DPU Hardware Type

This is in preparation for making a unified interface for dpu tools that
is able to detected the hardware its running on an call its respective
tools

ipu: Remove IPU dir
Remove redundant fwversion

Remove redundant fwup

Remove redundant fwdefaults

Remove listbf script

The list bf script has been integrated into the list dpus command and
function. Now we can use this information to auto detect hardware and
have the tool behave accordingly depending on its hardware
Make the imports match the new structure
Many of these functions aren't necessarily tied to the ipu so moving it
to the common module makes more sense
Matching imports to new structure
Since we want to unify the scripts behind dpu-tools we should also build
the tools using the same Dockerfile
This enum keeps us with a structured way to track different DPU types.
Previously we would just use hardcoded strings, but with this class we
can even validate if the DPU Type is within our tracked types
Since DPU tools often relies passing the args to each function, it would
be nice to have a data structure to bundle all of this in. Likewise it
would be good if we could also keep track of what DPU type are working
with.
This script is useful because it lets rshim run in the background, which
the Bluefields DPUs need to console into them.
By making this class we can keep together related logic to reset,
upgrade and version firmware across dpu types
Many of the Bluefield scripts requires capturing the output. It would be
better to have it default to capturing output and have an option to turn
it off when we think its not useful
Here we are using a single mode subcommand that can be used to retrieve
the mode and set if passed the set-mode flag
Copy link
Contributor

@SalDaniele SalDaniele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I am happy to merge if @bn222 approves.

From a user-perspective, I feel everything is working and intuitive (i.e. help for each subcommand provides good output).

From a maintainers persepctive, this could use a little more work to make it easy to extend this in the future for whatever DPUs we add support for moving forward.

```

## Tools

All the tools can directly be ran inside the container. All the tools automatically find and act on the first BF in the system.
All the tools can directly be interacted with through the dpu-tools interface. All the tools automatically find and act on the first DPU in the system. One small difference is that using the console subcommand requires specifying what DPU type you are working with. Check out its help page by passing the `-h` flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"one small difference" awkward wording, maybe something like "Note* specific subcommands require knowledge of the DPU type in question. For example, DPU-type is required when running the console command..."

| `get_mode` | Gets the BF mode. |
| `cx_fwup` | Upgrades the firmware on a CX |
| `mode` | Gets the BF mode. Use `--set-mode` to change the mode to either dpu or nic |
| `utils` | Access common or non-dpu specific utilities. {cw_fwup, bfb} |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, we just need to make sure to make an update accordingly in https://github.com/bn222/cluster-deployment-automation/blob/main/host.py#L428.

This applies to all the commands called from CDA.


def reset(self) -> None:
if self.dpu_type == DPUType.IPU.name:
run("ssh [email protected] sudo reboot")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update this to use the IMC address, this should be a required argument for any IPU commands that need connection to the IMC

logger = logging.getLogger(__name__)


class DPUTools:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to make this an abstract-base-class that each DPU derives from. Imagine you are adding a new DPU. You don't want to have to reach through every single line and add a bunch of new "if DPUType.my_new_dpu"... statements.

Might be nicer to do this in one place, and create a class object that inherits from DPUTools for each of these.

That said, this PR works and has been tested, so fine with adding this in a future PR.



def main() -> None:
detected = detect_dpu_type()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should take the DPU type from the user if they provide it, otherwise we use the auto-detected dpu type.

We could even through a warning if the users dpu type and the autodetected do not match.

)
bfb_parser.set_defaults(subcommand="bfb")
# Parse arguments and initialize DPUTools
args = parser.parse_args()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to have this main() function call out to another function to set up all the above arguments. The way it is now makes the code a bit harder to read / harder to track where subsequent changes need to be added.

So main would essentially do:

  • set up arguments
  • set up logging
  • create dpuTools
  • dispatch dpuTools

Any then the more granular argument organization could go in the "set up arguments" function.



@dataclasses.dataclass(frozen=True)
class Result:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just take this from common.py?

@bn222
Copy link
Owner

bn222 commented Nov 12, 2024

I'm also happy with the changes. We will need to specify which dpu type we're using since we can't detect it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants