
Add RHUI and RPM Transaction configuration #1142

Draft · wants to merge 28 commits into main from actor-config

Conversation

@abadger abadger (Member) commented Nov 7, 2023

Add configuration options for leapp's RHUI handling and RPM transactions.

RHUI Configuration

Example of RHUI configuration:

rhui:
  use_config: True
  source_clients: [rh-amazon-rhui-client]
  target_clients: [rh-amazon-rhui-client]
  cloud_provider: aws
  upgrade_files:
    /root/rhui-client-config-server-9.crt: /etc/pki/rhui/product/
    /root/rhui-client-config-server-9.key: /etc/pki/rhui/private/
    /root/leapp-aws.repo: /etc/yum.repos.d/
  rhui_target_repositories_to_use:
    - rhel-9-appstream-rhui-rpms
    - rhel-9-baseos-rhui-rpms

Leapp reads the configuration and checks the use_config field to know whether the configured values should be used instead of leapp's decision routines. If use_config=True, leapp will not make any automatic decisions about RHUI whatsoever; the configured values fully determine what leapp will do.
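
A rough sketch of this gating logic (the function and dictionary shapes are hypothetical, not the PR's actual code):

# Hypothetical sketch of the use_config gating; names are illustrative.
def resolve_rhui_setup(config, decide_automatically):
    rhui = config.get('rhui', {})
    if rhui.get('use_config'):
        # Configuration wins: skip all automatic RHUI decision routines.
        return {
            'source_clients': rhui['source_clients'],
            'target_clients': rhui['target_clients'],
            'cloud_provider': rhui['cloud_provider'],
            'upgrade_files': rhui.get('upgrade_files', {}),
        }
    # Otherwise fall back to leapp's own decision routines.
    return decide_automatically()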

The following configuration options are added:

Configuration field           Description
source_clients                RHUI clients to uninstall during the upgrade
target_clients                RHUI clients to install during the upgrade
cloud_provider                the cloud provider the system runs on (e.g. aws), so that leapp knows it is running in a cloud environment
upgrade_files                 mapping of paths of files on the source system to the corresponding destinations in the scratch container. Once the target container is set up, the files provided by the target client are used and the files from upgrade_files are removed.
enabled_target_repositories   a list of repositories to enable on the target system; functions the same way as leapp --enablerepo
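
For illustration, an option like use_config could be declared with the same Config schema style reviewed further below (a minimal sketch; the import paths and class name are assumptions, not the PR's actual definitions):

from leapp.actors.config import Config  # import path is an assumption
from leapp.models import fields

class RhuiUseConfig(Config):
    section = "rhui"
    name = "use_config"
    type_ = fields.Boolean(default=False)
    description = """
    Whether leapp should use the configured RHUI values instead of its
    own decision routines.
    """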

Known limitations

Using a repofile within upgrade_files that uses different repoids than the target client(s), in combination with enabled_target_repositories, causes the DNF transaction check to fail.

RPM Transaction Configuration
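
Based on the Transaction_ToKeep schema discussed in the review below, the transaction section would presumably look like this (the exact YAML layout is an assumption):

transaction:
  to_keep:
    - leapp
    - python2-leapp
    - python3-leapp
    - leapp-repository
    - snactor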

Jira refs:

Acceptance criteria for RHUI configuration:

  • It is possible to use config to upgrade unknown regions
  • Should make HA upgrades on AWS possible (achieved by a leapp-rhui-aws-ha that does not require)
  • Make rhui_repositories a required field (or rename to target_repoids_to_enable)
  • Make sure that repomapping does not error on unknown cloud providers.
    • Using cloud_provider = 'some_random_provider' has been tried (with enabled_target_repositories); there were no problems, so no code modifications are required.
  • Remove defaults - require user to supply everything

github-actions bot commented Nov 7, 2023

Thank you for contributing to the Leapp project!

Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable.
If you want to request a review or rebuild a package in copr, you can use the following commands in a comment:

  • review please @oamg/developers to notify leapp developers of the review request
  • /packit copr-build to submit a public copr build using packit

Packit will automatically schedule regression tests for this PR's build and latest upstream leapp build. If you need a different version of leapp from PR#42, use /packit test oamg/leapp#42

It is possible to schedule specific on-demand tests as well. Currently two test sets are supported, beaker-minimal and kernel-rt; both can be run on all upgrade paths or just a couple of specific ones.
To launch on-demand tests with packit:

  • /packit test --labels kernel-rt to schedule kernel-rt tests set for all upgrade paths
  • /packit test --labels beaker-minimal-8.9to9.3,kernel-rt-8.9to9.3 to schedule the kernel-rt and beaker-minimal test sets for the 8.9->9.3 upgrade path

[Deprecated] To launch on-demand regression testing, public members of the oamg organization can leave the following comment:

  • /rerun to schedule basic regression tests using this pr build and latest upstream leapp build as artifacts
  • /rerun 42 to schedule basic regression tests using this pr build and leapp PR#42 as artifacts
  • /rerun-sst to schedule sst tests using this pr build and latest upstream leapp build as artifacts
  • /rerun-sst 42 to schedule sst tests using this pr build and leapp PR#42 as artifacts

Please open a ticket in case you experience a technical problem with the CI. (RH internal only)

Note: In case there are problems with tests not being triggered automatically on new PR/commit or pending for a long time, please contact leapp-infra.

@abadger abadger marked this pull request as draft November 7, 2023 11:16
Comment on lines 20 to 32
class Transaction_ToKeep(Config):
    section = "transaction"
    name = "to_keep"
    type_ = fields.List(fields.String(), default=[
        "leapp",
        "python2-leapp",
        "python3-leapp",
        "leapp-repository",
        "snactor",
    ])
    description = """
    List of packages to be kept in the upgrade transaction.
    """
Member
Just trying to think why it cannot be the same as with models? I mean:

Suggested change
class Transaction_ToKeep(Config):
    section = "transaction"
    name = "to_keep"
    type_ = fields.List(fields.String(), default=[
        "leapp",
        "python2-leapp",
        "python3-leapp",
        "leapp-repository",
        "snactor",
    ])
    description = """
    List of packages to be kept in the upgrade transaction.
    """
class Transaction_ToKeep(Config):
    section = "transaction"
    to_keep = fields.List(fields.String(), default=[
        "leapp",
        "python2-leapp",
        "python3-leapp",
        "leapp-repository",
        "snactor",
    ])
    """
    List of packages to be kept in the upgrade transaction.
    """

Member Author

Since we've settled on yaml as the actual file format, I think that loading more information into fields will be fine (default and type_). to_keep and the docstring treatment were things I didn't know about. I think they'll be fine too.

One ramification of relying on fields (I think in any incarnation) is that we should allow mapping values. I heard that those were specifically excluded from the present fields because they were considered too complex and unnecessary for Models. I do not think those concerns apply to config values. Yaml has dictionaries as a basic data type, and in cases where you want to group record-style information, yaml mappings make the most sense.

hosts:
    balancer.example.com: 192.168.100.2
    git.example.com: 192.168.100.3

I don't know whether we'll want to keep fields.Mapping (or fields.Dict) from being used in models, or not care now that we have a need for it for a different purpose.

@pirat89 pirat89 (Member) commented Nov 13, 2023

@abadger I was thinking about this over the weekend, as I think I originally misunderstood you after reading your comment and the example. I thought this problem was addressed by fields.JSON, which we could allow for configs so that they are not so strict. However, thinking about it, you made a totally valid point about having something like a Map/Mapping/Dict field, which could have variable keys (like hosts, where we know neither the number of items nor the specific keys), while we could still define what the values look like, as with other fields. So for the example above, I could imagine allowing something like:

class MyConfig(Config):
    hosts = fields.Mapping(fields.String)  # or e.g. fields.Mapping(value_type=fields.String)

Here we can define what the values will be; the key is expected to always be a string (not sure whether we would need anything other than strings for keys). For completeness, I can imagine resolving this differently, but that would usually lead to some magical tweaks around serialisation/deserialisation, or a slightly worse UX, e.g. requiring the user to write config like:

hosts:
    - hostname: balancer.example.com
      ip: 192.168.100.2

This is doable and still kind of reasonable, but thinking about the trade-off, it makes sense to me to be less strict here. The benefit of the "Mapping" field is that we can still properly define the structure of the expected values, so the only unknown is the value/number of the key(s), which in this case is not so problematic. In the case of the JSON field, everything is unknown (neither key nor value), which reduces our validation possibilities much more.

@abadger Do I understand correctly your idea?


Just to provide context on why we usually do not allow fields.JSON in Models inside our official repositories: it's because the content of a JSON field cannot be verified. We can verify the structure is valid JSON, but we cannot verify it contains particular fields/content inside, so the structure can vary and we couldn't guarantee that part of the structure will always be a specific key/item (due to bugs, or changes in the actor that produces a particular msg, so some other actors - e.g. custom ones - could start to break without clear visibility of the problem). Having a strict structure in models provides a clear definition of the content, and people can easily see it.
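
To make the trade-off concrete, a standalone sketch (not leapp's actual fields API) of the validation a typed Mapping field would allow, which an opaque JSON field cannot:

# Standalone sketch: keys must be strings, values are checked by a supplied
# validator, mirroring the idea of fields.Mapping(value_type=fields.String()).
def validate_mapping(data, validate_value):
    if not isinstance(data, dict):
        raise ValueError('expected a mapping')
    for key, value in data.items():
        if not isinstance(key, str):
            raise ValueError('mapping keys must be strings')
        validate_value(value)  # e.g. a String field's type check
# With a plain JSON field, none of these checks are possible.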

@abadger abadger (Member Author) commented Sep 12, 2024

/packit test oamg/leapp#870

3 similar comments
@abadger abadger force-pushed the actor-config branch 7 times, most recently from 4e4ad3a to 40260c0 on September 13, 2024 09:01
Comment on lines +100 to +103
# Note: actor_config.load() stores the loaded actor config into a global
# variable which can then be accessed by functions in that file. Is this
# the right way to store that information?
actor_config.load(actor_config_path, actor_config_schemas)
Member

I understand that we are doing it this way because we want to load the configs before anything else is done. However, this leads to the problem that whether the config is loaded is not within the power of the framework. Instead of having it here in the command, wouldn't it be better to have load() in some early stage of workflow.run()? That way we can be sure, from the framework's point of view, that configs are loaded.

Member Author

I can't remember if Petr and I discussed workflow.run() as a possible place for this. A quick look seems to show that workflows don't have access to the RepositoryManager. Does this mean we cannot look up information about all of the available Actors from within a workflow? Or just that we would have to access them in another way?

Member

Workflow should collect all discovered actors during its initialization (https://github.com/oamg/leapp/blob/7565b5c8cd5a098522565ff0ed5278eb2b04b7a8/leapp/workflows/__init__.py#L167). I have used this before and thought it might be helpful again here.

Personally, loading the configuration just after the workflow starts and before any actor is processed is where I would put it. However, there might be reasons not to do this that have not come up in discussion. Have you thought about this, @pirat89? Or am I missing something?

@pirat89 pirat89 (Member) commented Sep 17, 2024

Not sure - I mean, we looked for a place where we could put it and that seemed good - especially in case we would like to have repository-shared configs in the future (a configs dir defined at the repository level), which we decided last time is not the way we want to go (at least for now). We already have such handling in the case of answerfiles, but you are right that keeping it implemented purely inside the framework (without the need to know additional magic that has to be specified in CLI commands on the repository side) sounds much better to me. Also, remember snactor! We need a way to execute an actor by itself with snactor, without running the whole workflow. (It can be done later, but it should still be done for the upcoming version.)

Member Author

@dkubek The code you're pointing to looks like it just looks at the actors available in all the phases that the workflow runs, rather than all of the actors in the repositories?

Although I'm not sure if we need all of them or not.

  • If we get all the actors, we can warn the user that there is unrecognized configuration. This will catch typos and configuration of old versions which is no longer valid.
  • If we only get the actors used by the workflow, then we will have false positives if we emit warnings for that (so I would probably not emit a warning on unrecognized config entries).

Member Author

(Also, I agree with you that if we can put this into framework code, it is the better place).

@abadger abadger force-pushed the actor-config branch 12 times, most recently from 4af8f20 to 98630ff on September 19, 2024 09:39
@Rezney Rezney (Member) commented Oct 4, 2024

Got this when testing:

2024-10-04 15:12:10.758 DEBUG    PID: 31702 leapp.repository.system_upgrade_common: Starting actor discovery in actors/cloud/checkhybridimage
2024-10-04 15:12:10.775 DEBUG    PID: 31702 leapp.repository.system_upgrade_common: Starting actor discovery in actors/cloud/checkrhui
Process Process-37:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/site-packages/leapp/repository/actor_definition.py", line 32, in inspect_actor
    definition.load()
  File "/usr/lib/python3.6/site-packages/leapp/repository/actor_definition.py", line 199, in load
    self._module = load_module(importer, name)
  File "/usr/lib/python3.6/site-packages/leapp/compat.py", line 86, in load_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/etc/leapp/repos.d/system_upgrade/common/actors/cloud/checkrhui/actor.py", line 2, in <module>
    from leapp.configs.common.rhui import all_rhui_cfg
ModuleNotFoundError: No module named 'leapp.configs.common'
2024-10-04 15:12:10.788 ERROR    PID: 31702 leapp.repository.system_upgrade_common: Process inspecting actor in actors/cloud/checkrhui failed with 1

@MichalHe @dkubek

source_clients: [ rh-amazon-rhui-client ]
target_clients: [ rh-amazon-rhui-client ]
cloud_provider: aws
upgrade_files:

This example is looking good! I also know of some use cases that need to specify the CA certificate used for the dnf calls to the RHUI repo, but perhaps that configuration is encapsulated in the leapp-aws.repo file...?

@MichalHe MichalHe (Member) commented Oct 7, 2024

Do you mean the sslcacert, sslclientcert and sslclientkey fields in the repository definition? If yes, then this is doable with the proposed configuration. You just specify all the ssl* fields in a repofile provided in upgrade_files as you normally would. Then you also need to list the certificates/keys in upgrade_files, together with information on where to place them. For example, your repofile /root/my-upgrade-repos.repo would contain a repository with:

sslclientcert=/etc/pki/rhui/product/content-rhel8.crt

Then you need to tell leapp to grab content-rhel8.crt from somewhere on the source system and place it at /etc/pki/rhui/product/content-rhel8.crt.
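
A sketch of what such a repofile entry might look like (the repoid, name, baseurl, and key path below are illustrative placeholders, not taken from this PR):

# Illustrative sketch of /root/my-upgrade-repos.repo; only the ssl* lines
# mirror the paths discussed above, everything else is a placeholder.
[rhel-9-baseos-rhui-rpms]
name=RHEL 9 BaseOS (RHUI)
baseurl=https://rhui.example.com/pulp/repos/rhel9/baseos/
enabled=0
gpgcheck=1
sslclientcert=/etc/pki/rhui/product/content-rhel8.crt
sslclientkey=/etc/pki/rhui/private/content-rhel8.key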

The configuration would then look like:

rhui:
  use_config: True
  source_clients: [rhel8-client]  # what client to remove from the source system
  target_clients: [rhel9-client]  # what client should be installed during the upgrade
  cloud_provider: aws  # for leapp to use internally
  upgrade_files:
    /root/my-upgrade-repos.repo: /etc/yum.repos.d  # during the upgrade, place this repofile into /etc/yum.repos.d
    /root/content-rhel8.crt: /etc/pki/rhui/product/  # during the upgrade, place this cert here. Same path as in sslclientcert in the repo
    # .... other keys/certs mentioned in the repofile, or otherwise important for leapp to access target OS content

This should enable basically anyone running a RHUI system to upgrade, assuming the provider's RHUI clients do not use some eccentric python dependencies (although that should be fine too, I just haven't tested those setups).

You get all the necessary upgrade_files by spawning a RHEL VM running the target OS, and copying the files, keys and certs provided by the RHUI client. There are also other ways, but this is likely the simplest one. If you are upgrading an entire fleet, you naturally need to spawn the target machine only once.

Also note that if you rename the repositories in the repofile for some reason and then use these new names in enabled_target_repositories, the DNF transaction check leapp performs might fail. Therefore, use the same repoids as the target client and everything should be fine. If you would like to test this, please ping me with any questions. It is always nice to get feedback before we merge this 😀.

@MichalHe MichalHe changed the title [DRAFT] Initial work to demonstrate an actor config framework. Add RHUI and RPM Transaction configuration Oct 7, 2024
@MichalHe MichalHe (Member) commented Oct 9, 2024

/packit build

@Rezney Rezney (Member) commented Oct 10, 2024

Got:

2024-10-10 06:10:11.407 INFO     PID: 52881 leapp.workflow.configuration_phase: Starting stage Before of phase configuration_phase
2024-10-10 06:10:11.408 INFO     PID: 52881 leapp.workflow.configuration_phase: Starting stage Main of phase configuration_phase
2024-10-10 06:10:11.409 INFO     PID: 52881 leapp.workflow.configuration_phase: Executing actor ipu_workflow_config 
Process Process-202:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/site-packages/leapp/repository/actor_definition.py", line 73, in _do_run
    skip_dialogs=skip_dialogs)
  File "/usr/lib/python3.6/site-packages/leapp/actors/__init__.py", line 131, in __init__
    self.config = retrieve_config(self.config_schemas)
NameError: name 'retrieve_config' is not defined
2024-10-10 06:10:11.453 ERROR    PID: 52881 leapp.workflow.configuration_phase: Actor ipu_workflow_config has crashed: None

Tested with:

leapp-0.18.0-0.20241008065723373633.pr870.16.gb993556.el8.noarch
leapp-rhui-azure-1.0.0-14.el8.noarch
leapp-upgrade-el8toel9-0.21.0-0.20241009173513741982.pr1142.16.g13508f65.el8.noarch

@MichalHe @dkubek

@dkubek dkubek (Member) commented Oct 10, 2024

@Rezney My bad. Should be fixed now in leapp

@matejmatuska matejmatuska (Member) commented Oct 18, 2024

I performed an upgrade from RHEL 8.10 SAP to RHEL 9.4 SAP e4s on AWS and everything went okay.
