NO-ISSUE: moving images to be multiplatform #19

Open: wants to merge 43 commits into master
Conversation

@tsorya (Owner) commented Jun 28, 2021

No description provided.

machacekondra and others added 30 commits July 4, 2021 08:55
)

This commit adds a retry mechanism while waiting for the operator to be
ready. If we apply the operator's CR, it may happen (bug 1968606)
that OLM reports the Failed state even though the operator is actually
progressing, so we decided to ignore the Failed state a few times.
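The retry idea described above can be sketched as follows. This is a minimal illustration, not the actual assisted-installer code; `waitForOperator`, the status constants, and the failure budget are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// operatorStatus is a stand-in for the status reported by OLM.
type operatorStatus int

const (
	statusProgressing operatorStatus = iota
	statusFailed
	statusAvailable
)

// waitForOperator polls the operator status and tolerates a few
// transient Failed reports before giving up, since OLM may report
// Failed while the operator is actually progressing.
func waitForOperator(poll func() operatorStatus, maxFailures int, interval time.Duration) error {
	failures := 0
	for {
		switch poll() {
		case statusAvailable:
			return nil
		case statusFailed:
			failures++
			if failures > maxFailures {
				return fmt.Errorf("operator reported Failed %d times", failures)
			}
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulate OLM reporting Failed twice while actually progressing.
	reports := []operatorStatus{statusFailed, statusProgressing, statusFailed, statusAvailable}
	i := 0
	err := waitForOperator(func() operatorStatus {
		s := reports[i]
		if i < len(reports)-1 {
			i++
		}
		return s
	}, 3, time.Millisecond)
	fmt.Println("err:", err) // err: <nil>
}
```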
…ess (openshift#298)

* MGMT-6663: Update progress of operators only when there's a new progress

* MGMT-6663: Enhance log operator update status
- Minor log changes on `setting EFI boot order unsupported`
 - In the `SetBootOrder` function, pass `liveLogger=nil` to avoid an unneeded log at that stage
   (Failed executing nsenter...)
- Minor change to the log regarding boot order setup on BIOS systems
Older versions of Go are out of support, so for security compliance we were trying to get all components onto the latest version. Go 1.14 is already out of support; see https://endoflife.date/go
Switch to a generic `isOperatorAvailable` function that can be used
with any class that implements `OperatorHandler`.

This function gets the operator status from the service; if the status
is available, it stops running. Otherwise it gets the operator status
locally and checks whether it differs from the status at the service; if
so, it sends an update to the service.

There are 3 implementations of the `OperatorHandler` interface:

- ClusterOperatorHandler (such as console)
- ClusterVersionHandler (only CVO)
- ClusterServiceVersionHandler (such as OCS, LSO)
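The flow described above can be sketched in Go. The interface and status names here are illustrative stand-ins, not the real assisted-installer types; the fake handler exists only to show the call pattern.

```go
package main

import "fmt"

// OperatorStatus mirrors the status values exchanged with the service
// (names are illustrative, not the real API types).
type OperatorStatus string

const Available OperatorStatus = "available"

// OperatorHandler abstracts the three concrete handlers
// (ClusterOperatorHandler, ClusterVersionHandler, ClusterServiceVersionHandler).
type OperatorHandler interface {
	Name() string
	GetStatusFromService() OperatorStatus // status recorded by assisted-service
	GetStatusLocally() OperatorStatus     // status observed on the cluster
	SendUpdate(OperatorStatus) error
}

// isOperatorAvailable reports whether the operator is done, pushing a
// status update to the service only when the local status has changed.
func isOperatorAvailable(h OperatorHandler) (bool, error) {
	serviceStatus := h.GetStatusFromService()
	if serviceStatus == Available {
		return true, nil // already recorded as available: stop running
	}
	local := h.GetStatusLocally()
	if local != serviceStatus {
		if err := h.SendUpdate(local); err != nil {
			return false, err
		}
	}
	return local == Available, nil
}

// fakeHandler is a toy implementation used to demonstrate the flow.
type fakeHandler struct {
	service, local OperatorStatus
	sent           []OperatorStatus
}

func (f *fakeHandler) Name() string                         { return "console" }
func (f *fakeHandler) GetStatusFromService() OperatorStatus { return f.service }
func (f *fakeHandler) GetStatusLocally() OperatorStatus     { return f.local }
func (f *fakeHandler) SendUpdate(s OperatorStatus) error    { f.sent = append(f.sent, s); return nil }

func main() {
	h := &fakeHandler{service: "progressing", local: Available}
	done, _ := isOperatorAvailable(h)
	fmt.Println(done, h.sent) // true [available]
}
```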
…ift#332)

The getProgressingOLMOperators method call could return an error when
the assisted-service API is unavailable. In that case we wouldn't update
the status of the pending operators, and the cluster could hang in the
finalizing state.
…ft#329)

* MGMT-4893: Add Must-Gather reports when olm controllers fail

  - Add support for a JSON-formatted MUST_GATHER_IMAGE variable
  - Backward compatibility with other formats of MUST_GATHER_IMAGE
  - When one of the OLM operators fails or times out, it is marked in the controller status
  - At the end of the installation process (either normal or aborted) we check whether a must-gather
    report should be collected, and with what scope

* NO-ISSUE: correcting typo in log

Co-authored-by: Yuval Goldberg <[email protected]>

Co-authored-by: Yuval Goldberg <[email protected]>
…rap kube-apiserver though the kube-apiserver moved to one of the masters (openshift#327)

On the bootstrap node the assisted-installer uses the loopback kubeconfig
to query the kube-apiserver for the number of ready master nodes.

Usually both master nodes join the cluster and become ready before
bootkube takes down the bootstrap control plane, so the loopback kubeconfig works.
But if cluster bootstrap finishes before the 2 master nodes
are ready, the assisted-installer will wait forever, since it is
using the loopback kubeconfig while the bootstrap control plane is down,
resulting in "connection refused".

The assisted-installer should query the kube-apiserver running on
one of the master nodes; for that to work it should use the real
kubeconfig instead of the loopback kubeconfig.
… on Cluster Version Operator (openshift#334)

Fix for a very specific case where the CVO keeps emitting new messages
but is in reality stuck.
Adds a CVOMaxTimeout of 3 hours.
openshift#335)

This will allow both string and JSON input for the must-gather image
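Accepting either a plain string or JSON for the must-gather image could look like the sketch below. The function name and the `"ocp"` default key are assumptions for illustration; the real parsing in the controller may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseMustGatherImage accepts either a plain image reference or a
// JSON map of operator name to image, keeping backward compatibility
// with the older string-only MUST_GATHER_IMAGE format.
func parseMustGatherImage(raw string) map[string]string {
	images := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &images); err == nil {
		return images // valid JSON map: one image per operator
	}
	// Not JSON: treat the whole value as a single default image
	// ("ocp" as the default key is an assumption for this example).
	images["ocp"] = raw
	return images
}

func main() {
	m1 := parseMustGatherImage(`{"cnv": "registry.example/cnv-must-gather"}`)
	m2 := parseMustGatherImage("registry.example/ocp-must-gather")
	fmt.Println(m1["cnv"], m2["ocp"])
}
```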
Right now, if the kube-apiserver is not reachable, we do not send
any logs. This code changes that: from now on, a kube-api error will
itself be sent as a log
…penshift#339)

We have a workaround that deletes a service that took the address of the
DNS service. Until now it supported only IPv4; this change adds IPv6 support
…-env for the host (openshift#336)

Assisted-installer will get the infra-env-id as part of the install command arguments, and will use it
to update host progress and to download the host ignition.
Assisted-installer-controller will use each host's InfraEnvID field to update its progress in
assisted-service
Adds a function that replaces the token value with <SECRET>
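A minimal sketch of that token scrubbing follows. The function name and the regex pattern are assumptions; the installer's actual pattern for locating the token may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe matches a token flag and its value in a command line or log
// entry (the exact pattern used by the installer may differ).
var tokenRe = regexp.MustCompile(`(-{1,2}token[ =])\S+`)

// scrubToken replaces the token value with <SECRET> before logging.
func scrubToken(s string) string {
	return tokenRe.ReplaceAllString(s, "${1}<SECRET>")
}

func main() {
	fmt.Println(scrubToken("installer --token eyJhbGciOi.xxx --url https://example"))
	// installer --token <SECRET> --url https://example
}
```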
…end (openshift#342)

joined status to the cluster.
Now they will send joined and only then done
…pping (openshift#346)

* NO-ISSUE: remove obsolete installation-timeout parameter

* MGMT-7635: Fix logs gathering on SNO when failing to complete bootstrapping
The log_sender command failed to mount /root/.ssh due to a "no such file
or directory" error.
This code ensures the directory gets created once the bootstrap flow begins
This PR fixes the manifest JSON parsing and improves logging.
…stalling with IPv6 (openshift#350)

Updated the regex to allow more characters between the host IP and 'Ignition'.
This is required because in the MCS log the host IP is logged as a scoped literal IPv6 address,
e.g. [fe80::ff:fe9d:12ac%ens3]:42692
This should also allow master nodes to get updated to 'Configuring'
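The relaxed matching can be sketched as below. The pattern and helper name are illustrative only; the real regex in the installer is different, but the point is the same: tolerate arbitrary characters (such as a zone suffix and port) between the host IP and the word 'Ignition'.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesHost reports whether an MCS log line refers to the given host
// IP, allowing extra characters (scope zone, port, brackets) between
// the IP and the word "Ignition".
func matchesHost(logLine, hostIP string) bool {
	re := regexp.MustCompile(regexp.QuoteMeta(hostIP) + `[^ ]*.*Ignition`)
	return re.MatchString(logLine)
}

func main() {
	// Scoped literal IPv6 address as seen in the MCS log.
	line := `[fe80::ff:fe9d:12ac%ens3]:42692 "GET /config/master" Ignition`
	fmt.Println(matchesHost(line, "fe80::ff:fe9d:12ac")) // true
}
```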
SetBootOrder uses efibootmgr to select the correct device.
The specified loader should be set to the appropriate EFI file
for the runtime CPU architecture, i.e.
x86_64 -> shimx64.efi
arm64 -> shimaa64.efi
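The architecture-to-loader mapping can be expressed as a small helper; the function name is hypothetical, and note that Go reports x86_64 as `amd64` in `runtime.GOARCH`.

```go
package main

import (
	"fmt"
	"runtime"
)

// shimForArch picks the EFI loader file for the CPU architecture,
// per the mapping above (Go's GOARCH uses "amd64" for x86_64).
func shimForArch(arch string) (string, error) {
	switch arch {
	case "amd64":
		return "shimx64.efi", nil
	case "arm64":
		return "shimaa64.efi", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", arch)
	}
}

func main() {
	shim, err := shimForArch(runtime.GOARCH)
	fmt.Println(shim, err)
}
```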
mkowalski and others added 13 commits September 22, 2021 07:23
…enshift#359)

This commit adds an ability to match nodes during the installation using
their IP addresses as well as reported hostnames.

Currently only the hostname of the node is taken into account and
compared against the known inventory. With this PR we add a
feature that, in case of a name mismatch, scans the IP
addresses of the reporting node and of the nodes in the inventory and,
if a match is found, accepts the node.

This covers cases where the node name in the inventory is not an
exact match for the name reported by the node itself.

Contributes-to: MGMT-7315
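The hostname-then-IP fallback can be sketched as follows. The types and function are illustrative; the installer works with its own inventory structs.

```go
package main

import "fmt"

// inventoryHost is a toy stand-in for an inventory entry.
type inventoryHost struct {
	Hostname string
	IPs      []string
}

// matchNode finds the inventory host for a reporting node: first by
// hostname, then, on a name mismatch, by any overlapping IP address.
func matchNode(nodeName string, nodeIPs []string, inventory []inventoryHost) (inventoryHost, bool) {
	for _, h := range inventory {
		if h.Hostname == nodeName {
			return h, true
		}
	}
	// Name mismatch: fall back to scanning IP addresses.
	ipSet := map[string]bool{}
	for _, ip := range nodeIPs {
		ipSet[ip] = true
	}
	for _, h := range inventory {
		for _, ip := range h.IPs {
			if ipSet[ip] {
				return h, true
			}
		}
	}
	return inventoryHost{}, false
}

func main() {
	inv := []inventoryHost{{Hostname: "master-0.example", IPs: []string{"10.0.0.5"}}}
	// Node reports a short name, but shares an IP with the inventory entry.
	h, ok := matchNode("master-0", []string{"10.0.0.5"}, inv)
	fmt.Println(h.Hostname, ok) // master-0.example true
}
```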
…unicating with assisted-service (openshift#358)

This PR mainly converts the inventory_client to use V2 APIs instead of V1 for all of its
internal implementation. Since many functions that access host data now need the InfraEnvID,
it is taken from the configuration of the assisted-installer (set during the install command)
The original plan was to move all images to ubi8. This is not possible due to the lack
of some packages that are needed for other projects. We are now going to switch all images
to stream8 in the hope that consistency across repos will prevent (or help with)
debugging current/future issues in CI.

The goal is to keep the components' builds as consistent as possible across the channels we
release them on

Signed-off-by: Flavio Percoco <[email protected]>

Co-authored-by: Flavio Percoco <[email protected]>
…ace (openshift#291)

* Bug 1966621: Do not use run-level label for assisted-installer namespace

Namespaces using the run-level label are considered to be highly privileged,
and Security Context Constraints are not applied to the workloads running
in them.

One of the deployment models for assisted-installer uses a cluster
deployed by the AI service to deploy the next clusters. In this scenario, if
the same `assisted-installer` namespace is used for deploying the Assisted
Service Operator, the pods do not get any securityContext properties
applied. Apart from the potential security violations, this causes
functional errors, e.g. the Postgres container running with the wrong UID.

This PR changes the configuration of the `assisted-installer` namespace so
that it does not have the run-level label applied and is treated like any
other customer namespace.

Contributes-to: OCPBUGSM-29833

* Bug 1966621: Clean up the code after run-level label removal

With the `run-level` label completely dropped, we now remove
the remaining logic handling it in the post-installation steps.

Contributes-to: OCPBUGSM-29833

* Bug 1966621: Allow assisted-installer service account to use SCCs

This commit adds additional permissions to the service account used by
the assisted-installer-controller. As we no longer override Security
Context Constraints for the whole assisted-installer namespace, we
add explicit permissions to the account used to run the AI controller
pod.

Contributes-to: OCPBUGSM-29833
…as moved in GitHub (openshift#365)

The https://github.com/irifrance/gin repo we indirectly depended on via our
`github.com/operator-framework/operator-lifecycle-manager v0.18.0` dependency
has moved to https://github.com/go-air/gin. This invalid dependency was fixed in newer
versions of operator-lifecycle-manager, but I prefer just fixing this issue
with a `replace` directive rather than dealing with an OLM upgrade.

Without this `replace` directive, attempting to work on the installer repository
locally causes my IDE / go commands to complain about irifrance/gin being gone;
this `replace` directive fixes those issues.
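A `replace` directive of this shape in `go.mod` would redirect the moved module; the version shown here is purely illustrative, not the one actually pinned in the PR.

```
// go.mod: redirect the moved repository to its new home.
// The version tag below is an illustrative placeholder.
replace github.com/irifrance/gin => github.com/go-air/gin v0.4.0
```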
…on failure (openshift#368)

The LogURL field is filled if any instance of the logs was generated.
If a cluster install fails due to the bootstrap node being stuck,
the logs are generated by the 2 masters at the end of the installation process
(writing to disk). In that case the code brings up the send_logs command directly via podman,
and the infra-env parameter should be passed there as well