Bootstrap not turning red and worker0 gets Internal Server Error #270

Open
spazgirl opened this issue Feb 8, 2022 · 6 comments
Labels
help wanted (Extra attention is needed) · question/support (This is not a bug but a question or support)

Comments

@spazgirl

spazgirl commented Feb 8, 2022

I have two issues trying to install a static KVM x86 OCP 4.9.0 cluster following
https://github.com/redhat-cop/ocp4-helpernode/blob/main/docs/quickstart-static.md

  1. The bootstrap is not turning red after master0, master1, and master2 turn green.
    I will add the openshift-install wait-for bootstrap-complete --log-level debug output when it finishes.

  2. worker0 fails to fetch the worker ignition config, but worker1 installs with no issues (see the diagnostic sketch after the log below).
    [ 6.099465] ignition[768]: GET https://api-int.ocp4.mongodbx86.com:22623/config/worker: attempt #3
    [ 6.100702] ignition[768]: GET error: Get "https://api-int.ocp4.mongodbx86.com:22623/config/worker": dial tcp: lookup api-int.ocp4.mongodbx86.com on [::1]:53: read udp [::1]:38495->[::1]:53: read: connection refused
    [ 6.901357] ignition[768]: GET https://api-int.ocp4.mongodbx86.com:22623/config/worker: attempt #4
    [ 6.902689] ignition[768]: GET error: Get "https://api-int.ocp4.mongodbx86.com:22623/config/worker": dial tcp: lookup api-int.ocp4.mongodbx86.com on [::1]:53: read udp [::1]:59599->[::1]:53: read: connection refused
    [ 8.502516] ignition[768]: GET https://api-int.ocp4.mongodbx86.com:22623/config/worker: attempt #5
    [ 8.503994] ignition[768]: GET error: Get "https://api-int.ocp4.mongodbx86.com:22623/config/worker": dial tcp: lookup api-int.ocp4.mongodbx86.com on [::1]:53: read udp [::1]:37353->[::1]:53: read: connection refused
    [* ] A start job is running for Ignition (fetch) (8s / no limit)[ 11.705655] ignition[768]: GET https://api-int.ocp4.mongodbx86.com:22623/config/worker: attempt #6
    [** ] A start job is running for Ignition (fetch) (24s / no limit)[ 27.379893] ignition[768]: GET error: Get "https://api-int.ocp4.mongodbx86.com:22623/config/worker": dial tcp: lookup api-int.ocp4.mongodbx86.com on 129.40.83.31:53: read udp 129.40.83.36:44374->129.40.83.31:53: i/o timeout
    [ *** ] A start job is running for Ignition (fetch) (29s / no limit)[ 32.380293] ignition[768]: GET https://api-int.ocp4.mongodbx86.com:22623/config/worker: attempt #7
    [ 32.387018] ignition[768]: GET result: Internal Server Error
    [ *] A start job is running for Ignition (fetch) (34s / no limit)[ 37.388634] ignition[768]: GET https://api-int.ocp4.mongodbx86.com:22623/config/worker: attempt #8
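Since worker0 first falls back to [::1]:53 for DNS and then gets "Internal Server Error" from port 22623, two quick checks from the helper can narrow this down. A minimal sketch, assuming the helper at 129.40.83.31 serves DNS and HAProxy for the cluster as the log above suggests:

# Does the helper's DNS answer for api-int? The worker should query 129.40.83.31, not fall back to [::1]:53.
dig +short api-int.ocp4.mongodbx86.com @129.40.83.31

# Ask the Machine Config Server (behind HAProxy on 22623) for the worker config, sending an
# Accept header like the one Ignition sends; print only the HTTP status code.
curl -ks -o /dev/null -w '%{http_code}\n' \
  -H 'Accept: application/vnd.coreos.ignition+json;version=3.2.0' \
  https://api-int.ocp4.mongodbx86.com:22623/config/worker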

Attachments

worker0 failing.txt

helper x215n31.txt

Please let me know if you need anything else. The helper x215n31 attachment covers the RHEL 8.5 boot after install up to where I am now. When bootstrap-complete fails I will add the output to this issue.

@spazgirl
Author

spazgirl commented Feb 8, 2022

Here is the openshift-install wait-for bootstrap-complete --log-level debug failure
[root@x215n31 ocp4]# openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer 4.9.0
DEBUG Built from commit 6e5b992ba719dd4ea2d0c2a8b08ecad45179e553
INFO Waiting up to 20m0s for the Kubernetes API at https://api.ocp4.mongodbx86.com:6443...
INFO API v1.22.0-rc.0+894a78b up
INFO Waiting up to 30m0s for bootstrapping to complete...
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthServerConfigObservation_Error::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::RouterCerts_NoRouterCertSecret: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
ERROR OAuthServerConfigObservationDegraded: secret "v4-0-config-system-router-certs" not found
ERROR OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://172.30.40.206:443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
ERROR OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready
ERROR RouterCertsDegraded: neither the custom secret/v4-0-config-system-router-certs -n openshift-authentication or default secret/oauth-openshift -n openshift-authentication could be retrieved: secret "v4-0-config-system-router-certs" not found
INFO Cluster operator authentication Progressing is True with APIServerDeployment_PodsUpdating: APIServerDeploymentProgressing: deployment/apiserver.openshift-oauth-apiserver: 1/3 pods have been updated to the latest generation
INFO Cluster operator authentication Available is False with APIServerDeployment_NoPod::APIServices_PreconditionNotReady::OAuthServerServiceEndpointAccessibleController_EndpointUnavailable::ReadyIngressNodes_NoReadyIngressNodes: APIServerDeploymentAvailable: no apiserver.openshift-oauth-apiserver pods available on any node.
INFO APIServicesAvailable: PreconditionNotReady
INFO OAuthServerServiceEndpointAccessibleControllerAvailable: Get "https://172.30.40.206:443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
INFO ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted:
ERROR Cluster operator etcd Degraded is True with StaticPods_Error: StaticPodsDegraded: pods "etcd-master0.ocp4.mongodbx86.com" not found
ERROR StaticPodsDegraded: pods "etcd-master1.ocp4.mongodbx86.com" not found
ERROR StaticPodsDegraded: pods "etcd-master2.ocp4.mongodbx86.com" not found
INFO Cluster operator etcd Progressing is True with NodeInstaller: NodeInstallerProgressing: 3 nodes are at revision 0; 0 nodes have achieved new revision 2
INFO Cluster operator etcd Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 2
INFO Cluster operator ingress Available is Unknown with IngressDoesNotHaveAvailableCondition: The "default" ingress controller is not reporting an Available status condition.
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is Unknown with IngressDoesNotHaveDegradedCondition: The "default" ingress controller is not reporting a Degraded status condition.
INFO Cluster operator insights Disabled is False with AsExpected:
ERROR Cluster operator kube-apiserver Degraded is True with StaticPods_Error: StaticPodsDegraded: pod/kube-apiserver-master0.ocp4.mongodbx86.com container "kube-apiserver" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-master0.ocp4.mongodbx86.com_openshift-kube-apiserver(a4414ab1-6be8-49f2-b1ca-ae264c30f587)
ERROR StaticPodsDegraded: pod/kube-apiserver-master0.ocp4.mongodbx86.com container "kube-apiserver-check-endpoints" is waiting: CrashLoopBackOff: back-off 5m0s restarting failed container=kube-apiserver-check-endpoints pod=kube-apiserver-master0.ocp4.mongodbx86.com_openshift-kube-apiserver(a4414ab1-6be8-49f2-b1ca-ae264c30f587)
ERROR StaticPodsDegraded: pods "kube-apiserver-master1.ocp4.mongodbx86.com" not found
ERROR StaticPodsDegraded: pods "kube-apiserver-master2.ocp4.mongodbx86.com" not found
INFO Cluster operator kube-apiserver Progressing is True with NodeInstaller: NodeInstallerProgressing: 3 nodes are at revision 0; 0 nodes have achieved new revision 12
INFO Cluster operator kube-apiserver Available is False with StaticPods_ZeroNodesActive: StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 12
INFO Cluster operator monitoring Available is False with MultipleTasksFailed: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
ERROR Cluster operator monitoring Degraded is True with MultipleTasksFailed: Failed to rollout the stack. Error: updating configuration sharing: failed to retrieve Prometheus host: getting Route object failed: the server could not find the requested resource (get routes.route.openshift.io prometheus-k8s)
ERROR updating alertmanager: creating Alertmanager Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io)
ERROR updating thanos querier: creating Thanos Querier Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io)
ERROR updating prometheus-k8s: creating Prometheus Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io)
ERROR updating grafana: creating Grafana Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io)
ERROR updating openshift-state-metrics: reconciling openshift-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/openshift-state-metrics: got 1 unavailable replicas
ERROR updating kube-state-metrics: reconciling kube-state-metrics Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/kube-state-metrics: got 1 unavailable replicas
ERROR updating prometheus-adapter: reconciling PrometheusAdapter Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-adapter: got 2 unavailable replicas
ERROR updating telemeter client: reconciling Telemeter client Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/telemeter-client: got 1 unavailable replicas
INFO Cluster operator network ManagementStateDegraded is False with :
INFO Cluster operator network Progressing is True with Deploying: Deployment "openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
INFO Cluster operator openshift-apiserver Progressing is True with APIServerDeployment_PodsUpdating: APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver: 1/3 pods have been updated to the latest generation
INFO Cluster operator openshift-apiserver Available is False with APIServerDeployment_NoPod::APIServices_PreconditionNotReady: APIServerDeploymentAvailable: no apiserver.openshift-apiserver pods available on any node.
INFO APIServicesAvailable: PreconditionNotReady
INFO Cluster operator operator-lifecycle-manager-packageserver Available is False with ClusterServiceVersionNotSucceeded: ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: InstallCheckFailed, message: install timeout
INFO Cluster operator operator-lifecycle-manager-packageserver Progressing is True with : Working toward 0.18.3
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
ERROR Bootstrap failed to complete: timed out waiting for the condition
ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
FATAL Bootstrap failed to complete
[root@x215n31 ocp4]#
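The installer's hint at the end of that log can be used to pull the bootstrap logs off the nodes before changing anything. A sketch, run from the install directory on the helper; the hostnames are the ones used in this issue, and --master can be repeated once per control-plane node:

# Bundles journals and container logs from the bootstrap and masters into a local tarball
openshift-install gather bootstrap \
  --bootstrap bootstrap.ocp4.mongodbx86.com \
  --master master0.ocp4.mongodbx86.com \
  --master master1.ocp4.mongodbx86.com \
  --master master2.ocp4.mongodbx86.com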

@christianh814
Contributor

What are the sizes of the masters/workers? The bootstrap turning red after the masters turn green is expected, but it seems that bootstrapping is failing after the pivot happens.
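One way to see how far the pivot got is to query the cluster from the helper with the installer-generated kubeconfig. A sketch; the kubeconfig path is an assumption based on the [root@x215n31 ocp4]# prompt above:

export KUBECONFIG=/root/ocp4/auth/kubeconfig   # assumed install directory
oc get nodes -o wide        # are the three masters Ready?
oc get csr                  # any CSRs stuck in Pending?
oc get clusteroperators     # which operators are still unavailable or degraded?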

@christianh814 added the help wanted (Extra attention is needed) and question/support (This is not a bug but a question or support) labels on Feb 9, 2022
@spazgirl
Author

spazgirl commented Feb 10, 2022

I followed https://github.com/redhat-cop/ocp4-helpernode/blob/main/docs/quickstart-static.md and set them all up with what the bootstrap is said to need: 8192 MB of memory, 4 vCPUs, and a 120 GB disk.
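A quick way to confirm the KVM guests actually received those resources is to ask libvirt directly. A sketch, assuming the libvirt domain names match the node names used in this issue:

# Print name, vCPU count, and memory for each guest
for vm in bootstrap master0 master1 master2 worker0 worker1; do
  virsh dominfo "$vm" | grep -E 'Name|CPU\(s\)|Max memory'
done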

@christianh814
Contributor

What I would do is ssh into the bootstrap node (from the helper, run ssh core@bootstrap) and check the logs with the journalctl command it tells you to run.

This looks like it's failing in the bootstrap phase. If it is, it isn't the playbook, so unfortunately that's more of an OpenShift issue and we can't really help with anything beyond the playbook.
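For reference, something like the following; a sketch, where the FQDN is the one used in this issue and the journalctl units are the ones the bootstrap login banner normally points at:

# From the helper node
ssh core@bootstrap.ocp4.mongodbx86.com

# On the bootstrap node: follow the services that download the release image and
# stand up the temporary control plane
journalctl -b -f -u release-image.service -u bootkube.service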

@spazgirl
Author

Here is the bootstrap journalctl output. I have not done anything on this cluster setup since it failed; I have left it up.
Bootstrap-journalctl-2-10-22.txt

@salanisor
Contributor

@spazgirl - I'm no expert, but I want to learn, so let me ask: why are you trying to install an old version of OpenShift?

I am seeing lots of repeated entries such as these, though I have no way to prove this is an actual issue:

Feb 08 16:46:27 bootstrap.ocp4.mongodbx86.com release-image-download.sh[1552]: Pulling quay.io/openshift-release-dev/ocp-release@sha256:d262a12de33125907e0b75a5ea34301dd27c4a6bde8295f6b922411f07623e61...
Feb 08 16:46:44 bootstrap.ocp4.mongodbx86.com release-image-download.sh[1552]: Error: Error initializing source docker://quay.io/openshift-release-dev/ocp-release@sha256:d262a12de33125907e0b75a5ea34301dd27c4a6bde8295f6b922411f07623e61: can't talk to a V1 docker registry
Feb 08 16:46:44 bootstrap.ocp4.mongodbx86.com release-image-download.sh[1552]: Pull failed. Retrying quay.io/openshift-release-dev/ocp-release@sha256:d262a12de33125907e0b75a5ea34301dd27c4a6bde8295f6b922411f07623e61...

Feb 09 04:57:34 bootstrap.ocp4.mongodbx86.com kubelet.sh[2399]: I0209 04:57:34.851138    2411 provider.go:102] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
Feb 09 04:57:34 bootstrap.ocp4.mongodbx86.com kubelet.sh[2399]: I0209 04:57:34.851219    2411 provider.go:82] Docker config file not found: couldn't find valid .dockercfg after checking in [/var/lib/kubelet   /]
Feb 09 05:17:49 bootstrap.ocp4.mongodbx86.com kubelet.sh[2399]: I0209 05:17:49.947591    2411 provider.go:102] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
Feb 09 05:17:49 bootstrap.ocp4.mongodbx86.com kubelet.sh[2399]: I0209 05:17:49.947664    2411 provider.go:82] Docker config file not found: couldn't find valid .dockercfg after checking in [/var/lib/kubelet   /]
Feb 09 05:38:04 bootstrap.ocp4.mongodbx86.com kubelet.sh[2399]: I0209 05:38:04.739153    2411 provider.go:102] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
Feb 09 05:38:04 bootstrap.ocp4.mongodbx86.com kubelet.sh[2399]: I0209 05:38:04.739231    2411 provider.go:82] Docker config file not found: couldn't find valid .dockercfg after checking in [/var/lib/kubelet   /]
Feb 09 05:58:19 bootstrap.ocp4.mongodbx86.com kubelet.sh[2399]: I0209 05:58:19.187392    2411 provider.go:102] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
Feb 09 05:58:19 bootstrap.ocp4.mongodbx86.com kubelet.sh[2399]: I0209 05:58:19.187638    2411 provider.go:82] Docker config file not found: couldn't find valid .dockercfg after checking in [/var/lib/kubelet   /]
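The repeated "can't talk to a V1 docker registry" retries from release-image-download.sh suggest checking whether the bootstrap can reach quay.io over the registry v2 API at all. A sketch to run on the bootstrap node; the release image repository is public, so no pull secret should be needed for this test:

# A reachable v2 endpoint answers with an HTTP status (typically 401 Unauthorized, which is fine);
# timeouts or proxy errors point at network/DNS problems from the bootstrap.
curl -sI https://quay.io/v2/ | head -n 1

# Try the exact release image the script keeps retrying
sudo podman pull quay.io/openshift-release-dev/ocp-release@sha256:d262a12de33125907e0b75a5ea34301dd27c4a6bde8295f6b922411f07623e61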

From your example above, two days ago:

[root@x215n31 ocp4]# openshift-install wait-for bootstrap-complete --log-level debug
DEBUG OpenShift Installer 4.9.0

Image information correlating with your logs

    {
      "version": "4.9.0",
      "payload": "quay.io/openshift-release-dev/ocp-release@sha256:d262a12de33125907e0b75a5ea34301dd27c4a6bde8295f6b922411f07623e61",
      "metadata": {
        "description": "",
        "io.openshift.upgrades.graph.release.channels": "candidate-4.10,candidate-4.9,fast-4.9,stable-4.9",
        "io.openshift.upgrades.graph.release.manifestref": "sha256:d262a12de33125907e0b75a5ea34301dd27c4a6bde8295f6b922411f07623e61",
        "url": "https://access.redhat.com/errata/RHSA-2021:3759"
      }
    }

The latest version is Server Version: 4.9.17; my recommendation is to try the latest openshift-install rather than the oldest to see if you notice any gains.
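For example, something like this pulls the current stable 4.9 installer from the public mirror (a sketch; the mirror path follows the usual layout and /usr/local/bin is just a suggestion):

curl -LO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.9/openshift-install-linux.tar.gz
tar -xzf openshift-install-linux.tar.gz -C /usr/local/bin openshift-install
openshift-install version   # should now report 4.9.17 or newer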

cheers!
