Troubleshoot Ansible Play Failures in CFS Sessions

View the Kubernetes logs for a Configuration Framework Service (CFS) pod in an error state to determine whether the error resulted from the CFS infrastructure or from an Ansible play that was run by a specific configuration layer in a CFS session.

Use this procedure to obtain important triage information for Ansible plays being called by CFS.

Prerequisites

A failed configuration session exists in CFS.

Procedure

(ncn-mw#) Find the CFS pod that is in an error state.

List all CFS pods that are in an error state.

kubectl get pods -n services | grep -E "^cfs-.*[[:space:]]Error[[:space:]]"

Example output:

cfs-e8e48c2a-448f-4e6b-86fa-dae534b1702e-pnxmn   0/3     Error    0          25h

Set CFS_POD_NAME to the name of the pod to be investigated.

Use the pod name identified in the previous substep.
```
CFS_POD_NAME=cfs-e8e48c2a-448f-4e6b-86fa-dae534b1702e-pnxmn
```
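When several sessions have failed, copying pod names by hand can be error-prone. The following is a minimal sketch of a filter that prints the names of CFS pods whose STATUS column is `Error`. The function name `cfs_error_pods` is hypothetical and not part of CFS. It assumes the default `kubectl get pods` column layout, in which STATUS is the third column.

```bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# the names of pods that start with "cfs-" and have a STATUS of "Error".
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
cfs_error_pods() {
  awk '$1 ~ /^cfs-/ && $3 == "Error" { print $1 }'
}

# Possible usage (pick the first failed pod):
#   CFS_POD_NAME="$(kubectl get pods -n services | cfs_error_pods | head -n 1)"
```

If more than one pod is listed, review the list and choose the pod that belongs to the session being investigated rather than relying on the first match.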

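(ncn-mw#) Check to see what containers are in the pod.

Running kubectl logs without a container name returns an error that lists the containers and init containers in the pod.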

kubectl logs -n services "${CFS_POD_NAME}"

Example output:

Error from server (BadRequest): a container name must be specified for pod cfs-e8e48c2a-448f-4e6b-86fa-dae534b1702e-pnxmn, choose one of: [inventory ansible-0 istio-proxy] or one of the init containers: [git-clone-0 istio-init]

Issues rarely occur in the istio-init and istio-proxy containers. These containers can be ignored for now.
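As an alternative to reading the container names out of the error message, the names can be queried from the pod specification. This is a sketch only: `pod_containers` is a hypothetical helper name, and it assumes the pod is in the `services` namespace as in the commands above. It prints init containers first, then regular containers, and drops the Istio containers.

```bash
# Hypothetical helper: print the init containers and containers of a pod,
# one per line, leaving out the istio-* containers.
pod_containers() {
  local pod="$1"
  kubectl get pod -n services "${pod}" \
    -o jsonpath='{.spec.initContainers[*].name} {.spec.containers[*].name}' |
    tr ' ' '\n' |
    grep -v -e '^istio-' -e '^$'
}

# Possible usage:
#   pod_containers "${CFS_POD_NAME}"
```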

(ncn-mw#) Check the git-clone-0, inventory, and ansible-0 containers, in that order.

If there are additional Ansible containers in the pod, examine those as well, in ascending order.

Check the git-clone-0 container.

kubectl logs -n services "${CFS_POD_NAME}" git-clone-0

Check the inventory container.

kubectl logs -n services "${CFS_POD_NAME}" inventory

Example output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 15000: Connection refused
Waiting for Sidecar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
HTTP/1.1 200 OK
content-type: text/html; charset=UTF-8
cache-control: no-cache, max-age=0
x-content-type-options: nosniff
date: Thu, 05 Dec 2019 15:00:11 GMT
server: envoy
transfer-encoding: chunked

Sidecar available
2019-12-05 15:00:12,160 - INFO    - cray.cfs.inventory - Starting CFS Inventory version=0.4.3, namespace=services
2019-12-05 15:00:12,171 - INFO    - cray.cfs.inventory - Inventory target=dynamic for cfsession=boa-2878e4c0-39c2-4df0-989e-053bb1edee0c
2019-12-05 15:00:12,227 - INFO    - cray.cfs.inventory.dynamic - Dynamic inventory found a total of 2 groups
2019-12-05 15:00:12,227 - INFO    - cray.cfs.inventory - Writing out the inventory to /inventory/hosts

Check the ansible-0 container.

Look at the PLAY RECAP section near the end of the Ansible log to see whether any targets failed. If a target failed, look at the task output immediately above the recap to find the error. In the example below, a task in the ncmp_hsn_cns role fails on the compute nodes because the interfaces variable is undefined.

kubectl logs -n services "${CFS_POD_NAME}" ansible-0

Example output:

Waiting for Inventory
Waiting for Inventory
Inventory available
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

[...]

TASK [ncmp_hsn_cns : SLES Compute Nodes (HSN): Create/Update ifcfg-hsnx File(s)] ***
fatal: [x3000c0s19b1n0]: FAILED! => {"msg": "'interfaces' is undefined"}
fatal: [x3000c0s19b2n0]: FAILED! => {"msg": "'interfaces' is undefined"}
fatal: [x3000c0s19b3n0]: FAILED! => {"msg": "'interfaces' is undefined"}
fatal: [x3000c0s19b4n0]: FAILED! => {"msg": "'interfaces' is undefined"}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
x3000c0s19b1n0             : ok=28   changed=20   unreachable=0    failed=1    skipped=77   rescued=0    ignored=1
x3000c0s19b2n0             : ok=27   changed=19   unreachable=0    failed=1    skipped=63   rescued=0    ignored=1
x3000c0s19b3n0             : ok=27   changed=19   unreachable=0    failed=1    skipped=63   rescued=0    ignored=1
x3000c0s19b4n0             : ok=27   changed=19   unreachable=0    failed=1    skipped=63   rescued=0    ignored=1
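For long logs, the PLAY RECAP check can be sketched as a small filter that prints only the hosts reporting a nonzero `failed` or `unreachable` count. The function name `failed_hosts` is hypothetical. The sketch assumes the recap format shown in the example output, where each host line contains `key=value` counters and a blank line ends the recap.

```bash
# Hypothetical helper: read an Ansible log on stdin and print each host
# whose PLAY RECAP line has failed > 0 or unreachable > 0.
failed_hosts() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }      # recap header starts a block
    in_recap && NF == 0 { in_recap = 0 }      # blank line ends the block
    in_recap {
      bad = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^(failed|unreachable)=[0-9]+$/) {
          split($i, kv, "=")
          if (kv[2] + 0 > 0) bad = 1
        }
      }
      if (bad) print $1
    }
  ' | sort -u                                 # one line per host
}

# Possible usage:
#   kubectl logs -n services "${CFS_POD_NAME}" ansible-0 | failed_hosts
```

The host list only narrows the search. The cause of the failure is still found in the `fatal:` lines of the task output above the recap.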

Run the Ansible play again once the underlying issue has been resolved.
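Before rerunning, it may be useful to keep a copy of the container logs as triage information. The following sketch saves the log of each named container to its own file. `save_cfs_logs` and the output directory are hypothetical, and the container names must be supplied by the caller in the order they should be collected.

```bash
# Hypothetical helper: save the logs of the given containers to
# <out_dir>/<container>.log.
# Arguments: pod name, output directory, then one or more container names.
save_cfs_logs() {
  local pod="$1" out_dir="$2"
  shift 2
  mkdir -p "${out_dir}" || return 1
  local container
  for container in "$@"; do
    kubectl logs -n services "${pod}" "${container}" \
      > "${out_dir}/${container}.log" 2>&1 ||
      echo "warning: could not read logs for ${container}" >&2
  done
}

# Possible usage:
#   save_cfs_logs "${CFS_POD_NAME}" "/tmp/${CFS_POD_NAME}" git-clone-0 inventory ansible-0
```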
