
AWS Pacemaker awsvip failing with different errors #1876

Open

jayan800 opened this issue Jun 23, 2023 · 12 comments

@jayan800

Hi All,

We are running a two-node Pacemaker cluster in AWS and use the "awsvip" resource type to configure the VIP's secondary private IP. Below is the configuration:

pcs resource show privip_node1

Resource: privip_node1 (class=ocf provider=heartbeat type=awsvip)
Attributes: secondary_private_ip=10.x.x.x
Operations: migrate_from interval=0s timeout=30s (privip_node1-migrate_from-interval-0s)
migrate_to interval=0s timeout=30s (privip_node1-migrate_to-interval-0s)
monitor interval=20s timeout=30s (privip_node1-monitor-interval-20s)
start interval=0s timeout=30s (privip_node1-start-interval-0s)
stop interval=0s timeout=30s (privip_node1-stop-interval-0s)
validate interval=0s timeout=10s (privip_node1-validate-interval-0s)

pcs resource show node1_vip

Resource: node1_vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.x.x.x
Operations: monitor interval=10s timeout=20s (node1_vip-monitor-interval-10s)
start interval=0s timeout=20s (node1_vip-start-interval-0s)
stop interval=0s timeout=20s (node1_vip-stop-interval-0s)
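
For context, these two resources would normally be grouped (or colocated and ordered) so the secondary private IP and the VIP always move together. A hedged sketch, assuming no group exists yet and using a placeholder group name:

# keep the awsvip and IPaddr2 resources on the same node, started in order
pcs resource group add node1_vip_group privip_node1 node1_vip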

The EC2 instance is configured to use IMDSv2. The fence_aws agent and resource-agents packages have also been upgraded to the most recent versions, which support IMDSv2. Additionally, the resource is set up to use the IAM instance profile credentials.
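
As a quick sanity check that the instance profile credentials are actually reachable through IMDSv2 (relevant to the "Unable to locate credentials" error further down), something like the following can be run on each node; the first curl lists whatever role is attached to the instance profile:

# request an IMDSv2 session token
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
# list the IAM role attached to the instance profile
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
# confirm the AWS CLI picks up those credentials
aws sts get-caller-identity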

fence-agents-aws-4.2.1-41.el7_9.3.x86_64
python-s3transfer-0.1.13-1.0.1.el7.noarch
resource-agents-4.1.1-61.el7_9.15.x86_64

pip list | grep -i boto
boto3 (1.10.0)
botocore (1.13.50)

aws --version
aws-cli/2.9.4 Python/3.9.11 Linux/3.10.0-1160.80.1.0.1.el7.x86_64 exe/x86_64.oracle.7 prompt/off

pip3 list | grep -i boto
boto3 1.23.10
botocore 1.26.10

The privip resource consistently fails with different errors:

pengine: warning: unpack_rsc_op_failure: Processing failed monitor of privip_node2 on node2: unknown error | rc=1
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000:109357 - timed out after 30000ms

Jun 16 10:01:43 node2 lrmd[36967]: notice: privip_node2_monitor_20000:13042:stderr [ Unable to locate credentials. You can configure credentials by running "aws configure". ]
Jun 16 10:01:43 node2 crmd[36970]: notice: privip_node2_monitor_20000:91 [ % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 359 100 359 0 0 37513 0 --:--:-- --:--:-- --:--:-- 39888\n\nUnable to locate credentials. You can configure credentials by running "aws configure".\n ]

Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #15 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ An error occurred (MissingParameter) when calling the DescribeInstances operation: The request must contain the parameter InstanceId ]

Failed Resource Actions:

  • privip_node1_start_0 on node1 'not running' (7): call=250, status=complete, exitreason='instance_id not found. Is this a EC2 instance?',
    last-rc-change='Fri May 26 07:27:46 2023', queued=0ms, exec=6597ms

Any advice would be great.

@oalbrigt
Contributor

Try running pcs resource debug-start --full <resource>. That should show you all the commands it's running, and hopefully some pointers to what's wrong.
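
For example, with the resource name from the original report (the --full option also shows the agent's own trace output):

pcs resource debug-start privip_node1 --full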

@jayan800
Author

jayan800 commented Jun 26, 2023

Thank you.

The debug command completed without any errors.

Is there anything else I should check?

@jayan800 jayan800 changed the title AWS Pacemaker awsvip faling with different errors AWS Pacemaker awsvip failing with different errors Jun 26, 2023
@oalbrigt
Contributor

You can run pcs resource update <resource> trace_ra=1 and then disable/enable or restart the resource.

The trace files will be available in /var/lib/heartbeat/trace_ra/.
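
A hedged sketch of the full sequence, using the resource name from the original report (the exact layout under the trace directory can vary between resource-agents versions):

pcs resource update privip_node1 trace_ra=1
pcs resource restart privip_node1
# inspect the newest trace file for the failing monitor/start action
ls -lt /var/lib/heartbeat/trace_ra/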

@jayan800
Author

Thank you, I will enable the trace. Fingers crossed.

@ahelbadry

Hello Good people,

This thread is a bit old, but I have the same error:

Oct 29 08:30:22 [1749] auto-2.dhsscegypt.local lrmd: notice: operation_finished: awsvip_start_0:6361:stderr [ ocf-exit-reason:instance_id not found. Is this a EC2 instance? ]
Oct 29 08:30:22 [1749] auto-2.dhsscegypt.local lrmd: notice: operation_finished: awsvip_start_0:6361:stderr [ ]
Oct 29 08:30:22 [1749] auto-2.dhsscegypt.local lrmd: notice: operation_finished: awsvip_start_0:6361:stderr [ An error occurred (MissingParameter) when calling the DescribeInstances operation: The request must contain the parameter InstanceId ]

Did anyone find a solution to this?

@oalbrigt
Contributor

oalbrigt commented Nov 4, 2024

It sounds like the AWS metadata service isn't replying to the requests.

You can try running curl http://169.254.169.254/latest/meta-data/instance-id on the node to see any additional info.
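
If the instance enforces IMDSv2 (as in the original report), the plain curl needs a session token first; a hedged token-based variant of the same check:

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/instance-id"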

Does it happen every time, or just at random?

@ahelbadry

No, it's just random, and there are no indications of any errors of any kind before that.

@ahelbadry

> It sounds like the AWS metadata service isn't replying to the requests.
>
> You can try running curl http://169.254.169.254/latest/meta-data/instance-id on the node to see any additional info.
>
> Does it happen every time, or just at random?

I did run that curl, and it usually returns the instance ID successfully; however, sometimes it doesn't, which causes this error. Any idea why that might happen?

@oalbrigt
Contributor

oalbrigt commented Nov 4, 2024

Ah. If it's random, it's probably the requests getting throttled.

You should check whether there's a resource-agents update for your distro, as retry functionality has been added to avoid this:
#1936

If the latest version for your distro still has the issue, you should report the bug to them and provide a link to the fix so they can apply it.

@ahelbadry

Thanks man.

So just to confirm, that's a resource-agents issue, not something on the AWS side?

@oalbrigt
Contributor

oalbrigt commented Nov 4, 2024

The issue can be due to a hiccup in the AWS metadata service, the network, or simply AWS throttling requests when it receives too many over a short period.

The fix makes the agent retry a set number of times before failing, and lets the user configure the number of retries and the sleep between retries so it works well with their setup.
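
If the updated agent is available, the retry behaviour can then be tuned on the resource. The parameter names below (curl_retries / curl_sleep) are an assumption based on newer resource-agents releases, so verify them on your version first:

# assumed parameter names; verify with: pcs resource describe awsvip
pcs resource update awsvip curl_retries=5 curl_sleep=3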

@ahelbadry

Thanks man.

Great help.
