[BUG] [SM] Error Revoking Default security group ingress rule in Perimeter-Phase1 #1284

Open
@Mobilise-PALZ


Bug reports which fail to provide the required information will be closed without action.

Required Basic Info

  • Accelerator Version: 1.6.3
  • Install Type: Upgrade
  • Upgrade from version: 1.6.2 -> 1.6.3
  • Which State did the Main State Machine Fail in: Deploy Phase 1

Describe the bug
Failure Info

  • What error messages have you identified, if any:
    Following the normal upgrade steps to 1.6.3, we encountered no errors until the Main State Machine failed at the Deploy Phase 1 step. The CodeBuild logs stated that stackName: PerimeterPhase1 in our External Communications account had failed to deploy, with the message InvalidPermission.NotFound: The specified rule does not exist in this security group.
    In the External Communications account, the CloudFormation stack Perimeter-Phase1-VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResource shows the error:

This Custom::VpcDefaultSecurityGroup1 resource is in a CREATE_FAILED state.
Received response status [FAILED] from custom resource. Message returned: InvalidPermission.NotFound: The specified rule does not exist in this security group.

We have found that the default security group in the External Communications account has different rules from those the Lambda is trying to remove. We have not touched the default security group rules in this account, so we are unsure why they differ.

The rules on the security group currently are Type: Custom TCP, Protocol: TCP, Port range: 0, source 0.0.0.0/0.
This differs from the rule the Lambda appears to be trying to remove:
```
"ipProtocol": "-1",
"fromPort": -1,
"toPort": -1,
```
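
To illustrate the mismatch described above, here is a minimal Python sketch that compares a requested revoke against the rules actually present on a group. It is not the Accelerator's code: the helper names `normalize` and `missing_rules` are hypothetical, the dictionaries use the key names of the EC2 `DescribeSecurityGroups` response, and "Port range: 0" is assumed to mean `FromPort` and `ToPort` are both 0.

```python
from typing import Any

Rule = dict[str, Any]


def normalize(rule: Rule) -> tuple:
    """Reduce a rule to the fields this sketch compares (an assumption,
    not a statement of how EC2 matches rules internally)."""
    return (
        str(rule.get("IpProtocol")),
        rule.get("FromPort", -1),
        rule.get("ToPort", -1),
        tuple(sorted(r["CidrIp"] for r in rule.get("IpRanges", []))),
        tuple(sorted(g["GroupId"] for g in rule.get("UserIdGroupPairs", []))),
    )


def missing_rules(requested: list[Rule], existing: list[Rule]) -> list[Rule]:
    """Return the requested rules that have no exact match on the group."""
    existing_keys = {normalize(r) for r in existing}
    return [r for r in requested if normalize(r) not in existing_keys]


# The rule the Lambda tried to revoke (values taken from the CloudTrail event).
requested = [{
    "IpProtocol": "-1",
    "FromPort": -1,
    "ToPort": -1,
    "UserIdGroupPairs": [{"GroupId": "sg-XXXXXXXX"}],
}]

# The rule observed on the default security group (as reported in this issue).
existing = [{
    "IpProtocol": "tcp",
    "FromPort": 0,
    "ToPort": 0,
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
}]

# Any rule listed here is one that a revoke call would not find on the group.
print(missing_rules(requested, existing))
```

With the values from this report, the requested rule has no match on the group, which is consistent with the InvalidPermission.NotFound error.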

  • What symptoms have you identified, if any:
    The Main state machine failed.

Required files

  • Please provide a copy of your config.json file (sanitize if required)
  • If a CodeBuild step failed- please provide the full CodeBuild Log

```
Failed resources:

PALZ-Perimeter-Phase1-VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResource | 7:22:49 PM | CREATE_FAILED | Custom::VpcDefaultSecurityGroup1 | PerimeterPhase1/VpcStackPerimeter/Perimeter/VpcDefaultSecurityGroup/Resource1/Default (PerimeterVpcDefaultSecurityGroupResource1) Received response status [FAILED] from custom resource. Message returned: InvalidPermission.NotFound: The specified rule does not exist in this security group.
PALZ-Perimeter-Phase1 | 7:22:58 PM | UPDATE_FAILED | AWS::CloudFormation::Stack | PerimeterPhase1/VpcStackPerimeter.NestedStack/VpcStackPerimeter.NestedStackResource (VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResourceXXX) Embedded stack arn:aws:cloudformation:eu-west-2:XXXX:stack/PALZ-Perimeter-Phase1-VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResource-XXX/XXX was not successfully updated. Currently in UPDATE_ROLLBACK_IN_PROGRESS with reason: The following resource(s) failed to create: [PerimeterVpcDefaultSecurityGroupResourceXXX].
{"stackName":"PerimeterPhase1 (PALZ-Perimeter-Phase1)","stackEnvironment":{"account":"XXX","region":"eu-west-2","name":"aws://XXX/eu-west-2"},"assumeRoleArn":"arn:aws:iam::XXX:role/PALZ-PipelineRole","message":"Failed to deploy: Error: The stack named PALZ-Perimeter-Phase1 failed to deploy: UPDATE_ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: InvalidPermission.NotFound: The specified rule does not exist in this security group., Embedded stack arn:aws:cloudformation:eu-west-2:XXX:stack/PALZ-Perimeter-Phase1-VpcStackPerimeterNestedStackVpcStackPerimeterNestedStackResource-XX/XX was not successfully updated. Currently in UPDATE_ROLLBACK_IN_PROGRESS with reason: The following resource(s) failed to create: [PerimeterVpcDefaultSecurityGroupResourceXXX]. ","messageType":"ERROR"}
```

  • If a Lambda step failed - please provide the full Lambda CloudWatch Log
    In the External Communications account, CloudTrail shows the following event for the Lambda PALZ-Perimeter-Phase1-Vpc-CustomVpcDefaultSecurity:

```
"eventTime": "2025-05-08T19:25:16Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "RevokeSecurityGroupIngress",
"awsRegion": "eu-west-2",
"sourceIPAddress": "XXXXXXXX",
"userAgent": "aws-sdk-nodejs/2.1473.0 linux/v22.14.0 exec-env/AWS_Lambda_nodejs22.x promise",
"errorCode": "Client.InvalidPermission.NotFound",
"errorMessage": "The specified rule does not exist in this security group.",
"requestParameters": {
    "groupId": "sg-XXXXXXX",
    "ipPermissions": {
        "items": [
            {
                "ipProtocol": "-1",
                "fromPort": -1,
                "toPort": -1,
                "groups": {
                    "items": [
                        {
                            "groupId": "sg-XXXXXXXX"
                        }
                    ]
                },
                "ipRanges": {},
                "ipv6Ranges": {},
                "prefixListIds": {}
            }
        ]
    }
},
```
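
The event above shows a revoke request for one fixed rule. As a possible alternative, the following Python sketch reads the ingress rules that currently exist on the group and revokes exactly those. It is an illustration under stated assumptions, not the Accelerator's implementation: `revoke_existing_ingress` and `FakeEc2` are hypothetical names, and the `ec2` argument is assumed to behave like a boto3 EC2 client, whose `describe_security_groups` and `revoke_security_group_ingress` methods are the only calls used.

```python
from typing import Any


def revoke_existing_ingress(ec2: Any, group_id: str) -> int:
    """Revoke whatever ingress rules the group currently has.

    `ec2` is assumed to be a boto3 EC2 client, for example
    boto3.client("ec2"). Returns the number of rules revoked.
    """
    response = ec2.describe_security_groups(GroupIds=[group_id])
    permissions = response["SecurityGroups"][0].get("IpPermissions", [])
    if not permissions:
        # Nothing to revoke, so no NotFound error can be raised.
        return 0
    ec2.revoke_security_group_ingress(GroupId=group_id, IpPermissions=permissions)
    return len(permissions)


class FakeEc2:
    """Hypothetical in-memory stand-in used only to exercise the sketch."""

    def __init__(self, permissions: list[dict[str, Any]]) -> None:
        self.permissions = permissions
        self.revoked: list[dict[str, Any]] = []

    def describe_security_groups(self, GroupIds: list[str]) -> dict[str, Any]:
        return {"SecurityGroups": [{"GroupId": GroupIds[0],
                                    "IpPermissions": list(self.permissions)}]}

    def revoke_security_group_ingress(self, GroupId: str,
                                      IpPermissions: list[dict[str, Any]]) -> None:
        self.revoked.extend(IpPermissions)
        self.permissions = []


# The rule reported on the default security group in this issue.
fake = FakeEc2([{"IpProtocol": "tcp", "FromPort": 0, "ToPort": 0,
                 "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}])
count = revoke_existing_ingress(fake, "sg-XXXXXXX")
```

Because the request is built from the group's current state, a group whose default rule has been changed or already removed would not trigger InvalidPermission.NotFound in this sketch.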

  • In many cases it would be helpful if you went into the failed sub-account and region, CloudFormation, and provided a screenshot of the Events section of the failed, deleted, or rolled back stack including the last successful item, including the first couple of error messages (bottom up)
    In the CloudFormation stack, after the error message shown above, an UPDATE_ROLLBACK_IN_PROGRESS occurred for [PerimeterVpcDefaultSecurityGroupResourceXXX], and resources created earlier (such as the Role and the Role Policy CustomVpcDefaultSecurityGroup1RoleDefaultPolicy) were also deleted and cleaned up.

Steps To Reproduce

  1. Follow the normal upgrade procedure (using the GitHub template): https://aws-samples.github.io/aws-secure-environment-accelerator/v1.5.6-a/installation/upgrades/#13-summary-of-upgrade-steps-all-versions-except-v150
  2. After updating the template and releasing ASEA-InstallerPipeline, the Main State Machine ran automatically and failed at Deploy Phase 1.

Expected behaviour
We expected a successful state machine execution for the upgrade to 1.6.3. No errors had presented themselves until the state machine execution.

Additional context
The default security group in the External Communications account has different rules from those the Lambda is trying to remove. We have not touched the default security group rules in this account.

The rules on the security group currently are Type: Custom TCP, Protocol: TCP, Port range: 0, source 0.0.0.0/0. This differs from the rule the Lambda appears to be trying to remove.

We have since rolled back to 1.6.2, and the rollback was successful.
