Error 500 AsyncContext timeout #135

DahlPatric · 2024-02-29T11:33:04Z

Description

Issues configuring CFE version 2.0

Environment information

Cloud Failover Extension Version: 2.0
BIG-IP version: 16.1
Cloud provider: Azure

Severity Level

For bugs, enter the bug severity level. Do not set any labels.

Severity: 3

Error message configuring

<html>

<head>
    <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
    <title>Error 500 AsyncContext timeout</title>
</head>

<body>
    <h2>HTTP ERROR 500 AsyncContext timeout</h2>
    <table>
        <tr>
            <th>URI:</th>
            <td>/mgmt/shared/cloud-failover/inspect</td>
        </tr>
        <tr>
            <th>STATUS:</th>
            <td>500</td>
        </tr>
        <tr>
            <th>MESSAGE:</th>
            <td>AsyncContext timeout</td>
        </tr>
        <tr>
            <th>SERVLET:</th>
            <td>com.f5.rest.app.RestServerServlet-51cdd8a</td>
        </tr>
    </table>
    <hr /><a href="https://eclipse.org/jetty">Powered by Jetty:// 9.4.49.v20220914</a>
    <hr />

</body>

</html>

Code

{
  "class": "Cloud_Failover",
  "environment": "azure",
  "controls": {
    "class": "Controls",
    "logLevel": "silly"
  },
  "externalStorage": {
    "scopingTags":{
      "f5_cloud_failover_label":"bigip_high_availability_solution"
    }
  },
  "failoverAddresses": {
    "enabled": true,
    "scopingTags": {
      "f5_cloud_failover_label": "bigip_high_availability_solution"
    }
  },
  "failoverRoutes": {
    "enabled": true,
    "routeGroupDefinitions": [
      {
        "scopingName": "rt-ingress-cn3",
        "scopingAddressRanges": [
          {
            "range": "0.0.0.0/0"
          }
        ],
        "defaultNextHopAddresses": {
          "discoveryType": "static",
          "items": [
            "10.45.136.11",
            "10.45.136.12"
          ]
        }
      }
    ]
  }
}

Reverted to the previously working version 1.5, and it started working.

CTV-2023 · 2024-02-29T11:56:54Z

We have the same issue. It appeared on version 16.1.4.2 with CFE 2.0.2. We ran many tests with F5 support* and reverted CFE to version 1.15 (which still works). We're waiting for an update. Our F5 maintainer handled the case with F5, so I don't have the ticket number.

*For instance, we tried modifying these DB variables (default value 60), as they thought it was related to https://my.f5.com/manage/s/article/K000136003

(F5-AZURE-02)(cfg-sync In Sync)(Standby)(/Common)(tmos)# modify sys db icrd.timeout value 300
(F5-AZURE-02)(cfg-sync In Sync)(Standby)(/Common)(tmos)# modify sys db restjavad.timeout value 300
(F5-AZURE-02)(cfg-sync In Sync)(Standby)(/Common)(tmos)# modify sys db restnoded.timeout value 300

EDIT: note that this issue breaks CFE completely. Declarations no longer work, and neither do failover tests (dry-run) or real failovers.

DahlPatric · 2024-03-01T14:38:29Z

Confirming that on BIG-IP 16.1.4.2 Build 0.0.3 Point Release 2, neither CFE 1.5 nor CFE 2.0 seems to work.
When requesting any data from the CFE API endpoint, we see "[f5-cloud-failover] Status: Error getting instance metadata connect ECONNREFUSED 169.254.169.254:80" in the logs. We are not sure whether this is related.

mikeshimkus · 2024-03-05T15:50:44Z

@DahlPatric You need to verify you can query the Azure metadata service from both devices: https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=linux#access-azure-instance-metadata-service

This is a prerequisite of CFE.

@CTV-2023 Sounds like you have a different issue since v1.15.0 works. Can you open a new GitHub issue and provide the case info?

CTV-2023 · 2024-03-05T15:57:56Z

@mikeshimkus It might not be needed; it could be related to the storage account. F5 support provided me with this information:

The engineering team have reviewed the data and requested if you can login to azure portal and check under storage account 'XXXXX', do they have a container named 'f5cloudfailover' and also make sure this is the first one on the list of containers. If you can share the screenshot of this.
The reason it should be first on the list is due to the fact that the code of CFE 2.0.2, it looks for the first container and if the f5cloudfailover is not first then we will get an error.
If this is not the first container then if you can make sure it is moved up the list and test again.

And in our case we have "boot diagnostics" containers, which might have been created by Azure when we deployed the VMs 2 years ago. I don't know how I can move the container, Azure doesn't seem to provide this option, but I might be able to delete the 2 other containers and try again (waiting for an update)

Maybe @DahlPatric can check his Azure config too

mikeshimkus · 2024-03-05T16:09:31Z

Aha, yes in fact you should have a dedicated storage account for CFE. We don't call that out in the documentation but I will add a task to do just that.

DahlPatric · 2024-03-06T09:23:09Z

> @mikeshimkus it might not be needed, it could be related to the storage account. F5 support provided me this information
>
> The engineering team have reviewed the data and requested if you can login to azure portal and check under storage account 'XXXXX', do they have a container named 'f5cloudfailover' and also make sure this is the first one on the list of containers. If you can share the screenshot of this.
> The reason it should be first on the list is due to the fact that the code of CFE 2.0.2, it looks for the first container and if the f5cloudfailover is not first then we will get an error.
> If this is not the first container then if you can make sure it is moved up the list and test again.
>
> And in our case we have "boot diagnostics" containers, which might have been created by Azure when we deployed the VMs 2 years ago. I don't know how I can move the container, Azure doesn't seem to provide this option, but I might be able to delete the 2 other containers and try again (waiting for an update)
>
> Maybe @DahlPatric can check his Azure config too

Yes, the storage account (stcfe) that the ARM template created has a container called f5cloudfailover.

I changed the value to the following:
"scopingName": "f5cloudfailover",

I pushed the configuration again but got the same "500 AsyncContext timeout" error on CFE 1.5.
It doesn't work on CFE 2.0 either, but the error is different:

{
    "message": "Error getting instance metadata undefined -> Also see cloud docs link for more help: https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/troubleshooting.html"
}

Then I executed /mgmt/shared/cloud-failover/inspect:

{
    "message": "Failover initialization failed: Error getting instance metadata undefined"
}

Could it be the combination with version 16.1.4.2 that causes these issues, and not actually CFE?

mikeshimkus · 2024-03-06T15:10:50Z

@DahlPatric The scoping name needs to match the name of the storage account, not the container.
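To make the distinction concrete, here is a minimal Python sketch, not taken from the CFE codebase. It assumes that `externalStorage` accepts a `scopingName` key holding the storage account name in place of `scopingTags`; check this against the CFE schema for your version. The helper name `with_storage_account` is hypothetical, and `stcfe` is the storage account name mentioned earlier in this thread. The `scopingName` under `failoverRoutes.routeGroupDefinitions` names the route table and is left untouched.

```python
import copy
import json


def with_storage_account(declaration, storage_account):
    """Return a copy of a CFE declaration whose externalStorage block
    points at a storage account by name (hypothetical helper)."""
    updated = copy.deepcopy(declaration)
    # Assumption: externalStorage accepts "scopingName" (the storage
    # account name) in place of "scopingTags".
    updated["externalStorage"] = {"scopingName": storage_account}
    return updated


if __name__ == "__main__":
    declaration = {
        "class": "Cloud_Failover",
        "environment": "azure",
        "externalStorage": {
            "scopingTags": {
                "f5_cloud_failover_label": "bigip_high_availability_solution"
            }
        },
        "failoverRoutes": {
            "enabled": True,
            "routeGroupDefinitions": [
                # This scopingName is the route table, not the storage account.
                {"scopingName": "rt-ingress-cn3"}
            ],
        },
    }
    print(json.dumps(with_storage_account(declaration, "stcfe"), indent=2))
```

The original declaration is copied rather than modified, so the tag-based version stays available for comparison.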

We have tested CFE with 16.1.4.2 in the US regions, so it is not a problem with the VE version. Can you query the Azure instance metadata service from the VE (outside of CFE)?

curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq

I suspect you cannot, so you need to resolve that before CFE can work.
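For anyone who wants to script this check, the following is a rough Python 3 equivalent of the curl command above, using only the standard library. It is a sketch under assumptions: the function names `build_imds_request` and `query_imds` are hypothetical, the 5-second timeout is arbitrary, and it needs to run on a host where Python 3 is available. On the BIG-IP shell itself, the curl command is the simpler option.

```python
import json
import urllib.error
import urllib.request

IMDS_URL = "http://169.254.169.254/metadata/instance?api-version=2021-02-01"


def build_imds_request(url=IMDS_URL):
    """Build the request with the Metadata header, as in the curl command."""
    return urllib.request.Request(url, headers={"Metadata": "true"})


def query_imds(timeout=5.0):
    """Return (ok, payload_or_error) for one metadata query."""
    # An empty ProxyHandler bypasses any configured proxy, like --noproxy "*".
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(build_imds_request(), timeout=timeout) as resp:
            return True, json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers refused connections, timeouts, and non-JSON replies.
        return False, str(exc)


if __name__ == "__main__":
    ok, result = query_imds()
    print("reachable" if ok else "unreachable", result)
```

If this reports "unreachable" on the same instance where CFE fails, the problem is likely between the instance and the metadata service rather than in CFE.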

DahlPatric · 2024-03-06T15:15:34Z

> @DahlPatric The scoping name needs to match the name of the storage account, not the container.
>
> We have tested CFE with 16.1.4.2 in the US regions, so it is not a problem with the VE version. Can you query the Azure instance metadata service from the VE (outside of CFE): curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq
>
> I suspect you cannot, so you need to resolve that before CFE can work.

@mikeshimkus, what do you mean by "from the VE"? I assume it's not from the F5 CLI?

mikeshimkus · 2024-03-06T15:29:30Z

Using curl from the shell on the BIG-IP instances.

DahlPatric · 2024-03-07T12:00:21Z

> Using curl from the shell on the BIG-IP instances.

No response from the IP. I could see the traffic leaving from the self IP.

01:57:43.919215 IP (tos 0x0, ttl 64, id 52923, offset 0, flags [DF], proto TCP (6), length 60) 10.45.140.11.13507 > 169.254.169.254.http: Flags [S], cksum 0xea63 (incorrect -> 0x4740), seq 3185618879, win 29200, options [mss 1460,sackOK,TS val 408480110 ecr 0,nop,wscale 7], length 0 out slot1/tmm0 lis= port=1.1 trunk=
01:57:44.921287 IP (tos 0x0, ttl 64, id 52924, offset 0, flags [DF], proto TCP (6), length 60) 10.45.140.11.13507 > 169.254.169.254.http: Flags [S], cksum 0xea63 (incorrect -> 0x4356), seq 3185618879, win 29200, options [mss 1460,sackOK,TS val 408481112 ecr 0,nop,wscale 7], length 0 out slot1/tmm0 lis= port=1.1 trunk=

I reset the proxy to default, as I had one configured:
tmsh modify sys db proxy.host reset-to-default

Could this KB article be the solution? https://my.f5.com/manage/s/article/K000137268

mikeshimkus · 2024-03-07T15:08:03Z

That KB article is for AS3, but your issue appears to be between the instance and Azure metadata service. I would verify that you have nothing on the BIG-IP blocking access to both 169.254.169.254 and 168.63.129.16. If you don't, then you need to contact Azure to troubleshoot why you can't connect.
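A quick way to tell "blocked" from "unanswered" is a plain TCP connect test. The sketch below is a minimal, hypothetical Python 3 helper (`probe_tcp` is not part of any F5 or Azure tooling). Port 80 for 169.254.169.254 comes from the log line earlier in this thread; using port 80 for 168.63.129.16 is an assumption. A "refused" result would match the ECONNREFUSED seen in the CFE logs, while "timeout" would match the unanswered SYNs in the capture above.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify one TCP connect attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The connection was actively rejected.
        return "refused"
    except socket.timeout:
        # No reply arrived before the timeout expired.
        return "timeout"
    except OSError as exc:
        # Anything else, for example "no route to host".
        return "error: %s" % exc


if __name__ == "__main__":
    # Port 80 for 168.63.129.16 is an assumption, see the note above.
    for host in ("169.254.169.254", "168.63.129.16"):
        print(host, probe_tcp(host, 80))
```

Running it on both VEs and comparing the results may help narrow down whether the difference lies in the BIG-IP configuration or in the Azure network path.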

DahlPatric · 2024-03-08T07:09:34Z

> Can you query the Azure instance metadata service from the VE (outside of CFE): curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq

We have verified from another VE that the IP is accessible.

[Screenshot: 2024-03-08 08_06_43, Chat, Microsoft Teams classic]

This rules out an issue inside Azure itself.
What else can we do to find the issue?

mikeshimkus · 2024-03-08T15:47:36Z

So it works on one VE, but not the other? This could still be an issue with Azure depending on security group or route configuration for that particular instance.
