Error 500 AsyncContext timeout #135

Open
DahlPatric opened this issue Feb 29, 2024 · 13 comments
Comments

@DahlPatric

Description

Issues configuring CFE version 2.0

Environment information

  • Cloud Failover Extension Version: 2.0
  • BIG-IP version: 16.1
  • Cloud provider: Azure

Severity Level

For bugs, enter the bug severity level. Do not set any labels.

Severity: 3

Error message when configuring

<html>

<head>
    <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
    <title>Error 500 AsyncContext timeout</title>
</head>

<body>
    <h2>HTTP ERROR 500 AsyncContext timeout</h2>
    <table>
        <tr>
            <th>URI:</th>
            <td>/mgmt/shared/cloud-failover/inspect</td>
        </tr>
        <tr>
            <th>STATUS:</th>
            <td>500</td>
        </tr>
        <tr>
            <th>MESSAGE:</th>
            <td>AsyncContext timeout</td>
        </tr>
        <tr>
            <th>SERVLET:</th>
            <td>com.f5.rest.app.RestServerServlet-51cdd8a</td>
        </tr>
    </table>
    <hr /><a href="https://eclipse.org/jetty">Powered by Jetty:// 9.4.49.v20220914</a>
    <hr />

</body>

</html>

Code

{
  "class": "Cloud_Failover",
  "environment": "azure",
  "controls": {
    "class": "Controls",
    "logLevel": "silly"
  },
  "externalStorage": {
    "scopingTags": {
      "f5_cloud_failover_label": "bigip_high_availability_solution"
    }
  },
  "failoverAddresses": {
    "enabled": true,
    "scopingTags": {
      "f5_cloud_failover_label": "bigip_high_availability_solution"
    }
  },
  "failoverRoutes": {
    "enabled": true,
    "routeGroupDefinitions": [
      {
        "scopingName": "rt-ingress-cn3",
        "scopingAddressRanges": [
          {
            "range": "0.0.0.0/0"
          }
        ],
        "defaultNextHopAddresses": {
          "discoveryType": "static",
          "items": [
            "10.45.136.11",
            "10.45.136.12"
          ]
        }
      }
    ]
  }
}

I configured the previously working version 1.5 and it started working.

@CTV-2023

CTV-2023 commented Feb 29, 2024

We have the same issue. It appeared on version 16.1.4.2 with CFE 2.0.2. We ran many tests with F5 support* and reverted CFE to version 1.15 (which still works). We're waiting for an update. Our F5 maintainer handled the case with F5, so I don't have the ticket number.

*For instance, we tried modifying these DB variables (default value 60), as they thought it was related to https://my.f5.com/manage/s/article/K000136003

(F5-AZURE-02)(cfg-sync In Sync)(Standby)(/Common)(tmos)# modify sys db icrd.timeout value 300
(F5-AZURE-02)(cfg-sync In Sync)(Standby)(/Common)(tmos)# modify sys db restjavad.timeout value 300
(F5-AZURE-02)(cfg-sync In Sync)(Standby)(/Common)(tmos)# modify sys db restnoded.timeout value 300

EDIT: note that this issue breaks CFE completely. Declare no longer works, and neither do failover tests (dry-run) or real failovers.

@DahlPatric
Author

Confirming that on BIG-IP 16.1.4.2 Build 0.0.3 Point Release 2, neither CFE 1.5 nor 2.0 seems to work.
When requesting any data from the CFE API endpoint, we see "[f5-cloud-failover] Status: Error getting instance metadata connect ECONNREFUSED 169.254.169.254:80" in the logs. We are not sure whether it is related.

@mikeshimkus
Contributor

@DahlPatric You need to verify you can query the Azure metadata service from both devices: https://learn.microsoft.com/en-us/azure/virtual-machines/instance-metadata-service?tabs=linux#access-azure-instance-metadata-service

This is a prerequisite of CFE.

@CTV-2023 Sounds like you have a different issue since v1.15.0 works. Can you open a new GitHub issue and provide the case info?

@CTV-2023

CTV-2023 commented Mar 5, 2024

@mikeshimkus it might not be needed, it could be related to the storage account. F5 support provided me this information

The engineering team has reviewed the data and asked whether you can log in to the Azure portal and check, under storage account 'XXXXX', whether there is a container named 'f5cloudfailover', and also make sure it is the first one in the list of containers. Please share a screenshot of this if you can.
It should be first in the list because the CFE 2.0.2 code looks for the first container, and if f5cloudfailover is not first we will get an error.
If it is not the first container, please make sure it is moved up the list and test again.

And in our case we have "boot diagnostics" containers, which might have been created by Azure when we deployed the VMs 2 years ago. I don't know how I can move the container, Azure doesn't seem to provide this option, but I might be able to delete the 2 other containers and try again (waiting for an update)

Maybe @DahlPatric can check his Azure config too
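
The container order that support asked about can also be checked from a command line instead of the portal. A minimal sketch, assuming the Azure CLI is installed and signed in with read access to the storage account; `first_container_is` and `list_containers` are hypothetical helper names, and `XXXXX` stands for the redacted account name in the quote above. As far as I know, the Azure List Containers operation returns names in alphabetical order, which would explain why a container cannot be "moved", but treat that as an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first name read from stdin
# equals the expected container name given as $1.
first_container_is() {
    first=$(head -n 1)
    [ "$first" = "$1" ]
}

# List container names in the order the service returns them.
# --auth-mode login assumes the signed-in identity can read the account.
list_containers() {
    az storage container list --account-name "$1" --auth-mode login \
        --query "[].name" --output tsv
}

# Usage (not run here):
#   list_containers XXXXX | first_container_is f5cloudfailover && echo "first"
```

If the check fails because other containers (for example boot diagnostics ones) sort ahead of f5cloudfailover, a storage account that holds nothing else avoids the ordering question entirely.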

@mikeshimkus
Contributor

Aha, yes in fact you should have a dedicated storage account for CFE. We don't call that out in the documentation but I will add a task to do just that.

@DahlPatric
Author

Yes, the storage account (stcfe) that the ARM template created has a container called f5cloudfailover.
image

I changed the value to:
"scopingName": "f5cloudfailover",

I pushed the configuration again but got the same "500 AsyncContext timeout" error on CFE 1.5.
It doesn't work on CFE 2.0 either, but the error is different:

{
    "message": "Error getting instance metadata undefined -> Also see cloud docs link for more help: https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/troubleshooting.html"
}

Then I called /mgmt/shared/cloud-failover/inspect:

{
    "message": "Failover initialization failed: Error getting instance metadata undefined"
}

Could it be the combination with version 16.1.4.2 that causes these issues, and not actually CFE?

@mikeshimkus
Contributor

@DahlPatric The scoping name needs to match the name of the storage account, not the container.
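
For illustration, a declaration fragment that scopes external storage by name might look like the one below. This is a sketch under two assumptions: that `stcfe` (the account name mentioned earlier in the thread) is the full storage account name, and that the installed CFE version accepts `scopingName` under `externalStorage`. Check the schema for your version before using it.

```json
"externalStorage": {
  "scopingName": "stcfe"
}
```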

We have tested CFE with 16.1.4.2 in the US regions, so it is not a problem with the VE version. Can you query the Azure instance metadata service from the VE (outside of CFE)?
curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq

I suspect you cannot, so you need to resolve that before CFE can work.
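
When that query hangs or prints nothing, the curl exit code says more than the empty output does. A minimal sketch, assuming a POSIX shell with curl on the BIG-IP; `classify_curl_exit` and `probe_imds` are hypothetical helper names and the 5-second limit is an arbitrary choice, none of it part of CFE.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CFE) for probing the Azure metadata service.

# Map a curl exit code to a short diagnosis.
# 0 = transfer completed (any HTTP status), 7 = failed to connect,
# 28 = operation timed out.
classify_curl_exit() {
    case "$1" in
        0)  echo "reachable" ;;
        7)  echo "connection failed (refused or no route)" ;;
        28) echo "timed out (packets likely dropped)" ;;
        *)  echo "other curl error $1" ;;
    esac
}

# Query the metadata service with a hard 5-second limit so that a blocked
# path fails fast. URL and header are the ones from the command above.
probe_imds() {
    curl -s -o /dev/null --max-time 5 -H Metadata:true --noproxy "*" \
        "http://169.254.169.254/metadata/instance?api-version=2021-02-01"
    classify_curl_exit "$?"
}
```

Running `probe_imds` on each device gives a one-line result per BIG-IP that can be compared directly.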

@DahlPatric
Author

@mikeshimkus, when you say "from the VE", what do you mean? I assume it's not from the F5 CLI?

@mikeshimkus
Contributor

Using curl from the shell on the BIG-IP instances.

@DahlPatric
Author

No response from the IP. I could see the traffic leaving from the self IP:

01:57:43.919215 IP (tos 0x0, ttl 64, id 52923, offset 0, flags [DF], proto TCP (6), length 60) 10.45.140.11.13507 > 169.254.169.254.http: Flags [S], cksum 0xea63 (incorrect -> 0x4740), seq 3185618879, win 29200, options [mss 1460,sackOK,TS val 408480110 ecr 0,nop,wscale 7], length 0 out slot1/tmm0 lis= port=1.1 trunk=
01:57:44.921287 IP (tos 0x0, ttl 64, id 52924, offset 0, flags [DF], proto TCP (6), length 60) 10.45.140.11.13507 > 169.254.169.254.http: Flags [S], cksum 0xea63 (incorrect -> 0x4356), seq 3185618879, win 29200, options [mss 1460,sackOK,TS val 408481112 ecr 0,nop,wscale 7], length 0 out slot1/tmm0 lis= port=1.1 trunk=

I reset the proxy setting to its default, since I had one configured:
tmsh modify sys db proxy.host reset-to-default

I found this KB: https://my.f5.com/manage/s/article/K000137268. Could that be the solution?

@mikeshimkus
Contributor

That KB article is for AS3, but your issue appears to be between the instance and Azure metadata service. I would verify that you have nothing on the BIG-IP blocking access to both 169.254.169.254 and 168.63.129.16. If you don't, then you need to contact Azure to troubleshoot why you can't connect.
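
One way to check both addresses from the BIG-IP shell is sketched below. It assumes curl, awk, and the Linux `ip` command are available; `route_dev` and `check_azure_endpoints` are hypothetical helper names, and the timeouts are arbitrary. `ip route get` only reports which interface the Linux host would pick, so it complements a packet capture rather than replacing one.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CFE or BIG-IP).

# Extract the outgoing interface from one line of `ip route get` output,
# e.g. "169.254.169.254 via 10.0.0.1 dev eth0 src 10.0.0.4" gives "eth0".
# (That sample line is illustrative, not taken from this thread.)
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# For each Azure platform address, print the interface the host would use
# and whether an HTTP request to port 80 completes within the time limits.
check_azure_endpoints() {
    for ip in 169.254.169.254 168.63.129.16; do
        dev=$(ip route get "$ip" | route_dev)
        if curl -s -o /dev/null --connect-timeout 5 --max-time 10 \
            --noproxy "*" "http://$ip/"; then
            echo "$ip via $dev: connected"
        else
            # In the else branch, $? is still curl's exit status.
            echo "$ip via $dev: curl exit $?"
        fi
    done
}
```

A curl exit of 7 (failed to connect) or 28 (timed out) for either address means something between the instance and the platform endpoint is dropping or rejecting the traffic.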

@DahlPatric
Author

We have verified from another VE that the IP is accessible.

image

This rules out an issue inside Azure itself.
What else can we do to find the issue?

@mikeshimkus
Contributor

So it works on one VE, but not the other? This could still be an issue with Azure depending on security group or route configuration for that particular instance.


3 participants