This repository was archived by the owner on Apr 30, 2025. It is now read-only.

Enhance gorouter retry logic #437

Closed
@b1tamara

Description

Current behavior

  • balancing_algorithm: round_robin
  • cf app with two instances
  • each instance has a max connection limit of 1000 (server.maxConnections in the app code)
  • app code:
import express from 'express';
import http from 'http';
import { setTimeout } from 'timers/promises';

const app = express();

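app.get('/', async (req, res) => {
    // Hold each request for 5 seconds so connections pile up on the instance.
    await setTimeout(5000);
    res.send('Hello, World!');
});

const server = http.createServer(app);
// Refuse new connections once 1000 are open on this instance.
server.maxConnections = 1000;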

const PORT = process.env.PORT || 3000;
server.listen(PORT, () => {
    console.log(`Server is running on http://localhost:${PORT}`);
    console.log(`Max connections: ${server.maxConnections}`);
});

Start sending requests pinned to instance 0:
hey -c 1000 -n 1000 -H "X-Cf-App-Instance":"1edeb42a-076a-41a2-84a6-a70056e9938e:0" "https://retry-test.<domain>"
After 1-2 seconds, send a second batch without the instance header:
hey -c 1000 -n 1000 "https://retry-test.<domain>"

All 1000 requests with the X-Cf-App-Instance header are successful.
About a quarter of the requests without the X-Cf-App-Instance header finish with 502 because gorouter sends the retry to the same app instance.

Observations:

  • Every RequestInfo has a pointer to the EndpointPool. For the same URI (route), it points to the same EndpointPool.
  • The issue is related to the iteration over the pool's endpoints.
  • Gorouter gets a lot of concurrent requests to the same route.
  • max_retries = number of app instances.
  • For a couple of seconds, the app instance with instance_id = 0 cannot accept new connections because of the max connection limit in the app code.

Found logs (written by the same gorouter):

"timestamp":"2024-09-10T09:46:47.350182234Z","message":"backend","data":{"instance_id":"0"},"attempt":1,"vcap_request_id":"703d8ce7-b187-41b6-4a02-a122d1ee8b60"

"timestamp":"2024-09-10T09:46:47.350406060Z","message":"backend","data":{"instance_id":"1"},"attempt":1,"vcap_request_id":"ffed058e-a289-4ca9-7d22-0e345c2bd54f"

"timestamp":"2024-09-10T09:46:47.357750671Z","message":"backend","data":{"instance_id":"0"},"attempt":2,"vcap_request_id":"703d8ce7-b187-41b6-4a02-a122d1ee8b60"

"timestamp":"2024-09-10T09:46:47.357588430Z","message":"backend-endpoint-failed","data":{"instance_id":"0"},"error":"EOF (via idempotent request)","attempt":1,"vcap_request_id":"703d8ce7-b187-41b6-4a02-a122d1ee8b60"

"timestamp":"2024-09-10T09:46:47.363893483Z","message":"backend-endpoint-failed","data":{"instance_id":"0"},"error":"EOF (via idempotent request)","attempt":2,"vcap_request_id":"703d8ce7-b187-41b6-4a02-a122d1ee8b60"

The request with vcap_request_id=703d8ce7-b187-41b6-4a02-a122d1ee8b60 was sent to the same instance_id=0 twice, reached the max_retries value, and finished with 502.

In other words, following the round-robin logic for selecting the next endpoint:
Gorouter gets the first request (703d8ce7-b187-41b6-4a02-a122d1ee8b60) ->

  • pool.currentIndex=0
  • pool.nextIdx = 1
  • attempt = 1
  • selectedEndpoint: app instance with instance_id=0
  • Result: backend-endpoint-failed (EOF); the request should be retried.

Gorouter gets a second request (ffed058e-a289-4ca9-7d22-0e345c2bd54f) ->

  • pool.currentIndex=1
  • pool.nextIdx = 0
  • attempt=1
  • selectedEndpoint: app instance with instance_id=1
  • Result: Success

Gorouter retries the first request (703d8ce7-b187-41b6-4a02-a122d1ee8b60) ->

  • pool.currentIndex=0
  • pool.nextIdx = 1
  • attempt=2
  • selectedEndpoint: app instance with instance_id=0
  • Result: backend-endpoint-failed (EOF); 502 because max_retries is reached.

EOF is classified as a retriable backend error, but it does not belong to FailableClassifiers, so instance 0 is not marked as failed and remains eligible for the retry.
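
The interleaving described above can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: the `pool` type, its `next` method, and the `simulate` helper are hypothetical and are not gorouter's actual code. It only models a cursor (`nextIdx`) shared by all requests to the same route.

```go
package main

import "fmt"

// pool is a hypothetical, simplified stand-in for a shared endpoint pool.
// All requests to the same route share one cursor (nextIdx).
type pool struct {
	endpoints []string
	nextIdx   int
}

// next returns the endpoint under the shared cursor and advances the cursor.
// It knows nothing about which endpoints the calling request already tried.
func (p *pool) next() string {
	e := p.endpoints[p.nextIdx]
	p.nextIdx = (p.nextIdx + 1) % len(p.endpoints)
	return e
}

// simulate replays the order of events from the walkthrough:
// request A, then request B, then the retry of request A.
func simulate() []string {
	p := &pool{endpoints: []string{"instance_id=0", "instance_id=1"}}
	return []string{
		"request A attempt 1 -> " + p.next(),
		"request B attempt 1 -> " + p.next(),
		"request A attempt 2 -> " + p.next(),
	}
}

func main() {
	for _, line := range simulate() {
		fmt.Println(line)
	}
}
```

With two instances, any single request that slips in between an attempt and its retry moves the shared cursor back to the instance that just failed.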

Desired behavior

  • All requests should succeed, since the two app instances together can accept and process 2000 requests.
  • Gorouter's retry should not go to the same app instance twice.
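
One possible way to get this behavior is to let each request remember the endpoints it has already tried and skip them when the shared cursor lands on one. The sketch below is an illustration under stated assumptions, not a proposed patch: `pool`, `nextExcluding`, `roundTrip`, and `demo` are hypothetical names, and real gorouter code would also have to handle locking, failed-endpoint bookkeeping, and other balancing algorithms.

```go
package main

import (
	"errors"
	"fmt"
)

// pool is a hypothetical, simplified shared endpoint pool.
type pool struct {
	endpoints []string
	nextIdx   int
}

// nextExcluding advances the shared cursor, skipping endpoints that the
// calling request has already tried. It reports false if none are left.
func (p *pool) nextExcluding(tried map[string]bool) (string, bool) {
	for i := 0; i < len(p.endpoints); i++ {
		e := p.endpoints[p.nextIdx]
		p.nextIdx = (p.nextIdx + 1) % len(p.endpoints)
		if !tried[e] {
			return e, true
		}
	}
	return "", false
}

// roundTrip makes up to maxRetries attempts and never sends the same
// request to the same endpoint twice.
func roundTrip(p *pool, maxRetries int, send func(string) error) (string, error) {
	tried := make(map[string]bool)
	lastErr := errors.New("no endpoint available")
	for attempt := 1; attempt <= maxRetries; attempt++ {
		e, ok := p.nextExcluding(tried)
		if !ok {
			break
		}
		tried[e] = true
		if lastErr = send(e); lastErr == nil {
			return e, nil
		}
	}
	return "", lastErr
}

// demo replays the scenario from the issue: instance 0 answers with EOF,
// and a concurrent request moves the shared cursor before the retry.
func demo() (string, error) {
	p := &pool{endpoints: []string{"instance_id=0", "instance_id=1"}}
	send := func(endpoint string) error {
		if endpoint == "instance_id=0" {
			p.nextExcluding(nil) // a concurrent request takes the next slot
			return errors.New("EOF")
		}
		return nil
	}
	return roundTrip(p, 2, send)
}

func main() {
	fmt.Println(demo())
}
```

In this sketch the retry lands on instance_id=1 even though the shared cursor points back at instance_id=0, because the per-request `tried` set filters it out.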

Affected Version

routing-release: 0.301
