Egress HA assigns different egress ips to same egress interface on same egress node #6836

Open
rajnkamr opened this issue Nov 28, 2024 · 4 comments
Labels
area/transit/egress Issues or PRs related to Egress (SNAT for traffic egressing the cluster). kind/bug Categorizes issue or PR as related to a bug. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@rajnkamr
Contributor

Describe the bug

Egress HA assigns different Egress IPs to the same egress interface on the same Egress Node. This should be avoided unless the nodeSelector restricts selection to a single Egress Node.
NAME                 EGRESSIP      AGE   NODE
egress-prod-web      172.18.0.11   19h   bgp-worker2
egress-staging-web   172.18.0.12   19h   bgp-worker2

Egress Interface

14: antrea-egress0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether e2:12:06:f2:dc:2f brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.11/32 scope global antrea-egress0
       valid_lft forever preferred_lft forever
    inet 172.18.0.12/32 scope global antrea-egress0
       valid_lft forever preferred_lft forever

To Reproduce

Apply the configuration below.
1. externalippool.yaml

apiVersion: crd.antrea.io/v1beta1
kind: ExternalIPPool
metadata:
  name: external-ip-pool
spec:
  ipRanges:
  - start: 172.18.0.11  # 172.18.0.11-172.18.0.20 can be used as Egress IPs
    end: 172.18.0.20
  nodeSelector: {}     # All Nodes can be Egress Nodes

2. egress1.yaml

apiVersion: crd.antrea.io/v1beta1
kind: Egress
metadata:
  name: egress-prod-web
spec:
  appliedTo:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: prod
    podSelector:
      matchLabels:
        app: web
  externalIPPool: external-ip-pool

3. egress2.yaml

apiVersion: crd.antrea.io/v1beta1
kind: Egress
metadata:
  name: egress-staging-web
spec:
  appliedTo:
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: staging
    podSelector:
      matchLabels:
        app: web
  externalIPPool: external-ip-pool

Expected

Unless the nodeSelector matches a single Egress Node, Egress IPs should be assigned to interfaces on different Nodes, to avoid disrupting external traffic for all Egress workloads at once.

Actual behavior

Versions:

Additional context

@rajnkamr rajnkamr added kind/bug Categorizes issue or PR as related to a bug. area/transit/egress Issues or PRs related to Egress (SNAT for traffic egressing the cluster). labels Nov 28, 2024
@jainpulkit22
Contributor

jainpulkit22 commented Nov 29, 2024

There is already a concept of maxEgressIPs per Node, so as long as the number of Egress IPs on a Node is below that threshold, more IPs can be assigned to the egress interface on that Node without any traffic disruption. Node selection for an Egress is effectively random, and any Node that has not reached its maximum number of Egress IPs can be selected. I think this behaviour is correct, because we cannot restrict a Node to a single Egress IP.
So, IMO, this is expected and correct behaviour.
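The threshold argument above can be read as an eligibility filter: a Node stays a candidate until it holds the maximum number of Egress IPs. The following is a minimal sketch of that idea only. The helper `eligibleNodes`, its parameter `maxEgressIPsPerNode`, and the limit values are hypothetical and are not taken from Antrea's code.

```go
package main

import (
	"fmt"
	"sort"
)

// eligibleNodes returns, in sorted order, the nodes whose current Egress IP
// count is still below the per-node limit.
// Hypothetical helper for illustration; this is not Antrea's implementation.
func eligibleNodes(assigned map[string]int, maxEgressIPsPerNode int) []string {
	var out []string
	for node, n := range assigned {
		if n < maxEgressIPsPerNode {
			out = append(out, node)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Node names come from the thread; both reported Egress IPs sit on bgp-worker2.
	assigned := map[string]int{"bgp-worker": 0, "bgp-worker2": 2}

	// With a generous limit (assumed value), both nodes remain candidates.
	fmt.Println(eligibleNodes(assigned, 255))

	// With a limit of 2 (assumed value), bgp-worker2 is full and drops out.
	fmt.Println(eligibleNodes(assigned, 2))
}
```

Under this reading, two IPs on one Node is acceptable as long as the limit has not been reached; spreading IPs across Nodes is a separate concern that the limit alone does not enforce.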

cc: @antoninbas @tnqn

@antoninbas
Contributor

@rajnkamr EgressIP allocation is "random" (or more accurately, hash-based using a consistent hash map) among all eligible Nodes:

func (m *Map) GetWithFilters(key string, filters ...func(string) bool) string {

Looks like you just got "unlucky" and both IP addresses were allocated to the same Node. But if you only have 2 Nodes and 2 Egress IPs, then the probability of this allocation is 0.5.
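The 0.5 figure follows from a simple model, stated here as an assumption: each of the k Egress IPs lands independently and uniformly on one of n eligible Nodes. A hash-based allocation only approximates this, so the result is a rough estimate rather than an exact property of the allocator.

```latex
P(\text{all } k \text{ IPs on one Node})
  = n \cdot \left(\frac{1}{n}\right)^{k}
  = n^{\,1-k},
\qquad
n = 2,\ k = 2 \;\Rightarrow\; P = \tfrac{1}{2}.
```

Under the same assumption, 10 Egress IPs on 2 Nodes would all share one Node with probability 2^{-9} = 1/512, which is why a larger number of Egress resources gives a more meaningful picture.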

The expectation is that with a large number of EgressIPs, they will be evenly distributed over all Nodes. Maybe you can try with 10+ Egress resources instead?
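To make the hash-based selection concrete, here is a toy consistent-hash ring with a filtered lookup, loosely modelled on the `GetWithFilters` signature quoted above. It is a simplified sketch and not Antrea's implementation: the `ring` type, the FNV-1a hash, and the 50 virtual points per Node are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a toy consistent-hash ring (illustrative only, not Antrea's code).
type ring struct {
	points []uint32          // sorted hash points on the ring
	owner  map[uint32]string // hash point -> node name
}

// hashOf maps a string to a 32-bit position on the ring using FNV-1a.
func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places `replicas` virtual points per node on the ring.
func newRing(nodes []string, replicas int) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := hashOf(fmt.Sprintf("%s#%d", n, i))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare hash collision
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// getWithFilters walks clockwise from the key's hash and returns the first
// node accepted by every filter, or "" if no node qualifies.
func (r *ring) getWithFilters(key string, filters ...func(string) bool) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashOf(key)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	for i := 0; i < len(r.points); i++ {
		node := r.owner[r.points[(start+i)%len(r.points)]]
		accepted := true
		for _, f := range filters {
			if !f(node) {
				accepted = false
				break
			}
		}
		if accepted {
			return node
		}
	}
	return ""
}

func main() {
	r := newRing([]string{"bgp-worker", "bgp-worker2"}, 50)
	counts := map[string]int{}
	// Assign the ten pool addresses 172.18.0.11-172.18.0.20 and tally per node.
	for i := 0; i < 10; i++ {
		counts[r.getWithFilters(fmt.Sprintf("172.18.0.%d", 11+i))]++
	}
	fmt.Println(counts)
}
```

The lookup is deterministic for a given key and Node set, so the same Egress IP keeps mapping to the same Node until membership or eligibility changes. With only a handful of keys the split can be lopsided; it tends to even out only as the number of keys grows.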

@antoninbas antoninbas added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Dec 2, 2024
@rajnkamr
Contributor Author

rajnkamr commented Dec 17, 2024

Seven Egresses were created; the Pods are scheduled on bgp-worker and bgp-worker2.

prod                 busybox-pod3                                1/1     Running   45 (20m ago)   19d   10.244.1.3   bgp-worker          <none>           <none>
staging              busybox-pod8                                1/1     Running   45 (19m ago)   19d   10.244.2.5   bgp-worker2         <none>           <none>

The Egress IP is assigned randomly. Could the default be to prefer assigning the Egress IP to the Node where the Pod is running?
This can already be achieved by providing a nodeSelector, though.

NAME                  EGRESSIP      AGE     NODE
egress-prod-web       172.18.0.11   19d     bgp-worker2
egress-prod-web1      172.18.0.16   2m18s   bgp-control-plane
egress-prod-web2      172.18.0.17   2m18s   bgp-control-plane
egress-staging-web    172.18.0.12   19d     bgp-worker2
egress-staging-web1   172.18.0.13   92m     bgp-worker2
egress-staging-web2   172.18.0.14   92m     bgp-worker2
egress-staging-web3   172.18.0.15   92m     bgp-worker2

@tnqn
Member

tnqn commented Dec 17, 2024

"Egress ip is getting assigned randomly, could it be preferred to assign the egress ip on same node where pod is running( default case) ?"

This was considered. However, although it might appear "optimized" for a specific Pod at a particular moment, it does not provide obvious benefits from the perspective of the entire cluster or over an extended period. Pod distribution is random and can change over time. The best distribution at one moment may be the worst at another, and we can't just migrate the IPs to restore the best distribution, because that could break all established connections.
