Calico-VPP pod claims IPv6 address of node but uses IPv4 address instead #651

Open
nesselzzz opened this issue Nov 6, 2023 · 1 comment

Comments

@nesselzzz

Environment

  • Calico/VPP version: 3.26.0
  • Kubernetes version: 1.28.3
  • Deployment type: On-prem VM
  • Network configuration: Calico / want to do SRv6 but haven't gotten there yet
  • Containerd version: 1.7.8

Issue description
I'm setting up an IPv6 cluster. Each node in the cluster has two interfaces within ESXi: one is an IPv4 interface for OOBM, and the other is the main interface for Kubernetes and the uplink interface for VPP. Whenever I run "kubectl create -f calico-vpp.yaml", my node loses its IPv6 address (as the documentation states). If I understand the documentation correctly, this handover should be hitless, but afterwards nothing that tries to reach that IP gets a response. As a result, all kubectl commands stop working, since the API server was using that address.

I used nerdctl to exec into the container, and "ip a" there shows no IPv6 address on the uplink interface I configured, only a link-local one. Surprisingly, the IPv4 interface and its address are listed in the container, and the node has not lost that IPv4 address at all.

Is this a bug or am I doing something wrong?
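For anyone reproducing this, the check described above (does the uplink still carry a non-link-local IPv6 address?) can be scripted. The following is a minimal sketch that assumes the one-line output format of `ip -o -6 addr show`; the sample text and the 2001:db8:: addresses are hypothetical documentation values, not output captured from this cluster.

```python
import ipaddress
from typing import List


def global_ipv6_addresses(ip_output: str, ifname: str) -> List[str]:
    """Return the non-link-local IPv6 addresses listed for ifname.

    Expects text in the one-line format of `ip -o -6 addr show`:
    "<index>: <ifname> inet6 <addr>/<prefixlen> scope <scope> ..."
    """
    found = []
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != ifname or fields[2] != "inet6":
            continue
        iface = ipaddress.ip_interface(fields[3])
        if not iface.ip.is_link_local:
            found.append(str(iface))
    return found


# Hypothetical samples: the uplink before and after VPP takes it over.
BEFORE = """\
2: ens192    inet6 2001:db8::10/64 scope global
2: ens192    inet6 fe80::250:56ff:fe00:1/64 scope link
"""
AFTER = """\
2: ens192    inet6 fe80::250:56ff:fe00:1/64 scope link
"""

print(global_ipv6_addresses(BEFORE, "ens192"))
print(global_ipv6_addresses(AFTER, "ens192"))
```

An empty list for the uplink matches the symptom reported here: only a link-local address remains on the interface.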

To Reproduce
Steps to reproduce the behavior:

  • Init kubernetes using kubeadm yaml file below:
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "::"
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  kubeletExtraArgs:
    node-ip: "{{ ipv6_node_ip}}"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "{{ ipv6_node_ip}}"
apiServer:
  extraArgs:
    oidc-issuer-url: https://werwerr.me
    oidc-client-id: ASF4Os1wJysH6uWvJV9PvyNiph4y4O84tGCHj1FZEE8
networking:
  serviceSubnet: "{{ ipv6_services_subnet }}/108"
  podSubnet: "{{ ipv6_pod_subnet }}/64"
---
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    linuxDataplane: VPP
    nodeAddressAutodetectionV4: {}
    nodeAddressAutodetectionV6:
      interface: {{ uplink interface }}

---
apiVersion: operator.tigera.io/v1
kind: APIServer 
metadata: 
  name: default 
spec: {}
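Because the subnet values in this config are rendered from templates, it may be worth sanity-checking the rendered strings before running kubeadm. The sketch below uses Python's standard `ipaddress` module; the `check_subnet` helper and the 2001:db8:: prefixes are hypothetical stand-ins for the real template variables. One thing it catches is a dropped `/` before the prefix length, which still parses as a single address (a /128) rather than raising an error.

```python
import ipaddress


def check_subnet(rendered: str, expected_prefixlen: int) -> bool:
    """Return True if rendered is an IPv6 network with the expected prefix length."""
    try:
        net = ipaddress.ip_network(rendered, strict=True)
    except ValueError:
        return False
    return net.version == 6 and net.prefixlen == expected_prefixlen


# Hypothetical stand-ins for the rendered template variables.
pod_prefix = "2001:db8:42::"
service_prefix = "2001:db8:1::"

print(check_subnet(service_prefix + "/108", 108))  # well-formed service subnet
print(check_subnet(pod_prefix + "/64", 64))        # well-formed pod subnet
print(check_subnet(pod_prefix + "64", 64))         # missing "/": parsed as a /128
```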

Expected behavior
The calico-vpp pod is created successfully, and the node keeps its IPv6 connectivity.

@nesselzzz
Author

I did a little more troubleshooting. Even before applying calico-vpp.yaml, when I apply only the base calico.yaml to get Calico instantiated, it crashes if I specify linuxDataplane as VPP. The logs show the following errors:

2023-11-07 13:17:43.430 [INFO][18] tunnel-ip-allocator/param_types.go 291: Looking for executable on path name="/usr/local/bin/felix-plugins/felix-api-proxy"
2023-11-07 13:17:43.431 [WARNING][18] tunnel-ip-allocator/param_types.go 295: Path lookup failed error=exec: "/usr/local/bin/felix-plugins/felix-api-proxy": stat /usr/local/bin/felix-plugins/felix-api-proxy: no such file or directory name="/usr/local/bin/felix-plugins/felix-api-proxy"
2023-11-07 13:17:43.431 [ERROR][18] tunnel-ip-allocator/config_params.go 636: Invalid (required) config value. error=Failed to parse config parameter DataplaneDriver; value "/usr/local/bin/felix-plugins/felix-api-proxy": missing file source=environment variable
2023-11-07 13:17:43.431 [PANIC][18] tunnel-ip-allocator/allocateip.go 836: Failed to parse Felix environments error=Failed to parse config parameter DataplaneDriver; value "/usr/local/bin/felix-plugins/felix-api-proxy": missing file
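When triaging logs in this format, it can help to pull out only the severe entries. The sketch below assumes the line layout seen above (timestamp, `[LEVEL][number]`, source file and line, then the message); the regular expression and the `severe_lines` helper are illustrative and not part of Calico.

```python
import re

# Layout assumed from the lines above:
# "<date> <time> [LEVEL][number] <component/file.go> <line>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<level>[A-Z]+)\]\[(?P<num>\d+)\] "
    r"(?P<source>\S+ \d+): (?P<msg>.*)$"
)


def severe_lines(log_text, levels=("ERROR", "PANIC", "FATAL")):
    """Return (level, source, message) for each line at one of the given levels."""
    out = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("level") in levels:
            out.append((m.group("level"), m.group("source"), m.group("msg")))
    return out


# Two of the lines quoted in this issue.
SAMPLE = '''\
2023-11-07 13:17:43.430 [INFO][18] tunnel-ip-allocator/param_types.go 291: Looking for executable on path name="/usr/local/bin/felix-plugins/felix-api-proxy"
2023-11-07 13:17:43.431 [PANIC][18] tunnel-ip-allocator/allocateip.go 836: Failed to parse Felix environments error=Failed to parse config parameter DataplaneDriver; value "/usr/local/bin/felix-plugins/felix-api-proxy": missing file
'''

for level, source, msg in severe_lines(SAMPLE):
    print(level, source, msg)
```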

As for applying calico-vpp.yaml, I managed to check the logs before kubectl lost connectivity to the API. The logs are below:

time="2023-11-07T13:51:52Z" level=info msg="Version info\nImage tag                   : 20c50cfd71e32ab9c15d4632e2b4a9659993148d\nVPP-dataplane version       : 20c50cf yaml: build yamls to add bgpfilters\nVPP Version                 : 23.10-rc0~6-g892b7bce0\nBinapi-generator version    : v0.8.0-dev\nVPP Base commit             : 03304d1c6 gerrit:34726/3 interface: add buffer stats api\n------------------ Cherry picked commits --------------------\ninterface: Fix interface.api endianness\ncapo: Calico Policies plugin\nacl: acl-plugin custom policies\ncnat: [WIP] no k8s maglev from pods\npbl: Port based balancer\ngerrit:34726/3 interface: add buffer stats api\n-------------------------------------------------------------\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_INTERFACES={\n  \"defaultPodIfSpec\": {\n    \"rx\": 1,\n    \"tx\": 1,\n    \"rxqsz\": 0,\n    \"txqsz\": 0,\n    \"isl3\": true,\n    \"rxMode\": 0\n  },\n  \"maxPodIfSpec\": {\n    \"rx\": 10,\n    \"tx\": 10,\n    \"rxqsz\": 1024,\n    \"txqsz\": 1024,\n    \"isl3\": null,\n    \"rxMode\": 0\n  },\n  \"vppHostTapSpec\": {\n    \"rx\": 1,\n    \"tx\": 1,\n    \"rxqsz\": 1024,\n    \"txqsz\": 1024,\n    \"isl3\": false,\n    \"rxMode\": 0\n  },\n  \"uplinkInterfaces\": [\n    {\n      \"rx\": 0,\n      \"tx\": 0,\n      \"rxqsz\": 0,\n      \"txqsz\": 0,\n      \"isl3\": null,\n      \"rxMode\": 0,\n      \"physicalNetworkName\": \"\",\n      \"interfaceName\": \"ens192\",\n      \"vppDriver\": \"af_packet\",\n      \"newDriver\": \"\",\n      \"annotations\": null,\n      \"mtu\": 0\n    }\n  ]\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_GRACEFUL_SHUTDOWN_TIMEOUT=10s"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_SWAP_DRIVER="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_CONFIG_EXEC_TEMPLATE="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_VPP_RUNNING=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_LOG_LEVEL=info"
time="2023-11-07T13:51:52Z" level=info msg="Config:SERVICE_PREFIX=[2600:1700:3960:c71f:1::/108]"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_INITIAL_CONFIG={\n  \"vppStartupSleepSeconds\": 1,\n  \"corePattern\": \"/var/lib/vpp/vppcore.%e.%p\",\n  \"extraAddrCount\": 0,\n  \"ifConfigSavePath\": \"\",\n  \"defaultGWs\": \"\"\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_LOG_FORMAT="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_NATIVE_DRIVER="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_VPP_DONE_OK=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:NODENAME=kube-master1"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_INIT_SCRIPT_TEMPLATE="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_BEFORE_IF_READ=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_BGP_LOG_LEVEL=INFO"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_IPSEC_IKEV2_PSK="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_DEBUG={}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_FEATURE_GATES={}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_IPSEC={\n  \"nbAsyncCryptoThreads\": 0,\n  \"extraAddresses\": 0\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_SRV6={\n  \"localsidPool\": \"\",\n  \"policyPool\": \"\"\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_INTERFACE="
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_CONFIG_TEMPLATE=unix {\n  nodaemon\n  full-coredump\n  cli-listen /var/run/vpp/cli.sock\n  pidfile /run/vpp/vpp.pid\n  exec /etc/vpp/startup.exec\n}\napi-trace { on }\ncpu {\n    workers 0\n}\nsocksvr {\n    socket-name /var/run/vpp/vpp-api.sock\n}\nplugins {\n    plugin default { enable }\n    plugin dpdk_plugin.so { disable }\n    plugin calico_plugin.so { enable }\n    plugin ping_plugin.so { disable }\n    plugin dispatch_trace_plugin.so { enable }\n}\nbuffers {\n  buffers-per-numa 131072\n}"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_BEFORE_VPP_RUN=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Config:CALICOVPP_HOOK_VPP_ERRORED=#!/bin/sh\n\nHOOK=\"$0\"\nchroot /host /bin/sh <<EOSCRIPT\n\nfix_dns () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; fixing dns...\"\n        sed -i \"s/\\[main\\]/\\[main\\]\\ndns=none/\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nundo_dns_fix () {\n    if systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; undoing dns fix...\"\n        sed -i \"0,/dns=none/{/dns=none/d;}\" /etc/NetworkManager/NetworkManager.conf\n        systemctl daemon-reload\n        systemctl restart NetworkManager\n    fi\n}\n\nrestart_network () {\n    if systemctl status systemd-networkd > /dev/null 2>&1; then\n        echo \"default_hook: system is using systemd-networkd; restarting...\"\n        systemctl restart systemd-networkd\n    elif systemctl status NetworkManager > /dev/null 2>&1; then\n        echo \"default_hook: system is using NetworkManager; restarting...\"\n        systemctl restart NetworkManager\n    elif systemctl status networking > /dev/null 2>&1; then\n        echo \"default_hook: system is using networking service; restarting...\"\n        systemctl restart networking\n    elif systemctl status network > /dev/null 2>&1; then\n        echo \"default_hook: system is using network service; restarting...\"\n        systemctl restart network\n    else\n        echo \"default_hook: Networking backend not detected, network configuration may fail\"\n    fi\n}\n\nif which systemctl > /dev/null; then\n    echo \"default_hook: using systemctl...\"\nelse\n    echo \"default_hook: Init system not supported, network configuration may fail\"\n    exit 1\nfi\n\nif [ \"$HOOK\" = \"BEFORE_VPP_RUN\" ]; then\n    fix_dns\nelif [ \"$HOOK\" = \"VPP_RUNNING\" ]; then\n    
restart_network\nelif [ \"$HOOK\" = \"VPP_DONE_OK\" ]; then\n    undo_dns_fix\n    restart_network\nelif [ \"$HOOK\" = \"VPP_ERRORED\" ]; then\n    undo_dns_fix\n    restart_network\nfi\n\nEOSCRIPT\n"
time="2023-11-07T13:51:52Z" level=info msg="Waiting for VPP... [0/10]" component=vpp-api
time="2023-11-07T13:51:54Z" level=info msg="Waiting for VPP... [1/10]" component=vpp-api
time="2023-11-07T13:51:56Z" level=info msg="Waiting for VPP... [2/10]" component=vpp-api
time="2023-11-07T13:51:58Z" level=info msg="Waiting for VPP... [3/10]" component=vpp-api
time="2023-11-07T13:52:00Z" level=info msg="Waiting for VPP... [4/10]" component=vpp-api
time="2023-11-07T13:52:02Z" level=warning msg="Waiting for VPP... [5/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:04Z" level=warning msg="Waiting for VPP... [6/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:06Z" level=warning msg="Waiting for VPP... [7/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:08Z" level=warning msg="Waiting for VPP... [8/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:10Z" level=warning msg="Waiting for VPP... [9/10] cannot connect to VPP on socket /var/run/vpp/vpp-api.sock: VPP API socket file /var/run/vpp/vpp-api.sock does not exist" component=vpp-api
time="2023-11-07T13:52:12Z" level=fatal msg="Cannot create VPP client: Cannot connect to VPP after 10 tries"
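The tail of this log shows a bounded retry: ten attempts, two seconds apart, then a fatal error because the API socket file never appeared. The sketch below reproduces that pattern for illustration only. It is not the agent's actual implementation, it only checks that the file exists rather than connecting to it, and the `wait_for_socket` name and its parameters are hypothetical.

```python
import os
import tempfile
import time


def wait_for_socket(path, tries=10, delay=2.0, sleep=time.sleep):
    """Poll for path up to `tries` times; return True as soon as it exists."""
    for attempt in range(tries):
        if os.path.exists(path):
            return True
        print(f"Waiting for VPP... [{attempt}/{tries}] {path} does not exist")
        sleep(delay)
    return False


# Quick demonstration with a path that should not exist and no real waiting.
missing = os.path.join(tempfile.gettempdir(), "no-such-vpp-api.sock")
if not wait_for_socket(missing, tries=3, delay=0):
    print("Cannot connect to VPP after 3 tries")
```

On a node, the same helper could be pointed at /var/run/vpp/vpp-api.sock with the defaults. If it returns False, VPP itself most likely never started, so the VPP container's own logs are the next place to look.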

Any help would be greatly appreciated.
