Wrong private network interface fails validation during k0s upgrade #651

Open
waahhhh opened this issue Feb 5, 2024 · 4 comments
Labels
bug Something isn't working

waahhhh commented Feb 5, 2024

My nodes have only one private network interface (loopback excluded), so I did not set it explicitly during installation and let it be detected automatically.
Now I wanted to upgrade k0s to a newer version and got a validation error.

INFO ==> Running phase: Download k0s on hosts
INFO [OpenSSH] waahhhh-earth: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-mercury: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-earth: validating configuration
INFO ==> Apply failed

So I validated the configuration schema against both the current and the new k0s version; both report it as valid.
The log contains the following message:

time="04 Feb 24 23:49 CET" level=debug msg="[OpenSSH] waahhhh-earth: (stderr) Error: spec: api: address: Invalid value: \"waahhhh-earth\": invalid IP address"

The configuration section looks as follows and has not changed.

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: universe
spec:
  hosts:
    - role: controller+worker
      openSSH:
        address: waahhhh-earth
    - role: worker
      openSSH:
        address: waahhhh-mercury
  k0s:
    version: v1.29.1+k0s.1
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: k0s
      spec:
        api:
          externalAddress: XX.XX.XX.XX # valid IPv4
          k0sApiPort: 9443
          port: 6443
        installConfig: ...

At first I thought the problem was related to the openSSH definition, because the address (waahhhh-earth) is not set as a hostname or anywhere else in the configuration.

Only after reading the output several times did I realize that the wrong network interface was used.

INFO [OpenSSH] waahhhh-earth: discovered tun as private interface
INFO [OpenSSH] waahhhh-mercury: discovered tun as private interface

The interface (tun) was set up by the Kubernetes cluster, and k0sctl now treats it as the private interface.
Setting privateInterface explicitly on each host solved the problem:

hosts:
  - role: controller+worker
    privateInterface: eth0
    openSSH:
      address: waahhhh-earth
  - role: worker
    privateInterface: eth0
    openSSH:
      address: waahhhh-mercury

Even when you read the entire output, the cause is hard to spot at first.
Maybe this could be improved so that other users don't run into the same problem.

This is the complete output; the actual problem hides in it as an ordinary info message.

$ k0sctl apply --config resources/k0s/k0sctl.yaml

⠀⣿⣿⡇⠀⠀⢀⣴⣾⣿⠟⠁⢸⣿⣿⣿⣿⣿⣿⣿⡿⠛⠁⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀█████████ █████████ ███
⠀⣿⣿⡇⣠⣶⣿⡿⠋⠀⠀⠀⢸⣿⡇⠀⠀⠀⣠⠀⠀⢀⣠⡆⢸⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀███          ███    ███
⠀⣿⣿⣿⣿⣟⠋⠀⠀⠀⠀⠀⢸⣿⡇⠀⢰⣾⣿⠀⠀⣿⣿⡇⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀███          ███    ███
⠀⣿⣿⡏⠻⣿⣷⣤⡀⠀⠀⠀⠸⠛⠁⠀⠸⠋⠁⠀⠀⣿⣿⡇⠈⠉⠉⠉⠉⠉⠉⠉⠉⢹⣿⣿⠀███          ███    ███
⠀⣿⣿⡇⠀⠀⠙⢿⣿⣦⣀⠀⠀⠀⣠⣶⣶⣶⣶⣶⣶⣿⣿⡇⢰⣶⣶⣶⣶⣶⣶⣶⣶⣾⣿⣿⠀█████████    ███    ██████████
k0sctl v0.17.4 Copyright 2023, k0sctl authors.
Anonymized telemetry of usage will be sent to the authors.
By continuing to use k0sctl you agree to these terms:
https://k0sproject.io/licenses/eula
INFO ==> Running phase: Connect to hosts
INFO [OpenSSH] waahhhh-earth: connected
INFO [OpenSSH] waahhhh-mercury: connected
INFO ==> Running phase: Detect host operating systems
INFO [OpenSSH] waahhhh-mercury: is running Ubuntu 22.04.3 LTS
INFO [OpenSSH] waahhhh-earth: is running Ubuntu 22.04.3 LTS
INFO ==> Running phase: Acquire exclusive host lock
INFO ==> Running phase: Prepare hosts
INFO ==> Running phase: Gather host facts
INFO [OpenSSH] waahhhh-earth: using earth as hostname
INFO [OpenSSH] waahhhh-mercury: using mercury as hostname
INFO [OpenSSH] waahhhh-earth: discovered tun as private interface
INFO [OpenSSH] waahhhh-mercury: discovered tun as private interface
INFO ==> Running phase: Validate hosts
INFO ==> Running phase: Gather k0s facts
INFO [OpenSSH] waahhhh-earth: found existing configuration
INFO [OpenSSH] waahhhh-earth: is running k0s controller+worker version v1.28.4+k0s.0
WARN [OpenSSH] waahhhh-earth: the controller+worker node will not schedule regular workloads without toleration for node-role.kubernetes.io/master:NoSchedule unless 'noTaints: true' is set
WARN [OpenSSH] waahhhh-earth: k0s will be upgraded
INFO [OpenSSH] waahhhh-mercury: is running k0s worker version v1.28.4+k0s.0
WARN [OpenSSH] waahhhh-mercury: k0s will be upgraded
INFO [OpenSSH] waahhhh-earth: checking if worker mercury has joined
INFO ==> Running phase: Validate facts
INFO ==> Running phase: Download k0s on hosts
INFO [OpenSSH] waahhhh-earth: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-mercury: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-earth: validating configuration
INFO ==> Apply failed

The current network interfaces are:

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet XX.XX.XX.XX/22 brd XX.XX.XX.XX scope global eth0
       valid_lft forever preferred_lft forever
    inet6 XX:XX:XX:XX::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 XX::XX:XX:XX:XX/64 scope link
       valid_lft forever preferred_lft forever
3: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.1/24 brd 10.244.0.255 scope global kube-bridge
       valid_lft forever preferred_lft forever
    inet6 XX::XX:XX:XX:XX/64 scope link
       valid_lft forever preferred_lft forever
5: veth4c8aee89@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff link-netns cni-XX-XX-XX-XX-XX
    inet6 XX::XX:XX:XX:XX/64 scope link
       valid_lft forever preferred_lft forever
...
9: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
10: tun-852356528@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip XX.XX.XX.XX peer XX.XX.XX.XX
    inet6 XX::XX:XX:XX:XX/64 scope link
       valid_lft forever preferred_lft forever
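
One way to see why the discovered name is a problem is to compare it with the interface names in the listing above. The following is a minimal sketch, not part of k0sctl: `iface_names` and `iface_exists` are hypothetical helper names, and the sample text is limited to the header lines posted in this issue. On a live host the same functions could read the output of `ip a` directly, since indented lines are skipped.

```shell
#!/usr/bin/env bash
# Header lines copied from the `ip a` output posted in this issue.
ip_a_headers='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
3: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
5: veth4c8aee89@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
9: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
10: tun-852356528@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000'

# iface_names (hypothetical helper) prints one interface name per line,
# dropping any "@parent" suffix such as "@eth0" or "@NONE".
iface_names() {
  awk -F': ' '/^[0-9]+: / { sub(/@.*/, "", $2); print $2 }'
}

# iface_exists (hypothetical helper) succeeds only if the exact name $1
# appears in the listing read from stdin.
iface_exists() {
  iface_names | grep -qxF -- "$1"
}

# Compare the name k0sctl reported ("tun") with the one set manually ("eth0").
for name in tun eth0; do
  if printf '%s\n' "$ip_a_headers" | iface_exists "$name"; then
    echo "$name: present"
  else
    echo "$name: missing"
  fi
done
# prints:
# tun: missing
# eth0: present
```

With the listing above, no interface is called exactly `tun`; the closest matches are `tunl0` and `tun-852356528`, which suggests the reported name is a truncated form of the latter.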

kke added the bug label Feb 6, 2024

kke (Contributor) commented Feb 6, 2024

(ip route list scope global | grep -E "\b(172|10|192\.168)\.") || (ip route list | grep -m1 default)

This is the command k0sctl uses to detect the private interface. What does the output of that look like?

waahhhh (Author) commented Feb 7, 2024

$ echo $((ip route list scope global | grep -E "\b(172|10|192\.168)\.") || (ip route list | grep -m1 default))
10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX
$ echo $(ip route list scope global | grep -E "\b(172|10|192\.168)\.")
10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX
$ echo $(ip route list | grep -m1 default)
default via 89.XX.XX.XX dev eth0 proto static

kke (Contributor) commented Feb 12, 2024

So I think you don't have a private interface, and it picks up eth0 as the fallback, but only on the first round 🤔
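
The two rounds can be replayed on the route lines posted in this thread. This is a sketch under stated assumptions, not k0sctl's actual code: `pick_route` and `dev_name` are hypothetical names, the first round is assumed to have had no pod route yet, and the capture in `dev_name` assumes the device name is read with a word-character pattern. That last assumption is only inferred from the log reporting `tun` rather than `tun-852356528`.

```shell
#!/usr/bin/env bash
# Route lines copied from the outputs posted in this thread
# (addresses were redacted by the author).
pod_route='10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX'
default_route='default via 89.XX.XX.XX dev eth0 proto static'

# pick_route replays the "A || B" shape of the detection command on canned
# text: $1 stands in for `ip route list scope global`, $2 for `ip route list`.
pick_route() {
  printf '%s\n' "$1" | grep -E '\b(172|10|192\.168)\.' ||
    printf '%s\n' "$2" | grep -m1 default
}

# dev_name is HYPOTHETICAL: it assumes the name after "dev" is captured with
# a word-character pattern, so the capture stops at the first hyphen.
dev_name() {
  sed -nE 's/.* dev ([[:alnum:]_]+).*/\1/p' | head -n1
}

# Assumed first round: no pod route exists yet, so the default route decides.
pick_route '' "$default_route" | dev_name                # prints: eth0

# Upgrade round: the pod route matches the private-range pattern first.
pick_route "$pod_route" "$default_route" | dev_name      # prints: tun
```

Under these assumptions the pod network route (10.244.1.0/24) wins as soon as it exists, even though it belongs to the cluster's own tunnel. Setting privateInterface on each host, as in the adjustment posted earlier, bypasses the detection entirely.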

d3adb5 commented Mar 6, 2024

Facing the same issue. Is there a workaround until it is fixed?

EDIT: The workaround for me was to use IP addresses instead of hostnames in the list of hosts. Not ideal, since I'd rather use domain names, but it works.
