Can not start kubelet due to bind mount failure 'jailing process inside rootfs caused \\\"pivot_root invalid argument\\\"\"": unknown' #1186

Open
rajha-korithrien opened this issue Mar 8, 2019 · 1 comment

rajha-korithrien commented Mar 8, 2019

RKE version:
rke version v0.1.15
Docker version: (docker version, docker info preferred)

Containers: 24
 Running: 21
 Paused: 0
 Stopped: 3
Images: 10
Server Version: 18.03.1-ce
Storage Driver: zfs
 Zpool: docker-zpool
 Zpool Health: ONLINE
 Parent Dataset: docker-zpool/docker-containers
 Space Used By Parent: 27136
 Space Available: 107374155264
 Parent Quota: 107374182400
 Compression: lz4
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.14.73-rancher
Operating System: RancherOS v1.4.2
OSType: linux
Architecture: x86_64
CPUs: 40
Total Memory: 503.9GiB
Name: argus-a-p.argus.array
ID: TWZY:AHBZ:4OYC:NJAO:BYCH:H7BG:OERD:QIZP:DZ42:YCLC:FOYD:M4KT
Docker Root Dir: /mnt/docker-zpool/docker-containers
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)
4.14.73-rancher
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Bare-metal
cluster.yml file:

cluster_name: rancher
ignore_docker_version: true

nodes:
  - address: 192.168.20.20
    internal_address: 192.168.21.20
    user: rancher
    role: [controlplane,worker,etcd]
    ssh_key_path: ~/.ssh/id_rsa
  - address: 192.168.20.21
    internal_address: 192.168.21.21
    user: rancher
    role: [controlplane,worker,etcd]
    ssh_key_path: ~/.ssh/id_rsa
  - address: 192.168.20.22
    internal_address: 192.168.21.22
    user: rancher
    role: [controlplane,worker,etcd]
    ssh_key_path: ~/.ssh/id_rsa

services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
  kubelet:
    extra_binds:
       - /mnt:/mnt:rshared

Steps to Reproduce:
rke up --ssh-agent-auth --config ./cluster.yml
Results:

INFO[0000] Building Kubernetes cluster                  
INFO[0000] [dialer] Setup tunnel for host [192.168.20.20] 
WARN[0000] Unsupported Docker version found [18.03.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x] 
INFO[0000] [dialer] Setup tunnel for host [192.168.20.21] 
WARN[0001] Unsupported Docker version found [18.03.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x] 
INFO[0001] [dialer] Setup tunnel for host [192.168.20.22] 
WARN[0002] Unsupported Docker version found [18.03.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x] 
INFO[0002] [state] Found local kube config file, trying to get state from cluster 
INFO[0002] [state] Fetching cluster state from Kubernetes 
INFO[0002] [state] Successfully Fetched cluster state to Kubernetes ConfigMap: cluster-state 
INFO[0002] [certificates] Getting Cluster certificates from Kubernetes 
INFO[0002] [certificates] Successfully fetched Cluster certificates from Kubernetes 
INFO[0002] [network] No hosts added existing cluster, skipping port check 
INFO[0002] [reconcile] Reconciling cluster state        
INFO[0002] [reconcile] Check etcd hosts to be deleted   
INFO[0002] [reconcile] Check etcd hosts to be added     
INFO[0002] [reconcile] Rebuilding and updating local kube config 
INFO[0002] Successfully Deployed local admin kubeconfig at [./kube_config_rancher-cluster.yml] 
INFO[0002] [reconcile] host [192.168.20.20] is active master on the cluster 
INFO[0002] [reconcile] Reconciled cluster state successfully 
INFO[0002] [certificates] Deploying kubernetes certificates to Cluster nodes 
INFO[0016] Successfully Deployed local admin kubeconfig at [./kube_config_rancher-cluster.yml] 
INFO[0016] [certificates] Successfully deployed kubernetes certificates to Cluster nodes 
INFO[0016] Pre-pulling kubernetes images                
INFO[0016] Kubernetes images pulled successfully        
INFO[0016] [etcd] Building up etcd plane..              
INFO[0016] [etcd] Saving snapshot [etcd-rolling-snapshots] on host [192.168.20.20] 
INFO[0027] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.20.20] 
INFO[0028] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.20.20] 
INFO[0036] [etcd] Successfully started [rke-log-linker] container on host [192.168.20.20] 
INFO[0038] [remove/rke-log-linker] Successfully removed container on host [192.168.20.20] 
INFO[0038] [etcd] Saving snapshot [etcd-rolling-snapshots] on host [192.168.20.21] 
INFO[0050] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.20.21] 
INFO[0050] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.20.21] 
INFO[0058] [etcd] Successfully started [rke-log-linker] container on host [192.168.20.21] 
INFO[0059] [remove/rke-log-linker] Successfully removed container on host [192.168.20.21] 
INFO[0059] [etcd] Saving snapshot [etcd-rolling-snapshots] on host [192.168.20.22] 
INFO[0071] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.20.22] 
INFO[0072] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.20.22] 
INFO[0080] [etcd] Successfully started [rke-log-linker] container on host [192.168.20.22] 
INFO[0081] [remove/rke-log-linker] Successfully removed container on host [192.168.20.22] 
INFO[0081] [etcd] Successfully started etcd plane..     
INFO[0081] [controlplane] Building up Controller Plane.. 
INFO[0083] [remove/service-sidekick] Successfully removed container on host [192.168.20.21] 
INFO[0083] [remove/service-sidekick] Successfully removed container on host [192.168.20.20] 
INFO[0083] [remove/service-sidekick] Successfully removed container on host [192.168.20.22] 
INFO[0088] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.20.21] 
INFO[0089] [healthcheck] service [kube-apiserver] on host [192.168.20.21] is healthy 
INFO[0089] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.20.20] 
INFO[0089] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.20.22] 
INFO[0089] [healthcheck] service [kube-apiserver] on host [192.168.20.20] is healthy 
INFO[0089] [healthcheck] service [kube-apiserver] on host [192.168.20.22] is healthy 
INFO[0095] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.21] 
INFO[0096] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.22] 
INFO[0096] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.20] 
INFO[0097] [remove/rke-log-linker] Successfully removed container on host [192.168.20.21] 
INFO[0097] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.20.21] 
INFO[0097] [healthcheck] service [kube-controller-manager] on host [192.168.20.21] is healthy 
INFO[0097] [remove/rke-log-linker] Successfully removed container on host [192.168.20.22] 
INFO[0097] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.20.22] 
INFO[0097] [healthcheck] service [kube-controller-manager] on host [192.168.20.22] is healthy 
INFO[0097] [remove/rke-log-linker] Successfully removed container on host [192.168.20.20] 
INFO[0098] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.20.20] 
INFO[0098] [healthcheck] service [kube-controller-manager] on host [192.168.20.20] is healthy 
INFO[0103] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.21] 
INFO[0104] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.22] 
INFO[0104] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.20] 
INFO[0105] [remove/rke-log-linker] Successfully removed container on host [192.168.20.21] 
INFO[0105] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.20.21] 
INFO[0105] [healthcheck] service [kube-scheduler] on host [192.168.20.21] is healthy 
INFO[0106] [remove/rke-log-linker] Successfully removed container on host [192.168.20.22] 
INFO[0106] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.20.22] 
INFO[0106] [healthcheck] service [kube-scheduler] on host [192.168.20.22] is healthy 
INFO[0106] [remove/rke-log-linker] Successfully removed container on host [192.168.20.20] 
INFO[0106] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.20.20] 
INFO[0106] [healthcheck] service [kube-scheduler] on host [192.168.20.20] is healthy 
INFO[0111] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.21] 
INFO[0113] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.22] 
INFO[0113] [remove/rke-log-linker] Successfully removed container on host [192.168.20.21] 
INFO[0113] [controlplane] Successfully started [rke-log-linker] container on host [192.168.20.20] 
INFO[0114] [remove/rke-log-linker] Successfully removed container on host [192.168.20.22] 
INFO[0115] [remove/rke-log-linker] Successfully removed container on host [192.168.20.20] 
INFO[0115] [controlplane] Successfully started Controller Plane.. 
INFO[0115] [authz] Creating rke-job-deployer ServiceAccount 
INFO[0115] [authz] rke-job-deployer ServiceAccount created successfully 
INFO[0115] [authz] Creating system:node ClusterRoleBinding 
INFO[0115] [authz] system:node ClusterRoleBinding created successfully 
INFO[0115] [certificates] Save kubernetes certificates as secrets 
INFO[0118] [certificates] Successfully saved certificates as kubernetes secret [k8s-certs] 
INFO[0118] [state] Saving cluster state to Kubernetes   
INFO[0118] [state] Successfully Saved cluster state to Kubernetes ConfigMap: cluster-state 
INFO[0118] [state] Saving cluster state to cluster nodes 
INFO[0125] [state] Successfully started [cluster-state-deployer] container on host [192.168.20.20] 
INFO[0127] [remove/cluster-state-deployer] Successfully removed container on host [192.168.20.20] 
INFO[0133] [state] Successfully started [cluster-state-deployer] container on host [192.168.20.21] 
INFO[0135] [remove/cluster-state-deployer] Successfully removed container on host [192.168.20.21] 
INFO[0141] [state] Successfully started [cluster-state-deployer] container on host [192.168.20.22] 
INFO[0143] [remove/cluster-state-deployer] Successfully removed container on host [192.168.20.22] 
INFO[0143] [worker] Building up Worker Plane..          
INFO[0144] [remove/service-sidekick] Successfully removed container on host [192.168.20.21] 
INFO[0144] [remove/service-sidekick] Successfully removed container on host [192.168.20.20] 
INFO[0144] [remove/service-sidekick] Successfully removed container on host [192.168.20.22] 
FATA[0157] [workerPlane] Failed to bring up Worker Plane: Failed to start [kubelet] container on host [192.168.20.21]: Failed to start [kubelet] container on host [192.168.20.21]: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:109: jailing process inside rootfs caused \\\"pivot_root invalid argument\\\"\"": unknown

The specific issue is the last FATA error:
FATA[0157] [workerPlane] Failed to bring up Worker Plane: Failed to start [kubelet] container on host [192.168.20.21]: Failed to start [kubelet] container on host [192.168.20.21]: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:109: jailing process inside rootfs caused \\\"pivot_root invalid argument\\\"\"": unknown
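For what it's worth, this can probably be reproduced without RKE at all. A minimal sketch, assuming the Docker root is still under /mnt as shown in docker info above (I have not isolated it this way myself):

docker run --rm -v /mnt:/mnt:rshared busybox true

Since the container rootfs itself lives under /mnt on the ZFS dataset, recursively bind-mounting /mnt with shared propagation plausibly leaves the new root on a shared mount, and pivot_root(2) returns EINVAL when the mount point at the new root has propagation type MS_SHARED, which would explain the "invalid argument" above.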

I can get the cluster to come up without error if I delete the kubelet container on each node
docker rm kubelet
and then remove the bind mount

  kubelet:
    extra_binds:
       - /mnt:/mnt:rshared

from cluster.yml and re-run the rke up command; everything then works as expected.
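Put together, the recovery sequence looks like this (sketch; run the docker rm step on each of the three nodes from cluster.yml):

# on each node: remove the kubelet container that failed to start
docker rm kubelet

# from the workstation, after deleting the extra_binds entry from cluster.yml:
rke up --ssh-agent-auth --config ./cluster.yml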

I need to be able to use local volumes, so the kubelet needs to bind mount the host location as described in #500. However, when I do what #500 suggests, I get the error reported in #316, even though I don't have an unattended AMI update (at least I have not done anything to create one).

Further digging with docker inspect kubelet shows the following binds on a working configuration:

"HostConfig": {
            "Binds": [
                "/opt/rke/etc/kubernetes:/etc/kubernetes:z",
                "/etc/cni:/etc/cni:rw,z",
                "/opt/cni:/opt/cni:rw,z",
                "/opt/rke/var/lib/cni:/var/lib/cni:z",
                "/var/lib/calico:/var/lib/calico:z",
                "/etc/resolv.conf:/etc/resolv.conf",
                "/sys:/sys:rprivate",
                "/mnt/docker-zpool/docker-containers:/mnt/docker-zpool/docker-containers:rw,rslave,z",
                "/opt/rke/var/lib/kubelet:/opt/rke/var/lib/kubelet:shared,z",
                "/var/lib/rancher:/var/lib/rancher:shared,z",
                "/var/run:/var/run:rw,rprivate",
                "/run:/run:rprivate",
                "/opt/rke/etc/ceph:/etc/ceph",
                "/dev:/host/dev:rprivate",
                "/var/log/containers:/var/log/containers:z",
                "/var/log/pods:/var/log/pods:z",
                "/usr:/host/usr:ro",
                "/etc:/host/etc:ro",
                "/var/lib/kubelet/volumeplugins:/var/lib/kubelet/volumeplugins:shared,z"
            ],

I suspect what is happening is that the /mnt:/mnt:rshared bind fails because RKE has already set up a bind for the Docker root, /mnt/docker-zpool/docker-containers, which lives underneath /mnt.
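The mount propagation on the host can be checked directly, which should make the overlap visible. A sketch (findmnt is standard util-linux; I have not verified it ships on RancherOS):

findmnt -o TARGET,PROPAGATION /mnt
findmnt -o TARGET,PROPAGATION /mnt/docker-zpool/docker-containers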

rajha-korithrien (Author) commented

I can confirm that the problem is caused by using a ZFS-backed Docker whose graph directory is mounted under /mnt, set up by following the instructions at https://rancher.com/docs/os/v1.2/en/storage/using-zfs/

Specifically:

$ sudo ros config set rancher.docker.storage_driver 'zfs'
$ sudo ros config set rancher.docker.graph /mnt/zpool1/docker
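After those two settings take effect (a restart of Docker may be needed, which I have not verified), the driver choice can be sanity-checked; the expected line matches the docker info output above:

$ docker info | grep -i 'storage driver'
Storage Driver: zfs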

Rancher ensures that the kubelet gets /mnt/zpool1/docker (in my case /mnt/docker-zpool/docker-containers) as a bind mount. This conflicts with adding

  kubelet:
    extra_binds:
       - /mnt:/mnt:rshared

The system works as expected if I manually specify the subdirectories in /mnt that I want to use, for example:

  kubelet:
    extra_binds:
       - /mnt/kubernetes-a:/mnt/kubernetes-a:rshared
       - /mnt/kubernetes-b:/mnt/kubernetes-b:rshared
       - /mnt/kubernetes-mirror:/mnt/kubernetes-mirror:rshared
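With those binds in place, rke up completes, and the new entries should show up alongside the Docker-root bind that RKE adds on its own; a quick check, reusing the inspect one-liner from above:

docker inspect kubelet --format '{{range .HostConfig.Binds}}{{println .}}{{end}}' | grep '^/mnt'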

deniseschannon added this to the Backlog milestone Apr 8, 2019
superseb removed this from the RKE - Unscheduled milestone Jul 11, 2021