
rke etcd snapshot-restore fail on "no such file or directory" when snapshot dir is a symlink #1280

Open
Amos-85 opened this issue Apr 10, 2019 · 5 comments

Comments

@Amos-85

Amos-85 commented Apr 10, 2019

RKE version: 0.2.1

Docker version:
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:48:22 2018
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 18.09.3
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 774a1f4
Built: Thu Feb 28 06:02:24 2019
OS/Arch: linux/amd64
Experimental: false

Operating system and kernel: RHEL 7.6, kernel 3.10.0-957.5.1.el7.x86_64

Type/provider of hosts: Bare-metal / Dell R730 XD

cluster.yml file:

nodes:
  - address: example1.example.com
    user: user # root user (usually 'root')
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ./id_rsa.pem # path to PEM file
  - address: example2.example.com
    user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - address: example3.example.com
    user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - address: example4.example.com
    user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - address: example5.example.com
    user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - address: example6.example.com
    user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem
  - address: example7.example.com
    user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem
  - address: example8.example.com
    user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem

ignore_docker_version: true

private_registries:
  - url: myregistry.example.com:5001
    user: registry_user
    password: ***
    is_default: true

services:
  kube-api:
    extra_args:
      feature-gates: "ExpandPersistentVolumes=true,ExpandInUsePersistentVolumes=true"
      enable-admission-plugins: "PersistentVolumeClaimResize"
  kube-controller:
    extra_args:
      feature-gates: "ExpandPersistentVolumes=true,ExpandInUsePersistentVolumes=true"
      horizontal-pod-autoscaler-downscale-delay: "5m0s"
      horizontal-pod-autoscaler-upscale-delay: "1m0s"
      horizontal-pod-autoscaler-sync-period: "30s"

ingress:
  provider: nginx
  extra_args:
    proxy-connect-timeout: 60

etcd:
  snapshot: false # enables recurring etcd snapshots
  creation: 6h0s # time increment between snapshots
  retention: 24h # time increment before snapshot purge

kubelet:
  extra_args:
    network-plugin-mtu: 9000
    read-only-port: 10255
  extra_env:
    - "HTTP_PROXY=http://myproxy.com:80"
    - "HTTPS_PROXY=http://myproxy.com:80"
    - "NO_PROXY=localhost,127.0.0.1,10.42.0.0/16,10.43.0.0/16,*.example.com"

addon_job_timeout: 40

system_images:
  kubernetes: rancher/hyperkube:v1.13.5-rancher1
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.27
  alpine: rancher/rke-tools:v0.1.27
  cert_downloader: rancher/rke-tools:v0.1.27
  etcd: rancher/coreos-etcd:v3.2.24
  kubedns: rancher/k8s-dns-kube-dns-amd64:1.14.13
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny-amd64:1.14.13
  kubedns_sidecar: rancher/k8s-dns-sidecar-amd64:1.14.13
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler-amd64:1.0.0
  nginx_proxy: rancher/rke-tools:v0.1.27
  pod_infra_container: rancher/pause-amd64:3.1

  flannel: rancher/coreos-flannel:v0.10.0
  flannel_cni: rancher/coreos-flannel-cni:v0.3.0

  calico_node: rancher/calico-node:v3.1.3
  calico_cni: rancher/calico-cni:v3.1.3
  calico_ctl: rancher/calico-ctl:v2.0.0

  canal_node: rancher/calico-node:v3.4.0
  canal_cni: rancher/calico-cni:v3.4.0
  canal_flannel: rancher/coreos-flannel:v0.10.0

  ingress: rancher/nginx-ingress-controller:0.21.0-rancher3
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.4

authentication:
  strategy: x509
  sans:
    - "myk8surl.example.com"

Steps to Reproduce:
Run an etcd snapshot restore with rke:

rke etcd snapshot-restore --name my_snapshot --config ./cluster.yml

Results:
FATA[0055] [etcd] Failed to restore etcd snapshot: Failed to run etcd restore container, exit status is: 3, container logs: 2019-04-10 07:22:11.908648 I | pkg/netutil: resolving example1.example.com:2380 to 10.235.38.8:2380
2019-04-10 07:22:11.908729 I | pkg/netutil: resolving example1.example.com:2380 to 10.235.38.8:2380
Error: open /opt/rke/etcd-snapshots/my_snapshot: no such file or directory

The snapshot file exists in /opt/rke/etcd-snapshots/.
Currently etcd has lost all of its configuration.

Any chance it is related to this issue?
#768

@Amos-85
Author

Amos-85 commented Apr 10, 2019

I found the issue:
it was related to the symbolic link I use for the etcd-snapshots directory.
It seems the etcd restore doesn't manage to mount through the symbolic link I use, and there was no error about it.
After changing /opt/rke/etcd-snapshots to a real directory with all of the content, the restore passed.
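
A minimal sketch of that workaround, assuming /opt/rke/etcd-snapshots is currently a symlink on each etcd node and using the snapshot name from this issue:

# run on each etcd node
target="$(readlink -f /opt/rke/etcd-snapshots)"    # real directory the symlink points to
sudo rm /opt/rke/etcd-snapshots                    # drop the symlink itself
sudo mkdir -p /opt/rke/etcd-snapshots              # recreate it as a real directory
sudo cp -a "$target"/. /opt/rke/etcd-snapshots/    # copy the existing snapshots back in
rke etcd snapshot-restore --name my_snapshot --config ./cluster.yml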

@Amos-85 Amos-85 closed this as completed Apr 10, 2019
@superseb superseb changed the title rke etcd snapshot-restore fail on "no such file or directory" rke etcd snapshot-restore fail on "no such file or directory" when snapshot dir is a symlink Apr 10, 2019
@superseb superseb reopened this Apr 10, 2019
@superseb
Contributor

The link should be followed, or a notification has to be shown.
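
A quick host-side check to see whether a node is affected (plain shell, not RKE behaviour):

ls -ld /opt/rke/etcd-snapshots       # a leading "l" and an "->" in the output mean it is a symlink
readlink -f /opt/rke/etcd-snapshots  # prints the real path the restore container would need mounted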

@twecker137

Since there is probably hardly any way to follow a symlink from inside the corresponding restore container, it would be appropriate if "extra_binds" could be configured in cluster.yml for the snapshot containers, or even better, if the section services/etcd/extra_binds also applied to the corresponding functions:

(etcd.go) func RestoreEtcdSnapshot()
(etcd.go) func GetEtcdSnapshotChecksum()
(...)

Some background: I encountered the same problem, as we use RKE on-premises and want to store the etcd snapshots from all servers in a separate directory on an NFS share.
The following structure was planned:

/opt/rke/etcd
|_ etcd-snapshots -> /custom/k8s/etcd-snapshots-nfs/server-name
/custom/k8s/etcd-snapshots-nfs/ (nfs mount for all servers)
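A sketch of how that layout could be set up on each server; the NFS export and server name are placeholders, the local paths are the ones used in this issue:

sudo mount -t nfs nfs-server:/exports/etcd-snapshots /custom/k8s/etcd-snapshots-nfs   # placeholder NFS export, shared by all servers
sudo mkdir -p /custom/k8s/etcd-snapshots-nfs/$(hostname)                              # per-server subdirectory on the share
sudo ln -s /custom/k8s/etcd-snapshots-nfs/$(hostname) /opt/rke/etcd-snapshots         # point RKE's snapshot dir at it via a symlink
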

For example, I had to add an extra bind to the etcd-checksum-checker container:
-v /opt/rke:/opt/rke:z -v /custom/k8s/etcd-snapshots-nfs:/custom/k8s/etcd-snapshots-nfs:z
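
A hand-run way to check that those two binds are enough, outside of RKE (the rke-tools image tag is taken from the cluster.yml above; the command itself is only an illustration, not what RKE actually runs):

docker run --rm --entrypoint sh \
  -v /opt/rke:/opt/rke:z \
  -v /custom/k8s/etcd-snapshots-nfs:/custom/k8s/etcd-snapshots-nfs:z \
  rancher/rke-tools:v0.1.27 \
  -c 'ls -la /opt/rke/etcd-snapshots/'   # the symlinked snapshot dir should now resolve inside the container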

@Amos-85
Author

Amos-85 commented Feb 14, 2020

@pkey1337 / @superseb
Making the location of the recurring etcd snapshot path configurable through the cluster.yml file would be a nice solution.
👍

@maggieliu maggieliu modified the milestones: v1.2 - Rancher v2.5, v1.1.x - Rancher v2.4.x Jul 27, 2020
@deniseschannon deniseschannon modified the milestones: v1.1.x - Rancher v2.4.x, v1.1 - Backlog - Rancher v2.4 - Backlog Jan 29, 2021
@superseb superseb removed their assignment Oct 25, 2022
@superseb superseb removed this from the RKE v1.x - Backlog - Rancher v2.x milestone Sep 19, 2023
@AlissonLorscheiterTR

@pkey1337 / @superseb Making etcd recurring snapshots path location configurable would be nice solution through cluster.yml file. 👍
Any news on this issue?
