
rke etcd snapshot-restore fail on "no such file or directory" when snapshot dir is a symlink #1280

Open
Amos-85 opened this issue Apr 10, 2019 · 5 comments

Comments

@Amos-85

Amos-85 commented Apr 10, 2019

RKE version: 0.2.1

Docker version:
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:48:22 2018
OS/Arch: linux/amd64
Experimental: false

Server: Docker Engine - Community
Engine:
Version: 18.09.3
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: 774a1f4
Built: Thu Feb 28 06:02:24 2019
OS/Arch: linux/amd64
Experimental: false

Operating system and kernel: RHEL 7.6, kernel 3.10.0-957.5.1.el7.x86_64

Type/provider of hosts: Bare-metal / Dell R730 XD

cluster.yml file:

nodes:
  - address: example1.example.com
    user: user # root user (usually 'root')
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ./id_rsa.pem # path to PEM file
  - address: example2.example.com
    user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - address: example3.example.com
    user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - address: example4.example.com
    user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - address: example5.example.com
    user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - address: example6.example.com
    user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem
  - address: example7.example.com
    user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem
  - address: example8.example.com
    user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem

ignore_docker_version: true

private_registries:
  - url: myregistry.example.com:5001
    user: registry_user
    password: ***
    is_default: true

services:
  kube-api:
    extra_args:
      feature-gates: "ExpandPersistentVolumes=true,ExpandInUsePersistentVolumes=true"
      enable-admission-plugins: "PersistentVolumeClaimResize"
  kube-controller:
    extra_args:
      feature-gates: "ExpandPersistentVolumes=true,ExpandInUsePersistentVolumes=true"
      horizontal-pod-autoscaler-downscale-delay: "5m0s"
      horizontal-pod-autoscaler-upscale-delay: "1m0s"
      horizontal-pod-autoscaler-sync-period: "30s"

ingress:
  provider: nginx
  extra_args:
    proxy-connect-timeout: 60

etcd:
  snapshot: false # enables recurring etcd snapshots
  creation: 6h0s # time increment between snapshots
  retention: 24h # time increment before snapshot purge

kubelet:
  extra_args:
    network-plugin-mtu: 9000
    read-only-port: 10255
  extra_env:
    - "HTTP_PROXY=http://myproxy.com:80"
    - "HTTPS_PROXY=http://myproxy.com:80"
    - "NO_PROXY=localhost,127.0.0.1,10.42.0.0/16,10.43.0.0/16,*.example.com"

addon_job_timeout: 40

system_images:
  kubernetes: rancher/hyperkube:v1.13.5-rancher1
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.27
  alpine: rancher/rke-tools:v0.1.27
  cert_downloader: rancher/rke-tools:v0.1.27
  etcd: rancher/coreos-etcd:v3.2.24
  kubedns: rancher/k8s-dns-kube-dns-amd64:1.14.13
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny-amd64:1.14.13
  kubedns_sidecar: rancher/k8s-dns-sidecar-amd64:1.14.13
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler-amd64:1.0.0
  nginx_proxy: rancher/rke-tools:v0.1.27
  pod_infra_container: rancher/pause-amd64:3.1

  flannel: rancher/coreos-flannel:v0.10.0
  flannel_cni: rancher/coreos-flannel-cni:v0.3.0

  calico_node: rancher/calico-node:v3.1.3
  calico_cni: rancher/calico-cni:v3.1.3
  calico_ctl: rancher/calico-ctl:v2.0.0

  canal_node: rancher/calico-node:v3.4.0
  canal_cni: rancher/calico-cni:v3.4.0
  canal_flannel: rancher/coreos-flannel:v0.10.0

  ingress: rancher/nginx-ingress-controller:0.21.0-rancher3
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.4

authentication:
  strategy: x509
  sans:
    - "myk8surl.example.com"

Steps to Reproduce:
Run an etcd snapshot restore with rke:

rke etcd snapshot-restore --name my_snapshot --config ./cluster.yml

Results:
FATA[0055] [etcd] Failed to restore etcd snapshot: Failed to run etcd restore container, exit status is: 3, container logs: 2019-04-10 07:22:11.908648 I | pkg/netutil: resolving example1.example.com:2380 to 10.235.38.8:2380
2019-04-10 07:22:11.908729 I | pkg/netutil: resolving example1.example.com:2380 to 10.235.38.8:2380
Error: open /opt/rke/etcd-snapshots/my_snapshot: no such file or directory

The snapshot file exists in /opt/rke/etcd-snapshots/.
Currently etcd has lost all of its configuration.

Any chance it is related to this issue?
#768

@Amos-85
Author

Amos-85 commented Apr 10, 2019

I found the issue:
it was related to the symbolic link I use for the etcd-snapshots directory.
It seems the etcd restore doesn't manage to mount through the symbolic link I use, and there was no error about it.
After changing /opt/rke/etcd-snapshots to a real directory with all of the content, the restore passed.
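
A minimal sketch of that workaround, assuming /opt/rke/etcd-snapshots is currently a symlink on each etcd node and using the snapshot name from this issue:

# run on each etcd node
target="$(readlink -f /opt/rke/etcd-snapshots)"    # real directory the symlink points to
sudo rm /opt/rke/etcd-snapshots                    # drop the symlink itself
sudo mkdir -p /opt/rke/etcd-snapshots              # recreate it as a real directory
sudo cp -a "$target"/. /opt/rke/etcd-snapshots/    # copy the existing snapshots back in
rke etcd snapshot-restore --name my_snapshot --config ./cluster.yml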

@Amos-85 Amos-85 closed this as completed Apr 10, 2019
@superseb superseb changed the title rke etcd snapshot-restore fail on "no such file or directory" rke etcd snapshot-restore fail on "no such file or directory" when snapshot dir is a symlink Apr 10, 2019
@superseb superseb reopened this Apr 10, 2019
@superseb
Contributor

The link should be followed, or a notification has to be shown.
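
A quick host-side check to see whether a node is affected (plain shell, not RKE behaviour):

ls -ld /opt/rke/etcd-snapshots       # a leading "l" and an "->" in the output mean it is a symlink
readlink -f /opt/rke/etcd-snapshots  # prints the real path the restore container would need mounted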

@twecker137

Since there is probably hardly any way to follow a symlink from inside the corresponding restore container, it would be appropriate if "extra_binds" could be configured in cluster.yml for the snapshot containers, or even better, if the section services/etcd/extra_binds also applied to the corresponding functions:

(etcd.go) func RestoreEtcdSnapshot()
(etcd.go) func GetEtcdSnapshotChecksum()
(...)

Some background: I encountered the same problem, as we use RKE on-premises and want to store the etcd snapshots from all servers in a separate directory on an NFS share.
The following structure was planned:

/opt/rke/etcd
|_ etcd-snapshots -> /custom/k8s/etcd-snapshots-nfs/server-name
/custom/k8s/etcd-snapshots-nfs/ (nfs mount for all servers)
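A sketch of how that layout could be set up on each server; the NFS export and server name are placeholders, the local paths are the ones used in this issue:

sudo mount -t nfs nfs-server:/exports/etcd-snapshots /custom/k8s/etcd-snapshots-nfs   # placeholder NFS export, shared by all servers
sudo mkdir -p /custom/k8s/etcd-snapshots-nfs/$(hostname)                              # per-server subdirectory on the share
sudo ln -s /custom/k8s/etcd-snapshots-nfs/$(hostname) /opt/rke/etcd-snapshots         # point RKE's snapshot dir at it via a symlink
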

For example, I had to add an extra bind to the etcd-checksum-checker container:
-v /opt/rke:/opt/rke:z -v /custom/k8s/etcd-snapshots-nfs:/custom/k8s/etcd-snapshots-nfs:z
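
A hand-run way to check that those two binds are enough, outside of RKE (the rke-tools image tag is taken from the cluster.yml above; the command itself is only an illustration, not what RKE actually runs):

docker run --rm --entrypoint sh \
  -v /opt/rke:/opt/rke:z \
  -v /custom/k8s/etcd-snapshots-nfs:/custom/k8s/etcd-snapshots-nfs:z \
  rancher/rke-tools:v0.1.27 \
  -c 'ls -la /opt/rke/etcd-snapshots/'   # the symlinked snapshot dir should now resolve inside the container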

@Amos-85
Author

Amos-85 commented Feb 14, 2020

@pkey1337 / @superseb
Making the location of the recurring etcd snapshot path configurable through the cluster.yml file would be a nice solution.
👍

@maggieliu maggieliu modified the milestones: v1.2 - Rancher v2.5, v1.1.x - Rancher v2.4.x Jul 27, 2020
@deniseschannon deniseschannon modified the milestones: v1.1.x - Rancher v2.4.x, v1.1 - Backlog - Rancher v2.4 - Backlog Jan 29, 2021
@superseb superseb removed their assignment Oct 25, 2022
@superseb superseb removed this from the RKE v1.x - Backlog - Rancher v2.x milestone Sep 19, 2023
@AlissonLorscheiterTR

@pkey1337 / @superseb Making etcd recurring snapshots path location configurable would be nice solution through cluster.yml file. 👍
Any news on this issue?
