rke etcd snapshot-restore fail on "no such file or directory" when snapshot dir is a symlink #1280
I found the issue: either the symlink should be followed, or a warning should be shown.
Since there is probably no way to follow a symlink from inside the corresponding restore container, it would be appropriate if "extra_binds" could be configured in cluster.yml for the snapshot containers, or, even better, if the services/etcd/extra_binds section also applied to the corresponding functions:
(etcd.go) func RestoreEtcdSnapshot()
(etcd.go) func GetEtcdSnapshotChecksum()
About my background: I ran into the same problem, as we use RKE on premises and want to store the etcd snapshots from all servers in a separate directory on an NFS share.
For example, I had to add an extra bind to the etcd-checksum-checker container.
@pkey1337 / @superseb
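For illustration, a minimal sketch of what such a configuration could look like in cluster.yml, assuming `extra_binds` under `services/etcd` were also honored by the snapshot and restore containers (the NFS path is hypothetical, not from this report):

```yaml
services:
  etcd:
    # Hypothetical example: bind the NFS-backed snapshot directory into the
    # snapshot/restore containers so the symlink target is reachable inside them.
    extra_binds:
      - "/mnt/nfs/etcd-snapshots:/opt/rke/etcd-snapshots"
```

Today `extra_binds` applies to the etcd container itself; the request above is for the same binds to be applied when RKE runs the snapshot, restore, and checksum containers.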
RKE version: 0.2.1
Docker version:
Client:
  Version:      18.09.0
  API version:  1.39
  Go version:   go1.10.4
  Git commit:   4d60db4
  Built:        Wed Nov 7 00:48:22 2018
  OS/Arch:      linux/amd64
  Experimental: false

Server: Docker Engine - Community
  Engine:
    Version:      18.09.3
    API version:  1.39 (minimum version 1.12)
    Go version:   go1.10.8
    Git commit:   774a1f4
    Built:        Thu Feb 28 06:02:24 2019
    OS/Arch:      linux/amd64
    Experimental: false
Operating system and kernel: RHEL 7.6, kernel 3.10.0-957.5.1.el7.x86_64
Type/provider of hosts: Bare-metal / Dell R730 XD
cluster.yml file:
nodes:
  - user: user # root user (usually 'root')
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ./id_rsa.pem # path to PEM file
  - user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - user: user
    role: [controlplane,etcd,worker]
    ssh_key_path: ./id_rsa.pem
  - user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem
  - user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem
  - user: user
    role: [worker]
    ssh_key_path: ./id_rsa.pem
ignore_docker_version: true
private_registries:
  - user: registry_user
    password: ***
    is_default: true
services:
  kube-api:
    extra_args:
      feature-gates: "ExpandPersistentVolumes=true,ExpandInUsePersistentVolumes=true"
      enable-admission-plugins: "PersistentVolumeClaimResize"
  kube-controller:
    extra_args:
      feature-gates: "ExpandPersistentVolumes=true,ExpandInUsePersistentVolumes=true"
      horizontal-pod-autoscaler-downscale-delay: "5m0s"
      horizontal-pod-autoscaler-upscale-delay: "1m0s"
      horizontal-pod-autoscaler-sync-period: "30s"
  etcd:
    snapshot: false # enables recurring etcd snapshots
    creation: 6h0s # time increment between snapshots
    retention: 24h # time increment before snapshot purge
  kubelet:
    extra_args:
      network-plugin-mtu: 9000
      read-only-port: 10255
    extra_env:
      - "HTTP_PROXY=http://myproxy.com:80"
      - "HTTPS_PROXY=http://myproxy.com:80"
      - "NO_PROXY=localhost,127.0.0.1,10.42.0.0/16,10.43.0.0/16,*.example.com"
ingress:
  provider: nginx
  extra_args:
    proxy-connect-timeout: 60
addon_job_timeout: 40
system_images:
  kubernetes: rancher/hyperkube:v1.13.5-rancher1
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.27
  alpine: rancher/rke-tools:v0.1.27
  cert_downloader: rancher/rke-tools:v0.1.27
  etcd: rancher/coreos-etcd:v3.2.24
  kubedns: rancher/k8s-dns-kube-dns-amd64:1.14.13
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny-amd64:1.14.13
  kubedns_sidecar: rancher/k8s-dns-sidecar-amd64:1.14.13
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler-amd64:1.0.0
  nginx_proxy: rancher/rke-tools:v0.1.27
  pod_infra_container: rancher/pause-amd64:3.1
  flannel: rancher/coreos-flannel:v0.10.0
  flannel_cni: rancher/coreos-flannel-cni:v0.3.0
  calico_node: rancher/calico-node:v3.1.3
  calico_cni: rancher/calico-cni:v3.1.3
  calico_ctl: rancher/calico-ctl:v2.0.0
  canal_node: rancher/calico-node:v3.4.0
  canal_cni: rancher/calico-cni:v3.4.0
  canal_flannel: rancher/coreos-flannel:v0.10.0
  ingress: rancher/nginx-ingress-controller:0.21.0-rancher3
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.4
authentication:
  strategy: x509
  sans:
    - "myk8surl.example.com"
Steps to Reproduce:
Run the rke etcd snapshot-restore command:
rke etcd snapshot-restore --name my_snapshot --config ./cluster.yml
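The failing precondition can be sketched as follows. The paths under /tmp are stand-ins for /opt/rke/etcd-snapshots and its NFS-backed symlink target (both hypothetical here); the point is that the host resolves the symlink, but a container that bind-mounts the symlink path does not see its target:

```shell
# Stand-in paths: in the real setup, /opt/rke/etcd-snapshots is a symlink to
# an NFS-mounted directory.
real_dir=/tmp/nfs-etcd-snapshots        # hypothetical NFS-backed target
link_dir=/tmp/opt-rke-etcd-snapshots    # stand-in for /opt/rke/etcd-snapshots

mkdir -p "$real_dir"
ln -sfn "$real_dir" "$link_dir"
touch "$real_dir/my_snapshot"

# On the host, the snapshot is visible through the symlink:
ls "$link_dir/my_snapshot"

# But the restore container bind-mounts the symlink path itself, so inside the
# container the link points at a target that does not exist, and the restore
# fails with "no such file or directory":
#   rke etcd snapshot-restore --name my_snapshot --config ./cluster.yml
```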
Results:
FATA[0055] [etcd] Failed to restore etcd snapshot: Failed to run etcd restore container, exit status is: 3, container logs: 2019-04-10 07:22:11.908648 I | pkg/netutil: resolving example1.example.com:2380 to 10.235.38.8:2380
2019-04-10 07:22:11.908729 I | pkg/netutil: resolving example1.example.com:2380 to 10.235.38.8:2380
Error: open /opt/rke/etcd-snapshots/my_snapshot: no such file or directory
The snapshot file exists in /opt/rke/etcd-snapshots/.
Currently, etcd has lost all of its configuration.
Any chance this is related to this issue?
#768