Issues with node with persistent data-dir #1

Open
mdurell opened this issue May 1, 2016 · 1 comment

mdurell commented May 1, 2016

This works great as long as all nodes use ephemeral storage for --data-dir. However, when a node's storage is persistent but its IP address is not, that node will be evicted from the cluster when its IP changes. This is because etcd2 will not use the discovery service if it finds a valid WAL file.
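As a rough illustration of the condition described above, the following sketch checks whether a data directory already contains a WAL. It assumes the usual etcd2 layout of `<data-dir>/member/wal/*.wal`; the helper name `has_wal` is hypothetical and this only approximates the check etcd2 performs internally.

```python
from pathlib import Path


def has_wal(data_dir):
    """Return True if data_dir appears to hold an existing etcd2 WAL.

    Assumption: WAL segments live in <data_dir>/member/wal and end in ".wal".
    """
    wal_dir = Path(data_dir) / "member" / "wal"
    return wal_dir.is_dir() and any(wal_dir.glob("*.wal"))
```

On the non-PXE VM, `has_wal("/var/lib/etcd2")` would be expected to return True after the first boot, which is the case where discovery is skipped.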

To show this I set up 3 VMs. Two of these VMs were PXE booted with a RAM disk and the third booted from local disk, so this third VM persists /var/lib/etcd2. Once the cluster was set up I started rebooting the PXE nodes and confirmed that elastic-etcd was properly managing the nodes within the discovery service. The non-PXE node was then rebooted and rejoined the cluster with no issue.

Next I reset the MAC address on the NIC in the non-PXE VM to force a new IP. Upon boot it joined the cluster with the new IP address, but elastic-etcd did not update the discovery service. After this I set up a 4th PXE VM and booted it into the cluster. elastic-etcd detected only 2 of the 3 nodes from the discovery service and set up etcd2 to join the cluster as if there were no third node.

Once the 4th VM joined the cluster, the VM with the persistent storage was voted off of the island.

I feel that elastic-etcd should query the live cluster members it finds through the discovery service for nodes that are up but may not be listed in the discovery service, and update the discovery service accordingly. I'm currently investigating how to query the running nodes listed in the discovery service to determine whether and how this is possible.

For completeness here is my cloud-init:

#cloud-config
manage_etc_hosts: "localhost"
ssh_authorized_keys:
  - ssh-rsa AAAAB3...
coreos:
  update:
    reboot-strategy: best-effort
  etcd2:
    advertise-client-urls: http://$public_ipv4:2379
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$public_ipv4:2380,http://$public_ipv4:7001
  units:
    - name: fleet.service
      command: start
    - name: elastic-etcd.service
      command: start
      content: |
        [Unit]
        Description=Elastic etcd

        [Service]
        Type=oneshot
        ExecStartPre=/usr/bin/mkdir -p /var/lib/elastic-etcd
        ExecStartPre=/usr/bin/curl -L -o /var/lib/elastic-etcd/elastic-etcd https://github.com/sttts/elastic-etcd/releases/download/v0.0.10/elastic-etcd
        ExecStartPre=/usr/bin/chmod +x /var/lib/elastic-etcd/elastic-etcd
        ExecStartPre=/usr/bin/mkdir -p /run/systemd/system/etcd2.service.d
        ExecStart=/bin/sh -c "/var/lib/elastic-etcd/elastic-etcd -o dropin --data-dir=/var/lib/etcd2 --join-strategy prune --initial-advertise-peer-urls=http://$public_ipv4:2380 --name=$(uuidgen) --discovery=https://discovery.etcd.io/74df14b61f0a2de84227440a30f6156f --v=6 --logtostderr > /run/systemd/system/etcd2.service.d/99-elastic-etcd.conf"
        ExecStartPost=/usr/bin/systemctl daemon-reload
    - name: etcd2.service
      command: start

mdurell commented May 1, 2016

The current member list can be enumerated from the members API documented here.

I'm thinking that each node listed in the discovery API should be queried to ensure that:

1. All nodes that are 'up' agree on the member list; if not, throw an error, otherwise continue normally.
2. The member list from the discovery API matches the authoritative list from above; if not, update the discovery API with the authoritative member list from the cluster and continue execution normally.
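The two checks above could be sketched roughly as follows. This is not how elastic-etcd works today; it is a minimal illustration that assumes the etcd2 members API at `/v2/members` returns a JSON object with a `members` array whose entries carry `id` and `peerURLs`. The function names and the `{member_id: peer_urls}` representation are hypothetical, and writing the result back to the discovery service is left out.

```python
import json
from urllib.request import urlopen


def fetch_members(client_url, timeout=5):
    """Return {member_id: sorted peer URLs} as reported by one etcd2 node."""
    url = client_url.rstrip("/") + "/v2/members"
    with urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    return {m["id"]: tuple(sorted(m["peerURLs"])) for m in body["members"]}


def authoritative_members(views):
    """Check 1: every reachable node must report the same member list."""
    if not views:
        raise RuntimeError("no live nodes could be queried")
    distinct = {frozenset(v.items()) for v in views}
    if len(distinct) != 1:
        raise RuntimeError("live nodes disagree on the member list")
    return dict(next(iter(distinct)))


def discovery_diff(discovery, cluster):
    """Check 2: work out what must change for discovery to match the cluster.

    Returns (stale, missing): entries to drop from the discovery service
    and entries to add to it.
    """
    stale = {k: v for k, v in discovery.items() if cluster.get(k) != v}
    missing = {k: v for k, v in cluster.items() if discovery.get(k) != v}
    return stale, missing
```

A caller would collect one `fetch_members` result per reachable node, pass the list to `authoritative_members`, and then compare the result with the entries read from the discovery URL. A node whose IP changed shows up in both `stale` (old peer URL) and `missing` (new peer URL).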
