This repository has been archived by the owner on Jan 8, 2019. It is now read-only.

cluster-assign-roles fails with Net::SSH::HostKeyMismatch when rebuilding cluster #1280

Open
cbaenziger opened this issue Sep 27, 2018 · 4 comments

@cbaenziger
Member

When deleting a cluster and reusing a bootstrap node, one will get an error like the following:

Refreshed chef-vault item ssh_host_keys/iptables-bcpc-vm1.bcpc.example.com
Refreshed chef-vault item ssh_host_keys/iptables-bcpc-vm2.bcpc.example.com
iptables-bcpc-vm2.bcpc.example.com: Cheffing with runlist 'role[Basic],recipe[bcpc::default],recipe[bcpc::networking]'
bundler: failed to load command: ./cluster_assign_roles.rb (./cluster_assign_roles.rb)
Net::SSH::HostKeyMismatch: fingerprint d0:bb:35:f8:c5:15:bb:dc:89:40:67:9c:d9:71:a1:02 does not match for "10.0.109.12"
  /home/vagrant/chef-bcpc/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/verifiers/secure.rb:48:in `process_cache_miss'
  /home/vagrant/chef-bcpc/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/verifiers/secure.rb:33:in `verify'
  /home/vagrant/chef-bcpc/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/verifiers/strict.rb:16:in `verify'
  /home/vagrant/chef-bcpc/ruby/2.4.0/gems/net-ssh-4.2.0/lib/net/ssh/verifiers/lenient.rb:15:in `verify'

Unexpectedly, the error comes back even if ~/.ssh/known_hosts is deleted. This is because the SSH host keys are stored in Knife vault and are put back on the hosts[1], so they differ from the keys the hosts had right after the OS install. We should set up cluster-assign-roles either to pre-load the correct host keys into the SSH known_hosts file or to ignore the mismatch when rebuilding a machine (maybe via a rebuild/ignore-known-hosts flag)?
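
The pre-load option could look roughly like the sketch below. It is not code from chef-bcpc: the helper names `known_hosts_entry` and `preload_host_key` are hypothetical, and it assumes the caller has already fetched the host's public key text (the contents of a file such as ssh_host_rsa_key.pub) from the vault by some other means. It only handles plain, unhashed known_hosts entries.

```ruby
require 'fileutils'

# Build one plain (unhashed) known_hosts entry from the text of a host
# public key file, which has the form "<type> <base64-key> [comment]".
# (Hypothetical helper, not part of chef-bcpc.)
def known_hosts_entry(host, public_key)
  type, key = public_key.split(' ')
  raise ArgumentError, "malformed public key for #{host}" if key.nil?
  "#{host} #{type} #{key}"
end

# Replace any plain entries for +host+ in the known_hosts file at +path+
# with an entry for +public_key+. A line is dropped as a whole if +host+
# appears in its comma-separated host list. Hashed entries ("|1|...") and
# "[host]:port" entries are not matched and are left in place.
def preload_host_key(path, host, public_key)
  lines = File.exist?(path) ? File.readlines(path, chomp: true) : []
  kept = lines.reject do |line|
    line.split(' ').first.to_s.split(',').include?(host)
  end
  kept << known_hosts_entry(host, public_key)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, kept.join("\n") + "\n")
end
```

Calling `preload_host_key(File.expand_path('~/.ssh/known_hosts'), '10.0.109.12', key_text)` before connecting would swap the stale entry for the expected one. Whether this belongs in cluster_assign_roles.rb or in a layer above it is a separate question.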

In a non-production environment, one can also delete the stored keys, so that the new SSH host key is not overwritten:

knife data bag delete -y ssh_host_keys <fqdn>
knife data bag delete -y ssh_host_keys <fqdn>_keys
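
When rebuilding several VMs, the same two commands can be generated per host. This is a minimal sketch, assuming the item names follow the `<fqdn>` and `<fqdn>_keys` pattern shown above; the helper name `ssh_host_key_cleanup_commands` and the example FQDN are hypothetical.

```ruby
# Build the two `knife data bag delete` commands from above for each FQDN.
# Each command is returned as an argument array so it can be passed to
# Kernel#system without shell quoting. (Hypothetical helper.)
def ssh_host_key_cleanup_commands(fqdns)
  fqdns.flat_map do |fqdn|
    [fqdn, "#{fqdn}_keys"].map do |item|
      ['knife', 'data', 'bag', 'delete', '-y', 'ssh_host_keys', item]
    end
  end
end

# Dry run: print the commands instead of executing them.
ssh_host_key_cleanup_commands(['bcpc-vm1.bcpc.example.com']).each do |cmd|
  puts cmd.join(' ')
end
```

Replacing `puts cmd.join(' ')` with `system(*cmd)` would actually run the deletions, so that variant is only for test clusters.
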
@aespinosa
Collaborator

The repxe-host.sh script orchestrates all of this. If you are deleting a cluster, you need to do everything that repxe-host.sh does to get it properly cleaned.

@cbaenziger
Member Author

cbaenziger commented Oct 8, 2018

Ah, this is a VM-specific issue for testing. I cannot see how to apply repxe-host.sh to that yet...

@cbaenziger cbaenziger reopened this Oct 8, 2018
@aespinosa
Collaborator

aespinosa commented Oct 9, 2018

I'd argue that would be in scope for something between tests/automated_install.sh and the layer right above cluster-assign-roles.sh then, not cluster-assign-roles.sh itself.

@cbaenziger
Member Author

cbaenziger commented Oct 11, 2018

@aespinosa Ah, good idea. Yes, my hang-up was that this broke the idempotency of tests/automated_install.sh, so it makes sense for that script to do the necessary work rather than cluster_assign_roles.rb.
