peer auto-configuration proposals #12

Open
wants to merge 4 commits into
base: main
from

Conversation

bganne
Contributor

@bganne bganne commented Dec 8, 2020

Hi Jordan, wgsd is a great project!

Here are some proposals, I'd be curious to get your feedback:

  • consistently use Service Instance Name as <Instance> . <Service> . <Domain> everywhere. I noticed that the names in A/AAAA and SRV records are resolved through the Service Instance Name but are returned as <Instance> . <Domain>
  • update wgsd DNS plugin to allow peer auto-configuration: I added a TXT record to the SRV answer to communicate the allowed IP and pubkey. That way, peers do not have to know the other peers' configuration beforehand
  • update wgsd-client to use service auto-configuration: wgsd-client can connect to all advertised peers or only to a selected peer
  • add a vagrant environment to easily play around

Best,
ben

Service Instance Name is defined in RFC6763 section 4.1 as
  Service Instance Name = <Instance> . <Service> . <Domain>
Use it instead of <Instance> . <Domain> for consistency.

This patch allows clients to auto-discover other mesh peers:
 - add a TXT record to the default SRV answer containing the allowed ip and
public key: clients no longer need to have other peers pre-configured,
all necessary configuration is contained in the SRV record. Only a
connection to the registry is necessary to be allowed to connect to any
other peer
 - allow clients to request SRV records by allowed ip in addition to
public key: a client wanting to communicate with a specific ip can now
discover the associated peer w/o knowing its public key beforehand

Limitations:
 - the exported allowed ip config is only the 1st subnet configured
 - for 2 peers to communicate, both must set up the wireguard association

This patch allows wgsd-client to auto-discover other mesh peers:
 - use peer allowed ip and pubkey from TXT record in SRV answer
 - iterate on peers extracted from PTR answer instead of local wireguard
configuration: the client no longer needs to be pre-configured with all the
peers
 - allow connecting to a specific peer using its pubkey or allowed ip
@bganne bganne changed the title from "For upstream" to "peer auto-configuration proposals" Dec 8, 2020
@jwhited
Owner

jwhited commented Dec 23, 2020

Hi Ben,

Thanks for the feedback and proposals, some great ideas. Sorry for the delay...

consistently use Service Instance Name as <Instance> . <Service> . <Domain> everywhere. I noted the fields in A/AAAA and SRV fields are resolved through Service Instance Name but are returned as <Instance> . <Domain>

Adhering to https://tools.ietf.org/html/rfc6763#section-4.1 makes sense to me. If this is broken out into its own PR we can merge it. Will probably tag this with a new major version as it's a backwards-incompatible change.

update wgsd DNS plugin to allow peer auto-configuration: I added a TXT field in the SRV answer to communicate the allowed ip and pubkey. That way, the peers do not have to know other peers configuration before-hand

Adding AllowedIPs config into the DNS changes the scope of wgsd as it's currently for endpoint discovery, but I can see how this would be useful for bootstrapping from scratch. Would love to hear more about how you are using this.

Is the duplication of the public key into the TXT record to make it easier to map with the AllowedIPs? This should also be resolvable via the instance name.

update wgsd-client to use service auto-configuration: wgsd-client can connect to all advertised peers or only to a selected peer

Similar to above, would love to hear more about how you're using this.

@bganne
Contributor Author

bganne commented Dec 24, 2020

consistently use Service Instance Name as <Instance> . <Service> . <Domain> everywhere. I noted the fields in A/AAAA and SRV fields are resolved through Service Instance Name but are returned as <Instance> . <Domain>

Adhering to https://tools.ietf.org/html/rfc6763#section-4.1 makes sense to me. If this is broken out into its own PR we can merge it. Will probably tag this with a new major version as it's a backwards-incompatible change.

Will do. Note that I am no expert, it is my own understanding of the RFC, but this looks consistent to me.
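
To make the naming point concrete, here is a minimal sketch of building and splitting a Service Instance Name of the form `<Instance> . <Service> . <Domain>`. The helper names, the `_wireguard._udp` service label, and the example zone are assumptions for illustration, not code from this PR. The sketch also assumes the instance label contains no dots and that the service label is compared case-sensitively.

```python
from typing import Tuple


def service_instance_name(instance: str, service: str, domain: str) -> str:
    """Join the parts into <Instance>.<Service>.<Domain> with a trailing dot."""
    parts = [p.strip(".") for p in (instance, service, domain)]
    if not all(parts):
        raise ValueError("instance, service and domain must all be non-empty")
    return ".".join(parts) + "."


def split_service_instance_name(name: str, service: str) -> Tuple[str, str]:
    """Split a full name into (instance, domain) around a known service label."""
    marker = "." + service.strip(".") + "."
    instance, found, domain = name.partition(marker)
    if not found or not instance or not domain:
        raise ValueError(f"{name!r} is not an instance of {service!r}")
    return instance, domain


# Hypothetical values: "abc" stands in for an encoded public key.
name = service_instance_name("abc", "_wireguard._udp", "example.com.")
instance, domain = split_service_instance_name(name, "_wireguard._udp")
```

Returning the same full name in PTR, SRV and A/AAAA answers means a client can round-trip any returned name through the same split helper.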

update wgsd DNS plugin to allow peer auto-configuration: I added a TXT field in the SRV answer to communicate the allowed ip and pubkey. That way, the peers do not have to know other peers configuration before-hand

Adding AllowedIPs config into the DNS changes the scope of wgsd as it's currently for endpoint discovery, but I can see how this would be useful for bootstrapping from scratch. Would love to hear more about how you are using this.

I'd argue it is still about endpoint discovery but in a more dynamic environment. The usecase is this:

  • you want to interconnect services nodes through wireguard
  • these nodes can come and go dynamically (auto scaling)
  • each node will probably connect to only a subset of the other nodes (eg. you probably have a lot of identical nodes for each service and you load-balance)

In the current implementation, when on-boarding a new node, you must pre-configure all of its peers in the wireguard configuration, and each time a node comes or goes you must also update all configurations.

With these changes, each node only needs to be configured with the registry address, and then the configuration does not need to be touched anymore for the lifetime of the node.
When a new node comes in, all other nodes can update their configuration to connect to it.
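
The update step described above can be sketched as a diff between what a node currently has configured and what the registry advertises. This is only an illustration: the `reconcile` function and the pubkey-to-endpoint dictionaries are hypothetical, and applying the result to the WireGuard interface is left out.

```python
from typing import Dict, List, Tuple


def reconcile(
    configured: Dict[str, str], advertised: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare pubkey -> endpoint maps and return (to_add, to_update, to_remove)."""
    # Peers the registry knows about but this node does not.
    to_add = sorted(k for k in advertised if k not in configured)
    # Peers known on both sides whose endpoint has changed.
    to_update = sorted(
        k for k in advertised if k in configured and configured[k] != advertised[k]
    )
    # Peers this node still has but the registry no longer advertises.
    to_remove = sorted(k for k in configured if k not in advertised)
    return to_add, to_update, to_remove


# Hypothetical keys and endpoints.
configured = {"keyA": "192.0.2.1:51820", "keyB": "192.0.2.2:51820"}
advertised = {"keyB": "192.0.2.9:51820", "keyC": "192.0.2.3:51820"}
to_add, to_update, to_remove = reconcile(configured, advertised)
```

Running this on every registry poll keeps each node's peer list in step with the mesh without editing its configuration by hand.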

This is not the end of the story though, the next thing I'd like to support is dynamic connect/disconnect based on active conversations:

  • when a node comes in, it connects to the registry and that's it. There is no other wireguard connection
  • when the node starts to send packets to another node, there will not be any active tunnel for it and packets will be dropped. But those packets can be "punted" to wgsd-client which can then inspect the destination IP, ask the registry for the relevant configuration and set up the wireguard connection. Note we probably need a similar "punt" path on the receiving side, as both endpoints must be configured accordingly
  • subsequent packets flow through the tunnel
  • wgsd-client tracks active conversations. If a tunnel sees no packets for some time, it can be torn down

I say 'wgsd-client' here but it could be another client so we can keep the 2 usecases "simple static full-mesh" and "dynamic partial mesh" separate.
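
The conversation tracking in the last bullet could look roughly like the sketch below. The `TunnelTracker` class, its method names, and the timeout value are all hypothetical. Timestamps are passed in explicitly so the logic stays independent of how packets are punted or how peers are actually configured.

```python
from typing import Dict, List


class TunnelTracker:
    """Remember when each peer last carried traffic and report idle peers."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self._last_seen: Dict[str, float] = {}

    def touch(self, peer: str, now: float) -> bool:
        """Record traffic for peer; return True if the peer was not tracked yet."""
        is_new = peer not in self._last_seen
        self._last_seen[peer] = now
        return is_new

    def expire(self, now: float) -> List[str]:
        """Forget and return the peers idle for longer than the timeout."""
        idle = sorted(
            p for p, seen in self._last_seen.items()
            if now - seen > self.idle_timeout
        )
        for p in idle:
            del self._last_seen[p]
        return idle


tracker = TunnelTracker(idle_timeout=30.0)
needs_setup = tracker.touch("keyA", now=0.0)  # first packet: tunnel must be set up
tracker.touch("keyA", now=10.0)
tracker.touch("keyB", now=25.0)
idle_peers = tracker.expire(now=45.0)         # keyA has been idle for 35 seconds
```

A `True` result from `touch` is the point where a client would query the registry and configure the peer. The peers returned by `expire` are the candidates for removal.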

Is the duplication of the public key into the TXT record to make it easier to map with the AllowedIPs? This should also be resolvable via the instance name.

When requesting configuration per AllowedIP (and not per instance name), the public key can be scraped from the instance name but it seemed cleaner to me to just add an additional record.
I do not have a strong opinion here, in fact I started by scraping the instance name :) and then changed it to add the record.
Let me know what you prefer.
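
For illustration, resolving a peer by IP rather than by public key amounts to a longest-prefix match over the advertised allowed IPs. The sketch below uses Python's standard `ipaddress` module. The `find_peer` helper and the prefix-to-key table are assumptions, and nothing here shows how the PR actually encodes such a query in DNS.

```python
import ipaddress
from typing import Dict, Optional


def find_peer(dest: str, allowed_ips: Dict[str, str]) -> Optional[str]:
    """Return the public key whose allowed-IP prefix most specifically covers dest."""
    addr = ipaddress.ip_address(dest)
    best_key: Optional[str] = None
    best_len = -1
    for prefix, pubkey in allowed_ips.items():
        net = ipaddress.ip_network(prefix, strict=False)
        # Only compare like with like (IPv4 vs IPv6), then prefer the longest prefix.
        if net.version == addr.version and addr in net and net.prefixlen > best_len:
            best_key, best_len = pubkey, net.prefixlen
    return best_key


# Hypothetical registry contents: allowed-IP prefix -> public key.
peers = {
    "10.0.0.0/24": "keyA",
    "10.0.0.5/32": "keyB",
    "fd00::/64": "keyC",
}
match = find_peer("10.0.0.5", peers)
```

Whichever side performs the match, the answer still has to carry the public key, which is one argument for returning it as its own record.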

update wgsd-client to use service auto-configuration: wgsd-client can connect to all advertised peers or only to a selected peer

Similar to above, would love to hear more about how you're using this.

Yes it is just extending the client to support the usecase described above.

@jwhited
Owner

jwhited commented Dec 29, 2020

I'd argue it is still about endpoint discovery but in a more dynamic environment. The usecase is this:

  • you want to interconnect services nodes through wireguard
  • these nodes can come and go dynamically (auto scaling)
  • each node will probably connect to only a subset of the other nodes (eg. you probably have a lot of identical nodes for each service and you load-balance)

Thanks for elaborating, makes sense to me. I'm on board w/including AllowedIPs in the DNS as TXT records. RFC6763 section 6 has quite a bit to say about formatting of additional configuration in TXT records. Would be good to combine that guidance with whatever learnings can be found from other RFCs/patterns where IP address prefixes are included in the DNS (SPF, APL, ...?)
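
As a rough sketch of what RFC6763 section 6 style `key=value` strings could look like for this data: the key names `txtvers`, `pub` and `allowed` are assumptions for illustration and not a settled contract. The parser follows the RFC rules that keys are case-insensitive, that only the first occurrence of a key counts, and that strings with an empty key are ignored.

```python
from typing import Dict, List


def encode_txt(attrs: Dict[str, str]) -> List[str]:
    """Encode attributes as one 'key=value' string per TXT character-string."""
    strings = []
    for key, value in attrs.items():
        if not key or "=" in key:
            raise ValueError(f"invalid TXT key: {key!r}")
        entry = f"{key}={value}"
        # A single TXT character-string is limited to 255 bytes.
        if len(entry.encode("utf-8")) > 255:
            raise ValueError(f"TXT entry for {key!r} exceeds 255 bytes")
        strings.append(entry)
    return strings


def parse_txt(strings: List[str]) -> Dict[str, str]:
    """Parse 'key=value' strings; the first occurrence of a key wins."""
    attrs: Dict[str, str] = {}
    for entry in strings:
        # Split on the first '=' only, so base64 padding in values survives.
        key, _, value = entry.partition("=")
        key = key.lower()
        if not key or key in attrs:
            continue
        attrs[key] = value
    return attrs


# Hypothetical record contents; "abc=" stands in for a base64 public key.
record = encode_txt({"txtvers": "1", "pub": "abc=", "allowed": "10.0.0.1/32"})
parsed = parse_txt(record)
```

Carrying more than one allowed IP prefix would need a convention on top of this, for example a delimiter inside the value, which is one of the questions the SPF and APL comparison could help answer.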

When requesting configuration per AllowedIP (and not per instance name), the public key can be scraped from the instance name but it seemed cleaner to me to just add an additional record.

Since the public key is configuration, and configuration should exist as TXT record data I think this makes sense. It's also just convenient when eyeballing base64 keys.

Happy to work on this or review a PR specifically for AllowedIPs/Pubkeys in TXT records. Client changes are obviously welcome to make use of new DNS config data, but may be easier to nail down the DNS contract first.

This is not the end of the story though, the next thing I'd like to support is dynamically connect/disconnect based on active conversations:

This sounds really interesting. Is there a use case where you would have lots of tunnels and the resource cost of maintaining the tunnels is too high? Or is this for security reasons?

@jwhited
Owner

jwhited commented Dec 31, 2020

added #17 with an initial idea for pub key and allowed ips in TXT

@jwhited
Owner

jwhited commented Dec 31, 2020

added #20 for tracking a Vagrant environment. If you want to break that work out into a new PR happy to review

@bganne
Contributor Author

bganne commented Jan 11, 2021

added #17 with an initial idea for pub key and allowed ips in TXT

Should I update the client to take advantage of it?

added #20 for tracking a Vagrant environment. If you want to break that work out into a new PR happy to review

Done in #28

@jwhited
Owner

jwhited commented Jan 19, 2021

Should I update the client to take advantage of it?

The original client was built with the intent to update endpoint values for peers that were already configured. Now with config data (allowed IPs) being served via TXT records we can support a full bootstrap/mesh from scratch.

With that being said, I'm not sure whether wgsd-client should be extended with flags, or whether this should be added as its own client. Thoughts?

@m00nwtchr

m00nwtchr commented Aug 4, 2024

@bganne

This is not the end of the story though, the next thing I'd like to support is dynamic connect/disconnect based on active conversations:

* when a node comes in, it connects to the registry and that's it. There is no other wireguard connection

* when the node starts to send packets to another node, there will not be any active tunnel for it and packets will be dropped. But those packets can be "punted" to wgsd-client which can then inspect the destination IP, ask the registry for the relevant configuration and set up the wireguard connection. Note we probably need a similar "punt" path on the receiving side, as both endpoints must be configured accordingly

* subsequent packets flow through the tunnel

* wgsd-client tracks active conversations. If a tunnel sees no packets for some time, it can be torn down

All of that is completely unnecessary for WireGuard. There are no 'active tunnels'/connections and no cost associated with having many configured WireGuard peers, assuming an equal amount of traffic in any given scenario. WireGuard only sends packets when packets are being sent into the tunnel, unless Persistent Keepalive is on, and you should only need that for the Registry tunnel.


3 participants