Support for internet gateways #6475
rcgoodfellow commented Aug 29, 2024
- Fixes #2154 (Implement the Internet Gateway concept)
- Depends on opte#588 (Support for omicron internet gateway model)
- Requires oxide.rs#864 (import fixed igw API)
e65431a to 817c8b5
Something I'm not too sure about with this PR is how it needs to interact with the reconfigurator. If any services move around, we need to trigger the |
// I think we should move decisions around change
// propagation to be based on actual delta
// calculation, rather than trying to manually
// maintain a signal.
Calling this comment out as something to discuss in this PR. CC @FelixMcFelix
Yeah, I had maybe overlooked the scope of things that could change in connection with an Internet Gateway. That's a big surface area to keep an eye on...
I'm not really sure how we would want to work around this, however. The motivation for using router versions was more for performance rather than correctness. My intent was to prevent each Nexus's copy of the background task from querying every Project's/VPC's/Subnet's state on a minute-by-minute (and per-wakeup) basis, since there are a lot of tables which are referenced during route resolution. This partly stems from the granularity of background task wakeups, which don't AFAIK allow a selective wakeup. So my initial feeling is that having Nexus store resolved tables and then send deltas is orthogonal to that?
It could also, in practice, not be too expensive to just rebuild VPC routing tables periodically! In which case, the places where we perform version bumps and/or background task wakeups would keep the system responsive rather than be crucial for correctness. If cost remains a concern, I wonder if we can just use modified timestamps in connected tables to drive the route update process (which would, I hope, be less error-prone).
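To make the "actual delta calculation" idea from the code comment concrete, here is a minimal sketch, assuming resolved routes can be compared as plain values. `ResolvedRoute`, `RouteDelta`, and `route_delta` are hypothetical names for illustration and are not taken from Nexus.

```rust
use std::collections::BTreeSet;

/// A resolved route, reduced to strings for illustration.
/// Hypothetical type: not the representation Nexus uses.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ResolvedRoute {
    pub dest: String,
    pub target: String,
}

/// The difference between a previously stored table and a fresh resolution.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct RouteDelta {
    pub to_add: Vec<ResolvedRoute>,
    pub to_remove: Vec<ResolvedRoute>,
}

impl RouteDelta {
    /// True when nothing needs to be propagated.
    pub fn is_empty(&self) -> bool {
        self.to_add.is_empty() && self.to_remove.is_empty()
    }
}

/// Compare the old and new route sets and report only what changed.
pub fn route_delta(
    old: &BTreeSet<ResolvedRoute>,
    new: &BTreeSet<ResolvedRoute>,
) -> RouteDelta {
    RouteDelta {
        // Present now, absent before.
        to_add: new.difference(old).cloned().collect(),
        // Present before, absent now.
        to_remove: old.difference(new).cloned().collect(),
    }
}
```

With something of this shape, a change would be pushed only when `route_delta(..).is_empty()` is false, so a missed manual signal would cost latency rather than correctness. It still requires resolving the new table first, which is the cost concern raised above.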
Maybe! The bit of reconfigurator that can actually move services around is also an RPW (
goes along with oxidecomputer/omicron#6475
Testing Notes
I performed the following procedure to test the functionality of this PR in a local Omicron development environment. This test procedure requires oxidecomputer/oxide.rs#853 and oxidecomputer/opte#588.
The test creates two IP pools. The first one is the default IP pool for the silo. The second is an IP pool we create to associate with a gateway. In this test the gateway is called
IP ranges are ones that work for the network my workstation is on. Adjust as necessary for other environments.
#
# default ip pool
#
oxide ip-pool create \
--name default \
--description default
oxide ip-pool range add \
--pool default \
--first 192.168.20.32 \
--last 192.168.20.47
oxide ip-pool silo link \
--pool default \
--is-default true \
--silo recovery
#
# sv2 ip pool
#
oxide ip-pool create \
--name sv2 \
--description colo
oxide ip-pool range add \
--pool sv2 \
--first 192.168.20.48 \
--last 192.168.20.63
oxide ip-pool silo link \
--pool sv2 \
--is-default false \
--silo recovery
oxide internet-gateway create \
--project 'test' \
--description 'colo gateway' \
--vpc default \
--name sv2
oxide internet-gateway ip-pool attach \
--project 'test' \
--vpc default \
--gateway sv2 \
--json-body igw-pool-attach.json
igw-pool-attach.json
{
  "gateway": "sv2",
  "ip_pool": "sv2",
  "name": "sv2",
  "description": "sv2 pool attachment"
}
This router forwards traffic to
oxide vpc router create \
--project 'test' \
--vpc default \
--description 'default router' \
--name default
oxide vpc router route create \
--project 'test' \
--vpc default \
--router default \
--json-body igw-route.json
oxide vpc subnet update \
--project 'test' \
--vpc default \
--subnet default \
--custom-router default
igw-route.json
{
"name": "sv2",
"description": "route to sv2",
"target": {
"type": "internet_gateway",
"value": "sv2"
},
"destination": {
"type": "ip_net",
"value": "45.154.216.0/24"
}
}
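As an illustration of what the `ip_net` route above is meant to achieve, the sketch below picks a gateway for a destination by longest-prefix match. It is a simplified IPv4-only model, not how Nexus or OPTE resolve routes, and the `Route` type, `select_gateway` function, and the idea of a catch-all route pointing at a gateway named `default` are assumptions made for the example.

```rust
use std::net::Ipv4Addr;

/// A simplified route: an IPv4 prefix that targets a named gateway.
/// Hypothetical type for illustration only.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Route {
    pub net: Ipv4Addr,
    pub prefix_len: u8,
    pub gateway: String,
}

/// Build the netmask for a prefix length (values above 32 are clamped).
fn mask(prefix_len: u8) -> u32 {
    let len = u32::from(prefix_len.min(32));
    if len == 0 { 0 } else { u32::MAX << (32 - len) }
}

impl Route {
    /// True if `addr` falls inside this route's prefix.
    pub fn contains(&self, addr: Ipv4Addr) -> bool {
        let m = mask(self.prefix_len);
        (u32::from(addr) & m) == (u32::from(self.net) & m)
    }
}

/// Return the gateway of the most specific route that matches `dst`.
pub fn select_gateway(routes: &[Route], dst: Ipv4Addr) -> Option<&str> {
    routes
        .iter()
        .filter(|r| r.contains(dst))
        .max_by_key(|r| r.prefix_len)
        .map(|r| r.gateway.as_str())
}
```

Under this model, a destination inside 45.154.216.0/24 would select `sv2`, while any other destination would fall through to whichever less specific route exists.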
oxide floating-ip create \
--project 'test' \
--name 'default-igw' \
--description 'floating ip for default gateway' \
--pool default
oxide floating-ip create \
--project 'test' \
--name 'sv2-igw' \
--description 'floating ip for sv2 gateway' \
--pool sv2
This assumes the existence of a
oxide instance from-image \
--project 'test' \
--name 'oak' \
--description 'a test instance' \
--hostname 'test' \
--memory 1g \
--ncpus 2 \
--image debian12 \
--size 2g \
--start
oxide floating-ip attach \
--project 'test' \
--floating-ip 'default-igw' \
--kind instance \
--parent 'oak'
oxide floating-ip attach \
--project 'test' \
--floating-ip 'sv2-igw' \
--kind instance \
--parent 'oak'
I've implemented the followup changes from the minor OPTE-side rework -- I had to work around the issue you mentioned in the
So far as I can tell now, generic gateway traffic is going out on the correct IP addresses (priority-respecting)! I suppose this does raise some questions around edge cases (with the caveat that I have been focussed on configuring OPTE, rather than the API side):
I'll run through reviewing the rest of the API (hopefully) tonight -- let me know if you agree with this tagging-style approach.
Thanks a lot for putting this together; this is a huge chunk of work, and I'm excited for this to land soonish.
Reiterating some high-level questions around the actual use/configuration of Internet Gateways in addition to my earlier comment:
- Is attaching a name and description to each link the right model?
- What are the intended link cardinalities between an IGW and address sources?
- I'm guessing that we should be able to link many IP pools/individual addresses to an IGW.
- Between IGWs, should we be imposing any limitations on linked resources? E.g., an IP pool can only be linked to a single IGW in each VPC.
- Do linked IpAddrs need to come from a pool linked to the silo? Or are they intended to fix up certain floating IPs? Should they be unique across all VPCs?
.github/buildomat/jobs/deploy.sh
@@ -104,6 +108,20 @@ z_swadm () {
    pfexec zlogin oxz_switch /opt/oxide/dendrite/bin/swadm $@
}

# only set this if you want to override the version of opte/xde installed by the
# install_opte.sh script
OPTE_COMMIT="7bba530f897f69755723be2218de2628af0eb771"
Marking this to make sure we comment this line out before merging.
DELETE FROM omicron.public.router_route WHERE
    time_deleted IS NULL AND
    name = 'default-v6' AND
    vpc_router_id = '001de000-074c-4000-8000-000000000001';
Migration notes:
- The IDs for the existing inetgw:outbound routes in fixed-data are 001de000-074c-4000-8000-000000000002 and 001de000-074c-4000-8000-000000000003.
- If we're deleting these entries we need to replace them with equivalents: the datastore route ensure will only manage the Subnet entries today.
- We need to create a default inetgw (and its link to the parent silo's default IP pool) per existing VPC.
Database migration added in schema/crdb/internet-gateway/up13.sql
RefreshExternalIps {
    tx: oneshot::Sender<Result<(), ManagerError>>,
},
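The variant above carries a reply channel, so the caller can wait for the runner to confirm the refresh. The sketch below shows that request-and-reply pattern using only the standard library. `std::sync::mpsc` and a thread stand in for the async oneshot channel and task in the snippet, and `Request`, `spawn_runner`, and `refresh_external_ips` are hypothetical names, so treat this as an illustration of the pattern rather than sled-agent's code.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct ManagerError(pub String);

/// Requests understood by the hypothetical runner loop.
pub enum Request {
    RefreshExternalIps { tx: mpsc::Sender<Result<(), ManagerError>> },
}

/// Spawn a runner that services requests until every sender is dropped.
pub fn spawn_runner() -> mpsc::Sender<Request> {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        for req in req_rx {
            match req {
                Request::RefreshExternalIps { tx } => {
                    // The real refresh work would happen here.
                    // Ignore the send error if the caller stopped waiting.
                    let _ = tx.send(Ok(()));
                }
            }
        }
    });
    req_tx
}

/// Send a refresh request and block until the runner replies.
pub fn refresh_external_ips(
    runner: &mpsc::Sender<Request>,
) -> Result<(), ManagerError> {
    let (tx, rx) = mpsc::channel();
    runner
        .send(Request::RefreshExternalIps { tx })
        .map_err(|_| ManagerError("runner has exited".to_string()))?;
    rx.recv()
        .map_err(|_| ManagerError("runner dropped the reply channel".to_string()))?
}
```

Every operation routed this way needs its own variant, reply channel, and wrapper function, which gives a sense of the indirection the following comment suggests removing.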
I think in future, we can cut a lot of this circuitous movement through InstanceRunner out (including EIP attach/detach) if we place external IP state onto NetworkInterface directly.
Definitely out of scope here given the immediacy of this PR!
Noting that |
I've taken things for a lap with the test described above, and things still work as expected with the floating IP pinning.
I think I'm happy with how we percolate information out to instances now -- I've been running through the above gateway scenario in- and out-of-order (i.e., setting up routes before a valid gateway is created) and I haven't been able to break it just yet. So far as I can tell, the OPTE state is quickly updating in all cases I've tried, and our source address picking semantics are preserved.
As mentioned in the OPTE side, I'm happy with the dataplane aspects and how we're setting them up. I think once the DB migration is in, we should be good to go! 🎉
also bump opte rev
This change mirrors OPTE#06ccc5b (and its more pedestrian followup commits) -- InternetGateway routes now tag matched packets such that the NAT layer can use them for selection. Nexus now produces a per-NIC map, containing a list of EIP->IGW UUID mappings -- OPTE uses this map to divide sets of external IPs into separate rules within the NAT layer. This reverts VPC router calculation to a shared set of rules for all NICs in a given subnet. This change was necessary to ensure that, e.g., Ephemeral IP addresses would be consistently chosen over SNAT (which would have been unfortunate to hit!). The router layer isn't exactly best-placed to enforce custom priorities or choices over equal-priority elements. The downside is that this approach only really works (today) for Instances. Probes/Services on sled agent discard their external IP information fairly quickly -- in these cases, I've written IGW tagging to recreate the legacy (single IGW) behaviour. I'll put this as more evidence that RPW-izing external IP configuration would be useful.
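A rough model of the tagging approach described in this commit message: the router tags a packet with a gateway, and the NAT layer then chooses only among external IPs mapped to that gateway. The sketch assumes gateway IDs are plain strings and one gateway per external IP. `rules_by_gateway` and `candidates_for_tag` are hypothetical names and do not reflect OPTE's actual data structures.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::net::IpAddr;

/// Gateway identifier; a string stands in for the UUID used in practice.
pub type IgwId = String;

/// Invert a per-NIC map of external IP -> gateway into one set of
/// external IPs per gateway, mirroring "separate rules" per IGW.
pub fn rules_by_gateway(
    eip_to_igw: &BTreeMap<IpAddr, IgwId>,
) -> BTreeMap<IgwId, BTreeSet<IpAddr>> {
    let mut rules: BTreeMap<IgwId, BTreeSet<IpAddr>> = BTreeMap::new();
    for (ip, igw) in eip_to_igw {
        rules.entry(igw.clone()).or_default().insert(*ip);
    }
    rules
}

/// External IPs usable for a packet that the router tagged with `tag`.
/// An unknown tag yields no candidates.
pub fn candidates_for_tag(
    rules: &BTreeMap<IgwId, BTreeSet<IpAddr>>,
    tag: &str,
) -> Vec<IpAddr> {
    match rules.get(tag) {
        Some(ips) => ips.iter().copied().collect(),
        None => Vec::new(),
    }
}
```

Choosing among the candidates (for example, preferring an Ephemeral IP over SNAT) would remain a NAT-layer decision and is left out of the sketch.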
Way, way snappier for updating IGW<->EIP mappings.
The place we were missing router version bumps was CRUD operations on Internet Gateways themselves. The rationale here is that these are resolved by name, so we need to indicate to the RPW that there may be a new resource that can actually be resolved. Returns one `continue` back into active service, since it appears to be behaving well.
I noticed that bumping all routers' versions was awfully expensive in a few places after adding it into IGW create/delete. This should hopefully trim that down a little bit -- it was starting to bite on e.g. VPC create. From what I can tell, test_vpc_routers_custom_delivered_to_instance still remains happy; I'm getting 17s per success on my home helios box.
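The version-bump mechanism discussed in these commits can be pictured as follows: each router carries a version, the background task remembers the last version it resolved, and only routers whose version moved are resolved again. This is a minimal sketch under those assumptions. `RouteTask`, `routers_to_resolve`, and `bump` are hypothetical names, and integer IDs stand in for UUIDs.

```rust
use std::collections::BTreeMap;

/// Router identifier; an integer stands in for a UUID.
pub type RouterId = u32;

/// Background-task state: the last version resolved for each router.
#[derive(Default)]
pub struct RouteTask {
    seen: BTreeMap<RouterId, u64>,
}

impl RouteTask {
    /// Return the routers whose version differs from the last one seen,
    /// and record the new versions so they are not resolved twice.
    pub fn routers_to_resolve(
        &mut self,
        current: &BTreeMap<RouterId, u64>,
    ) -> Vec<RouterId> {
        let mut stale = Vec::new();
        for (id, version) in current {
            if self.seen.get(id) != Some(version) {
                stale.push(*id);
                self.seen.insert(*id, *version);
            }
        }
        // Forget routers that no longer exist.
        self.seen.retain(|id, _| current.contains_key(id));
        stale
    }
}

/// Bump only the routers affected by a change, not every router.
pub fn bump(versions: &mut BTreeMap<RouterId, u64>, affected: &[RouterId]) {
    for id in affected {
        if let Some(version) = versions.get_mut(id) {
            *version += 1;
        }
    }
}
```

Narrowing `affected` to the routers that can actually observe a change is what keeps operations like VPC create cheap in this model. The trade-off is that every code path that changes resolvable state must remember to call `bump`.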
It's currently possible for several IGWs in a VPC to be linked to the same IP pool -- we now report all such mappings to OPTE and sled-agent. OPTE now allows a given address to be selected for any IGW which would admit it. Previously, this led to a case where an OPTE port would lack usable NAT entries for the IGW it was actually configured to use (due to the presence of a second matching IGW).
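To illustrate the case this commit fixes, the sketch below computes every gateway that admits a given external IP when several gateways are linked to the same pool. It is an IPv4-only simplification with hypothetical names (`PoolRange`, `admitting_gateways`); gateway and pool names are plain strings, and the real mapping is produced by Nexus and consumed by OPTE and sled-agent.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::net::Ipv4Addr;

/// An inclusive address range belonging to an IP pool.
pub struct PoolRange {
    pub first: Ipv4Addr,
    pub last: Ipv4Addr,
}

impl PoolRange {
    /// True if `ip` lies within the range, bounds included.
    pub fn contains(&self, ip: Ipv4Addr) -> bool {
        self.first <= ip && ip <= self.last
    }
}

/// Return every gateway linked to a pool that contains `ip`.
/// `links` holds (gateway name, pool name) pairs.
pub fn admitting_gateways(
    ip: Ipv4Addr,
    pools: &BTreeMap<String, Vec<PoolRange>>,
    links: &[(String, String)],
) -> BTreeSet<String> {
    let mut gateways = BTreeSet::new();
    for (igw, pool) in links {
        let admitted = pools
            .get(pool)
            .map_or(false, |ranges| ranges.iter().any(|r| r.contains(ip)));
        if admitted {
            gateways.insert(igw.clone());
        }
    }
    gateways
}
```

Reporting the full set, rather than a single gateway per address, is what lets a port keep usable NAT entries for the gateway it is configured to use even when another gateway shares the pool.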
igw routes are not syncing correctly without subnet workarounds
instead attach subnets to custom router