Fix issues related to fail over cr description and routes
Signed-off-by: Aswin Suryanarayanan <[email protected]>
aswinsuryan authored and sridhargaddam committed Jun 21, 2023
1 parent 7e5a7e9 commit 7715765
Showing 5 changed files with 140 additions and 51 deletions.
9 changes: 9 additions & 0 deletions .idea/enhancements.iml


8 changes: 8 additions & 0 deletions .idea/modules.xml


6 changes: 6 additions & 0 deletions .idea/vcs.xml


66 changes: 66 additions & 0 deletions .idea/workspace.xml


102 changes: 51 additions & 51 deletions submariner/OVN-Interconnect.md
@@ -38,19 +38,19 @@ With OVN Interconnect we can have two types of deployment

```bash
annotations:
k8s.ovn.org/ovn-node-transit-switch-port-ips: '["169.254.0.3/16"]'
k8s.ovn.org/ovn-zone: global
k8s.ovn.org/node-transit-switch-port-ifaddr: '["169.254.0.3/16"]'
k8s.ovn.org/zone-name: global
name: cluster1-worker

annotations:
k8s.ovn.org/ovn-node-transit-switch-port-ips: '["169.254.0.5/16"]'
k8s.ovn.org/ovn-zone: az2
k8s.ovn.org/node-transit-switch-port-ifaddr: '["169.254.0.5/16"]'
k8s.ovn.org/zone-name: az2
name: cluster2-worker
```

With the current architecture, Submariner adds routes only in the zone in which it is deployed. For example, if Submariner is deployed in
zone 1 it programs OVN db in zone 1. So only pods in zone 1 nodes will be able to talk to other clusters. Pods in zone 2 or zone 3 will not
be able to reach remote clusters connected via Submariner.
With the current architecture, the Submariner network-plugin-syncer adds routes only in the single zone where the network-plugin-syncer pod runs.
For example, if network-plugin-syncer is deployed in zone 1, it programs the OVN DB in zone 1, so only pods on zone 1 nodes will be able to talk
to other clusters. Pods in zone 2 or zone 3 will not be able to reach remote clusters connected via Submariner.

As part of the proposal, we plan to support both modes, as well as OVN cluster deployments where interconnect
is not enabled.
@@ -61,58 +61,59 @@ As part of this proposal, we are planning to add two new CRDs in Submariner. The
can be moved to OVN, when OVN provides a way to add custom routes. We can propose this CRD or consume the CRD that they provide at a later
point of time.

#### SubmarinerGwRoute
#### GatewayRoute

This CR will be created when a remote endpoint is added, and there will be one CR per remote cluster. This CRD has two fields:

* NextHops - Specifies the list of next hops to reach the remote cluster. In this case it will be the IP of the ovn-k8s-mp0
interface, the interface used by OVN for host networking.
* RemoteCIDR - Specifies the list of remote CIDRs reachable via this cluster.
* RemoteCIDR - Specifies the list of remote CIDRs reachable via the next hop.

This CR will be used by the route agent pod running on the active-Gateway node to program OVN to send the traffic destined to
remote clusters via the Submariner tunnel.
remote clusters via the Submariner tunnel. It also adds a route that sends traffic from other zones, destined for remote clusters,
to the Submariner tunnel.

``` go
type SubmarinerGwRoute struct {
type GatewayRoute struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

SubmarinerRoutePolicySpec SubmarinerRoutePolicySpec `json:"submarinerRoutePolicySpec"`
RoutePolicySpec RoutePolicySpec `json:"spec"`
}

type SubmarinerRoutePolicySpec struct {
//Specifies the list of next hops to reach the remote CIDRs
type RoutePolicySpec struct {
// Specifies the next hops to reach the remote CIDRs
NextHops []string `json:"nextHops"`

//Specifies the remote CIDRs available via the next hop
RemoteCidr []string `json:"remoteCidr"`
// Specifies the remote CIDRs available via the next hop
RemoteCIDRs []string `json:"remoteCIDRs"`
}
```
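
As an illustration of the two spec fields, the following is a minimal sketch of how a consumer might sanity-check a spec before programming OVN. It assumes that next hops are bare IP addresses and that remote CIDRs are in prefix notation; `validateSpec` is a hypothetical helper that does not appear in the proposal, and `RoutePolicySpec` is redeclared here only so the sketch compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// RoutePolicySpec mirrors the spec shown in the proposal. It is redeclared
// here so that this sketch is self-contained.
type RoutePolicySpec struct {
	NextHops    []string `json:"nextHops"`
	RemoteCIDRs []string `json:"remoteCIDRs"`
}

// validateSpec is a hypothetical helper (not part of Submariner). It checks
// that every next hop is a bare IP address and every remote CIDR is a prefix.
func validateSpec(spec RoutePolicySpec) error {
	if len(spec.NextHops) == 0 {
		return errors.New("at least one next hop is required")
	}
	for _, hop := range spec.NextHops {
		if _, err := netip.ParseAddr(hop); err != nil {
			return fmt.Errorf("invalid next hop %q: %w", hop, err)
		}
	}
	if len(spec.RemoteCIDRs) == 0 {
		return errors.New("at least one remote CIDR is required")
	}
	for _, cidr := range spec.RemoteCIDRs {
		if _, err := netip.ParsePrefix(cidr); err != nil {
			return fmt.Errorf("invalid remote CIDR %q: %w", cidr, err)
		}
	}
	return nil
}

func main() {
	spec := RoutePolicySpec{
		NextHops:    []string{"10.1.1.2"},
		RemoteCIDRs: []string{"10.132.0.0/16"},
	}
	if err := validateSpec(spec); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	fmt.Println("spec is valid")
}
```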

#### SubmarinerNonGWRoute
#### NonGatewayRoute

This CR will be created when a remote endpoint is created and there will be one created per endpoint.

* NextHops - Specifies the list of next hops. In this case it will be the transit switch IP.
* NextHops - Specifies the list of next hops. In this case, we will have only one, and it will be the transit switch IP of the zone
where the g/w node is present.
* RemoteCIDR - Specifies the list of remote CIDRs reachable via this gateway.

* In non-g/w node - If the route-agent pod is not in the same zone as Gateway node zone, send the traffic to the g/w node zone.
* In g/w node - add route to send the traffic from other zones, destined to remote cluster to the submariner tunnel

``` go
type SubmarinerNonGWRoute struct {
type NonGatewayRoute struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

SubmarinerRoutePolicySpec SubmarinerRoutePolicySpec `json:"submarinerRoutePolicySpec"`
RoutePolicySpec RoutePolicySpec `json:"spec"`
}

type SubmarinerRoutePolicySpec struct {
//Specifies the list of next hops to reach the remote CIDRs
type RoutePolicySpec struct {
// Specifies the next hops to reach the remote CIDRs
NextHops []string `json:"nextHops"`

//Specifies the remote CIDRs available via the next hop
RemoteCidr []string `json:"remoteCidr"`
// Specifies the remote CIDRs available via the next hop
RemoteCIDRs []string `json:"remoteCIDRs"`
}
```

@@ -129,33 +130,31 @@ traffic with in a cluster and will be present in every node.

### SubmarinerRouteAgentPod

The Submariner Route-agent pod running on the active gateway node will be responsible for creating the SubmarinerGWRoute CRs. It will be used
only for OVN CNI. For every RemoteEndpointCreated event a SubmarinerGWRoute CR will be created. The nextHop will be the interface IP through
The Submariner Route-agent pod running on the active gateway node will be responsible for creating the GatewayRoute CRs. It will be used
only for OVN CNI. For every RemoteEndpointCreated event a GatewayRoute CR will be created. The nextHop will be the interface IP through
which we can reach the cable driver. In the case of OVN it will be the IP of ovn-k8s-mp0 interface.

The SubmarinerNonGWRoute CRD will also be created by Submariner Route-agent. It will be created per endpoint and will have remoterCIDRS from
the endpoint. The nextHop will be the transit switch IP of the G/W node. If the transit switch IP is missing this CRD will not be created,
which means it is a non-IC setup.
The NonGatewayRoute CR will also be created by the Submariner Route-agent running on the active gateway node. It will be created per endpoint
and will have the remote CIDRs from the endpoint. The nextHop will be the transit switch IP of the G/W node. If the transit switch IP is missing,
this CR will not be created, which means it is a non-IC setup.
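
The decision described above can be sketched as follows. This is only an illustration under stated assumptions: the transit switch IP is read from the `k8s.ovn.org/node-transit-switch-port-ifaddr` annotation in the JSON-array form shown earlier in this document, and an empty or absent annotation is treated as a non-IC setup. The function names `transitSwitchIP` and `nonGatewayRouteSpec` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// RoutePolicySpec mirrors the spec from the proposal, redeclared so the
// sketch compiles standalone.
type RoutePolicySpec struct {
	NextHops    []string `json:"nextHops"`
	RemoteCIDRs []string `json:"remoteCIDRs"`
}

// transitSwitchIP extracts the bare IP from an annotation value such as
// `["169.254.0.3/16"]`. The boolean is false when no address is present,
// which this sketch interprets as a non-IC setup.
func transitSwitchIP(annotation string) (string, bool, error) {
	if annotation == "" {
		return "", false, nil
	}
	var addrs []string
	if err := json.Unmarshal([]byte(annotation), &addrs); err != nil {
		return "", false, fmt.Errorf("parsing transit switch annotation: %w", err)
	}
	if len(addrs) == 0 {
		return "", false, nil
	}
	prefix, err := netip.ParsePrefix(addrs[0])
	if err != nil {
		return "", false, fmt.Errorf("parsing transit switch address %q: %w", addrs[0], err)
	}
	return prefix.Addr().String(), true, nil
}

// nonGatewayRouteSpec returns nil (and no error) when the transit switch IP
// is missing, meaning that no NonGatewayRoute CR should be created.
func nonGatewayRouteSpec(annotation string, remoteCIDRs []string) (*RoutePolicySpec, error) {
	ip, ok, err := transitSwitchIP(annotation)
	if err != nil || !ok {
		return nil, err
	}
	return &RoutePolicySpec{NextHops: []string{ip}, RemoteCIDRs: remoteCIDRs}, nil
}

func main() {
	spec, err := nonGatewayRouteSpec(`["169.254.0.3/16"]`, []string{"10.132.0.0/16"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if spec == nil {
		fmt.Println("non-IC setup, no NonGatewayRoute needed")
		return
	}
	fmt.Printf("next hops %v, remote CIDRs %v\n", spec.NextHops, spec.RemoteCIDRs)
}
```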

The RouteAgent will have these controllers added to it and the one running in gateway node responds to the CRUD operations of Submariner
endpoints.

#### SubmarinerGWRoute CR Controller
#### GatewayRoute CR Controller

This controller will be responsible for programming the OVN cluster router, it will react only the active g/w node. When a SubmarinerGWRoute
This controller will be responsible for programming the OVN cluster router; it will react only on the active g/w node. When a GatewayRoute
CR is created or modified the controller shall create or update a routing policy in OVN cluster router with a priority of 20000, and it should
redirect any traffic destined to remote CIDR to the ovn-k8s-mp0 interface IP.

```bash
_uuid : 0459f009-3603-47ac-8ee7-9d958540ed31
bfd : []
action : reroute
external_ids : {}
ip_prefix : "10.132.0.0/16"
nexthop : "10.1.1.2"
options : {}
output_port : []
policy : []
route_table : ""
match : "ip4.dst==10.132.0.0/16"
nexthops : ["10.1.1.2"]
options : {"external_ids:{submariner"="true}"}
priority : 20000
```
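
The create-or-update behaviour can be sketched roughly as below. This is a simplified, in-memory illustration and not the controller's actual code: the `routerPolicy` type, the `desiredPolicies` and `upsert` helpers, and the map standing in for the OVN northbound DB are all hypothetical, and the sketch assumes IPv4 remote CIDRs only.

```go
package main

import "fmt"

// submarinerPolicyPriority is the priority named in the proposal.
const submarinerPolicyPriority = 20000

// routerPolicy is a hypothetical, simplified view of an OVN logical router
// policy, holding only the columns discussed in this document.
type routerPolicy struct {
	Priority int
	Match    string
	Action   string
	NextHops []string
}

// desiredPolicies builds one reroute policy per remote CIDR (IPv4 assumed).
func desiredPolicies(nextHops, remoteCIDRs []string) []routerPolicy {
	policies := make([]routerPolicy, 0, len(remoteCIDRs))
	for _, cidr := range remoteCIDRs {
		policies = append(policies, routerPolicy{
			Priority: submarinerPolicyPriority,
			Match:    "ip4.dst==" + cidr,
			Action:   "reroute",
			NextHops: append([]string(nil), nextHops...),
		})
	}
	return policies
}

// sameHops reports whether two next-hop lists are identical.
func sameHops(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// upsert creates or updates a policy in the in-memory stand-in for the OVN
// DB, keyed by match expression, and reports which action was taken.
func upsert(db map[string]routerPolicy, want routerPolicy) string {
	current, found := db[want.Match]
	switch {
	case !found:
		db[want.Match] = want
		return "created"
	case current.Priority == want.Priority && current.Action == want.Action &&
		sameHops(current.NextHops, want.NextHops):
		return "unchanged"
	default:
		db[want.Match] = want
		return "updated"
	}
}

func main() {
	db := map[string]routerPolicy{}
	for _, p := range desiredPolicies([]string{"10.1.1.2"}, []string{"10.132.0.0/16"}) {
		fmt.Println(p.Match, upsert(db, p))
	}
}
```

Keying on the match expression means that applying the same policy twice leaves the stand-in DB unchanged.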

It also programs a route in the ovn-cluster-router, to route the traffic coming from other zones destined to remote cluster IP range via the
@@ -173,19 +172,19 @@ policy : []
route_table : ""
```

#### SubmarinerNonGWRoute Controller
#### NonGatewayRoute Controller

This controller will run in every route agent pod. This controller connects to the OVN DB. When a SubmarinerNonGWRoute CR is created
in non-g/w node, it updates the ovn-cluster-route with a router policy using a priority of 20000 to send the traffic to
the remote cluster via next hop mentioned, which is the transit switch IP to the g/w node. Before adding the route it checks if
a route exists, if so it skips adding the route again. This is required to prevent duplicate update since there can be more than
one node in each zone and hence more than one RouteAgent.
This controller will run as part of every route agent pod and connects to the OVN DB. When a NonGatewayRoute CR is created,
the route-agent running on the non-GW node will update the ovn-cluster-router with a logical router policy using a priority of 20000
to send the traffic to the remote cluster via the next hop mentioned, which is the transit switch IP of the g/w node. Before adding
the route, it checks whether the route already exists; if so, it skips adding the route again. This is required to prevent duplicate
updates, since there can be more than one node in each zone and hence more than one RouteAgent.

```bash
_uuid : 22db3005-64c5-4e32-aeb0-642423c30742
action : reroute
external_ids : {}
match : "ip4.dst==10.132.0.0/16"
match : "ip4.dst==10.132.0.0/14"
nexthop : []
nexthops : ["169.254.0.1"]
options : {"external_ids:{submariner"="true}"}
@@ -202,17 +201,18 @@ network-plugin-syncer pods and remove any existing deployments.
### Gateway FailOver

If there are two gateway nodes, the passive one will work like a non-gateway node. It will not be responsible for creating the CRs.
In the case of gateway fail-over all the current SubmarinerGWRoute and SubmarinerNonGWRoute will be deleted by the route agent in
the node that is transitioning to non-gateway node and will be recreated by the new gateway node.
In the case of gateway fail-over, all the current GatewayRoute and NonGatewayRoute CRs will be deleted by the route agent in
the node that is transitioning to gateway node and will be recreated with updated values.

#### Open issues

1. The update from older version of Submariner to a newer version will create a datapath downtime.
2. When multiple clusters are updated we need to check if one cluster can be done at a time. The cluster will be down
until all the nodes are updated.
1. When multiple clusters are updated we need to check if one cluster can be done at a time. The cluster will be down
until all the nodes are updated.
2. The update from older version of Submariner to a newer version will create a datapath downtime.
3. When the Kubernetes cluster is updated to a version that has IC enabled, there could be a datapath downtime until the
Submariner g/w node is updated, since the other nodes need the transit switch IP, which will be available only when
the g/w node is updated.
4. Explore the possibility of using a VIP to represent the gateway node switch IP instead of reconfiguring all the routes at non-gw nodes.

### Alternatives
