
Service Function Chaining (SFC) Support #181

Open
krishnajs opened this issue May 11, 2021 · 16 comments

Comments

@krishnajs

Is Calico planning to support service function chaining (SFC) for bare-metal Kubernetes deployments?

VPP, which is also the dataplane for Network Service Mesh (NSM), supports policy-driven service function chaining.

SFC is a key technology for certain telco workloads, and there are not many CNIs in the K8s ecosystem that support SFC.

@AloysAugustin
Collaborator

Hi @krishnajs, we're not planning to add SFC features to the Calico VPP dataplane; however, we are planning to integrate it with NSM, so you can run a single instance of VPP on your nodes and use the NSM control plane for the pod SFC configuration. Would that work for your use case?

@krishnajs
Author

Thanks for the feedback. This will definitely give us a path for our use cases. One thing we need to think about is what would happen if the cluster already has another service mesh on the node, like Istio.

@edwarnicke

@krishnajs One of the things in the back of my head on the NSM side has consistently been making sure it's possible to do what @AloysAugustin is proposing: sharing a single VPP instance between NSM and Calico-VPP.

As to 'what would happen if the cluster already has another service mesh on the node, like Istio' ... Network Service Mesh is complementary to, not competitive with, L7 service meshes like Istio. So you should be fine there :)

@krishnajs
Author

Thanks, @edwarnicke and @AloysAugustin, for your perspective. Is there an estimated time to start this work? Knowing that would help us if we need to contribute.

@edwarnicke

@krishnajs Your interest and willingness to help out is appreciated! The good news is: NSM and Calico-VPP are pretty orthogonal to each other, so barring unforeseen difficulties, it shouldn't be too hard to get them to share a single VPP instance.

The reason for this is actually pretty instructive. The NSM Forwarder is basically plumbing a set of 'vWires'. When a workload requests to be connected to a Network Service, the Forwarder basically needs to:

  1. Create an incoming interface in VPP for the mechanism the NSC (Network Service Client) requests (more on 'mechanisms' later)
  2. Create an outgoing interface in VPP for the mechanism the NSE (Network Service Endpoint) selects
  3. Cross-connect the two (with either l2xc or l3xc)

Note that NSM does not assign IPs to the interfaces inside VPP, nor does it program routes.

The net result is there is very little surface area for conflict with what Calico-VPP is doing as a CNI.
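Concretely, the cross-connect in step 3 maps onto VPP operations along these lines. This is an illustrative sketch using the vppctl CLI (the forwarder actually drives VPP through its binary API, and the interface names, addresses, and VNI below are made up for the example):

```shell
# Incoming side: a memif interface for the NSC (names/IDs are illustrative)
vppctl create interface memif id 0 socket-id 1 slave
vppctl set interface state memif1/0 up

# Outgoing side: a vxlan tunnel toward the NSE's node (addresses/VNI are illustrative)
vppctl create vxlan tunnel src 10.0.0.1 dst 10.0.0.2 vni 42
vppctl set interface state vxlan_tunnel0 up

# Step 3: L2 cross-connect the two interfaces (l2xc), one direction each way
vppctl set interface l2 xconnect memif1/0 vxlan_tunnel0
vppctl set interface l2 xconnect vxlan_tunnel0 memif1/0
```

Note that nothing here assigns IPs or installs routes, which is why the overlap with what a CNI programs is so small.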

I mentioned 'mechanisms' earlier. These are the 'local mechanisms' (kernel interface, memif, vfio) or 'remote mechanisms' (vxlan, wireguard, etc.) for the interface.

In the case of VXLAN, there is some possibility of collision around VNI selection. That is likely quite resolvable, however, as Calico has a single VNI configured for it, and NSM can, via slight modifications, be made to avoid it.

@krishnajs Would you be willing to try a first simple pass at getting cmd-forwarder-vpp running with Calico-VPP? It should be pretty simple to attempt. If you are interested, I'm happy to lay out the NSM-side steps, and I suspect @AloysAugustin would be willing to lay out the Calico-VPP-side steps :)

@edwarnicke

@AloysAugustin Keep me honest about the Calico-VPP parts of this :)

@krishnajs Calico-VPP runs its own VPP instance and mounts in the directory containing the VPP programming socket. The VPP programming socket is then referenced as "/var/run/vpp/vpp-api.sock".

I made a very simple modification to the VPP Forwarder to allow it to optionally use an existing VPP instance rather than starting one of its own by setting the env variable NSM_VPP_API_SOCKET="/var/run/vpp/vpp-api.sock".

I made a second very simple modification to the VPP Forwarder to allow a NONE option for initializing VPP by setting the env variable NSM_VPP_INIT=NONE.

NSM keeps a repo of examples to try: deployment-k8s.

I've got one more thing to fix and then you should be able to:

  1. Add here
    a. NSM_VPP_API_SOCKET="/var/run/vpp/vpp-api.sock"
    b. NSM_VPP_INIT="NONE"
  2. Add volumeMount for /var/run/vpp here.

And run NSM against the Calico-VPP VPP instance.
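Put together, the forwarder changes above would look roughly like this in the pod spec. This is a sketch of the relevant parts only; the container name, volume name, and hostPath wiring are assumptions based on the description above ("mounts in the directory containing the VPP programming socket"):

```yaml
# Sketch: relevant fragment of the cmd-forwarder-vpp pod spec (names are illustrative)
containers:
- name: forwarder-vpp
  env:
  - name: NSM_VPP_API_SOCKET          # reuse Calico-VPP's existing VPP instance
    value: "/var/run/vpp/vpp-api.sock"
  - name: NSM_VPP_INIT                # skip VPP initialization; Calico-VPP owns it
    value: "NONE"
  volumeMounts:
  - name: vpp-api
    mountPath: /var/run/vpp
volumes:
- name: vpp-api
  hostPath:
    path: /var/run/vpp                # directory containing vpp-api.sock on the host
```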

@AloysAugustin how could one discover the NodeIP being used by Calico-VPP? (I need it for NSM_TUNNEL_IP.) Is it available via the downward API as status.podIP if running with hostNetwork: true?

@krishnajs
Author

@edwarnicke thanks a lot for this write-up. I am trying to organize our team to try this out. I will let you know how it goes.

@edwarnicke

@krishnajs My suggestion would be to:

  1. Get Calico-VPP running
  2. Kick the tires on some of the deployment-k8s examples independently of sharing a VPP instance with Calico-VPP (this could be as simple as running the NSM examples in kind)

We still need from @AloysAugustin some information on how to figure out the IP Calico-VPP is using so we can correctly specify the NSM_TUNNEL_IP, and I still have one more small thing to fix in NSM to give us a good shot of having this simply work out of the gate :)
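Step 2 could be as simple as the following. The example path inside the repo is an assumption for illustration; check the networkservicemesh deployment-k8s repo's README for the actual layout and current instructions:

```shell
# Create a throwaway local cluster
kind create cluster --name nsm-test

# Apply one of the NSM examples via kustomize
# (path is illustrative; see the deployment-k8s repo for real examples)
kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/basic
```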

@edwarnicke

OK.. fixed the one last little thing :)

Now we just need @AloysAugustin to tell us what IP Calico-VPP is using so we can use that as the NSM_TUNNEL_IP :)

@AloysAugustin
Collaborator

The most reliable way to retrieve this IP should be to get it from the Node object in k8s. It should also be available as the Pod IP in a pod running with host networking.
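For the hostNetwork case, the downward API wiring would look like this. A sketch only; the container name is illustrative, and it relies on the fact that with hostNetwork: true the pod IP equals the node IP:

```yaml
# Sketch: injecting the node IP via the downward API under hostNetwork
spec:
  hostNetwork: true
  containers:
  - name: forwarder-vpp
    env:
    - name: NSM_TUNNEL_IP          # resolves to the node IP here
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
```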

I also checked the VPP patches that are used in the NSM VPP forwarder, it looks like all of them are already included in the VPP version we ship with the VPP dataplane 🙂 .

@edwarnicke

I also checked the VPP patches that are used in the NSM VPP forwarder, it looks like all of them are already included in the VPP version we ship with the VPP dataplane 🙂 .

@AloysAugustin this is fantastic news!

The most reliable way to retrieve this IP should be to get it from the Node object in k8s. It should also be available as the Pod IP in a pod running with host networking.

@AloysAugustin This is also good news, as by default the forwarder uses hostNetwork:true and sets NSM_TUNNEL_IP from the status.podIP. This means we don't have to make any change for NSM_TUNNEL_IP :)

@edwarnicke

@krishnajs It should be true then that the instructions from the previous comment have a pretty good chance of just working :)

@krishnajs
Author

Thanks @edwarnicke and @AloysAugustin, we will start bringing this up on our side and report back our findings.

@jtollet
Collaborator

jtollet commented Jun 3, 2021

Thanks @edwarnicke and @AloysAugustin, we will start bringing this up on our side and report back our findings.

Hello @krishnajs. Do you have any update to report on this ?

@brunodzogovic

brunodzogovic commented Dec 19, 2021

Hi everyone, just joining this conversation. I am currently working on resolving an issue that arises when using BGP with Calico and MetalLB to advertise routes between multiple clusters and between clouds. Namely, I use NSM to tell the software-defined controller (OpenDaylight) that some Kubernetes cluster somewhere on bare metal wants to communicate with another cluster running in an OpenStack cloud.

One way to prevent Calico from interfering with MetalLB, or rather to prevent the ToR BGP gateway from kicking one of them out of route advertising, is to use VRF. I'm experimenting with VRF (virtual routing and forwarding) as well as Segment Routing (SR) to see which of the two can do the job better, although I'm leaning more towards SR as it's perfect for traffic shaping and service function chaining. The SDN controller can then label the communication (like BGP-MPLS) and send it onward.

Now, the forwarding plane of NSM and the underlay network use VPP with SR-IOV. However, making this work with Calico is a bit of a headache at the moment. Looking forward to seeing how it progresses, as I see great potential in this domain for the future.

@AloysAugustin
Collaborator

Hi @brunodzogovic, if I'm understanding you correctly, at least one of your issues is that you have two BGP daemons running on each node: the MetalLB one and the Calico one. Have you tried announcing the service addresses in BGP directly with Calico? This is described on this page in the Calico docs, and it should allow MetalLB and Calico to coexist nicely.
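The Calico-native approach boils down to a BGPConfiguration resource that advertises the service CIDR(s), removing the need for a separate MetalLB speaker. A sketch, assuming the common kubeadm default service CIDR; substitute your cluster's actual CIDRs:

```yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
  - cidr: 10.96.0.0/12        # assumed service CIDR; use your cluster's value
  serviceLoadBalancerIPs:
  - cidr: 192.0.2.0/24        # illustrative LoadBalancer address pool
```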
