Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nftables support for ipmasq and portmap #935

Merged
merged 5 commits into from
Sep 16, 2024

Conversation

danwinship
Copy link
Contributor

This adds nftables backends to the two remaining iptables-only bits in containernetworking/plugins; ipmasq (used by ptp and bridge) and portmap. For now, they are both set up to use iptables by default, unless you explicitly request nftables via the network config, or you run the plugin on a host that has nft installed but not iptables (eg, future RHEL 10 hosts).

The firewall plugin currently has "iptables" and "firewalld" backends. In theory it could have a separate explicit "nftables" backend, but I didn't do that here.

The ipmasq and portmap code implement nftables via https://github.com/danwinship/nftables, which is a library I started for the planned kube-proxy nftables backend, and also plan to use for various other nftables ports (eg, cri-o). It might move from danwinship/ to somewhere more generic at some point, but it might not.

After I started working on this branch, I discovered that the bridge plugin was actually already using nftables via pkg/link/spoofcheck.go, using the https://github.com/networkplumbing/go-nft library. Using two separate nftables libraries is probably not great, so we may want to fix that eventually. The libraries have somewhat similar APIs up to a point, then diverge completely because go-nft uses the JSON API to /sbin/nft and therefore has to use the (poorly-documented) JSON rule schema, while danwinship/nftables uses the plain text API to /sbin/nft and so uses "normal" nft rules. eg, compare:

func (sc *SpoofChecker) matchIfaceJumpToChainRule(chain, toChain string) *schema.Rule {
        return &schema.Rule{
                Family: schema.FamilyBridge,
                Table:  natTableName,
                Chain:  chain,
                Expr: []schema.Statement{
                        {Match: &schema.Match{
                                Op:    schema.OperEQ,
                                Left:  schema.Expression{RowData: []byte(`{"meta":{"key":"iifname"}}`)
},
                                Right: schema.Expression{String: &sc.iface},
                        }},
                        {Verdict: schema.Verdict{Jump: &schema.ToTarget{Target: toChain}}},
                },
                Comment: ruleComment(sc.refID),
        }
}

which in danwinship/nftables syntax would become:

func (sc *SpoofChecker) matchIfaceJumpToChainRule(chain, toChain string) *nftables.Rule {
        return &nftables.Rule{
                Chain: chain,
                Rule: nftables.Concat(
                        "iifname", sc.iface,
                        "jump", toChain,
                ),
                Comment: ruleComment(sc.refID),
        }
}

("Draft" PR because not fully-tested yet...)

@danwinship danwinship force-pushed the nftables branch 2 times, most recently from d6e2a46 to 60eb4e3 Compare July 31, 2023 14:18
@danwinship
Copy link
Contributor Author

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc0003bdb20>: 
      running [/usr/sbin/iptables -t nat -D POSTROUTING -s 10.1.2.2 -j CNI-43a5a67926c1a665ff4c21b7 -m comment --comment name: "testConfig" id: "dummy-0" --wait]: exit status 2: iptables v1.8.7 (nf_tables): Chain 'CNI-43a5a67926c1a665ff4c21b7' does not exist

sigh, this is what coreos/go-iptables#107 was supposed to fix but I guess I fixed it for iptables-legacy but not iptables-nft?

(Was this not failing before? I'm not sure why this change would make the race condition suddenly start happening.)

pkg/utils/netfilter.go Outdated Show resolved Hide resolved
@@ -23,7 +23,9 @@ import (

// SetupIPMasq installs iptables rules to masquerade traffic
// coming from ip of ipn and going outside of ipn
func SetupIPMasq(ipn *net.IPNet, chain string, comment string) error {
func SetupIPMasq(ipn *net.IPNet, pluginName, containerID string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a lot but seems this is used externally too https://grep.app/search?q=SetupIPMasq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, right. Do I have to add a new API rather than changing the old one? I don't see anything about API stability guarantees...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not familiar with this codebase , let's ask @squeed

@danwinship danwinship force-pushed the nftables branch 2 times, most recently from c69144c to e48682c Compare May 25, 2024 12:45
@aojea
Copy link
Contributor

aojea commented May 27, 2024

linter fail, I see is still in draft, let me know if you want a review,

@danwinship danwinship marked this pull request as ready for review July 24, 2024 15:02
@danwinship
Copy link
Contributor Author

OK, finally updated and tested:

  • ip.SetupIPMasq() and ip.TeardownIPMasq() are now unchanged (and still always use iptables). Instead, I added a new SetupIPMasqForPlugin and TeardownIPMasqForPlugin that support both iptables and nftables.
  • The bridge and ptp plugins now use iptables by default, or nftables if iptables isn't installed or doesn't work. Or you can force the backend via "ipMasqBackend": "iptables" or "ipMasqBackend": "nftables" in the config.
  • Likewise, portmap now supports nftables, but still uses iptables by default, unless it's not available/usable, or you pass "backend": "nftables", or you set "conditionsV4" or "conditionsV6" to something that looks like nftables syntax rather than iptables.

(I guess this will need a docs update to the cni.dev repo?)

@danwinship
Copy link
Contributor Author

/assign @squeed

@danwinship
Copy link
Contributor Author

oh, and I could port this to use the same nftables library that spoofcheck uses, if you preferred.

or port the spoofcheck code to use knftables like kube-proxy (and Calico and soon some other stuff)

go.mod Outdated Show resolved Hide resolved
@squeed squeed self-requested a review August 19, 2024 15:07
@danwinship
Copy link
Contributor Author

danwinship commented Aug 26, 2024

or port the spoofcheck code to use knftables like kube-proxy

danwinship@17bb5355

go.mod Outdated Show resolved Hide resolved

// hashForContainer returns a unique hash identifying the rules for this container with
// this plugin
func hashForContainer(pluginName, containerID string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an error in the iptables implementation of this -- this needs to hash (network, ifname, containerid).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pluginName here is NetConf.Name, which is what you mean by network, right? (Yeah, pluginName is a bad variable name... I was assuming it meant NetConf.Type...)

Why do we need ifname?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need ifname?

Because it is allowed to add and remove the same container to the same network multiple times. In that case, the ifname is the only distinct identifier.

}

func setupIPMasqNFTablesWithInterface(nft knftables.Interface, ipn *net.IPNet, pluginName, containerID string) error {
comment := hashForContainer(pluginName, containerID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iptables implementation is wrong here -- we need to be able to find all masquerade rules for a given network. Can the comment include network name in plain text?

@squeed
Copy link
Member

squeed commented Aug 27, 2024

Looks basically good. I'd like to see GC support if possible, since it informs the design the API. Could you add that, even if the top-end API glue isn't there yet.

@danwinship danwinship force-pushed the nftables branch 2 times, most recently from a833549 to 043253e Compare August 29, 2024 13:49
Use `conditionsV4` and `conditionsV6` values that actually look like
valid iptables conditions.

Signed-off-by: Dan Winship <[email protected]>
@danwinship
Copy link
Contributor Author

@squeed this is ready for re-review btw; it includes an implementation of GC for nftables ipmasq, which isn't actually plumbed through, but at least gets unit tested.

@squeed squeed merged commit 01a94e1 into containernetworking:main Sep 16, 2024
5 checks passed
@danwinship danwinship deleted the nftables branch September 16, 2024 21:33
@champtar
Copy link
Contributor

champtar commented Nov 6, 2024

I've started to play with this, found a regression (#1114 / PR #1117), and 2 bugs ( #1115 - PR #1116 / #1118 - PR #1120 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants