-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow unknown protocols #49
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aojea The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
ok, the test passes here, so is the output rule /hold I don't have clear what is the expected behavior kubernetes-sigs/network-policy-api#187 (comment) regarding protocol /assign @danwinship |
tx.Add(&knftables.Rule{ | ||
Chain: chainName, | ||
Rule: knftables.Concat( | ||
"oif", "lo", "accept"), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uablrek can you check in your PR this solves the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this update, but the address matches should never hit anything with "lo" since such packet would be a "martian"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It worked! My world is crumbling! My reasoning is that since the set's only contains pod-addresses, packets matching whose sets can't be routed to/from "lo", since that would make them martians.
Unless... a SatefulSet pod actually have a loopback address as pod-address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modified chain for reference:
chain output {
type filter hook output priority filter - 5; policy accept;
ct state established,related accept
oif "lo" accept
ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
ip daddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
ip6 saddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
ip6 daddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule definitely seems wrong.
I think sometimes packets are considered by the kernel to be traversing lo
if they have the same local source and destination IP, even if that IP isn't 127.0.0.1/::1?
Anyway, the failure here is probably caused by the fact that in ipvs, packets traverse the network stack twice (from source to ipvs0
and then from ipvs0
to destination), and the DNAT and SNAT don't happen in exactly the same place as with iptables/nftables kube-proxy. (The DNAT happens inside ipvs0
... I forget if the SNAT happens in the first or second traversal.)
Anyway, point is, the iptables/nftables logic is going to end up doing the wrong thing for some packets in ipvs mode. The fix might be to ignore packets with oif = "ipvs0"
so you only look at them on the second traversal, but I'm really not sure about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know Calico needs to explicitly know whether you're using iptables or ipvs, so it's possible we actually need different logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this rule fixes the IPv6 jobs, ipvs is unrelated here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sometimes packets are considered by the kernel to be traversing lo if they have the same local source and destination IP, even if that IP isn't 127.0.0.1/::1?
So, what I want to express is "we only need to process in OUTPUT local generated packets that are sent outside (not localhost dest)"
I had this issue in the past with kubelet probes and we solved it using the lo
rule, I think it is correct but I can not explain the science behind it, just the evidence it works as expected, any packet destined to localhost does not have to be processed by network policies anyway, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this rule fixes the IPv6 jobs, ipvs is unrelated here
oh... so... the issue is that if we add processing on the output
hook, then IPv6 (but not IPv4) breaks unless we also add the ! lo
rule?
So, what I want to express is "we only need to process in OUTPUT local generated packets that are sent outside (not localhost dest)"
Why should it matter though? We shouldn't be blocking any packets that are destined for localhost. With the podips
changes, we shouldn't even intercept them...
I can not explain the science behind it
boo!
any packet destined to localhost does not have to be processed by network policies anyway, right?
A packet from a pod-network pod addressed to 127.0.0.0/8
or ::1
would not leave the pod netns, and it's not possible (by validation) to have an EndpointSlice that points to 127.0.0.0/8
or ::1
, so you can't end up with that via Service DNAT either. But you can add non-localhost IPs to lo
I think?
klog.Infof("Can not process packet %d applying default policy (failOpen: %v): %v", *a.PacketID, c.config.FailOpen, err) | ||
c.nfq.SetVerdict(*a.PacketID, verdict) //nolint:errcheck | ||
klog.Infof("Can not process packet %d, allowing ... : %v", *a.PacketID, err) | ||
c.nfq.SetVerdict(*a.PacketID, nfqueue.NfAccept) //nolint:errcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is really correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means this packet is not TCP or UDP or SCTP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it means "a parsing error occurred". I'm not sure we want to say all parse errors should be "accepts".
(I'm not sure we don't want that either...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it can only happens if is not ip or ipv6 or one of those protocols ... but indeed is confusing the way it is now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All packets with unknown protocol (not TCP or UDP or SCTP) must be accepted. Unfortunately there is no v1.ProtocolUnknown
constant, but we can type-cast:
default:
//return t, fmt.Errorf("unknown protocol %d", protocol)
t.proto = v1.Protocol("unknown")
but that feels a bit hackish...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be ok to just store the proto as an int in struct packet
, and let the caller match on "syscall.IPPROTO_*". IMO that's the most flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that it doesn't seem right to say that every possible type of error from parsePacket
should result in the packet being accepted. IMO it would be better to have parsePacket
return both an error and a verdict, even if that verdict happens to be "accept" for all of the errors it currently catches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some investigation, I agree: all unrecognized packets shall not be accepted (#51 (comment)). However, I think parsePacket()
should not set a verdict. It should parse the packet and let the caller decide, but then is can't return TCP,UDP,SCTP or an error for everything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be ok to just store the proto as an int in
struct packet
, and let the caller match on "syscall.IPPROTO_*". IMO that's the most flexible.
that sounds like the best idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have experimented some with this, and I think the best is to keep packet.proto as v1.Protocol, and add a new "ipproto int" in the packet structure.
An IPPROTO_ROUTING extension header, or extension header out-of-order will return a
var ErrorExtensionHeader = fmt.Errorf("unsupported extension header")
@@ -73,7 +73,7 @@ func parsePacket(b []byte) (packet, error) { | |||
t.proto = v1.ProtocolSCTP | |||
dataOffset = hdrlen + 8 | |||
default: | |||
return t, fmt.Errorf("unknown protocol %d", protocol) | |||
return t, fmt.Errorf("IP family %s unknown transport protocol %d", t.family, protocol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't want to log an error for every ICMPv6 message. You should add a case 58:
for that.
Probably parsePacket
needs another return value for "not an error, but just accept the packet".
(It seems that your callback only gets called for IP and IPv6 protocol packets, right? And, in particular, not ARP and ICMP packets, which are separate L3 protocols, unlike ICMPv6/NDP, which are subprotocols of IPv6.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the next commit, only TCP , UDP and SCTP packets are sent to the userspace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment I think all fragments will also have an unknown protocol. But I am about to fix that (#15)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the next commit, only TCP , UDP and SCTP packets are sent to the userspace
That may fix the fragment problem. The kernel will reassemble packets before sending them to user-space (perhaps it already does, in which case #15 isn't a kind/bug after all).
Since network policies are only defined for L4 protocols and is not well defined what happens with the protocols not supported for Kubernetes, use the less surprised mode for users and allow any unknown protocol for Kubernetes Co-Authored-By: Lars Ekman <[email protected]>
avoid to send to user spaces the protocols that we are going accept anyway. Co-Authored-By: Lars Ekman <[email protected]>
Accept the packets destined to the loopback interface in the output hook, because the job for IPv6 and admin network policies fails, still not clear why, however, there seems to be no reason to process a packet in OUTPUT to this interface. Co-Authored-By: Lars Ekman <[email protected]>
/hold cancel @uablrek I've added you as co author and incorporated your commits with some amendments |
tx.Add(&knftables.Rule{ | ||
Chain: chainName, | ||
Rule: knftables.Concat( | ||
"meta", "l4proto", "!=", "{ tcp, udp, sctp }", "accept"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @danwinship
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(There's no real reason to use knftables.Concat
here. You can just write it as a single string.)
So, thinking as an user, I think the best behavior will be that if I don't define any protocol in a network policy I apply the existing policy rules ... we can do this just not returning an error during packet processing , it will return an empty string for the protocol field and zero for src and destination port, and it will fail to match if there is a rule specifying a protocol |
So, it seems there are two AI:
|
is there any part of this or #47 that can be merged as-is? |
I think accepting too much (all unknown) may be a security issue in the IPv6 extension header case. For instance if we miss some combination of EH's that are valid TCP, we may accept TCP traffic that should be blocked. Better be restrictive, and for instance reject packets with EH's in a non-recommended order. |
Scratch #49 (comment). Please merge this PR as-is an close #47. I'll rebase #51 if needed, then that PR should be next in line IMO. I have prepared a commit on top of #51 to handle EH's out-of-order, and routing-EH (#49 (comment)) |
OK, so thinking about this some more, if a pod is unprivileged and does not have So, I'd say, we should keep the So that solves the "unrecognized protocol" problem and leaves:
Right? |
ICMP packets can be evaluated by network policies if no Port fields are set #51 (comment) I'm going to close this to avoid more noise, once we have the parser done, I want to debug the OUTPUT chain problem separetly |
Unknown protocols are not subject to Kubernetes policies, hence they must be allowed
If the receiving pod is local, then ipvs sends packet via the output hook (not forward). So both are needed
Fixes #46
Co-Authored-By: Lars Ekman [email protected] @uablrek