Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve API VIP addresses handling #513

Closed
wants to merge 3 commits into from
Closed

Improve API VIP addresses handling #513

wants to merge 3 commits into from

Conversation

mchiappero
Copy link

When configuring Kubernetes with multiple nodes, EIB requires an IP address to be used as a target for load balancing purposes. However, it assumes this address to be in an IPv4 format, resulting in a broken mangling when IPv6 is used instead.

Correct this problem by letting netip package do the handling of IP addresses, hiding version specific issues. This also allows for easier and additional input validation with minimal effort.

@mchiappero
Copy link
Author

I have added a couple of additional sanity checks, however I don't think it is worth to create additional unit tests for this specific case, but in case let me know and I will add them. It might make sense to validate the function with an IPv6 too though, let me know if you would like it.

@mchiappero mchiappero linked an issue Jul 31, 2024 that may be closed by this pull request
Copy link
Contributor

@atanasdinov atanasdinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing. Unfortunately, we have to move this logic outside of the modified function.

Please ensure that we also update the release notes, as well as the docs referring to this field.

pkg/kubernetes/cluster.go Show resolved Hide resolved
},
},
`with nodes - no network config`: {
`invalid multi-node - no network config`: {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved this test from Node validation to Network validation as I also have removed the node test from there too. I think it makes more sense here, but let me know what you think.

@@ -202,7 +203,9 @@ func setClusterAPIAddress(config map[string]any, apiAddress string, port uint16)
return
}

config[serverKey] = fmt.Sprintf("https://%s:%d", apiAddress, port)
ip, _ := netip.ParseAddr(apiAddress)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to store the Addr directly into APIVIP and skip this additional step already done in validation. But I don't know how easy it is with gopkg.in/yaml.v3 to do so and if it's worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do the parsing in NewCluster() and return the error if it fails. Then we can pass it from there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call cannot fail, the value has been validated already, isn't it? By parsing in NewCluster(), do you mean at the unmarshalling stage? Otherwise I can't see how it would be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed more of a sanity check. While the value has been validated earlier, it has been done in a very different part of the code. There's no guarantee that future changes will not invalidate the flow we're establishing now.

If we ignore the error, then we risk "successfully" building an image that will fail during boot. And we want to limit the chances of such scenarios as much as possible.

Doing something as simple as the following in NewCluster() would be more than enough to prevent unexpected outcomes:

	virtualIP, err := netip.ParseAddr(kubernetes.Network.APIVIP)
	if err != nil {
		return nil, fmt.Errorf("parsing kubernetes vip address: %w", err)
	}

Copy link
Author

@mchiappero mchiappero Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed more of a sanity check. While the value has been validated earlier, it has been done in a very different part of the code.

I'm new to the codebase and I might very well miss something... but I'm quite puzzled by this fact: we have an input validation we cannot rely on? What would the different code paths be that can lead to a non validated input here? And/or would it be feasible to just store Addr in the Network struct directly?

There's no guarantee that future changes will not invalidate the flow we're establishing now.

I'm afraid I don't understand what you mean here and how this is related to the above.

Doing something as simple as the following in NewCluster() would be more than enough to prevent unexpected outcomes:

But then you end up with:

func setMultiNodeConfigDefaults(kubernetes *image.Kubernetes, **address netip.Addr**, config map[string]any)

despite having *image.Kubernetes, right? If so, it's a bit of a code smell.

By the way, I would like to check with you if the VIP is actually optional or rather forbidden in a single host scenario. I'm not sure I fully understand the code, but shouldn't setClusterAPIAddress() also be invoked when the value is set in a single host scenario?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I'm quite puzzled by this fact: we have an input validation we cannot rely on?

We, as users, should of course rely on the validation logic. However, the code itself shouldn't. Each statement should be responsible for achieving its intended result.

We're not talking about duplicating a validation here. We're just ensuring that the specific package and function can behave as expected regardless of its input. If we're unable to parse a string into an IP address, then the received value is invalid, right?

What would the different coda paths be that can lead to a non validated input here?

We don't know the future. We can just ensure that the code will not produce an erroneous image build regardless of how input validation and modification changes.

And/or would it be feasible to just store Addr in the Network struct directly?

We can't. At least not in a pretty way. I wouldn't want to duplicate a struct only for this use case, and modifying it in place is also not the way to go here. We're only parsing YAML into it. If we have specific and more sophisticated transformations in the future, then a separate struct would make sense.

But then you end up with ... despite having *image.Kubernetes, right? If so, it's a bit of a code smell.

One could argue that ignoring an error that could possibly result into an image failing to bootstrap a cluster is a code smell itself. I don't see anything inherently wrong with passing additional argument to a function, though.

By the way, I would like to check with you if the VIP is actually optional or rather forbidden in a single host scenario. I'm not sure I fully understand the code, but shouldn't setClusterAPIAddress() also be invoked when the value is set in a single host scenario?

It is not allowed to store the value as a server flag, however, the purpose of this field is not only limited to this.

It enables the deployment of MetalLB and configuring a virtual IP address which can then later be used from within and outside the cluster. We ensure that it is recognised by the TLS cert. Perhaps these cases may be very rare but we haven't considered this a forbidden scenario. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the last note, we also disable ServiceLB and use MetalLB in its place for k3s when a VIP is provided. For reference: https://docs.k3s.io/networking/networking-services#disabling-servicelb

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We, as users, should of course rely on the validation logic. However, the code itself shouldn't. Each statement should be responsible for achieving its intended result.

We're not talking about duplicating a validation here. We're just ensuring that the specific package and function can behave as expected regardless of its input. If we're unable to parse a string into an IP address, then the received value is invalid, right?

We don't know the future. We can just ensure that the code will not produce an erroneous image build regardless of how input validation and modification changes.

We can't. At least not in a pretty way. I wouldn't want to duplicate a struct only for this use case, and modifying it in place is also not the way to go here. We're only parsing YAML into it. If we have specific and more sophisticated transformations in the future, then a separate struct would make sense.

Ok, let's step back a bit and go more directly to the point I have been trying to understand: what is the design? What are the various aspects and implications of the design? I'd like to understand what the structure looks like to review my assumptions, have an opinion and understand yours.

You basically say "it's a good practice to write functions that validate their input"; yes, pretty much everybody knows that; but still, it depends, we can make different decisions if it makes sense to do so. Say that the in memory representation is immutable or effectively immutable, or that nothing else is supposed to touch it and such a thing is clear and well documented, and the many possible different cases in between that could exist.
For example, I'd avoid changing the transparent Network struct too and trying to do it was the last thing on my mind despite the question, but it also sucks have to pass around a string or not being able to use the Addr type following the validation step, which suggest the design could be better refined. So what do we do? :)

But then you end up with ... despite having *image.Kubernetes, right? If so, it's a bit of a code smell.

One could argue that ignoring an error that could possibly result into an image failing to bootstrap a cluster is a code smell itself. I don't see anything inherently wrong with passing additional argument to a function, though.

Sure, but we could argue many other things too, I could argue that the Network struct (or the whole definition) should have encapsulation or that it shouldn't contain a string before making this change; I could argue that setMultiNodeConfigDefaults should only receive Version and Network according to the least privilege principle; I could argue that if we have proper validation and unit tests chances are incredibly slim if any... The review is a space to consider and weight them here. I'm not against your change, but are we really sure it's the best decision? Are you sure about it?

Personal opinion, passing an additional parameter to share something that is already available through another parameter is indicative of something off with either the function API design or the code design. Again, what do we do? We don't have time to address the rest, ok, what is a reasonable compromise?

By the way, I would like to check with you if the VIP is actually optional or rather forbidden in a single host scenario. I'm not sure I fully understand the code, but shouldn't setClusterAPIAddress() also be invoked when the value is set in a single host scenario?

It is not allowed to store the value as a server flag, however, the purpose of this field is not only limited to this.

It enables the deployment of MetalLB and configuring a virtual IP address which can then later be used from within and outside the cluster. We ensure that it is recognised by the TLS cert. Perhaps these cases may be very rare but we haven't considered this a forbidden scenario. What do you think?

I have no opinion, but I'd say that if it is specified it should be honored even for a single node clusters (which could grow?). That said, the main reason for asking is this test:
https://github.com/suse-edge/edge-image-builder/blob/ipv6/pkg/image/validation/kubernetes.go#L55
Does it actually make sense? If not I will change it.

Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we're going too far with this discussion. We shouldn't be blocking a PR review over whether an argument should be passed to a function or not.

I basically want to ensure that a failure is caught. As for reasonable compromise, I guess just throw a panic("invalid kubernetes vip address") in the case an error is raised. It would never be thrown in 99.99% of the cases and even if it is, we're able to recover from it gracefully.

As far as design decisions go - both me, you and everyone else understands that the bigger the codebase, the easier it is to miss guidelines which are not enforced by a linter or other static analysing tool. Just documenting something doesn't mean that it will be followed. Not to mention we'd have to keep more documentation up to date and people would very rarely (if ever) revisit it.

As far as coding principles go - that's a very subjective take. Yes, none of these functions utilise the Helm or Manifests subfields of the Kubernetes struct but we're still only passing a single pointer, as opposed to a version string and a pointer to a Network struct. I'm fine with both and don't see major issues with either.

Just note that I'm not claiming that the code is perfect. No code is. We're not trying to follow the "best" coding practices that are out there - we're trying to create and maintain a stable product. And dare I say some of the commonly accepted good practices should not be followed all the time. :)

On the single node scenario - yes, let's keep it. This is intended and I missed the k3s LoadBalancer justification which I briefly mentioned above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I have never meant neither to withdraw the PR nor to avoid applying the changes if needed. I wanted to make an attempt to bring the review to the topics that matter to me, to complete the change and be happy to sign-off, which had/have not been addressed.

I am going to push the changes and address the rest of comments later today.

Copy link
Contributor

@atanasdinov atanasdinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. Happy to approve after we resolve these comments.

RELEASE_NOTES.md Show resolved Hide resolved
@@ -202,7 +203,9 @@ func setClusterAPIAddress(config map[string]any, apiAddress string, port uint16)
return
}

config[serverKey] = fmt.Sprintf("https://%s:%d", apiAddress, port)
ip, _ := netip.ParseAddr(apiAddress)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do the parsing in NewCluster() and return the error if it fails. Then we can pass it from there.

pkg/image/validation/kubernetes.go Show resolved Hide resolved
pkg/image/validation/kubernetes.go Show resolved Hide resolved
@@ -140,33 +140,38 @@ func TestIsKubernetesDefined(t *testing.T) {
assert.False(t, result)
}

func TestValidateNodes(t *testing.T) {
func TestValidateNetwork(t *testing.T) {
var validV4Network = validNetwork
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Why do we copy the value instead of just using it?

Copy link
Author

@mchiappero mchiappero Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's actually a leftover, as I initially had v4 & v6 variables. I have removed it, I'd change validNetwork to validV4Network though.

pkg/image/validation/kubernetes.go Outdated Show resolved Hide resolved
Prevent invalid values (e.g. too big or negative) to be passed in as
input by using a more specific and constraining type for network ports.

Signed-off-by: Marco Chiappero <[email protected]>
Almost all sections within the image definition go through sanity checks
before actually creating a new image. However, EIB misses a dedicated
section to test the Network part of the Kubernetes configuration.

Add a validateNetwork function to in order to make sure that the specified
APIVIP value is well-formed and valid, regardless of wheather it is in
IPv4 or IPv6 format.

Signed-off-by: Marco Chiappero <[email protected]>
When configuring Kubernetes with multiple nodes, EIB requires an IP address
to be used as a target for load balancing purposes. However, it assumes this
address to be in an IPv4 format, resulting in a broken mangling when IPv6 is
used instead.

Correct this problem by letting netip package do the handling of IP addresses,
hiding version specific issues. This also allows for easier and additional
input validation with minimal effort.

Signed-off-by: Marco Chiappero <[email protected]>
@mchiappero mchiappero changed the title Leverage the netip package when dealing with IP addresses Improve API VIP addresses handling Aug 8, 2024
Copy link
Contributor

@atanasdinov atanasdinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Happy to approve once these minor comments are resolved.

})

zap.S().Error(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for misleading you here - it's been a while since I took a look at this portion. I only envisioned we log the error value itself in order not to lose it, but in order to do that we can just populate the Error field of the FailedValidation and can get rid of this log here.

UserMessage: msg,
})

zap.S().Error(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we don't have an actual error value but this message is going to be logged both to the console, as well as the log file so this statement is not necessary.

config[serverKey] = fmt.Sprintf("https://%s:%d", apiAddress, port)
ip, err := netip.ParseAddr(apiAddress)
if err != nil {
panic("Invalid Kubernetes VIP address")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a unit test covering this scenario?

@atanasdinov
Copy link
Contributor

This change will be introduced at a later point.

@mchiappero mchiappero deleted the ipv6 branch August 27, 2024 08:26
@hardys hardys mentioned this pull request Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EIB assumes apiVIP to be IPv4 only when creating the target URL
2 participants