-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Azure VM MSI support #584
Azure VM MSI support #584
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
@jepio I tested this with a VM set up with user managed identity with In my test I used a pre-defined resource group, availability set, and virtual network. Are there recommended settings for user managed identities we should document? |
I'm making progress with testing, might have a suitable config soon to document the Azure environment this feature can be used in. Meanwhile I ran into this; might be caused by the Azure bindings update?
I was running with |
Rebased this branch to latest flatcar-master to include #585. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this can be merged. Using the machine identity works, and I think the earlier issues I encountered when testing this are related to Azure instead of this PR.
LGTM, thank you Jeremi!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to revert my approval, a new run of this PR again triggered a stack trace:
panic: unimplemented case in CopyKeys (conf.Empty not supported on purpose
goroutine 10631 [running]:
github.com/flatcar/mantle/platform/conf.(*Conf).CopyKeys(0x7faf5ac3a9a8?, {0xc00008def8?, 0x47207d?, 0x0?})
/usr/src/mantle/platform/conf/conf.go:1393 +0x107
github.com/flatcar/mantle/platform/conf.(*UserData).Render(0xc000943c80, {0x2efa182, 0x5})
/usr/src/mantle/platform/conf/conf.go:402 +0x39a
github.com/flatcar/mantle/platform.(*BaseCluster).RenderUserData(0xc0001c7f00, 0xc0003f9980?, 0xc000a82f20?)
/usr/src/mantle/platform/cluster.go:169 +0x213
github.com/flatcar/mantle/platform/machine/azure.(*cluster).NewMachine(0xc000749500, 0xc0003f9980)
/usr/src/mantle/platform/machine/azure/cluster.go:45 +0xf7
github.com/flatcar/mantle/platform.NewMachines.func1()
/usr/src/mantle/platform/platform.go:308 +0x72
created by github.com/flatcar/mantle/platform.NewMachines in goroutine 337
/usr/src/mantle/platform/platform.go:306 +0xb5
What arguments do you pass to trigger that? |
and can you confirm that your ignition config is v3.4? |
I'm running the full set of The
|
Ok @t-lo, try again. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR adds support for assigning a managed identity to an Azure VM and switches the boot diagnostics to use managed storage, along with deduplication for SSH keys in ignition v3. Key changes include:
- Integrating support for managed identities by adding a new flag and API lookup for a user-assigned identity.
- Adjusting VM creation parameters to include the managed identity.
- Updating boot diagnostics and SSH key deduplication logic.
Reviewed Changes
File | Description |
---|---|
platform/api/azure/api.go | Added msiClient setup and a new FindManagedIdentityID method. |
platform/machine/azure/flight.go | Integrated managed identity lookup into flight initialization. |
cmd/kola/options.go | Introduced new CLI flag for VM identity and SSH key deduplication. |
util/retry.go | Moved the sleep call in the retry loop to alter retry timings. |
platform/api/azure/instance.go | Updated getVMParameters, CreateInstance, and GetConsoleOutput to pass and use the managed identity and simplify console log fetching. |
platform/cluster.go | Updated SSH key handling in RenderUserData to account for new configuration. |
platform/machine/azure/cluster.go | Updated cluster struct and instance creation to include managed identity. |
platform/machine/azure/machine.go | Adjusted GetConsoleOutput usage based on updated API. |
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
Thanks for the quick turnaround! I'll run a new bunch of tests overnight and will check back tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things look good with the last change; I don't see the ssh key error stacktrace anymore.
@tormath1 would you give this a review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good that we can remove the storage account dependency. LGTM only small nits.
Unfortunately I need it on the orchestrator VM anyway, for other reasons. It would be nice to only operate on resource group scope, i.e. make kola create and delete NIC, storage account, and public IP resources in the existing RG. But I guess that's a larger change. I'm fine with the general state of things for now, this is super useful! |
For access to Azure resources. Signed-off-by: Jeremi Piotrowski <[email protected]>
Signed-off-by: Jeremi Piotrowski <[email protected]>
Signed-off-by: Jeremi Piotrowski <[email protected]>
This does not require that the user have RBAC permissions to a storage account to fetch, because it uses SAS keys behind the scenes. The previous approach used a kola created storage account has Shared Key Access disabled for security reasons. Signed-off-by: Jeremi Piotrowski <[email protected]>
We switched to managed boot diagnostics, so we no longer require a separate storage account for that purpose. We still need a storage account when uploading images. Signed-off-by: Jeremi Piotrowski <[email protected]>
Ignition v3 does not tolerate duplicate ssh keys, which we will often get if we fetch them from ssh-agent and from the filesystem. Implement key deduplication in GetSSHKeys(), which is called when `-t` is passed. Signed-off-by: Jeremi Piotrowski <[email protected]>
This way if the condition is true in the first iteration we exit immediately. Signed-off-by: Jeremi Piotrowski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR adds support for assigning a managed identity to Azure VMs along with related improvements in instance creation and boot diagnostics. Key changes include:
- Introducing a new VM identity option and updating API clients to use managed identity lookups.
- Propagating the managed identity through the flight, cluster, and instance creation workflows.
- Adjusting SSH key deduplication logic and revising the boot diagnostics retrieval via HTTP.
Reviewed Changes
File | Description |
---|---|
platform/api/azure/api.go | Added managed identity client setup and a lookup function to find identity by name. |
cmd/kola/options.go | Introduced new CLI option for VM Identity and updated SSH key deduplication logic. |
platform/machine/azure/flight.go | Updated flight setup to resolve and pass the managed identity to the instance. |
util/retry.go | Revised the placement of the sleep call in WaitUntilReady. |
platform/api/azure/instance.go | Updated VM parameter and instance creation functions to handle the managed identity. |
platform/cluster.go | Modified user-data SSH key addition to respect the NoSSHKeyInUserData flag. |
platform/machine/azure/cluster.go | Added a ManagedIdentityID field to the cluster struct and passed it during instance creation. |
platform/machine/azure/machine.go | Updated the GetConsoleOutput invocation to remove the storage account parameter. |
Copilot reviewed 29 out of 29 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
cmd/kola/options.go:126
- There is a typo in the option description: "availibity" should be corrected to "availability".
sv(&kola.AzureOptions.AvailabilitySet, "azure-availability-set", "", "Deploy resources in an existing availibity set")
util/retry.go:69
- The repositioning of the time.Sleep call within the WaitUntilReady loop changes the timing behavior (sleeping after checkFunction instead of before). Verify that this revised delay logic meets the intended retry semantics.
time.Sleep(delay)
Thanks @tormath1 :) |
Support for assigning a managed identity to the VM. The MSI is looked up by name. A scenario this unlocks is for example running kola tests from within a vm spawned with kola spawn. Or interacting with a storage account.
Switch to managed boot diagnostics (does not require creating a storage acount).
Fix
--keys
support: keys need to be deduplicated in ignition v3 and some tests don't tolerate ssh keys in userdata (which we must follow).