-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create DiscoverEC2 User Tasks when Auto Discover fails on EC2 instances #47064
Create DiscoverEC2 User Tasks when Auto Discover fails on EC2 instances #47064
Conversation
4e32211
to
873c321
Compare
ccff419
to
7567361
Compare
873c321
to
5e20588
Compare
7567361
to
0c450ad
Compare
5e20588
to
497f0aa
Compare
17cf648
to
c10e937
Compare
(i'm working on the flaky test but this should be reviewable already 👍 ) |
497f0aa
to
649c8be
Compare
bdb1723
to
3eb2db8
Compare
649c8be
to
43cbfc1
Compare
3eb2db8
to
a5b14d9
Compare
c1e1f45
to
6c641a2
Compare
@fheinecke @timothyb89 Can you please take a look when you get a chance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some questions and suggestions.
lib/srv/discovery/status.go
Outdated
for existingIntanceID, existingInstance := range currentUserTask.Spec.DiscoverEc2.Instances { | ||
// Each DiscoveryService works on all the DiscoveryConfigs assigned to a given DiscoveryGroup. | ||
// So, it's safe to say that current DiscoveryService has the last state for a given DiscoveryGroup. | ||
// If other instances exist for this DiscoveryGroup, they can be discarded because, as said before, the current DiscoveryService has the last state for a given DiscoveryGroup. | ||
if existingInstance.DiscoveryGroup == s.DiscoveryGroup { | ||
continue | ||
} | ||
|
||
// For existing instances whose sync time is too far in the past, just drop them. | ||
// This ensures that if an instance is removed from AWS, it will eventually disappear from the User Tasks' instance list. | ||
// It might also be the case that the DiscoveryConfig was changed and the instance is no longer matched (because of labels/regions or other matchers). | ||
instanceIssueExpiration := s.clock.Now().Add(-2 * s.PollInterval) | ||
if existingInstance.SyncTime.AsTime().Before(instanceIssueExpiration) { | ||
continue | ||
} | ||
|
||
// Merge existing backend state into in-memory object. | ||
failedInstances[existingIntanceID] = existingInstance | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a bit of a hard time parsing this logic. Are we trying to merge instances from this discovery service into existing tasks object? Can it be simplified or refactored somehow to make this more clear? Right now it's a big method that does several things so it's a bit hard to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct.
It brings the existing task from the cluster's backend and merges it into the existing in-memory state.
And then updates it.
I've moved the merge part into its own method.
Let me know what you think.
lib/srv/discovery/status.go
Outdated
// groupPending is used to register which groups were changed in memory but were not yet sent upstream. | ||
// When upserting User Tasks, if the group is not present in groupPending, | ||
// then the cluster already has the latest version of this particular group. | ||
groupPending map[awsEC2FailedEnrollmentGroup]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: groupPending is a pretty confusing name tbh. What is the "group" and why is it "pending"? Can we pick a more descriptive name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 4 fields that we use to group instances into a task: integration, issue type, region and account id.
So, the group is the unique key for this resource.
What do you think about groupsToSync
or taskKeysToSync
?
Changing to the latter, but please let me know if that's clearer?
d2d1849
to
bdc10e0
Compare
@@ -293,5 +300,232 @@ func (s *Server) ReportEC2SSMInstallationResult(ctx context.Context, result *ser | |||
|
|||
s.updateDiscoveryConfigStatus(result.DiscoveryConfig) | |||
|
|||
s.awsEC2Tasks.addFailedEnrollment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this invocation and the one in discovery.go above? Just trying to understand what errors will be captured by this function vs the other one cause they seem to be both processing results of SSM runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 3 places where we generate tasks:
A) we describe the Instance's SSM Agent state and if there's an error (eg, ssm not installed, ssm not online, os not supported)
B) sending the command (ssm:SendCommand
) fails (wrong permissions, document params issues, ...)
C) after starting the execution, we check the execution result for each instance
A and C are reported using the ReportEC2SSMInstallationResult
B is reported in discovery.go
teleport/lib/srv/discovery/discovery.go
Line 963 in aeefe42
if err := s.ec2Installer.Run(s.ctx, req); err != nil { |
7868348
to
1ee8889
Compare
This PR changes the DiscoveryService to start creating and updating Discover EC2 User Tasks. So, what are Discover EC2 User Tasks? When users set up Auto Discover for EC2 Instances, they don't have a good way of checking for issues on their configured matchers. We created User Tasks as a way to warn Users that something's wrong. Each User Task should describe an issue that happened and a way to fix it. This has potential to be used to report unexpected events trough the whole system, which are not errors per se, but something the user should take action in order to improve the situation. In this case, we are creating a sub type of those tasks: DiscoverEC2. From now on, when the DiscoveryService fails to auto-enroll an instance, it will create a DiscoverEC2 User Task grouping all the failed instances by the following props: - integration - issue type - account id - region A follow up PR will also create notifications so that the user can actually be notified on those User Tasks and take action.
1ee8889
to
c2bae80
Compare
@marcoandredinis See the table below for backport results.
|
…es (#47064) * Create DiscoverEC2 User Tasks for failed EC2 enrollments This PR changes the DiscoveryService to start creating and updating Discover EC2 User Tasks. So, what are Discover EC2 User Tasks? When users set up Auto Discover for EC2 Instances, they don't have a good way of checking for issues on their configured matchers. We created User Tasks as a way to warn Users that something's wrong. Each User Task should describe an issue that happened and a way to fix it. This has potential to be used to report unexpected events trough the whole system, which are not errors per se, but something the user should take action in order to improve the situation. In this case, we are creating a sub type of those tasks: DiscoverEC2. From now on, when the DiscoveryService fails to auto-enroll an instance, it will create a DiscoverEC2 User Task grouping all the failed instances by the following props: - integration - issue type - account id - region A follow up PR will also create notifications so that the user can actually be notified on those User Tasks and take action. * improve naming and split function into multiple methods * rename task issues variable name * fix context early cancelation
This PR changes the DiscoveryService to start creating and updating
Discover EC2 User Tasks.
So, what are Discover EC2 User Tasks?
When users set up Auto Discover for EC2 Instances, they don't have a
good way of checking for issues on their configured matchers.
We created User Tasks as a way to warn Users that something's wrong.
Each User Task should describe an issue that happened and a way to fix
it.
This has potential to be used to report unexpected events trough the
whole system, which are not errors per se, but something the user should
take action in order to improve the situation.
In this case, we are creating a sub type of those tasks: DiscoverEC2.
From now on, when the DiscoveryService fails to auto-enroll an instance,
it will create a DiscoverEC2 User Task grouping all the failed instances
by the following props:
A follow up PR will also create notifications so that the user can
actually be notified on those User Tasks and take action.
Demo: