Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-2837] Log & Send Events, Improve logging #957

Closed
wants to merge 1 commit into from

Conversation

manirajv06
Copy link
Contributor

What is this PR for?

Send events in appropriate places, enhanced logging etc to help understand the preemption flow while debugging preemption related issues.

What type of PR is it?

  • - Improvement

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2837

How should this be tested?

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There is breaking changes for older versions.
  • - It needs documentation.

@manirajv06 manirajv06 self-assigned this Aug 28, 2024
Copy link

codecov bot commented Aug 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.85%. Comparing base (6f06490) to head (1365158).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #957   +/-   ##
=======================================
  Coverage   80.84%   80.85%           
=======================================
  Files          97       97           
  Lines       12512    12517    +5     
=======================================
+ Hits        10115    10120    +5     
  Misses       2126     2126           
  Partials      271      271           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@craigcondit craigcondit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on this entire approach (see comments on JIRA for a detailed explanation).

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also -1, we need to be careful about what we send & how often.

Comment on lines 571 to 605
func (a *Allocation) setPreemptionPreConditionsCheckPassed() {
a.Lock()
defer a.Unlock()
if a.preemptionPreConditionsCheckFailed {
a.preemptionPreConditionsCheckFailed = false
a.askEvents.SendPreemptionPreConditionsCheckPassed(a.allocationKey, a.applicationID, a.allocatedResource)
}
}

func (a *Allocation) setPreemptionPreConditionsCheckFailed() {
a.Lock()
defer a.Unlock()
if !a.preemptionPreConditionsCheckFailed {
a.preemptionPreConditionsCheckFailed = true
a.askEvents.SendPreemptionPreConditionsCheckFailed(a.allocationKey, a.applicationID, a.allocatedResource)
}
}

func (a *Allocation) setPreemptionQueueGuaranteesCheckPassed() {
a.Lock()
defer a.Unlock()
if a.preemptionQueueGuaranteesCheckFailed {
a.preemptionQueueGuaranteesCheckFailed = false
a.askEvents.SendPreemptionQueueGuaranteesCheckPassed(a.allocationKey, a.applicationID, a.allocatedResource)
}
}

func (a *Allocation) setPreemptionQueueGuaranteesCheckFailed() {
a.Lock()
defer a.Unlock()
if !a.preemptionQueueGuaranteesCheckFailed {
a.preemptionQueueGuaranteesCheckFailed = true
a.askEvents.SendPreemptionQueueGuaranteesCheckFailed(a.allocationKey, a.applicationID, a.allocatedResource)
}
}
Copy link
Contributor

@pbacsko pbacsko Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all REQUEST events which means they will be sent to the respective pods.

I think the only message which might be useful is an indication that a particular pod successfully triggered preemption. Or the other way around: we send a message to the terminated pod that it's been preempted to make room for pod X.

Everything else (guarantees passed/failed, precondition passed/failed) is just too much noise which should not be sent to K8s as a pod event.

The headroom/quota events are different because they're related to YK configuration. Hitting the quota is a useful feedback for the user, because they'll know why a particular pod cannot be scheduled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should not send events except in these cases:

Request pod: "Preempting pod {namespace}/{pod} to make room" (there could be multiple of these if we preempt multiple pods)

Victim pods: "Preempting pod to make room for {namespace}/{pod}" (only one of these per victim)

This allows users to see that their pod triggered preemption, or that their pod was preempted and by whom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, there is only one log line "Reserving node for ask after preemption" being logged at the end of the preemption process. Other than this, we don't have any info.

We need below info to debug the preemption related issues:

  1. How do we know whether preemption has been even attempted or not?
  2. If attempted, Is there any problem in passing through important steps?
  3. If attempted and successful case, how do we know this case? Answer: Above log line helps

I went through all these questions while trying to find the reason for test failures logged in https://issues.apache.org/jira/browse/YUNIKORN-2808.

I think the only message which might be useful is an indication that a particular pod successfully triggered preemption.

Yes. It definitely helps. We need something like this. I had also similar thought, but doing this way would always sends an event. Hence, I followed the other usual model of 1. Sending only one event and that too only when failed. Does not repeat. 2. Sending only one event when succeeded (Only If 1st event happens). Does not repeat.

Request pod: "Preempting pod {namespace}/{pod} to make room" (there could be multiple of these if we preempt multiple pods)

Victim pods: "Preempting pod to make room for {namespace}/{pod}" (only one of these per victim)

Yes, it helps but only in successful case and that too at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must not log for every attempt as this happens far too frequently. It will overwhelm the system. Currently in the scheduler we only log when action is taken or the state of the system changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pod events definitely should only be sent once, and only when action is taken.

Copy link
Contributor

@pbacsko pbacsko Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an acceptable approach is recording stuff like the allocation log for each request. That looks reasonable to me. We can have a type like AllocationLog and we define fixed causes about why the preemption failed:

  • "Preconditions checks failed" (preemptor.CheckPreconditions() false)
  • "No guarantee to free up resources" (p.checkPreemptionQueueGuarantees() false)
  • "Preemption shortfall" (p.ask.GetAllocatedResource().StrictlyGreaterThanOnlyExisting() false)

Every time something fails, you increase a counter for it. You also count the number of total attempts.

So I'm thinking sth like:

// Mutable fields which need protection
allocated           bool
allocLog            map[string]*AllocationLogEntry
preemptionLog       map[string]*AllocationLogEntry // new field
preemptionAttempts  int64                          // new field
preemptionTriggered bool
preemptCheckTime    time.Time
...

var (
  ErrPreemptionPrecondFailed = errors.New("Preconditions checks failed")
  ErrPreemptionNoGuarantee = errors.New("No guarantee to free up resources")
  ErrPreemptionShortfall = errors.New("Preemption shortfall")
)

// OR perhaps better w/ typed errors

type PreemptionError int
const ErrPreemptionPrecondFailed PreemptionError  = iota
const ErrPreemptionNoGuarantee 
const ErrPreemptionShortfall 

Then in application.go:

				if fullIterator != nil {
                                        request.IncPreemptionAttempts()
					if result, err := sa.tryPreemption(headRoom, preemptionDelay, request, fullIterator, false); err != nil {
						// preemption occurred, and possibly reservation
						return result
					}
					request.LogPreemptionFailure(err) 
				}
			}

Obviously you need to make sure that problems propagate back so you can log it. Then this whole thing can be exposed on the REST API just like the allocation log.

It's obviously not as detailed as a separate log entry printed to the stdout, but I see this as a good compromise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just reuse AllocationLog? It already contains events pertaining to things like predicate failures; preemption attempts can be tracked there easily as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have used allocationLog earlier too in few places in addition with events & logging. Now using only allocationLog, removed events and logs. Still seeing a gap to add debug logs in few places especially after guarantee checks passes. Anyways, we can address this later depends on the need.

Copy link
Contributor

@craigcondit craigcondit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1. Re-requesting a review when nothing has changed isn't helpful.

Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just minor things

pkg/common/errors.go Outdated Show resolved Hide resolved
pkg/scheduler/objects/application.go Outdated Show resolved Hide resolved
pkg/scheduler/objects/preemption_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@pbacsko pbacsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I found only a nit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants