
[YUNIKORN-2991] The queue in Draining state does not accept new applications #1002

Open: wants to merge 4 commits into master

rhh777 (Contributor) commented Dec 6, 2024

What is this PR for?

When a queue is in the Draining state, new applications submitted to it are still accepted and scheduled.
I think they should be rejected instead.

What type of PR is it?

  - Improvement

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2991

How should this be tested?

Unit tests included

codecov bot commented Dec 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.35%. Comparing base (f7d0e10) to head (f3d2d18).
Report is 6 commits behind head on master.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #1002      +/-   ##
==========================================
+ Coverage   81.57%   82.35%   +0.78%     
==========================================
  Files          97       97              
  Lines       15656    15635      -21     
==========================================
+ Hits        12771    12876     +105     
+ Misses       2601     2479     -122     
+ Partials      284      280       -4     
```
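The totals in the table are consistent with coverage being computed as hits / (hits + misses + partials). Under that formula, partially covered lines count as not covered. The short sketch below reproduces both percentages from the table. The function name is illustrative only.

```go
package main

import "fmt"

// coveragePercent computes line coverage as hits divided by all tracked
// lines (hits + misses + partials), which matches the totals in the report.
func coveragePercent(hits, misses, partials int) float64 {
	lines := hits + misses + partials
	if lines == 0 {
		return 0
	}
	return 100 * float64(hits) / float64(lines)
}

func main() {
	fmt.Printf("master: %.2f%%\n", coveragePercent(12771, 2601, 284)) // master: 81.57%
	fmt.Printf("#1002:  %.2f%%\n", coveragePercent(12876, 2479, 280)) // #1002:  82.35%
}
```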


rhh777 closed this Dec 9, 2024
rhh777 reopened this Dec 9, 2024
wilfred-s (Contributor) commented:

See my comment in the Jira issue: can we investigate doing the check in the AppPlacementManager?

rhh777 (Contributor Author) commented Dec 25, 2024

I have made the suggested modifications, but the rejection message shown in the pod events is now less informative. Should we include more detail about why the application was rejected?

Before the change: (screenshot of pod events)
After the change: (screenshot of pod events)


Review comments (outdated, resolved) on pkg/scheduler/placement/placement.go and pkg/scheduler/placement/placement_test.go
wilfred-s (Contributor) commented:

> I have made the suggested modifications, and now the rejected information in the pod event information is not very clear. Should we display more rejection information?

I think that is a problem on the shim side. When I look at how the shim processes the rejection, it does this:

```go
func (app *Application) handleRejectApplicationEvent(reason string) {
	log.Log(log.ShimCacheApplication).Info("app is rejected by scheduler", zap.String("appID", app.applicationID))
	// for rejected apps, we directly move them to failed state
	dispatcher.Dispatch(NewFailApplicationEvent(app.applicationID,
		fmt.Sprintf("%s: %s", constants.ApplicationRejectedFailure, reason)))
}
```

It skips the Rejected state and goes straight to Failed. That is something we can follow up on separately. Apart from the two nits on the core side, I think the code change looks good now.
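The follow-up idea could look something like the simplified sketch below. The rejection is recorded as its own state, together with its reason, before the application is failed, instead of jumping straight to Failed. The `Application`, `AppState` and `handleReject` names here are hypothetical. The shim's real application state machine is event-driven (through the dispatcher shown above) and has many more states.

```go
package main

import "fmt"

// AppState is a hypothetical, reduced set of shim-side application states.
type AppState string

const (
	New      AppState = "New"
	Rejected AppState = "Rejected"
	Failed   AppState = "Failed"
)

// Application tracks its state history so the rejection stays visible.
type Application struct {
	ID      string
	State   AppState
	History []AppState
	Reason  string
}

func (a *Application) transition(to AppState) {
	a.State = to
	a.History = append(a.History, to)
}

// handleReject first records the Rejected state with its reason and only
// then moves the application to Failed.
func (a *Application) handleReject(reason string) {
	a.Reason = reason
	a.transition(Rejected)
	a.transition(Failed)
}

func main() {
	app := &Application{ID: "app-1", State: New, History: []AppState{New}}
	app.handleReject("queue root.old is draining")
	fmt.Println(app.History, app.Reason) // [New Rejected Failed] queue root.old is draining
}
```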

rhh777 (Contributor Author) commented Dec 31, 2024

> It skips the rejected state and goes to failed. That is something we can follow up separately.

Okay, we can improve this in the shim when we have time.

rhh777 closed this Dec 31, 2024
rhh777 reopened this Dec 31, 2024
rhh777 requested a review from wilfred-s on December 31, 2024 09:08