Skip to content
This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Cell as SKU in intra-vc scheduler #34

Open
wants to merge 26 commits into
base: master
Choose a base branch
from
Open

Conversation

abuccts
Copy link
Member

@abuccts abuccts commented Jan 28, 2021

New cell as SKU feature in intra-vc scheduler, work items:

  • new interface for pod group and scheduling result placement
  • pod request parsing and conversion
  • affinity group / pod group adjustments before intra-vc scheduler
  • new scheduling algorithm in intra-vc scheduler
  • scheduling placements adjustments
  • update original test cases (leaf cell) Convert old test cases to v2 schema #37
  • spec, request examples and document, design document
  • add new test cases (within feature) Add new test cases for V2 schema #38
  • end-to-end tests and debug

Add new interface for pod group, including scheduling request, interal
status, and scheduled placement.
Add conversion, defaulting, and validation for v2 pod scheduling spec,
update ExtractPodSchedulingSpec accordingly.
@abuccts abuccts added the enhancement New feature or request label Jan 28, 2021
@abuccts abuccts requested review from hzhua and yqwang-ms January 28, 2021 11:18
@abuccts abuccts self-assigned this Jan 28, 2021
@abuccts abuccts mentioned this pull request Jan 28, 2021
9 tasks
@abuccts abuccts requested a review from hzy46 February 3, 2021 06:31
Convert AlgoAffinityGroup to PodGroupSchedulingStatus before intra VC
scheduler.
@abuccts abuccts force-pushed the xiongyf/cell-as-sku branch from db3d1ec to e56fe3b Compare March 4, 2021 02:52
abuccts and others added 6 commits March 12, 2021 16:11
Add cell type to level mapping for intra-vc.
Fix pod group placement type in scheduling status.
* adjust virtual and physical placements for scheduling results
* update pod binding info for tree structure
* expose pod group in api through webserver
* pick best affinity cells
* convert cells to leaf cells in placement
Fix legacy unit tests.
@codecov-commenter
Copy link

codecov-commenter commented Apr 20, 2021

Codecov Report

Merging #34 (5c1e5c8) into master (66f26ac) will decrease coverage by 7.78%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
- Coverage   88.46%   80.68%   -7.79%     
==========================================
  Files           8       10       +2     
  Lines        1890     2205     +315     
==========================================
+ Hits         1672     1779     +107     
- Misses        166      334     +168     
- Partials       52       92      +40     
Flag Coverage Δ
unittests 80.68% <ø> (-7.79%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ft/hivedscheduler/pkg/algorithm/cell_allocation.go 59.90% <0.00%> (-32.37%) ⬇️
...om/microsoft/hivedscheduler/pkg/algorithm/utils.go 76.09% <0.00%> (-9.30%) ⬇️
...ft/hivedscheduler/pkg/algorithm/hived_algorithm.go 76.59% <0.00%> (-8.65%) ⬇️
...hivedscheduler/pkg/algorithm/intra_vc_scheduler.go 100.00% <0.00%> (ø)
...cheduler/pkg/algorithm/topology_aware_scheduler.go
...microsoft/hivedscheduler/pkg/algorithm/types_v2.go 89.82% <0.00%> (ø)
...dscheduler/pkg/algorithm/hived_algorithm_tester.go 66.42% <0.00%> (ø)
...uler/pkg/algorithm/topology_guarantee_scheduler.go 92.88% <0.00%> (ø)
...m/microsoft/hivedscheduler/pkg/algorithm/config.go 96.46% <0.00%> (+0.11%) ⬆️
...om/microsoft/hivedscheduler/pkg/algorithm/types.go 90.62% <0.00%> (+5.58%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 66f26ac...5c1e5c8. Read the comment docs.

Fix bugs in new v2 test cases, including:
* calculate leaf cell numbers per pod when creating new PodGroupSchedulingStatus
* get cell chain through within cell type (no higher than node) instead of leaf cell type
* sort cells by virtual address in cluster view
@abuccts abuccts force-pushed the xiongyf/cell-as-sku branch from 9fd0560 to ae26632 Compare May 12, 2021 09:02
@abuccts abuccts marked this pull request as ready for review May 12, 2021 09:06
Fix comments.
pkg/api/v2/types.go Outdated Show resolved Hide resolved
pkg/api/v2/types.go Outdated Show resolved Hide resolved
pkg/api/v2/types.go Outdated Show resolved Hide resolved

// PodGroupMemberCellSpec represents cell spec for each pod in pod group.
type PodGroupMemberCellSpec struct {
CellType api.CellType `yaml:"cellType"` // cell type to be used in pod, differnt group can have differnt cell typs in the same chain, TODO: not support multiple chains yet
Copy link
Member

@yqwang-ms yqwang-ms Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typs

types #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Name string `yaml:"name"` // pod group name
WithinOneCell api.CellType `yaml:"withinOneCell"` // within cell for all cells in current group, e.g., two GPU-cell within one numa-cell, two node-cell within one rack-cell
Pods []PodGroupMemberSpec `yaml:"pods"` // pod list for current group, any two pods are gang with each other in the groupS
ChildGroups []*PodGroupSpec `yaml:"childGroups"` // child group in the hierarchical structure
Copy link
Member

@yqwang-ms yqwang-ms Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pods and ChildGroups are order sensitive? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@yqwang-ms
Copy link
Member

yqwang-ms commented Sep 14, 2021

Only backtrace to parent pod group may not enough to find the whole pod root group placement if it exist in theory.
Consider the following example:
image
Only backtrace to parent, result in no solution, but in theory, a solution exist.

We can add more pod group sorting criterias, such as if it has withinOneCell requirement, but still cannot prove only backtrace to parent is enough.

So given pod group trees are generally small trees with few nodes, backtrace to previous bound pod group (i.e. parent or left sibling node) should not too time consuming, which is enough to find the whole pod root group placement if it exist in theory.


In reply to: 918971150

@yqwang-ms
Copy link
Member

yqwang-ms commented Sep 14, 2021

Add demo here, such as pods withinOneCell and single pod's non-leaf cell type #WontFix


Refers to: example/feature/README.md:2 in c5e1bf1. [](commit_id = c5e1bf1, deletion_comment = False)

// virtual cluster view
type skuClusterView []*skuCell

// skuScheduler can schedule a pod group of pods with arbitrary cell types on a cluster view.
Copy link
Member

@yqwang-ms yqwang-ms Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with arbitrary cell types

Also mentioned the withinOneCell feature?
Such as can be:
skuScheduler can schedule a group of pods with guaranteed affinity requirement (withinOneCell, e.g. with in one rack) and each pod can specify arbitrary cell types (e.g. non-leaf cell). #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

// skuScheduler can schedule a pod group of pods with arbitrary cell types on a cluster view.
// It first tries to place pod group without preemption, then enable preemption if schedule failed.
// For each try, it will find within cells for each pod group, then find cells for each pods with better affinity.
type skuScheduler struct {
Copy link
Member

@yqwang-ms yqwang-ms Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skuScheduler

Why name it as skuScheduler? topologyGuaranteeScheduler is better? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

cellChains map[string][]CellChain
// map each level in a chain to the specific cell type name
cellTypes map[CellChain]map[CellLevel]api.CellType
// map each type name in a chain to the specific cell level
Copy link
Member

@yqwang-ms yqwang-ms Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type name

cell type #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

placement, failedReason = PodGroupPlacement{}, "no matched cells in vc"

cv := skuClusterView{nil}
if level, ok := s.cellLevels[podGroup.WithinOneCell]; ok {
Copy link
Member

@yqwang-ms yqwang-ms Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithinOneCell

If no WithinOneCell defined in cluster, early fail? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


func (s *skuScheduler) findCellsForPods(
pods []apiv2.PodGroupMemberSpec,
priority CellPriority,
Copy link
Member

@yqwang-ms yqwang-ms Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early return if pods nil or empty? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

placement, failedReason = PodGroupPlacement{}, "no matched cells in vc"

cv := skuClusterView{nil}
if level, ok := s.cellLevels[podGroup.WithinOneCell]; ok {
Copy link
Member

@yqwang-ms yqwang-ms Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithinOneCell

What if root group WithinOneCell is not specified? Such as from v1 #Closed

}

func (s *skuScheduler) createSkuClusterView(
within *skuCell,
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment the func return and its param relation #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

}

func (s *skuScheduler) createSkuClusterView(
within *skuCell,
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

within

withinCell? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


// skuCell type for selected level cell in virtual cluster view
type skuCell struct {
cell Cell // within cell level, maybe higher or lower than node level
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

within cell level

within cell instead of just a level? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

return false
}

func ancestorNoHigherThanLevel(withinLevel CellLevel, cell Cell) Cell {
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ancestorNoHigherThanLevel

ancestorNoLowerThanLevel? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

for _, c := range s.chainCellList[l] {
if (within != nil && !isAncestor(within.cell, c)) ||
cv.contains(ancestorNoHigherThanLevel(withinLevel, c)) {
continue
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will "cv.contains() continue" cause some cells with more free gpus are excluded in the final cv? #Closed

"have to use at least one bad cell %v", withinCell.address)
}
podPlacement := s.findCellsForSinglePod(pods[podIndex], priority, withinCell, allocatedCells)
if podPlacement == nil {
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schedule pods also need backtrace to previous bound pod? otherwise, we need to prove it will not miss solution #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we have changed podGroup.Pods to podGroup.Pod, all pods here with different podIndex are the same now

func (s *skuScheduler) findCellsForSinglePod(
pod apiv2.PodGroupMemberSpec,
priority CellPriority,
within *skuCell,
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

within

withinCell? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

}
podPlacement := s.findCellsForSinglePod(pods[podIndex], priority, withinCell, allocatedCells)
if podPlacement == nil {
withinCellIndex++
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withinCellIndex

Do we need to reset withinCellIndex = 0 before schedule each pod? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we have changed podGroup.Pods to podGroup.Pod, all pods here with different podIndex are the same, no need to reset withinCellIndex anymore

freeCells, preemptibleCells = getFreeCellsAtLevel(
within.cell, currLevel, priority, allocatedCells, freeCells, preemptibleCells)
// free leaf cells will be used first (before preemptible leaf cells)
freeCells = append(freeCells, preemptibleCells...)
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

freeCells

availableCells? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

}

func (s *skuScheduler) findCellsForSinglePod(
pod apiv2.PodGroupMemberSpec,
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment for single pod we still best effort find nearest cells for it (old behaiour) #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

currLevel := s.cellLevels[pod.CellsPerPod.CellType]
freeCells, preemptibleCells := CellList{}, CellList{}
freeCells, preemptibleCells = getFreeCellsAtLevel(
within.cell, currLevel, priority, allocatedCells, freeCells, preemptibleCells)
Copy link
Member

@yqwang-ms yqwang-ms Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First use cell that does not need to be got from higher level cell split? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort cells when splitting in getFreeCellsAtLevel so that cells need higher level split can be used later

// map each leaf cell type to all chains that contain this type
leafCellChains map[string][]CellChain
// map each within cell type to all chains that contain this type
cellChains map[string][]CellChain
// map each level in a chain to the specific cell type name
Copy link
Member

@yqwang-ms yqwang-ms Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type name

cell type #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

// 1. cell health (prefer healthy)
// 2. cell level (prefer lower)
// 3. usedLeafCellNumAtPriority (prefer higher)
// 4. usedLeafCellNumHigherPriority (prefer lower)
Copy link
Member

@yqwang-ms yqwang-ms Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if greedy search, may need use same order to sort here? #Pending

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment usedLeafCellNumAtPriority different meaning in different crossPriorityPack case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Name string `yaml:"name"` // pod group name
WithinOneCell api.CellType `yaml:"withinOneCell"` // within cell for all cells in current group, e.g., two GPU-cell within one numa-cell, two node-cell within one rack-cell
Pods []PodGroupMemberSpec `yaml:"pods"` // pod list for current group, any two pods are gang with each other in the groupS
ChildGroups []*PodGroupSpec `yaml:"childGroups"` // child group in the hierarchical structure
Copy link
Member

@yqwang-ms yqwang-ms Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems no need Pod list, and then pods search no need back trace #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to PodGroup.Pod, reserve Pods field to flatten Pod for internal structure

Update according to comments.
Fix GitHub Action config.
Change `PodGroup.Pods` to `PodGroup.Pod` to resolve backtracing search
issue.
* early return check for pods and withinOneCell
* sortcells when splitting in getFreeCellsAtLevel
* add comments for sorting usedLeafCellNumAtPriority
Update config in examples accordingly.
Skip build for legacy topology aware scheduler through build tag.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants