Commit ba979fd

Simplify graph in get_topological_weights. (#10574)
Fixes #10557 where the resolver spends too much time calculating the weights.

Also, do not let `get_installation_order` calculate these weights at all when there is nothing left to install.

Co-authored-by: Tzu-ping Chung <[email protected]>
1 parent 4cdb516 commit ba979fd

3 files changed: +49 -9 lines changed

news/10557.bugfix.rst

+1
@@ -0,0 +1 @@
+Optimize installation order calculation to improve performance when installing requirements that form a complex dependency graph with a large amount of edges.

src/pip/_internal/resolution/resolvelib/resolver.py

+47-8
@@ -171,12 +171,17 @@ def get_installation_order(
         get installed one-by-one.
 
         The current implementation creates a topological ordering of the
-        dependency graph, while breaking any cycles in the graph at arbitrary
-        points. We make no guarantees about where the cycle would be broken,
-        other than they would be broken.
+        dependency graph, giving more weight to packages with less
+        or no dependencies, while breaking any cycles in the graph at
+        arbitrary points. We make no guarantees about where the cycle
+        would be broken, other than it *would* be broken.
         """
         assert self._result is not None, "must call resolve() first"
 
+        if not req_set.requirements:
+            # Nothing is left to install, so we do not need an order.
+            return []
+
         graph = self._result.graph
         weights = get_topological_weights(
             graph,
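
The new early return skips `get_topological_weights` entirely when `req_set.requirements` is empty. Below is a minimal sketch of that control flow. `installation_order`, `compute_weights` and `must_not_run` are hypothetical names, not pip's API. The sketch also assumes, as the updated docstring suggests, that a higher weight means a package should be installed earlier, so names are sorted by weight in descending order.

```python
from typing import Callable, Dict, List, Optional

Weights = Dict[Optional[str], int]


def installation_order(
    names: List[str], compute_weights: Callable[[], Weights]
) -> List[str]:
    """Hypothetical stand-in for get_installation_order (not pip's API)."""
    if not names:
        # Nothing is left to install: never pay for the weight calculation.
        return []
    weights = compute_weights()
    # Highest weight first; sorted() is stable, so ties keep their input order.
    return sorted(names, key=lambda name: weights[name], reverse=True)


def must_not_run() -> Weights:
    # Passed as compute_weights to prove the early return skips it.
    raise AssertionError("weights should not be computed")


# Weights copied from the updated unit test expectation in this commit.
test_weights: Weights = {None: 0, "five": 5, "four": 4, "one": 4, "three": 2, "two": 1}

print(installation_order([], must_not_run))
print(installation_order(["one", "two", "three", "four", "five"], lambda: test_weights))
```

The first call returns an empty list without raising. The second returns `['five', 'one', 'four', 'three', 'two']`. Here "one" precedes "four" only because both have weight 4 and "one" came first in the input.
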
@@ -199,13 +204,19 @@ def get_topological_weights(
     This implementation may change at any point in the future without prior
     notice.
 
-    We take the length for the longest path to any node from root, ignoring any
-    paths that contain a single node twice (i.e. cycles). This is done through
-    a depth-first search through the graph, while keeping track of the path to
-    the node.
+    We first simplify the dependency graph by pruning any leaves and giving them
+    the highest weight: a package without any dependencies should be installed
+    first. This is done again and again in the same way, giving ever less weight
+    to the newly found leaves. The loop stops when no leaves are left: all
+    remaining packages have at least one dependency left in the graph.
+
+    Then we continue with the remaining graph, by taking the length for the
+    longest path to any node from root, ignoring any paths that contain a single
+    node twice (i.e. cycles). This is done through a depth-first search through
+    the graph, while keeping track of the path to the node.
 
     Cycles in the graph result would result in node being revisited while also
-    being it's own path. In this case, take no action. This helps ensure we
+    being on its own path. In this case, take no action. This helps ensure we
     don't get stuck in a cycle.
 
     When assigning weight, the longer path (i.e. larger length) is preferred.
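
The depth-first pass described in this docstring can be sketched on its own. The sketch uses a plain dict that maps each node to a set of children. This dict is a hypothetical stand-in for resolvelib's graph, with `None` as the root. `visit` tracks only the current path and memoizes nothing, so it walks every simple path from the root. Each node therefore ends up with its deepest depth, and a node already on the path is skipped, which is what breaks cycles.

```python
from typing import Dict, Optional, Set

# Hypothetical graph representation: node -> set of child (dependency) names.
Graph = Dict[Optional[str], Set[str]]


def longest_path_weights(graph: Graph) -> Dict[Optional[str], int]:
    path: Set[Optional[str]] = set()
    weights: Dict[Optional[str], int] = {}

    def visit(node: Optional[str]) -> None:
        if node in path:
            # The node is already on the current path: a cycle, so stop here.
            return
        path.add(node)
        for child in graph.get(node, set()):
            visit(child)
        path.remove(node)
        # len(path) is this node's depth along the path just explored.
        weights[node] = max(weights.get(node, 0), len(path))

    visit(None)  # None is the root node
    return weights


# The "deep second edge" graph from the unit test.
deep = {None: {"one", "two"}, "one": {"five"}, "two": {"three"},
        "three": {"four"}, "four": {"five"}, "five": set()}
# A hypothetical graph with a cycle between "a" and "b".
cyclic = {None: {"a"}, "a": {"b"}, "b": {"a", "c"}, "c": set()}

print(longest_path_weights(deep))
print(longest_path_weights(cyclic))
```

On `deep` this yields `None: 0, one: 1, two: 1, three: 2, four: 3, five: 4` (key order may vary). That is exactly the expectation this commit removes from the test. On `cyclic` it yields depths 0 through 3 without recursing forever.
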
@@ -227,6 +238,34 @@ def visit(node: Optional[str]) -> None:
         last_known_parent_count = weights.get(node, 0)
         weights[node] = max(last_known_parent_count, len(path))
 
+    # Simplify the graph, pruning leaves that have no dependencies.
+    # This is needed for large graphs (say over 200 packages) because the
+    # `visit` function is exponentially slower then, taking minutes.
+    # See https://github.com/pypa/pip/issues/10557
+    # We will loop until we explicitly break the loop.
+    while True:
+        leaves = set()
+        for key in graph:
+            if key is None:
+                continue
+            for _child in graph.iter_children(key):
+                # This means we have at least one child
+                break
+            else:
+                # No child.
+                leaves.add(key)
+        if not leaves:
+            # We are done simplifying.
+            break
+        # Calculate the weight for the leaves.
+        weight = len(graph) - 1
+        for leaf in leaves:
+            weights[leaf] = weight
+        # Remove the leaves from the graph, making it simpler.
+        for leaf in leaves:
+            graph.remove(leaf)
+
+    # Visit the remaining graph.
     # `None` is guaranteed to be the root node by resolvelib.
     visit(None)
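
A toy worst case illustrates the comment about `visit` being exponentially slower. The graph below is hypothetical and chosen only because it is easy to count; it is not the graph from issue #10557. The root depends on packages `p0` through `p{n-1}`, and each `pi` depends on every `pj` with `j > i`. A path-tracking DFS that memoizes nothing makes exactly 2**n calls on this graph.

```python
from typing import Dict, Optional, Set

# Hypothetical graph representation: node -> set of child (dependency) names.
Graph = Dict[Optional[str], Set[str]]


def complete_dag(n: int) -> Graph:
    """Hypothetical worst case: p_i depends on every p_j with j > i."""
    names = [f"p{i}" for i in range(n)]
    graph: Graph = {None: set(names)}
    for i, name in enumerate(names):
        graph[name] = set(names[i + 1:])
    return graph


def count_visits(graph: Graph) -> int:
    """Count calls made by a path-tracking DFS that memoizes nothing."""
    path: Set[Optional[str]] = set()
    calls = 0

    def visit(node: Optional[str]) -> None:
        nonlocal calls
        calls += 1
        if node in path:
            return
        path.add(node)
        for child in graph[node]:
            visit(child)
        path.remove(node)

    visit(None)
    return calls


for n in (4, 8, 12, 16):
    print(n, count_visits(complete_dag(n)))  # 16, 256, 4096, 65536 calls
```

Each extra package doubles the work. On the same graph the pruning loop needs only n rounds. `p{n-1}` is the only package without dependencies, and removing it turns `p{n-2}` into a leaf. This continues until only the root is left for `visit`.
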

tests/unit/resolution_resolvelib/test_resolver.py

+1-1
@@ -115,7 +115,7 @@ def test_new_resolver_get_installation_order(
                 ("three", "four"),
                 ("four", "five"),
             ],
-            {None: 0, "one": 1, "two": 1, "three": 2, "four": 3, "five": 4},
+            {None: 0, "five": 5, "four": 4, "one": 4, "three": 2, "two": 1},
         ),
         (
             "linear",

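The new expected weights follow from the leaf-pruning loop in `get_topological_weights`. The sketch below re-implements that loop on a hypothetical dict-of-sets graph. As a choice of this sketch, it copies the input so the caller's graph is left untouched. Each round assigns `len(graph) - 1`, counted before that round's leaves are removed and including the root. Because the second round removes two leaves at once, the weight 3 is skipped.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical graph representation: node -> set of child (dependency) names.
Graph = Dict[Optional[str], Set[str]]
Weights = Dict[Optional[str], int]


def build_graph(edges: List[Tuple[Optional[str], str]]) -> Graph:
    graph: Graph = {}
    for parent, child in edges:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    return graph


def prune_leaves(graph: Graph) -> Tuple[Graph, Weights]:
    """Repeatedly strip non-root nodes without children, weighting each round."""
    graph = {node: set(children) for node, children in graph.items()}
    weights: Weights = {}
    while True:
        leaves = {n for n, children in graph.items() if n is not None and not children}
        if not leaves:
            # Every remaining non-root node has a dependency (or there are none).
            return graph, weights
        # Same rule as the diff: node count (root included) minus one.
        weight = len(graph) - 1
        for leaf in leaves:
            weights[leaf] = weight
            del graph[leaf]
        # Drop edges that pointed at the removed leaves.
        for children in graph.values():
            children -= leaves


# Edges of the "deep second edge" test case.
edges = [(None, "one"), (None, "two"), ("one", "five"),
         ("two", "three"), ("three", "four"), ("four", "five")]
remaining, weights = prune_leaves(build_graph(edges))
print(remaining)
print(weights)
```

The remaining graph is `{None: set()}`, and the weights are five 5, one 4, four 4, three 2, two 1. Only the root is left, so the depth-first `visit(None)` assigns it 0. This matches the updated test line.
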