Skip to content

Commit b713497

Browse files
committed
Optimize graph traversal by using a more appropriate algorithm
The previous implementation could visit nodes multiple times when a graph had diamond like patterns. The new implementation uses [Kahn's algorithm](https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm) which ensures that we only ever visit nodes once. I tested this on firefox-main with `./mach taskgraph full -J -p taskcluster/test/params/mc-onpush.yml`. Before this change, with a profiler attached, it would take ~3mn55. After this change, we're down to ~2mn30. Without profiling we go from 22s to 17s. I added a `links_and_reverse_links_dict` helper here since computing them both at the same time is faster than iterating over the graph twice. In my testing the function was called 11 times, 10 of which were on the exact same graph. Because that function is very expensive to call, I chose to cache it. The caching itself resulted in ~5-6s of time save when profiled on the full graph for firefox and ~1s when not profiled. Hyperfine output, full firefox graph, no profiler attached: ``` Benchmark 1: baseline Time (mean ± σ): 22.565 s ± 0.260 s [User: 53.205 s, System: 5.034 s] Range (min … max): 22.255 s … 23.043 s 10 runs Benchmark 2: with-patch Time (mean ± σ): 19.032 s ± 0.110 s [User: 49.556 s, System: 4.829 s] Range (min … max): 18.856 s … 19.195 s 10 runs Benchmark 3: with-patch-cache Time (mean ± σ): 17.800 s ± 0.250 s [User: 48.605 s, System: 4.893 s] Range (min … max): 17.281 s … 18.159 s 10 runs Summary with-patch-cache ran 1.07 ± 0.02 times faster than with-patch 1.27 ± 0.02 times faster than baseline ```
1 parent 7f75010 commit b713497

File tree

1 file changed

+33
-12
lines changed

1 file changed

+33
-12
lines changed

src/taskgraph/graph.py

Lines changed: 33 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44

55

66
import collections
7+
import functools
78
from dataclasses import dataclass
89

910

@@ -75,20 +76,25 @@ def transitive_closure(self, nodes, reverse=False):
7576
return Graph(new_nodes, new_edges)
7677

7778
def _visit(self, reverse):
78-
queue = collections.deque(sorted(self.nodes))
79-
links_by_node = self.reverse_links_dict() if reverse else self.links_dict()
80-
seen = set()
79+
forward_links, reverse_links = self.links_and_reverse_links_dict()
80+
81+
dependencies = reverse_links if reverse else forward_links
82+
dependents = forward_links if reverse else reverse_links
83+
84+
indegree = {node: len(dependencies[node]) for node in self.nodes}
85+
86+
queue = collections.deque(
87+
node for node, degree in indegree.items() if degree == 0
88+
)
89+
8190
while queue:
8291
node = queue.popleft()
83-
if node in seen:
84-
continue
85-
links = links_by_node[node]
86-
if all((n in seen) for n in links):
87-
seen.add(node)
88-
yield node
89-
else:
90-
queue.extend(n for n in links if n not in seen)
91-
queue.append(node)
92+
yield node
93+
94+
for dependent in dependents[node]:
95+
indegree[dependent] -= 1
96+
if indegree[dependent] == 0:
97+
queue.append(dependent)
9298

9399
def visit_postorder(self):
94100
"""
@@ -107,6 +113,21 @@ def visit_preorder(self):
107113
"""
108114
return self._visit(True)
109115

116+
@functools.cache
117+
def links_and_reverse_links_dict(self):
118+
"""
119+
Return both links and reverse_links dictionaries.
120+
Returns a (forward_links, reverse_links) tuple where forward_links maps
121+
each node to the set of nodes it links to, and reverse_links maps each
122+
node to the set of nodes linking to it.
123+
"""
124+
forward = collections.defaultdict(set)
125+
reverse = collections.defaultdict(set)
126+
for left, right, _ in self.edges:
127+
forward[left].add(right)
128+
reverse[right].add(left)
129+
return (forward, reverse)
130+
110131
def links_dict(self):
111132
"""
112133
Return a dictionary mapping each node to a set of the nodes it links to

0 commit comments

Comments
 (0)