@Eijebong Eijebong commented Oct 17, 2025

The previous implementation could visit nodes multiple times when a graph had diamond-like patterns. The new implementation uses [Kahn's algorithm](https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm), which ensures that we only ever visit each node once.
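As a rough illustration of the idea (the names and signatures below are hypothetical, not taskgraph's actual API), Kahn's algorithm tracks each node's remaining dependencies and only emits a node once all of them have been emitted, so a diamond never causes a repeat visit:

```python
from collections import deque

def visit_postorder_kahn(nodes, edges):
    """Yield nodes in dependency order (dependencies before dependents)
    using Kahn's algorithm. Each node is visited exactly once, even when
    the graph contains diamond-shaped patterns."""
    # `edges` is an iterable of (left, right) pairs meaning "left depends on right".
    links = {node: set() for node in nodes}     # node -> its dependencies
    reverse = {node: set() for node in nodes}   # node -> its dependents
    for left, right in edges:
        links[left].add(right)
        reverse[right].add(left)

    # Seed the queue with nodes that have no dependencies.
    queue = deque(node for node in nodes if not links[node])
    while queue:
        node = queue.popleft()
        yield node
        # Releasing `node` may unblock its dependents.
        for dependent in reverse[node]:
            links[dependent].discard(node)
            if not links[dependent]:
                queue.append(dependent)
```

On a diamond (`a` depends on `b` and `c`, which both depend on `d`), this yields `d` first, then `b` and `c`, then `a` — four visits total, where a naive recursive walk could visit `d` twice.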

I tested this on firefox-main with `./mach taskgraph full -J -p taskcluster/test/params/mc-onpush.yml`. Before this change, with a profiler attached, it took ~3 min 55 s. After this change, we're down to ~2 min 30 s. Without profiling we go from 22 s to 17 s.

I added a `links_and_reverse_links_dict` helper here since computing both at the same time is faster than iterating over the graph twice. In my testing the function was called 11 times, 10 of which were on the exact same graph. Because that function is very expensive to call, I chose to cache it.
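The one-pass idea can be sketched as follows; the real helper's signature and return types may differ, this only shows why a single iteration over the edge set suffices to build both maps:

```python
def links_and_reverse_links_dict(nodes, edges):
    """Build both the forward and reverse adjacency maps in a single pass
    over the edge list, instead of iterating the graph once per direction."""
    links = {node: set() for node in nodes}          # node -> nodes it depends on
    reverse_links = {node: set() for node in nodes}  # node -> nodes depending on it
    for left, right in edges:
        links[left].add(right)
        reverse_links[right].add(left)
    return links, reverse_links
```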

The caching itself saved a further ~5–6 s when profiled on the full graph for firefox, and ~1 s when not profiled.

Hyperfine output, full firefox graph, no profiler attached:

```
Benchmark 1: baseline
  Time (mean ± σ):     22.565 s ±  0.260 s    [User: 53.205 s, System: 5.034 s]
  Range (min … max):   22.255 s … 23.043 s    10 runs

Benchmark 2: with-patch
  Time (mean ± σ):     19.032 s ±  0.110 s    [User: 49.556 s, System: 4.829 s]
  Range (min … max):   18.856 s … 19.195 s    10 runs

Benchmark 3: with-patch-cache
  Time (mean ± σ):     17.800 s ±  0.250 s    [User: 48.605 s, System: 4.893 s]
  Range (min … max):   17.281 s … 18.159 s    10 runs

Summary
  with-patch-cache ran
    1.07 ± 0.02 times faster than with-patch
    1.27 ± 0.02 times faster than baseline
```

@Eijebong Eijebong requested a review from a team as a code owner October 17, 2025 10:34
@Eijebong Eijebong requested a review from ahal October 17, 2025 10:34
@Eijebong Eijebong force-pushed the faster-graph-traversal branch 7 times, most recently from 0d85f23 to ceb835e on October 17, 2025 13:29
Because type annotations in Python are an "I wish people would respect my
authority" kind of feature, we cannot guarantee that a `Graph` is
actually created with frozensets as attributes; throughout taskgraph
we pass plain sets. By forcing those attributes to be frozensets we can
ensure that `Graph` is hashable, which allows us to use
`functools.lru_cache` to cache some expensive functions here.

Unfortunately, there's no easy way of transforming the inputs passed
into a dataclass, since in both `__init__` and `__post_init__` the
structure is already frozen. By using a dataclass as the parent of
`Graph` instead of making `Graph` the dataclass itself, we can work
around the issue while keeping the frozenness of the dataclass after
init, without having to fight the dataclass machinery when transforming
the attributes.
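The parent-dataclass pattern can be sketched like this; the class and function names are illustrative, not the actual taskgraph code. The frozen dataclass parent supplies the generated `__eq__`/`__hash__`, while the plain subclass's `__init__` coerces whatever iterables callers pass into frozensets before delegating:

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class _GraphFields:
    nodes: frozenset
    edges: frozenset

class Graph(_GraphFields):
    def __init__(self, nodes, edges):
        # Coerce whatever iterables callers pass (typically plain sets)
        # into frozensets before the frozen dataclass stores them, so
        # every Graph instance is hashable.
        super().__init__(frozenset(nodes), frozenset(edges))

@lru_cache(maxsize=None)
def expensive_traversal(graph):
    # A hashable Graph lets lru_cache memoize per-graph results;
    # the body here is a stand-in for the real expensive computation.
    return len(graph.nodes) + len(graph.edges)
```

Two `Graph` instances built from equal sets compare equal and hash identically, so the second `expensive_traversal` call for an equivalent graph is a cache hit.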
@Eijebong Eijebong force-pushed the faster-graph-traversal branch from ceb835e to 338c4f9 on October 17, 2025 13:31
@jcristau jcristau left a comment
Very nice :)

@Eijebong Eijebong merged commit b713497 into taskcluster:main Oct 17, 2025
16 checks passed