Commit a82eb87

Merge pull request #737 from ahal/ahal/push-pxtmkwqprlut
docs: write a high level overview of Taskcluster and Taskgraph
2 parents 5ffdbdb + b990c2d commit a82eb87

3 files changed

+141
-36
lines changed

docs/concepts/index.rst

Lines changed: 1 addition & 1 deletion
@@ -16,8 +16,8 @@ create the tasks.
 :maxdepth: 2
 :caption: More Concepts

+overview
 task-graphs
-taskcluster
 kind
 loading
 transforms

docs/concepts/overview.rst

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
What is Taskgraph?
==================

Taskgraph is a binary that reads configuration files and applies logic to them
in order to generate and schedule tasks for the `Taskcluster`_ task execution
framework.

A common misconception is that Taskgraph is part of Taskcluster, and the names
are sometimes used interchangeably. But Taskgraph and Taskcluster are two very
different pieces of software that perform different roles. If Taskcluster is
the chef in the kitchen preparing all the meals, then Taskgraph is the person
who designed the recipes and organized them into a menu.

To really understand how Taskgraph and Taskcluster relate to one another, it
helps to understand a bit more about Taskcluster.

.. _Taskcluster: https://taskcluster.net/

What is Taskcluster?
--------------------

At its core, Taskcluster is a collection of microservices, each of which
provides a set of powerful APIs for executing tasks. Among these microservices
are the `queue service`_, which provides a mechanism for placing tasks on a
queue and claiming tasks from it; the `auth service`_, which validates
credentials; the `hooks service`_, which provides integration points (both
internal and external); and the `worker-manager service`_, which interfaces
with various cloud providers to spin up VM instances that can then claim work
from the queue.

It may be tempting to call Taskcluster a CI system, as that's what it's
primarily used for. But it would be more accurate to say that Taskcluster is a
set of building blocks, which you can assemble into anything that fits the mold
of a queue of work items and workers that execute them. This could be a bespoke
CI system designed specifically for your use case, but you could also assemble
it into an AI training pipeline, a web crawler, or a distributed
parallelization framework. You get the idea.
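
To make that mold concrete, here is a toy, in-memory Python sketch of a queue of
work items drained by a worker. It is not Taskcluster code: ``ToyQueue``,
``create_task``, ``claim_work``, ``resolve`` and ``run_worker`` are hypothetical
names, and the real queue service exposes comparable operations over
authenticated HTTP APIs rather than as Python methods.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical in-memory stand-in for a task queue (not Taskcluster's API)."""
    pending: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)

    def create_task(self, task_id, task):
        # Producers place work items on the queue.
        self.pending.append((task_id, task))

    def claim_work(self):
        # A worker claims the next pending item, or gets None if the queue is empty.
        return self.pending.popleft() if self.pending else None

    def resolve(self, task_id, result):
        # Workers report a result once they are done with an item.
        self.results[task_id] = result


def run_worker(queue):
    """Hypothetical worker loop: claim items until the queue is drained."""
    while (claimed := queue.claim_work()) is not None:
        task_id, task = claimed
        # "Executing" a task here just means calling its payload function.
        queue.resolve(task_id, task["payload"]())


queue = ToyQueue()
queue.create_task("build", {"payload": lambda: "built"})
queue.create_task("test", {"payload": lambda: "tests passed"})
run_worker(queue)
print(queue.results)  # {'build': 'built', 'test': 'tests passed'}
```

Changing what the payload does (compile code, train a model, fetch a URL) is
all it takes to turn the same pattern into a different kind of system, which is
the sense in which Taskcluster is a set of building blocks.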

.. _queue service: https://docs.taskcluster.net/docs/reference/platform/queue
.. _auth service: https://docs.taskcluster.net/docs/reference/platform/auth
.. _hooks service: https://docs.taskcluster.net/docs/reference/core/hooks
.. _worker-manager service: https://docs.taskcluster.net/docs/reference/core/worker-manager

How to Create Tasks
-------------------

All these microservices come together in a powerful way, but as a user you
might be wondering: how do you create tasks in the first place?

I hinted that the queue service allows you to push tasks onto a queue, and
indeed there's a `createTask`_ API which accomplishes this. There's also a
`Github service`_ which can read a ``.taskcluster.yml`` file from the root of a
Github repository, render it with context from a Github event, and implicitly
create tasks on your behalf (much like how the ``.github/workflows`` directory
works for Github Actions).
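
As a rough illustration of "render it with context from a Github event", the
sketch below substitutes ``${...}`` placeholders in a template using values from
a push event. The Github service uses the JSON-e templating language for this,
which supports much more (conditionals, ``$let``, ``$map`` and full
expressions); this toy only handles dotted-path interpolation, and the template,
event values and ``render`` helper are hypothetical. For the real language, the
``json-e`` Python package provides ``jsone.render(template, context)``.

```python
import re

# Matches ${dotted.path} placeholders: a tiny subset of JSON-e interpolation.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def lookup(context, dotted_path):
    """Resolve a dotted path such as "event.after" against nested dicts."""
    value = context
    for key in dotted_path.split("."):
        value = value[key]
    return value


def render(template, context):
    """Recursively substitute ${...} placeholders in strings, lists and dicts."""
    if isinstance(template, str):
        return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)
    if isinstance(template, list):
        return [render(item, context) for item in template]
    if isinstance(template, dict):
        return {key: render(value, context) for key, value in template.items()}
    return template  # numbers, booleans and None pass through unchanged


# Hypothetical template and a heavily trimmed push event.
template = {
    "metadata": {"name": "Build ${event.after}"},
    "payload": {"env": {"REPO_URL": "${event.repository.clone_url}"}},
}
event = {
    "after": "abc123",
    "repository": {"clone_url": "https://github.com/example/repo.git"},
}
rendered = render(template, {"event": event})
print(rendered["metadata"]["name"])  # Build abc123
```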

But both these methods still require you to define your tasks somehow. You need
to specify where each task runs, what it should do, which environment variables
should be set, and so on. For simple projects with only a handful of tasks, it
might be feasible to write out the entire task definition inside your
``createTask`` API calls, or in the ``.taskcluster.yml`` file at the root of
your repo.
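
For a sense of what "defining a task" involves, here is a Python function that
builds a dict shaped like a small Taskcluster task definition. The top-level
keys (``taskQueueId``, ``created``, ``deadline``, ``metadata``, ``payload``)
follow the task schema as I understand it, but the queue ID, image, command and
env values are made-up placeholders, and the ``payload`` shape depends entirely
on which worker implementation runs the task (this one assumes a
docker-worker-style payload).

```python
from datetime import datetime, timedelta, timezone


def iso(dt):
    """Format a UTC datetime as an ISO 8601 string with millisecond precision."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def make_task(name, command, env=None, now=None):
    """Build a minimal, illustrative task definition (placeholder values)."""
    now = now or datetime.now(timezone.utc)
    return {
        # Where the task runs: a hypothetical task queue.
        "taskQueueId": "proj-example/linux",
        "created": iso(now),
        "deadline": iso(now + timedelta(hours=1)),
        "metadata": {
            "name": name,
            "description": f"Runs {name}",
            "owner": "dev@example.com",
            "source": "https://github.com/example/repo",
        },
        # What it should do: a docker-worker-style payload (assumed worker type).
        "payload": {
            "image": "python:3.12",
            "command": command,
            "env": env or {},
            "maxRunTime": 600,
        },
    }


task = make_task("lint", ["sh", "-c", "pip install flake8 && flake8"],
                 env={"PYTHONUNBUFFERED": "1"})
```

Even this small example hints at how much boilerplate each hand-written
definition carries.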

But for more complicated projects, it's not hard to imagine how quickly
hardcoding all your task definitions turns into a maintenance nightmare!
Consider that, as of this writing, the Firefox project has ~40k tasks defined;
stuffing all of that into a single YAML file would not be a fun time for anyone.

.. _Github service: https://docs.taskcluster.net/docs/reference/integrations/github
.. _createTask: https://docs.taskcluster.net/docs/reference/platform/queue/api#createTask

Taskgraph to the Rescue
-----------------------

This finally brings us back to Taskgraph. There are still input YAML files, but
instead of one, there can be as many as you like. You can then layer
programmatic logic on top of these inputs to "transform" them into actual task
definitions. This logic can be as simple or as complex as you need: it can layer
in powerful features, query external services, or even duplicate a single task
into many.
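
Here is a stdlib-only sketch of the "transform" idea: each transform is a
generator that takes ``(config, tasks)`` and yields tasks, and transforms are
chained so each one consumes the previous one's output. That ``(config, tasks)``
shape mirrors Taskgraph's transforms (which are registered on a
``TransformSequence`` from ``taskgraph.transforms.base``), but the functions,
the ``apply_transforms`` helper and the input keys below are hypothetical, and
this is not how Taskgraph wires transforms together internally.

```python
def split_by_platform(config, tasks):
    """Duplicate each input task once per platform listed on it."""
    for task in tasks:
        base = {k: v for k, v in task.items() if k != "platforms"}
        for platform in task["platforms"]:
            yield {**base, "name": f"{task['name']}-{platform}", "platform": platform}


def add_command(config, tasks):
    """Turn the high-level 'run' key into a concrete command list."""
    for task in tasks:
        task = dict(task)  # copy so the caller's dict is not mutated
        task["command"] = ["sh", "-c", task.pop("run")]
        yield task


def apply_transforms(transforms, config, tasks):
    """Chain generator-based transforms, feeding each one's output to the next."""
    for transform in transforms:
        tasks = transform(config, tasks)
    return list(tasks)


# What a (hypothetical) input YAML file might parse into.
inputs = [{"name": "unit-tests", "run": "pytest", "platforms": ["linux", "windows"]}]
tasks = apply_transforms([split_by_platform, add_command], config={}, tasks=inputs)
print([t["name"] for t in tasks])  # ['unit-tests-linux', 'unit-tests-windows']
```

One input entry became two tasks; adding a platform is a one-word change rather
than another copy-pasted task definition.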

What was previously a large block of hardcoded YAML can be turned into a few
concise YAML files along with a few well-written transform functions. When you
invoke the ``taskgraph`` binary, these inputs and transforms combine to create
valid task definitions that conform to Taskcluster's `task schema`_.

.. note::

   The `task schema`_ acts as an interface boundary between Taskgraph and
   Taskcluster. You don't have to use Taskgraph if you don't want to; you
   could instead write your own tool that generates valid task definitions.

   Conversely, while Taskgraph generates definitions that are compatible with
   Taskcluster, in theory it could generate tasks that conform to any other
   task execution framework, such as `Gitlab Pipelines`_.

.. _task schema: https://docs.taskcluster.net/docs/reference/platform/queue/task-schema
.. _Gitlab Pipelines: https://docs.gitlab.com/ci/pipelines/
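
If you did write your own tool, you would want to check its output against the
task schema. The sketch below checks a hand-picked subset of the fields I
understand the schema to require (``created``, ``deadline``, ``metadata``,
``payload``, and either ``taskQueueId`` or ``provisionerId`` plus
``workerType``). That field list is an assumption, ``generate`` is a
hypothetical stand-in for such a tool, and a real implementation should
validate against the published JSON schema instead.

```python
# Assumed subset of the task schema's required fields.
REQUIRED_TOP_LEVEL = {"created", "deadline", "metadata", "payload"}
REQUIRED_METADATA = {"name", "description", "owner", "source"}


def schema_errors(task):
    """Return a list of problems with a task definition (empty if none found)."""
    keys = set(task)
    errors = [f"missing {key!r}" for key in sorted(REQUIRED_TOP_LEVEL - keys)]
    metadata_keys = set(task.get("metadata", {}))
    errors += [f"missing metadata.{key}" for key in sorted(REQUIRED_METADATA - metadata_keys)]
    # The task must say where it runs: a taskQueueId, or the older
    # provisionerId/workerType pair.
    if "taskQueueId" not in keys and not {"provisionerId", "workerType"} <= keys:
        errors.append("missing 'taskQueueId' (or 'provisionerId' + 'workerType')")
    return errors


def generate(names):
    """A hypothetical home-grown generator: one minimal task per name."""
    for name in names:
        yield {
            "taskQueueId": "proj-example/linux",
            "created": "2024-01-01T00:00:00.000Z",
            "deadline": "2024-01-01T01:00:00.000Z",
            "metadata": {"name": name, "description": name,
                         "owner": "dev@example.com",
                         "source": "https://github.com/example/repo"},
            "payload": {},
        }


problems = {t["metadata"]["name"]: schema_errors(t) for t in generate(["lint", "test"])}
print(problems)  # {'lint': [], 'test': []}
```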

What is the Relationship Between Taskgraph and Taskcluster?
-----------------------------------------------------------

In the end, Taskgraph is a consumer of Taskcluster. It uses the ``createTask``
API to place tasks on the queue, just like the Github service does, and just
like you might do in a script of your own if you weren't using either the
Github service or Taskgraph.
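
If you weren't using Taskgraph or the Github service, such a script might look
roughly like the sketch below. It is written against a duck-typed ``queue``
object so that it runs on its own; with the official ``taskcluster`` Python
client you would, as far as I know, pass in
``taskcluster.Queue(taskcluster.optionsFromEnvironment())``, whose
``createTask(taskId, task)`` method makes the real API call. ``slug_id``,
``submit_all`` and ``FakeQueue`` are hypothetical helpers.

```python
import base64
import uuid


def slug_id():
    """Return a 22-character URL-safe base64 encoding of a random UUID.

    Similar in spirit to taskcluster.slugId(); written here to stay dependency-free.
    """
    return base64.urlsafe_b64encode(uuid.uuid4().bytes).decode("ascii").rstrip("=")


def submit_all(queue, tasks):
    """Place each task definition on the queue and return the assigned task IDs."""
    task_ids = []
    for task in tasks:
        task_id = slug_id()
        queue.createTask(task_id, task)
        task_ids.append(task_id)
    return task_ids


class FakeQueue:
    """Hypothetical stand-in that records calls instead of hitting the network."""

    def __init__(self):
        self.created = {}

    def createTask(self, task_id, task):
        self.created[task_id] = task
        return {"status": {"taskId": task_id}}


queue = FakeQueue()
ids = submit_all(queue, [{"metadata": {"name": "lint"}}, {"metadata": {"name": "test"}}])
```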

Assuming you have the proper credentials, you could invoke ``taskgraph`` from
your terminal, and all the generated tasks would run! But that's not very
convenient from a CI perspective, so instead you can invoke ``taskgraph`` from
inside a task itself, one that the Github service creates for you.

Remember that ``.taskcluster.yml`` file where you can hardcode task definitions,
and how trying to define lots of tasks in there becomes a maintenance
nightmare? Well, defining a single task isn't *too* bad. By `convention`_, we
call this single task the *Decision Task*, and it's responsible for invoking the
Taskgraph binary.

When you put all of this together, here are the full steps from making a push
to a Github repo to running tasks:

#. You push a commit to your Github repo.
#. Github emits a webhook event for your push, containing the event context.
#. Taskcluster's Github service receives this event and uses it to render the
   ``.taskcluster.yml`` file at the root of your repo. If you aren't using
   Taskgraph and just have a couple of tasks defined in this file, they get
   created and you're done. But if you're using Taskgraph because you have
   more complex CI needs, rendering this file results in a single task called
   the *Decision Task*.
#. Inside this *Decision Task*, Taskgraph reads a bunch of input YAML files and
   applies transform logic to them, ultimately producing valid Taskcluster
   task definitions.
#. Taskgraph calls the ``createTask`` API to place them on the queue, and you're
   golden!
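
The five steps above can be simulated end to end in a few lines of Python.
Everything here is a toy: a ``deque`` stands in for the queue service,
``render_taskcluster_yml`` stands in for the Github service rendering
``.taskcluster.yml``, and ``taskgraph_generate`` stands in for Taskgraph's real
input loading and transforms. In reality the Decision Task runs on a worker
after being claimed from the queue, which this sketch collapses into a single
function call. The ``taskgraph decision`` command in the payload is, as far as I
know, the subcommand Decision Tasks use to invoke Taskgraph, but treat it as an
assumption.

```python
from collections import deque


def render_taskcluster_yml(event):
    """Step 3 (toy): the Github service renders .taskcluster.yml into one Decision Task."""
    return {
        "metadata": {"name": f"Decision Task for {event['after'][:7]}"},
        # In a real setup this command would run Taskgraph inside the task.
        "payload": {"command": ["taskgraph", "decision"]},
    }


def taskgraph_generate(inputs):
    """Step 4 (toy): expand input definitions into concrete tasks, one per platform."""
    for spec in inputs:
        for platform in spec["platforms"]:
            yield {
                "metadata": {"name": f"{spec['name']}-{platform}"},
                "payload": {"command": ["sh", "-c", spec["run"]]},
            }


def on_push(event, inputs, queue):
    """Steps 2-5 end to end, with a plain deque standing in for the queue service."""
    queue.append(render_taskcluster_yml(event))  # Github service creates the Decision Task
    # When the Decision Task runs, Taskgraph generates and submits the rest.
    for task in taskgraph_generate(inputs):
        queue.append(task)  # stands in for a createTask call
    return [task["metadata"]["name"] for task in queue]


inputs = [{"name": "tests", "run": "pytest", "platforms": ["linux", "mac"]}]
queue = deque()
names = on_push({"after": "0123456789abcdef"}, inputs, queue)  # step 1: a push happened
print(names)  # ['Decision Task for 0123456', 'tests-linux', 'tests-mac']
```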

Hopefully you now have a slightly better understanding of what exactly Taskgraph
and Taskcluster are, and of the differences between them.

.. _convention: https://docs.taskcluster.net/docs/manual/design/conventions/decision-task

docs/concepts/taskcluster.rst

Lines changed: 0 additions & 35 deletions
This file was deleted.
