Commit fa030ad

[doc] more reorganization in README.md and other pages
1 parent d10dd56 commit fa030ad

8 files changed

+171
-68
lines changed

README.md

Lines changed: 141 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -2,63 +2,167 @@
22
[![godoc](https://img.shields.io/badge/godoc-Reference-5272B4.svg)](https://godoc.org/github.com/AliceO2Group/Control)
33
# AliECS
44

5-
The ALICE Experiment Control System (**AliECS**) is the piece of software to drive and control the data taking in the ALICE experiment at CERN.
5+
The ALICE Experiment Control System (**AliECS**) is the software that drives and controls data-taking activities in the experiment.
66
It is a distributed system that combines state-of-the-art cluster resource management and experiment control functionalities into a single comprehensive solution.
7+
78
Please refer to the [CHEP 2023 paper](https://doi.org/10.1051/epjconf/202429502027) for the latest design overview.
89

910
## How to get started
1011

11-
Regardless of your use case, it is recommended to get acquainted with the main [AliECS concepts](docs/handbook/concepts.md).
12+
Regardless of your particular interests, it is recommended to get acquainted with the main [AliECS concepts](docs/handbook/concepts.md).
1213

1314
After that, please find your concrete use case:
1415

15-
* I want to **run AliECS** and other O²/FLP software
16+
### I want to **run AliECS** and other O²/FLP software
1617

17-
:arrow_right: [O²/FLP Suite deployment instructions](https://alice-flp.docs.cern.ch/system-configuration/utils/o2-flp-setup/)
18+
:arrow_right: [O²/FLP Suite deployment instructions](https://alice-flp.docs.cern.ch/system-configuration/utils/o2-flp-setup/)
1819

19-
These instructions apply to both single-node and multi-node deployments.
20-
Contact [alice-o2-flp-support](mailto:alice-o2-flp-support@cern.ch) for assistance with provisioning and deployment.
20+
These instructions apply to both single-node and multi-node deployments.
21+
Contact [alice-o2-flp-support](mailto:alice-o2-flp-support@cern.ch) for assistance with provisioning and deployment.
2122

22-
There are two ways of interacting with AliECS:
23+
There are two ways of interacting with AliECS:
2324

24-
* The AliECS GUI (a.k.a. Control GUI, COG) - not in this repository, but included in most deployments, recommended
25+
- The AliECS GUI (a.k.a. Control GUI, COG) - not in this repository, but included in most deployments, recommended
2526

26-
:arrow_right: [AliECS GUI documentation](hacking/COG.md)
27+
:arrow_right: [AliECS GUI documentation](hacking/COG.md)
2728

28-
* `coconut` - the command-line control and configuration utility, included with AliECS core, typically for developers and advanced users
29+
- `coconut` - the command-line control and configuration utility, included with AliECS core, typically for developers and advanced users
2930

30-
:arrow_right: [Using `coconut`](https://alice-flp.docs.cern.ch/aliecs/coconut/)
31+
:arrow_right: [Using `coconut`](https://alice-flp.docs.cern.ch/aliecs/coconut/)
3132

32-
:arrow_right: [`coconut` command reference](https://alice-flp.docs.cern.ch/aliecs/coconut/doc/coconut/)
33+
:arrow_right: [`coconut` command reference](https://alice-flp.docs.cern.ch/aliecs/coconut/doc/coconut/)
3334

34-
* I want to ensure AliECS can **run and control my process**
35+
### I want to ensure AliECS can **run and control my process**
3536

36-
* My software is based on FairMQ and/or O² DPL (Data Processing Layer)
37-
38-
:palm_tree: AliECS natively supports FairMQ (and DPL) devices.
39-
Head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to configure your software to be controlled by AliECS.
40-
41-
* My software does not use FairMQ and/or DPL, but should be controlled through a state machine
42-
43-
:telescope: See [the OCC documentation](occ/README.md) to learn how to integrate the O² Control and Configuration library with your software. [Readout](https://github.com/AliceO2Group/Readout) is an example of this setup.
44-
Once ready, head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to configure it to be controlled by AliECS.
37+
* **My software is based on FairMQ and/or O² DPL (Data Processing Layer)**
4538

46-
* My software is a command line utility with no state machine
47-
48-
:palm_tree: AliECS natively supports generic commands.
49-
Head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to have your command run by AliECS.
50-
Make sure the task template for your command sets the control mode to `basic` ([see example](https://github.com/AliceO2Group/ControlWorkflows/blob/master/tasks/o2-roc-cleanup.yaml)).
51-
52-
* I want to **develop** AliECS
39+
AliECS natively supports FairMQ (and DPL) devices.
40+
Head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to configure your software to be controlled by AliECS.
41+
42+
* **My software does not use FairMQ and/or DPL, but should be controlled through a state machine**
43+
44+
See [the OCC documentation](occ/README.md) to learn how to integrate the O² Control and Configuration library with your software. [Readout](https://github.com/AliceO2Group/Readout) is an example of this setup.
5345

54-
:hammer_and_wrench: [Building instructions](https://alice-flp.docs.cern.ch/aliecs/building/)
55-
56-
:arrow_right: [Running instructions](https://alice-flp.docs.cern.ch/aliecs/running/)
46+
Once ready, head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to configure it to be controlled by AliECS.
5747

58-
* I want my service to communicate with AliECS upon environment state transitions
48+
* **My software is a command line utility with no state machine**
49+
50+
AliECS natively supports generic commands.
51+
Head to [ControlWorkflows](https://github.com/AliceO2Group/ControlWorkflows) for instructions on how to have your command run by AliECS.
52+
Make sure the task template for your command sets the control mode to `basic` ([see example](https://github.com/AliceO2Group/ControlWorkflows/blob/master/tasks/o2-roc-cleanup.yaml)).
53+
54+
### I want to develop AliECS
5955

60-
* Learn more about the [plugin system](TODO)
56+
:hammer_and_wrench: [Building instructions](https://alice-flp.docs.cern.ch/aliecs/building/)
57+
58+
:arrow_right: [Running instructions](https://alice-flp.docs.cern.ch/aliecs/running/)
59+
60+
TODO CONTRIBUTING.md
61+
62+
### I want to receive updates about environments or services controlled by AliECS
63+
64+
:book: Learn more about the [kafka event service](docs/kafka.md)
65+
66+
### I want my service to communicate with AliECS
67+
68+
:book: Learn more about the [plugin system](TODO)
69+
70+
## Table of Contents
71+
72+
* Introduction
73+
* [Basic Concepts](/docs/handbook/concepts.md#basic-concepts)
74+
* [Tasks](/docs/handbook/concepts.md#tasks)
75+
* [Workflows, roles and environments](/docs/handbook/concepts.md#workflows-roles-and-environments)
76+
* [Design Overview](/docs/handbook/overview.md#design-overview)
77+
* [AliECS Structure](/docs/handbook/overview.md#aliecs-structure)
78+
* [Resource Management](/docs/handbook/overview.md#resource-management)
79+
* [FairMQ](/docs/handbook/overview.md#fairmq)
80+
* [State machines](/docs/handbook/overview.md#state-machines)
81+
82+
* Component reference
83+
* [AliECS GUI](/hacking/cog.md)
84+
* AliECS core
85+
* [Workflow Configuration](/docs/handbook/configuration.md#workflow-configuration)
86+
* [The AliECS workflow template language](/docs/handbook/configuration.md#the-aliecs-workflow-template-language)
87+
* [Workflow template structure](/docs/handbook/configuration.md#workflow-template-structure)
88+
* [Task roles](/docs/handbook/configuration.md#task-roles)
89+
* [Call roles](/docs/handbook/configuration.md#call-roles)
90+
* [Aggregator roles](/docs/handbook/configuration.md#aggregator-roles)
91+
* [Iterator roles](/docs/handbook/configuration.md#iterator-roles)
92+
* [Include roles](/docs/handbook/configuration.md#include-roles)
93+
* [Template expressions](/docs/handbook/configuration.md#template-expressions)
94+
* [Task Configuration](/docs/handbook/configuration.md#task-configuration)
95+
* [Task template structure](/docs/handbook/configuration.md#task-template-structure)
96+
* [Variables pushed to controlled tasks](/docs/handbook/configuration.md#variables-pushed-to-controlled-tasks)
97+
* [Resource wants and limits](/docs/handbook/configuration.md#resource-wants-and-limits)
98+
* plugin_system.md TODO
99+
* [Environment operation order](/docs/handbook/operation_order.md#environment-operation-order)
100+
* [START_ACTIVITY (Start Of Run)](/docs/handbook/operation_order.md#start_activity-start-of-run)
101+
* [STOP_ACTIVITY (End Of Run)](/docs/handbook/operation_order.md#stop_activity-end-of-run)
102+
* [Integrated service operations](/docs/handbook/operation_order.md#integrated-service-operations)
103+
* [DCS](/docs/handbook/operation_order.md#dcs)
104+
* [DCS operations](/docs/handbook/operation_order.md#dcs-operations)
105+
* [DCS PrepareForRun behaviour](/docs/handbook/operation_order.md#dcs-prepareforrun-behaviour)
106+
* [DCS StartOfRun behaviour](/docs/handbook/operation_order.md#dcs-startofrun-behaviour)
107+
* [DCS EndOfRun behaviour](/docs/handbook/operation_order.md#dcs-endofrun-behaviour)
108+
* [Protocol documentation](/docs/apidocs_aliecs.md)
109+
* coconut
110+
* [Configuration file](/coconut/README.md#configuration-file)
111+
* [Using coconut](/coconut/README.md#using-coconut)
112+
* [Creating an environment](/coconut/README.md#creating-an-environment)
113+
* [Controlling an environment](/coconut/README.md#controlling-an-environment)
114+
* [Command reference](/coconut/doc/coconut.md)
115+
* apricot
116+
* [Apricot overview](/apricot/README.md)
117+
* [HTTP service](/apricot/docs/apricot_http_service.md#apricot-http-service)
118+
* [Configuration](/apricot/docs/apricot_http_service.md#configuration)
119+
* [Usage and options](/apricot/docs/apricot_http_service.md#usage-and-options)
120+
* [Examples](/apricot/docs/apricot_http_service.md#examples)
121+
* [Protocol documentation](/docs/apidocs_apricot.md)
122+
* [Command reference](/apricot/docs/apricot.md)
123+
* occ
124+
* [O² Control and Configuration Components](/occ/README.md#o-control-and-configuration-components)
125+
* [Developer quick start instructions for OCClib](/occ/README.md#developer-quick-start-instructions-for-occlib)
126+
* [Manual build instructions](/occ/README.md#manual-build-instructions)
127+
* [Run example](/occ/README.md#run-example)
128+
* [The OCC state machine](/occ/README.md#the-occ-state-machine)
129+
* [Single process control with peanut](/occ/README.md#single-process-control-with-peanut)
130+
* [OCC API debugging with grpcc](/occ/README.md#occ-api-debugging-with-grpcc)
131+
* [Dummy process example for OCC library](/occ/occlib/examples/dummy-process/README.md#dummy-process-example-for-occ-library)
132+
* [Protocol documentation](/docs/apidocs_occ.md)
133+
* [peanut](/occ/peanut/README.md)
134+
* Event service
135+
* [Kafka producer functionality in AliECS core](/docs/kafka.md#kafka-producer-functionality-in-aliecs-core)
136+
* [Making sure that AliECS sends messages](/docs/kafka.md#making-sure-that-aliecs-sends-messages)
137+
* [Currently available topics](/docs/kafka.md#currently-available-topics)
138+
* [Decoding the messages](/docs/kafka.md#decoding-the-messages)
139+
* [Legacy events: Kafka plugin](/docs/kafka.md#legacy-events-kafka-plugin)
140+
* [Making sure that AliECS sends messages](/docs/kafka.md#making-sure-that-aliecs-sends-messages-1)
141+
* [Currently available topics](/docs/kafka.md#currently-available-topics-1)
142+
* [Decoding the messages](/docs/kafka.md#decoding-the-messages-1)
143+
* [Getting Start of Run and End of Run notifications](/docs/kafka.md#getting-start-of-run-and-end-of-run-notifications)
144+
* [Using Kafka debug tools](/docs/kafka.md#using-kafka-debug-tools)
145+
146+
* Developer documentation
147+
* [AliECS pkg.go.dev documentation](https://pkg.go.dev/github.com/AliceO2Group/Control)
148+
* [Building AliECS](/docs/building.md#building-aliecs)
149+
* [Overview](/docs/building.md#overview)
150+
* [Building with aliBuild](/docs/building.md#building-with-alibuild)
151+
* [Manual build](/docs/building.md#manual-build)
152+
* [Go environment](/docs/building.md#go-environment)
153+
* [Clone and build (Go components only)](/docs/building.md#clone-and-build-go-components-only)
154+
* [Makefile reference](/docs/makefile_reference.md)
155+
* [Component Configuration](/docs/handbook/appconfiguration.md#component-configuration)
156+
* [Connectivity to controlled nodes](/docs/handbook/appconfiguration.md#connectivity-to-controlled-nodes)
157+
* [Running AliECS as a developer](/docs/running.md#running-aliecs-as-a-developer)
158+
* [Running the AliECS core](/docs/running.md#running-the-aliecs-core)
159+
* [Running AliECS in production](/docs/running.md#running-aliecs-in-production)
160+
* [Health checks](/docs/running.md#health-checks)
161+
* [Development Information](/docs/development.md#development-information)
162+
* [Release Procedure](/docs/development.md#release-procedure)
163+
* [OCC API debugging with grpcc](/docs/using_grpcc_occ.md#occ-api-debugging-with-grpcc)
164+
* CONTRIBUTING.md TODO
165+
166+
* Resources
167+
* T. Mrnjavac et al., [AliECS: A New Experiment Control System for the ALICE Experiment](https://doi.org/10.1051/epjconf/202429502027), CHEP23
61168

62-
* I want to receive updates about environments or services controlled by AliECS
63-
64-
* [Receive events published by AliECS via Kafka](docs/kafka.md)

apricot/README.md

Lines changed: 4 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,9 @@
11
# `APRICOT`
22

3-
**A** **p**rocessor and **r**epos**i**tory for **co**nfiguration **t**emplates
3+
TODO write
44

5-
The `o2-apricot` binary implements a centralized configuration (micro)service for ALICE O².
65

7-
```
8-
Usage of bin/o2-apricot:
9-
--backendUri string URI of the Consul server or YAML configuration file (default "consul://127.0.0.1:8500")
10-
--listenPort int Port of apricot server (default 32101)
11-
--verbose Verbose logging
12-
```
6+
### SEE ALSO
137

14-
Protofile: [apricot.proto](protos/apricot.proto)
8+
* [apricot HTTP service](docs/apricot_http_service.md) - make essential cluster information available via a web server
9+
* Protofile: [apricot.proto](protos/apricot.proto)

docs/faq.md

Lines changed: 0 additions & 2 deletions
This file was deleted.

docs/handbook/concepts.md

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,10 +2,17 @@
22

33
From a logical point of view of data processing deployment and control, AliECS deals with concepts such as **environments**, **roles** and **tasks**, the understanding of which is paramount for using AliECS effectively.
44

5+
## Tasks
6+
57
The basic unit of scheduling in AliECS is a **task**. A task generally corresponds to a process. Sometimes this is a process that can receive and respond to OCC-compatible control messages (also called a **stateful task**), and other times this is simply a shell script or command line tool invocation (also called a **stateless task** or **basic task**).
68

9+
## Workflows, roles and environments
10+
711
All AliECS **workflows** are collections of tasks, which together form a coherent data processing chain.
812

9-
Tasks are the leaves in a tree of roles. A **role** is a runtime subdivision of the complete system: it represents a kind of operation along with its resources (but less than a complete data processing chain). Each task implements one or more roles. Roles allow binding tasks or groups of tasks to specific host attributes, detectors and configuration values. Each role represents either a single task, or a group of child roles. If tasks are leaves, roles are all the other nodes in the control tree of an environment.
13+
Tasks are the leaves in a tree of roles. A **role** is a runtime subdivision of the complete system: it represents a kind of operation along with its resources (but less than a complete data processing chain). Each task implements one or more roles. Roles allow binding tasks or groups of tasks to specific host attributes, detectors and configuration values. Each role represents either a single task, or a group of child roles. While tasks are leaves, roles are all the other nodes in the control tree of an environment.
1014

1115
These novel, more flexible and more easily deployable abstractions represent the evolution of Run 2 abstractions such as ECS partitions. In memory, a tree of O² roles, along with their tasks and their configuration, is a **workflow**. A workflow aggregates the collective state of its constituent O² roles. A running workflow, along with associated detectors and other hardware and software resources required for experiment operation, constitutes an **environment**.
16+
17+
18+
TODO add a diagram for the above, add hooks, activity/run?
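
Until a diagram is available, the relationship between tasks, roles and workflows described above can be illustrated with a minimal Go sketch. This is not the AliECS implementation: the `Role` type, the role and task names, the state names and the aggregation rule (identical task states propagate upwards, anything else becomes `MIXED`) are all assumptions made for illustration.

```go
package main

import "fmt"

// Role is one node of the control tree (hypothetical type). A node without
// children stands for a single task; any other node groups child roles.
type Role struct {
	Name     string
	State    string // state of the task; only read for leaf nodes
	Children []*Role
}

// Tasks returns the leaves below r in depth-first order.
func (r *Role) Tasks() []*Role {
	if len(r.Children) == 0 {
		return []*Role{r}
	}
	var leaves []*Role
	for _, c := range r.Children {
		leaves = append(leaves, c.Tasks()...)
	}
	return leaves
}

// AggregateState folds the states of all tasks below r into one value.
// Assumed rule: identical states propagate upwards, anything else is "MIXED".
func (r *Role) AggregateState() string {
	tasks := r.Tasks() // never empty: a leaf returns itself
	state := tasks[0].State
	for _, t := range tasks[1:] {
		if t.State != state {
			return "MIXED"
		}
	}
	return state
}

func main() {
	// Illustrative names only; they do not refer to real workflow templates.
	readout := &Role{Name: "readout", State: "CONFIGURED"}
	builder := &Role{Name: "stfb", State: "STANDBY"}
	host := &Role{Name: "host-flp1", Children: []*Role{readout, builder}}
	workflow := &Role{Name: "readout-dataflow", Children: []*Role{host}}

	fmt.Println(len(workflow.Tasks()), workflow.AggregateState())
	builder.State = "CONFIGURED"
	fmt.Println(workflow.AggregateState())
}
```

In this sketch the root role plays the part of the workflow: its state is derived from its tasks rather than stored separately.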

docs/handbook/configuration.md

Lines changed: 1 addition & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -167,21 +167,6 @@ roles:
167167

168168
In the absence of an explicit `critical` trait for a given task role, the assumed default value is `critical: true`.
169169

170-
#### State machine callback moments
171-
172-
The underlying state machine library allows us to add callbacks upon entering and leaving states as well as before and after events (transitions).
173-
This is the order of callback execution upon a state transition:
174-
1. `before_<EVENT>` - called before event named `<EVENT>`
175-
2. `before_event` - called before all events
176-
3. `leave_<OLD_STATE>` - called before leaving `<OLD_STATE>`
177-
4. `leave_state` - called before leaving all states
178-
5. `enter_<NEW_STATE>`, `<NEW_STATE>` - called after entering `<NEW_STATE>`
179-
6. `enter_state` - called after entering all states
180-
7. `after_<EVENT>`, `<EVENT>` - called after event named `<EVENT>`
181-
8. `after_event` - called after all events
182-
183-
Callback execution is further refined with integer indexes, with the syntax `±index`, e.g. `before_CONFIGURE+2`, `enter_CONFIGURED-666`. An expression with no index is assumed to be indexed `+0`. These indexes do not correspond to timestamps, they are discrete labels that allow more granularity in callbacks, ensuring a strict ordering of callback opportunities within a given callback moment. Thus, `before_CONFIGURE+2` will complete execution strictly after `before_CONFIGURE` runs, but strictly before `enter_CONFIGURED-666` is executed.
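
The ordering rules above (moment first, then integer index, with a missing index treated as `+0`) can be sketched in Go as follows. This is an illustration, not the code used by AliECS: the function names are hypothetical, the `DEPLOYED` source state is an assumed example, and the alias forms `<NEW_STATE>` and `<EVENT>` are left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseMoment splits an expression such as "before_CONFIGURE+2" into the
// moment name and its integer index. A missing index is treated as +0.
func parseMoment(expr string) (string, int, error) {
	i := strings.LastIndexAny(expr, "+-")
	if i <= 0 {
		return expr, 0, nil
	}
	idx, err := strconv.Atoi(expr[i:]) // Atoi accepts a leading sign
	if err != nil {
		return "", 0, fmt.Errorf("bad index in %q: %w", expr, err)
	}
	return expr[:i], idx, nil
}

// sortCallbacks orders callback expressions for one transition: first by
// callback moment (in the documented order), then by index.
func sortCallbacks(exprs []string, event, from, to string) ([]string, error) {
	order := []string{
		"before_" + event, "before_event",
		"leave_" + from, "leave_state",
		"enter_" + to, "enter_state",
		"after_" + event, "after_event",
	}
	rank := make(map[string]int, len(order))
	for i, name := range order {
		rank[name] = i
	}

	type entry struct {
		expr        string
		rank, index int
	}
	entries := make([]entry, 0, len(exprs))
	for _, e := range exprs {
		name, idx, err := parseMoment(e)
		if err != nil {
			return nil, err
		}
		r, ok := rank[name]
		if !ok {
			return nil, fmt.Errorf("moment %q is not part of this transition", name)
		}
		entries = append(entries, entry{e, r, idx})
	}
	sort.SliceStable(entries, func(a, b int) bool {
		if entries[a].rank != entries[b].rank {
			return entries[a].rank < entries[b].rank
		}
		return entries[a].index < entries[b].index
	})

	out := make([]string, len(entries))
	for i, e := range entries {
		out[i] = e.expr
	}
	return out, nil
}

func main() {
	ordered, err := sortCallbacks(
		[]string{"enter_CONFIGURED-666", "before_CONFIGURE+2", "before_CONFIGURE"},
		"CONFIGURE", "DEPLOYED", "CONFIGURED", // source state is an assumption
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(ordered, " "))
}
```

With this input the sketch yields `before_CONFIGURE`, then `before_CONFIGURE+2`, then `enter_CONFIGURED-666`, matching the example in the paragraph above.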
184-
185170
### Call roles
186171

187172
Call roles represent calls to integrated services. They must contain a `call`
@@ -212,7 +197,7 @@ for examples of call roles that reference a variety of integration plugins.
212197

213198
The state machine callback moments are exposed to the AliECS workflow template interface and can be used as triggers or synchronization points for integration plugin function calls. The `call` block can be used for this purpose, with similar syntax to the `task` block used for controllable tasks. Its fields are as follows.
214199
* `func` - mandatory, it parses as an [`antonmedv/expr`](https://github.com/antonmedv/expr) expression that corresponds to a call to a function that belongs to an integration plugin object (e.g. `bookkeeping.StartOfRun()`, `dcs.EndOfRun()`, etc.).
215-
* `trigger` - mandatory, the expression at `func` will be executed once the state machine reaches this moment.
200+
* `trigger` - mandatory, the expression at `func` will be executed once the state machine reaches this moment. For possible values, see [State machine triggers](/docs/handbook/operation_order.md#state-machine-triggers).
216201
* `await` - optional, if absent it defaults to the same as `trigger`, the expression at `func` needs to finish by this moment, and the state machine will block until `func` completes.
217202
* `timeout` - optional, Go `time.Duration` expression, defaults to `30s`, the maximum time that `func` should take. The value is provided to the plugin via `varStack["__call_timeout"]` and the plugin should implement a timeout mechanism. The ECS will not abort the call upon reaching the timeout value!
218203
* `critical` - optional, it defaults to `true`, if `true` then a failure or timeout for `func` will send the environment state machine to `ERROR`.
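
Since the ECS does not abort a call when `timeout` expires, the plugin has to enforce the limit itself. The following Go sketch shows one possible way to do that with the standard `context` package. It assumes that the variable stack is a `map[string]string`; `callTimeout` and `runWithTimeout` are hypothetical helper names, not part of the AliECS plugin API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// defaultCallTimeout mirrors the documented default of 30s.
const defaultCallTimeout = 30 * time.Second

// callTimeout reads the timeout from the variable stack and falls back to
// the default when the value is missing or cannot be parsed.
func callTimeout(varStack map[string]string) time.Duration {
	d, err := time.ParseDuration(varStack["__call_timeout"])
	if err != nil || d <= 0 {
		return defaultCallTimeout
	}
	return d
}

// runWithTimeout runs op and stops waiting for it once the timeout expires.
// op receives the context so that it can cancel its own work.
func runWithTimeout(varStack map[string]string, op func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), callTimeout(varStack))
	defer cancel()

	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { done <- op(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return fmt.Errorf("call timed out: %w", ctx.Err())
	}
}

func main() {
	varStack := map[string]string{"__call_timeout": "50ms"}
	err := runWithTimeout(varStack, func(ctx context.Context) error {
		select {
		case <-time.After(time.Second): // simulated slow remote call
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	})
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

Passing the context down to the operation lets a well-behaved call stop early instead of continuing in the background after the plugin has already reported a timeout.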
