From fb82dd0b2484881da282a7a69a21e8d5b24440a4 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Tue, 6 Jan 2026 14:27:54 +0000 Subject: [PATCH 01/18] WIP - add ADRs for data streaming and other information following DSG chat earlier --- doc/conf.py | 1 + doc/specific_iocs/{dae => }/Datastreaming.md | 8 +-- doc/specific_iocs/dae/ISISDSLayout.drawio.svg | 4 -- .../Datastreaming-run-starts-stops.md | 5 -- doc/specific_iocs/datastreaming/ADRs.md | 9 +++ .../datastreaming/ADRs/000_kafka.md | 44 +++++++++++++ .../datastreaming/ADRs/001_histograms.md | 31 +++++++++ .../datastreaming/ADRs/002_spectra_mapping.md | 29 +++++++++ .../datastreaming/ADRs/003_linux.md | 17 +++++ .../ADRs/004_isisdae_abstraction.md | 0 .../ADRs/005_vetos_runcontrol.md | 0 .../datastreaming/ADRs/006_tcbs.md | 3 + .../Datastreaming---File-writing.md | 2 +- .../Datastreaming---Sample-Environment.md | 2 +- ...atastreaming--neutron-events-histograms.md | 4 +- .../datastreaming/Datastreaming-How-To.md | 2 +- .../datastreaming/Datastreaming-Topics.md | 2 +- .../Datastreaming-hardware-architecture.md | 64 +++++++++++++++++++ .../Datastreaming-run-starts-stops.md | 5 ++ .../datastreaming/ISISDSLayout.drawio.svg | 4 ++ .../{dae => datastreaming}/ISISDSLayout.xml | 4 +- 21 files changed, 219 insertions(+), 21 deletions(-) rename doc/specific_iocs/{dae => }/Datastreaming.md (75%) delete mode 100644 doc/specific_iocs/dae/ISISDSLayout.drawio.svg delete mode 100644 doc/specific_iocs/dae/datastreaming/Datastreaming-run-starts-stops.md create mode 100644 doc/specific_iocs/datastreaming/ADRs.md create mode 100644 doc/specific_iocs/datastreaming/ADRs/000_kafka.md create mode 100644 doc/specific_iocs/datastreaming/ADRs/001_histograms.md create mode 100644 doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md create mode 100644 doc/specific_iocs/datastreaming/ADRs/003_linux.md create mode 100644 doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md create mode 100644 
doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md create mode 100644 doc/specific_iocs/datastreaming/ADRs/006_tcbs.md rename doc/specific_iocs/{dae => }/datastreaming/Datastreaming---File-writing.md (96%) rename doc/specific_iocs/{dae => }/datastreaming/Datastreaming---Sample-Environment.md (95%) rename doc/specific_iocs/{dae => }/datastreaming/Datastreaming--neutron-events-histograms.md (50%) rename doc/specific_iocs/{dae => }/datastreaming/Datastreaming-How-To.md (98%) rename doc/specific_iocs/{dae => }/datastreaming/Datastreaming-Topics.md (99%) create mode 100644 doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md create mode 100644 doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md create mode 100644 doc/specific_iocs/datastreaming/ISISDSLayout.drawio.svg rename doc/specific_iocs/{dae => datastreaming}/ISISDSLayout.xml (97%) diff --git a/doc/conf.py b/doc/conf.py index 5875ccfa2..60ca34427 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -45,6 +45,7 @@ "sphinxcontrib.mermaid", ] mermaid_d3_zoom = True +mermaid_params = ["--iconPacks", "@material-icon-theme"] napoleon_google_docstring = True napoleon_numpy_docstring = False diff --git a/doc/specific_iocs/dae/Datastreaming.md b/doc/specific_iocs/Datastreaming.md similarity index 75% rename from doc/specific_iocs/dae/Datastreaming.md rename to doc/specific_iocs/Datastreaming.md index d02dcfcb0..cc7469a4a 100644 --- a/doc/specific_iocs/dae/Datastreaming.md +++ b/doc/specific_iocs/Datastreaming.md @@ -9,7 +9,7 @@ datastreaming/* ``` -The data streaming system is being built as a requirement for HRPD-X and possibly SANDALS-II, separate (and complementary) to the `MNeuData` project. HRPD-X, SANDALS-II and other future instruments will not have a traditional DAE2/DAE3 as they are now obsolete. 
It is architecturally similar to the system that the ESS uses to take data (neutron events, sample environment, and anything else that we can throw into a streaming platform) and write it to file. Previously ISIS aided development to the ESS' streaming pipeline as part of an in-kind project. The system will replace the ICP at ISIS. +The data streaming system is being built as a requirement for HRPD-X and possibly SANDALS-II, separate (and complementary) to the `MNeuData` project. HRPD-X, SANDALS-II and other future instruments will not have a traditional DAE2/DAE3 as they are now obsolete. It is architecturally similar to the system that the ESS uses to take data (neutron events, sample environment, and anything else that we can throw into a streaming platform) and write it to file. Previously ISIS aided development to the ESS' streaming pipeline as part of an in-kind project. The system will replace the ICP at ISIS. Note that all documentation here is applicable to the HRPD-X upgrade and may change in the future. In general this works by producing both neutron events and histograms, sample environment data, and other diagnostic data into a [Kafka](https://kafka.apache.org/) cluster and having clients (consumers in Kafka lingo!) that either view data live and act on it or write the data to a nexus file. Additional information can be found [here](http://accelconf.web.cern.ch/AccelConf/icalepcs2017/papers/tupha029.pdf) and [here](https://iopscience.iop.org/article/10.1088/1742-6596/1021/1/012013). 
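As a rough illustration of how a consumer might tell message types apart: the ESS flatbuffers schemas declare four-character file identifiers (e.g. `ev44`, `f144`, `pl72`, `6s4t`). FlatBuffers stores such an identifier in bytes 4-8 of a serialised buffer, straight after the 4-byte root offset. The sketch below dispatches on that identifier. `schema_id`, `route` and the handler mapping are hypothetical names and are not part of any ISIS or ESS tool. The Kafka consumer loop that would supply the buffers is left out.

```python
def schema_id(buf: bytes) -> str:
    """Return the 4-character FlatBuffers file identifier of a serialised message.

    FlatBuffers buffers begin with a 4-byte offset to the root table; when the
    schema declares a file_identifier it is stored in the next 4 bytes.
    """
    if len(buf) < 8:
        raise ValueError("buffer too short to contain a file identifier")
    return buf[4:8].decode("ascii")


def route(buf: bytes, handlers: dict) -> str:
    """Call the handler registered for the message's schema (if any) and return the schema ID."""
    sid = schema_id(buf)
    handler = handlers.get(sid)
    if handler is not None:
        handler(buf)
    return sid


# Hypothetical usage: feed ev44 messages to a live-view handler, ignore everything else.
received = []
handlers = {"ev44": received.append}
fake_ev44 = b"\x0c\x00\x00\x00ev44" + bytes(8)  # stand-in bytes, not a valid ev44 message
print(route(fake_ev44, handlers))  # ev44
print(len(received))  # 1
```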
@@ -17,11 +17,11 @@ All data is serialised into [Flatbuffers](https://flatbuffers.dev/) blobs using Overall architecture is as follows: -![](ISISDSLayout.drawio.svg) +![](datastreaming/ISISDSLayout.drawio.svg) This comprises of a few different consumers and producers: -- [`azawakh`](https://github.com/ISISComputingGroup/azawakh) - This is a soft IOC which provides `areaDetector` views, spectra plots and so on by consuming events from the cluster and displaying them over EPICS CA/PVA. -- [`borzoi`](https://github.com/ISISComputingGroup/borzoi) - This is also a soft IOC which is more or less a drop-in replacement for the ISISDAE. It provides an interface that several clients (ie. [genie](https://github.com/ISISComputingGroup/genie), [ibex_bluesky_core](https://github.com/ISISComputingGroup/ibex_bluesky_core), [ibex_gui](https://github.com/ISISComputingGroup/ibex_gui)) talk to to start/stop runs and configure streaming electronics. `borzoi` will send UDP packets to the streaming electronics to configure it. +- [`kdae_diagnostics`](https://github.com/ISISComputingGroup/azawakh) - This is a soft IOC which provides `areaDetector` views, spectra plots and so on by consuming events from the cluster and displaying them over EPICS CA/PVA. +- [`kdae_control`](https://github.com/ISISComputingGroup/borzoi) - This is also a soft IOC which is more or less a drop-in replacement for the ISISDAE. It provides an interface that several clients (e.g. [genie](https://github.com/ISISComputingGroup/genie), [ibex_bluesky_core](https://github.com/ISISComputingGroup/ibex_bluesky_core), [ibex_gui](https://github.com/ISISComputingGroup/ibex_gui)) use to start/stop runs and configure the streaming electronics. `kdae_control` will send UDP packets to the streaming electronics to configure them.
- [`BSTOKAFKA`](https://github.com/ISISComputingGroup/BSKAFKA) - This configures the `forwarder` with the blocks that are in an instrument's current configuration, as well as other PVs which will either get written to a file or archived for e.g. the log plotter. - `forwarder` - See [Forwarding Sample Environment](datastreaming/Datastreaming---Sample-Environment) - `filewriter` - See [File writing](datastreaming/Datastreaming---File-writing) diff --git a/doc/specific_iocs/dae/ISISDSLayout.drawio.svg b/doc/specific_iocs/dae/ISISDSLayout.drawio.svg deleted file mode 100644 index edfcb8dfc..000000000 --- a/doc/specific_iocs/dae/ISISDSLayout.drawio.svg +++ /dev/null @@ -1,4 +0,0 @@ - - - -
[ISISDSLayout.drawio.svg diagram markup not shown. Nodes: Kafka Cluster, forwarder, Blockserver, BSTOKAFKA, filewriter (NeXus File), FPGA streaming boards, udp2kafka, borzoi (bridges FPGA UDP config and EPICS; provides an interface very similar to ISISDAE), azawakh (consumer soft IOC providing areaDetector live view and spectra plots over EPICS), ISISDAE IOC, ISISICP, DAE2/DAE3. Flows: f144 and other SE schemas, ev44 events, hs01/hs00 histograms, pulse metadata, pl72 run starts, 6s4t run stops, fc00 forwarder config, x5f2 filewriter status, UDP, DCOM, NIVISA/Qxtream.]
\ No newline at end of file diff --git a/doc/specific_iocs/dae/datastreaming/Datastreaming-run-starts-stops.md b/doc/specific_iocs/dae/datastreaming/Datastreaming-run-starts-stops.md deleted file mode 100644 index 1baf8b892..000000000 --- a/doc/specific_iocs/dae/datastreaming/Datastreaming-run-starts-stops.md +++ /dev/null @@ -1,5 +0,0 @@ -{#dsrunstartstops} -# Data streaming: run starts/stops - -Run starts and stops will be dealt with by [`borzoi`](https://github.com/ISISComputingGroup/borzoi) and the flatbuffers blobs will be constructed in this process. It may need to be hooked onto by `ISISDAE` for older instruments using DAE2/DAE3 and the ISISICP. - diff --git a/doc/specific_iocs/datastreaming/ADRs.md b/doc/specific_iocs/datastreaming/ADRs.md new file mode 100644 index 000000000..a627aea0d --- /dev/null +++ b/doc/specific_iocs/datastreaming/ADRs.md @@ -0,0 +1,9 @@ +# Data streaming: ADRs + +```{toctree} +:glob: +:titlesonly: +:maxdepth: 1 + +ADRs/* +``` diff --git a/doc/specific_iocs/datastreaming/ADRs/000_kafka.md b/doc/specific_iocs/datastreaming/ADRs/000_kafka.md new file mode 100644 index 000000000..28cd7bc04 --- /dev/null +++ b/doc/specific_iocs/datastreaming/ADRs/000_kafka.md @@ -0,0 +1,44 @@ +# 0 - Kafka + +## Status + +Accepted + +## Context + +We need to decide which technology to use for data streaming. + +There are several options here: +- Kafka or Kafka-compatible solutions such as Redpanda +- Redis +- ZeroMQ/RabbitMQ/ActiveMQ + +Whichever option we choose, we also need to decide on a serialization format. +Options are: +- Protocol Buffers (protobuf) +- Flatbuffers with ESS schemas +- JSONB +- msgpack +- Avro +- encoded JSON/BSON + + +## Decision + +We have decided to use a Kafka-compatible broker as a streaming platform. This may be either Kafka or Redpanda. + +This is because we can draw on the ESS's experience with this technology, and may be able to collaborate with them and share tools.
+Flatbuffers encoding was performance-tested during the in-kind project and showed good performance versus the alternatives at the time. + +We have also decided to serialize the data using the [ESS flatbuffers schemas](https://github.com/ess-dmsc/streaming-data-types) with ISIS additions where necessary. + +Kafka is a broker-based streaming technology: unlike brokerless systems, which do not keep messages, its brokers store them. This allows a Kafka-based system to replay messages, and allows a consumer to catch up with the 'history' of a stream. We will not retain events in Kafka indefinitely - retention will be tuned to balance keeping enough messages for our use-cases against hardware constraints. + +## Consequences + +What becomes easier or more difficult to do because of this change? + +Kafka is harder to set up than some simpler alternatives. This is somewhat mitigated by its scaling and redundancy benefits. +We don't intend to do much processing in Kafka itself (e.g. transforms or stream processors). + +The advantage of using Kafka is that we stay much more closely aligned with the ESS, CLF, ANSTO and other facilities, which all use Kafka with Flatbuffers schemas. diff --git a/doc/specific_iocs/datastreaming/ADRs/001_histograms.md b/doc/specific_iocs/datastreaming/ADRs/001_histograms.md new file mode 100644 index 000000000..9ea680fd7 --- /dev/null +++ b/doc/specific_iocs/datastreaming/ADRs/001_histograms.md @@ -0,0 +1,31 @@ +# 1 - Histograms and event mode + +## Status + +Current, but may be superseded after HRPD-X. + +## Context + +**Histogram mode** + +In histogram mode, over the course of a run, counts are accumulated into a running histogram, binned by user-specified +time channel boundaries. + +**Event mode** + +In event mode, over the course of a run, each individual neutron event's detection time and detector ID are recorded. +Event mode data can later be binned to form a histogram, but individual events cannot be recovered from a histogram.
In +other words, histogramming is lossy. The advantage of histogram mode is that it typically produces smaller data volumes. + +Histogram mode has historically been used due to hardware limitations in many cases. + +## Decision + +For HRPD-X, we will collect all data, including data from neutron monitors, in event mode only. HRPD-X will not support +histogram mode. + +## Consequences + +- Data volumes on HRPD-X will be higher in event mode than they would be in histogram mode. This includes both data in flight +during networking and Kafka processing, and final NeXus file sizes. +- Handling only events will simplify components of the HRPD-X data streaming implementation. diff --git a/doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md b/doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md new file mode 100644 index 000000000..c2b8a5ff4 --- /dev/null +++ b/doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md @@ -0,0 +1,29 @@ +# 2 - Wiring and Spectra mapping + +## Status + +Pending + +## Context + +Wiring tables still exist as a concept; however, the format is now different (`.csv` as opposed to the old wiring table format). + +The options that we're considering are: +- change `.csv` to align with the old format +- write a service/script to convert to/from `.csv` +- keep the two formats separate, acknowledging that they will not be backwards or forwards compatible + +Spectra files share the above considerations as they also use a different file format. + +Grouping spectra in hardware was primarily used to get around limitations of DAE hardware. In event mode there is no advantage to grouping spectra in hardware. + +## Decision + +We are not going to support old-style spectra files, or spectrum mapping/grouping in general. + +For wiring tables this is TBD in https://github.com/ISISComputingGroup/DataStreaming/issues/27.
+ +## Consequences + +- If HRPD-x previously grouped spectra in hardware, they will now need to be grouped in software (e.g. Mantid) instead. +- Our data streaming software will not need to support spectrum grouping. \ No newline at end of file diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md new file mode 100644 index 000000000..0cd533d49 --- /dev/null +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -0,0 +1,17 @@ +# 3 - Linux + +## Status + +What is the status, such as proposed, accepted, rejected, deprecated, superseded, etc.? + +## Context + +What is the issue that we're seeing that is motivating this decision or change? + +## Decision + +What is the change that we're proposing and/or doing? + +## Consequences + +What becomes easier or more difficult to do because of this change? diff --git a/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md b/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md new file mode 100644 index 000000000..e69de29bb diff --git a/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md b/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md new file mode 100644 index 000000000..e69de29bb diff --git a/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md b/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md new file mode 100644 index 000000000..95bb2048f --- /dev/null +++ b/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md @@ -0,0 +1,3 @@ +todo: because we are doing everything in event mode there is no setting to be set in hardware. filewriter will write everything in event mode, and not histogramming. +kdae_diagnostics will use tcbs but nothing will set them in the hardware. +this adr might be pending due to live view shenanigans. 
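In event mode nothing sets the time channel boundaries in hardware, so any binning happens in software. The following is a minimal sketch of applying TCBs to event times of flight. It assumes sorted boundaries and half-open bins `[lo, hi)`, and drops events outside the boundaries. `bin_events` is a hypothetical helper, not the `kdae_diagnostics` implementation. The last line also shows why histogramming is lossy: different event lists can produce identical histograms.

```python
from bisect import bisect_right


def bin_events(times_of_flight, tcbs):
    """Histogram event times of flight using time channel boundaries (TCBs).

    tcbs must be sorted ascending; N boundaries give N - 1 half-open bins
    [tcbs[i], tcbs[i + 1]). Events outside [tcbs[0], tcbs[-1]) are dropped.
    """
    counts = [0] * (len(tcbs) - 1)
    for tof in times_of_flight:
        # Index of the last boundary <= tof, i.e. the bin this event falls into.
        i = bisect_right(tcbs, tof) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


tcbs = [0.0, 100.0, 200.0, 300.0]  # hypothetical boundaries (microseconds)
print(bin_events([10.0, 20.0, 150.0, 250.0, 260.0, 400.0], tcbs))  # [2, 1, 2]

# Lossy: two different sets of events give the same histogram.
print(bin_events([10.0, 20.0], tcbs) == bin_events([50.0, 60.0], tcbs))  # True
```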
\ No newline at end of file diff --git a/doc/specific_iocs/dae/datastreaming/Datastreaming---File-writing.md b/doc/specific_iocs/datastreaming/Datastreaming---File-writing.md similarity index 96% rename from doc/specific_iocs/dae/datastreaming/Datastreaming---File-writing.md rename to doc/specific_iocs/datastreaming/Datastreaming---File-writing.md index 9809278b4..05bd0572c 100644 --- a/doc/specific_iocs/dae/datastreaming/Datastreaming---File-writing.md +++ b/doc/specific_iocs/datastreaming/Datastreaming---File-writing.md @@ -1,4 +1,4 @@ -# File writing +# Data streaming: File writing The [filewriter](https://github.com/ess-dmsc/kafka-to-nexus) is responsible for taking the neutron and SE data out of Kafka and writing it to a nexus file. When the ICP ends a run it sends a config message to the filewriter, via Kafka, to tell it to start writing to file. diff --git a/doc/specific_iocs/dae/datastreaming/Datastreaming---Sample-Environment.md b/doc/specific_iocs/datastreaming/Datastreaming---Sample-Environment.md similarity index 95% rename from doc/specific_iocs/dae/datastreaming/Datastreaming---Sample-Environment.md rename to doc/specific_iocs/datastreaming/Datastreaming---Sample-Environment.md index 4b17a6717..213cc3d1b 100644 --- a/doc/specific_iocs/dae/datastreaming/Datastreaming---Sample-Environment.md +++ b/doc/specific_iocs/datastreaming/Datastreaming---Sample-Environment.md @@ -1,4 +1,4 @@ -# Sample environment forwarding +# Data streaming: Sample environment forwarding All IBEX instruments are currently forwarding their sample environment PVs into Kafka. 
This is done in two parts: diff --git a/doc/specific_iocs/dae/datastreaming/Datastreaming--neutron-events-histograms.md b/doc/specific_iocs/datastreaming/Datastreaming--neutron-events-histograms.md similarity index 50% rename from doc/specific_iocs/dae/datastreaming/Datastreaming--neutron-events-histograms.md rename to doc/specific_iocs/datastreaming/Datastreaming--neutron-events-histograms.md index d92ea0e5b..91c29757b 100644 --- a/doc/specific_iocs/dae/datastreaming/Datastreaming--neutron-events-histograms.md +++ b/doc/specific_iocs/datastreaming/Datastreaming--neutron-events-histograms.md @@ -6,8 +6,8 @@ The ICP (communicated to via the ISISDAE IOC) is responsible for communicating w ## For new instruments using FPGA-based acquisition electronics -`borzoi` is responsible for communicating with the electronics and sending run starts/stops. It will have a similar interface to `ISISDAE` so we can drop-in replace it in the GUI.(?) +`kdae_control` is responsible for communicating with the electronics and sending run starts/stops. It will have a similar interface to `ISISDAE` so we can drop-in replace it in the GUI.(?) ## Live view, spectra plots etc. -These will be provided by a soft IOC (`azawakh`) which effectively consumes from event and histogram topics (and possibly run starts?) which will serve areaDetector and other PVs. +These will be provided by a soft IOC (`kdae_diagnostics`) which effectively consumes from event and histogram topics (and possibly run starts?) which will serve areaDetector and other PVs. 
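The following is a minimal sketch of how a live view could accumulate events into a 2D image. It assumes, hypothetically, that detector IDs have already been extracted from the event messages and map onto pixels in row-major order as `id = row * width + col`. `accumulate_image` is an illustrative name and is not part of `kdae_diagnostics` or areaDetector. A real instrument's detector-to-pixel layout would need to come from its own configuration.

```python
def accumulate_image(detector_ids, width, height, image=None):
    """Add one count per event to a row-major 2D image and return the image.

    Hypothetical mapping: detector ID n maps to pixel (n // width, n % width).
    IDs outside 0 .. width * height - 1 are ignored. Passing an existing image
    accumulates into it, as a live view would across successive messages.
    """
    if image is None:
        image = [[0] * width for _ in range(height)]
    for det_id in detector_ids:
        if 0 <= det_id < width * height:
            row, col = divmod(det_id, width)
            image[row][col] += 1
    return image


img = accumulate_image([0, 1, 1, 5], width=3, height=2)
print(img)  # [[1, 2, 0], [0, 0, 1]]
accumulate_image([5, 7], width=3, height=2, image=img)  # ID 7 is outside the 3x2 image
print(img)  # [[1, 2, 0], [0, 0, 2]]
```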
diff --git a/doc/specific_iocs/dae/datastreaming/Datastreaming-How-To.md b/doc/specific_iocs/datastreaming/Datastreaming-How-To.md similarity index 98% rename from doc/specific_iocs/dae/datastreaming/Datastreaming-How-To.md rename to doc/specific_iocs/datastreaming/Datastreaming-How-To.md index 50d69f501..aca53d288 100644 --- a/doc/specific_iocs/dae/datastreaming/Datastreaming-How-To.md +++ b/doc/specific_iocs/datastreaming/Datastreaming-How-To.md @@ -1,5 +1,5 @@ {#datastreaminghowto} -# Data streaming how-to guide +# Data streaming: how-to guide This is a guide for basic operations using either the development or production Kafka clusters we use for data streaming at ISIS. diff --git a/doc/specific_iocs/dae/datastreaming/Datastreaming-Topics.md b/doc/specific_iocs/datastreaming/Datastreaming-Topics.md similarity index 99% rename from doc/specific_iocs/dae/datastreaming/Datastreaming-Topics.md rename to doc/specific_iocs/datastreaming/Datastreaming-Topics.md index 6baa51204..03ec0f48a 100644 --- a/doc/specific_iocs/dae/datastreaming/Datastreaming-Topics.md +++ b/doc/specific_iocs/datastreaming/Datastreaming-Topics.md @@ -1,4 +1,4 @@ -# Data streaming topics +# Data streaming: topics We have a number of topics per-instrument on `livedata`, the {ref}`Kafka cluster` we use. 
diff --git a/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md b/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md new file mode 100644 index 000000000..104fe110e --- /dev/null +++ b/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md @@ -0,0 +1,64 @@ +# Data streaming: hardware architecture + +```{mermaid} +architecture-beta + group control_board[CONTROL BOARD] + service timing_fanout(server)[Timing fanout board] in control_board + service vxi_control_board(server)[VXI Control board] in control_board + + group wlsf_module[WLSF MODULE] + service detector_fpga(server)[WLSF Detector FPGA] in wlsf_module + + group linux(server)[LINUX STREAMING SERVER] + service kafka_runInfo(database)[kafka runInfo] in linux + service kafka_udp(database)[kafka UDP] in linux + service udp(server)[udp to kafka Rust] in linux + service event_processor(server)[event processor Rust] in linux + service kafka_events(database)[kafka Events] in linux + + group external_signals[EXTERNAL SIGNALS] + service isis_timing(server)[ISIS Timing TOF GPS PPP Vetos] in external_signals + + group ibex[IBEX IOCs] in linux + service kdae_control(server)[KDAE Control] in ibex + service kdae_diag(server)[KDAE Diagnostics] in ibex + + udp:T --> B:kafka_udp + kafka_udp:R --> L:event_processor + event_processor:B --> T:kafka_events + + detector_fpga:R --> L:udp + timing_fanout:T --> B:detector_fpga + vxi_control_board:T --> B:timing_fanout + + vxi_control_board:B <-- T:isis_timing + + kdae_control:L --> R:vxi_control_board + kafka_events:B --> T:kdae_diag + kdae_control:B --> T:kafka_runInfo +``` + +## Hardware components + +### VXI Control Board + +Each instrument will have exactly one VXI streaming control board. It controls "global" instrument state and has the timing signals and PPP fed into it. It is what `kdae_control` primarily talks to in order to begin and end event streaming.
+ +### Timing Fanout board + +This sends data from the VXI board to the (potentially multiple) WLSF modules. IBEX will not need to communicate with this. + +### WLSF Fibre module (FPGA) + +An instrument may have several of these, depending on the instrument detector configuration. These have a similar UDP-based interface to the VXI boards but also have settings for each detector module. + +IBEX will not need to talk to these except for "advanced" diagnostics. + +### Linux streaming server +This hosts a Kafka cluster. For this bit of infrastructure there are three topics: +- `runInfo` - this contains `pl72` and `6s4t` run starts/stops sent by `kdae_control` +- `raw_udp` - this contains Kafka messages corresponding to each UDP packet which was received (with metadata such as IP address), sent by the [Rust UDP-to-Kafka process](https://gitlab.stfc.ac.uk/isis-detector-systems-group/software/data-streaming/rust-udp-to-kafka) which is also hosted on this server +- `events` - this contains `ev44` event messages formed by the [event stream processor](https://gitlab.stfc.ac.uk/isis-detector-systems-group/software/data-streaming/rust-data-stream-processor/-/tree/main?ref_type=heads) which is also hosted on this server + + +This also hosts the `kdae_control` and `kdae_diagnostics` IOCs. diff --git a/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md b/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md new file mode 100644 index 000000000..485ef0223 --- /dev/null +++ b/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md @@ -0,0 +1,5 @@ +{#dsrunstartstops} +# Data streaming: run starts/stops + +Run starts and stops will be dealt with by [`kdae_control`](https://github.com/ISISComputingGroup/borzoi), and the Flatbuffers blobs will be constructed in this process. `ISISDAE` may need to hook into this for older instruments using DAE2/DAE3 and the ISISICP.
+ diff --git a/doc/specific_iocs/datastreaming/ISISDSLayout.drawio.svg b/doc/specific_iocs/datastreaming/ISISDSLayout.drawio.svg new file mode 100644 index 000000000..47341b281 --- /dev/null +++ b/doc/specific_iocs/datastreaming/ISISDSLayout.drawio.svg @@ -0,0 +1,4 @@ + + + +
[ISISDSLayout.drawio.svg diagram markup not shown. Nodes: Kafka Cluster, forwarder, Blockserver, BSTOKAFKA, filewriter (NeXus File), FPGA streaming boards, udp2kafka, kdae_control (bridges FPGA UDP config and EPICS; provides an interface very similar to ISISDAE), kdae_diagnostics (consumer soft IOC providing areaDetector live view and spectra plots over EPICS), ISISDAE IOC, ISISICP, DAE2/DAE3. Flows: f144 and other SE schemas, ev44 events, hs01/hs00 histograms, pulse metadata, pl72 run starts, 6s4t run stops, fc00 forwarder config, x5f2 filewriter status, UDP, DCOM, NIVISA/Qxtream.]
\ No newline at end of file diff --git a/doc/specific_iocs/dae/ISISDSLayout.xml b/doc/specific_iocs/datastreaming/ISISDSLayout.xml similarity index 97% rename from doc/specific_iocs/dae/ISISDSLayout.xml rename to doc/specific_iocs/datastreaming/ISISDSLayout.xml index 78d0f1048..85540ea1a 100644 --- a/doc/specific_iocs/dae/ISISDSLayout.xml +++ b/doc/specific_iocs/datastreaming/ISISDSLayout.xml @@ -122,7 +122,7 @@ - + @@ -195,7 +195,7 @@ - + From f2d9199487ec4846d07e37cd151a591f8a342818 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Tue, 6 Jan 2026 16:14:58 +0000 Subject: [PATCH 02/18] end of day - add adr template --- .../datastreaming/ADRs/002_spectra_mapping.md | 2 +- .../datastreaming/ADRs/003_linux.md | 2 +- .../ADRs/004_isisdae_abstraction.md | 17 ++++++++++++++++ .../ADRs/005_vetos_runcontrol.md | 17 ++++++++++++++++ .../datastreaming/ADRs/006_tcbs.md | 20 ++++++++++++++++++- 5 files changed, 55 insertions(+), 3 deletions(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md b/doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md index c2b8a5ff4..d8bd84ff6 100644 --- a/doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md +++ b/doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md @@ -26,4 +26,4 @@ For wiring tables this is TBD in https://github.com/ISISComputingGroup/DataStrea ## Consequences - If HRPD-x previously grouped spectra in hardware, they will now need to be grouped in software (e.g. Mantid) instead. -- Our data streaming software will not need to support spectrum grouping. \ No newline at end of file +- Our data streaming software will not need to support spectrum grouping. 
diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md index 0cd533d49..36a411cf5 100644 --- a/doc/specific_iocs/datastreaming/ADRs/003_linux.md +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -2,7 +2,7 @@ ## Status -What is the status, such as proposed, accepted, rejected, deprecated, superseded, etc.? +Accepted. ## Context diff --git a/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md b/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md index e69de29bb..62302dc91 100644 --- a/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md +++ b/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md @@ -0,0 +1,17 @@ +# Title + +## Status + +What is the status, such as proposed, accepted, rejected, deprecated, superseded, etc.? + +## Context + +What is the issue that we're seeing that is motivating this decision or change? + +## Decision + +What is the change that we're proposing and/or doing? + +## Consequences + +What becomes easier or more difficult to do because of this change? diff --git a/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md b/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md index e69de29bb..62302dc91 100644 --- a/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md +++ b/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md @@ -0,0 +1,17 @@ +# Title + +## Status + +What is the status, such as proposed, accepted, rejected, deprecated, superseded, etc.? + +## Context + +What is the issue that we're seeing that is motivating this decision or change? + +## Decision + +What is the change that we're proposing and/or doing? + +## Consequences + +What becomes easier or more difficult to do because of this change? 
diff --git a/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md b/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md index 95bb2048f..9d01a4983 100644 --- a/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md +++ b/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md @@ -1,3 +1,21 @@ todo: because we are doing everything in event mode there is no setting to be set in hardware. filewriter will write everything in event mode, and not histogramming. kdae_diagnostics will use tcbs but nothing will set them in the hardware. -this adr might be pending due to live view shenanigans. \ No newline at end of file +this adr might be pending due to live view shenanigans. + +# Title + +## Status + +What is the status, such as proposed, accepted, rejected, deprecated, superseded, etc.? + +## Context + +What is the issue that we're seeing that is motivating this decision or change? + +## Decision + +What is the change that we're proposing and/or doing? + +## Consequences + +What becomes easier or more difficult to do because of this change? From 98de23dc7f59acfa8a3b7fccccfa00748e5da1d0 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 7 Jan 2026 11:55:01 +0000 Subject: [PATCH 03/18] fill out remaining ADRs on vetos etc. 
--- doc/specific_iocs/Datastreaming.md | 2 +- .../datastreaming/ADRs/001_histograms.md | 1 + .../datastreaming/ADRs/003_linux.md | 28 +++++++++++-- .../ADRs/004_isisdae_abstraction.md | 17 -------- .../ADRs/004_vetos_runcontrol.md | 39 +++++++++++++++++++ .../ADRs/005_vetos_runcontrol.md | 17 -------- .../datastreaming/ADRs/006_tcbs.md | 21 ---------- .../Datastreaming-hardware-architecture.md | 2 + 8 files changed, 68 insertions(+), 59 deletions(-) delete mode 100644 doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md create mode 100644 doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md delete mode 100644 doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md delete mode 100644 doc/specific_iocs/datastreaming/ADRs/006_tcbs.md diff --git a/doc/specific_iocs/Datastreaming.md b/doc/specific_iocs/Datastreaming.md index cc7469a4a..500754232 100644 --- a/doc/specific_iocs/Datastreaming.md +++ b/doc/specific_iocs/Datastreaming.md @@ -33,7 +33,7 @@ There is a (non-production!) [Redpanda](https://www.redpanda.com/) Kafka cluster A web interface is available [here](https://reduce.isis.cclrc.ac.uk/redpanda-console/overview). :::{important} -It was decided that we no longer maintain the Kafka cluster, and it will be handled by the the Flexible Interactive +It was decided that we no longer maintain the Kafka cluster, and it will be handled by the Flexible Interactive Automation team. See `\\isis\shares\ISIS_Experiment_Controls\On Call\autoreduction_livedata_support.txt` for their support information. 
::: diff --git a/doc/specific_iocs/datastreaming/ADRs/001_histograms.md b/doc/specific_iocs/datastreaming/ADRs/001_histograms.md index 9ea680fd7..b569013a4 100644 --- a/doc/specific_iocs/datastreaming/ADRs/001_histograms.md +++ b/doc/specific_iocs/datastreaming/ADRs/001_histograms.md @@ -1,3 +1,4 @@ +{#001_histograms} # 1 - Histograms and event mode ## Status diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md index 36a411cf5..f540a2b91 100644 --- a/doc/specific_iocs/datastreaming/ADRs/003_linux.md +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -6,12 +6,34 @@ Accepted. ## Context -What is the issue that we're seeing that is motivating this decision or change? +Historically we have used Windows for running most of our software due to LabVIEW support when migrating from SECI. +This often means software is not natively supported, as most of the wider experiment controls community at comparable facilities runs Linux instead, so we regularly have to port software designed for Linux to run on Windows. + +Streaming software is Linux-centric, and running Kafka itself on Windows is not natively supported by either [Confluent Kafka](https://www.confluent.io/blog/set-up-and-run-kafka-on-windows-linux-wsl-2/#:~:text=Windows%20still%20isn%E2%80%99t%20the%20recommended%20platform%20for%20running%20Kafka%20with%20production%20workloads) or [Redpanda](https://docs.redpanda.com/current/get-started/quick-start/#:~:text=If%20you%E2%80%99re,Linux%20%28WSL%29%2E). In addition, a number of tools developed both externally and at ISIS for data streaming currently only run on Linux. + +Running software under WSL is an option, but it is not [built for production use](https://learn.microsoft.com/en-us/windows/wsl/faq#can-i-use-wsl-for-production-scenarios-). It also adds several layers of complexity with networking, file systems and so on. Container networking configuration is difficult and limited using WSL.
+ +We would ideally like to run the data streaming software in containers as they offer security, deployment and repeatability benefits and are extremely popular in the software development industry. + +Docker Desktop _is_ supported on Windows, but uses the WSL with a strict licensing agreement which does not suit our needs. Other alternatives also use the WSL. Many container configuration options (e.g. host networking, volume mounting options) cannot be supported with containers on Windows (whether via Docker desktop or another solution). + +One of the long-term goals on the Experiment Controls roadmap is to revisit our operating system choice. It is very likely this will be a distribution of Linux. + +For HRPD-X specifically, the NDX and NDH is unsuitable for deploying data streaming software, as it will remain using Windows (in the short term). + +Additionally, Windows licensing costs have recently changed significantly (as of January 2026). If we chose to use a Linux distribution, even with paid support (e.g. in the style of Red Hat Enterprise Linux), it would likely cost less. + +ISIS has made it clear that Linux support will be provided in the medium term. ## Decision -What is the change that we're proposing and/or doing? +New or repurposed hardware, running Linux, will be used to run the streaming software as shown {ref}`here <ds_hardware_architecture>`. + +Wherever possible, software will be deployed in containers, which will minimise the amount of Linux systems administration knowledge required. The aim will be for the Linux machine to 'only' have a container engine (such as docker or podman) installed, and very little else. ## Consequences -What becomes easier or more difficult to do because of this change? +- We are able to use Linux-centric technologies and tools, without needing to spend large amounts of time inventing workarounds for Windows. +- The OS will be different. Developers will need _some_ understanding of Linux to maintain these servers.
+ * Mitigation: do as little as possible on the host, ideally limit it to just having a container engine installed via a configuration management tool such as Ansible. +- Data-streaming infrastructure will not be on the NDH/NDX machine with the rest of IBEX. This is fine - EPICS is explicitly designed to run in a distributed way. diff --git a/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md b/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md deleted file mode 100644 index 62302dc91..000000000 --- a/doc/specific_iocs/datastreaming/ADRs/004_isisdae_abstraction.md +++ /dev/null @@ -1,17 +0,0 @@ -# Title - -## Status - -What is the status, such as proposed, accepted, rejected, deprecated, superseded, etc.? - -## Context - -What is the issue that we're seeing that is motivating this decision or change? - -## Decision - -What is the change that we're proposing and/or doing? - -## Consequences - -What becomes easier or more difficult to do because of this change? diff --git a/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md b/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md new file mode 100644 index 000000000..3ff2f06e1 --- /dev/null +++ b/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md @@ -0,0 +1,39 @@ +# 4 - Vetos and Run control + +## Status + +Accepted + +## Context + +**Vetos** +Vetos are currently used by DAE2/DAE3 to mark neutron data as not useful - these are usually hardware signals fed into the DAE by other sources such as ISIS central timing systems and choppers. + +This will remain fairly similar for the streamed data and will be fed in via the VXI control boards, which have registers for viewing the status of vetos.
+ +There are two ways that an event packet could be vetoed, configurable via the VXI streaming control board (and the WLSF modules individually): +- **Software veto:** Emit the frame, but mark it as invalid using veto flags +- **Hardware veto:** Do not emit the frame + +Long term, we will likely need to allow for both modes. We _may_ also need to support each mode for each different type of veto - e.g. not emitting frames if one type of veto is active, while emitting frames marked as invalid for another type of veto. + +Data is still forwarded by UDP, but may not get processed into the `ev44` format if it is vetoed. This is configurable by the streaming boards. + +**Run control** + +Run control is controlled by IBEX using the existing {doc}`/system_components/Run-control` IOC. The concept of 'suspending' data collection if sample environment is outside a desired range will still be required. + +## Decision + +There will be a register, in the streaming control VXI crate that `kdae_control` can write to, which will be set via EPICS by the run control IOC. This register will act exactly like a hardware veto signal, except will be controlled by software. The runcontrol status will be monitored by `kdae_control`, and when it changes, `kdae_control` will write to the corresponding register in the streaming control VXI crate. + +The overall concept of {external+ibex_user_manual:ref}`concept-good-raw-frames` will still be needed, as scientists will use {external+genie_python:py:obj}`genie.waitfor_frames` and similar functions to control their run durations. + +We have also agreed with DSG that the WLSF modules should _not_ be allowed to individually veto data despite this being technically possible and this should be the responsibility of the VXI control board. + +## Consequences + +- The existing concept of {doc}`/system_components/Run-control` is retained. 
This means that commands such as {external+genie_python:py:obj}`genie.waitfor_frames` will work largely as before, minimising required changes to instrument scripts. +- The existing concepts of {external+ibex_user_manual:ref}`concept_good_raw_frames` are retained. +- Existing vetoes will largely map across onto the new system. +- `kdae_control` will need to monitor the run control [this PV](https://github.com/ISISComputingGroup/EPICS-RunControl/blob/master/RunControlApp/Db/gencontrolMgr.db#L54C28-L54C35) for changes. diff --git a/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md b/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md deleted file mode 100644 index 62302dc91..000000000 --- a/doc/specific_iocs/datastreaming/ADRs/005_vetos_runcontrol.md +++ /dev/null @@ -1,17 +0,0 @@ -# Title - -## Status - -What is the status, such as proposed, accepted, rejected, deprecated, superseded, etc.? - -## Context - -What is the issue that we're seeing that is motivating this decision or change? - -## Decision - -What is the change that we're proposing and/or doing? - -## Consequences - -What becomes easier or more difficult to do because of this change? diff --git a/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md b/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md deleted file mode 100644 index 9d01a4983..000000000 --- a/doc/specific_iocs/datastreaming/ADRs/006_tcbs.md +++ /dev/null @@ -1,21 +0,0 @@ -todo: because we are doing everything in event mode there is no setting to be set in hardware. filewriter will write everything in event mode, and not histogramming. -kdae_diagnostics will use tcbs but nothing will set them in the hardware. -this adr might be pending due to live view shenanigans. - -# Title - -## Status - -What is the status, such as proposed, accepted, rejected, deprecated, superseded, etc.? - -## Context - -What is the issue that we're seeing that is motivating this decision or change? 
- -## Decision - -What is the change that we're proposing and/or doing? - -## Consequences - -What becomes easier or more difficult to do because of this change? diff --git a/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md b/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md index 104fe110e..8e241ce01 100644 --- a/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md +++ b/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md @@ -1,3 +1,5 @@ +{#ds_hardware_architecture} + # Data streaming: hardware architecture ```{mermaid} From 57423e27e07eec0edbc6052d86260e058379c93f Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 7 Jan 2026 14:45:52 +0000 Subject: [PATCH 04/18] speling --- doc/spelling_wordlist.txt | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/spelling_wordlist.txt b/doc/spelling_wordlist.txt index 385a44ba0..54ae2e497 100644 --- a/doc/spelling_wordlist.txt +++ b/doc/spelling_wordlist.txt @@ -420,6 +420,7 @@ junit jvisualvm Jython Jülich +Kafka Kammrath Kanban kbaud @@ -462,6 +463,7 @@ longin longout lookups loopback +lossy lowlimit lowT LSi @@ -637,6 +639,7 @@ PLCs plt PNG png +podman polarisers polref postfixed @@ -879,6 +882,7 @@ ua uA uamps UC +UDP UI ui uk From e653b8f1c120f826e1ec917075a8b7276349cc9b Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 7 Jan 2026 14:46:34 +0000 Subject: [PATCH 05/18] fix reference --- doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md b/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md index 3ff2f06e1..6433b1244 100644 --- a/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md +++ b/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md @@ -27,7 +27,7 @@ Run control is controlled by IBEX using the existing {doc}`/system_components/Ru There will be a register, in the 
streaming control VXI crate that `kdae_control` can write to, which will be set via EPICS by the run control IOC. This register will act exactly like a hardware veto signal, except will be controlled by software. The runcontrol status will be monitored by `kdae_control`, and when it changes, `kdae_control` will write to the corresponding register in the streaming control VXI crate. -The overall concept of {external+ibex_user_manual:ref}`concept-good-raw-frames` will still be needed, as scientists will use {external+genie_python:py:obj}`genie.waitfor_frames` and similar functions to control their run durations. +The overall concept of {external+ibex_user_manual:ref}`concept_good_raw_frames` will still be needed, as scientists will use {external+genie_python:py:obj}`genie.waitfor_frames` and similar functions to control their run durations. We have also agreed with DSG that the WLSF modules should _not_ be allowed to individually veto data despite this being technically possible and this should be the responsibility of the VXI control board. From 61c1e9d29686fc5e130f18f91f8b3795e253903f Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 7 Jan 2026 14:53:31 +0000 Subject: [PATCH 06/18] more speling --- doc/specific_iocs/datastreaming/ADRs/000_kafka.md | 2 +- doc/specific_iocs/datastreaming/ADRs/003_linux.md | 2 +- .../datastreaming/Datastreaming-hardware-architecture.md | 2 +- doc/spelling_wordlist.txt | 4 ++++ 4 files changed, 7 insertions(+), 3 deletions(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/000_kafka.md b/doc/specific_iocs/datastreaming/ADRs/000_kafka.md index 28cd7bc04..aac8ddae5 100644 --- a/doc/specific_iocs/datastreaming/ADRs/000_kafka.md +++ b/doc/specific_iocs/datastreaming/ADRs/000_kafka.md @@ -9,7 +9,7 @@ Accepted We need to decide on a technology through which we are going to do data streaming. 
There are several options here: -- Kafka or kafka compatible solutions such as Redpanda +- Kafka or Kafka compatible solutions such as Redpanda - Redis - ZeroMQ/RabbitMQ/ActiveMQ diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md index f540a2b91..52cea9d0e 100644 --- a/doc/specific_iocs/datastreaming/ADRs/003_linux.md +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -17,7 +17,7 @@ We would ideally like to run the data streaming software in containers as they o Docker Desktop _is_ supported on Windows, but uses the WSL with a strict licensing agreement which does not suit our needs. Other alternatives also use the WSL. Many container configuration options (e.g. host networking, volume mounting options) cannot be supported with containers on Windows (whether via Docker desktop or another solution). -One of the long-term goals on the Experiment Controls roadmap is to revisit our operating system choice. It is very likely this will be a distribution of Linux. +One of the long-term goals on the Experiment Controls road-map is to revisit our operating system choice. It is very likely this will be a distribution of Linux. For HRPD-X specifically, the NDX and NDH is unsuitable for deploying data streaming software, as it will remain using Windows (in the short term). diff --git a/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md b/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md index 8e241ce01..68d370be5 100644 --- a/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md +++ b/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md @@ -59,7 +59,7 @@ IBEX will not need to talk to these except for "advanced" diagnostics. 
### Linux streaming server This hosts a Kafka cluster, for this bit of infrastructure there are three topics: - `runInfo` - this contains `pl72` and `6s4t` run starts/stops send by `kdae_control` -- `raw_udp` - this contains kafka messages corresponding to each UDP packet which was received (with metadata such as IP address) sent by the [rust udp to Kafka process](https://gitlab.stfc.ac.uk/isis-detector-systems-group/software/data-streaming/rust-udp-to-kafka) which is also hosted on this server +- `raw_udp` - this contains Kafka messages corresponding to each UDP packet which was received (with metadata such as IP address) sent by the [rust udp to Kafka process](https://gitlab.stfc.ac.uk/isis-detector-systems-group/software/data-streaming/rust-udp-to-kafka) which is also hosted on this server - `events` - this contains `ev44` formed by the [event stream processor](https://gitlab.stfc.ac.uk/isis-detector-systems-group/software/data-streaming/rust-data-stream-processor/-/tree/main?ref_type=heads) which is also hosted on this server diff --git a/doc/spelling_wordlist.txt b/doc/spelling_wordlist.txt index 54ae2e497..e790b24b7 100644 --- a/doc/spelling_wordlist.txt +++ b/doc/spelling_wordlist.txt @@ -58,6 +58,7 @@ autosaved autosaving autostart autotune +Avro backend Baldor Baratron @@ -89,6 +90,7 @@ booleans bootable bottlenecking breakpoint +brokerless bruteforce bumpstrip burndown @@ -533,6 +535,7 @@ moxas MPa msc msg +msgpack msi MSLITS msm @@ -668,6 +671,7 @@ procserver profiler programmatically proto +protobuffers pseudocode psu pugixml From e8bb2410913ec8d71d63bbe4446b2471f0e8968f Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 7 Jan 2026 14:57:29 +0000 Subject: [PATCH 07/18] even more speling --- doc/specific_iocs/datastreaming/ADRs/001_histograms.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/001_histograms.md b/doc/specific_iocs/datastreaming/ADRs/001_histograms.md index b569013a4..28ab978a0 
100644 --- a/doc/specific_iocs/datastreaming/ADRs/001_histograms.md +++ b/doc/specific_iocs/datastreaming/ADRs/001_histograms.md @@ -28,5 +28,5 @@ histogram mode. ## Consequences - Data volumes on HRPD-x will be higher running in event mode compared to histogram mode. This includes both data in-flight -during networking and kafka processing, as well as final Nexus file sizes. +during networking and Kafka processing, as well as final Nexus file sizes. - Only considering events will simplify components of the HRPD-x data streaming implementation. From 60a3c183e0c74a739b696f69d123f03747e4d124 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 7 Jan 2026 16:30:54 +0000 Subject: [PATCH 08/18] use new names and URLs --- doc/specific_iocs/Datastreaming.md | 6 +++--- .../datastreaming/Datastreaming-run-starts-stops.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/specific_iocs/Datastreaming.md b/doc/specific_iocs/Datastreaming.md index 500754232..baec4abdd 100644 --- a/doc/specific_iocs/Datastreaming.md +++ b/doc/specific_iocs/Datastreaming.md @@ -20,9 +20,9 @@ Overall architecture is as follows: ![](datastreaming/ISISDSLayout.drawio.svg) This comprises of a few different consumers and producers: -- [`kdae_diagnostics`](https://github.com/ISISComputingGroup/azawakh) - This is a soft IOC which provides `areaDetector` views, spectra plots and so on by consuming events from the cluster and displaying them over EPICS CA/PVA. -- [`kdae_control`](https://github.com/ISISComputingGroup/borzoi) - This is also a soft IOC which is more or less a drop-in replacement for the ISISDAE. It provides an interface that several clients (ie. [genie](https://github.com/ISISComputingGroup/genie), [ibex_bluesky_core](https://github.com/ISISComputingGroup/ibex_bluesky_core), [ibex_gui](https://github.com/ISISComputingGroup/ibex_gui)) talk to to start/stop runs and configure streaming electronics. 
`borzoi` will send UDP packets to the streaming electronics to configure it. -- [`BSTOKAFKA`](https://github.com/ISISComputingGroup/BSKAFKA) - This configures the `forwarder` with the blocks that are in an instrument's current configuration, as well as other PVs which will either get written to a file or archived for e.g. the log plotter. +- [`kafka_dae_diagnostics`](https://github.com/ISISComputingGroup/kafka_dae_diagnostics) - This is a soft IOC which provides `areaDetector` views, spectra plots and so on by consuming events from the cluster and displaying them over EPICS CA/PVA. +- [`kafka_dae_control`](https://github.com/ISISComputingGroup/kafka_dae_control) - This is also a soft IOC which is more or less a drop-in replacement for the ISISDAE. It provides an interface that several clients (i.e. [genie](https://github.com/ISISComputingGroup/genie), [ibex_bluesky_core](https://github.com/ISISComputingGroup/ibex_bluesky_core), [ibex_gui](https://github.com/ISISComputingGroup/ibex_gui)) talk to in order to start/stop runs and configure the streaming electronics. `kafka_dae_control` will send UDP packets to the streaming electronics to configure them. +- [`kafka_forwarder_configurer`](https://github.com/ISISComputingGroup/kafka_forwarder_configurer) - This configures the `forwarder` with the blocks that are in an instrument's current configuration, as well as other PVs which will either get written to a file or archived for e.g. the log plotter.
- `forwarder` - See [Forwarding Sample Environment](datastreaming/Datastreaming---Sample-Environment) - `filewriter` - See [File writing](datastreaming/Datastreaming---File-writing) diff --git a/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md b/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md index 485ef0223..48a715c19 100644 --- a/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md +++ b/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md @@ -1,5 +1,5 @@ {#dsrunstartstops} # Data streaming: run starts/stops -Run starts and stops will be dealt with by [`kdae_control`](https://github.com/ISISComputingGroup/borzoi) and the flatbuffers blobs will be constructed in this process. It may need to be hooked onto by `ISISDAE` for older instruments using DAE2/DAE3 and the ISISICP. +Run starts and stops will be dealt with by [`kafka_dae_control`](https://github.com/ISISComputingGroup/kafka_dae_control) and the flatbuffers blobs will be constructed in this process. It may need to be hooked onto by `ISISDAE` for older instruments using DAE2/DAE3 and the ISISICP. 
From 20aa32dc1e27d933d4865d489f49eea71ee61d53 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 7 Jan 2026 16:35:10 +0000 Subject: [PATCH 09/18] use full names --- .../datastreaming/ADRs/004_vetos_runcontrol.md | 4 ++-- .../Datastreaming--neutron-events-histograms.md | 4 ++-- .../Datastreaming-hardware-architecture.md | 16 ++++++++-------- .../datastreaming/ISISDSLayout.drawio.svg | 2 +- doc/specific_iocs/datastreaming/ISISDSLayout.xml | 4 ++-- 5 files changed, 15 insertions(+), 15 deletions(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md b/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md index 6433b1244..4c343d6f8 100644 --- a/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md +++ b/doc/specific_iocs/datastreaming/ADRs/004_vetos_runcontrol.md @@ -25,7 +25,7 @@ Run control is controlled by IBEX using the existing {doc}`/system_components/Ru ## Decision -There will be a register, in the streaming control VXI crate that `kdae_control` can write to, which will be set via EPICS by the run control IOC. This register will act exactly like a hardware veto signal, except will be controlled by software. The runcontrol status will be monitored by `kdae_control`, and when it changes, `kdae_control` will write to the corresponding register in the streaming control VXI crate. +There will be a register, in the streaming control VXI crate that `kafka_dae_control` can write to, which will be set via EPICS by the run control IOC. This register will act exactly like a hardware veto signal, except will be controlled by software. The runcontrol status will be monitored by `kafka_dae_control`, and when it changes, `kafka_dae_control` will write to the corresponding register in the streaming control VXI crate. 
The overall concept of {external+ibex_user_manual:ref}`concept_good_raw_frames` will still be needed, as scientists will use {external+genie_python:py:obj}`genie.waitfor_frames` and similar functions to control their run durations. @@ -36,4 +36,4 @@ We have also agreed with DSG that the WLSF modules should _not_ be allowed to in - The existing concept of {doc}`/system_components/Run-control` is retained. This means that commands such as {external+genie_python:py:obj}`genie.waitfor_frames` will work largely as before, minimising required changes to instrument scripts. - The existing concepts of {external+ibex_user_manual:ref}`concept_good_raw_frames` are retained. - Existing vetoes will largely map across onto the new system. -- `kdae_control` will need to monitor the run control [this PV](https://github.com/ISISComputingGroup/EPICS-RunControl/blob/master/RunControlApp/Db/gencontrolMgr.db#L54C28-L54C35) for changes. +- `kafka_dae_control` will need to monitor the run control [this PV](https://github.com/ISISComputingGroup/EPICS-RunControl/blob/master/RunControlApp/Db/gencontrolMgr.db#L54C28-L54C35) for changes. diff --git a/doc/specific_iocs/datastreaming/Datastreaming--neutron-events-histograms.md b/doc/specific_iocs/datastreaming/Datastreaming--neutron-events-histograms.md index 91c29757b..1342acdb4 100644 --- a/doc/specific_iocs/datastreaming/Datastreaming--neutron-events-histograms.md +++ b/doc/specific_iocs/datastreaming/Datastreaming--neutron-events-histograms.md @@ -6,8 +6,8 @@ The ICP (communicated to via the ISISDAE IOC) is responsible for communicating w ## For new instruments using FPGA-based acquisition electronics -`kdae_control` is responsible for communicating with the electronics and sending run starts/stops. It will have a similar interface to `ISISDAE` so we can drop-in replace it in the GUI.(?) +`kafka_dae_control` is responsible for communicating with the electronics and sending run starts/stops. 
It will have a similar interface to `ISISDAE` so we can drop-in replace it in the GUI.(?) ## Live view, spectra plots etc. -These will be provided by a soft IOC (`kdae_diagnostics`) which effectively consumes from event and histogram topics (and possibly run starts?) which will serve areaDetector and other PVs. +These will be provided by a soft IOC (`kafka_dae_diagnostics`) which effectively consumes from event and histogram topics (and possibly run starts?) which will serve areaDetector and other PVs. diff --git a/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md b/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md index 68d370be5..9f068ec05 100644 --- a/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md +++ b/doc/specific_iocs/datastreaming/Datastreaming-hardware-architecture.md @@ -22,8 +22,8 @@ architecture-beta service isis_timing(server)[ISIS Timing TOF GPS PPP Vetos] in external_signals group ibex[IBEX IOCs] in linux - service kdae_control(server)[KDAE Control] in ibex - service kdae_diag(server)[KDAE Diagnostics] in ibex + service kafka_dae_control(server)[KDAE Control] in ibex + service kafka_dae_diagnostics(server)[KDAE Diagnostics] in ibex udp:T --> B:kafka_udp kafka_udp:R --> L:event_processor @@ -35,16 +35,16 @@ architecture-beta vxi_control_board:B <-- T:isis_timing - kdae_control:L --> R:vxi_control_board - kafka_events:B --> T:kdae_diag - kdae_control:B --> T:kafka_runInfo + kafka_dae_control:L --> R:vxi_control_board + kafka_events:B --> T:kafka_dae_diagnostics + kafka_dae_control:B --> T:kafka_runInfo ``` ## Hardware components ### VXI Control Board -Each instrument will have exactly one VXI streaming control board. It controls "global" instrument state, gets timing signals and PPP fed into it. It it what `KDAE_control` primarily talks to begin and end event streaming. +Each instrument will have exactly one VXI streaming control board. 
It controls "global" instrument state, gets timing signals and PPP fed into it. It is what `kafka_dae_control` primarily talks to in order to begin and end event streaming. ### Timing Fanout board @@ -58,9 +58,9 @@ IBEX will not need to talk to these except for "advanced" diagnostics. ### Linux streaming server This hosts a Kafka cluster, for this bit of infrastructure there are three topics: -- `runInfo` - this contains `pl72` and `6s4t` run starts/stops send by `kdae_control` +- `runInfo` - this contains `pl72` and `6s4t` run starts/stops sent by `kafka_dae_control` - `raw_udp` - this contains Kafka messages corresponding to each UDP packet which was received (with metadata such as IP address) sent by the [rust udp to Kafka process](https://gitlab.stfc.ac.uk/isis-detector-systems-group/software/data-streaming/rust-udp-to-kafka) which is also hosted on this server - `events` - this contains `ev44` formed by the [event stream processor](https://gitlab.stfc.ac.uk/isis-detector-systems-group/software/data-streaming/rust-data-stream-processor/-/tree/main?ref_type=heads) which is also hosted on this server -This also hosts the `kdae_control` and `kdae_diagnostics` IOCs. +This also hosts the `kafka_dae_control` and `kafka_dae_diagnostics` IOCs. diff --git a/doc/specific_iocs/datastreaming/ISISDSLayout.drawio.svg b/doc/specific_iocs/datastreaming/ISISDSLayout.drawio.svg index 47341b281..a3cbf29a4 100644 --- a/doc/specific_iocs/datastreaming/ISISDSLayout.drawio.svg +++ b/doc/specific_iocs/datastreaming/ISISDSLayout.drawio.svg @@ -1,4 +1,4 @@ -
hook for run starts/stops
NeXus File
EV44 events OR hs01 histograms, pulse metadata
fc00 forwarder config
f144 (and associated SE schemas), ev44, hs01, pulse meta, pl72 run starts, 6s4t runstops
filewriter status x5f2
Kafka Cluster
Forwarded PV updates from monitor (f144 and others)
forwarder
Blockserver
Polls for block names
fc00 forwarder config - blocks and archived vals. runlog?
BSTOKAFKA
filewriter
UDP
(kdae_control)~
process that
A) bridges FPGA UDP config and EPICS and
B) provides an interface very similar to ISISDAE
For new instruments
udp2kafka
FPGA streaming boards
UDP
pl72 run starts/ 6s4t run stops
ISISDAE IOC
DCOM
NIVISA/Qxtream respectively
ISISICP
DAE2/DAE3
For existing dae2/dae3 insts
PV for block names in order
to construct pl72 run starts
ev44/hs00
(kdae_diagnostics)
consumer soft ioc that provides areadetector live view from kafka stream/spectra plots over epics
IOCs  - blocks and archived vals, maybe runlog
\ No newline at end of file +
hook for run starts/stops
NeXus File
EV44 events OR hs01 histograms, pulse metadata
fc00 forwarder config
f144 (and associated SE schemas), ev44, hs01, pulse meta, pl72 run starts, 6s4t runstops
filewriter status x5f2
Kafka Cluster
Forwarded PV updates from monitor (f144 and others)
forwarder
Blockserver
Polls for block names
fc00 forwarder config - blocks and archived vals. runlog?
BSTOKAFKA
filewriter
UDP
(kafka_dae_control)~
process that
A) bridges FPGA UDP config and EPICS and
B) provides an interface very similar to ISISDAE
For new instruments
udp2kafka
FPGA streaming boards
UDP
pl72 run starts/ 6s4t run stops
ISISDAE IOC
DCOM
NIVISA/Qxtream respectively
ISISICP
DAE2/DAE3
For existing dae2/dae3 insts
PV for block names in order
to construct pl72 run starts
ev44/hs00
(kafka_dae_diagnostics)
consumer soft ioc that provides areadetector live view from kafka stream/spectra plots over epics
IOCs  - blocks and archived vals, maybe runlog
\ No newline at end of file diff --git a/doc/specific_iocs/datastreaming/ISISDSLayout.xml b/doc/specific_iocs/datastreaming/ISISDSLayout.xml index 85540ea1a..0bb1ceff8 100644 --- a/doc/specific_iocs/datastreaming/ISISDSLayout.xml +++ b/doc/specific_iocs/datastreaming/ISISDSLayout.xml @@ -122,7 +122,7 @@ - + @@ -195,7 +195,7 @@ - + From 0198a9e9c91f57c15a40424eabf5e24f509b0160 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 14 Jan 2026 17:17:43 +0000 Subject: [PATCH 10/18] Remove mermaid_params from conf.py Remove mermaid_params configuration from Sphinx. --- doc/conf.py | 1 - 1 file changed, 1 deletion(-) diff --git a/doc/conf.py b/doc/conf.py index 60ca34427..5875ccfa2 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -45,7 +45,6 @@ "sphinxcontrib.mermaid", ] mermaid_d3_zoom = True -mermaid_params = ["--iconPacks", "@material-icon-theme"] napoleon_google_docstring = True napoleon_numpy_docstring = False From 9fd57d7b5e926c53aa2b9b1da0b02cf218f96759 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Mon, 19 Jan 2026 11:51:57 +0000 Subject: [PATCH 11/18] add note on run start metadata --- .../datastreaming/Datastreaming-run-starts-stops.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md b/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md index 48a715c19..4fd0b1bee 100644 --- a/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md +++ b/doc/specific_iocs/datastreaming/Datastreaming-run-starts-stops.md @@ -3,3 +3,7 @@ Run starts and stops will be dealt with by [`kafka_dae_control`](https://github.com/ISISComputingGroup/kafka_dae_control) and the flatbuffers blobs will be constructed in this process. It may need to be hooked onto by `ISISDAE` for older instruments using DAE2/DAE3 and the ISISICP. +Run starts will contain static and streamed data in the `nexus_structure`, including things like `run_number`, `instrument_name` and so on which will get written to a file. 
+ +Run starts will _also_ contain metadata used by `kafka_dae_diagnostics`, in a json schema defined by https://github.com/ISISComputingGroup/DataStreaming/issues/29 - this is so `kafka_dae_diagnostics` does not have to try and parse the `nexus_structure` as it doesn't know or care about NeXus files. + From baad7972b6288b014dadd54b62b5275f963e0dd1 Mon Sep 17 00:00:00 2001 From: Tom Willemsen Date: Wed, 21 Jan 2026 09:19:22 +0000 Subject: [PATCH 12/18] Expand Linux server specification considerations Added considerations for Linux server specifications related to data rates, including disk write performance, network interface speeds, and memory requirements. --- doc/specific_iocs/datastreaming/ADRs/003_linux.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md index 52cea9d0e..c93724afe 100644 --- a/doc/specific_iocs/datastreaming/ADRs/003_linux.md +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -37,3 +37,7 @@ Wherever possible, software will be deployed in containers, which will minimise - The OS will be different. Developers will need _some_ understanding of Linux to maintain these servers. * Mitigation: do as little as possible on the host, ideally limit it to just having a container engine installed via a configuration management tool such as Ansible. - Data-streaming infrastructure will not be on the NDH/NDX machine with the rest of IBEX. This is fine - EPICS is explicitly designed to run in a distributed way. +- We will need to carefully consider the system specification of the Linux server in order to ensure it is adequate for expected data rates (including data rates from e.g. noisy detectors, to a point). 
In particular we expect to need to carefully consider: + * Disk write performance (for the Kafka broker and the filewriter) + * Network interface speeds (both from the electronics into this server, and from this server onwards to consumers such as Mantid) + * Memory (for any processes which need to histogram the data - must be able to keep a histogram in memory) From d5b034ad8320ce8ba4dde35757f709867c851a5b Mon Sep 17 00:00:00 2001 From: Tom Willemsen Date: Wed, 21 Jan 2026 09:23:09 +0000 Subject: [PATCH 13/18] Clarify containerization needs for data streaming software Expanded on the need for containerized data streaming software due to new detector technology and the limitations of WSL on Windows. --- doc/specific_iocs/datastreaming/ADRs/003_linux.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md index c93724afe..99b543533 100644 --- a/doc/specific_iocs/datastreaming/ADRs/003_linux.md +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -13,7 +13,7 @@ Streaming software is linux-centric and running Kafka itself natively on Windows Running software under WSL is an option, but is not [built for production use.](https://learn.microsoft.com/en-us/windows/wsl/faq#can-i-use-wsl-for-production-scenarios-) It also adds several layers of complexity with networking, file systems and so on. Container networking configuration is difficult and limited using WSL. -We would ideally like to run the data streaming software in containers as they offer security, deployment and repeatability benefits and are extremely popular in the software development industry. +We would ideally like to run the data streaming software in containers as they offer security, deployment and repeatability benefits and are extremely popular in the software development industry. 
HRPD-X coming online with a new detector technology means that we will need to be able to easily apply new versions of data-streaming software at short notice. Containers give us an easy, repeatable path to be able to do this. Docker Desktop _is_ supported on Windows, but uses the WSL with a strict licensing agreement which does not suit our needs. Other alternatives also use the WSL. Many container configuration options (e.g. host networking, volume mounting options) cannot be supported with containers on Windows (whether via Docker desktop or another solution). From 1bf9bbf15ff6b85130b0d10fd35a7153266bbcc7 Mon Sep 17 00:00:00 2001 From: Tom Willemsen Date: Wed, 21 Jan 2026 09:25:11 +0000 Subject: [PATCH 14/18] Update Linux ADR with data streaming details Added considerations for data streaming stack and container configuration. --- doc/specific_iocs/datastreaming/ADRs/003_linux.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md index 99b543533..8b92b4486 100644 --- a/doc/specific_iocs/datastreaming/ADRs/003_linux.md +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -41,3 +41,5 @@ Wherever possible, software will be deployed in containers, which will minimise * Disk write performance (for the Kafka broker and the filewriter) * Network interface speeds (both from the electronics into this server, and from this server onwards to consumers such as Mantid) * Memory (for any processes which need to histogram the data - must be able to keep a histogram in memory) + - The data streaming stack will be unaffected by a restart of the NDX system, and will keep running in the background. + - We will configure the relevant containers for data streaming software to automatically start on reboot of the data streaming Linux server. 
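The memory consideration in the Linux ADR above can be sanity-checked with simple arithmetic: a dense histogram needs roughly spectra × time bins × bytes per bin. The sketch below assumes 32-bit counts; the spectrum and bin counts in the example are hypothetical placeholders rather than HRPD-X figures, and the `copies` parameter (for example a live histogram plus a published snapshot) is likewise an assumption.

```python
"""Back-of-the-envelope memory estimate for an in-memory histogram.

All example figures are hypothetical placeholders, not HRPD-X numbers.
"""


def histogram_bytes(
    n_spectra: int, n_time_bins: int, bytes_per_bin: int = 4, copies: int = 1
) -> int:
    """Size in bytes of `copies` dense (spectra x time-bin) count arrays.

    bytes_per_bin=4 assumes 32-bit integer counts; `copies` allows for e.g.
    a live histogram plus a snapshot being published (an assumption).
    """
    return n_spectra * n_time_bins * bytes_per_bin * copies


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical: 100,000 spectra x 5,000 time bins of 32-bit counts, held twice.
    size = histogram_bytes(100_000, 5_000, copies=2)
    print(f"{size:,} bytes ~= {to_gib(size):.2f} GiB")
```

With these placeholder figures the estimate comes to about 3.73 GiB; real numbers would come from the instrument's detector and time-channel configuration.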
From 12deb337a3b7b85eb9b5ba74d2ff1e05121b52b4 Mon Sep 17 00:00:00 2001 From: Tom Willemsen Date: Wed, 21 Jan 2026 09:27:11 +0000 Subject: [PATCH 15/18] Revise status in 001_histograms.md Updated status to reflect pending discussions with HRPD-X parties. --- doc/specific_iocs/datastreaming/ADRs/001_histograms.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/001_histograms.md b/doc/specific_iocs/datastreaming/ADRs/001_histograms.md index 28ab978a0..08caccb08 100644 --- a/doc/specific_iocs/datastreaming/ADRs/001_histograms.md +++ b/doc/specific_iocs/datastreaming/ADRs/001_histograms.md @@ -3,7 +3,7 @@ ## Status -Current, but may be superseded after HRPD-X. +Pending discussion with HRPD-X interested parties (including instrument scientists & Mantid). ## Context From b9ac01f66938d40a3817aac8fe1bfc4dfbaf6826 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 21 Jan 2026 13:16:36 +0000 Subject: [PATCH 16/18] add notes on approaches for running software --- .../datastreaming/ADRs/003_linux.md | 82 ++++++++++++++++++- 1 file changed, 79 insertions(+), 3 deletions(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md index 8b92b4486..05a834793 100644 --- a/doc/specific_iocs/datastreaming/ADRs/003_linux.md +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -25,17 +25,91 @@ Additionally, Windows Licensing costs have recently changed significantly (as of ISIS has made it clear that Linux support will be provided in the medium term. +There are a few different options with associated risks and benefits: + +### 1 - Running natively on the (Windows) NDX machines + +This has the benefit that everything runs along with the rest of IBEX, though we would have to spend extra development effort trying to port software to Windows which we may undo in the future if moving to Linux anyway. 
+ +The NDX already exists so doesn't add any system administration requirements to be able to run the software. + +The main drawbacks for this approach are that: +- the NDXes do not have sufficient resources for the software currently +- Windows is different to everyone else running the streaming software, and we may waste effort porting software to Windows +- Kafka itself will not run on Windows +- Deployment and patching will become difficult for a system which will need to be patched frequently and will interfere with the rest of the control system. As an example, we do not use virtual environments in python, so updating a python depdendency for the `forwarder` runs the risk that it may break something in user scripts and/or the block server, which would be very bad. +- Task scheduling/service management can be difficult on Windows, we could use `procserv` but in most cases we don't require an interactive terminal for processes. +- Moving away from Windows is on the roadmap - we would be creating more work for ourselves when we migrate over eventually. + + +### 2 - Running natively on a separate Linux machine + +This adds another machine in the data streaming stack which _could_ fail and stop data collection. + +Running the streaming software (which is designed to be run on Linux) natively would be the most performant of all the approaches. + +The main benefit to this is that we can specify the hardware requirements independently so they are suitable for the streaming software, with enough overhead to run Kafka on if needed. + +Tooling is generally more available for deployment and patching than Windows, however this is not something we are familiar with as a team. + +The downside is that if the setup of this machine isn't entirely automated it could be very difficult to maintain and/or reproduce if a hardware failure occurred. + +As well as this, more Linux system administration knowledge is required by the team. 
+ +Operating system updates could inadvertently affect the processes running on the machine, which could cause issues if we set the system to install unattended upgrades. + +### 3 - Running in containers on the NDX machines + +The benefit of doing this is that all services can be brought up and down together with the rest of the control system. It is also one less link in the data streaming chain to fail. + +This would require the use of the WSL, which is not specifically designed for production use and has limited functionality with containers due to host-network mode issues and so on. +NDXes are virtualised and have _very_ limited resources. + +Another drawback is that the current IBEX deployment method makes it difficult to patch these services easily - moving these services elsewhere means we do not have to interrupt e.g. sample environment scripts to restart/redeploy new versions of DAE processes. + +We are unable to use Docker Desktop, required for Docker Engine on Windows, without paid licensing. Alternatives include Rancher Desktop and Podman, but both of these also use underlying VMs. + +### 4 - Running in containers on a Linux VM on the NDH + +This has the same benefits as above but would allow us to run a Linux VM alongside the NDX, which lets us avoid WSL oddities. +Additionally, there should be more flexibility in deployment and patching, and less interference with the rest of IBEX. + +NDHes are currently very limited in resources, much like the NDXes which run on them. This is the main sticking point for this approach.
+ +Alongside this, virtual machines do generally introduce a performance penalty - the exact figure for this depends on several factors, but it will never be as fast as a native application or a container which shares the host kernel. For fast processes such as live histogramming and event processing we may require high performance, which could be limited by a virtual machine. + +### 5 - Running in containers on a separate Linux machine + +This adds another machine in the data streaming stack which _could_ fail and stop data collection. +As well as this, it will rely on the instrument's local network switch, which could also fail (though this applies to all of the options, as the WLSF boards will be streaming over this switch). + +Another downside, which affects the above approach, is that some system administration knowledge will be required to keep the operating system alive and secure. If we are using containers this should be very minimal. + +This shares the benefit of being able to specify suitable hardware requirements with approach 2. + +Containers are much more easily reproducible than native software. The wider industry is generally moving towards them for this and other reasons. + +Deployment, patching and orchestration are also widely supported by several container frameworks. We should decide exactly what we're doing at a later point, but if we started with [`docker-compose`](https://docs.docker.com/compose/) as a simple first step, it is straightforward to move towards something like [Kubernetes](https://kubernetes.io/) instead if we decide we need the features it offers. + +Containers also provide a cyber-security benefit in that the processes are isolated individually. + +DSG already have some container-based software to convert UDP streams to Flatbuffers blobs - we could quite easily host this for them if we have container infrastructure. This applies to any of the approaches that offer it. 
+ +If we end up being responsible for running the Kafka instance for HRPD-X, adding this is straightforward - Redpanda and other Kafka implementations offer production-ready container images. + ## Decision -New or repurposed hardware, running Linux, will be used to run the streaming software as shown {ref}`here`.
+New hardware, running Linux, will be used to run the streaming software as shown {ref}`here`. -Wherever possible, software will be deployed in containers, which will minimise the amount of Linux systems administration knowledge required. The aim will be for the Linux machine to 'only' have a container engine (such as docker or podman) installed, and very little else. +Wherever possible, software will be deployed in containers, which will minimise the amount of Linux systems administration knowledge required. The aim will be for the Linux machine to 'only' have a container engine (such as docker or podman) installed, and very little else. Containers will use health checks and auto-restarting to ensure reliability. We will decide on exact deployment and orchestration methods later on, but there are several approaches to choose from. + +Exact specifications will depend on data rates and prototype testing. ## Consequences - We are able to use Linux-centric technologies and tools, without needing to spend large amounts of time inventing workarounds for Windows. - The OS will be different. Developers will need _some_ understanding of Linux to maintain these servers. - * Mitigation: do as little as possible on the host, ideally limit it to just having a container engine installed via a configuration management tool such as Ansible. + * Mitigation: do as little as possible on the host, ideally limit it to just having a container engine installed via a configuration management tool such as Ansible. Some Linux distributions, such as [Fedora CoreOS](https://docs.fedoraproject.org/en-US/fedora-coreos/) or [RancherOS](https://rancher.com/docs/os/v1.x/en/), come with this out of the box. - Data-streaming infrastructure will not be on the NDH/NDX machine with the rest of IBEX. This is fine - EPICS is explicitly designed to run in a distributed way.
- We will need to carefully consider the system specification of the Linux server in order to ensure it is adequate for expected data rates (including data rates from e.g. noisy detectors, to a point). In particular we expect to need to carefully consider: * Disk write performance (for the Kafka broker and the filewriter) @@ -43,3 +117,5 @@ Wherever possible, software will be deployed in containers, which will minimise * Memory (for any processes which need to histogram the data - must be able to keep a histogram in memory) - The data streaming stack will be unaffected by a restart of the NDX system, and will keep running in the background. - We will configure the relevant containers for data streaming software to automatically start on reboot of the data streaming Linux server. + +The above will be impacted if we are required to run a Kafka instance on the streaming machine - this is unclear as of January 2026. Redpanda provides [some documentation on hardware requirements](https://docs.redpanda.com/current/deploy/redpanda/manual/production/requirements/) which we should consider. 
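The disk-write and network-interface considerations listed above can be estimated in the same way. The sketch below assumes roughly 8 bytes of payload per neutron event (an `int32` time-of-flight plus an `int32` pixel ID, as carried by the `ev44` schema) and deliberately ignores FlatBuffers framing, Kafka batching/compression and replication. The event rate in the example is a hypothetical placeholder, not a measured HRPD-X figure.

```python
"""Rough raw data-rate estimate for an event stream (payload only).

Assumes ~8 bytes/event (int32 time-of-flight + int32 pixel ID, as in ev44);
ignores FlatBuffers/Kafka overhead, compression and replication.
"""

SECONDS_PER_DAY = 86_400


def payload_bytes_per_second(events_per_second: int, bytes_per_event: int = 8) -> int:
    """Raw event payload rate in bytes per second."""
    return events_per_second * bytes_per_event


def megabits_per_second(bytes_per_second: int) -> float:
    """Convert a byte rate to Mbit/s (1 Mbit = 10**6 bits), e.g. to compare with NIC speeds."""
    return bytes_per_second * 8 / 1e6


def terabytes_per_day(bytes_per_second: int) -> float:
    """Disk volume written per day (1 TB = 10**12 bytes) if the rate were sustained."""
    return bytes_per_second * SECONDS_PER_DAY / 1e12


if __name__ == "__main__":
    rate = payload_bytes_per_second(10_000_000)  # hypothetical 10 M events/s
    print(
        f"{rate / 1e6:.0f} MB/s, {megabits_per_second(rate):.0f} Mbit/s, "
        f"{terabytes_per_day(rate):.2f} TB/day"
    )
```

Because both the Kafka broker and the filewriter may persist this data, the required disk throughput could be a multiple of the single-stream figure.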
From c2bb1f086b741dacac1f31fbe32e50c74b846846 Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Wed, 21 Jan 2026 13:28:40 +0000 Subject: [PATCH 17/18] words --- doc/specific_iocs/datastreaming/ADRs/003_linux.md | 4 ++-- doc/spelling_wordlist.txt | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/specific_iocs/datastreaming/ADRs/003_linux.md b/doc/specific_iocs/datastreaming/ADRs/003_linux.md index 05a834793..077edcbf1 100644 --- a/doc/specific_iocs/datastreaming/ADRs/003_linux.md +++ b/doc/specific_iocs/datastreaming/ADRs/003_linux.md @@ -37,9 +37,9 @@ The main drawbacks for this approach are that: - the NDXes do not have sufficient resources for the software currently - Windows is different to everyone else running the streaming software, and we may waste effort porting software to Windows - Kafka itself will not run on Windows -- Deployment and patching will become difficult for a system which will need to be patched frequently and will interfere with the rest of the control system. As an example, we do not use virtual environments in python, so updating a python depdendency for the `forwarder` runs the risk that it may break something in user scripts and/or the block server, which would be very bad. +- Deployment and patching will become difficult for a system which will need to be patched frequently and will interfere with the rest of the control system. As an example, we do not use virtual environments in python, so updating a python dependency for the `forwarder` runs the risk that it may break something in user scripts and/or the block server, which would be very bad. - Task scheduling/service management can be difficult on Windows, we could use `procserv` but in most cases we don't require an interactive terminal for processes. -- Moving away from Windows is on the roadmap - we would be creating more work for ourselves when we migrate over eventually. 
+- Moving away from Windows is on the road-map - we would be creating more work for ourselves when we migrate over eventually. ### 2 - Running natively on a separate Linux machine diff --git a/doc/spelling_wordlist.txt b/doc/spelling_wordlist.txt index e790b24b7..d69c71869 100644 --- a/doc/spelling_wordlist.txt +++ b/doc/spelling_wordlist.txt @@ -163,6 +163,7 @@ Culham customisable cxx Cybaman +cyber cybersecurity cygwin DAC @@ -933,6 +934,7 @@ vhds vhdx viewmodel virtualbox +virtualised vis Viscotherm vm From 11a12674edc02ad8f6d06b3dfc866db6d8610e7a Mon Sep 17 00:00:00 2001 From: Jack Harper Date: Fri, 23 Jan 2026 09:39:17 +0000 Subject: [PATCH 18/18] specify partitions for topics --- .../datastreaming/Datastreaming-Topics.md | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/doc/specific_iocs/datastreaming/Datastreaming-Topics.md b/doc/specific_iocs/datastreaming/Datastreaming-Topics.md index 03ec0f48a..1ed5f7acd 100644 --- a/doc/specific_iocs/datastreaming/Datastreaming-Topics.md +++ b/doc/specific_iocs/datastreaming/Datastreaming-Topics.md @@ -2,8 +2,12 @@ We have a number of topics per-instrument on `livedata`, the {ref}`Kafka cluster` we use. +Partition numbers are listed below. For variable partitions this will depend on the throughput requirements of the specific instrument. + ## `_runInfo` +partitions: 1 + This contains run start and run stop flatbuffers blobs. Flatbuffers schemas in this topic: @@ -12,6 +16,8 @@ Flatbuffers schemas in this topic: ## `_events` +partitions: variable + This contains data from event-mode events. Flatbuffers schemas in this topic: @@ -19,6 +25,8 @@ Flatbuffers schemas in this topic: {#topics_sampleenv} ## `_sampleEnv` + +partitions: 1 This contains sample environment data forwarded from EPICS. In a `.nxs` file this should end up in `raw_data_1/selog/` @@ -32,6 +40,8 @@ Flatbuffers schemas in this topic: ## `_runLog` +partitions: 1 + This contains run metadata forwarded from the ICP. 
In a `.nxs` file this should end up in `raw_data_1/runlog/` @@ -39,34 +49,46 @@ Schemas in this topic match the ones in {ref}`topics_sampleenv` ## `_monitorHistograms` +partitions: variable + This contains monitor histograms. Flatbuffers schemas in this topic: - [`hs01` - Histograms](https://github.com/ess-dmsc/streaming-data-types/blob/master/schemas/hs01_event_histogram.fbs) ## `_detSpecMap` +partitions: 1 + This contains details of the detector-spectrum mapping. Flatbuffers schemas in this topic: - [`df12` - Detector-spectrum mapping](https://github.com/ess-dmsc/streaming-data-types/blob/master/schemas/df12_det_spec_map.fbs) ## `_areaDetector` +partitions: variable + This is raw `areaDetector` data. It's sent by [this line in `ISISDAE`](https://github.com/ISISComputingGroup/EPICS-ioc/blob/716aada58c972cf0661ab6cebc41fba34d29b806/ISISDAE/iocBoot/iocISISDAE-IOC-01/liveview.cmd#L8) ## `_forwarderConfig` +partitions: 1 + This is the forwarder configuration, sent by {ref}`bskafka`. Flatbuffers schemas in this topic: - [`fc00` - Forwarder configuration](https://github.com/ess-dmsc/streaming-data-types/blob/master/schemas/fc00_forwarder_config.fbs) ## `_forwarderStatus` +partitions: 1 + This is the forwarder status topic which contains details about what PVs the forwarder is forwarding. Flatbuffers schemas in this topic: - [`x5f2` - General status](https://github.com/ess-dmsc/streaming-data-types/blob/master/schemas/x5f2_status.fbs) ## `_forwarderStorage` +partitions: 1 + This is the last known forwarder configuration, sent by {ref}`bskafka`. This is for if the forwarder crashes, then it can quickly retrieve its last configuration. Flatbuffers schemas in this topic: - [`fc00` - Forwarder Configuration](https://github.com/ess-dmsc/streaming-data-types/blob/master/schemas/fc00_forwarder_config.fbs)