```{toctree}
:glob:
:titlesonly:
:maxdepth: 1

datastreaming/*
```

The data streaming system is being built as a requirement for HRPD-X and possibly SANDALS-II, separate from (and complementary to) the `MNeuData` project. HRPD-X, SANDALS-II and other future instruments will not have a traditional DAE2/DAE3, as these are now obsolete. The system is architecturally similar to the one the ESS uses to take data (neutron events, sample environment, and anything else we can throw into a streaming platform) and write it to file; ISIS previously aided development of the ESS's streaming pipeline as part of an in-kind project. The system will replace the ICP at ISIS. Note that all documentation here is applicable to the HRPD-X upgrade and may change in the future.

In general, this works by producing both neutron events and histograms, sample environment data, and other diagnostic data into a [Kafka](https://kafka.apache.org/) cluster, and having clients (consumers, in Kafka lingo!) that either view data live and act on it or write the data to a NeXus file. Additional information can be found [here](http://accelconf.web.cern.ch/AccelConf/icalepcs2017/papers/tupha029.pdf) and [here](https://iopscience.iop.org/article/10.1088/1742-6596/1021/1/012013).

All data is serialised into [Flatbuffers](https://flatbuffers.dev/) blobs using [these schemas](https://github.com/ess-dmsc/streaming-data-types) - we have a tool called [saluki](https://github.com/ISISComputingGroup/saluki) which can deserialise these and make them human-readable after they've been put into Kafka.
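FlatBuffers messages identify their schema with a four-character file identifier stored at bytes 4–8 of the buffer, straight after the 4-byte root table offset; this is how tools recognise, say, `ev44` (event data) blobs on a topic. A minimal, standard-library-only sketch of that dispatch step (the example buffer is fabricated, not a real message):

```python
def schema_id(buf: bytes) -> str:
    """Return the 4-character FlatBuffers file identifier of a serialised message.

    FlatBuffers places the file identifier at bytes 4..8, immediately after
    the 4-byte root table offset, which is how a blob's schema is recognised.
    """
    if len(buf) < 8:
        raise ValueError("buffer too short to be a FlatBuffers message")
    return buf[4:8].decode("ascii")


# Fabricated buffer with the 'ev44' (event data) identifier in the right place:
fake_message = b"\x10\x00\x00\x00" + b"ev44" + b"\x00" * 8
print(schema_id(fake_message))  # ev44
```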

Overall architecture is as follows:

![](datastreaming/ISISDSLayout.drawio.svg)

This comprises a few different consumers and producers:
- [`kafka_dae_diagnostics`](https://github.com/ISISComputingGroup/kafka_dae_diagnostics) - A soft IOC which provides `areaDetector` views, spectra plots and so on by consuming events from the cluster and displaying them over EPICS CA/PVA.
- [`kafka_dae_control`](https://github.com/ISISComputingGroup/kafka_dae_control) - Also a soft IOC, which is more or less a drop-in replacement for the ISISDAE. It provides an interface that several clients (e.g. [genie](https://github.com/ISISComputingGroup/genie), [ibex_bluesky_core](https://github.com/ISISComputingGroup/ibex_bluesky_core), [ibex_gui](https://github.com/ISISComputingGroup/ibex_gui)) use to start/stop runs and configure the streaming electronics. `kafka_dae_control` sends UDP packets to the streaming electronics to configure them.
- [`kafka_forwarder_configurer`](https://github.com/ISISComputingGroup/kafka_forwarder_configurer) - Configures the `forwarder` with the blocks in an instrument's current configuration, as well as other PVs which will either get written to file or archived, e.g. for the log plotter.
- `forwarder` - See [Forwarding Sample Environment](datastreaming/Datastreaming---Sample-Environment)
- `filewriter` - See [File writing](datastreaming/Datastreaming---File-writing)
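The packet format used to configure the streaming electronics is hardware-specific and not documented here, so the sketch below only illustrates the transport: a hypothetical payload sent over UDP, with a local listener standing in for the electronics.

```python
import socket

# Stand-in for the streaming electronics: a local UDP listener.
electronics = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
electronics.bind(("127.0.0.1", 0))  # OS picks a free port

# Hypothetical configuration payload - the real format is hardware-specific.
payload = b"CONFIG frame_sync=internal"

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(payload, electronics.getsockname())

received, _ = electronics.recvfrom(1024)
print(received == payload)  # True
sender.close()
electronics.close()
```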

There is a (non-production!) [Redpanda](https://www.redpanda.com/) Kafka cluster
A web interface is available [here](https://reduce.isis.cclrc.ac.uk/redpanda-console/overview).

:::{important}
It was decided that we no longer maintain the Kafka cluster, and it will be handled by the Flexible Interactive
Automation team. See `\\isis\shares\ISIS_Experiment_Controls\On Call\autoreduction_livedata_support.txt` for their
support information.
:::
4 changes: 0 additions & 4 deletions doc/specific_iocs/dae/ISISDSLayout.drawio.svg

This file was deleted.
9 changes: 9 additions & 0 deletions doc/specific_iocs/datastreaming/ADRs.md
# Data streaming: ADRs

```{toctree}
:glob:
:titlesonly:
:maxdepth: 1

ADRs/*
```
44 changes: 44 additions & 0 deletions doc/specific_iocs/datastreaming/ADRs/000_kafka.md
# 0 - Kafka

## Status

Accepted

## Context

We need to decide on a technology through which we are going to do data streaming.

There are several options here:
- Kafka or Kafka compatible solutions such as Redpanda
- Redis
- ZeroMQ/RabbitMQ/ActiveMQ

Within each of these options we need to decide on a serialization format.
Options are:
- Protocol Buffers (protobuf)
- FlatBuffers with ESS schemas
- JSONB
- MessagePack (msgpack)
- Avro
- encoded JSON/BSON


## Decision

We have decided to use a Kafka compatible broker as a streaming platform. This may be either Kafka or Redpanda.

This is because we can lean on the ESS's experience with this technology and may be able to collaborate with them and use shared tools.
FlatBuffers encoding was performance-tested during the in-kind project and showed good performance versus the alternatives at the time.

We have also decided to serialize the data using the [ESS flatbuffers schemas](https://github.com/ess-dmsc/streaming-data-types) with ISIS additions where necessary.

Kafka is a broker-based streaming technology - as opposed to brokerless systems which do not keep messages. This allows a Kafka-based system to replay messages or for a consumer to catch up with the 'history' of a stream. We will not retain events in Kafka indefinitely - retention will be tuned to keep a suitable number of messages for our use-cases versus hardware constraints.
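Retention is tuned per topic via Kafka's standard topic-level configuration. A config fragment as an illustration only (the topic name and values below are assumptions, not an agreed ISIS policy):

```shell
# Keep events for 7 days or up to ~100 GiB per partition, whichever is hit first.
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic hrpdx_detector_events --partitions 8 \
  --config retention.ms=604800000 \
  --config retention.bytes=107374182400
```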

## Consequences


Kafka is indisputably harder to set up than some simpler alternatives. This is somewhat mitigated by its scaling and redundancy benefits.
We don't intend to do a large amount in Kafka itself (e.g. transforms or stream processors).

The advantage of using Kafka is that we keep much more closely aligned to the ESS, CLF, ANSTO and other facilities who are all using Kafka with Flatbuffers schemas.
32 changes: 32 additions & 0 deletions doc/specific_iocs/datastreaming/ADRs/001_histograms.md
{#001_histograms}
# 1 - Histograms and event mode

## Status

Pending discussion with HRPD-X interested parties (including instrument scientists & Mantid).

## Context

**Histogram mode**

In histogram mode, over the course of a run, counts are accumulated into a running histogram, binned by user-specified
time channel boundaries.

**Event mode**

In event mode, over the course of a run, each individual neutron event's detection time and detector ID is recorded.
Event mode data can be later binned to form a histogram, but a histogram cannot be recovered to individual events. In
other words, histogramming is lossy. The advantage of histogram mode is that it typically produces smaller data volumes.
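The lossiness is easy to see in a small sketch (event times and bin edges are invented for illustration):

```python
# Event mode: each neutron's time-of-flight is kept individually.
events = [1.2, 3.7, 3.9, 7.5]

# Histogram mode: only counts per time-channel bin survive.
edges = [0.0, 2.0, 4.0, 6.0, 8.0]
hist = [0] * (len(edges) - 1)
for t in events:
    for i in range(len(edges) - 1):
        if edges[i] <= t < edges[i + 1]:
            hist[i] += 1

print(hist)  # [1, 2, 0, 1] - the individual times 3.7 and 3.9 are gone
```

The events can always be re-binned with different edges later; the histogram cannot be turned back into events.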

In many cases, histogram mode has historically been used because of hardware limitations.

## Decision

For HRPD-X, we will collect all data, including data from neutron monitors, in event mode only. HRPD-X will not support
histogram mode.

## Consequences

- Data volumes on HRPD-X will be higher running in event mode compared to histogram mode. This applies both to data in flight
through networking and Kafka processing, and to final NeXus file sizes.
- Considering only events will simplify components of the HRPD-X data streaming implementation.
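As a rough back-of-envelope illustration of the volume difference (all rates and sizes below are assumptions, not measured HRPD-X figures): event-mode volume grows linearly with count rate, while a histogram's size is fixed by detectors × time channels.

```python
# Assumed, illustrative numbers - not measured HRPD-X rates.
bytes_per_event = 4 + 8          # 32-bit detector ID + 64-bit timestamp
events_per_second = 1_000_000
event_rate_mb_s = bytes_per_event * events_per_second / 1e6
print(event_rate_mb_s)  # 12.0 MB/s, growing linearly with count rate

# Histogram size is independent of count rate:
detectors, time_channels, bytes_per_bin = 100_000, 2_000, 4
hist_mb = detectors * time_channels * bytes_per_bin / 1e6
print(hist_mb)  # 800.0 MB per complete histogram, fixed
```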
29 changes: 29 additions & 0 deletions doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md
# 2 - Wiring and Spectra mapping

## Status

Pending

## Context

Wiring tables are a concept that still exists; however, the format is now different (`.csv` as opposed to the old wiring table format).

The options that we're considering are:
- change `.csv` to align with the old format
- write a service/script to convert to/from `.csv`
- keep the two formats separate, acknowledging that they will not be backwards or forwards compatible

Spectra files share the above considerations as they also use a different file format.

Grouping spectra in hardware was primarily used to get around limitations of DAE hardware. In event mode there is no advantage to grouping spectra in hardware.
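Grouping in software then amounts to summing per-spectrum counts under some mapping. A minimal sketch (the counts and group mapping are invented; in practice this would happen in e.g. Mantid):

```python
from collections import defaultdict

# Invented example data: per-spectrum integrated counts...
spectrum_counts = {1: 120, 2: 95, 3: 210, 4: 5}
# ...and a software mapping of spectra into groups.
group_of = {1: "bank_A", 2: "bank_A", 3: "bank_B", 4: "bank_B"}

grouped = defaultdict(int)
for spectrum, counts in spectrum_counts.items():
    grouped[group_of[spectrum]] += counts

print(dict(grouped))  # {'bank_A': 215, 'bank_B': 215}
```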

## Decision

We are not going to support the old-style spectra files or any spectrum mapping/grouping in general.

> **Review discussion:**
>
> *Member:* The spectra file could also be used to disable collecting from a noisy detector (using spectrum 0) - is this possible via a different route?
>
> *Member:* If it's a noisy detector we probably don't want it streamed at all - we probably want to just not map it (before it ever hits Kafka)?
>
> *@FreddieAkeroyd (Jan 20, 2026):* Saving spectrum 0 to file was optional, so using spectrum 0 was a workaround for DAE3 to discard data, as it would always send data. Is it easy for a scientist to unmap a detector?
>
> *Member:* This may be the file writer, but there was a third table, `detector.dat`, that contained detector angle details; it was similar to a Mantid instrument geometry in idea. ISISICP could read `detector.dat` or a saved Mantid workspace to extract detector details to add to a NeXus file. Excitations used to adjust these files each cycle post-calibration, so just noting that there would ultimately need to be a way for scientists to adjust detector metadata for an experiment.
For wiring tables this is TBD in https://github.com/ISISComputingGroup/DataStreaming/issues/27.

## Consequences

- If HRPD-X previously grouped spectra in hardware, the spectra will now need to be grouped in software (e.g. Mantid) instead.
- Our data streaming software will not need to support spectrum grouping.