Add more data streaming documentation - ADRs, more hardware architecture. #135
Open: rerpha wants to merge 18 commits into master from data_streaming_docs_2
18 commits
- fb82dd0 WIP - add ADRs for data streaming and other information following DSG… (rerpha)
- f2d9199 end of day - add adr template (rerpha)
- 98de23d fill out remaining ADRs on vetos etc. (rerpha)
- 57423e2 speling (rerpha)
- e653b8f fix reference (rerpha)
- 61c1e9d more speling (rerpha)
- e8bb241 even more speling (rerpha)
- 60a3c18 use new names and URLs (rerpha)
- 20aa32d use full names (rerpha)
- 0198a9e Remove mermaid_params from conf.py (rerpha)
- 9fd57d7 add note on run start metadata (rerpha)
- baad797 Expand Linux server specification considerations (Tom-Willemsen)
- d5b034a Clarify containerization needs for data streaming software (Tom-Willemsen)
- 1bf9bbf Update Linux ADR with data streaming details (Tom-Willemsen)
- 12deb33 Revise status in 001_histograms.md (Tom-Willemsen)
- b9ac01f add notes on approaches for running software (rerpha)
- c2bb1f0 words (rerpha)
- 11a1267 specify partitions for topics (rerpha)
13 changes: 0 additions & 13 deletions
doc/specific_iocs/dae/datastreaming/Datastreaming--neutron-events-histograms.md
This file was deleted.
5 changes: 0 additions & 5 deletions
doc/specific_iocs/dae/datastreaming/Datastreaming-run-starts-stops.md
This file was deleted.
@@ -0,0 +1,9 @@
# Data streaming: ADRs

```{toctree}
:glob:
:titlesonly:
:maxdepth: 1

ADRs/*
```
@@ -0,0 +1,44 @@
# 0 - Kafka

## Status

Accepted

## Context

We need to choose a technology for data streaming.

There are several options here:
- Kafka or Kafka-compatible solutions such as Redpanda
- Redis
- ZeroMQ/RabbitMQ/ActiveMQ

Within each of these options we need to decide on a serialization format.
Options are:
- Protocol Buffers
- FlatBuffers with ESS schemas
- JSONB
- msgpack
- Avro
- encoded JSON/BSON


## Decision

We have decided to use a Kafka-compatible broker as a streaming platform. This may be either Kafka or Redpanda.

This lets us lean on the ESS's experience with this technology, and we may be able to collaborate with them and use shared tools.
FlatBuffers encoding was performance-tested during the in-kind project and performed well against the alternatives available at the time.

We have also decided to serialize the data using the [ESS flatbuffers schemas](https://github.com/ess-dmsc/streaming-data-types) with ISIS additions where necessary.

Kafka is a broker-based streaming technology, as opposed to brokerless systems, which do not keep messages. Because the broker retains messages, a Kafka-based system can replay them, and a consumer can catch up with the 'history' of a stream. We will not retain events in Kafka indefinitely: retention will be tuned to keep a suitable number of messages for our use cases within hardware constraints.

## Consequences

What becomes easier or more difficult to do because of this change?

Kafka is harder to set up than some simpler alternatives. This is somewhat mitigated by its scaling and redundancy benefits.
We don't intend to do much processing in Kafka itself (i.e. transforms or stream processors).

The advantage of using Kafka is that we stay much more closely aligned with the ESS, CLF, ANSTO and other facilities, which all use Kafka with FlatBuffers schemas.
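
The retention and replay behaviour described under Decision can be illustrated with a small in-memory model. This is only a sketch and is not Kafka client code: `RetainedLog`, `max_messages` and `read_from` are hypothetical names invented for the example, and real brokers typically bound retention by time or size rather than by a simple message count.

```python
from collections import deque
from typing import Deque, List, Tuple


class RetainedLog:
    """Toy stand-in for one broker-held stream (hypothetical, not a Kafka API)."""

    def __init__(self, max_messages: int) -> None:
        self._max_messages = max_messages
        self._messages: Deque[Tuple[int, bytes]] = deque()
        self._next_offset = 0

    def append(self, payload: bytes) -> int:
        """Store a message, evict the oldest ones over the limit, return its offset."""
        offset = self._next_offset
        self._messages.append((offset, payload))
        self._next_offset += 1
        while len(self._messages) > self._max_messages:
            self._messages.popleft()
        return offset

    def read_from(self, offset: int) -> List[Tuple[int, bytes]]:
        """Return every message still retained at or after `offset`."""
        return [(o, p) for o, p in self._messages if o >= offset]


log = RetainedLog(max_messages=3)
for i in range(5):
    log.append(f"event-{i}".encode())

# A consumer that starts late can still catch up on whatever is retained,
# but messages older than the retention window are no longer available.
late_consumer = log.read_from(0)
oldest_retained_offset = late_consumer[0][0]
```

In this model a late consumer asking for offset 0 only receives the three most recent messages, which is the trade-off between replay depth and storage that retention tuning has to balance.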
@@ -0,0 +1,32 @@
{#001_histograms}
# 1 - Histograms and event mode

## Status

Pending discussion with interested HRPD-X parties (including instrument scientists and Mantid).

## Context

**Histogram mode**

In histogram mode, over the course of a run, counts are accumulated into a running histogram, binned by user-specified
time channel boundaries.

**Event mode**

In event mode, over the course of a run, each individual neutron event's detection time and detector ID are recorded.
Event mode data can later be binned to form a histogram, but individual events cannot be recovered from a histogram. In
other words, histogramming is lossy. The advantage of histogram mode is that it typically produces smaller data volumes.

Historically, histogram mode has often been used because of hardware limitations.

## Decision

For HRPD-X, we will collect all data, including data from neutron monitors, in event mode only. HRPD-X will not support
histogram mode.

## Consequences

- Data volumes on HRPD-X will be higher when running in event mode than in histogram mode. This includes data in flight
  during networking and Kafka processing, as well as final NeXus file sizes.
- Considering only events will simplify components of the HRPD-X data streaming implementation.
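
To make the "histogramming is lossy" point from the Context section concrete, here is a minimal Python sketch that bins event-mode data by time channel boundaries. The event representation (a tuple of detection time and detector ID), the microsecond unit, the function name `histogram_events` and the sample values are all assumptions made for illustration; they are not taken from the HRPD-X implementation or the ESS schemas.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Assumed event shape: (detection time in microseconds, detector ID).
Event = Tuple[float, int]


def histogram_events(
    events: Sequence[Event], boundaries: Sequence[float], detector_id: int
) -> List[int]:
    """Count one detector's events into bins [boundaries[i], boundaries[i + 1])."""
    counts = [0] * (len(boundaries) - 1)
    for time, det in events:
        if det != detector_id:
            continue
        index = bisect_right(boundaries, time) - 1
        # Events outside the first and last boundary are dropped.
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts


boundaries = [0.0, 200.0, 400.0, 600.0]

# Two different sets of events...
events_a = [(120.0, 1), (180.0, 1), (450.0, 1)]
events_b = [(10.0, 1), (199.0, 1), (599.0, 1)]

# ...produce the same histogram, so the individual events cannot be recovered.
hist_a = histogram_events(events_a, boundaries, detector_id=1)
hist_b = histogram_events(events_b, boundaries, detector_id=1)
```

Because the binning can be applied after the run with any boundaries, keeping only events does not prevent a histogram view from being produced later.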
29 changes: 29 additions & 0 deletions
doc/specific_iocs/datastreaming/ADRs/002_spectra_mapping.md
@@ -0,0 +1,29 @@
# 2 - Wiring and spectra mapping

## Status

Pending

## Context

The concept of wiring tables still exists; however, the format is now different (`.csv` as opposed to the old wiring table format).

The options that we're considering are:
- change `.csv` to align with the old format
- write a service/script to convert to/from `.csv`
- keep the two formats separate, acknowledging that they will not be backwards or forwards compatible

Spectra files share the above considerations, as they also use a different file format.

Grouping spectra in hardware was primarily used to get around limitations of DAE hardware. In event mode there is no advantage to grouping spectra in hardware.

## Decision

We are not going to support the old-style spectra files or any spectrum mapping/grouping in general.

For wiring tables, this is still to be decided in https://github.com/ISISComputingGroup/DataStreaming/issues/27.

## Consequences

- If HRPD-X previously grouped spectra in hardware, they will now need to be grouped in software (e.g. Mantid) instead.
- Our data streaming software will not need to support spectrum grouping.
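
As a rough illustration of what grouping in software might look like once events carry only detector IDs, the sketch below sums event counts per group using a plain dictionary. It is not Mantid code: `group_counts`, the `grouping` dictionary, the event tuple shape and the IDs used are hypothetical, and a real analysis would use whatever grouping tools the downstream software provides.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed event shape: (detection time, detector ID).
Event = Tuple[float, int]


def group_counts(events: Iterable[Event], grouping: Dict[int, int]) -> Dict[int, int]:
    """Sum event counts per group; detectors absent from `grouping` are skipped."""
    counts: Counter = Counter()
    for _time, detector_id in events:
        group = grouping.get(detector_id)
        if group is not None:
            counts[group] += 1
    return dict(counts)


# Hypothetical grouping: detectors 1 and 2 form group 10, detector 3 forms group 11.
grouping = {1: 10, 2: 10, 3: 11}
events = [(1.0, 1), (2.0, 2), (3.0, 3), (4.0, 4), (5.0, 4), (6.0, 1)]

grouped = group_counts(events, grouping)
```

Since the grouping is applied to recorded events, it can be changed and reapplied after the run without touching the streamed data.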
The spectra file could also be used to disable collecting from a noisy detector (using spectrum 0) - is this possible via a different route?
If it's a noisy detector we probably don't want it streamed at all - we probably want to just not map it (before it ever hits Kafka)?
Saving spectrum 0 to file was optional, so using spectrum 0 was a workaround for DAE3 to discard data, as it would always send data. Is it easy for a scientist to unmap a detector?
This may be a file writer concern, but there was a third table, `detector.dat`, that contained detector angle details; it was similar in idea to a Mantid instrument geometry. ISISICP could read `detector.dat` or a saved Mantid workspace to extract detector details to add to a NeXus file. Excitations used to adjust these files each cycle post calibration, so just noting that there would ultimately need to be a way for scientists to adjust detector metadata for an experiment.