-
Notifications
You must be signed in to change notification settings - Fork 11
Deadline monitor implementation #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadline monitor implementation #41
Conversation
License Check Results🚀 The license check job ran with the Bazel command: bazel run //:license-checkStatus: Click to expand output |
|
The created documentation from the pull request is available at: docu-html |
5027d80 to
9d36a69
Compare
9d36a69 to
d48a7c6
Compare
d48a7c6 to
c10992a
Compare
| match possible_err { | ||
| PossibleError::None => {}, | ||
| PossibleError::TooEarly => { | ||
| error!("Deadline {:?} stopped too early", self.tag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be helpful if we can also log here by who much the deadline was missed.
Something like:
Deadline xy stopped 5ms too early
Deadline xy stopped 5ms too late
This would be very useful when investigating why supervision failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| (self.0 & DEADLINE_STATE_FINISHED_TOO_EARLY) != 0 | ||
| } | ||
|
|
||
| pub(super) fn timestamp(&self) -> u32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be helpful to include somewhere the unit of this timestamp. Either in documentation or in naming of the variable/method itself (e.g. timestamp_ms or similar).
Reading the code I am not really sure is this a timestamp in ms or ns.
Similarly, only from other code I can infer that this is actually the absolute timestamp of the max deadline. But from the naming this does not become clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| const DEADLINE_STATE_RUNNING: u64 = 0b0000_0010; | ||
| const DEADLINE_STATE_STOPPED: u64 = 0b0000_0001; | ||
| const DEADLINE_STATE_FINISHED_TOO_EARLY: u64 = 0b0000_1000; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, what is the motivation for these bitmasks compared to using for example regular attributes for timestamp and state (e.g. running, stopped, etc)?
Is it about concurrency to only need to update single value or about saving a few bytes?
I was also wondering what is the motivation for constraining the timestamp to u32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make a simple algorithm lock-free, we have to fit into 64bits. Then value and state are consistent, and doing it with spsc fashion is simple. Can also be done differently, but this pattern normally works well, and the whole code uses abstraction on top of it so you don't see all this nesty bit manipulation at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For timestamp being u32 as written in code, it will cover ~~49days (we use offsetting and not real unix timestamp). I don't expect we need to support longer power cycles. The motivation is the same, you need to make room in 64bit value. However, we can always increase since there are still a few bits left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose 49days would be enough, but it might be worth validating that with the other members of the group. Maybe there could be some use case were the system is only going into a low-power mode but never really shut off completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use u128 for the state instead, so we could use u64 for the timestamp? That should be enough for all use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be used, but i woud still tend to keep it 64 and just instead 32 bytes use 50 bytes which would till end of the world (35k years?). I would say we can check this in CFT and i can update i follow ups if its okey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| u32::try_from(duration.as_millis()).expect("Monitor running for too long") | ||
| } | ||
|
|
||
| fn evaluate(&self, mut on_failed: impl FnMut(&IdentTag, bool)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if code readability would be improved by encoding the failure modes too early / too late as an enum or similar instead of encoding this as true/false when calling on_failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
NicolasFussberger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first review. Though it might be better to get another review from someone with deeper Rust experience.
| // | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| // | ||
| #![allow(dead_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move allow to ProtectedMemoryAllocator? I mean - make it struct-wide, not module-wide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave it, soon there will be examples and i will remove it
src/health_monitoring_lib/src/log.rs
Outdated
| pub(crate) use {debug, error, fatal, info, trace, warning as warn}; | ||
|
|
||
| // Re-export symbols from `score_log`. | ||
| pub(crate) use score_log::fmt::Error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird formatting:
pub(crate) use score_log::fmt::{Error, FormatSpec, ScoreDebug, Writer, DebugStruct};
pub(crate) use score_log::ScoreDebug;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto formatted but fixed
|
|
||
| /// Errors that can occur when working with Deadline instances | ||
| #[derive(Debug, PartialEq, ScoreDebug, Eq, Clone, Copy, Hash)] | ||
| pub enum DeadlineErrors { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistency -> DeadlineError.
BTW I'd merge all errors of this component into shared type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that then you have to match cases that do not exist. In this particular usecase I tihnk its better to have them separate.
- Deadline API in Rust - usage fo score_log
912137d to
6868928
Compare
6868928 to
f1c0de9
Compare
| PhmDaemon::PhmDaemon(score::lcm::saf::timers::OsClockInterface& f_osClock, logging::PhmLogger& f_logger_r, | ||
| std::unique_ptr<watchdog::IWatchdogIf> f_watchdog, std::unique_ptr<score::lcm::IProcessStateReceiver> f_process_state_receiver) : | ||
| osClock{f_osClock}, cycleTimer{&osClock}, logger_r{f_logger_r}, swClusterHandlers{}, watchdog(std::move(f_watchdog)), processStateReader{std::move(f_process_state_receiver)} | ||
| osClock{f_osClock}, cycleTimer{&osClock}, logger_r{f_logger_r}, swClusterHandlers{}, processStateReader{std::move(f_process_state_receiver)}, watchdog(std::move(f_watchdog)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebase leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, fixes issues introduced on master ;]
| /// The closure receives the current state snapshot and should return | ||
| /// an Option with the new state snapshot. If None is returned, the state | ||
| /// is not updated. | ||
| pub(super) fn update<F: FnMut(DeadlineStateSnapshot) -> Option<DeadlineStateSnapshot>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| const DEADLINE_STATE_RUNNING: u64 = 0b0000_0010; | ||
| const DEADLINE_STATE_STOPPED: u64 = 0b0000_0001; | ||
| const DEADLINE_STATE_FINISHED_TOO_EARLY: u64 = 0b0000_1000; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to use u128 for the state instead, so we could use u64 for the timestamp? That should be enough for all use cases
#![allow(dead_code)]needed until there is no consumer of APICloses #12