fix(scheduler) Make schedule changes take effect immediately#590
fix(scheduler) Make schedule changes take effect immediately#590
Conversation
We had feedback a few months back that changes in schedules should take effect immediately, and these changes implement that with backwards compatibility for existing run state. The scenario that came up was a task was moved from every 3 hours to every 10 minutes, and the developer was surprised when their task had to wait for 2+hrs to see their task run again. Refs STREAM-676
| legacy_keys = [sk.rsplit(":", 1)[0] for sk in storage_keys] | ||
|
|
||
| new_values = self._redis.mget([self._make_key(sk) for sk in storage_keys]) | ||
| legacy_values = self._redis.mget([self._make_key(lk) for lk in legacy_keys]) | ||
|
|
||
| run_times: dict[str, datetime | None] = {} | ||
| for storage_key, new_val, legacy_val in zip(storage_keys, new_values, legacy_values): | ||
| raw = new_val if new_val is not None else legacy_val | ||
| run_times[storage_key] = datetime.fromisoformat(raw) if raw else None |
There was a problem hiding this comment.
Bug: The backwards-compatibility fallback for legacy run state can prevent a task from running immediately after a schedule change, delaying its execution until the new interval has passed.
Severity: MEDIUM
Suggested Fix
The backwards-compatibility logic should be adjusted to ignore the legacy last_run value if the schedule has changed. One approach is to have read_many not perform the fallback at all, forcing an immediate run on the new schedule. Alternatively, the calling function _load_last_run could compare the schedule ID from the new key with the schedule that produced the legacy data (if possible to determine) and ignore the legacy data if they differ.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: clients/python/src/taskbroker_client/scheduler/runner.py#L83-L91
Potential issue: The backwards-compatibility fallback in `read_many()` can prevent a
task from running immediately after a schedule change. If a task ran recently under an
old schedule, its last run time is stored in a legacy-formatted Redis key. When the code
is updated and the schedule is changed simultaneously, the new logic will read the last
run time from the legacy key. This causes the `is_due()` calculation to incorrectly
delay the task's next execution, respecting the old run time instead of triggering
immediately, which contradicts the feature's goal.
Did we get this right? 👍 / 👎 to inform future reviews.
| Retrieve last run times in bulk. | ||
|
|
||
| storage_keys are the new-format keys including the schedule_id suffix | ||
| (e.g. "test:valid:300"). Falls back to the legacy key (derived by |
There was a problem hiding this comment.
This is a contrived example, but:
storage_keys = [`test:val:300`, `test:val:3000`]
legacy_keys = [`test:val`, `test:val`]
new_values = [123, None] # No value for 3000
legacy_values = [321, 321] # We have a legacy value
run_times[`test:val:300`] = 123
run_times[`test:val:3000`] = 321
Is this scenario possible? And if so, does this result make sense?
We had feedback a few months back that changes in schedules should take effect immediately, and these changes implement that with backwards compatibility for existing run state.
The scenario that came up was a task was moved from every 3 hours to every 10 minutes, and the developer was surprised when their task had to wait for 2+hrs to see their task run again.
Refs STREAM-676