Skip to content

Probe cycle discrimination + scheduler health snapshot#50

Open
henry0816191 wants to merge 3 commits into
developfrom
feat/probe-cycle-health-snapshot
Open

Probe cycle discrimination + scheduler health snapshot#50
henry0816191 wants to merge 3 commits into
developfrom
feat/probe-cycle-health-snapshot

Conversation

@henry0816191
Copy link
Copy Markdown
Collaborator

@henry0816191 henry0816191 commented May 25, 2026

Summary

  • run_cycle() returns CycleResult (success / empty / failed) instead of a list.
  • /health reads scheduler state from a locked, immutable snapshot (last_updated, last_cycle_status, etc.).

Why

Operators could not tell a real empty probe from a failed one, and the health thread read scheduler fields without synchronization.

Main changes

  • models.pyCycleStatus, CycleResult
  • sources.py — discriminated run_cycle(), new log lines
  • monitor.py — handle all cycle outcomes; do not advance staleness on FAILED; SchedulerSnapshot + health_snapshot()
  • __main__.py — health extras from scheduler.health_snapshot()

Tests

pytest tests/test_sources.py tests/test_monitor.py tests/test_health.py -q

138 passed (includes empty/failed cycle, FAILED staleness, concurrent health snapshot).

Related issues

close #46
close #49

Summary by CodeRabbit

  • New Features

    • /health now exposes last-updated timestamp, poll count, probe stats, probe success rate, and last cycle status/error; scheduler snapshot is published atomically and merged with queue/db metrics
    • Probe cycles now report discriminated outcomes (SUCCESS/EMPTY/FAILED) with clearer probe-cycle logging
  • Documentation

    • Clarified concurrency semantics and the immutable, lock-protected health snapshot
  • Tests

    • Expanded health and prober tests, including concurrent snapshot consistency and cycle-outcome coverage

Review Change Stack

@henry0816191 henry0816191 self-assigned this May 25, 2026
@henry0816191 henry0816191 requested a review from wpak-ai as a code owner May 25, 2026 16:35
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f32c5e4d-9da9-468f-954b-a39640f228c6

📥 Commits

Reviewing files that changed from the base of the PR and between 29108ea and f637b98.

📒 Files selected for processing (1)
  • src/paperscout/monitor.py

📝 Walkthrough

Walkthrough

Implements discriminated ISO probe cycle outcomes (CycleResult with SUCCESS/EMPTY/FAILED), adds a frozen SchedulerSnapshot published under a lock, updates Scheduler to record cycle status/error and publish snapshots after seed/poll, wires snapshot into /health extra fields, and updates tests/docs accordingly.

Changes

Discriminated Cycles and Health Snapshots

Layer / File(s) Summary
Cycle Result Types
src/paperscout/models.py
CycleStatus enum and CycleResult frozen dataclass define discriminated probe outcomes and conditional hits property.
ISOProber Cycle Outcome Refactor
src/paperscout/sources.py
run_cycle() now imports cycle types, returns CycleResult, wraps cycle in try/except, computes cycle_status, logs success/empty/failed, and includes cycle_status in the PROBE-CYCLE-SUMMARY.
Scheduler Health Infrastructure
src/paperscout/monitor.py
Adds imports and _compute_probe_success_rate(), defines frozen SchedulerSnapshot and defaults used when no snapshot exists.
Scheduler Cycle Integration
src/paperscout/monitor.py
Adds methods to extract hits from CycleResult (recording last_cycle_status/error), record probe-cycle stats, conditionally advance last_successful_poll only on non-FAILED cycles, publish immutable snapshots, and updates seed()/poll_once() to publish snapshots.
Health Endpoint Integration
src/paperscout/__main__.py
_extra_health_fields() now merges scheduler.health_snapshot() with mq.health_fields() fallback and db_pool status rather than computing fields inline.
ISOProber Unit Tests
tests/test_sources.py
Updates tests to assert cycle.status and cycle.hits; adds test_run_cycle_empty and test_run_cycle_failed.
Scheduler Tests and Helpers
tests/test_monitor.py
Adds CycleResult helpers, updates scheduler mocks to use CycleResult objects, and adds tests ensuring failed cycles don't advance last_successful_poll and that health_snapshot() reports last_updated and poll_count.
Health Endpoint Tests and Concurrency
tests/test_health.py
Expands extra_fields_fn assertions for snapshot fields and adds a concurrent snapshot publisher test verifying /health returns consistent fields under concurrent updates.
Documentation and Changelog
CHANGELOG.md, docs/architecture.md
Documents discriminated cycle outcomes, cycle_status logging, atomic SchedulerSnapshot, and that /health reads an immutable, lock-protected snapshot.

Sequence Diagram

sequenceDiagram
  participant Scheduler
  participant ISOProber
  participant HealthEndpoint
  Scheduler->>ISOProber: run_cycle()
  alt Probe Success
    ISOProber->>ISOProber: perform concurrent HEAD probes, collect hits
    ISOProber-->>Scheduler: CycleResult(SUCCESS, results=hits)
  else Probe Empty
    ISOProber->>ISOProber: probes return no hits
    ISOProber-->>Scheduler: CycleResult(EMPTY)
  else Probe Failed
    ISOProber->>ISOProber: exception during cycle
    ISOProber-->>Scheduler: CycleResult(FAILED, error=msg)
  end
  Scheduler->>Scheduler: _probe_hits_from_cycle() (record status/error)
  Scheduler->>Scheduler: _record_probe_cycle_completion() (store probe_stats)
  alt non-FAILED cycle
    Scheduler->>Scheduler: _mark_poll_successful_if_probe_ok()
  end
  Scheduler->>Scheduler: _publish_health_snapshot()
  HealthEndpoint->>Scheduler: health_snapshot()
  Scheduler-->>HealthEndpoint: SchedulerSnapshot dict
  HealthEndpoint->>HealthEndpoint: merge with mq/db metrics
  HealthEndpoint-->>Client: /health response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • wpak-ai

Poem

🐰 I hopped through cycles, brave and spry,

Found success, empty, or why it died;
The scheduler sealed the snapshot tight,
Health reads steady in the morning light.
Cheers — a tiny hop for observability.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.48% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Probe cycle discrimination + scheduler health snapshot' accurately and concisely summarizes the main changes: discriminated cycle outcomes and health snapshot synchronization.
Description check ✅ Passed The PR description includes Summary, Why, Main changes, Tests, and Related issues sections matching the template structure with complete information about the changes and test results.
Linked Issues check ✅ Passed The PR fully addresses both linked issues: #46 implements discriminated CycleResult with SUCCESS/EMPTY/FAILED [#46], and #49 implements atomic SchedulerSnapshot with thread-safe health endpoint reads [#49].
Out of Scope Changes check ✅ Passed All code changes are directly scoped to the linked objectives: cycle discrimination, snapshot synchronization, health endpoint updates, and supporting tests with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/probe-cycle-health-snapshot

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/paperscout/monitor.py`:
- Around line 128-136: The health snapshot fallback uses
dict(_HEALTH_SNAPSHOT_DEFAULTS) which only performs a shallow copy and ends up
sharing the nested probe_stats dict; change the construction in the code that
creates the fallback (where dict(_HEALTH_SNAPSHOT_DEFAULTS) is used) to produce
a deep/independent copy of nested structures—either use copy.deepcopy on
_HEALTH_SNAPSHOT_DEFAULTS or rebuild the dict while explicitly creating a new
empty dict for "probe_stats" (e.g., {**_HEALTH_SNAPSHOT_DEFAULTS, "probe_stats":
{}}) so each consumer gets its own probe_stats instead of a shared mutable
object.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f831e1c1-21d8-46f5-b3dd-da55b443e852

📥 Commits

Reviewing files that changed from the base of the PR and between f7b0bd5 and fc58957.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • docs/architecture.md
  • src/paperscout/__main__.py
  • src/paperscout/models.py
  • src/paperscout/monitor.py
  • src/paperscout/sources.py
  • tests/test_health.py
  • tests/test_monitor.py
  • tests/test_sources.py

Comment thread src/paperscout/monitor.py
@henry0816191
Copy link
Copy Markdown
Collaborator Author

@coderabbitai don't skip. full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

Tip

For best results, initiate chat on the files or code changes.

@henry0816191 Here is the full review of PR #50.


PR #50 — Full Review: Probe cycle discrimination + scheduler health snapshot

Summary

The implementation is well-structured, correctly addresses issues #46 and #49, and the test coverage is solid (happy path, empty, failed, staleness, concurrent snapshot reads). A few issues are worth addressing before merging.


src/paperscout/models.py

CycleResult lacks invariant validation

CycleResult is frozen but its fields are not validated. A FAILED result with error=None, or a SUCCESS result with an empty results tuple, is silently constructable.

`@dataclass`(frozen=True, slots=True)
class CycleResult:
    status: CycleStatus
    results: tuple[ProbeHit, ...] = ()
    error: str | None = None

    def __post_init__(self) -> None:
        if self.status == CycleStatus.FAILED and not self.error:
            raise ValueError("CycleResult FAILED must carry a non-empty error string")
        if self.status == CycleStatus.SUCCESS and not self.results:
            raise ValueError("CycleResult SUCCESS must carry at least one ProbeHit")
        if self.status != CycleStatus.FAILED and self.error is not None:
            raise ValueError("CycleResult error is only valid for FAILED status")

src/paperscout/sources.py

Dead code: unreachable return None in _probe_one

The async with sem: block always returns (a ProbeHit | None) or raises. The bare return None after it is unreachable and should be removed.

    async with sem:
        ...
        return ProbeHit(...)
    return None  # ← unreachable; remove this line

src/paperscout/monitor.py

1. Missing wildcard case in _probe_hits_from_cycle

The match statement covers all three current CycleStatus values, but there is no case _: wildcard. If a new status is added later the method falls through and implicitly returns None, silently breaking the -> list[ProbeHit] contract. Add a defensive fallback:

def _probe_hits_from_cycle(self, cycle: CycleResult) -> list[ProbeHit]:
    self._last_cycle_status = cycle.status
    self._last_cycle_error = cycle.error
    match cycle.status:
        case CycleStatus.SUCCESS:
            return cycle.hits
        case CycleStatus.EMPTY:
            log.info("POLL  probe cycle empty")
            return []
        case CycleStatus.FAILED:
            log.error("POLL  probe cycle failed: %s", cycle.error)
            return []
        case _:
            log.error("POLL  unknown cycle status: %s", cycle.status)
            return []

2. _record_probe_cycle_completion() is called twice in the seed paths

seed() already calls _record_probe_cycle_completion() when enable_iso_probe=True. Both early-return branches in poll_once() call it again unconditionally:

  • Cold-start path (not had_prior_state, enable_iso_probe=True): seed()_record_probe_cycle_completion(), then poll_once()_record_probe_cycle_completion() again.
  • Restart-with-prior-state path: same double-call.

The second call is harmless (snapshots the same stats), but it's misleading. Consider removing the redundant calls from poll_once() for the seed sub-paths, or moving all stat recording out of seed().

3. match/case requires Python ≥ 3.10

_probe_hits_from_cycle uses structural pattern matching. Please verify python_requires in pyproject.toml is >=3.10, or replace with an if/elif/else chain if the project targets 3.8/3.9.

4. SchedulerSnapshot.probe_stats — mutable dict in a frozen dataclass

frozen=True prevents reassigning the probe_stats field but doesn't prevent in-place mutation of the dict. Since _publish_health_snapshot copies stats (dict(self._last_probe_stats)) and health_snapshot calls dataclasses.asdict() (which deep-copies nested structures), this is safe in practice. However, for true immutability, consider using MappingProxyType or a types.MappingProxyType(stats) here, or document the invariant explicitly.


src/paperscout/__main__.py

Key merge order in _extra_health_fields

return {**scheduler.health_snapshot(), **mq_extra, "db_pool": _pool_status(pool)}

mq_extra can silently overwrite any key from scheduler.health_snapshot() (e.g., last_updated if health_fields() ever returns it). Consider an explicit allow-list or at least a debug-level log/assertion when conflicts occur.


tests/test_monitor.py

Missing test: health_snapshot() defaults before any snapshot is published

There is a test for health_snapshot() after a poll, but no test verifies the default dict returned when _health_snapshot is None (i.e., before the first _publish_health_snapshot() call):

async def test_health_snapshot_defaults_before_first_poll(self, fake_pool):
    scheduler, _, _, _, _ = _make_scheduler(fake_pool)
    snap = scheduler.health_snapshot()
    assert snap["last_updated"] is None
    assert snap["poll_count"] == 0
    assert snap["last_cycle_status"] is None
    assert snap["last_successful_poll"] is None

tests/test_health.py

TestHealthSnapshotConcurrency test could assert field-type consistency more strictly

The loop asserts last_cycle_status in ("success", "empty", None) — good. Consider also asserting that poll_count is an int and last_updated (when not None) is a parseable ISO datetime string, to catch any serialization issues:

if data["last_updated"] is not None:
    datetime.fromisoformat(data["last_updated"])  # would raise if malformed
assert isinstance(data["poll_count"], int)

Minor nits

File Location Note
sources.py run_cycle() The PROBE-CYCLE-SUMMARY JSON log is emitted after the PROBE-SUCCESS / PROBE-EMPTY log. Reversing the order would make it easier to correlate the summary line with the outcome line when grepping logs.
monitor.py health_snapshot() copy.deepcopy(_HEALTH_SNAPSHOT_DEFAULTS) is called on every request when _health_snapshot is None. Since defaults are constant, a shallow dict(_HEALTH_SNAPSHOT_DEFAULTS) is sufficient (no nested mutable values that need deep-copying).
models.py CycleResult.hits docstring "otherwise empty" — consider clarifying it also returns empty for EMPTY status, not just FAILED.

Verdict

The approach is sound. The staleness-clock fix (not advancing _last_successful_poll on FAILED) and the lock-protected snapshot are the right patterns. The main actionable items are:

  1. Unreachable return None in _probe_one (trivial).
  2. Missing wildcard case in _probe_hits_from_cycle (low-risk but important for forward safety).
  3. Python version check for match/case (must verify).
  4. Redundant _record_probe_cycle_completion() calls in seed paths (cleanup).
  5. CycleResult.__post_init__ validation (defensive programming).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Synchronize Health Endpoint Reads of Scheduler State Discriminate Failed vs Empty Probe Cycles in run_cycle

1 participant