Phase plan (P0/P1/P2) — next 2 weeks
Phase P0 (Days 1–5): v1 schema + state fields end-to-end (blocking)
Objective: add state fields to the canonical snapshot schema and make them queryable/assertable in both SDKs.
Deliverables
sentience-chrome (raw extraction)
- `input_value` (or `value`) for inputs/textarea/select (with redaction rules)
- `input_type` (to support password redaction)
- `checked` / `aria_checked`
- `disabled` / `aria_disabled`
- `aria_expanded`
- `name` (best-effort): `aria-label`, `aria-labelledby`, associated `<label for=...>`, placeholder fallback
- `input_type=password`: omit value or set `value_redacted=true`
- Clip `value` to a max length (e.g., 200 chars) to reduce PII risk + payload bloat

gateway (canonical response schema)
- `gateway/src/snapshot/types.rs`: `Attributes` and/or `RawElement`
- `SmartElement` output schema to include:
  - `name`, `value` (redacted/clipped), `input_type`
  - `aria_checked`, `aria_disabled`, `aria_expanded`
  - `checked`, `disabled`, `expanded`
- `gateway/src/snapshot/processing.rs` mapping: `SmartElement`

sdk-python
- `sentience/models.py::Element` with optional fields matching gateway output.
- `checked=true|false|mixed`
- `disabled=true|false`
- `expanded=true|false`
- `value="..."`, `value~"..."`, `name~"..."` (if exposed)
- `sentience/verification.py` (implemented as predicates over `query(...)`):
  - `is_enabled(selector)` / `is_disabled(selector)`
  - `is_checked(selector)` / `is_unchecked(selector)`
  - `value_contains(selector, substr)` / `value_equals(selector, value)`
  - `is_expanded(selector)` / `is_collapsed(selector)`

sdk-ts
- `src/types.ts::Element` with optional fields matching gateway output.
- `src/verification.ts` mirroring python.

sentience-core (checkpoint)
- `sentience-core` changes?

Tests (P0)
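As a rough illustration of the P0 state fields, the redaction rules, and the kind of check a P0 test might make, here is a minimal Python sketch. It is not the SDK implementation: the `Element` dataclass is a hypothetical subset of the real model, `sanitize` and the toy `query` (name-substring matching) are invented for the example, and the 200-character limit simply reuses the "e.g., 200 chars" figure from the plan.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

MAX_VALUE_LEN = 200  # assumption: the plan only says "e.g., 200 chars"


@dataclass(frozen=True)
class Element:
    """Hypothetical subset of the Element model, limited to the new state fields."""
    name: Optional[str] = None
    value: Optional[str] = None
    input_type: Optional[str] = None
    value_redacted: bool = False
    checked: Optional[bool] = None
    disabled: Optional[bool] = None
    expanded: Optional[bool] = None


def sanitize(el: Element) -> Element:
    """Omit password values and clip any other value to MAX_VALUE_LEN."""
    if el.input_type == "password":
        return replace(el, value=None, value_redacted=True)
    if el.value is not None and len(el.value) > MAX_VALUE_LEN:
        return replace(el, value=el.value[:MAX_VALUE_LEN])
    return el


Predicate = Callable[[List[Element]], bool]


def query(elements: List[Element], name_substr: str) -> List[Element]:
    """Toy selector (assumption): match elements whose name contains name_substr."""
    return [e for e in elements if e.name is not None and name_substr in e.name]


def is_checked(name_substr: str) -> Predicate:
    return lambda els: any(e.checked is True for e in query(els, name_substr))


def is_disabled(name_substr: str) -> Predicate:
    return lambda els: any(e.disabled is True for e in query(els, name_substr))


def value_contains(name_substr: str, substr: str) -> Predicate:
    return lambda els: any(
        e.value is not None and substr in e.value for e in query(els, name_substr)
    )
```

Each predicate is a plain function over a list of elements, so the same function can be evaluated once or re-evaluated against fresh snapshots without changes.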
Phase P1 (Days 6–10): v1 runtime ergonomics + failure intelligence
Objective: make assertions production-grade without requiring Studio.
Deliverables
Recommended API shape: `AssertionHandle.eventually(...)`
Adding `assertEventually()` / `assertDoneEventually()` creates a second “family” of runtime methods. A better UX (closer to Jest/Playwright/Cypress) is:
- `assert_()` / `assert()` behavior unchanged (returns `bool`, emits trace events).
- `AssertionHandle`:
  - `runtime.check(predicate, label=..., required=False)` → `AssertionHandle` (Python)
  - `runtime.check(predicate, label, { required })` → `AssertionHandle` (TypeScript)
- `AssertionHandle` supports:
  - `.once()` (single evaluation; delegates to existing `assert_()` / `assert()`)
  - `.eventually(...)` (retry loop with fresh snapshots + backoff)
- `runtime.checkDone(...).eventually(...)`, but keep the core retry mechanism shared

Note: In Python, `assert` is a keyword; keep `assert_` naming in the DSL/predicates and runtime method names.

sdk-python (AgentRuntime)
- `AssertionHandle` + `runtime.check(...)` returning it.
- `await handle.eventually(timeout_s=10, poll_s=0.25, min_confidence=0.7, max_retries=...)`.
- `assert_eventually(...)` can remain as a thin wrapper that internally calls `runtime.check(...).eventually(...)`.
- `details`: `no_snapshot`, `no_match`, `match_offscreen`, `match_occluded`, `state_mismatch`

sdk-ts (AgentRuntime)
- `AssertionHandle` + `runtime.check(...)`.
- `await handle.eventually({ timeoutMs: 10_000, pollMs: 250, minConfidence: 0.7, maxRetries })`.
- `assertEventually(...)` can be a thin wrapper over `check(...).eventually(...)` if desired for discoverability.

CLI-first artifacts (both SDKs)
Tests (P1)
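To make the `.once()` / `.eventually(...)` split concrete, and to show how a P1 test could drive it without a browser, here is a minimal asyncio sketch. It is an illustration under several assumptions, not the SDK code: `Snapshot`, the `take_snapshot` callable, and the `flaky_source` helper are hypothetical; `max_retries` is read as "retries after the first attempt"; the backoff doubles up to an arbitrary 2 s cap; and a snapshot with no confidence value is treated as trusted.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Optional


@dataclass
class Snapshot:
    """Hypothetical snapshot: element names plus an optional confidence score."""
    names: List[str] = field(default_factory=list)
    confidence: Optional[float] = None


class AssertionHandle:
    def __init__(self, take_snapshot: Callable[[], Awaitable[Snapshot]],
                 predicate: Callable[[Snapshot], bool],
                 label: str, required: bool = False):
        self._take_snapshot = take_snapshot
        self._predicate = predicate
        self.label = label
        self.required = required

    async def once(self) -> bool:
        """Single evaluation against one fresh snapshot."""
        return self._predicate(await self._take_snapshot())

    async def eventually(self, timeout_s: float = 10.0, poll_s: float = 0.25,
                         min_confidence: Optional[float] = None,
                         max_retries: Optional[int] = None) -> bool:
        """Retry with fresh snapshots and exponential backoff until pass or give up."""
        deadline = time.monotonic() + timeout_s
        delay = poll_s
        retries = 0
        while True:
            snap = await self._take_snapshot()
            # Assumption: a missing confidence value does not block the assertion.
            trusted = (min_confidence is None or snap.confidence is None
                       or snap.confidence >= min_confidence)
            if trusted and self._predicate(snap):
                return True
            out_of_retries = max_retries is not None and retries >= max_retries
            if out_of_retries or time.monotonic() + delay > deadline:
                return False
            await asyncio.sleep(delay)
            retries += 1
            delay = min(delay * 2, 2.0)  # arbitrary backoff cap for the sketch


def flaky_source(ready_after: int, confidence: float = 1.0):
    """Test helper: the target name only appears from the Nth snapshot onward."""
    calls = {"n": 0}

    async def take_snapshot() -> Snapshot:
        calls["n"] += 1
        names = ["Submit"] if calls["n"] >= ready_after else []
        return Snapshot(names=names, confidence=confidence)

    return take_snapshot, calls
```

A call such as `asyncio.run(AssertionHandle(src, lambda s: "Submit" in s.names, label="submit visible").eventually(timeout_s=2, poll_s=0.01))` then exercises the shared retry loop; a thin `assert_eventually(...)` wrapper would only need to forward to it.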
Phase P2 (Days 11–14): v2 snapshot confidence/exhaustion + minimal vision fallback
Objective: stop agents failing silently on unstable pages; provide deterministic escalation.
P2.1 Snapshot confidence + exhaustion
sentience-chrome
- `document_ready_state`
- `node_count`
- `quiet_ms` (MutationObserver-based)
- `layout_delta` (if feasible without major overhead)

gateway
- `diagnostics` (instead of `meta`):
  - `confidence` (0..1)
  - `reasons[]`
  - `metrics` (raw metrics above, for debugging)
  - `attempt`, `exhausted` for retry loops
- `ready_state`, `quiet_ms`, `node_count`, and coarse “signal” like interactive element count
- `snapshot_exhausted`

sdk-python + sdk-ts
- `diagnostics` (and keep it optional for backward compatibility).
- `.eventually()` to:
  - `min_confidence`
  - `snapshot_exhausted` with reasons/metrics

P2.2 Vision fallback (verifier-only, last resort)
sdk-python
- `LLMProvider.generate_with_image(...)`:
  - `supports_vision()` is true
- `.eventually()` after snapshot exhaustion (with an explicit option/flag so callers can enable/disable vision fallback per assertion).

sdk-ts
- `LLMProvider` interface (backward compatible):
  - `supportsVision(): boolean` (default false in base class)
  - `generateWithImage(systemPrompt, userPrompt, imageBase64, options?)`
  - `supportsVision=false`
- `.eventually()` after exhaustion (with an explicit option/flag so callers can enable/disable vision fallback per assertion).

Tests (P2)
- `snapshot.diagnostics` is optional and backward compatible.

Next 2–4 weeks (v2 hardening) — phased priorities
Phase P0 (Week 3): EscalationPolicy + structured failure events (runtime-level)
Deliverables
- `FailureKind` + `RecoveryAction` enums and emit them in trace + return them to callers.
- `EscalationPolicy` focused on assertion execution (not full agent orchestration):

Tests
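One possible shape for these types, with the sort of assertion a test could make against them, is sketched below in Python. Everything here is an assumption for illustration: the `FailureKind` members reuse the failure codes named earlier in the plan (`no_snapshot`, `no_match`, and so on, plus `snapshot_exhausted`), while the `RecoveryAction` members, the `EscalationPolicy` fields, and the `failure_event` payload are hypothetical and not taken from either SDK.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(str, Enum):
    # Assumption: reuse the failure codes listed elsewhere in the plan.
    NO_SNAPSHOT = "no_snapshot"
    NO_MATCH = "no_match"
    MATCH_OFFSCREEN = "match_offscreen"
    MATCH_OCCLUDED = "match_occluded"
    STATE_MISMATCH = "state_mismatch"
    SNAPSHOT_EXHAUSTED = "snapshot_exhausted"


class RecoveryAction(str, Enum):
    # Hypothetical members; the plan does not enumerate recovery actions.
    RETRY_SNAPSHOT = "retry_snapshot"
    VISION_FALLBACK = "vision_fallback"
    FAIL = "fail"


@dataclass(frozen=True)
class EscalationPolicy:
    """Decides the next step for one failing assertion (not whole-agent orchestration)."""
    max_snapshot_retries: int = 3
    allow_vision_fallback: bool = False

    def next_action(self, kind: FailureKind, attempt: int) -> RecoveryAction:
        if kind is FailureKind.SNAPSHOT_EXHAUSTED:
            # Vision is a last resort and must be enabled explicitly.
            if self.allow_vision_fallback:
                return RecoveryAction.VISION_FALLBACK
            return RecoveryAction.FAIL
        if attempt < self.max_snapshot_retries:
            return RecoveryAction.RETRY_SNAPSHOT
        return RecoveryAction.FAIL


def failure_event(label: str, kind: FailureKind, action: RecoveryAction) -> dict:
    """Hypothetical trace payload carrying both enums as plain strings."""
    return {
        "type": "assertion_failure",
        "label": label,
        "failure_kind": kind.value,
        "recovery_action": action.value,
    }
```

Keeping the policy a pure function of failure kind and attempt number makes the escalation deterministic and easy to cover with table-style tests.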
Phase P1 (Week 3–4): State matcher completeness + normalization
Deliverables
- `aria-pressed`, `aria-selected`, `role=switch`, select/option value

Tests
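A small normalization sketch in Python shows the kind of behavior such tests might pin down. The input shape (a dict of raw attribute strings) and the output keys `checked`, `pressed`, and `selected` are assumptions for the example. The one rule taken from outside the plan is the WAI-ARIA behavior that `role=switch` does not support `mixed` and treats it as `false`.

```python
from typing import Dict, Optional

_TRISTATE = ("true", "false", "mixed")


def tristate(raw: Optional[str]) -> Optional[str]:
    """Map a raw ARIA state string to 'true', 'false', 'mixed', or None if absent/unknown."""
    if raw is None:
        return None
    value = raw.strip().lower()
    return value if value in _TRISTATE else None


def normalize_state(attrs: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Collapse raw attributes into normalized state fields (hypothetical field names)."""
    checked = tristate(attrs.get("aria-checked"))
    if attrs.get("role") == "switch" and checked == "mixed":
        checked = "false"  # a switch has no mixed state; ARIA treats it as false

    selected = tristate(attrs.get("aria-selected"))
    if selected == "mixed":
        selected = None  # aria-selected is only true/false

    return {
        "checked": checked,
        "pressed": tristate(attrs.get("aria-pressed")),
        "selected": selected,
    }
```

Returning `None` for absent or unrecognized values keeps "state unknown" distinct from "state is false", which matters for predicates such as `is_unchecked(selector)`.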
Phase P2 (Week 4): Diff-based assertions (action-effect verification)
Deliverables
- `previous_snapshot` in `AssertContext` (both SDKs) and in `AgentRuntime.snapshot()`.
- `diff.added`, `diff.removed`, `diff.modified`, `diff.count_added`, etc.

Tests
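The diff fields above could be computed roughly as in the Python sketch below, which also shows what an action-effect test might assert. It assumes each snapshot can be reduced to a dict keyed by a stable element id, which the plan does not specify; `SnapshotDiff` and `diff_snapshots` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

ElementMap = Dict[str, Dict[str, Any]]  # assumption: stable element id -> fields


@dataclass(frozen=True)
class SnapshotDiff:
    added: List[str]
    removed: List[str]
    modified: List[str]

    @property
    def count_added(self) -> int:
        return len(self.added)


def diff_snapshots(previous: ElementMap, current: ElementMap) -> SnapshotDiff:
    """Compare two snapshots by element id; ids are sorted for stable output."""
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    modified = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return SnapshotDiff(added, removed, modified)
```

For example, after clicking a save button, a test could assert that a toast id shows up in `diff.added` and that the button id appears in `diff.modified` because its `disabled` field changed.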
Phase P3 (Week 4): Vision fallback upgrade (optional)
Deliverables