diff --git a/README.md b/README.md
index d78bb04..606ef12 100644
--- a/README.md
+++ b/README.md
@@ -1,1073 +1,177 @@
 # Sentience Python SDK

-**Semantic snapshots and Jest-style assertions for reliable AI web agents with time-travel traces**
+> **A verification & control layer for AI agents that operate browsers**

-## 📦 Installation
+Sentience is built for **AI agent developers** who already use Playwright / CDP / browser-use / LangGraph and care about **flakiness, cost, determinism, evals, and debugging**.

-```bash
-# Install from PyPI
-pip install sentienceapi
+Often described as *Jest for Browser AI Agents* - but applied to end-to-end agent runs (not unit tests).

-# Install Playwright browsers (required)
-playwright install chromium
+The core loop is:

-# For LLM Agent features (optional)
-pip install openai # For OpenAI models
-pip install anthropic # For Claude models
-pip install transformers torch # For local LLMs
-```
+> **Agent → Snapshot → Action → Verification → Artifact**

-**For local development:**
-```bash
-pip install -e .
-```
+## What Sentience is
+
+- A **verification-first runtime** (`AgentRuntime`) for browser agents
+- Treats the browser as an adapter (Playwright / CDP / browser-use); **`AgentRuntime` is the product**
+- A **controlled perception** layer (semantic snapshots; pruning/limits; lowers token usage by filtering noise from what models see)
+- A **debugging layer** (structured traces + failure artifacts)
+- Enables **local LLM small models (3B-7B)** for browser automation (privacy, compliance, and cost control)
+- Keeps vision models **optional** (use as a fallback when DOM/snapshot structure falls short, e.g. ``)
+
+## What Sentience is not

-## 🧭 Manual driver CLI
+- Not a browser driver
+- Not a Playwright replacement
+- Not a vision-first agent framework

-Use the interactive CLI to open a page, inspect clickables, and drive actions:
+## Install

 ```bash
-sentience driver --url https://example.com
+pip install sentienceapi
+playwright install chromium
 ```

-Commands:
-- `open `
-- `state [limit]`
-- `click `
-- `type `
-- `press `
-- `screenshot [path]`
-- `help`
-- `close`
+## Conceptual example (why this exists)
+
+In Sentience, agents don’t “hope” an action worked.

-## Jest for AI Web Agent
+- **Every step is gated by verifiable UI assertions**
+- If progress can’t be proven, the run **fails with evidence** (trace + artifacts)
+- This is how you make runs **reproducible** and **debuggable**, and how you run evals reliably

-### Semantic snapshots and assertions that let agents act, verify, and know when they're done.
+## Quickstart: a verification-first loop

-Use `AgentRuntime` to add Jest-style assertions to your agent loops. Verify browser state, check task completion, and get clear feedback on what's working:
+This is the smallest useful pattern: snapshot → assert → act → assert-done.

```python import asyncio -from sentience import AsyncSentienceBrowser, AgentRuntime, CaptchaOptions, HumanHandoffSolver -from sentience.verification import ( - url_contains, - exists, - all_of, - is_enabled, - is_checked, - value_equals, -) -from sentience.tracing import Tracer, JsonlTraceSink -async def main(): - # Create tracer - tracer = Tracer(run_id="my-run", sink=JsonlTraceSink("trace.jsonl")) +from sentience import AgentRuntime, AsyncSentienceBrowser +from sentience.tracing import JsonlTraceSink, Tracer +from sentience.verification import exists, url_contains + + +async def main() -> None: + tracer = Tracer(run_id="demo", sink=JsonlTraceSink("trace.jsonl")) - # Create browser and runtime async with AsyncSentienceBrowser() as browser: page = await browser.new_page() + await page.goto("https://example.com") + runtime = await AgentRuntime.from_sentience_browser( browser=browser, page=page, - tracer=tracer + tracer=tracer, ) - # Navigate and take snapshot - await page.goto("https://example.com") - runtime.begin_step("Verify page loaded") + runtime.begin_step("Verify homepage") await runtime.snapshot() - # v1: deterministic assertions (Jest-style) - runtime.assert_(url_contains("example.com"), label="on_correct_domain") + runtime.assert_(url_contains("example.com"), label="on_domain", required=True) runtime.assert_(exists("role=heading"), label="has_heading") - runtime.assert_(all_of([ - exists("role=button"), - exists("role=link") - ]), label="has_interactive_elements") - - # v1: state-aware assertions (when Gateway refinement is enabled) - runtime.assert_(is_enabled("role=button"), label="button_enabled") - runtime.assert_(is_checked("role=checkbox name~'subscribe'"), label="subscribe_checked_if_present") - runtime.assert_(value_equals("role=textbox name~'email'", "user@example.com"), label="email_value_if_present") - - # v2: retry loop with snapshot confidence gating + exhaustion - ok = await runtime.check( - exists("role=heading"), - 
label="heading_eventually_visible", - required=True, - ).eventually(timeout_s=10.0, poll_s=0.25, min_confidence=0.7, max_snapshot_attempts=3) - print("eventually() result:", ok) - - # CAPTCHA handling (detection + handoff + verify) - runtime.set_captcha_options( - CaptchaOptions(policy="callback", handler=HumanHandoffSolver()) - ) - - # Check task completion - if runtime.assert_done(exists("text~'Example'"), label="task_complete"): - print("βœ… Task completed!") - - print(f"Task done: {runtime.is_task_done}") - -asyncio.run(main()) -``` - -#### CAPTCHA strategies (Batteries Included) - -```python -from sentience import CaptchaOptions, ExternalSolver, HumanHandoffSolver, VisionSolver - -# Human-in-loop -runtime.set_captcha_options(CaptchaOptions(policy="callback", handler=HumanHandoffSolver())) - -# Vision verification only -runtime.set_captcha_options(CaptchaOptions(policy="callback", handler=VisionSolver())) - -# External system/webhook -runtime.set_captcha_options( - CaptchaOptions( - policy="callback", - handler=ExternalSolver(lambda ctx: notify_webhook(ctx)), - ) -) -``` - -### Failure Artifact Buffer (Phase 1) - -Capture a short ring buffer of screenshots and persist them when a required assertion fails. - -```python -from sentience.failure_artifacts import FailureArtifactsOptions - -await runtime.enable_failure_artifacts( - FailureArtifactsOptions(buffer_seconds=15, capture_on_action=True, fps=0.0) -) - -# After each action, record it (best-effort). -await runtime.record_action("CLICK") -``` - -**Video clip generation (optional):** To generate MP4 video clips from captured frames, install [ffmpeg](https://ffmpeg.org/) (version 4.0 or later; version 5.1+ recommended for best compatibility). If ffmpeg is not installed, frames are still captured but no video clip is generated. - -### Redaction callback (Phase 3) - -Provide a user-defined callback to redact snapshots and decide whether to persist frames. The SDK does not implement image/video redaction. 
- -```python -from sentience.failure_artifacts import FailureArtifactsOptions, RedactionContext, RedactionResult - -def redact(ctx: RedactionContext) -> RedactionResult: - # Example: drop frames entirely, keep JSON only. - return RedactionResult(drop_frames=True) - -await runtime.enable_failure_artifacts( - FailureArtifactsOptions(on_before_persist=redact) -) -``` - -**See examples:** [`examples/asserts/`](examples/asserts/) - -## πŸš€ Quick Start: Choose Your Abstraction Level - -Sentience SDK offers **three abstraction levels** - use what fits your needs: - -
-🎯 Level 3: Natural Language (Easiest) - For non-technical users - -```python -from sentience import SentienceBrowser, ConversationalAgent -from sentience.llm_provider import OpenAIProvider - -browser = SentienceBrowser() -llm = OpenAIProvider(api_key="your-key", model="gpt-4o") -agent = ConversationalAgent(browser, llm) - -with browser: - response = agent.execute("Search for magic mouse on google.com") - print(response) - # β†’ "I searched for 'magic mouse' and found several results. - # The top result is from amazon.com selling Magic Mouse 2 for $79." -``` - -**Best for:** End users, chatbots, no-code platforms -**Code required:** 3-5 lines -**Technical knowledge:** None - -
- -
-βš™οΈ Level 2: Technical Commands (Recommended) - For AI developers - -```python -from sentience import SentienceBrowser, SentienceAgent -from sentience.llm_provider import OpenAIProvider - -browser = SentienceBrowser() -llm = OpenAIProvider(api_key="your-key", model="gpt-4o") -agent = SentienceAgent(browser, llm) - -with browser: - browser.page.goto("https://google.com") - agent.act("Click the search box") - agent.act("Type 'magic mouse' into the search field") - agent.act("Press Enter key") -``` - -**Best for:** Building AI agents, automation scripts -**Code required:** 10-15 lines -**Technical knowledge:** Medium (Python basics) - -
- -
-πŸ”§ Level 1: Direct SDK (Most Control) - For production automation - -```python -from sentience import SentienceBrowser, snapshot, find, click - -with SentienceBrowser(headless=False) as browser: - browser.page.goto("https://example.com") - - # Take snapshot - captures all interactive elements - snap = snapshot(browser) - print(f"Found {len(snap.elements)} elements") - - # Find and click a link using semantic selectors - link = find(snap, "role=link text~'More information'") - if link: - result = click(browser, link.id) - print(f"Click success: {result.success}") -``` - -**Best for:** Maximum control, performance-critical apps -**Code required:** 20-50 lines -**Technical knowledge:** High (SDK API, selectors) - -
- ---- - -## πŸ†• What's New (2026-01-06) - -### Human-like Typing -Add realistic delays between keystrokes to mimic human typing: -```python -from sentience import type_text - -# Type instantly (default) -type_text(browser, element_id, "Hello World") - -# Type with human-like delay (~10ms between keystrokes) -type_text(browser, element_id, "Hello World", delay_ms=10) -``` - -### Scroll to Element -Scroll elements into view with smooth animation: -```python -from sentience import snapshot, find, scroll_to - -snap = snapshot(browser) -button = find(snap, 'role=button text~"Submit"') - -# Scroll element into view with smooth animation -scroll_to(browser, button.id) - -# Scroll instantly to top of viewport -scroll_to(browser, button.id, behavior='instant', block='start') -``` - ---- - -
-

💼 Real-World Example: Assertion-driven navigation

- -This example shows how to use **assertions + `.eventually()`** to make an agent loop resilient: - -```python -import asyncio -import os -from sentience import AsyncSentienceBrowser, AgentRuntime -from sentience.tracing import Tracer, JsonlTraceSink -from sentience.verification import url_contains, exists - -async def main(): - tracer = Tracer(run_id="verified-run", sink=JsonlTraceSink("trace_verified.jsonl")) - async with AsyncSentienceBrowser(headless=True) as browser: - page = await browser.new_page() - runtime = await AgentRuntime.from_sentience_browser(browser=browser, page=page, tracer=tracer) - runtime.sentience_api_key = os.getenv("SENTIENCE_API_KEY") # optional, enables Gateway diagnostics - - await page.goto("https://example.com") - runtime.begin_step("Verify we're on the right page") - - await runtime.check(url_contains("example.com"), label="on_domain", required=True).eventually( - timeout_s=10.0, poll_s=0.25, min_confidence=0.7, max_snapshot_attempts=3 - ) - runtime.assert_(exists("role=heading"), label="heading_present") - -asyncio.run(main()) -``` - -
- ---- - -## πŸ“š Core Features - -
-

🌐 Browser Control

- -- **`SentienceBrowser`** - Playwright browser with Sentience extension pre-loaded -- **`browser.goto(url)`** - Navigate with automatic extension readiness checks -- Automatic bot evasion and stealth mode -- Configurable headless/headed mode - -
- -
-

📸 Snapshot - Intelligent Page Analysis

- -**`snapshot(browser, options=SnapshotOptions(screenshot=True, show_overlay=False, limit=None, goal=None))`** - Capture page state with AI-ranked elements - -Features: -- Returns semantic elements with roles, text, importance scores, and bounding boxes -- Optional screenshot capture (PNG/JPEG) - set `screenshot=True` -- Optional visual overlay to see what elements are detected - set `show_overlay=True` -- Pydantic models for type safety -- Optional ML reranking when `goal` is provided -- **`snapshot.save(filepath)`** - Export to JSON - -**Example:** -```python -from sentience import snapshot, SnapshotOptions - -# Basic snapshot with defaults (no screenshot, no overlay) -snap = snapshot(browser) - -# With screenshot and overlay -snap = snapshot(browser, SnapshotOptions( - screenshot=True, - show_overlay=True, - limit=100, - goal="Click the login button" # Optional: enables ML reranking -)) - -# Access structured data -print(f"URL: {snap.url}") -print(f"Viewport: {snap.viewport.width}x{snap.viewport.height}") -print(f"Elements: {len(snap.elements)}") - -# Iterate over elements -for element in snap.elements: - print(f"{element.role}: {element.text} (importance: {element.importance})") - - # Check ML reranking metadata (when goal is provided) - if element.rerank_index is not None: - print(f" ML rank: {element.rerank_index} (confidence: {element.ml_probability:.2%})") -``` - -
- -
-

🔍 Query Engine - Semantic Element Selection

- -- **`query(snapshot, selector)`** - Find all matching elements -- **`find(snapshot, selector)`** - Find single best match (by importance) -- Powerful query DSL with multiple operators - -**Query Examples:** -```python -# Find by role and text -button = find(snap, "role=button text='Sign in'") - -# Substring match (case-insensitive) -link = find(snap, "role=link text~'more info'") - -# Spatial filtering -top_left = find(snap, "bbox.x<=100 bbox.y<=200") - -# Multiple conditions (AND logic) -primary_btn = find(snap, "role=button clickable=true visible=true importance>800") - -# Prefix/suffix matching -starts_with = find(snap, "text^='Add'") -ends_with = find(snap, "text$='Cart'") - -# Numeric comparisons -important = query(snap, "importance>=700") -first_row = query(snap, "bbox.y<600") -``` - -**πŸ“– [Complete Query DSL Guide](docs/QUERY_DSL.md)** - All operators, fields, and advanced patterns - -
- -
-

👆 Actions - Interact with Elements

- -- **`click(browser, element_id)`** - Click element by ID -- **`click_rect(browser, rect)`** - Click at center of rectangle (coordinate-based) -- **`type_text(browser, element_id, text)`** - Type into input fields -- **`press(browser, key)`** - Press keyboard keys (Enter, Escape, Tab, etc.) - -All actions return `ActionResult` with success status, timing, and outcome: - -```python -result = click(browser, element.id) - -print(f"Success: {result.success}") -print(f"Outcome: {result.outcome}") # "navigated", "dom_updated", "error" -print(f"Duration: {result.duration_ms}ms") -print(f"URL changed: {result.url_changed}") -``` - -**Coordinate-based clicking:** -```python -from sentience import click_rect - -# Click at center of rectangle (x, y, width, height) -click_rect(browser, {"x": 100, "y": 200, "w": 50, "h": 30}) - -# With visual highlight (default: red border for 2 seconds) -click_rect(browser, {"x": 100, "y": 200, "w": 50, "h": 30}, highlight=True, highlight_duration=2.0) - -# Using element's bounding box -snap = snapshot(browser) -element = find(snap, "role=button") -if element: - click_rect(browser, { - "x": element.bbox.x, - "y": element.bbox.y, - "w": element.bbox.width, - "h": element.bbox.height - }) -``` - -
- -
-

⏱️ Wait & Assertions

- -- **`wait_for(browser, selector, timeout=5.0, interval=None, use_api=None)`** - Wait for element to appear -- **`expect(browser, selector)`** - Assertion helper with fluent API - -**Examples:** -```python -# Wait for element (auto-detects optimal interval based on API usage) -result = wait_for(browser, "role=button text='Submit'", timeout=10.0) -if result.found: - print(f"Found after {result.duration_ms}ms") - -# Use local extension with fast polling (0.25s interval) -result = wait_for(browser, "role=button", timeout=5.0, use_api=False) - -# Use remote API with network-friendly polling (1.5s interval) -result = wait_for(browser, "role=button", timeout=5.0, use_api=True) - -# Custom interval override -result = wait_for(browser, "role=button", timeout=5.0, interval=0.5, use_api=False) - -# Semantic wait conditions -wait_for(browser, "clickable=true", timeout=5.0) # Wait for clickable element -wait_for(browser, "importance>100", timeout=5.0) # Wait for important element -wait_for(browser, "role=link visible=true", timeout=5.0) # Wait for visible link -# Assertions -expect(browser, "role=button text='Submit'").to_exist(timeout=5.0) -expect(browser, "role=heading").to_be_visible() -expect(browser, "role=button").to_have_text("Submit") -expect(browser, "role=link").to_have_count(10) -``` - -
- -
-

🎨 Visual Overlay - Debug Element Detection

- -- **`show_overlay(browser, elements, target_element_id=None)`** - Display visual overlay highlighting elements -- **`clear_overlay(browser)`** - Clear overlay manually - -Show color-coded borders around detected elements to debug, validate, and understand what Sentience sees: - -```python -from sentience import show_overlay, clear_overlay - -# Take snapshot once -snap = snapshot(browser) - -# Show overlay anytime without re-snapshotting -show_overlay(browser, snap) # Auto-clears after 5 seconds - -# Highlight specific target element in red -button = find(snap, "role=button text~'Submit'") -show_overlay(browser, snap, target_element_id=button.id) - -# Clear manually before 5 seconds -import time -time.sleep(2) -clear_overlay(browser) -``` - -**Color Coding:** -- πŸ”΄ Red: Target element -- πŸ”΅ Blue: Primary elements (`is_primary=true`) -- 🟒 Green: Regular interactive elements - -**Visual Indicators:** -- Border thickness/opacity scales with importance -- Semi-transparent fill -- Importance badges -- Star icons for primary elements -- Auto-clear after 5 seconds - -
- -
-

📄 Content Reading

- -**`read(browser, format="text|markdown|raw")`** - Extract page content -- `format="text"` - Plain text extraction -- `format="markdown"` - High-quality markdown conversion (uses markdownify) -- `format="raw"` - Cleaned HTML (default) - -**Example:** -```python -from sentience import read - -# Get markdown content -result = read(browser, format="markdown") -print(result["content"]) # Markdown text - -# Get plain text -result = read(browser, format="text") -print(result["content"]) # Plain text -``` - -
- -
-

📷 Screenshots

- -**`screenshot(browser, format="png|jpeg", quality=80)`** - Standalone screenshot capture -- Returns base64-encoded data URL -- PNG or JPEG format -- Quality control for JPEG (1-100) - -**Example:** -```python -from sentience import screenshot -import base64 - -# Capture PNG screenshot -data_url = screenshot(browser, format="png") - -# Save to file -image_data = base64.b64decode(data_url.split(",")[1]) -with open("screenshot.png", "wb") as f: - f.write(image_data) - -# JPEG with quality control (smaller file size) -data_url = screenshot(browser, format="jpeg", quality=85) -``` - -
- -
-

🔎 Text Search - Find Elements by Visible Text

- -**`find_text_rect(browser, text, case_sensitive=False, whole_word=False, max_results=10)`** - Find text on page and get exact pixel coordinates - -Find buttons, links, or any UI elements by their visible text without needing element IDs or CSS selectors. Returns exact pixel coordinates for each match. - -**Example:** -```python -from sentience import SentienceBrowser, find_text_rect, click_rect - -with SentienceBrowser() as browser: - browser.page.goto("https://example.com") - - # Find "Sign In" button - result = find_text_rect(browser, "Sign In") - if result.status == "success" and result.results: - first_match = result.results[0] - print(f"Found at: ({first_match.rect.x}, {first_match.rect.y})") - print(f"In viewport: {first_match.in_viewport}") - - # Click on the found text - if first_match.in_viewport: - click_rect(browser, { - "x": first_match.rect.x, - "y": first_match.rect.y, - "w": first_match.rect.width, - "h": first_match.rect.height - }) -``` - -**Advanced Options:** -```python -# Case-sensitive search -result = find_text_rect(browser, "LOGIN", case_sensitive=True) - -# Whole word only (won't match "login" as part of "loginButton") -result = find_text_rect(browser, "log", whole_word=True) - -# Find multiple matches -result = find_text_rect(browser, "Buy", max_results=10) -for match in result.results: - if match.in_viewport: - print(f"Found '{match.text}' at ({match.rect.x}, {match.rect.y})") - print(f"Context: ...{match.context.before}[{match.text}]{match.context.after}...") -``` - -**Returns:** `TextRectSearchResult` with: -- **`status`**: "success" or "error" -- **`results`**: List of `TextMatch` objects with: - - `text` - The matched text - - `rect` - Absolute coordinates (with scroll offset) - - `viewport_rect` - Viewport-relative coordinates - - `context` - Surrounding text (before/after) - - `in_viewport` - Whether visible in current viewport - -**Use Cases:** -- Find buttons/links by visible text without CSS selectors -- Get exact pixel 
coordinates for click automation -- Verify text visibility and position on page -- Search dynamic content that changes frequently - -**Note:** Does not consume API credits (runs locally in browser) - -**See example:** `examples/find_text_demo.py` - -
- ---- - -## πŸ”„ Async API - -For asyncio contexts (FastAPI, async frameworks): - -```python -from sentience.async_api import AsyncSentienceBrowser, snapshot_async, click_async, find - -async def main(): - async with AsyncSentienceBrowser() as browser: - await browser.goto("https://example.com") - snap = await snapshot_async(browser) - button = find(snap, "role=button") - if button: - await click_async(browser, button.id) - -asyncio.run(main()) -``` - -**See example:** `examples/async_api_demo.py` - ---- - -## πŸ“‹ Reference - -
-

Element Properties

- -Elements returned by `snapshot()` have the following properties: - -```python -element.id # Unique identifier for interactions -element.role # ARIA role (button, link, textbox, heading, etc.) -element.text # Visible text content -element.importance # AI importance score (0-1000) -element.bbox # Bounding box (x, y, width, height) -element.visual_cues # Visual analysis (is_primary, is_clickable, background_color) -element.in_viewport # Is element visible in current viewport? -element.is_occluded # Is element covered by other elements? -element.z_index # CSS stacking order -``` - -
- -
-

Query DSL Reference

- -### Basic Operators - -| Operator | Description | Example | -|----------|-------------|---------| -| `=` | Exact match | `role=button` | -| `!=` | Exclusion | `role!=link` | -| `~` | Substring (case-insensitive) | `text~'sign in'` | -| `^=` | Prefix match | `text^='Add'` | -| `$=` | Suffix match | `text$='Cart'` | -| `>`, `>=` | Greater than | `importance>500` | -| `<`, `<=` | Less than | `bbox.y<600` | - -### Supported Fields - -- **Role**: `role=button|link|textbox|heading|...` -- **Text**: `text`, `text~`, `text^=`, `text$=` -- **Visibility**: `clickable=true|false`, `visible=true|false` -- **Importance**: `importance`, `importance>=N`, `importance - ---- - -## βš™οΈ Configuration - -
-

Viewport Size

- -Default viewport is **1280x800** pixels. You can customize it using Playwright's API: + runtime.assert_done(exists("text~'Example'"), label="task_complete") -```python -with SentienceBrowser(headless=False) as browser: - # Set custom viewport before navigating - browser.page.set_viewport_size({"width": 1920, "height": 1080}) - - browser.goto("https://example.com") -``` - -
- -
-

Headless Mode

- -```python -# Headed mode (default in dev, shows browser window) -browser = SentienceBrowser(headless=False) - -# Headless mode (default in CI environments) -browser = SentienceBrowser(headless=True) - -# Auto-detect based on environment -browser = SentienceBrowser() # headless=True if CI=true, else False -``` -
- -
-

🌍 Residential Proxy Support

- -Use residential proxies to route traffic and protect your IP address. Supports HTTP, HTTPS, and SOCKS5 with automatic SSL certificate handling: - -```python -# Method 1: Direct configuration -browser = SentienceBrowser(proxy="http://user:pass@proxy.example.com:8080") - -# Method 2: Environment variable -# export SENTIENCE_PROXY="http://user:pass@proxy.example.com:8080" -browser = SentienceBrowser() - -# Works with agents -llm = OpenAIProvider(api_key="your-key", model="gpt-4o") -agent = SentienceAgent(browser, llm) - -with browser: - browser.page.goto("https://example.com") - agent.act("Search for products") - # All traffic routed through proxy with WebRTC leak protection +if __name__ == "__main__": + asyncio.run(main()) ``` -**Features:** -- HTTP, HTTPS, SOCKS5 proxy support -- Username/password authentication -- Automatic self-signed SSL certificate handling -- WebRTC IP leak protection (automatic) - -See `examples/residential_proxy_agent.py` for complete examples. +## Capabilities (lifecycle guarantees) -
+### Controlled perception -
-

🔐 Authentication Session Injection

+- **Semantic snapshots** instead of raw DOM dumps
+- **Pruning knobs** via `SnapshotOptions` (limit/filter)
+- Snapshot diagnostics that help decide when “structure is insufficient”

-Inject pre-recorded authentication sessions (cookies + localStorage) to start your agent already logged in, bypassing login screens, 2FA, and CAPTCHAs. This saves tokens and reduces costs by eliminating login steps.
-
-```python
-# Workflow 1: Inject pre-recorded session from file
-from sentience import SentienceBrowser, save_storage_state
+### Constrained action space

-# Save session after manual login
-browser = SentienceBrowser()
-browser.start()
-browser.goto("https://example.com")
-# ... log in manually ...
-save_storage_state(browser.context, "auth.json")
+- Action primitives operate on **stable IDs / rects** derived from snapshots
+- Optional helpers for ordinality (“click the 3rd result”)

-# Use saved session in future runs
-browser = SentienceBrowser(storage_state="auth.json")
-browser.start()
-# Agent starts already logged in!
+### Verified progress

-# Workflow 2: Persistent sessions (cookies persist across runs)
-browser = SentienceBrowser(user_data_dir="./chrome_profile")
-browser.start()
-# First run: Log in
-# Second run: Already logged in (cookies persist automatically)
-```
+- Predicates like `exists(...)`, `url_matches(...)`, `is_enabled(...)`, `value_equals(...)`
+- Fluent assertion DSL via `expect(...)`
+- Retrying verification via `runtime.check(...).eventually(...)`

-**Benefits:**
-- Bypass login screens and CAPTCHAs with valid sessions
-- Save 5-10 agent steps and hundreds of tokens per run
-- Maintain stateful sessions for accessing authenticated pages
-- Act as authenticated users (e.g., "Go to my Orders page")
+### Explained failure

-See `examples/auth_injection_agent.py` for complete examples.
+- JSONL trace events (`Tracer` + `JsonlTraceSink`)
+- Optional failure artifact bundles (snapshots, diagnostics, step timelines, frames/clip)
+- Deterministic failure semantics: when required assertions can’t be proven, the run fails with artifacts you can replay

-
+### Framework interoperability ---- +- Bring your own LLM and orchestration (LangGraph, AutoGen, custom loops) +- Register explicit LLM-callable tools with `ToolRegistry` -## πŸ’‘ Best Practices +## ToolRegistry (LLM-callable tools) -
-Click to expand best practices +Sentience can expose a **typed tool surface** for agents (with tool-call tracing). -### 1. Wait for Dynamic Content ```python -browser.goto("https://example.com", wait_until="domcontentloaded") -time.sleep(1) # Extra buffer for AJAX/animations -``` +from sentience.tools import ToolRegistry, register_default_tools -### 2. Use Multiple Strategies for Finding Elements -```python -# Try exact match first -btn = find(snap, "role=button text='Add to Cart'") +registry = ToolRegistry() +register_default_tools(registry, runtime) # or pass a ToolContext -# Fallback to fuzzy match -if not btn: - btn = find(snap, "role=button text~='cart'") +# LLM-ready tool specs +tools_for_llm = registry.llm_tools() ``` -### 3. Check Element Visibility Before Clicking -```python -if element.in_viewport and not element.is_occluded: - click(browser, element.id) -``` +## Permissions (avoid Chrome permission bubbles) -### 4. Handle Navigation -```python -result = click(browser, link_id) -if result.url_changed: - browser.page.wait_for_load_state("networkidle") -``` +Chrome permission prompts are outside the DOM and can be invisible to snapshots. Prefer setting a policy **before navigation**. -### 5. Use Screenshots Sparingly ```python -# Fast - no screenshot (only element data) -snap = snapshot(browser) - -# Slower - with screenshot (for debugging/verification) -snap = snapshot(browser, SnapshotOptions(screenshot=True)) -``` - -
- ---- - -## πŸ› οΈ Troubleshooting - -
-Click to expand common issues and solutions - -### "Extension failed to load" -**Solution:** Build the extension first: -```bash -cd sentience-chrome -./build.sh -``` - -### "Element not found" -**Solutions:** -- Ensure page is loaded: `browser.page.wait_for_load_state("networkidle")` -- Use `wait_for()`: `wait_for(browser, "role=button", timeout=10)` -- Debug elements: `print([el.text for el in snap.elements])` - -### Button not clickable -**Solutions:** -- Check visibility: `element.in_viewport and not element.is_occluded` -- Scroll to element: `browser.page.evaluate(f"window.sentience_registry[{element.id}].scrollIntoView()")` - -
- ---- +from sentience import AsyncSentienceBrowser, PermissionPolicy -## πŸ”¬ Advanced Features (v0.12.0+) - -
-

📊 Agent Tracing & Debugging

- -The SDK now includes built-in tracing infrastructure for debugging and analyzing agent behavior: - -```python -from sentience import SentienceBrowser, SentienceAgent -from sentience.llm_provider import OpenAIProvider -from sentience.tracing import Tracer, JsonlTraceSink -from sentience.agent_config import AgentConfig - -# Create tracer to record agent execution -tracer = Tracer( - run_id="my-agent-run-123", - sink=JsonlTraceSink("trace.jsonl") +policy = PermissionPolicy( + default="clear", + auto_grant=["geolocation"], + geolocation={"latitude": 37.77, "longitude": -122.41, "accuracy": 50}, + origin="https://example.com", ) -# Configure agent behavior -config = AgentConfig( - snapshot_limit=50, - temperature=0.0, - max_retries=1, - capture_screenshots=True -) - -browser = SentienceBrowser() -llm = OpenAIProvider(api_key="your-key", model="gpt-4o") - -# Pass tracer and config to agent -agent = SentienceAgent(browser, llm, tracer=tracer, config=config) - -with browser: - browser.page.goto("https://example.com") - - # All actions are automatically traced - agent.act("Click the sign in button") - agent.act("Type 'user@example.com' into email field") - -# Trace events saved to trace.jsonl -# Events: step_start, snapshot, llm_query, action, step_end, error +async with AsyncSentienceBrowser(permission_policy=policy) as browser: + ... ``` -**Trace Events Captured:** -- `step_start` - Agent begins executing a goal -- `snapshot` - Page state captured -- `llm_query` - LLM decision made (includes tokens, model, response) -- `action` - Action executed (click, type, press) -- `step_end` - Step completed successfully -- `error` - Error occurred during execution - -**Use Cases:** -- Debug why agent failed or got stuck -- Analyze token usage and costs -- Replay agent sessions -- Train custom models from successful runs -- Monitor production agents - -
+If your backend supports it, you can also use ToolRegistry permission tools (`grant_permissions`, `clear_permissions`, `set_geolocation`) mid-run. -
-

🔍 Agent Runtime Verification

+## Downloads (verification predicate) -`AgentRuntime` provides assertion predicates for runtime verification in agent loops, enabling programmatic verification of browser state during execution. +If a flow is expected to download a file, assert it explicitly: ```python -from sentience import ( - AgentRuntime, SentienceBrowser, - url_contains, exists, all_of -) -from sentience.tracer_factory import create_tracer - -browser = SentienceBrowser() -browser.start() -tracer = create_tracer(run_id="my-run", upload_trace=False) -runtime = AgentRuntime(browser, browser.page, tracer) +from sentience.verification import download_completed -# Navigate and take snapshot -browser.page.goto("https://example.com") -runtime.begin_step("Verify page") -runtime.snapshot() - -# Run assertions -runtime.assert_(url_contains("example.com"), "on_correct_domain") -runtime.assert_(exists("role=heading"), "has_heading") -runtime.assert_done(exists("text~'Example'"), "task_complete") - -print(f"Task done: {runtime.is_task_done}") +runtime.assert_(download_completed("report.csv"), label="download_ok", required=True) ``` -**See example:** [`examples/agent_runtime_verification.py`](examples/agent_runtime_verification.py) - -
+## Debugging (fast) -
-

🧰 Snapshot Utilities

- -New utility functions for working with snapshots: - -```python -from sentience import snapshot -from sentience.utils import compute_snapshot_digests, canonical_snapshot_strict -from sentience.formatting import format_snapshot_for_llm - -snap = snapshot(browser) - -# Compute snapshot fingerprints (detect page changes) -digests = compute_snapshot_digests(snap.elements) -print(f"Strict digest: {digests['strict']}") # Changes when text changes -print(f"Loose digest: {digests['loose']}") # Only changes when layout changes - -# Format snapshot for LLM prompts -llm_context = format_snapshot_for_llm(snap, limit=50) -print(llm_context) -# Output: [1]
- ---- - -## πŸ“– Documentation - -- **πŸ“– [Amazon Shopping Guide](../docs/AMAZON_SHOPPING_GUIDE.md)** - Complete tutorial with real-world example -- **πŸ“– [Query DSL Guide](docs/QUERY_DSL.md)** - Advanced query patterns and operators -- **πŸ“„ [API Contract](../spec/SNAPSHOT_V1.md)** - Snapshot API specification -- **πŸ“„ [Type Definitions](../spec/sdk-types.md)** - TypeScript/Python type definitions - ---- - -## πŸ’» Examples & Testing - -
-

Examples

- -See the `examples/` directory for complete working examples: - -- **`hello.py`** - Extension bridge verification -- **`basic_agent.py`** - Basic snapshot and element inspection -- **`query_demo.py`** - Query engine demonstrations -- **`wait_and_click.py`** - Waiting for elements and performing actions -- **`read_markdown.py`** - Content extraction and markdown conversion - -
- -
-

Testing

+- **Manual driver CLI** (inspect clickables, click/type/press quickly): ```bash -# Run all tests -pytest tests/ - -# Run specific test file -pytest tests/test_snapshot.py - -# Run with verbose output -pytest -v tests/ +sentience driver --url https://example.com ``` -
+- **Verification + artifacts + debugging with time-travel traces (Sentience Studio demo)**: ---- + -## License & Commercial Use +If the video tag doesn’t render in your GitHub README view, use this link: [`sentience-studio-demo.mp4`](https://github.com/user-attachments/assets/7ffde43b-1074-4d70-bb83-2eb8d0469307) -### Open Source SDK -The Sentience SDK is dual-licensed under [MIT License](./LICENSE-MIT) and [Apache 2.0](./LICENSE-APACHE). You are free to use, modify, and distribute this SDK in your own projects (including commercial ones) without restriction. +- **Sentience SDK Documentation**: https://www.sentienceapi.com/docs -### Commercial Platform -While the SDK is open source, the **Sentience Cloud Platform** (API, Hosting, Sentience Studio) is a commercial service. +## Integrations (examples) -**We offer Commercial Licenses for:** -* **High-Volume Production:** Usage beyond the free tier limits. -* **SLA & Support:** Guaranteed uptime and dedicated engineering support. -* **On-Premise / Self-Hosted Gateway:** If you need to run the Sentience Gateway (Rust+ONNX) in your own VPC for compliance (e.g., banking/healthcare), you need an Enterprise License. +- **Browser-use:** [examples/browser-use](examples/browser-use/) +- **LangChain:** [examples/lang-chain](examples/lang-chain/) +- **LangGraph:** [examples/langgraph](examples/langgraph/) +- **Pydantic AI:** [examples/pydantic_ai](examples/pydantic_ai/) -[Contact Us](mailto:support@sentienceapi.com) for Enterprise inquiries.