| 1 | +# Build SentienceContext for Browser-Use Agent Framework |
| 2 | + |
| 3 | +## Relevant Files for Reference: |
| 4 | +1. state_injector.py: `/Users/guoliangwang/Code/Python/browser-use/browser_use/integrations/sentience/state_injector.py` |
| 5 | +2. Browser-use repo: `/Users/guoliangwang/Code/Python/browser-use` |
| 6 | + |
| 7 | +## Task |
| 8 | +You are implementing a “Token-Slasher Context Middleware” for browser-use users, shipped inside the Sentience SDK. |
| 9 | + |
| 10 | +### Goal |
| 11 | + |
| 12 | +Create a **directly importable** context class named `SentienceContext` that browser-use users can plug into their agent runtime to generate a compact, ranked DOM context block using Sentience snapshots, reducing tokens and improving reliability. |
| 13 | + |
| 14 | +This should be implemented inside the Sentience SDK repo under: |
| 15 | + |
| 16 | +* `sentience/backends/sentience_context.py` (new file) |
| 17 | +* and exported from `sentience/backends/__init__.py` |
| 18 | + |
| 19 | +It should refactor and supersede the logic currently in `state_injector.py` (which lives in a local browser-use repo copy). Use it as the baseline behavior, but remove debugging prints and improve robustness. |
| 20 | + |
| 21 | +Also integrate with the already-existing `BrowserUseAdapter` inside the SDK. |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Requirements |
| 26 | + |
| 27 | +### 1) Public API |
| 28 | + |
| 29 | +Implement: |
| 30 | + |
| 31 | +```py |
| 32 | +@dataclass |
| 33 | +class SentienceContextState: |
| 34 | +    url: str |
| 35 | +    snapshot: Snapshot |
| 36 | +    prompt_block: str |
| 37 | +    # optional: selected_element_ids: list[int] |
| 38 | + |
| 39 | +class SentienceContext: |
| 40 | +    def __init__( |
| 41 | +        self, |
| 42 | +        *, |
| 43 | +        api_key: str | None = None, |
| 44 | +        api_url: str | None = None, |
| 45 | +        use_api: bool | None = None, |
| 46 | +        limit: int = 60, |
| 47 | +        show_overlay: bool = False, |
| 48 | +        top_by_importance: int = 60, |
| 49 | +        top_from_dominant_group: int = 15, |
| 50 | +        top_by_position: int = 10, |
| 51 | +        role_link_when_href: bool = True, |
| 52 | +        include_rank_in_group: bool = True, |
| 53 | +        env_api_key: str = "SENTIENCE_API_KEY", |
| 54 | +    ): ... |
| 55 | + |
| 56 | +    async def build( |
| 57 | +        self, |
| 58 | +        browser_session: "BrowserSession", |
| 59 | +        *, |
| 60 | +        goal: str | None = None, |
| 61 | +        wait_for_extension_ms: int = 5000, |
| 62 | +        retries: int = 2, |
| 63 | +        retry_delay_s: float = 1.0, |
| 64 | +    ) -> SentienceContextState | None: |
| 65 | +        """Return context state or None if snapshot isn’t available.""" |
| 66 | +``` |
| 67 | + |
| 68 | +Notes: |
| 69 | + |
| 70 | +* `build()` must not throw for common failures; it should return `None` and log a warning. |
| 71 | +* `goal` should be passed into SnapshotOptions (so gateway rerank can use it). |
| 72 | +* Support both **extension-only mode** and **gateway mode** (if api_key exists or use_api=True). |
| 73 | +* `api_key` defaults from env var `SENTIENCE_API_KEY` if not passed. |
| 74 | +* Must avoid making browser-use a hard dependency (import types only under TYPE_CHECKING). |
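The notes above can be sketched as two small helpers: one that resolves the API key from the environment, and one that wraps a snapshot call in a retry loop that logs a warning and returns `None` instead of raising. This is a minimal sketch, not the SDK's implementation. The helper names `resolve_api_key` and `try_with_retries` are hypothetical, and it assumes `retries=2` means two retries after the first attempt (three attempts in total) with a fixed delay.

```python
import asyncio
import logging
import os
from typing import Awaitable, Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def resolve_api_key(
    api_key: Optional[str], env_api_key: str = "SENTIENCE_API_KEY"
) -> Optional[str]:
    """An explicit argument wins; otherwise fall back to the environment variable."""
    return api_key or os.environ.get(env_api_key) or None


async def try_with_retries(
    take_snapshot: Callable[[], Awaitable[T]],
    *,
    retries: int = 2,
    retry_delay_s: float = 1.0,
) -> Optional[T]:
    """Run take_snapshot up to retries + 1 times; return None instead of raising."""
    for attempt in range(retries + 1):
        try:
            return await take_snapshot()
        except Exception as exc:  # broad on purpose: build() must not throw
            logger.warning("Sentience snapshot attempt %d failed: %s", attempt + 1, exc)
            if attempt < retries:
                await asyncio.sleep(retry_delay_s)
    return None
```

Inside `build()`, the callable passed in would be the actual snapshot call; a `None` result maps directly onto the `SentienceContextState | None` return type.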
| 75 | + |
| 76 | +### 2) Snapshot acquisition (browser-use) |
| 77 | + |
| 78 | +Use Sentience SDK’s existing integration pattern: |
| 79 | + |
| 80 | +* Construct a `BrowserUseAdapter(browser_session)` and call `await adapter.create_backend()` (or equivalent) to obtain a backend for `sentience.backends.snapshot.snapshot()`. |
| 81 | + |
| 82 | +Your `BrowserUseAdapter` exists and wraps CDP access. Don’t change its behavior except where necessary to make the context class clean. |
| 83 | + |
| 84 | +### 3) Formatting: compact prompt block |
| 85 | + |
| 86 | +The prompt block should be a minimal token “inventory,” similar to `state_injector.py`: |
| 87 | + |
| 88 | +* Output header: |
| 89 | + |
| 90 | + * `Elements: ID|role|text|imp|docYq|ord|DG|href` (compatible with existing) |
| 91 | +* Then list lines `cur_line = f"{id}|{role}|{name}|{importance}|{doc_yq}|{ord_val}|{dg_flag}|{href}"` |
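The header and line format above can be sketched as follows. The function names are hypothetical, and replacing `|` and newlines inside the element text is an assumption added here so that a stray delimiter cannot break the column layout; the baseline `state_injector.py` may handle this differently.

```python
from typing import Iterable, Tuple

HEADER = "Elements: ID|role|text|imp|docYq|ord|DG|href"

Row = Tuple[object, str, str, object, object, object, object, str]


def format_line(id, role, name, importance, doc_yq, ord_val, dg_flag, href) -> str:
    """Render one element as a pipe-delimited line matching HEADER."""
    # Assumption: scrub the delimiter and newlines out of the visible text.
    name = name.replace("|", " ").replace("\n", " ").strip()
    return f"{id}|{role}|{name}|{importance}|{doc_yq}|{ord_val}|{dg_flag}|{href}"


def build_prompt_block(rows: Iterable[Row]) -> str:
    """Join the header and all element lines into the compact prompt block."""
    return "\n".join([HEADER, *(format_line(*row) for row in rows)])
```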
| 92 | + |
| 93 | +BUT improvements required: |
| 94 | + |
| 95 | +#### 3.1 Remove debug prints |
| 96 | + |
| 97 | +The existing file prints group keys and formatted lines; remove those entirely. |
| 98 | + |
| 99 | +#### 3.2 Role semantics improvement (link vs button) |
| 100 | + |
| 101 | +If `role_link_when_href=True`: |
| 102 | + |
| 103 | +* If element has a non-empty `href`, output `role="link"` even if original role/tag is `button`. |
| 104 | +* Else keep existing role. |
| 105 | + |
| 106 | +This improves LLM priors for feed/list pages. |
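A minimal sketch of the role rule, assuming `role` and `href` are plain optional strings on the element; `display_role` is a hypothetical helper name.

```python
from typing import Optional


def display_role(
    role: Optional[str], href: Optional[str], role_link_when_href: bool = True
) -> str:
    """Report "link" for any element with a non-empty href; otherwise keep the role."""
    if role_link_when_href and href and href.strip():
        return "link"
    return role or ""
```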
| 107 | + |
| 108 | +#### 3.3 Dominant group membership must NOT use exact match |
| 109 | + |
| 110 | +Use `el.in_dominant_group` if present (preferred). That field is expected from gateway and uses fuzzy matching. |
| 111 | +If it’s missing, fallback to exact match ONLY as last resort. (You already do this; keep it.) |
| 112 | + |
| 113 | +#### 3.4 Fix `ord_val` semantics (avoid huge values) |
| 114 | + |
| 115 | +If `include_rank_in_group=True`: |
| 116 | + |
| 117 | +* Prefer a true small ordinal index over `group_index` if `group_index` can be “bucket-like”. |
| 118 | + Implement: |
| 119 | +* `rank_in_group`: computed locally in this formatter: |
| 120 | + |
| 121 | + * Take interactive elements where `in_dominant_group=True` |
| 122 | + * Sort them by `(doc_y, bbox.y, bbox.x, -importance)` using available fields |
| 123 | + * Assign `rank_in_group = 0..n-1` |
| 124 | +* Then set: |
| 125 | + |
| 126 | + * `ord_val = rank_in_group` for dominant group items |
| 127 | + * otherwise `ord_val="-"` |
| 128 | + |
| 129 | +Do NOT modify the Snapshot schema; compute this locally in the context builder. |
| 130 | + |
| 131 | +Keep emitting `doc_yq` as `round(doc_y/200)` like current code, but ensure doc_y uses `el.doc_y` if present. |
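The local ranking described above can be sketched as below. `El` is a hypothetical stand-in for the SDK's `Element` model with the bounding box flattened into `bbox_y` and `bbox_x`. Sorting missing coordinates last and emitting `"-"` for `doc_yq` when no vertical position is known are assumptions, not behavior confirmed by the baseline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

_INF = float("inf")


@dataclass
class El:  # hypothetical stand-in for the SDK's Element model
    id: int
    importance: int = 0
    doc_y: Optional[float] = None
    bbox_y: Optional[float] = None
    bbox_x: Optional[float] = None
    in_dominant_group: Optional[bool] = None


def _or_inf(value: Optional[float]) -> float:
    """Missing coordinates sort last (assumption)."""
    return _INF if value is None else value


def rank_in_group(elements: List[El]) -> Dict[int, int]:
    """Map element id -> 0..n-1 rank among dominant-group members."""
    members = [e for e in elements if e.in_dominant_group]
    members.sort(
        key=lambda e: (
            _or_inf(e.doc_y if e.doc_y is not None else e.bbox_y),
            _or_inf(e.bbox_y),
            _or_inf(e.bbox_x),
            -e.importance,
        )
    )
    return {e.id: rank for rank, e in enumerate(members)}


def ord_val(el: El, ranks: Dict[int, int]) -> str:
    """Small ordinal for dominant-group items, "-" for everything else."""
    return str(ranks[el.id]) if el.id in ranks else "-"


def doc_yq(el: El) -> str:
    """Quantize vertical position into 200px buckets, preferring doc_y."""
    y = el.doc_y if el.doc_y is not None else el.bbox_y
    return "-" if y is None else str(round(y / 200))
```

Because the ranks are computed over the filtered list only, `ord_val` stays in `0..n-1` regardless of how large `group_index` is in the snapshot.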
| 132 | + |
| 133 | +#### 3.5 `href` field: keep short token |
| 134 | + |
| 135 | +Keep the current behavior of compressing href into a short token (domain second-level or last path segment). |
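One way to compress an href into a short token is sketched here with `urllib.parse`. The precedence (second-level domain label when a host is present, otherwise the last path segment), the naive label split that ignores multi-part suffixes such as `co.uk`, and the `max_len` cap are all assumptions; the baseline in `state_injector.py` may differ.

```python
from urllib.parse import urlparse


def short_href(href: str, max_len: int = 20) -> str:
    """Compress an href to a short token for the prompt block."""
    if not href:
        return ""
    parsed = urlparse(href)
    host = parsed.hostname or ""
    if host:
        labels = host.split(".")
        # Naive second-level label: "news.ycombinator.com" -> "ycombinator".
        token = labels[-2] if len(labels) >= 2 else labels[0]
    else:
        # Relative URL: use the last non-empty path segment.
        segments = [s for s in parsed.path.split("/") if s]
        token = segments[-1] if segments else ""
    return token[:max_len]
```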
| 136 | + |
| 137 | +### 4) Element selection strategy (token slasher) |
| 138 | + |
| 139 | +Replicate the selection recipe from `state_injector.py`: |
| 140 | + |
| 141 | +* Filter to “interactive roles” |
| 142 | +* Take: |
| 143 | + |
| 144 | + * top_by_importance |
| 145 | + * top_from_dominant_group |
| 146 | + * top_by_position (lowest doc_y) |
| 147 | +* Deduplicate by element ID. |
| 148 | + |
| 149 | +BUT make it robust: |
| 150 | + |
| 151 | +* don’t rely on `snapshot.dominant_group_key` being present; use `in_dominant_group=True` filtering primarily. |
| 152 | +* if `doc_y` missing, fallback to `bbox.y` if bbox exists. |
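The selection recipe can be sketched as a three-way pick followed by an order-preserving deduplication. `El` is a hypothetical stand-in for the SDK's `Element`, the `INTERACTIVE_ROLES` set is an assumed subset rather than the list used in `state_injector.py`, and ordering dominant-group items by vertical position is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed subset; the baseline defines the authoritative list.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


@dataclass
class El:  # hypothetical stand-in for the SDK's Element model
    id: int
    role: str
    importance: int = 0
    doc_y: Optional[float] = None
    bbox_y: Optional[float] = None
    in_dominant_group: bool = False


def y_of(el: El) -> float:
    """Vertical position: doc_y, else bbox.y, else sort last."""
    if el.doc_y is not None:
        return el.doc_y
    return el.bbox_y if el.bbox_y is not None else float("inf")


def select_elements(
    elements: List[El],
    *,
    top_by_importance: int = 60,
    top_from_dominant_group: int = 15,
    top_by_position: int = 10,
) -> List[El]:
    interactive = [e for e in elements if e.role in INTERACTIVE_ROLES]
    by_importance = sorted(interactive, key=lambda e: e.importance, reverse=True)
    dominant = sorted((e for e in interactive if e.in_dominant_group), key=y_of)
    by_position = sorted(interactive, key=y_of)

    selected: List[El] = []
    seen = set()
    for el in (
        by_importance[:top_by_importance]
        + dominant[:top_from_dominant_group]
        + by_position[:top_by_position]
    ):
        if el.id not in seen:  # deduplicate by element ID, keep first occurrence
            seen.add(el.id)
            selected.append(el)
    return selected
```

Filtering on `in_dominant_group` directly means the function never needs `snapshot.dominant_group_key`.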
| 153 | + |
| 154 | +### 5) Logging |
| 155 | + |
| 156 | +Use `logging.getLogger(__name__)`. |
| 157 | + |
| 158 | +* On ImportError: warn “Sentience SDK not available…” |
| 159 | +* On snapshot failure: warn and include exception string. |
| 160 | +* On success: info “SentienceContext snapshot: X elements URL=…” |
| 161 | + |
| 162 | +No prints. |
| 163 | + |
| 164 | +### 6) Packaging / exports |
| 165 | + |
| 166 | +* Export `SentienceContext` and `SentienceContextState` from `sentience/backends/__init__.py` |
| 167 | +* Keep browser-use as optional dependency (document usage in README; do not introduce mandatory dependency) |
| 168 | +* Ensure type hints don’t import browser-use at runtime. |
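A minimal sketch of keeping browser-use out of the runtime import path: the type is imported only under `TYPE_CHECKING` and referenced as a string annotation. The import path `from browser_use import BrowserSession` is an assumption about browser-use's public API, and `accepts_session` is a hypothetical function used only to show the pattern.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Evaluated by type checkers only; never executed at runtime.
    from browser_use import BrowserSession  # assumed import path


async def accepts_session(browser_session: "BrowserSession") -> Optional[str]:
    """The string annotation keeps this module importable without browser-use."""
    return None
```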
| 169 | + |
| 170 | +### 7) Example usage snippet |
| 171 | + |
| 172 | +Provide an example in docstring or comments: |
| 173 | + |
| 174 | +```py |
| 175 | +from browser_use import Agent |
| 176 | +from sentience.backends import SentienceContext |
| 177 | + |
| 178 | +ctx = SentienceContext(show_overlay=True) |
| 179 | +state = await ctx.build(agent.browser_session, goal="Click the first Show HN post") |
| 180 | +if state: |
| 181 | +    agent.add_context(state.prompt_block) # or however browser-use injects state |
| 182 | +``` |
| 183 | + |
| 184 | +Do not modify browser-use repo. This is SDK-only. |
| 185 | + |
| 186 | +--- |
| 187 | + |
| 188 | +## Deliverables |
| 189 | + |
| 190 | +1. `sentience/backends/sentience_context.py` new module |
| 191 | +2. update `sentience/backends/__init__.py` exports |
| 192 | +3. Ensure it compiles and is formatted |
| 193 | +4. Keep behavior backwards compatible with existing compact line schema, but improve `role` and `ord_val` as above. |
| 194 | + |
| 195 | +--- |
| 196 | + |
| 197 | +If you need to reference the baseline behavior, use the attached `state_injector.py` as the template. |
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +If you want, I can also give you a short "README integration snippet" for browser-use users (the 5-line copy/paste install + usage) once Claude produces the code. |
| 202 | + |
| 203 | +--- |
| 204 | + |
| 205 | +## Feasibility & Complexity Assessment |
| 206 | + |
| 207 | +### Overall Verdict: ✅ FEASIBLE - Medium Complexity |
| 208 | + |
| 209 | +**Estimated effort:** 2-4 hours for Python SDK |
| 210 | + |
| 211 | +--- |
| 212 | + |
| 213 | +### Prerequisites Analysis |
| 214 | + |
| 215 | +| Prerequisite | Status | Notes | |
| 216 | +|-------------|--------|-------| |
| 217 | +| `BrowserUseAdapter` exists | ✅ Ready | `sentience/backends/browser_use_adapter.py` - wraps CDP for browser-use | |
| 218 | +| `snapshot()` function exists | ✅ Ready | `sentience/backends/snapshot.py` - supports both extension and API modes | |
| 219 | +| `Element` model has ordinal fields | ✅ Ready | `doc_y`, `group_key`, `group_index`, `href`, `in_dominant_group` all present | |
| 220 | +| `Snapshot` model has `dominant_group_key` | ✅ Ready | Added in Phase 2 | |
| 221 | +| `SnapshotOptions` supports `goal` | ✅ Ready | Line 139 in models.py | |
| 222 | +| browser-use not a hard dependency | ✅ Ready | Already uses `TYPE_CHECKING` pattern | |
| 223 | + |
| 224 | +--- |
| 225 | + |
| 226 | +### Complexity Breakdown by Requirement |
| 227 | + |
| 228 | +| Requirement | Complexity | Rationale | |
| 229 | +|-------------|------------|-----------| |
| 230 | +| 1) Public API (`SentienceContext`, `SentienceContextState`) | 🟢 Low | Simple dataclass + class with `__init__` and `build()` | |
| 231 | +| 2) Snapshot acquisition | 🟢 Low | Reuse existing `BrowserUseAdapter` + `snapshot()` | |
| 232 | +| 3.1) Remove debug prints | 🟢 Low | Just don't add them | |
| 233 | +| 3.2) Role link-when-href | 🟢 Low | Simple conditional: `"link" if href else role` | |
| 234 | +| 3.3) Use `in_dominant_group` (fuzzy) | 🟢 Low | Field already exists from gateway | |
| 235 | +| 3.4) Fix `ord_val` (local rank computation) | 🟡 Medium | Need to sort dominant group elements locally and assign 0..n-1 | |
| 236 | +| 3.5) Short href token | 🟢 Low | URL parsing logic already in state_injector.py | |
| 237 | +| 4) Element selection (token slasher) | 🟡 Medium | 3-way selection + deduplication, but logic is clear | |
| 238 | +| 5) Logging | 🟢 Low | Standard `logging.getLogger(__name__)` | |
| 239 | +| 6) Packaging/exports | 🟢 Low | Add 2 lines to `__init__.py` | |
| 240 | +| 7) Example in docstring | 🟢 Low | Copy from design doc | |
| 241 | + |
| 242 | +--- |
| 243 | + |
| 244 | +### Risk Areas |
| 245 | + |
| 246 | +1. **`ord_val` local computation (Req 3.4)**: The design requires computing `rank_in_group` locally by sorting `in_dominant_group=True` elements. This is the right approach to fix the large `ord_val` issue, but requires careful implementation: |
| 247 | + - Sort key: `(doc_y, bbox.y, bbox.x, -importance)` as specified in Req 3.4, with `bbox.y` standing in for `doc_y` when `doc_y` is missing |
| 248 | + - Must handle missing `doc_y` gracefully |
| 249 | + |
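| 250 | +2. **Retry logic**: The `build()` method has `retries` and `retry_delay_s` parameters, but the spec does not say which strategy they imply. Either a simple fixed-delay retry loop or exponential backoff seeded by `retry_delay_s` would satisfy it; pick one and document it. |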
| 251 | + |
| 252 | +3. **Error handling**: Must catch exceptions from `snapshot()` and return `None` instead of propagating. |
| 253 | + |
| 254 | +--- |
| 255 | + |
| 256 | +### Implementation Checklist |
| 257 | + |
| 258 | +- [ ] Create `sentience/backends/sentience_context.py` |
| 259 | +- [ ] Update `sentience/backends/__init__.py` with exports |
| 260 | +- [ ] Test with browser-use locally |
| 261 | + |
| 262 | +--- |
| 263 | + |
| 264 | +### Conclusion |
| 265 | + |
| 266 | +This is a **well-scoped, medium-complexity task** with all prerequisites already in place. The main implementation work is: |
| 267 | +1. Element selection logic (3-way merge with deduplication) |
| 268 | +2. Local `rank_in_group` computation (sort + enumerate) |
| 269 | +3. Compact line formatting |
| 270 | + |
| 271 | +No schema changes or gateway modifications required. The gateway fix for `is_content_like_element` (MIN_CONTENT_TEXT_LENGTH=5) has already been implemented, which should reduce the large `ord_val` issue at the source. |