Commit 82c9cb3

Merge pull request #135 from SentienceAPI/sentience_context2
Sentience Context for Browser-use
2 parents ac07099 + 4d23da1 commit 82c9cb3

14 files changed, +1596 -217 lines changed

THIRD_PARTY_LICENSES.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# Third-Party Licenses

This product optionally depends on third-party open source software:

- **browser-use** (MIT License): https://github.com/browser-use/browser-use

examples/browser-use/README.md

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
# Sentience + browser-use Integration

This directory contains examples for integrating [Sentience](https://github.com/SentienceAPI/sentience-python) with [browser-use](https://github.com/browser-use/browser-use).

## What is browser-use?

[browser-use](https://github.com/browser-use/browser-use) is an open-source framework for building AI agents that can interact with web browsers. Sentience enhances browser-use by providing:

- **Semantic element detection**: accurate element identification using visual and structural cues
- **Compact DOM context**: cuts token usage by roughly 80% compared to raw DOM dumps
- **Importance-ranked elements**: elements sorted by actionability for better LLM targeting
- **Ordinal task support**: instructions such as "Click the 3rd item" work reliably thanks to dominant group detection

## Installation

Install both packages together using the optional dependency:

```bash
pip install "sentienceapi[browser-use]"
```

Or install separately:

```bash
pip install sentienceapi browser-use
```
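
Because browser-use is an optional dependency, code that uses the integration may want to confirm it is installed before importing it. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of either package.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# browser-use is imported as `browser_use` (see the Quick Start imports).
if not has_module("browser_use"):
    print('browser-use is not installed; try: pip install "sentienceapi[browser-use]"')
```

A guard like this lets a script fail with a clear install hint instead of an `ImportError` deep inside the integration code.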

## Quick Start

### Using SentienceContext (Recommended)

`SentienceContext` provides a high-level API for getting compact, ranked DOM context:

```python
from browser_use import BrowserSession, BrowserProfile
from sentience import get_extension_dir
from sentience.backends import SentienceContext, TopElementSelector

# Set up the browser with the Sentience extension loaded.
# Note: the `await` calls below must run inside an async function.
profile = BrowserProfile(
    args=[f"--load-extension={get_extension_dir()}"],
)
session = BrowserSession(browser_profile=profile)
await session.start()

# Create the context builder
ctx = SentienceContext(
    max_elements=60,
    top_element_selector=TopElementSelector(
        by_importance=60,        # Top N by importance score
        from_dominant_group=15,  # Top N from dominant group
        by_position=10,          # Top N by page position
    ),
)

# Build context from the browser session
await session.navigate("https://news.ycombinator.com")
state = await ctx.build(
    session,
    goal="Find the first Show HN post",
    wait_for_extension_ms=5000,
)

# build() may return None, so check before use
if state:
    print(f"URL: {state.url}")
    print(f"Elements: {len(state.snapshot.elements)}")
    print(f"Prompt block:\n{state.prompt_block}")
```

### Using Low-Level APIs

For fine-grained control over snapshots and actions:

```python
from sentience import find, query, get_extension_dir
from sentience.backends import BrowserUseAdapter, snapshot, click, type_text

# Create adapter and backend (reuses the `session` started in the previous example)
adapter = BrowserUseAdapter(session)
backend = await adapter.create_backend()

# Take a snapshot
snap = await snapshot(backend)

# Find and interact with elements
search_box = find(snap, 'role=textbox[name*="Search"]')
if search_box:
    await click(backend, search_box.bbox)
    await type_text(backend, "Sentience AI")
```

## Examples

| File | Description |
|------|-------------|
| [integration.py](integration.py) | Complete integration example with SentienceContext |

## Output Format

The `SentienceContext.build()` method returns a `SentienceContextState` with:

- `url`: current page URL
- `snapshot`: full Sentience snapshot with all elements
- `prompt_block`: compact LLM-ready context block

The prompt block format:
```
Elements: ID|role|text|imp|is_primary|docYq|ord|DG|href
Rules: ordinal→DG=1 then ord asc; otherwise imp desc. Use click(ID)/input_text(ID,...).
1|link|Show HN: My Project|85|1|2|0|1|ycombinato
2|link|Ask HN: Best practices|80|0|3|1|1|ycombinato
...
```

Fields:
- `ID`: element ID for actions
- `role`: semantic role (button, link, textbox, etc.)
- `text`: truncated element text (max 30 chars)
- `imp`: importance score (0-100)
- `is_primary`: 1 if primary CTA, 0 otherwise
- `docYq`: quantized Y position (doc_y / 200)
- `ord`: ordinal rank within dominant group, or "-"
- `DG`: 1 if in dominant group, 0 otherwise
- `href`: compressed href token
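
To make the format concrete, the sketch below parses the pipe-delimited rows and applies the ordering from the `Rules:` line: for ordinal tasks, keep only `DG=1` rows and sort by `ord` ascending; otherwise sort by `imp` descending. The `Row` class and the `parse_prompt_block` and `rank` helpers are hypothetical names for illustration and are not part of the Sentience SDK. The sketch also assumes element text never contains a `|` character, and the third sample row is made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    id: int
    role: str
    text: str
    imp: int
    is_primary: bool
    doc_yq: int
    ord: Optional[int]  # None when the row shows "-"
    dg: bool
    href: str


def parse_prompt_block(block: str) -> List[Row]:
    """Parse element rows, skipping the header lines and the trailing "..."."""
    rows = []
    for line in block.splitlines():
        parts = line.split("|")
        # Element rows have exactly 9 fields and start with a numeric ID.
        if len(parts) != 9 or not parts[0].isdigit():
            continue
        rows.append(Row(
            id=int(parts[0]),
            role=parts[1],
            text=parts[2],
            imp=int(parts[3]),
            is_primary=parts[4] == "1",
            doc_yq=int(parts[5]),
            ord=None if parts[6] == "-" else int(parts[6]),
            dg=parts[7] == "1",
            href=parts[8],
        ))
    return rows


def rank(rows: List[Row], ordinal_task: bool) -> List[Row]:
    """Order candidates as the Rules line describes."""
    if ordinal_task:
        group = [r for r in rows if r.dg and r.ord is not None]
        return sorted(group, key=lambda r: r.ord)
    return sorted(rows, key=lambda r: r.imp, reverse=True)


SAMPLE = """Elements: ID|role|text|imp|is_primary|docYq|ord|DG|href
Rules: ordinal→DG=1 then ord asc; otherwise imp desc. Use click(ID)/input_text(ID,...).
1|link|Show HN: My Project|85|1|2|0|1|ycombinato
2|link|Ask HN: Best practices|80|0|3|1|1|ycombinato
3|button|Log in|90|0|0|-|0|
"""

rows = parse_prompt_block(SAMPLE)
ordinal_ids = [r.id for r in rank(rows, ordinal_task=True)]
importance_ids = [r.id for r in rank(rows, ordinal_task=False)]
```

With this sample, a task like "click the 2nd item" would look at `ordinal_ids`, while a task like "log in" would start from `importance_ids`.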

## API Reference

### SentienceContext

```python
SentienceContext(
    sentience_api_key: str | None = None,  # API key for gateway mode
    use_api: bool | None = None,           # Force API vs extension mode
    max_elements: int = 60,                # Max elements to fetch
    show_overlay: bool = False,            # Show visual overlay
    top_element_selector: TopElementSelector | None = None,
)
```

### TopElementSelector

```python
TopElementSelector(
    by_importance: int = 60,        # Top N by importance score
    from_dominant_group: int = 15,  # Top N from dominant group
    by_position: int = 10,          # Top N by page position
)
```
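
One plausible reading of these three limits is that each produces an independent top-N pick and the final list is their de-duplicated union. The sketch below illustrates that reading only. The `Element` fields, the `select_top` helper, and the union-with-dedup behavior are assumptions for illustration; the source does not confirm how the SDK combines the three picks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    id: int
    importance: int
    doc_y: float
    in_dominant_group: bool


def select_top(
    elements: List[Element],
    by_importance: int = 60,
    from_dominant_group: int = 15,
    by_position: int = 10,
) -> List[Element]:
    """Union of three top-N picks, keeping first-seen order (assumed behavior)."""
    top_imp = sorted(elements, key=lambda e: e.importance, reverse=True)[:by_importance]
    # Dominant-group members are taken in their incoming (document) order.
    top_group = [e for e in elements if e.in_dominant_group][:from_dominant_group]
    top_pos = sorted(elements, key=lambda e: e.doc_y)[:by_position]

    seen = set()
    selected = []
    for e in top_imp + top_group + top_pos:
        if e.id not in seen:
            seen.add(e.id)
            selected.append(e)
    return selected


elements = [
    Element(id=1, importance=10, doc_y=500, in_dominant_group=False),
    Element(id=2, importance=90, doc_y=900, in_dominant_group=True),
    Element(id=3, importance=50, doc_y=100, in_dominant_group=False),
    Element(id=4, importance=20, doc_y=50, in_dominant_group=True),
]
picked_ids = [e.id for e in select_top(elements, 1, 1, 1)]
```

Under this reading, an element that is low in importance can still reach the prompt block if it sits near the top of the page or belongs to the dominant group.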

### SentienceContext.build()

```python
await ctx.build(
    browser_session,                    # browser-use BrowserSession
    goal: str | None = None,            # Task description for reranking
    wait_for_extension_ms: int = 5000,  # Extension load timeout
    retries: int = 2,                   # Retry attempts
    retry_delay_s: float = 1.0,         # Delay between retries
) -> SentienceContextState | None
```
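
The `retries` and `retry_delay_s` parameters suggest a retry-with-delay loop around the snapshot attempt. The sketch below shows that general pattern with `asyncio`; it is not the SDK's implementation. `retry_async` and `flaky_build` are hypothetical names, and the sketch assumes one initial attempt plus up to `retries` further attempts, which the source does not specify.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_async(
    attempt: Callable[[], Awaitable[Optional[T]]],
    retries: int = 2,
    retry_delay_s: float = 1.0,
) -> Optional[T]:
    """Run `attempt` until it returns a non-None value or attempts run out."""
    for i in range(retries + 1):
        try:
            result = await attempt()
            if result is not None:
                return result
        except Exception:
            # Treat an exception like a failed attempt and fall through to retry.
            pass
        if i < retries:
            await asyncio.sleep(retry_delay_s)
    return None


calls = 0


async def flaky_build() -> Optional[str]:
    """Stand-in for a build call that only succeeds on the third try."""
    global calls
    calls += 1
    return "state" if calls == 3 else None


result = asyncio.run(retry_async(flaky_build, retries=2, retry_delay_s=0.0))
```

Returning `None` after the final attempt mirrors the `SentienceContextState | None` return type, so callers still need the `if state:` check shown in the Quick Start.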

## License

Sentience SDK is dual-licensed under MIT and Apache-2.0.

browser-use is licensed under MIT. See [THIRD_PARTY_LICENSES.md](../../THIRD_PARTY_LICENSES.md).
