Reproducible demos showing how structure-first browser agents outperform vision-only agents.
This repository contains 8 real-world browser agent demos that run using:
- Semantic geometry snapshots (DOM-based, not vision)
- Jest-style AgentRuntime assertions
- 6 of these 8 demos use local-first inference (Qwen 2.5 3B)
- `amazon_shopping` and `google_search` use cloud LLMs for comparison
- Optional vision fallback, used only after other attempts are exhausted
TL;DR
- ✅ 100% task success across all demos
- 💸 ~50% lower token usage per step
- 🧠 Works with small local models (3B–7B)
- ❌ Vision-only agents fail systematically on the same tasks
This is a playground + benchmark for developers evaluating:
- browser agents
- local LLM execution
- deterministic web automation
- flaky UI handling
- assertion-driven verification
Each demo includes:
- runnable code
- logs
- screenshots
- optional video artifacts
- token accounting
Task: Open the top "Show HN" post deterministically.
Why it matters: This tests ordinal reasoning ("first", "top"), a known weakness of vision agents.
Config:
- Model: Qwen 2.5 3B (local)
- Vision: Disabled
- Assertions: `ordinal=first`, `url_contains`
- Tokens: ~1.6k per step
Result: ✅ PASS (zero retries, deterministic)
📂 news_list_skimming/ | 📹 Video
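A minimal sketch of the `ordinal=first` + `url_contains` idea, assuming a snapshot that exposes each list item's ordinal explicitly. `Link`, `pick_by_ordinal`, and `assert_url_contains` are hypothetical helpers, not Sentience SDK APIs, and the snapshot data is made up:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    """Hypothetical snapshot element: a list link with its rendered position."""
    text: str
    href: str
    ordinal: int  # 1 = top of the list as the user sees it


def pick_by_ordinal(links: List[Link], ordinal: str) -> Optional[Link]:
    """Resolve "first"/"last" from explicit ordinals rather than pixel coordinates."""
    if not links:
        return None
    ordered = sorted(links, key=lambda link: link.ordinal)
    if ordinal == "first":
        return ordered[0]
    if ordinal == "last":
        return ordered[-1]
    raise ValueError(f"unsupported ordinal: {ordinal!r}")


def assert_url_contains(url: str, fragment: str) -> None:
    """Jest-style check: raise immediately instead of silently retrying."""
    if fragment not in url:
        raise AssertionError(f"expected {fragment!r} in {url!r}")


# Made-up snapshot, deliberately listed out of order.
snapshot = [
    Link("Show HN: Second post", "https://news.ycombinator.com/item?id=2", ordinal=2),
    Link("Show HN: Top post", "https://news.ycombinator.com/item?id=1", ordinal=1),
]
target = pick_by_ordinal(snapshot, "first")
assert_url_contains(target.href, "item?id=1")
```

Because "first" is resolved from structure, the pick stays correct even when items arrive out of DOM order.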
Task: Log in, wait for async hydration, and verify the profile state.
Why it matters: Shows state-aware assertions (enabled, visible, value_equals) on a modern SPA.
Config:
- Model: Qwen 2.5 3B (local)
- Vision: Disabled
- Assertions: `eventually()`, `is_enabled`, `text_contains`
- Handles delayed hydration and dynamic state
Result: ✅ PASS (no sleeps, no magic waits)
📂 login_profile_check/ | 📹 Video
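The `eventually()` idea can be sketched with the standard library alone. This is a hypothetical stand-in, not the SDK's `eventually()` signature: it polls a predicate at a short interval until a timeout, so it returns once hydration finishes instead of sleeping for a fixed worst-case delay. The `profile_button_enabled` check simulates hydration with a timer:

```python
import time
from typing import Callable


def eventually(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.1) -> None:
    """Poll `predicate` until it is true; raise AssertionError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise AssertionError(f"condition not met within {timeout}s")
        time.sleep(interval)  # short poll, not a fixed worst-case wait


# Simulated SPA: the profile button becomes enabled ~0.3s after "login".
hydrated_at = time.monotonic() + 0.3


def profile_button_enabled() -> bool:
    return time.monotonic() >= hydrated_at


eventually(profile_button_enabled, timeout=2.0)
```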
Task: Search for a product → open a result → add it to the cart.
Why it matters: A high-noise, JS-heavy, real production site.
Config:
- Model: Qwen 2.5 3B (local)
- Vision: Disabled (fallback optional)
- Assertions: navigation, button state, success banner
- Tokens: ~5.5k total
Result: ✅ PASS (vision-only agents failed 3/3 runs)
📂 amazon_shopping_with_assertions/ | 📹 Video
| Metric | Vision-Only | Sentience SDK |
|---|---|---|
| Task success | ❌ 0–30% | ✅ 100% |
| Avg tokens / step | ~3,000+ | ~1,500 |
| Vision usage | Required | Optional fallback |
| Determinism | No | Yes |
| Local model viable | No | Yes (3B–7B) |
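As a rough check on the TL;DR's "~50% lower token usage per step", using the table's per-step averages (the vision-only figure is "3,000+", so the actual saving is at least this):

$$
\frac{3000 - 1500}{3000} = 0.5 \quad (\approx 50\%\ \text{fewer tokens per step})
$$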
Vision agents reason from pixels. Sentience agents reason from structure.
Snapshots provide:
- semantic roles
- ordinality
- grouping
- state (enabled, checked, expanded)
- confidence diagnostics
Assertions verify outcomes — not guesses.
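To make the list concrete, here is a hypothetical Python shape for a snapshot element and a structural query over it. The field names are assumptions for illustration, not the Sentience SDK's actual schema; the point is that the agent filters on role, name, and state rather than on pixels:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SnapshotElement:
    """Hypothetical shape of one snapshot element (not the SDK's actual schema)."""
    role: str                       # semantic role, e.g. "button", "link", "textbox"
    name: str                       # accessible name or visible label
    group: Optional[str] = None     # container it belongs to, e.g. "nav" or "results"
    ordinal: Optional[int] = None   # 1-based position within its group
    state: Dict[str, bool] = field(default_factory=dict)  # enabled, checked, expanded, ...
    confidence: float = 1.0         # diagnostic score for how trustworthy the element is


def find(elements: List[SnapshotElement], *, role: str, name_contains: str = "",
         **required_state: bool) -> List[SnapshotElement]:
    """Filter by role, case-insensitive name substring, and exact state flags."""
    needle = name_contains.lower()
    return [
        el for el in elements
        if el.role == role
        and needle in el.name.lower()
        and all(el.state.get(key) == value for key, value in required_state.items())
    ]


elements = [
    SnapshotElement("button", "Add to Cart", state={"enabled": True}),
    SnapshotElement("button", "Buy Now", state={"enabled": False}),
    SnapshotElement("link", "Cart", group="nav", ordinal=1, confidence=0.9),
]
matches = find(elements, role="button", name_contains="add to", enabled=True)
```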
The demo suite consistently succeeds with a small local model (Qwen 2.5 3B) using compact, structured prompts:
- Token efficiency: ~14.9K tokens across 5 demos vs 100K+ for vision-heavy approaches
- Reliability: 5/5 PASS with 0 retries across multi-step flows
- Speed: Local text models are faster than vision LLMs for structured UI tasks
See docs/DEMO_REPORTS.md for full metrics and results.
```bash
git clone https://github.com/SentienceAPI/sentience-sdk-playground
cd sentience-sdk-playground
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
pip install sentienceapi
playwright install chromium
```

Run a demo:

```bash
cd news_list_skimming
python main.py
```

Repository layout:

```
news_list_skimming/               # Ordinality + list reasoning
amazon_shopping_with_assertions/  # Real-world stress test
login_profile_check/              # SPA + form + login flows
dashboard_kpi_extraction/         # KPI extraction + DOM churn
form_validation_submission/       # Multi-step form validation
local-llama-land/                 # Demo Next.js site (SPA)
docs/                             # Reports, plans, comparisons
```
- Sentience SDK (Python): https://github.com/SentienceAPI/sentience-python
- Sentience SDK (TS): https://github.com/SentienceAPI/sentience-ts
- Demo Site: https://sentience-sdk-playground.vercel.app
- Docs: https://www.sentienceapi.com/docs
- Issues: https://github.com/SentienceAPI/sentience-sdk-playground/issues
Structure replaces vision. Assertions replace retries. Small models become viable.
This repo shows that clearly — with real logs, real sites, real results.
Task: Extract KPIs from a dynamic dashboard while staying resilient to DOM churn.
📂 dashboard_kpi_extraction/ | 📹 Video
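One way to get DOM-churn resilience, sketched under assumptions: read KPIs by label rather than by position, skip placeholder values, and accept a result only when two consecutive reads agree. The `(label, value text)` snapshot format, the helper names, and the two-reads stability rule are all hypothetical; the demo itself may use a different strategy:

```python
from typing import Callable, Dict, List, Optional, Tuple

Snapshot = List[Tuple[str, str]]  # hypothetical: (KPI label, rendered value text)


def parse_kpis(snapshot: Snapshot, wanted: List[str]) -> Dict[str, float]:
    """Map KPIs by label, not DOM position, so reordered cards still resolve."""
    found: Dict[str, float] = {}
    for label, text in snapshot:
        if label not in wanted:
            continue
        cleaned = text.replace("$", "").replace(",", "").replace("%", "").strip()
        try:
            found[label] = float(cleaned)
        except ValueError:
            continue  # placeholder such as "loading..." while a card re-renders
    return found


def extract_stable(take_snapshot: Callable[[], Snapshot], wanted: List[str],
                   max_reads: int = 5) -> Optional[Dict[str, float]]:
    """Return KPIs once two consecutive reads agree and every wanted KPI is present."""
    previous: Optional[Dict[str, float]] = None
    for _ in range(max_reads):
        current = parse_kpis(take_snapshot(), wanted)
        if current == previous and set(current) == set(wanted):
            return current
        previous = current
    return None


# Simulated churn: a card is still loading, then cards reorder and a value updates.
reads = iter([
    [("Revenue", "$1,200"), ("Churn", "loading...")],
    [("Churn", "3.5%"), ("Revenue", "$1,250")],
    [("Revenue", "$1,250"), ("Churn", "3.5%")],
])
kpis = extract_stable(lambda: next(reads), ["Revenue", "Churn"])
```

Returning `None` after `max_reads` surfaces an unstable page as an explicit failure rather than a silently wrong number.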
Task: Complete a multi-step form with validation at each step.
📂 form_validation_submission/ | 📹 Video (screenshots generated locally after running)
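A sketch of per-step validation for a multi-step form, using hypothetical `Step`, `run_form`, `require`, and `email_ok` helpers in plain Python (no browser involved). Each step's check runs before the next step starts, so a failure points at the exact step rather than triggering a blind retry of the whole flow:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Validator = Callable[[Dict[str, str]], List[str]]  # returns a list of error messages


@dataclass
class Step:
    """Hypothetical form step: values to enter plus a check that must pass."""
    name: str
    fill: Dict[str, str]
    validate: Validator


def require(*keys: str) -> Validator:
    """Build a validator that flags empty or missing fields."""
    return lambda form: [f"{key} is required" for key in keys if not form.get(key)]


def email_ok(form: Dict[str, str]) -> List[str]:
    """Deliberately simple email check, for illustration only."""
    return [] if "@" in form.get("email", "") else ["email is invalid"]


def run_form(steps: List[Step]) -> Dict[str, str]:
    """Fill each step, then require it to validate before moving to the next one."""
    form: Dict[str, str] = {}
    for step in steps:
        form.update(step.fill)
        errors = step.validate(form)
        if errors:
            # Fail at the exact step instead of retrying the whole flow.
            raise AssertionError(f"step {step.name!r} failed validation: {errors}")
    return form


result = run_form([
    Step("account", {"email": "user@example.com"}, email_ok),
    Step("profile", {"name": "Ada", "city": ""}, require("name")),
    Step("confirm", {"terms": "accepted"}, require("terms")),
])
```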
See docs/DEMO_REPORTS.md for detailed execution reports and metrics.



