Related Issues
- #363 - Accessibility tree/element(s) snapshot - exposing semantics, roles, states, ARIA,...
- #284 - Download folder configuration for automation
- Puppeteer #6311 - URL attribute for links
Problem Statement
While working on browser automation for AI agents, we identified several reliability gaps in snapshot/extraction that affect real-world usage:
- Href extraction edge cases - Some dynamically rendered links or SPAs don't expose `href` reliably through the accessibility tree alone
- No proactive download identification - Agents must parse snapshot text manually to find downloadable files
- Single-source accessibility - Relying solely on Puppeteer's snapshot can miss semantics (as discussed in #363)
- Snapshot fragility - Individual element failures can break the entire snapshot
Proposed Improvements
We've implemented solutions for these and deployed them in production on Azure:
1. Dual-Fallback Href Extraction
```js
// Runtime.callFunctionOn with fallback
return this.href || this.getAttribute('href') || '';
```
This handles edge cases where the standard property read fails (related to the Puppeteer #6311 discussion).
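For discussion, here is a minimal sketch of how that fallback might be wired through CDP. It assumes the element is addressed by its `backendNodeId` and resolved with `DOM.resolveNode` before `Runtime.callFunctionOn` is called. The names `CdpSend`, `HREF_FUNCTION`, and `readHref` are hypothetical and are not taken from our implementation or from this project's code.

```typescript
// Hypothetical minimal shape of a CDP session's send method
// (for example, a Puppeteer CDPSession's `send` could be adapted to it).
type CdpSend = (method: string, params: Record<string, unknown>) => Promise<any>;

// Function source evaluated in the page with `this` bound to the element.
// Reads the resolved `href` property first, then falls back to the raw attribute.
const HREF_FUNCTION = `function () {
  return this.href || this.getAttribute('href') || '';
}`;

// Hypothetical helper: resolve a backend node and read its href with fallback.
async function readHref(send: CdpSend, backendNodeId: number): Promise<string> {
  // Turn the backend node id into a remote object we can call a function on.
  const { object } = await send('DOM.resolveNode', { backendNodeId });

  const { result } = await send('Runtime.callFunctionOn', {
    objectId: object.objectId,
    functionDeclaration: HREF_FUNCTION,
    returnByValue: true,
  });

  // Anything that is not a string is treated as "no href".
  return typeof result.value === 'string' ? result.value : '';
}
```

Taking `send` as a parameter keeps the helper independent of any particular session object, which also makes it easy to exercise with a fake in tests.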
2. Explicit downloadLinks Field
```ts
interface SnapshotResult {
  // ... existing fields
  downloadLinks: Array<{
    url: string;
    filename: string;
    extension: string;
  }>;
}
```
Automatically identifies downloadable files by extension (.csv, .xlsx, .zip, .pdf, .json, etc.). Agents no longer need to parse text manually.
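As a rough illustration of the extension-based detection, the sketch below filters a list of hrefs into `downloadLinks`-shaped entries. The function name `findDownloadLinks`, the `DOWNLOAD_EXTENSIONS` set (limited here to the five extensions named above), and the `baseUrl` parameter are assumptions made for this example, not the exact code we deployed.

```typescript
interface DownloadLink {
  url: string;
  filename: string;
  extension: string;
}

// Assumed extension list; only the extensions named in this issue are included.
const DOWNLOAD_EXTENSIONS = new Set(['csv', 'xlsx', 'zip', 'pdf', 'json']);

// Hypothetical helper: keep only hrefs whose path ends in a known file extension.
function findDownloadLinks(hrefs: string[], baseUrl: string): DownloadLink[] {
  const links: DownloadLink[] = [];
  for (const href of hrefs) {
    let parsed: URL;
    try {
      // Resolve relative hrefs against the page URL.
      parsed = new URL(href, baseUrl);
    } catch {
      continue; // Skip hrefs that cannot be parsed as URLs.
    }
    // Use the last path segment so query strings and fragments are ignored.
    const filename = parsed.pathname.split('/').pop() ?? '';
    const dot = filename.lastIndexOf('.');
    if (dot <= 0) continue; // No extension, or a dotfile such as ".env".
    const extension = filename.slice(dot + 1).toLowerCase();
    if (DOWNLOAD_EXTENSIONS.has(extension)) {
      links.push({ url: parsed.href, filename, extension });
    }
  }
  return links;
}
```

Matching on the URL path rather than the raw string means a link such as `/files/report.csv?x=1` is still recognized, while a page like `/about` is ignored. Links that serve files without an extension in the path would not be caught by this approach.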
3. Dual-Source Accessibility Tree
| Source | Purpose |
|---|---|
| Puppeteer `page.accessibility.snapshot()` | Semantic structure |
| CDP `backendNodeId` | Precise DOM element mapping |
This addresses the gaps @BogdanCerovac identified in #363 - combining semantic accessibility with precise DOM mapping.
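One possible way to join the two sources is sketched below. It assumes the node list comes from CDP's `Accessibility.getFullAXTree`, whose nodes can carry a `backendDOMNodeId`, and indexes role and name by that id. This is an assumption for illustration only: the types `AxNode` and `MappedNode` and the function `mapAxNodesToDom` are hypothetical, and the interface declares just the subset of fields used here.

```typescript
// Subset of the CDP Accessibility.AXNode shape used in this sketch.
interface AxValue {
  value?: unknown;
}
interface AxNode {
  nodeId: string;
  ignored: boolean;
  role?: AxValue;
  name?: AxValue;
  backendDOMNodeId?: number;
}

// Hypothetical merged record: semantics plus the DOM handle they belong to.
interface MappedNode {
  backendNodeId: number;
  role: string;
  name: string;
}

// Index accessibility semantics by backend DOM node id.
function mapAxNodesToDom(nodes: AxNode[]): Map<number, MappedNode> {
  const byBackendId = new Map<number, MappedNode>();
  for (const node of nodes) {
    // Ignored nodes and nodes without a DOM counterpart cannot be mapped.
    if (node.ignored || node.backendDOMNodeId === undefined) continue;
    byBackendId.set(node.backendDOMNodeId, {
      backendNodeId: node.backendDOMNodeId,
      role: String(node.role?.value ?? ''),
      name: String(node.name?.value ?? ''),
    });
  }
  return byBackendId;
}
```

With an index like this, an agent that picks a node by role or name could look up the matching `backendNodeId` and act on the exact DOM element.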
4. Resilient Error Handling
```js
// Continue on individual element failures
for (const node of nodes) {
  try {
    await extractNodeData(node);
  } catch (e) {
    console.warn(`Skipping node: ${e.message}`);
    continue; // Don't fail entire snapshot
  }
}
```
Implementation
We have a working implementation deployed in production. Happy to:
- Submit a PR with these improvements
- Provide more technical details on any specific aspect
- Discuss alternative approaches
Questions for Maintainers
- Would you prefer these as separate PRs or one consolidated change?
- For `downloadLinks` - should this be opt-in via a parameter or always included?
- Any concerns about the dual-source approach adding complexity?
/cc @OrKoN