feat: Add post_navigation_hooks to crawlers#1795
Mantisus wants to merge 7 commits into apify:master
Pull request overview
Adds support for post_navigation_hooks across the crawler stack (HTTP, Playwright, and Adaptive Playwright) so users can run logic after navigation completes but before the request handler executes.
Changes:
- Introduces post-navigation hook registration/execution in AbstractHttpCrawler and PlaywrightCrawler.
- Adds PlaywrightPostNavCrawlingContext and updates context inheritance so the post-nav context includes response.
- Extends AdaptivePlaywrightCrawler with post-nav hooks and a wrapper AdaptivePlaywrightPostNavCrawlingContext, plus new/updated unit tests.
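To illustrate the ordering described above (pre-navigation hooks, then navigation, then post-navigation hooks, then the request handler), here is a minimal standard-library sketch. `ToyCrawler`, its decorator names, and the dict-based context are hypothetical stand-ins chosen for illustration; they are not Crawlee's actual classes or signatures, and navigation is simulated rather than performed.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

# A hook receives a mutable context and returns nothing.
Hook = Callable[[dict], Awaitable[None]]


@dataclass
class ToyCrawler:
    """Hypothetical stand-in for a crawler; not Crawlee's real class."""

    pre_hooks: list[Hook] = field(default_factory=list)
    post_hooks: list[Hook] = field(default_factory=list)

    def pre_navigation_hook(self, hook: Hook) -> Hook:
        # Register a hook that runs before navigation.
        self.pre_hooks.append(hook)
        return hook

    def post_navigation_hook(self, hook: Hook) -> Hook:
        # Register a hook that runs after navigation, before the handler.
        self.post_hooks.append(hook)
        return hook

    async def run_one(self, url: str, handler: Hook) -> dict:
        context = {"url": url, "events": []}
        for hook in self.pre_hooks:
            await hook(context)
        # Simulated navigation: the response only exists from here on.
        context["response"] = {"status": 200}
        context["events"].append("navigate")
        for hook in self.post_hooks:
            await hook(context)
        await handler(context)
        return context


def demo() -> list[str]:
    crawler = ToyCrawler()

    @crawler.pre_navigation_hook
    async def before(context: dict) -> None:
        context["events"].append("pre")

    @crawler.post_navigation_hook
    async def after(context: dict) -> None:
        # The response is available here because navigation has finished.
        assert context["response"]["status"] == 200
        context["events"].append("post")

    async def handler(context: dict) -> None:
        context["events"].append("handler")

    result = asyncio.run(crawler.run_one("https://example.com", handler))
    return result["events"]
```

Calling `demo()` returns the recorded event order, which makes the position of the post-navigation step in the pipeline easy to assert in a test.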
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/unit/crawlers/_playwright/test_playwright_crawler.py | Adds Playwright pre/post-nav hook tests and ordering assertions. |
| tests/unit/crawlers/_http/test_http_crawler.py | Updates HTTP crawler hook tests and adds post-nav hook coverage + ordering. |
| tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py | Adds Adaptive Playwright post-nav hook tests and updates hook-only test naming. |
| src/crawlee/crawlers/_playwright/_playwright_post_nav_crawling_context.py | New post-nav context type holding the Playwright response. |
| src/crawlee/crawlers/_playwright/_playwright_crawling_context.py | Makes main Playwright context inherit from post-nav context. |
| src/crawlee/crawlers/_playwright/_playwright_crawler.py | Inserts post-nav hook execution into the Playwright pipeline and exposes registration API. |
| src/crawlee/crawlers/_playwright/__init__.py | Exports PlaywrightPostNavCrawlingContext. |
| src/crawlee/crawlers/_adaptive_playwright/_adaptive_playwright_crawling_context.py | Adds AdaptivePlaywrightPostNavCrawlingContext wrapper + conversion helper. |
| src/crawlee/crawlers/_adaptive_playwright/_adaptive_playwright_crawler.py | Delegates post-nav hooks to subcrawlers and adds a public registration API. |
| src/crawlee/crawlers/_adaptive_playwright/__init__.py | Exports AdaptivePlaywrightPostNavCrawlingContext. |
| src/crawlee/crawlers/_abstract_http/_abstract_http_crawler.py | Adds post-nav hook list, pipeline step, and registration method for HTTP crawlers. |
| src/crawlee/crawlers/__init__.py | Re-exports the new crawling context types from the top-level crawlers package. |
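The context changes listed in the table (a post-nav context that holds the response, with the main context inheriting from it) can be pictured with a small dataclass hierarchy. This is only a sketch under assumptions: the class and field names below are hypothetical and much simpler than the real context types in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreNavContext:
    """Hypothetical pre-navigation context: no response exists yet."""

    url: str


@dataclass(frozen=True)
class PostNavContext(PreNavContext):
    """Hypothetical post-navigation context: adds the response."""

    response_status: int


@dataclass(frozen=True)
class HandlerContext(PostNavContext):
    """Hypothetical main context: inherits the response from the post-nav context."""

    page_title: str


def status_of(context: PostNavContext) -> int:
    # Accepts any context at or below the post-nav level of the hierarchy.
    return context.response_status


handler_context = HandlerContext(
    url="https://example.com",
    response_status=200,
    page_title="Example",
)
```

Because the main context inherits from the post-nav context, a helper written against the post-nav type also works inside the request handler, while the pre-nav type cannot expose a response that does not exist yet.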
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pijukatel left a comment
With more hooks being added, there is some code duplication. I expect more hooks are on the way, so it would be good to start thinking about code reuse and some refactoring, in case many hooks end up sharing near-duplicate code.
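One possible direction for the reuse suggested here, sketched with the standard library only: a small generic hook list that owns both registration and execution, so each hook kind becomes one instance instead of a copied list-plus-loop. `HookList` and everything around it is hypothetical and not part of the PR or of Crawlee.

```python
import asyncio
from typing import Awaitable, Callable, Generic, TypeVar

TContext = TypeVar("TContext")


class HookList(Generic[TContext]):
    """Hypothetical reusable container for one kind of hook."""

    def __init__(self) -> None:
        self._hooks: list[Callable[[TContext], Awaitable[None]]] = []

    def register(
        self, hook: Callable[[TContext], Awaitable[None]]
    ) -> Callable[[TContext], Awaitable[None]]:
        # Usable as a decorator; returns the hook unchanged.
        self._hooks.append(hook)
        return hook

    async def run(self, context: TContext) -> None:
        # Hooks run sequentially, in registration order.
        for hook in self._hooks:
            await hook(context)


def demo() -> list[str]:
    events: list[str] = []
    pre: HookList[list[str]] = HookList()
    post: HookList[list[str]] = HookList()

    @pre.register
    async def record_pre(context: list[str]) -> None:
        context.append("pre")

    @post.register
    async def record_post_1(context: list[str]) -> None:
        context.append("post-1")

    @post.register
    async def record_post_2(context: list[str]) -> None:
        context.append("post-2")

    async def pipeline() -> None:
        await pre.run(events)
        events.append("navigate")  # simulated navigation step
        await post.run(events)

    asyncio.run(pipeline())
    return events
```

With this shape, adding another hook kind would mean adding another `HookList` instance rather than another registration method and execution loop.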
vdusek left a comment
Currently, I don't have enough time to review it properly manually, but here are a few comments from Claude. Consider them, please.
vdusek left a comment
Also, could you please consider whether we should document this feature somewhere?
Description
- post_navigation_hooks that run after navigation.
Issues
Testing