Does AdaptivePlaywrightCrawler always invoke the Playwright browser? #1793
Hello, I'm trying to scrape some websites and determine which strategy `AdaptivePlaywrightCrawler` uses for each of them. I check `context.page` in the handler to tell which strategy is in use. It looks like the handler is always invoked twice: once for the Playwright (client) strategy and once for the static strategy. So is it true that Crawlee's `AdaptivePlaywrightCrawler` always tries the "client" strategy first to decide which strategy to use for sub-levels and/or similar pages? How can I determine which strategy `AdaptivePlaywrightCrawler` has used, or will use, for the crawl? I need to generate statistics on how many websites need the client strategy versus the static strategy. Thanks!
Hello, you can inspect the Crawler statistics state. In case of
Initially, it will run both strategies to compare their results. Over time, it will lean towards one of the approaches for each group of similar URLs, and from time to time it will recheck that assumption by running both again. To understand how the prediction works in more detail, see https://crawlee.dev/python/docs/guides/adaptive-playwright-crawler#prediction-related-arguments
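The "compare both, lean towards one per group of similar URLs, recheck occasionally" behaviour can be illustrated with a toy model. This is a minimal sketch and not Crawlee's actual predictor: the URL grouping (host plus first path segment), the `min_agreements` threshold, and the deterministic `recheck_every` interval are all hypothetical. Crawlee exposes its own prediction-related arguments, described in the guide linked above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from urllib.parse import urlparse


def url_group(url: str) -> str:
    """Hypothetical grouping of 'similar' URLs: host plus first path segment."""
    parsed = urlparse(url)
    first_segment = parsed.path.strip("/").split("/")[0]
    return f"{parsed.netloc}/{first_segment}"


@dataclass
class ToyRenderingPredictor:
    """Toy model of the adaptive idea; NOT Crawlee's real predictor."""

    min_agreements: int = 2  # matching comparisons needed before going static-only
    recheck_every: int = 5  # run both strategies on every N-th request of a group
    agreements: dict = field(default_factory=lambda: defaultdict(int))
    disagreed: set = field(default_factory=set)
    seen: dict = field(default_factory=lambda: defaultdict(int))

    def plan(self, url: str) -> str:
        """Return 'both', 'static' or 'client' for the next request."""
        group = url_group(url)
        self.seen[group] += 1
        # Learning phase: keep comparing until static output has matched often enough.
        if group not in self.disagreed and self.agreements[group] < self.min_agreements:
            return "both"
        # Periodic recheck of whatever the group currently leans towards.
        if self.seen[group] % self.recheck_every == 0:
            return "both"
        return "client" if group in self.disagreed else "static"

    def record_comparison(self, url: str, static_result, browser_result) -> None:
        """Store the outcome of a 'both' run; the latest comparison wins."""
        group = url_group(url)
        if static_result == browser_result:
            self.agreements[group] += 1
            self.disagreed.discard(group)
        else:
            self.disagreed.add(group)
```

Feeding six URLs from the same group whose static and browser results always match yields the plans `both, both, static, static, both, static`: two learning comparisons, static-only crawling, then a periodic recheck on the fifth request. A group whose static output differs from the rendered page switches to `client`.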
Yes, a freshly started `AdaptivePlaywrightCrawler` will try both strategies on the first request. It learns by comparing the results of both: if they are the same, there is no need to use the browser. The statistics become useful once you let it run for a while.
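For the per-website numbers in the question, keep in mind that crawler statistics are kept per crawler instance rather than broken down per website. One option is to tally handler runs yourself and classify each host afterwards. A minimal sketch, assuming you record a hypothetical `(url, strategy)` pair on every handler run, where `strategy` is `"static"` or `"client"` (how you detect the strategy inside the handler is up to you and not shown here). A host counts as needing the client strategy when its browser runs outnumber its static runs.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def classify_sites(handler_runs):
    """Classify each host by the strategy that handled most of its requests.

    handler_runs: iterable of (url, strategy) pairs, strategy being
    'static' or 'client' (hypothetical format from your own logging).
    Ties are classified as 'static'.
    """
    per_host = defaultdict(Counter)
    for url, strategy in handler_runs:
        per_host[urlparse(url).netloc][strategy] += 1
    return {
        host: "client" if counts["client"] > counts["static"] else "static"
        for host, counts in per_host.items()
    }


def summarize(classification):
    """Count how many sites ended up in each category."""
    return Counter(classification.values())
```

Right after a site's first request, the comparison run produces one static and one client run, a tie that this sketch classifies as static. As noted above, the numbers only become meaningful after the crawler has run for a while.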