Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -34,11 +34,14 @@ keywords = [
"scraping",
]
dependencies = [
"apify-fingerprint-datapoints>=0.11.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have all these added dependencies in the optional dependencies group playwright. So please remove them from here.

"async-timeout>=5.0.1",
"browserforge>=1.2.4",
"cachetools>=5.5.0",
"colorama>=0.4.0",
"impit>=0.8.0",
"more-itertools>=10.2.0",
"playwright>=1.58.0",
"protego>=0.5.0",
"psutil>=6.0.0",
"pydantic-settings>=2.12.0",
Expand Down
16 changes: 15 additions & 1 deletion src/crawlee/browsers/_playwright_browser.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from __future__ import annotations

import asyncio
import inspect
import shutil
import tempfile
from logging import getLogger
Expand Down Expand Up @@ -67,8 +68,21 @@ async def new_context(self, **context_options: Any) -> BrowserContext:
user_data_dir = tempfile.mkdtemp(prefix=self._TMP_DIR_PREFIX)
self._temp_dir = Path(user_data_dir)

launch_persistent_context_sig = inspect.signature(self._browser_type.launch_persistent_context)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reasonable approach, but it has some drawbacks. If user has just typo ( in otherwise valid argument name), it will just show warning in log. Same for using some completely nonsensical argument. That should raise an error and not just log a warning.

For example, this should raise (typo in headles):

    persist_browser = PlaywrightPersistentBrowser(
        playwright.chromium, browser_launch_options={'headles': True}
    )

Maybe this approach could be adopted one lever higher (not in PlaywrightPersistentBrowser - which always just calls launch_persistent_context), but in PlaywrightBrowserController - that is the class that decides about calling launch_persistent_context or new_context, but feeds them the same arguments.

It should properly raise exceptions for bad arguments, but it could just log a warning as per your suggestion for arguments at least valid in the other method. It would have to get 3 sets of arguments to be able to do such a distinction. Something like:

...
    launch_persistent_context_sig = set(inspect.signature(BrowserType.launch_persistent_context).parameters)
    new_context_sig = set(inspect.signature(Browser.new_context).parameters)
    persistent_unique_options = launch_persistent_context_sig - new_context_sig
    new_context_unique_options = new_context_sig - launch_persistent_context_sig
    common_options = launch_persistent_context_sig & new_context_sig
...

And then raise an exception or just log based on the selected mode.

filtered_launch_options = {
key: value for key, value in launch_options.items() if key in launch_persistent_context_sig.parameters
}

removed_options = set(launch_options.keys()) - set(filtered_launch_options.keys())
if removed_options:
logger.warning(
f"The following options are not supported by Playwright's launch_persistent_context "
f'and will be ignored: {removed_options}. '
'To use these options, consider using incognito pages (use_incognito_pages=True).'
)

self._context = await self._browser_type.launch_persistent_context(
user_data_dir=user_data_dir, **launch_options
user_data_dir=user_data_dir, **filtered_launch_options
)

if self._temp_dir:
Expand Down
10 changes: 10 additions & 0 deletions tests/unit/browsers/test_playwright_browser.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,3 +42,13 @@ async def test_delete_temp_folder_with_close_browser(playwright: Playwright) ->
assert current_temp_dir.exists()
await persist_browser.close()
assert not current_temp_dir.exists()


async def test_new_context_unsupported_options(playwright: Playwright) -> None:
persist_browser = PlaywrightPersistentBrowser(
playwright.chromium, user_data_dir=None, browser_launch_options={'headless': True}
)
# This should not crash despite 'storage_state' being invalid for launch_persistent_context
context = await persist_browser.new_context(storage_state={'cookies': [], 'origins': []})
assert context is not None
await persist_browser.close()
6 changes: 6 additions & 0 deletions uv.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading