feat: Add FirecrawlWebSearch component by saakshigupta2002 · Pull Request #2888 · deepset-ai/haystack-core-integrations

saakshigupta2002 · 2026-02-27T13:36:10Z

Related Issues

Fixes #2870 (Firecrawl: Add websearch functionality)

Proposed Changes:

This PR adds a new FirecrawlWebSearch component to the Firecrawl integration, enabling web search queries using the Firecrawl Search API. This follows the pattern established by Haystack's existing SearchApiWebSearch and SerperDevWebSearch components, as suggested in the PR review for #2859.

Many use cases require starting from a user search query rather than from a predefined URL list. While the existing FirecrawlCrawler handles crawling known URLs, this new component allows users to search the web and retrieve results as Haystack Documents — making it straightforward to build search-augmented pipelines.

What's included:

- FirecrawlWebSearch component at haystack_integrations.components.websearch.firecrawl
  - Follows the standard Haystack WebSearch interface: accepts a query string, returns documents and links
  - Supports both synchronous (run()) and asynchronous (run_async()) execution
  - Configurable top_k for result limiting and search_params for Firecrawl-specific options (time filters, location, scrape options, etc.)
  - Full serialization support via to_dict() / from_dict()
  - Graceful error handling with logging (returns empty results on failure)
  - Handles both search-only results (title/description/url) and scraped document results (full markdown content when scrapeOptions are provided)
- Comprehensive test suite (16 unit tests + 2 integration tests)
  - Tests for initialization, serialization round-trip, search execution, parameter overrides, top_k truncation, async execution, error handling, warm-up behavior, and edge cases (empty/null results)
- Configuration updates
  - Updated pyproject.toml with "Web Search" keyword and mypy target for the new module
  - Updated pydoc/config_docusaurus.yml to include the new component in API documentation generation
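
For reviewers who want a quick picture of the "returns empty results on failure" behavior listed above, here is a minimal stand-alone sketch. It does not import the Firecrawl SDK or Haystack; `safe_search`, `failing_search`, and `working_search` are hypothetical names used only for illustration, and the component in this PR may structure its error handling differently.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def safe_search(
    search_fn: Callable[[str], list[dict[str, Any]]], query: str
) -> dict[str, list]:
    """Run a search callable and fall back to empty results if it raises."""
    try:
        results = search_fn(query)
    except Exception as exc:  # broad on purpose: any failure yields empty output
        logger.warning("Search for %r failed: %s", query, exc)
        return {"documents": [], "links": []}
    return {
        "documents": results,
        "links": [r["url"] for r in results if "url" in r],
    }


def failing_search(query: str) -> list[dict[str, Any]]:
    # Hypothetical stand-in for a client call that hits an API error.
    raise RuntimeError("simulated API outage")


def working_search(query: str) -> list[dict[str, Any]]:
    # Hypothetical stand-in for a successful client call.
    return [{"title": "Haystack", "url": "https://haystack.deepset.ai"}]
```

The point of the pattern is that a pipeline keeps running when the search backend is unavailable: downstream components receive empty lists instead of an exception.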

Usage example:

from haystack_integrations.components.websearch.firecrawl import FirecrawlWebSearch
from haystack.utils import Secret

websearch = FirecrawlWebSearch(
    api_key=Secret.from_env_var("FIRECRAWL_API_KEY"),
    top_k=5,
)
websearch.warm_up()

result = websearch.run(query="What is Haystack by deepset?")
documents = result["documents"]  # List of Document objects with search result content
links = result["links"]          # List of URL strings

Design decisions:

Uses the Firecrawl Python SDK (firecrawl-py) directly rather than raw HTTP requests, consistent with the existing FirecrawlCrawler implementation. No new dependencies are needed because firecrawl-py>=4.0.0, already a dependency, includes the search() method.
Placed under components/websearch/firecrawl/ to follow Haystack's namespace convention for web search components, separate from the existing crawler under components/fetchers/firecrawl/.
top_k maps to Firecrawl's limit parameter when not explicitly set in search_params, providing a familiar interface while allowing full control through search_params.
Response parsing handles both response types: SearchResultWeb objects (search metadata only) and Firecrawl Document objects (full markdown content when scrapeOptions are provided), using attribute detection rather than type imports to avoid coupling to SDK internals.
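
The top_k/limit precedence described above can be sketched as a small pure function. This is an illustration of the rule only, not the code in this PR: `build_search_kwargs` is a hypothetical helper name, and the `location` value is an example parameter.

```python
from typing import Any, Optional


def build_search_kwargs(
    top_k: Optional[int], search_params: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Merge top_k into the search parameters without overriding an explicit limit."""
    kwargs = dict(search_params or {})  # copy so the caller's dict is not mutated
    if top_k is not None and "limit" not in kwargs:
        kwargs["limit"] = top_k
    return kwargs
```

With this rule, `build_search_kwargs(5, None)` yields a limit of 5, while an explicit `limit` inside `search_params` is passed through unchanged.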

How did you test it?

All 16 unit tests pass (hatch run test:unit) — covering initialization, serialization round-trip, search execution with mocked clients, runtime parameter overrides, top_k truncation, async execution, error handling, warm-up behavior, and edge cases (empty/null web results)
All existing FirecrawlCrawler tests continue to pass (14/14)
Lint and format checks pass (hatch run fmt-check)
Integration tests included (require FIRECRAWL_API_KEY environment variable): test_run_integration and test_run_async_integration

Notes for the reviewer

The new component lives under components/websearch/firecrawl/ (separate from the existing components/fetchers/firecrawl/) to follow the same namespace convention as Haystack core's SearchApiWebSearch and SerperDevWebSearch
The _parse_search_response method handles two types of results from the Firecrawl SDK: basic SearchResultWeb objects (when no scrape options are set) and full Document objects with markdown content (when scrapeOptions are provided). This is handled via hasattr checks rather than importing SDK-internal types.
No new dependencies were added — firecrawl-py>=4.0.0 (already a dependency) includes the search() method
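
To make the hasattr-based parsing easier to review, here is a rough sketch of the idea using plain stand-in objects instead of SDK types. The attribute names (`markdown`, `metadata`, `description`, `url`, `title`) are assumptions based on the description above, and `parse_result` is a hypothetical name; the actual `_parse_search_response` may differ.

```python
from types import SimpleNamespace
from typing import Any


def parse_result(item: Any) -> dict[str, Any]:
    """Turn one search result into content plus metadata via attribute detection."""
    if hasattr(item, "markdown") and item.markdown:
        # Scraped result: full page content is available.
        metadata = getattr(item, "metadata", None)
        return {
            "content": item.markdown,
            "url": getattr(metadata, "url", None),
            "title": getattr(metadata, "title", None),
        }
    # Search-only result: fall back to the description snippet.
    return {
        "content": getattr(item, "description", "") or "",
        "url": getattr(item, "url", None),
        "title": getattr(item, "title", None),
    }


# Stand-ins for the two result shapes (not real SDK objects).
web_hit = SimpleNamespace(
    title="Haystack", description="An LLM framework", url="https://example.com/a"
)
scraped = SimpleNamespace(
    markdown="# Haystack\nFull page text",
    metadata=SimpleNamespace(url="https://example.com/b", title="Haystack docs"),
)
```

Because the function only inspects attributes, it keeps working if the SDK renames or moves its result classes, as long as the attributes themselves stay stable.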

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.

Add a new FirecrawlWebSearch component that enables web search queries using the Firecrawl Search API. The component follows the standard Haystack WebSearch interface (query input, documents + links output) and supports both synchronous and asynchronous execution.
Closes deepset-ai#2870

Add missing py.typed marker file to the websearch package so mypy can type check the module.

anakin87 · 2026-02-27T15:09:00Z

@bogdankostic could you help review this PR?

Remember that integration tests do not run on PRs from forks, so we need to verify locally that they are correct.

bogdankostic

Thanks for the PR @saakshigupta2002! I left a few comments on how to further improve it :)

bogdankostic · 2026-03-02T13:11:31Z