feat: add Tavily as parallel search provider option in backend #208
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Code Review
This pull request integrates Tavily as an alternative search provider for web research, adding the necessary environment variables, dependencies, and configuration options. The core logic in the agent's graph has been refactored to support multiple search providers. Feedback focuses on improving the flexibility and performance of the Tavily integration by making search parameters configurable and caching the client instance.
| search_provider: str = Field( | ||
| default="google", | ||
| metadata={ | ||
| "description": "The search provider to use for web research. Options: 'google' (default) or 'tavily'." | ||
| }, | ||
| ) |
To make the Tavily integration more flexible, consider making max_results and search_depth configurable instead of hardcoding them in graph.py. This allows adjusting the search behavior without changing the code.
| search_provider: str = Field( | |
| default="google", | |
| metadata={ | |
| "description": "The search provider to use for web research. Options: 'google' (default) or 'tavily'." | |
| }, | |
| ) | |
| tavily_max_results: int = Field( | |
| default=5, | |
| metadata={"description": "The maximum number of results to return from Tavily search."}, | |
| ) | |
| tavily_search_depth: str = Field( | |
| default="advanced", | |
| metadata={"description": "The depth of search for Tavily. Options: 'basic' or 'advanced'."}, | |
| ) |
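Once these values are configurable, it may help to validate them when the configuration is built, so a typo fails early instead of during a search. The following is a minimal sketch under stated assumptions: it uses a plain dataclass as a stand-in for the PR's Configuration model so that it runs without extra dependencies, the field names follow the suggestion above, and the SearchSettings name and the validation rules are hypothetical.

```python
from dataclasses import dataclass

# Allowed values, taken from the field descriptions in the suggestion above.
ALLOWED_PROVIDERS = {"google", "tavily"}
ALLOWED_DEPTHS = {"basic", "advanced"}


@dataclass(frozen=True)
class SearchSettings:
    """Hypothetical stand-in for the search-related Configuration fields."""

    search_provider: str = "google"
    tavily_max_results: int = 5
    tavily_search_depth: str = "advanced"

    def __post_init__(self) -> None:
        # Fail fast on typos instead of silently picking a provider later.
        if self.search_provider not in ALLOWED_PROVIDERS:
            raise ValueError(f"unknown search_provider: {self.search_provider!r}")
        if self.tavily_search_depth not in ALLOWED_DEPTHS:
            raise ValueError(
                f"unknown tavily_search_depth: {self.tavily_search_depth!r}"
            )
        if self.tavily_max_results < 1:
            raise ValueError("tavily_max_results must be at least 1")


# Defaults mirror the suggested field defaults.
defaults = SearchSettings()
```

With the actual Pydantic model, a similar effect could likely be achieved through the model's own validation features rather than a hand-written check.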
| tavily_client = TavilyClient() | ||
| response = tavily_client.search( | ||
| query=state["search_query"], | ||
| max_results=5, | ||
| search_depth="advanced", | ||
| ) |
There are a couple of improvements that can be made here:
- Client Caching: To improve performance, the TavilyClient should be instantiated only once. Creating a new client on every call is inefficient, especially since this function can be called multiple times. You can cache the client instance, for example, as a function attribute.
- Configurability: The max_results and search_depth are hardcoded. It would be more flexible to make these configurable via the Configuration model, similar to other settings.
This suggestion addresses both points and assumes you add tavily_max_results and tavily_search_depth to configuration.py.
| if not hasattr(_web_research_tavily, "client"): | |
| _web_research_tavily.client = TavilyClient() | |
| response = _web_research_tavily.client.search( | |
| query=state["search_query"], | |
| max_results=configurable.tavily_max_results, | |
| search_depth=configurable.tavily_search_depth, | |
| ) |
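As an alternative to the function-attribute cache in the suggestion, the same "construct once, reuse afterwards" idea can be sketched with functools.lru_cache. This is an illustration only: FakeSearchClient is a hypothetical stand-in for TavilyClient so the example runs without the tavily-python package, and get_search_client is an assumed helper name that does not appear in the PR.

```python
from functools import lru_cache


class FakeSearchClient:
    """Stand-in for TavilyClient so the sketch runs without the package."""

    instances = 0  # counts how many clients have been constructed

    def __init__(self) -> None:
        FakeSearchClient.instances += 1

    def search(self, query: str, max_results: int, search_depth: str) -> dict:
        # A real client would call the search API here.
        return {"query": query, "results": []}


@lru_cache(maxsize=1)
def get_search_client() -> FakeSearchClient:
    """Build the client on first use and return the same one afterwards."""
    return FakeSearchClient()


first = get_search_client()
second = get_search_client()
```

Compared with a function attribute, this keeps the caching logic out of the research function itself, and get_search_client.cache_clear() can reset the cached client in tests.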
Summary
- Added tavily-python dependency to backend/pyproject.toml
- Added TAVILY_API_KEY to backend/.env.example (optional, only needed when search_provider=tavily)
- Added search_provider field (default 'google') to Configuration model in configuration.py
- Refactored web_research node in graph.py: existing Google grounding path preserved as default, new Tavily path added that maps Tavily results into the same sources_gathered/web_research_result state shape
Files changed
- backend/pyproject.toml: added tavily-python dependency
- backend/.env.example: documented TAVILY_API_KEY env var
- backend/src/agent/configuration.py: added search_provider config field
- backend/src/agent/graph.py: imported TavilyClient, refactored web_research into provider-specific helpers
Dependency changes
- Added tavily-python to backend/pyproject.toml dependencies
Environment variable changes
- Added TAVILY_API_KEY to backend/.env.example (required only when search_provider=tavily)
Notes for reviewers
- Tavily search uses search_depth="advanced" and max_results=5 for high-quality results
- Tavily results are mapped into the {label, short_url, value} source schema used by the Google path
- short_url placeholders for Tavily use the https://tavily.com/id/ prefix to stay consistent with the URL replacement logic in finalize_answer
🤖 Generated with Claude Code
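The result mapping described in the notes above could look roughly like the sketch below. It is not the PR's code: the function name map_tavily_results, the query_id parameter, the ID format after the https://tavily.com/id/ prefix, and the way snippets are joined into one text are all assumptions. The input shape (a results list whose items carry title, url, and content) is what the sketch assumes a Tavily search response contains.

```python
from __future__ import annotations


def map_tavily_results(response: dict, query_id: int) -> tuple[list[dict], str]:
    """Convert a Tavily-style response into sources plus one text blob."""
    sources = []
    snippets = []
    for idx, result in enumerate(response.get("results", [])):
        title = result.get("title", "")
        # Placeholder URL; the exact ID format used in the PR is not shown.
        short_url = f"https://tavily.com/id/{query_id}-{idx}"
        sources.append(
            {
                "label": title,
                "short_url": short_url,
                "value": result["url"],  # the real URL behind the placeholder
            }
        )
        # Cite each snippet with its placeholder so it can be swapped later.
        snippets.append(f"{result.get('content', '')} [{title}]({short_url})")
    return sources, "\n\n".join(snippets)


sample = {
    "results": [
        {"title": "Example", "url": "https://example.com/a", "content": "Some text."}
    ]
}
sources, text = map_tavily_results(sample, query_id=0)
```

Keeping the real URL in value and the placeholder in short_url is what lets a later step, such as the replacement logic mentioned for finalize_answer, substitute the original links back into the final answer.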
Automated Review
A search_provider config field is added with a safe default of 'google', and the web_research node dispatches to a new _web_research_tavily helper that maps Tavily results into the existing sources_gathered/web_research_result state shape. All existing Google Search logic is fully preserved and no regressions were found. A few minor issues exist but none block approval.
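The dispatch described in this review can be pictured with a small lookup table. The sketch below is a simplified illustration, not the PR's implementation: the helper bodies are placeholders, the signatures take a plain query string rather than the graph state, and raising on an unknown provider is an assumed policy.

```python
from typing import Callable, Dict


def _research_google(query: str) -> str:
    # Placeholder for the existing Google grounding path.
    return f"google:{query}"


def _research_tavily(query: str) -> str:
    # Placeholder for the new Tavily path.
    return f"tavily:{query}"


# One entry per supported provider; adding a provider means adding a row.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "google": _research_google,
    "tavily": _research_tavily,
}


def web_research(query: str, search_provider: str = "google") -> str:
    """Route the query to the helper for the configured provider."""
    try:
        helper = PROVIDERS[search_provider]
    except KeyError:
        raise ValueError(
            f"unsupported search_provider: {search_provider!r}"
        ) from None
    return helper(query)
```

Because 'google' is the default argument, callers that never set search_provider keep the existing behavior.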