fix(security): Add SSRF protection to LinkContentFetcher #10514
Mr-Neutr0n wants to merge 1 commit into deepset-ai:main
Conversation
This commit addresses a Server-Side Request Forgery (SSRF) vulnerability in the LinkContentFetcher component by implementing URL validation that blocks requests to internal/private network resources.

Changes:
- Add `is_safe_url()` function in `url_validation.py` to detect unsafe URLs
- Add `_is_private_ip()` helper to identify private/internal IP addresses
- Add `block_internal_urls` parameter to LinkContentFetcher (default=True)
- Block requests to:
  - Private IP ranges (10.x, 172.16-31.x, 192.168.x)
  - Loopback addresses (127.0.0.1, localhost, ::1)
  - Link-local addresses (169.254.x.x) including cloud metadata endpoints
  - Reserved and multicast addresses
- Perform DNS resolution to prevent DNS rebinding attacks
- Add comprehensive test coverage for SSRF protection

Security Impact:
- Prevents attackers from using Haystack pipelines to access internal network resources, cloud metadata services, or localhost services
- Protection is enabled by default but can be disabled via `block_internal_urls=False` for trusted internal use cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Mr-Neutr0n is attempting to deploy a commit to the deepset Team on Vercel. A member of the Team first needs to authorize it.
Friendly follow-up: is there anything I can improve in this PR? Happy to address any feedback. Thanks!
@Mr-Neutr0n Thank you for suggesting URL validation for the LinkContentFetcher component and for opening this pull request.
Makes sense, thanks for taking the time to discuss it internally and for getting back to me. I can see how defaulting to blocking internal URLs would be a breaking change for folks with valid internal network use cases. Documenting the risks and recommending input validation at the application layer sounds like a reasonable approach. Closing this one out. Cheers!
Related Issue
Fixes #10513
Summary
This PR adds Server-Side Request Forgery (SSRF) protection to the `LinkContentFetcher` component by implementing URL validation that blocks requests to internal/private network resources.
Changes
New Features
- `is_safe_url()` function in `haystack/utils/url_validation.py` to detect unsafe URLs
- `_is_private_ip()` helper function to identify private/internal IP addresses
- `block_internal_urls` parameter to `LinkContentFetcher` (default: `True`)
Blocked URL Types
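The protection blocks requests to:
- Private IP ranges: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
- Loopback and unspecified addresses: `127.0.0.1`, `localhost`, `::1`, `0.0.0.0`
- Link-local addresses: `169.254.0.0/16` (includes cloud metadata endpoints)
DNS Rebinding Protection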
The implementation also performs DNS resolution to prevent DNS rebinding attacks where a hostname initially resolves to a safe IP but later resolves to an internal IP.
Usage
Test Plan
- `_is_private_ip()` function
- `is_safe_url()` function
- `LinkContentFetcher` SSRF protection (sync)
- `LinkContentFetcher` SSRF protection (async)
Breaking Changes
This change is API-compatible, but because the protection is enabled by default it may break existing code that relies on fetching content from internal URLs. Users who need to access internal resources can set `block_internal_urls=False`.
Security Impact
🤖 Generated with Claude Code