fix(security): Add SSRF protection to LinkContentFetcher by Mr-Neutr0n · Pull Request #10514 · deepset-ai/haystack

Mr-Neutr0n · 2026-02-05T10:33:39Z

Related Issue

Summary

This PR adds Server-Side Request Forgery (SSRF) protection to the LinkContentFetcher component by implementing URL validation that blocks requests to internal/private network resources.

Changes

New Features

Added is_safe_url() function in haystack/utils/url_validation.py to detect unsafe URLs
Added _is_private_ip() helper function to identify private/internal IP addresses
Added block_internal_urls parameter to LinkContentFetcher (default: True)

Blocked URL Types

The protection blocks requests to:

Private IP ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
Loopback and unspecified addresses: 127.0.0.1, localhost, ::1, 0.0.0.0
Link-local addresses: 169.254.0.0/16 (includes cloud metadata endpoints)
Reserved and multicast addresses
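
The categories above map closely onto the properties exposed by Python's standard ipaddress module. The following is a minimal sketch of such a check, not the code from this PR: the name is_private_ip_sketch is hypothetical, and treating unparseable input as blocked is an assumption made here for illustration.

```python
import ipaddress


def is_private_ip_sketch(ip: str) -> bool:
    """Return True if `ip` falls into one of the blocked categories listed above.

    Hypothetical helper for illustration; the PR's _is_private_ip() may differ.
    """
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Assumption: anything that is not a valid IP literal is treated as blocked.
        return True

    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:10.0.0.1 so the
    # IPv4 checks below still apply to them.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped

    return (
        addr.is_private        # 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, ...
        or addr.is_loopback    # 127.0.0.0/8, ::1
        or addr.is_link_local  # 169.254.0.0/16, fe80::/10
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified  # 0.0.0.0, ::
    )
```

For example, is_private_ip_sketch("169.254.169.254") is True because the address is link-local, while a public address such as 8.8.8.8 is not blocked.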

DNS Rebinding Protection

The implementation also resolves each hostname and checks the resulting IP addresses. This is intended to guard against DNS rebinding attacks, where a hostname initially resolves to a safe IP but later resolves to an internal IP.
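
A resolve-then-check step of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's is_safe_url(): the function names and the injectable resolver parameter are hypothetical, and the resolver is injectable only so the logic can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _is_blocked_address(addr_text: str) -> bool:
    """Return True for addresses in private, loopback, link-local or similar ranges."""
    try:
        # Drop an IPv6 zone suffix such as "%eth0" before parsing.
        addr = ipaddress.ip_address(addr_text.split("%")[0])
    except ValueError:
        return True  # Assumption: unparseable results are treated as blocked.
    return (
        addr.is_private or addr.is_loopback or addr.is_link_local
        or addr.is_reserved or addr.is_multicast or addr.is_unspecified
    )


def resolves_to_public_only(url: str, resolver=socket.getaddrinfo) -> bool:
    """Return True only if every address the URL's host resolves to is public.

    Hypothetical sketch; `resolver` defaults to socket.getaddrinfo.
    """
    try:
        parsed = urlparse(url)
    except ValueError:
        return False  # Malformed URL, for example an unclosed IPv6 bracket.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = resolver(parsed.hostname, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # The host does not resolve at all.
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address as a string.
    addresses = [info[4][0] for info in infos]
    return bool(addresses) and not any(_is_blocked_address(a) for a in addresses)
```

One caveat of this sketch: the check happens at validation time, and an HTTP client that performs its own lookup afterwards may receive a different answer. A check like this therefore narrows the window for rebinding but does not pin the address that is eventually contacted.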

Usage

from haystack.components.fetchers import LinkContentFetcher

# SSRF protection enabled by default
fetcher = LinkContentFetcher()
# This will be blocked:
# fetcher.run(urls=["http://169.254.169.254/latest/meta-data/"])

# Opt-out for trusted internal use cases
fetcher = LinkContentFetcher(block_internal_urls=False)
# Now internal URLs are allowed

Test Plan

Added unit tests for _is_private_ip() function
Added unit tests for is_safe_url() function
Added unit tests for LinkContentFetcher SSRF protection (sync)
Added unit tests for LinkContentFetcher SSRF protection (async)
Tests cover all private IP ranges, localhost variants, and cloud metadata endpoints
Tests verify the opt-out mechanism works correctly

Breaking Changes

The API is backwards compatible because the new parameter is optional, but the default behavior changes: existing code that relies on fetching content from internal URLs will be blocked. Users who need to access internal resources can set block_internal_urls=False.

Security Impact

Prevents attackers from using Haystack pipelines to access internal network resources
Blocks access to cloud metadata services (credential theft prevention)
Prevents access to localhost services
Protection is enabled by default but can be disabled for trusted use cases

🤖 Generated with Claude Code

This commit addresses a Server-Side Request Forgery (SSRF) vulnerability in the LinkContentFetcher component by implementing URL validation that blocks requests to internal/private network resources.

Changes:
- Add `is_safe_url()` function in `url_validation.py` to detect unsafe URLs
- Add `_is_private_ip()` helper to identify private/internal IP addresses
- Add `block_internal_urls` parameter to LinkContentFetcher (default=True)
- Block requests to:
  - Private IP ranges (10.x, 172.16-31.x, 192.168.x)
  - Loopback addresses (127.0.0.1, localhost, ::1)
  - Link-local addresses (169.254.x.x) including cloud metadata endpoints
  - Reserved and multicast addresses
- Perform DNS resolution to prevent DNS rebinding attacks
- Add comprehensive test coverage for SSRF protection

Security Impact:
- Prevents attackers from using Haystack pipelines to access internal network resources, cloud metadata services, or localhost services
- Protection is enabled by default but can be disabled via `block_internal_urls=False` for trusted internal use cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

vercel · 2026-02-05T10:33:46Z

@Mr-Neutr0n is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2026-02-05T10:33:48Z

All committers have signed the CLA.

Mr-Neutr0n · 2026-02-08T11:46:03Z

Friendly follow-up - is there anything I can improve in this PR? Happy to address any feedback. Thanks!

julian-risch · 2026-02-10T16:24:28Z

@Mr-Neutr0n Thank you for suggesting URL validation for the LinkContentFetcher component and for opening this pull request.
After internal discussions, we decided to keep the behavior of the LinkContentFetcher as is for now. We plan to extend the documentation to better explain the risks of passing user inputs to this component and how application developers can validate inputs prior to forwarding the inputs to the LinkContentFetcher. Haystack expects the application to handle any input validation/sanitization and detect any user-defined inputs with malicious intent before sending inputs to the framework.
As there are valid use cases where the LinkContentFetcher needs access to internal IP addresses, the suggested changes with the default setting of block_internal_urls=True would be a breaking change.
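
Application-side validation of the kind described above could be sketched as a host allowlist applied before any URL reaches the fetcher. This is only an illustration: the function filter_allowed_urls and the hosts in ALLOWED_HOSTS are hypothetical and are not part of Haystack.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; an application would fill in hosts it actually trusts.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def filter_allowed_urls(urls, allowed_hosts=ALLOWED_HOSTS):
    """Keep only http(s) URLs whose host is explicitly allowlisted."""
    accepted = []
    for url in urls:
        try:
            parsed = urlparse(url)
        except ValueError:
            continue  # Malformed URL: drop it rather than guess.
        host = (parsed.hostname or "").lower()
        if parsed.scheme in ("http", "https") and host in allowed_hosts:
            accepted.append(url)
    return accepted
```

The filtered list could then be passed on, for example as fetcher.run(urls=filter_allowed_urls(user_urls)), where user_urls stands for whatever untrusted input the application receives. An allowlist is stricter than the blocklist proposed in the PR, and it fits the division of responsibility the maintainers describe.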

Mr-Neutr0n · 2026-02-10T17:04:32Z

Makes sense, thanks for taking the time to discuss it internally and getting back to me. I can see how defaulting to blocking internal URLs would be a breaking change for folks with valid internal network use cases.

Documenting the risks and recommending input validation at the application layer sounds like a reasonable approach. Closing this one out — cheers!

Mr-Neutr0n requested a review from a team as a code owner February 5, 2026 10:33

Mr-Neutr0n requested review from bogdankostic and removed request for a team February 5, 2026 10:33

github-actions bot added the topic:tests and type:documentation labels Feb 5, 2026

sjrl mentioned this pull request Feb 9, 2026

security: add SSRF protection to LinkContentFetcher with URL validation #10527

Closed

julian-risch requested review from julian-risch and removed request for bogdankostic February 9, 2026 09:06

julian-risch closed this Feb 10, 2026

julian-risch mentioned this pull request Feb 10, 2026

Extend documentation of LinkContentFetcher and explain risks of passing user-defined inputs #10513

Closed

Mr-Neutr0n wants to merge 1 commit into deepset-ai:main from Mr-Neutr0n:security/ssrf-protection-link-content-fetcher

3 participants
