Add browser-compatible headers to TempFileAPI URL downloads#34721
Add browser-compatible headers to TempFileAPI URL downloads#34721
Conversation
…age downloads Co-authored-by: fmontes <751424+fmontes@users.noreply.github.com>
Co-authored-by: fmontes <751424+fmontes@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates TempFileAPI to send browser-like HTTP headers when validating and downloading remote URLs, improving compatibility with CDNs/WAF-protected image hosts that reject “bare” requests.
Changes:
- Added a shared
BROWSER_HEADERSheader map (with partial configurability viaConfig). - Applied these headers to
CircuitBreakerUrlrequests increateTempFileFromUrl()andvalidUrl().
| static final Map<String, String> BROWSER_HEADERS = ImmutableMap.<String, String>builder() | ||
| .put("User-Agent", Config.getStringProperty("TEMP_FILE_URL_USER_AGENT", | ||
| "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")) | ||
| .put("Accept", Config.getStringProperty("TEMP_FILE_URL_ACCEPT", | ||
| "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8")) | ||
| .put("Accept-Language", Config.getStringProperty("TEMP_FILE_URL_ACCEPT_LANGUAGE", | ||
| "en-US,en;q=0.9")) | ||
| .put("Accept-Encoding", Config.getStringProperty("TEMP_FILE_URL_ACCEPT_ENCODING", | ||
| "gzip, deflate, br")) | ||
| .put("Connection", "keep-alive") | ||
| .put("Sec-Fetch-Dest", "image") | ||
| .put("Sec-Fetch-Mode", "no-cors") | ||
| .put("Sec-Fetch-Site", "cross-site") | ||
| .build(); |
There was a problem hiding this comment.
BROWSER_HEADERS is initialized once at class-load time, so the values read from Config.getStringProperty(...) won’t reflect later runtime config changes (system table updates / hot-reload). If these TEMP_FILE_URL_* settings are expected to be tunable without a restart, consider building the headers map on demand (e.g., a getBrowserHeaders() method) so each call re-reads Config.
| final CircuitBreakerUrl urlGetter = | ||
| CircuitBreakerUrl.builder().setMethod(Method.GET).setUrl(finalUrl) | ||
| .setHeaders(BROWSER_HEADERS) | ||
| .setTimeout(timeoutSeconds * 1000L).build(); |
There was a problem hiding this comment.
The new behavior (sending browser-like headers during URL validation/download) isn’t covered by tests here. Given there are existing integration tests for TempFile URL upload, consider adding a WireMock-based test that asserts the expected headers are present on the outbound request (and that the response body is written correctly).
| .put("Accept-Language", Config.getStringProperty("TEMP_FILE_URL_ACCEPT_LANGUAGE", | ||
| "en-US,en;q=0.9")) | ||
| .put("Accept-Encoding", Config.getStringProperty("TEMP_FILE_URL_ACCEPT_ENCODING", | ||
| "gzip, deflate, br")) |
There was a problem hiding this comment.
Accept-Encoding default includes br, but CircuitBreakerUrl uses Apache HttpClient without any Brotli decoding support (no brotli dependency/interceptor in CircuitBreakerUrl.doOut). If a server responds with Content-Encoding: br (common for SVG/text), the downloaded temp file will contain compressed bytes rather than the original content. Consider removing br from the default (e.g., gzip, deflate), omitting Accept-Encoding entirely, or adding explicit Brotli decoding support before advertising it.
| "gzip, deflate, br")) | |
| "gzip, deflate")) |
… br, add wire-level header test Co-authored-by: fmontes <751424+fmontes@users.noreply.github.com>
|
@fmontes — I had a gut feeling that this implementation could be problematic in the way it tries to spoof a browser identity, so I ran it by Claude with the full PR context. Here's what came back: Concerns with the Browser-Spoofing Approach1. Deceptive User-Agent — Legal/ToS Risk (Most Serious) The hardcoded Chrome UA string ( 2. The
These headers add no practical benefit and increase detection risk with sophisticated WAFs. 3. Image-Only
4. Non-Configurable Headers Can't Be Disabled
Suggested Minimal Fix
Drop |
|
@spbolton all this is cool, but it's missing the important part... what options do I have? 😂 |
…configurable and disableable Co-authored-by: fmontes <751424+fmontes@users.noreply.github.com>
Many CDNs and Cloudflare-protected hosts reject bare server-side HTTP requests with 403/406.
TempFileAPI.createTempFileFromUrl()andvalidUrl()made such bare requests with no headers.Proposed Changes
getBrowserHeaders()method — builds header map on each call soConfigoverrides are effective at runtime without restartConfig; setting a key to blank disables that header (operators can suppress any header without code change):TEMP_FILE_URL_USER_AGENT→ Chrome-compatible User-AgentTEMP_FILE_URL_ACCEPT→*/*(generic — endpoint downloads any file type, not just images)TEMP_FILE_URL_ACCEPT_LANGUAGE→en-US,en;q=0.9TEMP_FILE_URL_ACCEPT_ENCODING→gzip, deflate(nobr— Apache HttpClient has no Brotli decoder)TEMP_FILE_URL_CONNECTION→keep-aliveSec-Fetch-Dest/Mode/Siteintentionally omitted — these are browser-generated Fetch Metadata headers for server→browser trust; sending them from a server-side client is misleading and can cause rejectionscreateTempFileFromUrl()andvalidUrl()via.setHeaders(getBrowserHeaders())Checklist
Additional Info
testBrowserHeaders_containsAllRequiredKeys— verifies all 5 headers presenttestBrowserHeaders_defaultValues— verifiesAcceptis*/*,Connectioniskeep-alive,Accept-Encodinghas nobr, andSec-Fetch-*headers are absenttestValidUrl_sendsBrowserHeaders—MockHttpServerwire test confirming every header ingetBrowserHeaders()actually arrives at the remote serverScreenshots
Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.