-
Notifications
You must be signed in to change notification settings - Fork 40
fix: always decompress gzip when GzipDecoder is explicitly selected #909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2642,7 +2642,7 @@ def create_gzip_decoder( | |
| return CompositeRawDecoder.by_headers( | ||
| [({"Content-Encoding", "Content-Type"}, _compressed_response_types, gzip_parser)], | ||
| stream_response=True, | ||
| fallback_parser=gzip_parser.inner_parser, | ||
| fallback_parser=gzip_parser, | ||
| ) | ||
|
Comment on lines
2642
to
2646
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: # Find GzipParser class and its parse method
rg -n 'class GzipParser' -A 30 --type pyRepository: airbytehq/airbyte-python-cdk Length of output: 3227 🏁 Script executed: # Find CompositeRawDecoder class and by_headers method
rg -n 'class CompositeRawDecoder' -A 50 --type pyRepository: airbytehq/airbyte-python-cdk Length of output: 5560 🏁 Script executed: # Check the by_headers implementation more thoroughly
rg -n 'def by_headers' -B 5 -A 40 --type pyRepository: airbytehq/airbyte-python-cdk Length of output: 5320 🏁 Script executed: cat -n airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py | sed -n '34,60p'Repository: airbytehq/airbyte-python-cdk Length of output: 1183 Fix looks correct — but found a pre-existing issue in The change properly ensures that when One thing flagged during verification:
This is pre-existing, but your change makes it more likely to surface: with Could you either:
Pagination support is fine: 🤖 Prompt for AI Agents
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch on the docstring-implementation mismatch in However, I'm keeping this PR scoped to the minimal fix (changing the fallback parser) rather than expanding into
I'd suggest either fixing the |
||
|
|
||
| @staticmethod | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 Fallback parser change causes BadGzipFile crash on non-gzipped responses
When a connector configures
GzipDecoderand an API returns a non-gzipped response without matching compression headers (e.g., error responses, certain endpoints, or APIs that conditionally compress), the fallback parser now attempts gzip decompression, which raisesgzip.BadGzipFile.Root Cause
The fallback parser was changed from
gzip_parser.inner_parser(the raw JSON/CSV/etc. parser) togzip_parser(which wraps data ingzip.GzipFile). The_select_parsermethod atairbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py:201-221returns the fallback parser when no response headers match the compressed types set.With the old code, non-gzip responses falling back would be parsed directly (e.g., as JSON). With the new code, they are passed through
GzipParser.parse()at line 45:This raises
gzip.BadGzipFile: Not a gzipped filebecauseGzipParserhas no error handling for non-gzipped data (despite its docstring claiming it handles this case — see lines 38-39 ofcomposite_raw_decoder.py).Impact: Any connector using
GzipDecoderthat receives even a single non-gzipped response without matchingContent-Encoding/Content-Typeheaders will crash. This includes error responses from APIs, which typically aren't compressed. The previous behavior gracefully handled mixed gzip/non-gzip responses.Prompt for agents
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern was raised by CodeRabbit and addressed in my reply there. TL;DR:
GzipParseris correct and will decompress successfully.GzipParserdocstring/implementation mismatch (noBadGzipFilehandling) is a pre-existing issue that would be better addressed in a follow-up to avoid expanding the scope of this minimal fix.GzipParsergraceful-fallback fix here or keep it separate.Devin session