fix(decoder): auto-detect gzip magic bytes for responses without Content-Encoding header by Airbyte-Support · Pull Request #914 · airbytehq/airbyte-python-cdk

Airbyte Support (Airbyte-Support) · 2026-02-20T05:52:16Z

fix(decoder): auto-detect gzip magic bytes for responses without Content-Encoding header

Summary

Some APIs (notably Apple App Store Connect /v1/salesReports) return gzip-compressed response bodies without setting the Content-Encoding: gzip header. The existing GzipParser unconditionally assumed gzip input, while the create_gzip_decoder factory fell back to the inner parser (e.g. CsvParser) whenever the headers didn't match. As a result, gzip data sent without the header was never decompressed, and parsing failed with 'utf-8' codec can't decode byte 0x8b errors.
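A minimal reproduction of this failure mode, using only the standard library (the TSV payload is made up for illustration). gzip output always begins with the magic bytes, and decoding those bytes directly as UTF-8 raises the same error reported above.

```python
import gzip

# A small TSV payload, compressed the way a server might send it
# without advertising Content-Encoding: gzip.
body = gzip.compress(b"Provider\tUnits\nAPPLE\t3\n")

# Every gzip member starts with the two magic bytes 0x1f 0x8b (RFC 1952).
print(body[:2])  # b'\x1f\x8b'

# Treating the raw bytes as UTF-8 text fails on the second magic byte.
try:
    body.decode("utf-8")
    err = None
except UnicodeDecodeError as exc:
    err = exc
    print(exc)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```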

Changes:

GzipParser.parse(): now reads the first 2 bytes and checks them against the gzip magic bytes (\x1f\x8b). If they match, the data is decompressed; otherwise it is passed through to the inner parser unchanged.
create_gzip_decoder(): now uses gzip_parser (with auto-detection) instead of gzip_parser.inner_parser, both as the default parser in builder mode and as the fallback in production mode.
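A sketch of the detection logic described above, assuming a parser interface that takes a binary file-like object and yields dict records. TsvParser and AutoDetectGzipParser are hypothetical stand-ins, not the CDK's CsvParser or GzipParser. With a parser like this in place, the create_gzip_decoder() change amounts to routing both the header-match path and the no-header fallback through the same auto-detecting parser.

```python
import csv
import gzip
import io
from dataclasses import dataclass
from typing import IO, Dict, Iterator

GZIP_MAGIC = b"\x1f\x8b"


@dataclass
class TsvParser:
    """Hypothetical stand-in for an inner parser such as CsvParser."""

    delimiter: str = "\t"

    def parse(self, data: IO[bytes]) -> Iterator[Dict[str, str]]:
        text = io.TextIOWrapper(data, encoding="utf-8", newline="")
        yield from csv.DictReader(text, delimiter=self.delimiter)


@dataclass
class AutoDetectGzipParser:
    """Sketch of the magic-byte check: decompress only when the body starts with 1f 8b."""

    inner_parser: TsvParser

    def parse(self, data: IO[bytes]) -> Iterator[Dict[str, str]]:
        # Buffer the whole body so the first two bytes can be inspected
        # (the memory trade-off raised in the review checklist).
        raw = data.read()
        buffered = io.BytesIO(raw)
        if raw[:2] == GZIP_MAGIC:
            with gzip.GzipFile(fileobj=buffered, mode="rb") as gz:
                yield from self.inner_parser.parse(gz)
        else:
            # Not gzip: hand the bytes to the inner parser unchanged.
            yield from self.inner_parser.parse(buffered)


parser = AutoDetectGzipParser(inner_parser=TsvParser())
payload = b"Provider\tUnits\nAPPLE\t3\n"
gz_rows = list(parser.parse(io.BytesIO(gzip.compress(payload))))
plain_rows = list(parser.parse(io.BytesIO(payload)))
print(gz_rows == plain_rows == [{"Provider": "APPLE", "Units": "3"}])  # True
```

Compressed and uncompressed bodies produce identical records, and an empty body yields no records rather than an error.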

Updates since last revision

Added a parametrized test (test_gzip_parser_auto_detection) with 6 cases covering: gzip CSV/JSONL without Content-Encoding header, non-gzip passthrough, by_headers fallback path, non-streamed mode, and empty data.
All 39 decoder tests pass locally (33 existing + 6 new).
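The shape of those test cases can be sketched as a plain table-driven check. maybe_gunzip is a hypothetical helper standing in for the parser's detection logic; the by_headers fallback and non-streamed cases depend on CDK internals and are not reproduced here. In the actual suite, a table like this would feed pytest.mark.parametrize.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(raw: bytes) -> bytes:
    """Hypothetical helper: decompress only when the gzip magic bytes are present."""
    return gzip.decompress(raw) if raw[:2] == GZIP_MAGIC else raw


CSV = b"id,name\n1,a\n"
JSONL = b'{"id": 1}\n{"id": 2}\n'

# (case id, response body, expected decoded bytes)
CASES = [
    ("gzip_csv_no_header", gzip.compress(CSV), CSV),
    ("gzip_jsonl_no_header", gzip.compress(JSONL), JSONL),
    ("plain_passthrough", CSV, CSV),
    ("empty_body", b"", b""),
]

for case_id, body, expected in CASES:
    assert maybe_gunzip(body) == expected, case_id

# JSONL still parses line by line after decompression.
records = [json.loads(line) for line in maybe_gunzip(gzip.compress(JSONL)).splitlines()]
print(records)  # [{'id': 1}, {'id': 2}]
```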

Review & Testing Checklist for Human

Memory regression for large streaming responses: GzipParser.parse() now calls data.read() to buffer the entire response into a BytesIO, whereas the old code streamed through gzip.GzipFile(fileobj=data) directly. For very large responses in production mode (stream_response=True), peak memory now grows with the full response size. Consider whether a streaming-friendly approach (e.g., a prefixed stream wrapper) is needed.
Double decompression edge case in builder mode: when Content-Encoding: gzip IS present and stream_response=False, response.content has already been decompressed by the requests library, so GzipParser receives decompressed bytes. The magic-byte check should identify these as non-gzip and pass them through. However, if the decompressed content happens to start with the bytes \x1f\x8b, it would be incorrectly treated as gzip and decompressed again. Valid UTF-8 text (such as CSV or JSONL) cannot start with this sequence, because 0x8b is a UTF-8 continuation byte, but other encodings or binary payloads could. Assess whether this is a realistic risk for your API consumers.
Recommended manual test plan: Build a connector against the Apple App Store Connect /v1/salesReports endpoint (or mock a server that returns gzip bytes without Content-Encoding) and confirm the response is correctly decompressed and parsed as TSV/CSV.
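For the first two checklist items, one possible hardening is sketched below, assuming the parser receives a file-like object supporting read(n). PrefixedStream, read_exactly and open_maybe_gzip are hypothetical names, not CDK APIs, and whether the CDK's inner parsers accept this wrapper unchanged is an assumption. The sketch peeks at only the first three header bytes, replays them in front of the rest of the stream so memory stays bounded by the buffer size, and additionally checks the compression-method byte (CM = 8, deflate) from RFC 1952. That extra byte is a variant of the PR's two-byte check that slightly narrows the false-positive window.

```python
import gzip
import io
from typing import IO

# RFC 1952 header: ID1=0x1f, ID2=0x8b, CM=8 (deflate, the only method the RFC defines).
GZIP_HEADER_PREFIX = b"\x1f\x8b\x08"


class PrefixedStream(io.RawIOBase):
    """Hypothetical wrapper: replays already-read prefix bytes, then reads from the source."""

    def __init__(self, prefix: bytes, source: IO[bytes]) -> None:
        self._prefix = prefix
        self._source = source

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        if self._prefix:
            n = min(len(b), len(self._prefix))
            b[:n] = self._prefix[:n]
            self._prefix = self._prefix[n:]
            return n
        chunk = self._source.read(len(b)) or b""
        b[: len(chunk)] = chunk
        return len(chunk)


def read_exactly(data: IO[bytes], size: int) -> bytes:
    """Read up to `size` bytes, tolerating short reads; stops early only at EOF."""
    head = b""
    while len(head) < size:
        chunk = data.read(size - len(head))
        if not chunk:
            break
        head += chunk
    return head


def open_maybe_gzip(data: IO[bytes]) -> IO[bytes]:
    """Peek at the header without buffering the whole body, then stream onward."""
    head = read_exactly(data, len(GZIP_HEADER_PREFIX))
    stream = io.BufferedReader(PrefixedStream(head, data))
    if head == GZIP_HEADER_PREFIX:
        return gzip.GzipFile(fileobj=stream, mode="rb")
    return stream


body = b"id,name\n1,a\n" * 1000
print(open_maybe_gzip(io.BytesIO(gzip.compress(body))).read() == body)  # True
print(open_maybe_gzip(io.BytesIO(body)).read() == body)  # True

# Decompressed UTF-8 text cannot begin with 0x1f 0x8b (0x8b is a continuation
# byte), so the false-positive risk is limited to non-UTF-8 or binary payloads.
try:
    b"\x1f\x8b".decode("utf-8")
    utf8_prefix_ok = True
except UnicodeDecodeError:
    utf8_prefix_ok = False
print(utf8_prefix_ok)  # False
```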
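For the manual test plan, a local mock server can stand in for App Store Connect. This sketch uses only the standard library; the handler name, port and /v1/salesReports path are illustrative. It only confirms that the bytes arrive compressed, without a Content-Encoding header, and decode to TSV once gunzipped. In a real check, the connector's base URL would point at this server instead of decompressing by hand.

```python
import csv
import gzip
import io
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

TSV = b"Provider\tUnits\nAPPLE\t3\n"


class GzipWithoutHeaderHandler(BaseHTTPRequestHandler):
    """Mock endpoint: gzip body, deliberately without a Content-Encoding header."""

    def do_GET(self) -> None:
        body = gzip.compress(TSV)
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the output quiet


server = HTTPServer(("127.0.0.1", 0), GzipWithoutHeaderHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = f"http://127.0.0.1:{server.server_port}/v1/salesReports"
    # An empty ProxyHandler sends the request straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        content_encoding = resp.headers.get("Content-Encoding")
        raw = resp.read()
finally:
    server.shutdown()
    server.server_close()

# urllib does not decompress, so the magic bytes arrive intact.
print(content_encoding, raw[:2])  # None b'\x1f\x8b'
rows = list(csv.DictReader(io.StringIO(gzip.decompress(raw).decode("utf-8")), delimiter="\t"))
print(rows)  # [{'Provider': 'APPLE', 'Units': '3'}]
```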

Notes

Requested by: Airbyte Support (@Airbyte-Support)
Link to Devin run: https://app.devin.ai/sessions/9d951e79f7ca4da29fcd00d6c3c5e39e
This is a draft PR — awaiting human review before marking ready.


devin-ai-integration · 2026-02-20T05:52:21Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


github-actions · 2026-02-20T05:52:30Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1771566625-gzip-auto-detect-magic-bytes#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1771566625-gzip-auto-detect-magic-bytes

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poetry-lock - Updates poetry.lock file
/test - Runs connector tests with the updated CDK
/prerelease - Triggers a prerelease publish with default arguments
/poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
/poe <command> - Runs any poe command in the CDK environment


Helpful Resources

CDK API Reference



github-actions · 2026-02-20T06:01:09Z

PyTest Results (Fast)

3 875 tests (+6) · 3 863 passed ✅ (+6) · 12 skipped 💤 (±0) · 0 failed ❌ (±0)
1 suite (±0) · 1 file (±0) · duration 6m 15s ⏱️ (-22s)

Results for commit 670e1e3. ± Comparison against base commit cd7e369.

github-actions · 2026-02-20T06:05:53Z

PyTest Results (Full)

3 878 tests · 3 866 passed ✅ · 12 skipped 💤 · 0 failed ❌
1 suite · 1 file · duration 10m 57s ⏱️

Results for commit 670e1e3.

e622087 fix(decoder): auto-detect gzip magic bytes for responses without Content-Encoding header
Co-Authored-By: syed.khadeer@airbyte.io <cloud-support@airbyte.io>

devin-ai-integration bot assigned Airbyte Support (Airbyte-Support) Feb 20, 2026

670e1e3 test: add parametrized tests for gzip auto-detection without Content-Encoding header
Co-Authored-By: syed.khadeer@airbyte.io <cloud-support@airbyte.io>

Airbyte Support (Airbyte-Support) wants to merge 2 commits into main from devin/1771566625-gzip-auto-detect-magic-bytes

Airbyte Support (Airbyte-Support) commented Feb 20, 2026 (edited by devin-ai-integration bot)
