refactor(parquet-datasource): split opener.rs into an opener/ module #22346

Merged: adriangb merged 2 commits into apache:main from adriangb:refactor/split-parquet-opener on May 19, 2026
@adriangb
Contributor

Which issue does this PR close?

Relates to the discussion in #22024 about the Parquet datasource crate becoming hard to navigate. Split out of #22156, which bundled several code-motion moves into one PR — this is one of three smaller, independently-reviewable PRs that replace it.

Rationale for this change

opener.rs had grown to ~2,700 LOC, bundling several distinct responsibilities into one file. That makes it hard to read and hard to review changes in isolation. This PR is pure code motion: no behavior change and no public API change.

What changes are included in this PR?

Splits opener.rs into an opener/ directory module:

  • opener/early_stop.rs: EarlyStoppingStream, the dynamic-filter early-termination wrapper applied at the end of build_stream.
  • opener/encryption.rs: EncryptionContext and the ParquetMorselizer::get_encryption_context helpers, isolating the #[cfg(feature = "parquet_encryption")] gating that previously bled through the main file.

opener.rs becomes opener/mod.rs.
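
For readers unfamiliar with directory modules, the shape of the split can be sketched roughly as follows. This is only an illustration, not the code from this PR: inline `mod` blocks stand in for the separate files, and `EarlyStopIter`, `EncryptionCtx`, and `build_iter` are hypothetical stand-ins (a plain iterator instead of an async stream) for the real types.

```rust
// Sketch only: inline `mod` blocks stand in for files under opener/.
// Every name below is hypothetical and not DataFusion's actual API.
mod opener {
    // Would live in opener/early_stop.rs.
    mod early_stop {
        use std::sync::atomic::{AtomicBool, Ordering};
        use std::sync::Arc;

        /// Wrapper that ends iteration once the shared `stop` flag is set.
        pub struct EarlyStopIter<I> {
            pub inner: I,
            pub stop: Arc<AtomicBool>,
        }

        impl<I: Iterator> Iterator for EarlyStopIter<I> {
            type Item = I::Item;

            fn next(&mut self) -> Option<I::Item> {
                // Check the flag before pulling the next item.
                if self.stop.load(Ordering::Relaxed) {
                    return None;
                }
                self.inner.next()
            }
        }
    }

    // Would live in opener/encryption.rs.
    mod encryption {
        /// Placeholder for whatever encryption state the opener carries.
        #[derive(Default)]
        pub struct EncryptionCtx;
    }

    // Everything from here down corresponds to opener/mod.rs.
    use early_stop::EarlyStopIter;
    use encryption::EncryptionCtx;
    use std::sync::atomic::AtomicBool;
    use std::sync::Arc;

    /// Builds the row iterator and applies the early-stop wrapper last.
    pub fn build_iter(
        rows: Vec<i32>,
        stop: Arc<AtomicBool>,
    ) -> impl Iterator<Item = i32> {
        let _ctx = EncryptionCtx::default();
        EarlyStopIter { inner: rows.into_iter(), stop }
    }
}
```

Because the child modules are private and only `build_iter` is `pub`, callers outside `opener` see the same surface before and after such a split, which is what makes a move like this pure code motion.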

Note: #22156 originally also extracted an opener/push_decoder_stream.rs. That move is now obsolete — #22289 has since extracted PushDecoderStreamState into push_decoder.rs — so it is dropped here.

Are these changes tested?

Yes, covered by existing tests. cargo test -p datafusion-datasource-parquet --all-features (122 passing) and cargo clippy -p datafusion-datasource-parquet --all-targets --all-features -- -D warnings both pass.

Are there any user-facing changes?

No. opener was already a private module; this only reorganizes files inside the crate.

🤖 Generated with Claude Code

Pure code motion, no behavior change and no public API change
(`opener` was already a private module). Splits the ~2,700 LOC
`opener.rs` into a directory module:

- `opener/early_stop.rs` — `EarlyStoppingStream`, the dynamic-filter
  early-termination wrapper applied at the end of `build_stream`.
- `opener/encryption.rs` — `EncryptionContext` and the
  `ParquetMorselizer::get_encryption_context` helpers, isolating the
  `#[cfg(feature = "parquet_encryption")]` gating that previously bled
  through the main file.

`opener.rs` becomes `opener/mod.rs`.

Split out of apache#22156, which originally also extracted an
`opener/push_decoder_stream.rs`; that move is now obsolete since apache#22289
already extracted `PushDecoderStreamState` into `push_decoder.rs`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb
Contributor Author

@xudong963 wonder if you could review this refactor?

use std::task::{Context, Poll};

use arrow::datatypes::{SchemaRef, TimeUnit};
use datafusion_common::encryption::FileDecryptionProperties;
Member

After this refactor, with default features, this import becomes unused

Contributor Author

Good catch — both uses in this file are #[cfg(feature = "parquet_encryption")], so I've gated the import the same way in 5bb94e0. Verified both default-features and --features parquet_encryption build cleanly.
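
The general pattern behind that fix can be sketched as below. This is a minimal illustration, not the contents of 5bb94e0: `HashMap` stands in for the feature-only import, and `key_table` and `encryption_enabled` are hypothetical helpers.

```rust
// The import is only used by feature-gated code, so it carries the same
// gate; without it, a default-features build reports an unused import.
#[cfg(feature = "parquet_encryption")]
use std::collections::HashMap;

/// Hypothetical helper that only exists when the feature is enabled.
#[cfg(feature = "parquet_encryption")]
pub fn key_table() -> HashMap<String, Vec<u8>> {
    HashMap::new()
}

/// Reports at runtime whether the feature was compiled in.
pub fn encryption_enabled() -> bool {
    cfg!(feature = "parquet_encryption")
}
```

Checking both configurations, as described above, matters because each `cfg` branch is only type-checked when its feature set is active.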

Both uses in opener/mod.rs are cfg-gated on the feature; with default
features the import was unused after the module split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb adriangb enabled auto-merge May 19, 2026 03:45
@adriangb adriangb added this pull request to the merge queue May 19, 2026
Merged via the queue into apache:main with commit b4739e5 May 19, 2026
35 checks passed
@adriangb adriangb deleted the refactor/split-parquet-opener branch May 19, 2026 04:45
Labels

datasource Changes to the datasource crate
