[yaml] add support for matchall #38512
/gemini review
Code Review
This pull request adds the MatchAll transform to the Beam YAML API, allowing users to match file patterns from input PCollections. The changes include the core implementation in yaml_io.py, registration in standard_io.yaml, and new integration and unit tests. Review feedback suggests narrowing a broad try-except block to prevent masking configuration errors, converting the empty_match_treatment string to the required enum type, and handling potential null values for file timestamps to avoid TypeErrors.
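The three review points above can be sketched in plain Python. This is a minimal illustration, not the code in yaml_io.py: the `EmptyMatchTreatment` enum here is a stand-in defined locally, and the helper names `parse_empty_match_treatment` and `format_last_modified` are hypothetical.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class EmptyMatchTreatment(Enum):
    """Local stand-in enum for illustration only (not Beam's own class)."""
    ALLOW = "ALLOW"
    DISALLOW = "DISALLOW"
    ALLOW_IF_WILDCARD = "ALLOW_IF_WILDCARD"


def parse_empty_match_treatment(
        value: Optional[str]) -> Optional[EmptyMatchTreatment]:
    """Convert a YAML string into the enum, rejecting unknown names."""
    if value is None:
        return None
    try:
        # Only the lookup sits inside the try block, and only KeyError is
        # caught, so unrelated configuration errors are not masked.
        return EmptyMatchTreatment[value.strip().upper()]
    except KeyError:
        valid = ", ".join(member.name for member in EmptyMatchTreatment)
        raise ValueError(
            f"Invalid empty_match_treatment {value!r}; "
            f"expected one of: {valid}") from None


def format_last_modified(epoch_seconds: Optional[float]) -> Optional[str]:
    """Return an ISO 8601 UTC timestamp, or None when none is available."""
    if epoch_seconds is None:
        # Guard first: arithmetic or conversion on None raises TypeError.
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()
```

Keeping the `try` body to a single lookup and catching one exception type is what makes the block narrow; a bare `except Exception` around the whole configuration step would hide the same `ValueError` this sketch raises on purpose.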
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #38512 +/- ##
============================================
+ Coverage 57.54% 57.58% +0.03%
Complexity 5329 5329
============================================
Files 1398 1400 +2
Lines 198769 198943 +174
Branches 4980 4980
============================================
+ Hits 114389 114555 +166
- Misses 80466 80474 +8
Partials 3914 3914
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces the 'MatchAll' transform to the Apache Beam YAML SDK. This addition enables users to dynamically match files based on patterns provided in the input PCollection, enhancing the file handling capabilities within YAML pipelines. The implementation includes robust schema validation and flexible configuration options for empty match scenarios.
Code Review
This pull request introduces the MatchAll transform to the Beam YAML IO collection, allowing users to match file patterns provided via an input PCollection. The implementation includes logic to automatically identify the file pattern field in schema-aware PCollections and provides configuration for empty match treatment. Comprehensive integration and unit tests have been added. The review feedback correctly identifies a missing mock import in the test file which would lead to a NameError during execution.
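The missing-import problem described above looks roughly like the following in a test module. This is a hedged sketch: `match_paths` and `MatchPathsTest` are hypothetical stand-ins, not names from the pull request. The point is only that `mock.patch` needs `from unittest import mock` at the top of the file, otherwise the first use of `mock` raises a NameError.

```python
import glob
import unittest
from unittest import mock  # without this import, `mock.patch` raises NameError


def match_paths(pattern):
    """Hypothetical helper standing in for the code under test."""
    return sorted(glob.glob(pattern))


class MatchPathsTest(unittest.TestCase):
    def test_returns_sorted_matches(self):
        # Patch the filesystem lookup so the test does not touch real files.
        with mock.patch(
                "glob.glob", return_value=["b.txt", "a.txt"]) as fake_glob:
            self.assertEqual(match_paths("*.txt"), ["a.txt", "b.txt"])
        fake_glob.assert_called_once_with("*.txt")
```

A NameError of this kind only surfaces when the affected test actually runs, which is why it can slip past a review that reads the diff without executing the suite.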
Fixes: #38013
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
CHANGES.md with noteworthy changes.
See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.