-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[2/3] sdks/python: sink data with Milvus Search I/O connector #36729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @mohamedawnallah, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new Milvus Search Sink I/O connector for the Apache Beam Python SDK. This connector enables users to efficiently write data, including vector embeddings and associated metadata, to Milvus collections. The changes also involve significant refactoring of existing Milvus-related code, centralizing common utilities and test helpers to improve maintainability and support future Milvus integrations. This enhancement completes Beam's support for Milvus, providing a robust solution for both reading from and writing to Milvus vector databases. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
9461a12 to
a7ec46f
Compare
a7ec46f to
3992585
Compare
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
Assigning reviewers: R: @shunping for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
7b2cfff to
46a03c8
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a Milvus sink I/O connector for the Python SDK, which is a great addition for users looking to integrate Apache Beam with Milvus for RAG pipelines. The implementation is well-structured, leveraging DoFn for batching and a context manager for resource management, which are good practices. The integration tests are comprehensive, covering various scenarios including error cases and idempotency, which ensures the connector's reliability.
I've provided a few suggestions to improve the code. One key point is to remove redundant client initialization logic that bypasses the retry mechanism. I've also pointed out a documentation mismatch and suggested simplifications in the test code for better clarity.
Overall, this is a solid contribution that significantly enhances Beam's capabilities for building RAG applications.
sdks/python/apache_beam/ml/rag/ingestion/milvus_search_it_test.py
Outdated
Show resolved
Hide resolved
ed290c9 to
d0b68de
Compare
|
All relevant CI tests are passing now |
damccorm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Description
Motivation and Context
Motivated by this comment #35708 (comment) for splitting the original PR into two one for refactoring and one for core integration. That way it would be relatively easier to review
Towards #35046.
Depends on #35708.
Next #35944.
Closes #36702.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.