[Fixes #13641] Multilang: index - fix #13786
Summary of Changes
Hello @etj, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves a critical bug affecting the index search functionality, specifically when dealing with non-Latin characters. By refining the search string sanitization logic to recognize and process a broader range of Unicode characters, the change lets the index search return accurate results for queries in languages that do not use the Latin alphabet.
Code Review
This pull request addresses an issue with searching using non-latin characters by improving the input sanitization regex. The change is a good step forward, correctly using \w with the re.UNICODE flag to support a wider range of characters. My review includes a suggestion to further enhance the robustness of the search functionality by handling whitespace-only or padded search strings.
  value = params[1]
  # sanitize search string
- value = re.sub(r"[^0-9A-Za-z/_\.-]+", "", value)
+ value = re.sub(r"[^\w\s./\-]+", "", value, flags=re.UNICODE)
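To make the effect of the change concrete, here is a minimal sketch comparing the two patterns from the diff on a few inputs. The function names `sanitize_old` and `sanitize_new` are hypothetical and exist only for this illustration; only the two regular expressions come from the pull request. Note that in Python 3 `\w` already matches Unicode word characters for `str` patterns, so `flags=re.UNICODE` is redundant there, though harmless.

```python
import re

# Patterns copied from the diff; the wrapper functions are illustrative only.
OLD_PATTERN = r"[^0-9A-Za-z/_\.-]+"
NEW_PATTERN = r"[^\w\s./\-]+"


def sanitize_old(value: str) -> str:
    """Old behaviour: keep only ASCII letters, digits and / _ . -"""
    return re.sub(OLD_PATTERN, "", value)


def sanitize_new(value: str) -> str:
    """New behaviour: keep Unicode word characters, whitespace and . / -"""
    return re.sub(NEW_PATTERN, "", value, flags=re.UNICODE)


if __name__ == "__main__":
    for raw in ["roads_2024", "Москва", "東京 map", "a;b'c"]:
        print(repr(raw), "->", repr(sanitize_old(raw)), "|", repr(sanitize_new(raw)))
```

One side effect worth noticing is that the new pattern also keeps whitespace, which the old one removed. That is what makes the whitespace handling discussed in the review comment relevant.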
After sanitizing the search string, it's possible for it to contain only whitespace, or have leading/trailing whitespace. This could lead to queries like ' ':* which are valid but likely not what the user intends. It's good practice to strip() the value to handle these cases gracefully. An all-whitespace search would then correctly fall back to '*:*', which is consistent with an empty search. You can chain .strip() to the re.sub() call for a concise solution.
- value = re.sub(r"[^\w\s./\-]+", "", value, flags=re.UNICODE)
+ value = re.sub(r"[^\w\s./\-]+", "", value, flags=re.UNICODE).strip()
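The suggestion above can be sketched as follows. This is an assumption-laden illustration, not the project's code: `build_search_term` is a hypothetical helper, and the `'<value>':*` query shape is inferred only from the `' ':*` example in the review comment. The `*:*` fallback for an empty value is the behaviour the comment describes.

```python
import re


def sanitize(value: str) -> str:
    """Sanitize as in the suggested change, then trim surrounding whitespace."""
    return re.sub(r"[^\w\s./\-]+", "", value, flags=re.UNICODE).strip()


def build_search_term(raw: str) -> str:
    """Hypothetical helper: build a prefix query, or match everything if empty.

    The "'<value>':*" shape is an assumption based on the review comment.
    """
    value = sanitize(raw)
    if not value:
        # Whitespace-only or fully stripped input behaves like an empty search.
        return "*:*"
    return f"'{value}':*"


if __name__ == "__main__":
    for raw in ["   ", "  Москва ", "!!!", "lake map"]:
        print(repr(raw), "->", build_search_term(raw))
```

Without the `.strip()`, the whitespace-only input would pass the emptiness check and produce a query on a blank term instead of falling back.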
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #13786 +/- ##
=======================================
Coverage 74.13% 74.13%
=======================================
Files 944 944
Lines 56291 56307 +16
Branches 7617 7624 +7
=======================================
+ Hits 41730 41742 +12
Misses 12897 12897
- Partials 1664 1668 +4
Fixes index search with non-latin chars