Conversation
The previous implementation used str.replace() which had two bugs:
1. Partial matching: :id would incorrectly replace inside :product_id
2. No quote escaping: values like O'Brien produced invalid SQL
The new token-based approach:
- Splits SQL on :param patterns, then rebuilds with substituted values
- Prevents partial matching by design (tokens are complete identifiers)
- Escapes single quotes using SQL standard (' -> '')
- 2-7x faster than regex alternative, scales better with more parameters
Added unit tests covering partial matching, quote escaping,
and edge cases (unicode, backslashes, empty strings, bytes).dit tool working with tests
There was a problem hiding this comment.
Pull request overview
This pull request introduces two major features to the redisvl library: SQL query support and multi-prefix index capability. The PR bumps the version from 0.13.2 to 0.14.0, reflecting the addition of significant new functionality.
Changes:
- Adds SQLQuery class to translate SQL SELECT statements into Redis FT.SEARCH/FT.AGGREGATE commands via the sql-redis library
- Implements multi-prefix index support allowing a single index to monitor multiple key prefixes
- Adds comprehensive test coverage for both new features
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Version bump to 0.14.0, adds sql-redis optional dependency |
| uv.lock | Updates lock file with sql-redis and sqlglot dependencies |
| redisvl/query/sql.py | New SQLQuery class with parameter substitution and Redis command translation |
| redisvl/query/init.py | Exports SQLQuery from the query module |
| redisvl/index/index.py | Adds multi-prefix support and _sql_query method for executing SQL queries |
| tests/unit/test_sql_parameter_substitution.py | Unit tests for SQL parameter substitution edge cases |
| tests/integration/test_sql_redis_json.py | Integration tests for SQL queries against JSON storage |
| tests/integration/test_sql_redis_hash.py | Integration tests for SQL queries against Hash storage |
| tests/integration/test_multi_prefix.py | Integration tests verifying multi-prefix index functionality |
| examples/multi_prefix_example.py | Example demonstrating multi-prefix index usage |
| docs/api/query.rst | API documentation for SQLQuery |
| docs/user_guide/index.md | Adds reference to new SQL query guide |
| docs/user_guide/01_getting_started.ipynb | Updates prefix configuration example |
| docs/user_guide/02_hybrid_queries.ipynb | Re-executed notebook with updated outputs |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| Note: | ||
| Requires the optional `sql-redis` package. Install with: | ||
| ``pip install redisvl[sql]`` |
There was a problem hiding this comment.
The docstring in line 30 refers to pip install redisvl[sql], but the extra name in pyproject.toml is sql-redis. This inconsistency will confuse users. Update the docstring to use pip install redisvl[sql-redis].
| ``pip install redisvl[sql]`` | |
| ``pip install redisvl[sql-redis]`` |
| except ImportError: | ||
| raise ImportError( | ||
| "sql-redis is required for SQL query support. " | ||
| "Install it with: pip install redisvl[sql]" |
There was a problem hiding this comment.
The error message instructs users to install with pip install redisvl[sql], but the actual extra name defined in pyproject.toml is sql-redis. This inconsistency will cause installation failures. The error message should be updated to pip install redisvl[sql-redis] to match the extra name defined in pyproject.toml line 54.
| "Install it with: pip install redisvl[sql]" | |
| "Install it with: pip install redisvl[sql-redis]" |
| def _substitute_params(self, sql: str, params: Dict[str, Any]) -> str: | ||
| """Substitute parameter placeholders in SQL with actual values. | ||
|
|
||
| Uses token-based approach: splits SQL on :param patterns, then rebuilds | ||
| with substituted values. This prevents partial matching (e.g., :id | ||
| won't match inside :product_id) and is faster than regex at scale. | ||
|
|
||
| Args: | ||
| sql: The SQL string with :param placeholders. | ||
| params: Dictionary mapping parameter names to values. | ||
|
|
||
| Returns: | ||
| SQL string with parameters substituted. | ||
|
|
||
| Note: | ||
| - String values are wrapped in single quotes with proper escaping | ||
| - Numeric values are converted to strings | ||
| - Bytes values (e.g., vectors) are NOT substituted here | ||
| """ | ||
| if not params: | ||
| return sql | ||
|
|
||
| # Split SQL on :param patterns, keeping the delimiters | ||
| # Pattern matches : followed by valid identifier (letter/underscore, then alphanumeric/underscore) | ||
| tokens = re.split(r"(:[a-zA-Z_][a-zA-Z0-9_]*)", sql) | ||
|
|
||
| result = [] | ||
| for token in tokens: | ||
| if token.startswith(":"): | ||
| key = token[1:] # Remove leading : | ||
| if key in params: | ||
| value = params[key] | ||
| if isinstance(value, (int, float)): | ||
| result.append(str(value)) | ||
| elif isinstance(value, str): | ||
| # Escape single quotes using SQL standard: ' -> '' | ||
| escaped = value.replace("'", "''") | ||
| result.append(f"'{escaped}'") | ||
| else: | ||
| # Keep placeholder for bytes (vectors handled by Executor) | ||
| result.append(token) | ||
| else: | ||
| # Keep unmatched placeholders as-is | ||
| result.append(token) | ||
| else: | ||
| result.append(token) | ||
|
|
||
| return "".join(result) |
There was a problem hiding this comment.
The parameter substitution function only escapes single quotes for string values. While this follows SQL standard escaping, the implementation doesn't prevent SQL injection attacks through other means (e.g., backslashes, special characters). Consider whether the sql-redis library performs additional sanitization on the SQL string after substitution. If not, this could be a security vulnerability. Additionally, document clearly what sanitization is expected from downstream components versus what is done here.
| " \"index\": {\n", | ||
| " \"name\": \"user_simple\",\n", | ||
| " \"prefix\": \"user_simple_docs\",\n", | ||
| " \"prefix\": [\"prefix_1\", \"prefix_2\"],\n", |
There was a problem hiding this comment.
The getting started notebook has been updated to use a multi-prefix configuration ["prefix_1", "prefix_2"] instead of the single prefix "user_simple_docs". However, this change appears to be unintentional for a getting started guide, as:
- The table output still shows
['user_simple_docs']as the prefix (line 296), suggesting the notebook output wasn't re-run after the code change - Multi-prefix configuration is an advanced topic that may confuse beginners in a "getting started" guide
- There's no explanation or context for why multiple prefixes are being used
This should either be reverted to the simple single-prefix example, or the notebook should be re-run and additional explanation added.
| """Simulate the CURRENT buggy implementation for comparison. | ||
|
|
||
| This is the exact code from redisvl/query/sql.py lines 105-113. |
There was a problem hiding this comment.
The docstring comment on lines 15-18 states "This is the exact code from redisvl/query/sql.py lines 105-113", but those line numbers don't correspond to any code in the actual redisvl/query/sql.py file (which only has 159 lines total). This comment appears to reference a buggy implementation that may have existed during development but doesn't match the current code. The comment should either be removed or updated to clarify that this is a demonstration of what NOT to do, without referencing non-existent line numbers.
| """Simulate the CURRENT buggy implementation for comparison. | |
| This is the exact code from redisvl/query/sql.py lines 105-113. | |
| """Simulate a naive, intentionally buggy implementation for comparison. | |
| This mirrors the earlier str.replace-based approach used in SQLQuery and | |
| is kept only to demonstrate incorrect behavior; it is not the production | |
| implementation. |
| import pytest | ||
|
|
||
| from redisvl.query.sql import SQLQuery | ||
|
|
||
|
|
There was a problem hiding this comment.
Import of 'pytest' is not used.
| import pytest | |
| from redisvl.query.sql import SQLQuery | |
| from redisvl.query.sql import SQLQuery |
| # Clean up any existing index | ||
| try: | ||
| client.ft(INDEX_NAME).dropindex(delete_documents=True) | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| except Exception: | |
| # The index may not exist yet or may have been removed already; ignore drop errors. |
| # Clean up any existing index | ||
| try: | ||
| client.ft(index_name).dropindex(delete_documents=True) | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| except Exception: | |
| # The index may not exist yet or may have already been removed; ignore | |
| # cleanup errors so that test setup can proceed. |
| # Cleanup | ||
| try: | ||
| index.delete(drop=True) | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| except Exception: | |
| # Best-effort cleanup: ignore errors if the index is already deleted | |
| # or the Redis connection is unavailable during test teardown. |
| finally: | ||
| try: | ||
| index.delete(drop=True) | ||
| except Exception: |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except Exception: | |
| except Exception: | |
| # Best-effort cleanup: ignore errors when deleting the index so tests | |
| # do not fail due to teardown issues. |
|
@rbs333 Looks like this has leaked commits from the SQL work. Can you pare it back to the multi-prefix additions? |
vishal-bala
left a comment
There was a problem hiding this comment.
No major comments from me aside a question about whether building the registry/executor for SQL translation is expensive. There are a few Copilot comments that I agreed should be addressed (left a 👍 on them) - aside from those, it generally looks good! Super nice to be able to have SQL queries translated 🙌
| registry = SchemaRegistry(self._redis_client) | ||
| registry.load_all() # Loads index schemas from Redis | ||
|
|
||
| executor = Executor(self._redis_client, registry) |
There was a problem hiding this comment.
How involved are these to be run on every query? If they're non-negligible, do we want to cache/pre-load/lazy-load them?
No description provided.