-
Notifications
You must be signed in to change notification settings - Fork 0
tests: Remove external services requirement from elt-common e2e tests #209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It avoids any external services and vastly improves test times. Testing with the RestCatalog will be used for the pipelines.
Removes the need for any external infrastructure for the tests.
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughThe pull request consolidates elt-common testing workflows, adds dual Iceberg catalog type support (REST and SQL), refactors credential handling to accommodate both modes, and restructures test fixtures and helpers to support the new SQL catalog pathway alongside existing REST-based infrastructure. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @elt-common/src/elt_common/iceberg/trino.py:
- Around line 23-24: The dataclass fields user and password were made optional
(str | None) but from_env still expects the env vars and will raise KeyError;
update the dataclass to give these fields explicit defaults (e.g., user: str |
None = None and password: str | None = None) or modify the from_env factory in
TrinoConnection (or similar) to use os.getenv/ dict.get with default=None for
the corresponding environment variables so absent vars do not raise.
In @elt-common/src/elt_common/testing/lakekeeper.py:
- Around line 20-22: The constructor (__init__) of the class calls
access_token(settings) which performs network I/O and lacks retry handling,
causing unhandled exceptions during Server instantiation; modify initialization
to avoid network calls: either add tenacity-based retry logic inside
access_token() or change the class to lazily obtain the token (store settings in
__init__ and call access_token(settings) on first use), and ensure behavior
mirrors existing retry-protected methods (purge_warehouse, delete_warehouse) so
transient network errors are retried rather than bubbling out of __init__.
- Around line 171-188: token_endpoint() and access_token() call
requests.get/post without timeouts causing potential indefinite hangs; update
the requests calls to include a timeout (use the same default as
_request_with_auth, e.g. timeout=10.0) so both token_endpoint(settings)
(requests.get) and access_token(settings) (requests.post) pass timeout=10.0,
keep response.raise_for_status()/response.json() unchanged, or alternatively
refactor to reuse the existing _request_with_auth helper that already defaults
timeout=10.0.
🧹 Nitpick comments (12)
elt-common/src/elt_common/iceberg/trino.py (1)
107-111: Fallback credentials use a hardcoded username.The fallback
BasicAuthentication("trino", "")assumes a default username of"trino"when credentials are missing. This is reasonable for local/test scenarios but worth documenting if production use is intended, as silent fallback to a generic user could mask configuration errors.elt-common/tests/unit_tests/iceberg/test_trino.py (1)
8-21: Test name is misleading and return value is not verified.The test name suggests it verifies that only Iceberg tables are returned, but it only asserts the SQL command executed. Consider either:
- Renaming to
test_trino_query_engine_list_tables_executes_correct_sql, or- Mocking
executeto return sample data and asserting the formatted output.Option 2: Verify the return value
def test_trino_query_engine_list_tables_returns_only_iceberg_tables( mocker: MockerFixture, ): mock_creds = mocker.MagicMock() mocker.patch.object(TrinoQueryEngine, "_create_engine", mocker.MagicMock()) trino = TrinoQueryEngine(mock_creds) - trino_execute_spy = mocker.spy(trino, "execute") + mock_rows = [("schema1", "table1"), ("schema2", "table2")] + mocker.patch.object(trino, "execute", return_value=mock_rows) - trino.list_iceberg_tables() + result = trino.list_iceberg_tables() - assert trino_execute_spy.call_count == 1 - expected_cmd_re = re.compile(r"^select \* from system.iceberg_tables$") - command_match = expected_cmd_re.match(trino_execute_spy.call_args[0][0]) - assert command_match is not None + trino.execute.assert_called_once() + assert "system.iceberg_tables" in trino.execute.call_args[0][0] + assert result == ["schema1.table1", "schema2.table2"]elt-common/tests/unit_tests/iceberg/maintenance/test_table_maintenance.py (1)
26-39: Consider simplifying the mock verification.Using
mocker.spy()on aMagicMockis redundant sinceMagicMockalready tracks all calls. You can verify directly viatrino_engine.execute.call_argswithout the spy wrapper.♻️ Suggested simplification
def test_iceberg_maintenance_commands_run_expected_trino_alter_table_command( mocker: MockerFixture, command: str, command_args: Dict[str, str], ): trino_engine = mocker.MagicMock() iceberg_maint = IcebergTableMaintenaceSql(trino_engine) table_id = "ns1.table1" - trino_execute_spy = mocker.spy(trino_engine, "execute") getattr(iceberg_maint, command)(table_id, **command_args) - assert trino_execute_spy.call_count == 1 + assert trino_engine.execute.call_count == 1 expected_cmd_re = re.compile(rf"^alter table {table_id} execute ({command})(.+)?$") - command_match = expected_cmd_re.match(trino_execute_spy.call_args[0][0]) + command_match = expected_cmd_re.match(trino_engine.execute.call_args[0][0]) assert command_match is not None assert command_match.group(1) == command if command_args: for key in command_args.keys(): assert key in command_match.group(2)elt-common/src/elt_common/dlt_destinations/pyiceberg/configuration.py (2)
53-65: Validation logic is sound; minor readability improvement possible.The authentication property validation correctly ensures all-or-none semantics. The counting expression could be simplified for readability.
♻️ Optional simplification
- non_null_count = sum(map(lambda x: 1 if x is not None else 0, auth_props.values())) + non_null_count = sum(1 for v in auth_props.values() if v is not None)
68-78: Consider adding validation for required SQL credentials.The
on_resolvedmethod is a no-op, buturiandwarehouseare essential for SQL catalog connectivity. Consider validating that these fields are non-empty to fail fast with a clear error message rather than during catalog connection.♻️ Suggested validation
def on_resolved(self) -> None: - pass + if not self.uri: + raise DestinationTerminalException("Missing required configuration value: 'uri' for SQL catalog.") + if not self.warehouse: + raise DestinationTerminalException("Missing required configuration value: 'warehouse' for SQL catalog.")elt-common/src/elt_common/testing/sqlcatalog.py (1)
8-20: Potential resource management issue withTemporaryDirectory.The
TemporaryDirectoryis stored as an instance attribute but not used as a context manager, which means its automatic cleanup won't trigger. The current design relies on external cleanup viashutil.rmtreeinfixtures.py. This works but creates tight coupling and potential for resource leaks if the fixture teardown doesn't execute.Consider either:
- Adding an explicit cleanup method to
SqlCatalogWarehouse- Documenting that callers must handle cleanup
♻️ Option 1: Add explicit cleanup method
class SqlCatalogWarehouse: def __init__(self, warehouse_name: str): self.name = warehouse_name self.workdir = TemporaryDirectory() self.uri = f"sqlite:///{self.workdir.name}/{warehouse_name}.db" self.warehouse_path = f"file://{self.workdir.name}/{warehouse_name}" + def cleanup(self) -> None: + """Clean up the temporary directory""" + self.workdir.cleanup() + def connect(self) -> PyIcebergCatalog: """Connect to the warehouse in the catalog""" creds = PyIcebergSqlCatalogCredentials() creds.uri = self.uri creds.warehouse = self.warehouse_path return load_catalog(name="default", **creds.as_dict())elt-common/src/elt_common/testing/fixtures.py (3)
35-39: Prefer usingTemporaryDirectory.cleanup()for consistency.Using
shutil.rmtree(warehouse.workdir.name)works, but sinceSqlCatalogWarehousestores aTemporaryDirectoryobject, calling itscleanup()method would be more consistent and handles edge cases (e.g., directory already removed).♻️ Suggested change
def cleanup_func(): - shutil.rmtree(warehouse.workdir.name) + warehouse.workdir.cleanup()
64-75: Exception handling may be too narrow.The cleanup logic only catches
RuntimeError, but other exception types could occur during warehouse purge/delete operations (e.g.,requests.RequestException,tenacity.RetryError). Consider catchingExceptionto ensure cleanup warnings are shown for any failure type.♻️ Suggested change
- except RuntimeError as exc: + except Exception as exc: warnings.warn( f"Error deleting test warehouse '{str(warehouse.project_id)}'. It may need to be removed manually." ) warnings.warn(f"Error:\n{str(exc)}")
83-89: Consider adding a type hint for clarity.The type annotation was removed because
warehousecan be eitherSqlCatalogWarehouseorRestCatalogWarehouse. Adding a Union type hint would improve IDE support and documentation.♻️ Suggested type hint
+from .lakekeeper import RestCatalogWarehouse + @pytest.fixture -def destination_config(warehouse): +def destination_config(warehouse: SqlCatalogWarehouse | RestCatalogWarehouse): destination_config = PyIcebergDestinationTestConfiguration(warehouse)elt-common/src/elt_common/testing/dlt.py (2)
52-59: Inconsistent use of local variables.Line 53 creates
serverandserver_settingslocals, but lines 54-58 continue to useself.warehouse.serverdirectly. Use the locals consistently for readability.♻️ Suggested fix
else: server, server_settings = self.warehouse.server, self.warehouse.server.settings - environ["DESTINATION__PYICEBERG__CREDENTIALS__URI"] = str( - self.warehouse.server.catalog_endpoint() - ) - environ["DESTINATION__PYICEBERG__CREDENTIALS__PROJECT_ID"] = str( - self.warehouse.server.settings.project_id - ) + environ["DESTINATION__PYICEBERG__CREDENTIALS__URI"] = str(server.catalog_endpoint()) + environ["DESTINATION__PYICEBERG__CREDENTIALS__PROJECT_ID"] = str( + server_settings.project_id + )
46-51: Consider settingCATALOG_TYPEexplicitly for REST path.The SQL branch sets
CATALOG_TYPE="sql", but the REST branch relies on the default. For explicitness and symmetry, consider adding:environ["DESTINATION__PYICEBERG__CATALOG_TYPE"] = "rest"Also applies to: 52-78
elt-common/src/elt_common/testing/lakekeeper.py (1)
24-26: Naming conflict and behavioural inconsistency.
Server.token_endpoint(property, line 24) constructs the token URL directly, whilsttoken_endpoint()(function, line 171) fetches it via OpenID Connect discovery. Both have the same name but different behaviours:
- Property:
settings.openid_provider_uri + "/protocol/openid-connect/token"(assumes Keycloak path)- Function: fetches from
.well-known/openid-configuration(spec-compliant discovery)Consider renaming the module-level function (e.g.,
discover_token_endpoint) or updating the property to use discovery for consistency.Also applies to: 171-174
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (21)
.github/actions/run-pytest-with-uv/action.yml.github/workflows/elt-common_e2e_tests.yml.github/workflows/elt-common_tests.yml.github/workflows/elt-common_unit_tests.yml.github/workflows/warehouses_e2e_tests.ymlelt-common/pytest.inielt-common/src/elt_common/dlt_destinations/pyiceberg/configuration.pyelt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.pyelt-common/src/elt_common/dlt_destinations/pyiceberg/pyiceberg.pyelt-common/src/elt_common/iceberg/trino.pyelt-common/src/elt_common/testing/__init__.pyelt-common/src/elt_common/testing/dlt.pyelt-common/src/elt_common/testing/fixtures.pyelt-common/src/elt_common/testing/lakekeeper.pyelt-common/src/elt_common/testing/sqlcatalog.pyelt-common/tests/e2e_tests/elt_common/iceberg/conftest.pyelt-common/tests/unit_tests/dlt_destinations/pyiceberg/test_helpers.pyelt-common/tests/unit_tests/iceberg/__init__.pyelt-common/tests/unit_tests/iceberg/maintenance/__init__.pyelt-common/tests/unit_tests/iceberg/maintenance/test_table_maintenance.pyelt-common/tests/unit_tests/iceberg/test_trino.py
💤 Files with no reviewable changes (5)
- .github/workflows/elt-common_e2e_tests.yml
- .github/workflows/elt-common_unit_tests.yml
- elt-common/tests/e2e_tests/elt_common/iceberg/conftest.py
- .github/actions/run-pytest-with-uv/action.yml
- elt-common/src/elt_common/dlt_destinations/pyiceberg/helpers.py
🧰 Additional context used
🧬 Code graph analysis (5)
elt-common/tests/unit_tests/iceberg/test_trino.py (1)
elt-common/src/elt_common/iceberg/trino.py (2)
TrinoQueryEngine(45-119)list_iceberg_tables(83-91)
elt-common/src/elt_common/testing/fixtures.py (3)
elt-common/src/elt_common/testing/__init__.py (2)
Settings(35-101)storage_config(80-101)elt-common/src/elt_common/testing/lakekeeper.py (4)
Server(17-94)create_warehouse(44-74)purge_warehouse(77-79)delete_warehouse(82-87)elt-common/src/elt_common/testing/sqlcatalog.py (1)
SqlCatalogWarehouse(8-20)
elt-common/src/elt_common/testing/dlt.py (2)
elt-common/src/elt_common/testing/sqlcatalog.py (1)
SqlCatalogWarehouse(8-20)elt-common/src/elt_common/testing/lakekeeper.py (4)
RestCatalogWarehouse(98-168)catalog_endpoint(28-33)token_endpoint(25-26)token_endpoint(171-174)
elt-common/tests/unit_tests/iceberg/maintenance/test_table_maintenance.py (1)
elt-common/src/elt_common/iceberg/maintenance/__init__.py (1)
IcebergTableMaintenaceSql(14-48)
elt-common/src/elt_common/dlt_destinations/pyiceberg/pyiceberg.py (2)
elt-common/src/elt_common/dlt_destinations/pyiceberg/factory.py (1)
pyiceberg(15-58)elt-common/src/elt_common/dlt_destinations/pyiceberg/configuration.py (1)
connection_properties(104-106)
🔇 Additional comments (12)
.github/workflows/warehouses_e2e_tests.yml (1)
1-1: LGTM!The workflow and job name changes are sensible simplifications that improve readability without affecting functionality.
Also applies to: 27-27
elt-common/pytest.ini (1)
3-4: LGTM!The
requires_trinomarker is well-documented and follows pytest conventions for conditionally deselecting tests that depend on external infrastructure..github/workflows/elt-common_tests.yml (2)
44-47: LGTM!The test execution step correctly uses
uv run pytestsince the environment is not activated, ensuring the virtual environment context is properly managed.
26-26: No action required.ubuntu-slimis a valid GitHub-hosted runner label designed for lightweight jobs and is appropriate for test workflows.Likely an incorrect or invalid review comment.
elt-common/tests/unit_tests/dlt_destinations/pyiceberg/test_helpers.py (1)
14-14: LGTM!The import change correctly reflects the API refactor where
RestCatalogis now sourced directly frompyiceberg.catalog.restrather than being re-exported through the helpers module. The mock-based tests continue to function correctly since they reference the same underlying class.elt-common/src/elt_common/dlt_destinations/pyiceberg/configuration.py (1)
88-109: LGTM!The
resolve_typedecorator pattern cleanly supports dynamic credential type resolution based oncatalog_type. The delegation ofon_resolvedto the credentials ensures validation runs for both REST and SQL configurations.elt-common/src/elt_common/testing/__init__.py (1)
36-44: This is a new file, not a modification—there is no breaking change.The
Settingsclass withenv_prefix="elt_common_testing_"is being introduced for the first time. There was no previous"tests_"prefix to migrate from. The git history shows this file was created in commit 97921a2, not modified from an earlier state.No existing code in the repository currently references these environment variables, so there are no compatibility concerns for internal usage.
Likely an incorrect or invalid review comment.
elt-common/src/elt_common/dlt_destinations/pyiceberg/pyiceberg.py (1)
27-27: LGTM – The switch fromcreate_catalogtoload_catalogcorrectly uses PyIceberg's recommended pattern. The credentials'as_dict()method properly includes the"type"key (either"rest"or"sql") required byload_catalogto determine the correct catalog implementation. The approach of passingname="default"alongside**config.connection_propertiesallows the configuration to drive catalog selection whilst maintaining the expected defaults.elt-common/src/elt_common/testing/dlt.py (2)
17-18: LGTM!The imports correctly bring in both warehouse types to support the new dual catalog configuration.
38-38: LGTM!The union type annotation clearly indicates support for both catalog types.
elt-common/src/elt_common/testing/lakekeeper.py (2)
97-102: LGTM!The rename from
WarehousetoRestCatalogWarehouseclearly distinguishes it fromSqlCatalogWarehouseand improves code clarity.
104-114: LGTM!The
connect()method correctly usesPyIcebergRestCatalogCredentialsand the standardload_catalogAPI.
Summary
Vastly simplifies the running of the test suite for the
elt-commonpackage. We use a local sqlite catalog through pyiceberg to avoid the need for running an external REST catalog.The RestCatalog configuration is kept for running tests against the ingestion scripts.
Summary by CodeRabbit
Release Notes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.