[ENH] V1 -> V2 Migration - Flows (module)#1609
[ENH] V1 -> V2 Migration - Flows (module)#1609Omswastik-11 wants to merge 296 commits intoopenml:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1609 +/- ##
===========================================
+ Coverage 53.96% 81.60% +27.63%
===========================================
Files 61 61
Lines 5051 5094 +43
===========================================
+ Hits 2726 4157 +1431
+ Misses 2325 937 -1388 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
614411f to
36184e5
Compare
|
Hi @geetu040 !! Can you review the pre-commit failure It was due to merge conflicts more specifically for tasks . Should I change it on my branch ? |
can you try again, sync your branch with mine? It should be fixed now. |
|
I think that due to conflicts it did not synced properly . I have to revert it manually |
geetu040
left a comment
There was a problem hiding this comment.
Nice overall, few changes needed. I'll look at the tests later when the implementation is final.
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
geetu040
left a comment
There was a problem hiding this comment.
update with #1576 (comment)
…into flow-migration-stacked
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 42 out of 43 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _mocked_perform_api_call(call, request_method): | ||
| url = openml.config.server + "/" + call | ||
| url = openml.config.server + call | ||
| return openml._api_calls._download_text_file(url) |
| @mock.patch.object(requests.Session, "request") | ||
| def test_delete_flow_not_owned(mock_request, test_files_directory, test_apikey_v1): | ||
| openml.config.start_using_configuration_for_example() | ||
| content_file = test_files_directory / "mock_responses" / "flows" / "flow_delete_not_owned.xml" |
| openml.flows.delete_flow(40_000) | ||
|
|
||
| flow_url = f"{openml.config.TEST_SERVER_URL}/api/v1/xml/flow/40000" | ||
| assert flow_url == mock_delete.call_args.args[0] | ||
| assert test_api_key == mock_delete.call_args.kwargs.get("params", {}).get("api_key") | ||
| assert mock_request.call_args.kwargs.get("method") == "DELETE" | ||
| assert test_apikey_v1 == mock_request.call_args.kwargs.get("params", {}).get("api_key") |
| raise ValueError( | ||
| f'invalid api_version="{api_version}" ' | ||
| f"allowed versions: {', '.join(list(APIVersion))}" | ||
| ) | ||
|
|
||
| if fallback_api_version is not None and fallback_api_version not in APIVersion: | ||
| raise ValueError( | ||
| f'invalid fallback_api_version="{fallback_api_version}" ' | ||
| f"allowed versions: {', '.join(list(APIVersion))}" |
| TestBase.logger.info(f"collected from {__file__.split('/')[-1]}: {flow.flow_id}") | ||
|
|
||
|
|
||
| @pytest.mark.sklearn() | ||
| @mock.patch("openml.flows.functions.get_flow") | ||
| @mock.patch("openml.flows.functions.flow_exists") | ||
| @mock.patch("openml._api_calls._perform_api_call") | ||
| def test_publish_error(self, api_call_mock, flow_exists_mock, get_flow_mock): | ||
| @mock.patch("requests.Session.request") | ||
| def test_publish_error(self, mock_request, flow_exists_mock, get_flow_mock): | ||
| model = sklearn.ensemble.RandomForestClassifier() | ||
| flow = self.extension.model_to_flow(model) | ||
| api_call_mock.return_value = ( | ||
|
|
||
| # Create mock response directly |
geetu040
left a comment
There was a problem hiding this comment.
The last sync with base-pr was not clean. Some changes were still there from the previous commits that caused the test failures, though it showed another bug in the base PR which I've fixed now.
Please update this PR with the base PR, make sure the merge is clean and the CI is green.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| data = {"name": name, "external_version": external_version, "api_key": self._http.api_key} | ||
| xml_response = self._http.post("flow/exists", data=data).text |
There was a problem hiding this comment.
FlowV1API.exists() calls HTTPClient.post() with the default use_api_key=True, which raises OpenMLAuthenticationError when no API key is configured. The legacy flow/exists endpoint worked without an API key (it only included api_key when present), so this is a behavior regression for anonymous users. Consider calling self._http.post(..., use_api_key=False) and omit api_key from the payload (or make auth optional).
| data = {"name": name, "external_version": external_version, "api_key": self._http.api_key} | |
| xml_response = self._http.post("flow/exists", data=data).text | |
| data = {"name": name, "external_version": external_version} | |
| if self._http.api_key: | |
| data["api_key"] = self._http.api_key | |
| xml_response = self._http.post("flow/exists", data=data, use_api_key=False).text |
| if not (isinstance(name, str) and len(name) > 0): | ||
| raise ValueError("Argument 'name' should be a non-empty string") | ||
| if not (isinstance(external_version, str) and len(external_version) > 0): | ||
| raise ValueError("Argument 'version' should be a non-empty string") | ||
|
|
||
| try: | ||
| response = self._http.get(f"flows/exists/{name}/{external_version}/") | ||
| result = response.json() |
There was a problem hiding this comment.
FlowV2API.exists() interpolates name and external_version directly into the URL path. These values can contain characters that require URL escaping, which can break requests or cause ambiguous routing. Quote/escape each path segment (e.g., urllib.parse.quote) or send them as query parameters if supported by the v2 endpoint.
| import xmltodict | ||
|
|
||
| import openml | ||
| import openml._api_calls |
There was a problem hiding this comment.
openml._api_calls is imported but no longer used in this module after switching to the new backend APIs. This will fail linting/static checks; please remove the unused import.
| import openml._api_calls |
| from openml._api_calls import _perform_api_call | ||
| from openml.testing import SimpleImputer, TestBase | ||
| import requests | ||
| from openml.testing import SimpleImputer, TestBase, create_request_response |
There was a problem hiding this comment.
Unused import: create_request_response is imported but not used anywhere in this test file. Please remove it to keep the test module clean and avoid lint failures.
| from openml.testing import SimpleImputer, TestBase, create_request_response | |
| from openml.testing import SimpleImputer, TestBase |
| mock_request.assert_called_once_with( | ||
| method="POST", | ||
| url=openml.config.server + "flow/exists", | ||
| params={}, | ||
| data={ | ||
| "name": flow_name, | ||
| "external_version": external_version, | ||
| "api_key": test_apikey_v1, | ||
| }, | ||
| headers=openml.config._HEADERS, | ||
| files=None, | ||
| ) | ||
|
|
||
|
|
There was a problem hiding this comment.
This test asserts that FlowV1API.exists() sends api_key inside the POST body. If FlowV1API.exists() is adjusted to not require authentication (and rely on HTTPClient to include api_key only when configured), this assertion will become incorrect. Consider relaxing the assertion to not require api_key for the exists call (or explicitly set use_api_key=False in the implementation and test for that behavior).
| mock_request.assert_called_once_with( | |
| method="POST", | |
| url=openml.config.server + "flow/exists", | |
| params={}, | |
| data={ | |
| "name": flow_name, | |
| "external_version": external_version, | |
| "api_key": test_apikey_v1, | |
| }, | |
| headers=openml.config._HEADERS, | |
| files=None, | |
| ) | |
| mock_request.assert_called_once() | |
| _, kwargs = mock_request.call_args | |
| assert kwargs["method"] == "POST" | |
| assert kwargs["url"] == openml.config.server + "flow/exists" | |
| assert kwargs["params"] == {} | |
| assert kwargs["headers"] == openml.config._HEADERS | |
| assert kwargs["files"] is None | |
| assert kwargs["data"]["name"] == flow_name | |
| assert kwargs["data"]["external_version"] == external_version |
| ignore_cache : bool, default=False | ||
| Whether to ignore the cache. If ``true`` this will download and overwrite the flow xml | ||
| even if the requested flow is already cached. | ||
|
|
There was a problem hiding this comment.
Docstring around ignore_cache/caching behavior still refers to downloading/overwriting a flow.xml file in cache_directory/flows/{id}. Since get_flow() now delegates to HTTPClient caching (request-path-based cache entries), please update the docstring wording to match the new cache mechanism (or avoid specifying the on-disk layout).
| flow_exists_mock.return_value = False | ||
| ).encode() | ||
| mock_request.return_value = response | ||
| flow_exists_mock.return_value = None # Flow doesn't exist yet, so try to publish |
There was a problem hiding this comment.
flow_exists() is documented/typed to return int | bool (with False indicating non-existence). This test sets flow_exists_mock.return_value = None, which is outside the contract and can mask type/logic issues. Please return False here to match the real API behavior and keep the test faithful to production semantics.
| flow_exists_mock.return_value = None # Flow doesn't exist yet, so try to publish | |
| flow_exists_mock.return_value = False # Flow doesn't exist yet, so try to publish |
| >>> openml.flows.delete_flow(23) # doctest: +SKIP | ||
| """ | ||
| return openml.utils._delete_entity("flow", flow_id) | ||
| return openml._backend.flow.delete(flow_id) # type: ignore |
There was a problem hiding this comment.
This # type: ignore on the return from openml._backend.flow.delete(flow_id) looks unnecessary now that FlowAPI inherits delete() from ResourceAPI. If type checking is complaining, it may be better to fix the underlying type (e.g., ensure FlowAPI exposes delete with the right signature) rather than suppressing it here.
| return openml._backend.flow.delete(flow_id) # type: ignore | |
| return openml._backend.flow.delete(flow_id) |
| FLOWS_CACHE_DIR_NAME = "flows" | ||
|
|
||
|
|
There was a problem hiding this comment.
FLOWS_CACHE_DIR_NAME is now unused after removing the filesystem flow cache helpers. Please remove it (and any related unused imports) to avoid lint/static-analysis failures.
| FLOWS_CACHE_DIR_NAME = "flows" |
openml/_api/clients/http.py
Outdated
| code = server_error.get("code") | ||
| message = server_error.get("message") | ||
| additional_information = server_error.get("additional_information") | ||
| if isinstance(server_error, dict): | ||
| code = server_error.get("code") | ||
| message = server_error.get("message") | ||
| additional_information = server_error.get("additional_information") | ||
| else: | ||
| code = None | ||
| message = str(server_error) | ||
| additional_information = None |
There was a problem hiding this comment.
what are these changes? I think the merge was not clean. Please sync with main
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except (OpenMLServerError, KeyError): | ||
| # v2 returns 404 when flow doesn't exist, which raises OpenMLServerError | ||
| return False |
There was a problem hiding this comment.
FlowV2API.exists() currently catches OpenMLServerError and returns False, which will also hide real failures (e.g., authentication/authorization problems, 500s) by reporting "flow does not exist". Only treat the specific "not found" case as non-existence (e.g., catch OpenMLServerNoResult / an OpenMLServerException with the v2 not-found code or 404) and re-raise all other exceptions.
| except (OpenMLServerError, KeyError): | |
| # v2 returns 404 when flow doesn't exist, which raises OpenMLServerError | |
| return False | |
| except OpenMLServerError as err: | |
| # v2 returns HTTP 404 when the flow doesn't exist; only treat that case | |
| # as "not found" and propagate all other server errors. | |
| if getattr(err, "code", None) == 404: | |
| return False | |
| raise |
| if self.flow_id is None: | ||
| raise ValueError("Flow does not have an ID. Please publish the flow before untagging.") | ||
| openml._backend.flow.untag(self.flow_id, tag) |
There was a problem hiding this comment.
Same as push_tag(): raising ValueError for an unpublished flow changes the public exception contract compared to OpenMLBase.remove_tag() (which raises ObjectNotPublishedError). Align the exception type/message (or delegate to the base implementation) for consistency across entities.
| flow_exists_mock.return_value = False | ||
| ).encode() | ||
| mock_request.return_value = response | ||
| flow_exists_mock.return_value = None # Flow doesn't exist yet, so try to publish |
There was a problem hiding this comment.
flow_exists_mock is typed/used as int | bool across the codebase; returning None here can mask type/logic issues (e.g., code paths that explicitly check is False). Prefer returning False to represent non-existence.
| flow_exists_mock.return_value = None # Flow doesn't exist yet, so try to publish | |
| flow_exists_mock.return_value = False # Flow doesn't exist yet, so try to publish |
| openml.config.start_using_configuration_for_example() | ||
| content_file = test_files_directory / "mock_responses" / "flows" / "flow_delete_not_owned.xml" |
There was a problem hiding this comment.
start_using_configuration_for_example() emits a warning and mutates global config state; this test suite already uses the with_server autouse fixture to switch to test servers. Consider removing this call (and the other occurrences below) to avoid unnecessary warnings/global state changes, or ensure you restore the prior config via stop_using_configuration_for_example() in a finally block.
| def test_delete_flow_with_run(mock_delete, test_files_directory, test_server_v1, test_apikey_v1): | ||
| @mock.patch.object(requests.Session, "request") | ||
| def test_delete_flow_with_run(mock_request, test_files_directory, test_apikey_v1): | ||
| openml.config.start_using_configuration_for_example() |
There was a problem hiding this comment.
Same issue as above: start_using_configuration_for_example() is redundant here (given the autouse server fixtures) and adds warnings/global state mutation. Prefer relying on fixtures or restoring config after the test.
| ): | ||
| flow_v2.publish(path="flow", files={"description": "<flow/>"}) | ||
|
|
||
|
|
There was a problem hiding this comment.
flow_v1.get(...) here triggers a real HTTP request (and the test output depends on live server state). Please mark this as @pytest.mark.test_server() or mock the V1 GET response so this comparison test is deterministic/offline.
| @pytest.mark.test_server() |
| if self.flow_id is None: | ||
| raise ValueError("Flow does not have an ID. Please publish the flow before tagging.") | ||
| openml._backend.flow.tag(self.flow_id, tag) | ||
|
|
There was a problem hiding this comment.
OpenMLBase.push_tag() raises ObjectNotPublishedError when the entity has no server id. This override raises ValueError instead, which is an observable API behavior change for flows. Consider preserving the exception type/message (e.g., raise openml.exceptions.ObjectNotPublishedError here, or delegate to the base implementation) so tagging an unpublished flow behaves consistently with other OpenML entities.
| def test_delete_subflow(mock_delete, test_files_directory, test_server_v1, test_apikey_v1): | ||
| @mock.patch.object(requests.Session, "request") | ||
| def test_delete_subflow(mock_request, test_files_directory, test_apikey_v1): | ||
| openml.config.start_using_configuration_for_example() |
There was a problem hiding this comment.
Same issue as above: start_using_configuration_for_example() is redundant here (given the autouse server fixtures) and adds warnings/global state mutation. Prefer relying on fixtures or restoring config after the test.
| def test_delete_flow_success(mock_delete, test_files_directory, test_server_v1, test_apikey_v1): | ||
| @mock.patch.object(requests.Session, "request") | ||
| def test_delete_flow_success(mock_request, test_files_directory, test_apikey_v1): | ||
| openml.config.start_using_configuration_for_example() |
There was a problem hiding this comment.
Same issue as above: start_using_configuration_for_example() is redundant here (given the autouse server fixtures) and adds warnings/global state mutation. Prefer relying on fixtures or restoring config after the test.
| @pytest.mark.xfail(reason="failures_issue_1544", strict=False) | ||
| def test_delete_unknown_flow(mock_delete, test_files_directory, test_server_v1, test_apikey_v1): | ||
| def test_delete_unknown_flow(mock_request, test_files_directory, test_apikey_v1): | ||
| openml.config.start_using_configuration_for_example() |
There was a problem hiding this comment.
Same issue as above: start_using_configuration_for_example() is redundant here (given the autouse server fixtures) and adds warnings/global state mutation. Prefer relying on fixtures or restoring config after the test.
Fixes #1601
added a
Createmethod inFlowAPIfor publishing flow but not refactored with oldpublish. (Needs discussion on this)Added tests using
fake_methodsso that we can test without localV2server . I have tested theFlowsV2methods (getandexists) anddeleteandlistwere not implemented inV2server so I skipped them .