[ENH] V1 → V2 API Migration - setups#1619
[ENH] V1 → V2 API Migration - setups#1619EmanAbdelhaleem wants to merge 264 commits intoopenml:mainfrom
Conversation
…into issue1564
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1619 +/- ##
==========================================
+ Coverage 53.96% 54.26% +0.29%
==========================================
Files 61 61
Lines 5051 5070 +19
==========================================
+ Hits 2726 2751 +25
+ Misses 2325 2319 -6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
geetu040
left a comment
There was a problem hiding this comment.
To check: Does #1611 (review) affect this PR?
geetu040
left a comment
There was a problem hiding this comment.
some old tests will fail because the cache path has changed
old: tests/files/org/openml/test/tasks/1882
new: tests/files/org/openml/test/api/v1/xml/task/1882
regarding this we can do the following, as i've done in this PR
- populate the files in new directory - reference
- you can copy
meta.jsonandheaders.jsonas they are and update the URLs (this is trivial, anything should be fine here) - rename the
description.xmlfile tobody.bin - move other downloadable files from
minioandHTTPClient.downloadaccordingly
- you can copy
- if there are any tests that rely on cache, update them by mocking
Session.request- reference - update
conftest.pywith new file paths, for now you can manually hardcode the new paths and remove this resource from the code earlier, we can later clean this up - reference
geetu040
left a comment
There was a problem hiding this comment.
@EmanAbdelhaleem Thanks for the PR. Nicely done!
I have updated the PR to sync with latest changes in base PR.
@PGijsbers please review/merge.
PGijsbers
left a comment
There was a problem hiding this comment.
I left a few remarks, additionally the ability to add and remove tags was added, but I did not see any tests added that test this new functionality. Those tests should be added.
I believe we had briefly discussed this before on a different PR, but I can't find it easily. Why store the response body as a binary file? The response is always text (either xml or json, potentially also some error text but text regardless). I do not think it is good to store these files as binary because it doesn't play well with git -and- because it makes it hard to review the file content (see for example the recent xz/ssh exploit). I would prefer these files have an extension that matches their content, and if that needs to be more generic, then we should use .txt.
openml/_api/resources/setup.py
Outdated
| def _to_dict( | ||
| self, flow_id: int, openml_parameter_settings: builtins.list[dict[str, Any]] | ||
| ) -> OrderedDict: |
There was a problem hiding this comment.
Please turn it into a static method since it doesn't use the object's state.
openml/_api/resources/setup.py
Outdated
| ) | ||
| return OpenMLSetup(setup_id, flow_id, parameters) | ||
|
|
||
| def _create_setup_parameter_from_xml(self, result_dict: dict[str, str]) -> OpenMLParameter: |
There was a problem hiding this comment.
Please turn it into a static method since it doesn't use the object's state.
openml/_api/resources/setup.py
Outdated
| for setup_ in setups_dict["oml:setups"]["oml:setup"] | ||
| ] | ||
|
|
||
| def _create_setup(self, result_dict: dict) -> OpenMLSetup: |
There was a problem hiding this comment.
Please turn it into a static method since it doesn't use the object's state.
tests/test_api/test_setup.py
Outdated
| md5 = hashlib.md5() | ||
| md5.update(str(time.time()).encode("utf-8")) | ||
| sentinel = md5.hexdigest()[:10] | ||
| return f"TEST{sentinel}" |
There was a problem hiding this comment.
| md5 = hashlib.md5() | |
| md5.update(str(time.time()).encode("utf-8")) | |
| sentinel = md5.hexdigest()[:10] | |
| return f"TEST{sentinel}" | |
| sentinel = uuid.uuid4().hex[:10] | |
| return f"TEST{sentinel}" |
(and import uuid)
The generated string is deterministic based on the current time. That means that if this is used for multiple tests, and they are executed in parallel, there is a non-zero chance the same value is generated (or when the clock is set back). Admittedly, the precision of time.time() is enough that this likely doesn't happen, but it's good to avoid the chance. Moreover, generating it with uuid clearly signals the intent that we are just looking for a random string.
openml#1619 (comment) Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>
for more information, see https://pre-commit.ci
Fixes #1625
Depends on #1576
Related to #1575
Details
This PR implements
Setupsresource, and refactor its existing functions