Skip to content

[ENH] V1 → V2 API Migration - setups#1619

Open
EmanAbdelhaleem wants to merge 264 commits intoopenml:mainfrom
EmanAbdelhaleem:setups-mig
Open

[ENH] V1 → V2 API Migration - setups#1619
EmanAbdelhaleem wants to merge 264 commits intoopenml:mainfrom
EmanAbdelhaleem:setups-mig

Conversation

@EmanAbdelhaleem
Copy link
Contributor

@EmanAbdelhaleem EmanAbdelhaleem commented Jan 20, 2026

Fixes #1625
Depends on #1576
Related to #1575

Details
This PR implements Setups resource, and refactor its existing functions

@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 68.68687% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.26%. Comparing base (8ff14eb) to head (801c16f).

Files with missing lines Patch % Lines
openml/_api/resources/setup.py 71.42% 24 Missing ⚠️
openml/setups/setup.py 25.00% 6 Missing ⚠️
openml/setups/functions.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1619      +/-   ##
==========================================
+ Coverage   53.96%   54.26%   +0.29%     
==========================================
  Files          61       61              
  Lines        5051     5070      +19     
==========================================
+ Hits         2726     2751      +25     
+ Misses       2325     2319       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@geetu040 geetu040 mentioned this pull request Jan 21, 2026
18 tasks
@geetu040 geetu040 self-assigned this Mar 23, 2026
Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check: Does #1611 (review) affect this PR?

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some old tests will fail because the cache path has changed

old: tests/files/org/openml/test/tasks/1882
new: tests/files/org/openml/test/api/v1/xml/task/1882

regarding this we can do the following, as i've done in this PR

  • populate the files in new directory - reference
    • you can copy meta.json and headers.json as they are and update the URLs (this is trivial, anything should be fine here)
    • rename the description.xml file to body.bin
    • move other downloadable files from minio and HTTPClient.download accordingly
  • if there are any tests that rely on cache, update them by mocking Session.request - reference
  • update conftest.py with new file paths, for now you can manually hardcode the new paths and remove this resource from the code earlier, we can later clean this up - reference

FYI: @satvshr @Omswastik-11 @JATAYU000

Copy link
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EmanAbdelhaleem Thanks for the PR. Nicely done!
I have updated the PR to sync with latest changes in base PR.

@PGijsbers please review/merge.

Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few remarks, additionally the ability to add and remove tags was added, but I did not see any tests added that test this new functionality. Those tests should be added.

I believe we had briefly discussed this before on a different PR, but I can't find it easily. Why store the response body as a binary file? The response is always text (either xml or json, potentially also some error text but text regardless). I do not think it is good to store these files as binary because it doesn't play well with git -and- because it makes it hard to review the file content (see for example the recent xz/ssh exploit). I would prefer these files have an extension that matches their content, and if that needs to be more generic, then we should use .txt.

Comment on lines +228 to +230
def _to_dict(
self, flow_id: int, openml_parameter_settings: builtins.list[dict[str, Any]]
) -> OrderedDict:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please turn it into a static method since it doesn't use the object's state.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

)
return OpenMLSetup(setup_id, flow_id, parameters)

def _create_setup_parameter_from_xml(self, result_dict: dict[str, str]) -> OpenMLParameter:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please turn it into a static method since it doesn't use the object's state.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

for setup_ in setups_dict["oml:setups"]["oml:setup"]
]

def _create_setup(self, result_dict: dict) -> OpenMLSetup:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please turn it into a static method since it doesn't use the object's state.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines +22 to +25
md5 = hashlib.md5()
md5.update(str(time.time()).encode("utf-8"))
sentinel = md5.hexdigest()[:10]
return f"TEST{sentinel}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
md5 = hashlib.md5()
md5.update(str(time.time()).encode("utf-8"))
sentinel = md5.hexdigest()[:10]
return f"TEST{sentinel}"
sentinel = uuid.uuid4().hex[:10]
return f"TEST{sentinel}"

(and import uuid)

The generated string is deterministic based on the current time. That means that if this is used for multiple tests, and they are executed in parallel, there is a non-zero chance the same value is generated (or when the clock is set back). Admittedly, the precision of time.time() is enough that this likely doesn't happen, but it's good to avoid the chance. Moreover, generating it with uuid clearly signals the intent that we are just looking for a random string.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENH] V1 → V2 API Migration - setups

8 participants