
Update normalization parameters and add estimator params validation #210

Open

avolkov-intel wants to merge 6 commits into IntelPython:main from avolkov-intel:dev/anatolyv-normalize-fix

Conversation

@avolkov-intel
Collaborator

@avolkov-intel avolkov-intel commented May 6, 2026

Description

  • Add different normalization options
  • Remove implicit normalization from loaders
  • Add estimator parameters validation

Estimator parameter validation is useful when you want to override a parameter such as n_jobs for all benchmarks with -p algorithm:estimator_params:n_jobs=64. Currently this fails because some estimators don't support n_jobs, so those benchmarks would have to be run separately. With this change, estimators that don't support the parameter simply ignore it, and a warning is shown.
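The filtering behavior described above could be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual implementation; the toy Ridge/SVC classes and the filter_params helper are assumptions standing in for real estimators and the benchmark code:

```python
import inspect
import warnings

class Ridge:  # toy estimator that accepts n_jobs (hypothetical)
    def __init__(self, alpha=1.0, n_jobs=None):
        self.alpha, self.n_jobs = alpha, n_jobs

class SVC:  # toy estimator without n_jobs (hypothetical)
    def __init__(self, C=1.0):
        self.C = C

def filter_params(estimator_cls, params):
    """Keep only kwargs the estimator's __init__ accepts; warn about the rest."""
    supported = inspect.signature(estimator_cls.__init__).parameters
    valid = {k: v for k, v in params.items() if k in supported}
    dropped = sorted(set(params) - set(valid))
    if dropped:
        warnings.warn(f"{estimator_cls.__name__} ignores: {', '.join(dropped)}")
    return valid

print(filter_params(Ridge, {"n_jobs": 64}))  # {'n_jobs': 64}
print(filter_params(SVC, {"n_jobs": 64}))    # {} plus a warning
```

With something like this, a single -p algorithm:estimator_params:n_jobs=64 override can be applied to every benchmark case, and estimators that don't take n_jobs just warn instead of erroring out.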


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

@avolkov-intel avolkov-intel changed the title Update normalization parameters Update normalization parameters and add estimator params validation May 6, 2026
@avolkov-intel avolkov-intel added bug Something isn't working extend Extend benchmarks labels May 6, 2026
  "data": {
    "dataset": "hepmass",
-   "split_kwargs": { "train_size": 0.1, "test_size": null }
+   "split_kwargs": { "train_size": 0.1, "test_size": null },
Contributor


Seems like this is the only case where benchmark behavior changes - is it intended?

Comment on lines +7 to +8
"dataset" : "cifar",
"split_kwargs": { "ignore" : true },
Contributor


Suggested change
"dataset" : "cifar",
"split_kwargs": { "ignore" : true },
"dataset": "cifar",
"split_kwargs": { "ignore": true },

Comment thread configs/regular/svm.json
  "dataset": "mnist",
  "split_kwargs": { "train_size": 20000, "test_size": null },
- "preprocessing_kwargs": { "normalize": false }
+ "preprocessing_kwargs": {"normalize" : null}
Contributor

@ethanglaser ethanglaser May 6, 2026


Suggested change
"preprocessing_kwargs": {"normalize" : null}
"preprocessing_kwargs": { "normalize": null }

Minor adjustment, but it looks like spaces have been added before colons in a few places throughout the configs.

