Adjust regularization and data pre-processing to ensure logistic regression converges by david-cortes-intel · Pull Request #190 · IntelPython/scikit-learn_bench

david-cortes-intel · 2025-10-27T13:47:14Z

Description

Many logistic regression benchmark cases currently run without reaching convergence within the specified tolerances. This PR adjusts those cases to use a more appropriate regularization value for each dataset, and pre-processes the datasets so that features do not end up on widely different scales.
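To illustrate why both adjustments matter, here is a minimal, self-contained sketch. It is not the code from this PR: it swaps in plain fixed-step gradient descent for the solvers the benchmarks actually exercise, and the names `standardize`, `fit_logreg`, and `lam`, as well as the synthetic two-feature dataset, are assumptions made purely for illustration.

```python
import math
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lam, tol=1e-6, max_iter=1000):
    """L2-regularized logistic regression (no intercept) by gradient descent.

    Returns (weights, iterations_used, converged).
    """
    n, d = len(X), len(X[0])
    # Step 1/L, where L is an upper bound on the curvature of the objective.
    L = 0.25 * sum(sum(v * v for v in row) for row in X) / n + lam
    step = 1.0 / L
    w = [0.0] * d
    for it in range(1, max_iter + 1):
        grad = [lam * wj for wj in w]
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * v for wj, v in zip(w, row))) - label
            for j, v in enumerate(row):
                grad[j] += err * v / n
        # Converged once the gradient norm drops below the tolerance.
        if math.sqrt(sum(g * g for g in grad)) < tol:
            return w, it, True
        w = [wj - step * g for wj, g in zip(w, grad)]
    return w, max_iter, False


# Synthetic data (an assumption): two features whose scales differ by 1000x.
rng = random.Random(0)
raw = [[rng.uniform(-1, 1), rng.uniform(-1000, 1000)] for _ in range(200)]
labels = [1.0 if a + b / 1000 > 0 else 0.0 for a, b in raw]

_, raw_iters, raw_ok = fit_logreg(raw, labels, lam=0.1)
_, std_iters, std_ok = fit_logreg(standardize(raw), labels, lam=0.1)
print("raw features:         ", raw_iters, "iterations, converged =", raw_ok)
print("standardized features:", std_iters, "iterations, converged =", std_ok)
```

On this synthetic data the run on raw features uses up its whole iteration budget without meeting the tolerance, while the run on standardized features stops early. The labels are linearly separable, so without the `lam` penalty the optimum would not even be finite, which is one reason the regularization value has to suit the dataset.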

Before:

After:

Checklist:

Completeness and readability

Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.

david-cortes-intel · 2025-10-27T13:53:38Z

CI error is from an xgb model conversion issue:

                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/bench-env/lib/python3.11/site-packages/daal4py/mb/gbt_convertors.py", line 546, in get_gbt_model_from_xgboost
    base_score = float(xgb_config["learner"]["learner_model_param"]["base_score"])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: '[-2.5816395E0]'

It should be fixed with this PR in sklearnex: uxlfoundation/scikit-learn-intelex#2741
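For context, the failure can be reproduced without xgboost: the config value arrives as a bracketed string rather than a bare number, and `float()` rejects it. The sketch below shows one tolerant way to read such a value. It is only an illustration under that assumption, `parse_base_score` is a hypothetical helper, and it is not claimed to be how the linked sklearnex PR resolves the issue.

```python
def parse_base_score(raw: str) -> float:
    """Read a base_score given as '0.5' or as a one-element '[...]' string."""
    text = raw.strip()
    if text.startswith("[") and text.endswith("]"):
        parts = [p for p in text[1:-1].split(",") if p.strip()]
        if len(parts) != 1:
            # Several values need model-specific handling; refuse to guess.
            raise ValueError(f"expected one base_score, got {len(parts)}: {raw!r}")
        text = parts[0]
    return float(text)


# The string from the traceback above fails with a plain float() call...
try:
    float("[-2.5816395E0]")
except ValueError as exc:
    print("float() fails:", exc)

# ...but parses once the brackets are stripped.
print(parse_base_score("[-2.5816395E0]"))
print(parse_base_score("0.5"))
```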

Vika-F · 2025-10-31T11:14:15Z

The changes look good to me.
My only question: will it be possible to merge the JSON results produced after these changes with those collected using the current version of the benchmarks, or would they need to be recollected with the updated version?

david-cortes-intel · 2025-10-31T11:35:53Z

The changes look good to me. My only question: will it be possible to merge the JSON results produced after these changes with those collected using the current version of the benchmarks, or would they need to be recollected with the updated version?

They would be mergeable, but there would be fewer overlapping entries so it wouldn't show comparisons for most cases.
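A rough sketch of why the overlap shrinks: if two result sets are compared by matching each case on its algorithm, dataset, and parameters, then any case whose parameters changed no longer has a counterpart. The record layout, the field names, and the values below are hypothetical and are not the actual scikit-learn_bench output format.

```python
def case_key(entry):
    """Identity of a benchmark case: everything except the measured values."""
    params = tuple(sorted(entry["params"].items()))
    return (entry["algorithm"], entry["dataset"], params)


def comparable_cases(old_results, new_results):
    """Return the keys of cases present in both result sets."""
    old = {case_key(e): e for e in old_results}
    new = {case_key(e): e for e in new_results}
    return sorted(old.keys() & new.keys())


# Hypothetical records: the logistic regression case changes its
# regularization between runs, the k-means case is untouched.
old_results = [
    {"algorithm": "logreg", "dataset": "dataset_a", "params": {"C": 1.0}, "time": 2.0},
    {"algorithm": "kmeans", "dataset": "dataset_b", "params": {"n_clusters": 8}, "time": 1.0},
]
new_results = [
    {"algorithm": "logreg", "dataset": "dataset_a", "params": {"C": 0.1}, "time": 1.5},
    {"algorithm": "kmeans", "dataset": "dataset_b", "params": {"n_clusters": 8}, "time": 1.0},
]

comparable = comparable_cases(old_results, new_results)
print(comparable)
```

Only the unchanged case remains comparable; the two logistic regression records merge without conflict but have nothing to be compared against.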

change data processing and regularization so that models end up conve…

bad85b3

…rging

david-cortes-intel requested review from Vika-F and avolkov-intel October 27, 2025 13:47

david-cortes-intel requested a review from Alexsandruss as a code owner October 27, 2025 13:47

david-cortes-intel added the enhancement (New feature or request) label Oct 27, 2025

david-cortes-intel mentioned this pull request Oct 27, 2025

ENH: Increase correction pairs used for quasi Newton approximations uxlfoundation/scikit-learn-intelex#2752

Merged

5 tasks

Vika-F approved these changes Nov 3, 2025

david-cortes-intel merged commit 03df57a into IntelPython:main Nov 3, 2025
7 of 12 checks passed

david-cortes-intel merged 1 commit into IntelPython:main from david-cortes-intel:logreg_convergence
