Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[enhancement] WIP new finite checking in LinearRegression, Ridge and incremental variants #2206

Draft
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

icfaust
Copy link
Contributor

@icfaust icfaust commented Dec 3, 2024

Description

Introduce the new finite checker into LinearRegression and Ridge.


PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@icfaust icfaust changed the title [enhancement] WIP new finite checking in LinearRegression and Ridge [enhancement] WIP new finite checking in LinearRegression, Ridge and incremental variants Dec 3, 2024
@icfaust
Copy link
Contributor Author

icfaust commented Dec 4, 2024

/intelci: run

icfaust added a commit to icfaust/scikit-learn-intelex that referenced this pull request Dec 4, 2024
icfaust added a commit that referenced this pull request Dec 10, 2024
…_sample_weight``` (#2177)

* add finiteness_checker pybind11 bindings

* added finiteness checker

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Rename finiteness_checker.cpp to finiteness_checker.cpp

* Update finiteness_checker.cpp

* add next step

* follow conventions

* make xtable explicit

* remove comment

* Update validation.py

* Update __init__.py

* Update validation.py

* Update __init__.py

* Update __init__.py

* Update validation.py

* Update _data_conversion.py

* Update _data_conversion.py

* Update policy_common.cpp

* Update policy_common.cpp

* Update _policy.py

* Update policy_common.cpp

* Rename finiteness_checker.cpp to finiteness_checker.cpp

* Create finiteness_checker.py

* Update validation.py

* Update __init__.py

* attempt at fixing circular imports again

* fix isort

* remove __init__ changes

* last move

* Update policy_common.cpp

* Update policy_common.cpp

* Update policy_common.cpp

* Update policy_common.cpp

* Update validation.py

* add testing

* isort

* attempt to fix module error

* add fptype

* fix typo

* Update validation.py

* remove sua_ifcae from to_table

* isort and black

* Update test_memory_usage.py

* format

* Update _data_conversion.py

* Update _data_conversion.py

* Update test_validation.py

* remove unnecessary code

* make reviewer changes

* make dtype check change

* add sparse testing

* try again

* try again

* try again

* temporary commit

* first attempt

* missing change?

* modify DummyEstimator for testing

* generalize DummyEstimator

* switch test

* further testing changes

* add initial validate_data test, will be refactored

* fixes for CI

* Update validation.py

* Update validation.py

* Update test_memory_usage.py

* Update base.py

* Update base.py

* improve tests

* fix logic

* fix logic

* fix logic again

* rename file

* Revert "rename file"

This reverts commit 8d47744.

* remove duplication

* fix imports

* Rename test_finite.py to test_validation.py

* Revert "Rename test_finite.py to test_validation.py"

This reverts commit ee799f6.

* updates

* Update validation.py

* fixes for some test failures

* fix text

* fixes for some failures

* make consistent

* fix bad logic

* fix in string

* attempt tp see if dataframe conversion is causing the issue

* fix iter problem

* fix testing issues

* formatting

* revert change

* fixes for pandas

* there is a slowdown with pandas that needs to be solved

* swap to transpose for speed

* more clarity

* add _check_sample_weight

* add more testing'

* rename

* remove unnecessary imports

* fix test slowness

* focus get_dataframes_and_queues

* put config_context around

* Update test_validation.py

* Update base.py

* Update test_validation.py

* generalize regex

* add fixes for sklearn 1.0 and input_name

* fixes for test failures

* Update validation.py

* Update test_validation.py

* Update validation.py

* formattintg

* make suggested changes

* follow changes made in #2126

* fix future device problem

* Update validation.py

* minor changes based on #2206, suggestions

* remove xp as keyword

* only_non_negative -> ensure_non_negative

* add commentary

* formatting

* address changes

* Update test_validation.py

* Update base.py

* Update test_validation.py

* Update sklearnex/utils/validation.py

Co-authored-by: ethanglaser <[email protected]>

---------

Co-authored-by: ethanglaser <[email protected]>
@icfaust icfaust mentioned this pull request Jan 9, 2025
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant