Add proper CPUID eax checking #3026

Draft: wants to merge 2 commits into base: main

Conversation

@isuruf isuruf commented Dec 17, 2024

This makes it possible to use the sse42, avx2 code paths on AMD processors

Description

Add a comprehensive description of proposed changes

  • Adds proper checking of CPUID extension support. Previously, the feature checks were guarded by a check for Intel CPUs, so AMD processors fell back to the default sse2 code path. With this change, AMD processors can use the sse42 and avx2 code paths.
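
The kind of vendor-independent check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `CpuidFn`, `max_basic_leaf`, `has_sse42`, and `has_avx2` are hypothetical names, and the CPUID query is passed in as a function pointer so the logic can be exercised without specific hardware. The bit positions (SSE4.2 in leaf 1, ecx bit 20; AVX2 in leaf 7 subleaf 0, ebx bit 5) follow the x86 CPUID definition. A production AVX2 check would also need to confirm OS support for YMM state (OSXSAVE and XGETBV), which is left out here.

```cpp
#include <cstdint>

// Hypothetical signature: fills abcd with eax, ebx, ecx, edx for (leaf, subleaf).
using CpuidFn = void (*)(uint32_t leaf, uint32_t subleaf, uint32_t * abcd);

// Leaf 0 reports the highest supported basic leaf in eax.
inline uint32_t max_basic_leaf(CpuidFn cpuid)
{
    uint32_t abcd[4] = { 0, 0, 0, 0 };
    cpuid(0, 0, abcd);
    return abcd[0];
}

// SSE4.2: leaf 1, ecx bit 20. The vendor string is never consulted.
inline bool has_sse42(CpuidFn cpuid)
{
    if (max_basic_leaf(cpuid) < 1) return false;
    uint32_t abcd[4] = { 0, 0, 0, 0 };
    cpuid(1, 0, abcd);
    return (abcd[2] & (1u << 20)) != 0;
}

// AVX2: leaf 7 subleaf 0, ebx bit 5. Leaf 7 is queried only if it exists.
inline bool has_avx2(CpuidFn cpuid)
{
    if (max_basic_leaf(cpuid) < 7) return false;
    uint32_t abcd[4] = { 0, 0, 0, 0 };
    cpuid(7, 0, abcd);
    return (abcd[1] & (1u << 5)) != 0;
}
```

Checking the maximum leaf first matters because querying a leaf beyond the supported maximum does not return reliable feature bits.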

List associated issue number(s) if exist(s): #6 (for example)

Documentation PR (if needed): #1340 (for example)

Benchmarks PR (if needed): IntelPython/scikit-learn_bench#155 (for example)


PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

This makes it possible to use the sse42, avx2 code paths on AMD
processors
Contributor

@icfaust icfaust left a comment

Initial review.

@@ -71,6 +71,19 @@ void run_cpuid(uint32_t eax, uint32_t ecx, uint32_t * abcd)
#endif
}

uint32_t __daal_internal_get_max_extension_support()
Contributor

This function is not necessary, it can be folded in below.

Author

This follows the pattern of __daal_internal_is_intel_cpu and daal_check_is_intel_cpu where the former is the one that does the work and the latter is there to cache the value in a static variable to avoid running the former multiple times.
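
The work-then-cache pattern described in this reply can be sketched roughly as below. The names `internal_probe_max_leaf` and `cached_max_leaf` are hypothetical stand-ins, not the functions in this PR, the returned value is a placeholder rather than a real CPUID result, and the call counter exists only to show that the probe runs once.

```cpp
#include <cstdint>

static int g_probe_calls = 0; // illustration only: counts how often the probe runs

// Stand-in for the function that does the real work (for example, executing CPUID).
static uint32_t internal_probe_max_leaf()
{
    ++g_probe_calls;
    return 0x80000008u; // placeholder value, not read from hardware
}

// Wrapper that caches the result in a function-local static variable.
// The initializer runs on the first call only (thread-safe since C++11).
static uint32_t cached_max_leaf()
{
    static const uint32_t result = internal_probe_max_leaf();
    return result;
}
```

Callers would use only the cached wrapper, so the split into two functions keeps the probing logic separate from the caching.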

@@ -71,6 +71,19 @@ void run_cpuid(uint32_t eax, uint32_t ecx, uint32_t * abcd)
#endif
}

uint32_t __daal_internal_get_max_extension_support()
{
uint32_t abcd[4];
Contributor

Clearer naming is necessary.

Author

The naming used is abcd throughout this file. This variable stores the values of the eax, ebx, ecx, edx registers just like the other functions. Any ideas on what to rename all of them to?
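
To illustrate the layout described here, the sketch below indexes a four-element CPUID result array by register name. It is only one hypothetical way to make the `abcd` convention more readable; `CpuidRegister`, the `kEax`..`kEdx` constants, and `cpuid_bit_set` are invented for this example and do not exist in the file under review.

```cpp
#include <cstdint>

// Hypothetical named indices into the four-element CPUID result array,
// in the order the registers are stored: eax, ebx, ecx, edx.
enum CpuidRegister : unsigned
{
    kEax = 0,
    kEbx = 1,
    kEcx = 2,
    kEdx = 3
};

// Returns true if the given bit (0..31) is set in the chosen register.
inline bool cpuid_bit_set(const uint32_t * abcd, CpuidRegister reg, unsigned bit)
{
    return bit < 32 && ((abcd[reg] >> bit) & 1u) != 0;
}
```

With such indices, a check like `cpuid_bit_set(abcd, kEcx, 20)` reads as "ecx bit 20" without renaming the array itself.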

uint32_t __daal_internal_get_max_extension_support()
{
uint32_t abcd[4];
run_cpuid(0x80000000, 0, abcd);
Contributor

A comment explaining this call would be nice.

Author

Added comments.
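
For context on the call discussed in this thread: CPUID leaf 0x80000000 returns the highest supported extended leaf in eax, so code can compare against that value before querying leaves such as 0x80000001. The sketch below assumes GCC or Clang on x86 (via `<cpuid.h>`) and falls back to zeros on other architectures. `query_cpuid`, `max_extended_leaf`, and `extended_leaf_available` are hypothetical names, not the functions added in this PR.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
    #include <cpuid.h> // GCC/Clang only; MSVC would use __cpuidex from <intrin.h>
#endif

// Fills abcd with eax, ebx, ecx, edx for the given leaf and subleaf.
static void query_cpuid(uint32_t leaf, uint32_t subleaf, uint32_t * abcd)
{
#if defined(__x86_64__) || defined(__i386__)
    __cpuid_count(leaf, subleaf, abcd[0], abcd[1], abcd[2], abcd[3]);
#else
    (void)leaf;
    (void)subleaf;
    abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0; // no CPUID on this architecture
#endif
}

// Leaf 0x80000000: eax holds the highest supported extended leaf.
static uint32_t max_extended_leaf()
{
    uint32_t abcd[4] = { 0, 0, 0, 0 };
    query_cpuid(0x80000000u, 0, abcd);
    return abcd[0];
}

// True only for extended leaves the CPU reports as supported.
// CPUs without extended leaves return a value below 0x80000000 here.
static bool extended_leaf_available(uint32_t leaf)
{
    return leaf >= 0x80000000u && leaf <= max_extended_leaf();
}
```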

@@ -193,7 +211,7 @@ static int check_sse42_features()

DAAL_EXPORT bool __daal_serv_cpu_extensions_available()
{
-    return daal_check_is_intel_cpu();
+    return 1;
Contributor

First, since the function is declared to return bool, it should return a bool (true) rather than 1. Second, if the function is now a no-op, it should be removed.

Author

I removed the function and its usages.

Author

isuruf commented Dec 19, 2024

Thanks for the review @icfaust

Author

isuruf commented Jan 6, 2025

@icfaust what are the next steps for this PR?

Contributor

icfaust commented Jan 6, 2025

@isuruf I think addressing uxlfoundation/scikit-learn-intelex#1000 may take priority, though I will defer to code owners on this. I think testability is key, and this PR may cover up an underlying issue.

Author

isuruf commented Jan 7, 2025

> I think testability is key, and this PR may cover up an underlying issue.

Yeah, it will cover up an underlying issue in the sse2 code path, but that path only runs on really old hardware, and it seems the only way to make the bug surface is to use or emulate really old hardware, or to use AMD hardware. I don't think it is fair to keep AMD hardware throttled to sse2 just to have a way to reproduce this bug.

2 participants