Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: rng primitive refactoring #2968

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

Alexandr-Solovev
Copy link
Contributor

@Alexandr-Solovev Alexandr-Solovev commented Nov 5, 2024

Description

Description:

Feature: RNG primitive refactoring

Summary:

This PR updates the oneDAL rng primitive. It includes various fixes and modifications for RNG primitive.

Key Changes:

  1. New generators have been added:

    • mrg32k3a and philox engines have been added in DAAL/oneDAL.
  2. Host and DPC engines have been refactored and added:

    • Opportunity to use RNG on all devices.

SKLEARNEX related pr: uxlfoundation/scikit-learn-intelex#2228

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

@Alexandr-Solovev Alexandr-Solovev added the dpc++ Issue/PR related to DPC++ functionality label Nov 19, 2024
cpp/daal/include/algorithms/engines/mrg32k3a/mrg32k3a.h Outdated Show resolved Hide resolved
cpp/daal/src/algorithms/engines/mrg32k3a/mrg32k3a_impl.i Outdated Show resolved Hide resolved
cpp/oneapi/dal/backend/primitives/rng/engine_gpu.hpp Outdated Show resolved Hide resolved
cpp/oneapi/dal/backend/primitives/rng/rng.hpp Outdated Show resolved Hide resolved
cpp/oneapi/dal/backend/primitives/rng/rng_types.hpp Outdated Show resolved Hide resolved
docs/source/daal/algorithms/engines/mrg32k3a.rst Outdated Show resolved Hide resolved
docs/source/daal/algorithms/engines/mrg32k3a.rst Outdated Show resolved Hide resolved
@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

@Alexandr-Solovev
Copy link
Contributor Author

sklearnex ref pr: uxlfoundation/scikit-learn-intelex#2228

@Alexandr-Solovev Alexandr-Solovev marked this pull request as ready for review December 18, 2024 10:05
@david-cortes-intel
Copy link
Contributor

Couple questions from my side:

  • Why would the default be left as MT2203 instead of something with better properties? Why not philox for example? It appears to be substantially faster.
    • All of the generators introduced here appear to produce 32-bit numbers, but some of the usages are for e.g. generating 64-bit integers, which would require two draws for one output. Wouldn't it be better to make the default dependent on the templated parameters that an algorithm will use? Might be rather complicated though so not sure how feasible it is.
  • Is the idea to simply include all generators from MKL, or to provide reasonably good choices for oneDAL? We already know for example that MT-family generators have issues when used like oneDAL does. How about removing those?
    • Do we know if these generators have state quality issues when initializing them from a single seed instead of a seed sequence? For example, the current default generator passed from sklearnex (I think MT19937 but not sure) had the issue (pointed out by @icfaust) of producing very biased first samples when initialized like that with a low-number seed.

@Alexandr-Solovev
Copy link
Contributor Author

@david-cortes-intel

Why would the default be left as MT2203 instead of something with better properties? Why not philox for example?

For RF algorithm the feature to choose engine_method will be added in the follow-up pr(likely here #3029). I hold MT2203 to save result compatibility with previous releases. Not sure that its the best strategy in terms of performance, but I expect potential accuracy/mse with philox might be worse. Moreover it can make the review process more complex.
For other algorithms the default engine method can be replaced with the new one, but I didn't want to do it in this pr, due to potential metrics degradations. Can be easily done in the next prs imho.

I probably didnt get the idea of

All of the generators introduced here appear to produce 32-bit numbers, but some of the usages are for e.g. generating 64-bit integers, which would require two draws for one output. Wouldn't it be better to make the default dependent on the templated parameters that an algorithm will use? Might be rather complicated though so not sure how feasible it is.

Do you mean create a dispatcher before, for example unifrom, or inside the unifrom, which will choose the best potential engine method?

@Alexandr-Solovev
Copy link
Contributor Author

@david-cortes-intel

Is the idea to simply include all generators from MKL, or to provide reasonably good choices for oneDAL?

No, we preliminary discussed it with oneMKL team and they suggested the best one in terms of performance and generator period engines for GPU(our initial goal was improve rf performance on GPU). The full list of engines is pretty big https://oneapi-spec.uxlfoundation.org/specifications/oneapi/v1.3-rev-1/elements/onemkl/source/domains/rng/host_api/engines-basic-random-number-generators so, the goal wasnt just add all engine methods from oneMKL.

We already know for example that MT-family generators have issues when used like oneDAL does. How about removing those?

I am not against this removing, but not sure that we can easily change the default behavior. Based on the experiments new engines(mrg32 philox) + mcg59 are significantly better. It makes sense to discuss, but probably just change the default engine could be good solution.

Do we know if these generators have state quality issues when initializing them from a single seed instead of a seed sequence? For example, the current default generator passed from sklearnex (I think MT19937 but not sure) had the issue (pointed out by @icfaust) of producing very biased first samples when initialized like that with a low-number seed.

I guess it depends on implementation on oneMKL side. Not sure, but as I know, by default oneMKL uses N threads sub-engines in mt2203 and mt19937. May be its the reason of such behavior. As I know mrg32 philox and mcg59 are implemented without sub-engines inside.

@david-cortes-intel
Copy link
Contributor

I hold MT2203 to save result compatibility with previous releases.

Note that NumPy itself, which is used by scikit-learn, does not have such compatibility guarantees when using their random.Generator module:
https://numpy.org/doc/stable/reference/random/compatibility.html

Thus, I don't think it should be a big deal to make breaking changes in produced random numbers in sklearnex.

I expect potential accuracy/mse with philox might be worse

I'd expect it should actually be the opposite, since (a) we are seeding MT with a single integer instead of a sequence, which leaves it with issues for the first draws; (b) philox appears to do better at statistical tests according to the paper that introduced it - see for example that it passed BigCrush:
https://www.thesalmons.org/john/random123/papers/random123sc11.pdf
.. whereas MT fail some of it: https://arxiv.org/pdf/1910.06437

Do you mean create a dispatcher before, for example unifrom, or inside the unifrom, which will choose the best potential engine method?

On a deeper look, it seems all generators from MKL are 32-bit only, so please ignore earlier comment.

I guess it depends on implementation on oneMKL side. Not sure, but as I know, by default oneMKL uses N threads sub-engines in mt2203 and mt19937. May be its the reason of such behavior. As I know mrg32 philox and mcg59 are implemented without sub-engines inside.

Philox is a counter-based generator, so parallelizing it and jumping states should be pretty straightforward, without needing to keep sub-engines.

@david-cortes-intel
Copy link
Contributor

I am not against this removing, but not sure that we can easily change the default behavior.

Let's leave the change in defaults for a different PR then.

@david-cortes-intel
Copy link
Contributor

@Alexandr-Solovev Are the issues from the CI meant to be solved with the PR from the sklearnex side?

In file included from /home/vsts/work/1/s/build/daal4py_cpp.cpp:2:
/home/vsts/work/1/s/build/daal4py_cpp.h: In instantiation of ‘engines_mrg32k3a_manager<fptype, method>::engines_mrg32k3a_manager() [with fptype = double; daal::algorithms::engines::mrg32k3a::Method method = daal::algorithms::engines::mrg32k3a::defaultDense]’:
/home/vsts/work/1/s/build/daal4py_cpp.cpp:423:94:   required from here
/home/vsts/work/1/s/build/daal4py_cpp.h:2358:22: error: ‘daal::algorithms::engines::mrg32k3a::interface1::Batch<algorithmFPType, method>::Batch(size_t) [with algorithmFPType = double; daal::algorithms::engines::mrg32k3a::Method method = daal::algorithms::engines::mrg32k3a::defaultDense; size_t = long unsigned int]’ is protected within this context
 2358 |         _algob.reset(new algob_type());
      |                      ^~~~~~~~~~~~~~~~
In file included from /home/vsts/work/1/daal/latest/include/daal.h:307,
                 from /home/vsts/work/1/s/src/daal4py.h:24,
                 from /home/vsts/work/1/s/src/daal4py_dist.h:23,
                 from /home/vsts/work/1/s/build/daal4py_cpp.h:3,
                 from /home/vsts/work/1/s/build/daal4py_cpp.cpp:2:
/home/vsts/work/1/daal/latest/include/algorithms/engines/mrg32k3a/mrg32k3a.h:152:5: note: declared protected here
  152 |     Batch(size_t seed = 777) { initialize(); }
      |     ^~~~~

@Alexandr-Solovev
Copy link
Contributor Author

@david-cortes-intel Thanks for the comments!

I'd expect it should actually be the opposite, since (a) we are seeding MT with a single integer instead of a sequence, which leaves it with issues for the first draws; (b) philox appears to do better at statistical tests according to the paper that introduced it - see for example that it passed BigCrush:
https://www.thesalmons.org/john/random123/papers/random123sc11.pdf
.. whereas MT fail some of it: https://arxiv.org/pdf/1910.06437

I will be glad to change it, but based on the testing I temporary disabled one test, so, need to investigate, but overall lets do it.

Let's leave the change in defaults for a different PR then.

For sure!

Are the issues from the CI meant to be solved with the PR from the sklearnex side?

Yes, it should be fixed here uxlfoundation/scikit-learn-intelex#2228. I am waiting for combined CI results

@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

@david-cortes-intel
Copy link
Contributor

david-cortes-intel commented Dec 20, 2024

Looks like something went wrong in the examples:

 cpp/oneapi/dal/backend/primitives/optimizers/test/newton_cg_dpc.cpp:146:27: error: expected ';' after expression
   146 |         primitives::engine eng(4014 + n_);
 cpp/oneapi/dal/backend/primitives/rng/rng.hpp:108:6: note: candidate function template not viable: requires at least 6 arguments, but 5 were provided
   108 | void uniform(sycl::queue& queue,
       |      ^       ~~~~~~~~~~~~~~~~~~~
   109 |              Size count,
       |              ~~~~~~~~~~~
   110 |              Type* dst,
       |              ~~~~~~~~~~
   111 |              dpc_engine<EngineType>& engine_,
       |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   112 |              Type a,
       |              ~~~~~~~
   113 |              Type b,
       |              ~~~~~~~
   114 |              const event_vector& deps = {});

@Alexandr-Solovev
Copy link
Contributor Author

Looks like something went wrong in the examples:

 cpp/oneapi/dal/backend/primitives/optimizers/test/newton_cg_dpc.cpp:146:27: error: expected ';' after expression
   146 |         primitives::engine eng(4014 + n_);
 cpp/oneapi/dal/backend/primitives/rng/rng.hpp:108:6: note: candidate function template not viable: requires at least 6 arguments, but 5 were provided
   108 | void uniform(sycl::queue& queue,
       |      ^       ~~~~~~~~~~~~~~~~~~~
   109 |              Size count,
       |              ~~~~~~~~~~~
   110 |              Type* dst,
       |              ~~~~~~~~~~
   111 |              dpc_engine<EngineType>& engine_,
       |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   112 |              Type a,
       |              ~~~~~~~
   113 |              Type b,
       |              ~~~~~~~
   114 |              const event_vector& deps = {});

Thanks for highlight it, will be fixed soon

@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

1 similar comment
@Alexandr-Solovev
Copy link
Contributor Author

/intelci: run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dpc++ Issue/PR related to DPC++ functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants