Our cluster relocation function relies on a parallel `argpartition` function that doesn't use the same tie-breaking strategy as `np.argpartition` and, on top of that, breaks ties in a non-deterministic way.

This means that two consecutive `KMeans.fit` calls run with the `sklearn_numba_dpex` engine and the same seed are not guaranteed to converge to the same list of centroids, only to the same list of centroids up to a permutation. This is not user-friendly.

This can (rarely) cause the sklearn test `test_predict_kmeans` to fail.

Doesn't this seem like a solid argument to justify the cost of adding some synchronization in our argpartition kernels, to at least ensure a deterministic tie-breaking strategy?
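To make "deterministic tie-breaking" concrete, here is a minimal host-side reference in plain NumPy. It is only a sketch of the behavior a kernel could be checked against, not the parallel kernel itself. The helper name `deterministic_argsmallest` is hypothetical, and the rule "ties go to the lowest index" is just one possible convention.

```python
import numpy as np


def deterministic_argsmallest(values, k):
    """Return the indices of the k smallest entries of `values`.

    Hypothetical reference helper: ties are always resolved in favor of
    the lowest index, so the result depends only on the input data.
    """
    values = np.asarray(values)
    # A stable sort keeps equal values in their original index order,
    # which is what makes the tie-break deterministic.
    order = np.argsort(values, kind="stable")
    return order[:k]


values = np.array([3.0, 1.0, 1.0, 0.5, 1.0])
selected = deterministic_argsmallest(values, 2)
print(selected)  # [3 1]: index 3 holds 0.5, then the first of the tied 1.0 values
```

A full sort is more work than a partition, so this is only meant as an oracle for tests, not as a performance target.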
Or maybe sort the cluster centers in a deterministic way after the fit?
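A rough sketch of what the post-fit sorting option could look like, assuming the two runs really do produce identical centroids up to a permutation (it would not absorb floating-point differences between runs). The helper name `canonicalize_centers` is hypothetical; it sorts the centers lexicographically by their coordinates and remaps the labels accordingly.

```python
import numpy as np


def canonicalize_centers(cluster_centers, labels):
    """Reorder centers lexicographically and remap labels to match.

    Hypothetical helper, not part of sklearn or sklearn_numba_dpex.
    """
    centers = np.asarray(cluster_centers)
    labels = np.asarray(labels)
    # np.lexsort treats the LAST key as the primary one, so the feature
    # rows are reversed to sort by feature 0 first, then feature 1, etc.
    order = np.lexsort(centers.T[::-1])
    # inverse[old_index] gives the new index of that center.
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)
    return centers[order], inverse[labels]


# Two "runs" that differ only by a permutation of the centers.
centers_a = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
labels_a = np.array([0, 1, 2, 0])
centers_b = centers_a[[1, 2, 0]]
labels_b = np.array([2, 0, 1, 2])

sorted_a, relabeled_a = canonicalize_centers(centers_a, labels_a)
sorted_b, relabeled_b = canonicalize_centers(centers_b, labels_b)
```

After this step both runs yield the same centers and the same labels. On a fitted estimator this would presumably mean rewriting `cluster_centers_` and `labels_` at the end of `fit`.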
WDYT?