Gather benchmark results for each CUB algorithm using different offset types #1787

Closed
9 of 10 tasks
jrhemstad opened this issue May 29, 2024 · 1 comment

@jrhemstad
Collaborator

jrhemstad commented May 29, 2024

To make an informed decision about the offset type solution, we need a complete understanding of how the offset type affects each CUB algorithm. For each algorithm, we'd like a table that summarizes the following:

  • Performance with i32, u32, i64, and u64 offset types
  • Verification that the algorithm still passes its unit tests with each offset type
  • A box plot of the performance impact across a range of data types, input sizes, etc.
    • Can be summarized in a table as min/max/median
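A minimal sketch of the min/max/median summary, assuming the per-configuration ratio percentages (e.g. one u32/i32 column) have already been collected into a list. It also returns the two quartiles a box plot needs. Quartiles come from Python's `statistics.quantiles` with `method="inclusive"` (linear interpolation between ranks), which is only one of several common conventions; `five_number_summary` is a hypothetical helper, not part of CUB or nvbench.

```python
import statistics


def five_number_summary(ratios):
    """Hypothetical helper: (min, Q1, median, Q3, max) of relative-time percentages."""
    if len(ratios) < 2:
        # statistics.quantiles needs at least two data points on older Pythons
        raise ValueError("need at least two ratios to compute quartiles")
    # "inclusive" interpolates linearly between the closest ranks
    q1, median, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    return min(ratios), q1, median, q3, max(ratios)


# u32/i32 column for the four I8 rows of device_exclusive_scan_max
print(five_number_summary([105.13, 99.07, 99.55, 100.02]))
```

The same call works on a whole column across data types and sizes; min, median and max go into the summary table, and all five values describe one box.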

Algorithms

  • device_exclusive_scan_max
  • device_exclusive_scan_sum
  • device_exclusive_scan_by_key
  • device_select_flagged
  • device_select_if
  • device_partition_if
  • device_partition_flagged
  • device_run_length_encode
  • device_run_length_encode_non_trivial_runs
  • device_segmented_sort

System & environment

  • CTK 12.5
  • Ubuntu 22.04
  • H100

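Benchmark data

In each table, the i32, u32, i64 and u64 columns give the measured time with that offset type, and each ratio column gives that time as a percentage of the i32 time, so values above 100% are slower than i32.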
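The ratio columns can be reproduced from the raw times. A minimal sketch, assuming each ratio is the offset type's time divided by the i32 time, expressed as a percentage and rounded to two decimals (the first row of the device_exclusive_scan_max table matches this); `relative_times` is a hypothetical name.

```python
def relative_times(t_i32, **times):
    """Hypothetical helper: each offset type's time as a percentage of the i32 time."""
    if t_i32 <= 0:
        raise ValueError("i32 time must be positive")
    return {name: round(100.0 * t / t_i32, 2) for name, t in times.items()}


# First row of device_exclusive_scan_max: T=I8, 2^16 elements
print(relative_times(8.91, u32=9.367, i64=9.232, u64=9.359))
# -> {'u32': 105.13, 'i64': 103.61, 'u64': 105.04}
```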

device_exclusive_scan_max

Results
T{ct} Elements{io} i32 u32 u32/i32 i64 i64/i32 u64 u64/i32
I8 2^16 = 65536 8.91 9.367 105.13% 9.232 103.61% 9.359 105.04%
I8 2^20 = 1048576 13.494 13.368 99.07% 13.124 97.26% 13.187 97.72%
I8 2^24 = 16777216 53.134 52.893 99.55% 53.066 99.87% 53.063 99.87%
I8 2^28 = 268435456 729.843 729.989 100.02% 731.428 100.22% 731.704 100.25%
I16 2^16 = 65536 9.752 9.715 99.62% 9.792 100.41% 9.631 98.76%
I16 2^20 = 1048576 13.951 14.187 101.69% 14.069 100.85% 14.206 101.83%
I16 2^24 = 16777216 67.863 67.839 99.96% 67.01 98.74% 66.97 98.68%
I16 2^28 = 268435456 952.537 951.78 99.92% 978.596 102.74% 977.729 102.64%
I32 2^16 = 65536 9.769 9.261 94.80% 9.289 95.09% 9.176 93.93%
I32 2^20 = 1048576 15.022 14.511 96.60% 14.512 96.60% 14.564 96.95%
I32 2^24 = 16777216 99.151 98.738 99.58% 98.688 99.53% 98.659 99.50%
I32 2^28 = 268435456 1456 1456 100.00% 1455 99.93% 1455 99.93%
I64 2^16 = 65536 9.573 9.634 100.64% 9.478 99.01% 9.584 100.11%
I64 2^20 = 1048576 19.037 18.873 99.14% 19.032 99.97% 18.964 99.62%
I64 2^24 = 16777216 171.205 171.247 100.02% 171.164 99.98% 171.331 100.07%
I64 2^28 = 268435456 2631 2632 100.04% 2631 100.00% 2630 99.96%
I128 2^16 = 65536 16.374 16.465 100.56% 16.523 100.91% 16.542 101.03%
I128 2^20 = 1048576 65.597 65.4 99.70% 65.036 99.14% 65.177 99.36%
I128 2^24 = 16777216 868.064 868.039 100.00% 866.639 99.84% 867.766 99.97%
I128 2^28 = 268435456 13727 13724 99.98% 13715 99.91% 13716 99.92%
F32 2^16 = 65536 9.441 9.475 100.36% 9.38 99.35% 9.433 99.92%
F32 2^20 = 1048576 14.47 14.477 100.05% 14.584 100.79% 14.568 100.68%
F32 2^24 = 16777216 98.968 98.861 99.89% 98.991 100.02% 99.018 100.05%
F32 2^28 = 268435456 1459 1459 100.00% 1459 100.00% 1459 100.00%
F64 2^16 = 65536 9.767 9.579 98.08% 9.485 97.11% 9.711 99.43%
F64 2^20 = 1048576 18.928 18.857 99.62% 19.034 100.56% 18.935 100.04%
F64 2^24 = 16777216 173.845 173.919 100.04% 173.85 100.00% 173.954 100.06%
F64 2^28 = 268435456 2695 2695 100.00% 2695 100.00% 2694 99.96%
C64 2^16 = 65536 67.742 67.006 98.91% 66.853 98.69% 67.309 99.36%
C64 2^20 = 1048576 114.314 114.531 100.19% 115.019 100.62% 115.035 100.63%
C64 2^24 = 16777216 1165 1162 99.74% 1155 99.14% 1150 98.71%
C64 2^28 = 268435456 17419 17458 100.22% 17380 99.78% 17475 100.32%

device_exclusive_scan_sum

Results
T{ct} Elements{io} i32 u32 u32/i32 i64 i64/i32 u64 u64/i32
I8 2^16 = 65536 8.761 9.183 104.82% 9.104 103.92% 9.29 106.04%
I8 2^20 = 1048576 12.446 12.639 101.55% 12.35 99.23% 12.103 97.24%
I8 2^24 = 16777216 49.837 50.075 100.48% 52.084 104.51% 52.143 104.63%
I8 2^28 = 268435456 649.015 650.869 100.29% 685.588 105.64% 686.283 105.74%
I16 2^16 = 65536 9.739 9.557 98.13% 9.748 100.09% 9.58 98.37%
I16 2^20 = 1048576 13.246 13.363 100.88% 13.202 99.67% 13.321 100.57%
I16 2^24 = 16777216 60.795 60.863 100.11% 60.789 99.99% 60.878 100.14%
I16 2^28 = 268435456 799.135 798.496 99.92% 799.297 100.02% 799.511 100.05%
I32 2^16 = 65536 9.598 9.212 95.98% 9.342 97.33% 9.176 95.60%
I32 2^20 = 1048576 14.984 14.715 98.20% 14.78 98.64% 14.795 98.74%
I32 2^24 = 16777216 90.713 90.552 99.82% 90.508 99.77% 90.405 99.66%
I32 2^28 = 268435456 1366 1366 100.00% 1367 100.07% 1366 100.00%
I64 2^16 = 65536 9.714 9.792 100.80% 9.626 99.09% 9.938 102.31%
I64 2^20 = 1048576 17.97 17.932 99.79% 18.348 102.10% 18.371 102.23%
I64 2^24 = 16777216 160.284 160.186 99.94% 160.507 100.14% 160.582 100.19%
I64 2^28 = 268435456 2476 2475 99.96% 2476 100.00% 2476 100.00%
I128 2^16 = 65536 14.439 14.311 99.11% 14.591 101.05% 14.559 100.83%
I128 2^20 = 1048576 38.383 38.23 99.60% 38.186 99.49% 38.178 99.47%
I128 2^24 = 16777216 388.32 388.111 99.95% 388.071 99.94% 387.488 99.79%
I128 2^28 = 268435456 6036 6034 99.97% 6027 99.85% 6026 99.83%
F32 2^16 = 65536 9.63 9.58 99.48% 9.682 100.54% 9.694 100.66%
F32 2^20 = 1048576 14.878 14.867 99.93% 14.885 100.05% 14.976 100.66%
F32 2^24 = 16777216 90.628 90.576 99.94% 90.608 99.98% 90.615 99.99%
F32 2^28 = 268435456 1364 1364 100.00% 1363 99.93% 1363 99.93%
F64 2^16 = 65536 10.154 9.662 95.15% 9.764 96.16% 9.629 94.83%
F64 2^20 = 1048576 18.582 18.327 98.63% 18.171 97.79% 18.377 98.90%
F64 2^24 = 16777216 161.011 160.632 99.76% 160.657 99.78% 160.797 99.87%
F64 2^28 = 268435456 2479 2480 100.04% 2479 100.00% 2479 100.00%
C64 2^16 = 65536 12.738 12.733 99.96% 12.641 99.24% 12.891 101.20%
C64 2^20 = 1048576 29.051 28.972 99.73% 29.436 101.33% 29.412 101.24%
C64 2^24 = 16777216 291.342 291.891 100.19% 291.113 99.92% 291.2 99.95%
C64 2^28 = 268435456 4543 4550 100.15% 4530 99.71% 4530 99.71%

device_exclusive_scan_by_key

Results
KeyT{ct} ValueT{ct} Elements{io} i32 u32 u32/i32 i64 i64/i32 u64 u64/i32
I8 I8 2^16 = 65536 9.966 9.955 99.89% 10.134 101.69% 10.314 103.49%
I8 I8 2^20 = 1048576 14.122 14.316 101.37% 14.502 102.69% 14.51 102.75%
I8 I8 2^24 = 16777216 69.589 70.034 100.64% 75.626 108.68% 75.465 108.44%
I8 I8 2^28 = 268435456 943.499 947.544 100.43% 1050 111.29% 1047 110.97%
I8 I16 2^16 = 65536 11.669 11.72 100.44% 12.03 103.09% 12.053 103.29%
I8 I16 2^20 = 1048576 15.11 15.129 100.13% 15.401 101.93% 15.408 101.97%
I8 I16 2^24 = 16777216 87.262 80.269 91.99% 89.469 102.53% 89.615 102.70%
I8 I16 2^28 = 268435456 1176 1050 89.29% 1210 102.89% 1213 103.15%
I8 I32 2^16 = 65536 11.071 10.667 96.35% 11.084 100.12% 10.807 97.62%
I8 I32 2^20 = 1048576 17.406 17.274 99.24% 17.431 100.14% 17.272 99.23%
I8 I32 2^24 = 16777216 102.249 102.099 99.85% 111.93 109.47% 105.019 102.71%
I8 I32 2^28 = 268435456 1482 1482 100.00% 1594 107.56% 1517 102.36%
I8 I64 2^16 = 65536 10.801 10.853 100.48% 12.837 118.85% 12.802 118.53%
I8 I64 2^20 = 1048576 20.325 20.36 100.17% 27.543 135.51% 27.407 134.84%
I8 I64 2^24 = 16777216 175.416 175.46 100.03% 230.256 131.26% 228.794 130.43%
I8 I64 2^28 = 268435456 2628 2628 100.00% 3416 129.98% 3393 129.11%
I8 I128 2^16 = 65536 15.027 15.268 101.60% 15.411 102.56% 15.486 103.05%
I8 I128 2^20 = 1048576 39.536 39.466 99.82% 39.675 100.35% 39.816 100.71%
I8 I128 2^24 = 16777216 377.923 377.774 99.96% 379.113 100.31% 379.812 100.50%
I8 I128 2^28 = 268435456 5820 5822 100.03% 5841 100.36% 5854 100.58%
I16 I8 2^16 = 65536 10.173 10.126 99.54% 10.27 100.95% 10.328 101.52%
I16 I8 2^20 = 1048576 15.17 15.142 99.82% 15.316 100.96% 15.341 101.13%
I16 I8 2^24 = 16777216 76.868 76.437 99.44% 81.38 105.87% 81.35 105.83%
I16 I8 2^28 = 268435456 1034 1025 99.13% 1115 107.83% 1116 107.93%
I16 I16 2^16 = 65536 10.806 10.864 100.54% 10.857 100.47% 10.856 100.46%
I16 I16 2^20 = 1048576 15.546 15.759 101.37% 15.638 100.59% 15.722 101.13%
I16 I16 2^24 = 16777216 86.469 89.01 102.94% 89.666 103.70% 89.738 103.78%
I16 I16 2^28 = 268435456 1135 1162 102.38% 1183 104.23% 1183 104.23%
I16 I32 2^16 = 65536 10.925 11.328 103.69% 11.424 104.57% 11.362 104.00%
I16 I32 2^20 = 1048576 16.917 16.889 99.83% 17.317 102.36% 17.363 102.64%
I16 I32 2^24 = 16777216 107.751 107.719 99.97% 113.994 105.79% 113.756 105.57%
I16 I32 2^28 = 268435456 1563 1564 100.06% 1639 104.86% 1639 104.86%
I16 I64 2^16 = 65536 10.978 11.051 100.66% 12.78 116.41% 12.857 117.12%
I16 I64 2^20 = 1048576 20.974 20.938 99.83% 28.242 134.65% 28.125 134.09%
I16 I64 2^24 = 16777216 183.087 182.974 99.94% 233.706 127.65% 232.591 127.04%
I16 I64 2^28 = 268435456 2743 2743 100.00% 3458 126.07% 3438 125.34%
I16 I128 2^16 = 65536 15.678 15.674 99.97% 15.272 97.41% 15.507 98.91%
I16 I128 2^20 = 1048576 37.86 37.859 100.00% 40.635 107.33% 40.816 107.81%
I16 I128 2^24 = 16777216 359.871 360.242 100.10% 381.626 106.05% 382.443 106.27%
I16 I128 2^28 = 268435456 5497 5496 99.98% 5877 106.91% 5892 107.19%
I32 I8 2^16 = 65536 9.931 10.024 100.94% 10.258 103.29% 10.323 103.95%
I32 I8 2^20 = 1048576 15.61 15.709 100.63% 16.015 102.59% 16.033 102.71%
I32 I8 2^24 = 16777216 89.786 89.898 100.12% 93.634 104.29% 93.342 103.96%
I32 I8 2^28 = 268435456 1151 1152 100.09% 1228 106.69% 1222 106.17%
I32 I16 2^16 = 65536 10.724 10.8 100.71% 11.165 104.11% 10.934 101.96%
I32 I16 2^20 = 1048576 16.455 16.509 100.33% 16.848 102.39% 16.875 102.55%
I32 I16 2^24 = 16777216 100.294 100.195 99.90% 108.027 107.71% 108.048 107.73%
I32 I16 2^28 = 268435456 1401 1400 99.93% 1446 103.21% 1446 103.21%
I32 I32 2^16 = 65536 11.267 11.242 99.78% 11.423 101.38% 11.448 101.61%
I32 I32 2^20 = 1048576 18.348 18.48 100.72% 18.916 103.10% 18.586 101.30%
I32 I32 2^24 = 16777216 129.095 129.248 100.12% 136.07 105.40% 135.907 105.28%
I32 I32 2^28 = 268435456 1903 1904 100.05% 2044 107.41% 2043 107.36%
I32 I64 2^16 = 65536 11.749 11.85 100.86% 13.584 115.62% 13.575 115.54%
I32 I64 2^20 = 1048576 22.621 22.389 98.97% 27.752 122.68% 27.733 122.60%
I32 I64 2^24 = 16777216 197.436 197.411 99.99% 230.238 116.61% 230.167 116.58%
I32 I64 2^28 = 268435456 2937 2937 100.00% 3478 118.42% 3477 118.39%
I32 I128 2^16 = 65536 16.042 16.083 100.26% 15.575 97.09% 15.558 96.98%
I32 I128 2^20 = 1048576 38.876 39.092 100.56% 41.437 106.59% 41.631 107.09%
I32 I128 2^24 = 16777216 372.169 372.433 100.07% 395.638 106.31% 397.024 106.68%
I32 I128 2^28 = 268435456 5685 5687 100.04% 6082 106.98% 6101 107.32%
I64 I8 2^16 = 65536 10.476 10.437 99.63% 10.896 104.01% 10.877 103.83%
I64 I8 2^20 = 1048576 20.089 20.117 100.14% 20.423 101.66% 20.329 101.19%
I64 I8 2^24 = 16777216 128.927 129.206 100.22% 130.882 101.52% 131.046 101.64%
I64 I8 2^28 = 268435456 1731 1732 100.06% 1747 100.92% 1749 101.04%
I64 I16 2^16 = 65536 11.073 11.138 100.59% 11.432 103.24% 11.365 102.64%
I64 I16 2^20 = 1048576 21.17 21.208 100.18% 21.51 101.61% 22.357 105.61%
I64 I16 2^24 = 16777216 143.401 143.974 100.40% 144.905 101.05% 161.939 112.93%
I64 I16 2^28 = 268435456 2128 2132 100.19% 2135 100.33% 2299 108.04%
I64 I32 2^16 = 65536 11.304 11.222 99.27% 11.419 101.02% 11.289 99.87%
I64 I32 2^20 = 1048576 22.771 22.896 100.55% 22.949 100.78% 22.949 100.78%
I64 I32 2^24 = 16777216 170.41 170.411 100.00% 172.953 101.49% 172.9 101.46%
I64 I32 2^28 = 268435456 2528 2528 100.00% 2586 102.29% 2586 102.29%
I64 I64 2^16 = 65536 12.071 11.682 96.78% 13.468 111.57% 13.545 112.21%
I64 I64 2^20 = 1048576 26.026 25.483 97.91% 32.698 125.64% 32.799 126.02%
I64 I64 2^24 = 16777216 241.63 237.496 98.29% 274.646 113.66% 274.758 113.71%
I64 I64 2^28 = 268435456 3644 3571 98.00% 4188 114.93% 4188 114.93%
I64 I128 2^16 = 65536 15.811 15.962 100.96% 16.165 102.24% 16.484 104.26%
I64 I128 2^20 = 1048576 43.025 43.275 100.58% 44.472 103.36% 44.591 103.64%
I64 I128 2^24 = 16777216 421.234 422.094 100.20% 427.858 101.57% 427.764 101.55%
I64 I128 2^28 = 268435456 6493 6496 100.05% 6592 101.52% 6590 101.49%
I128 I8 2^16 = 65536 11.924 12.062 101.16% 12.055 101.10% 12.03 100.89%
I128 I8 2^20 = 1048576 28.097 28.258 100.57% 28.88 102.79% 29.01 103.25%
I128 I8 2^24 = 16777216 204.094 204.037 99.97% 223.889 109.70% 224.014 109.76%
I128 I8 2^28 = 268435456 3013 3011 99.93% 3273 108.63% 3273 108.63%
I128 I16 2^16 = 65536 11.774 11.791 100.14% 12.181 103.46% 12.259 104.12%
I128 I16 2^20 = 1048576 27.531 27.644 100.41% 30.256 109.90% 30.261 109.92%
I128 I16 2^24 = 16777216 220.072 220.154 100.04% 286.454 130.16% 286.554 130.21%
I128 I16 2^28 = 268435456 3264 3263 99.97% 4292 131.50% 4294 131.56%
I128 I32 2^16 = 65536 11.991 12.005 100.12% 12.303 102.60% 11.974 99.86%
I128 I32 2^20 = 1048576 29.531 29.567 100.12% 29.953 101.43% 29.941 101.39%
I128 I32 2^24 = 16777216 256.994 257.008 100.01% 257.249 100.10% 257.51 100.20%
I128 I32 2^28 = 268435456 3874 3875 100.03% 3877 100.08% 3882 100.21%
I128 I64 2^16 = 65536 12.453 12.52 100.54% 14.781 118.69% 14.398 115.62%
I128 I64 2^20 = 1048576 34.415 34.418 100.01% 41.487 120.55% 41.383 120.25%
I128 I64 2^24 = 16777216 321.329 321.324 100.00% 364.538 113.45% 364.233 113.35%
I128 I64 2^28 = 268435456 4919 4920 100.02% 5632 114.49% 5626 114.37%
I128 I128 2^16 = 65536 16.962 16.815 99.13% 17.058 100.57% 17.051 100.52%
I128 I128 2^20 = 1048576 49.078 48.766 99.36% 49.328 100.51% 49.362 100.58%
I128 I128 2^24 = 16777216 494.191 493.485 99.86% 495.389 100.24% 494.93 100.15%
I128 I128 2^28 = 268435456 7616 7609 99.91% 7631 100.20% 7628 100.16%
F32 I8 2^16 = 65536 10.01 10.018 100.08% 10.311 103.01% 10.226 102.16%
F32 I8 2^20 = 1048576 15.637 15.766 100.82% 15.987 102.24% 16.075 102.80%
F32 I8 2^24 = 16777216 90.437 90.606 100.19% 94.074 104.02% 93.835 103.76%
F32 I8 2^28 = 268435456 1160 1162 100.17% 1235 106.47% 1229 105.95%
F32 I16 2^16 = 65536 10.873 10.908 100.32% 11.113 102.21% 11.053 101.66%
F32 I16 2^20 = 1048576 16.481 16.829 102.11% 17.268 104.78% 17.231 104.55%
F32 I16 2^24 = 16777216 100.243 100.202 99.96% 118.08 117.79% 118.123 117.84%
F32 I16 2^28 = 268435456 1402 1400 99.86% 1561 111.34% 1562 111.41%
F32 I32 2^16 = 65536 11.224 11.193 99.72% 11.439 101.92% 11.449 102.00%
F32 I32 2^20 = 1048576 18.456 18.508 100.28% 18.907 102.44% 18.679 101.21%
F32 I32 2^24 = 16777216 129.388 129.414 100.02% 136.287 105.33% 136.065 105.16%
F32 I32 2^28 = 268435456 1906 1906 100.00% 2046 107.35% 2045 107.29%
F32 I64 2^16 = 65536 11.476 11.746 102.35% 13.369 116.50% 13.303 115.92%
F32 I64 2^20 = 1048576 22.167 22.174 100.03% 27.492 124.02% 27.588 124.46%
F32 I64 2^24 = 16777216 197.273 197.355 100.04% 230.205 116.69% 230.098 116.64%
F32 I64 2^28 = 268435456 2938 2939 100.03% 3481 118.48% 3479 118.41%
F32 I128 2^16 = 65536 15.786 15.551 98.51% 15.335 97.14% 15.209 96.34%
F32 I128 2^20 = 1048576 38.731 38.839 100.28% 41.108 106.14% 41.236 106.47%
F32 I128 2^24 = 16777216 372.851 372.969 100.03% 396.325 106.30% 396.85 106.44%
F32 I128 2^28 = 268435456 5694 5692 99.96% 6089 106.94% 6101 107.15%
F64 I8 2^16 = 65536 10.333 10.44 101.04% 10.601 102.59% 10.578 102.37%
F64 I8 2^20 = 1048576 19.716 20.031 101.60% 20.831 105.66% 20.091 101.90%
F64 I8 2^24 = 16777216 127.413 127.423 100.01% 131.051 102.86% 130.267 102.24%
F64 I8 2^28 = 268435456 1721 1721 100.00% 1734 100.76% 1731 100.58%
F64 I16 2^16 = 65536 10.895 10.911 100.15% 11.085 101.74% 11.121 102.07%
F64 I16 2^20 = 1048576 19.946 20.095 100.75% 22.012 110.36% 21.364 107.11%
F64 I16 2^24 = 16777216 137.948 137.941 99.99% 160.902 116.64% 144.365 104.65%
F64 I16 2^28 = 268435456 2035 2035 100.00% 2295 112.78% 2133 104.82%
F64 I32 2^16 = 65536 10.794 10.83 100.33% 11.078 102.63% 11.019 102.08%
F64 I32 2^20 = 1048576 22.374 22.355 99.92% 22.493 100.53% 22.509 100.60%
F64 I32 2^24 = 16777216 170.073 170.329 100.15% 172.606 101.49% 172.667 101.53%
F64 I32 2^28 = 268435456 2526 2525 99.96% 2583 102.26% 2584 102.30%
F64 I64 2^16 = 65536 11.606 11.342 97.73% 13.109 112.95% 13.707 118.10%
F64 I64 2^20 = 1048576 25.669 27.298 106.35% 32.124 125.15% 33.545 130.68%
F64 I64 2^24 = 16777216 241.189 244.218 101.26% 272.85 113.13% 279.738 115.98%
F64 I64 2^28 = 268435456 3641 3715 102.03% 4177 114.72% 4199 115.33%
F64 I128 2^16 = 65536 15.502 15.755 101.63% 16.087 103.77% 16.008 103.26%
F64 I128 2^20 = 1048576 42.598 43.033 101.02% 44.084 103.49% 44.292 103.98%
F64 I128 2^24 = 16777216 420.54 420.988 100.11% 426.709 101.47% 426.821 101.49%
F64 I128 2^28 = 268435456 6481 6486 100.08% 6578 101.50% 6576 101.47%
C64 I8 2^16 = 65536 10.632 10.696 100.60% 10.63 99.98% 10.733 100.95%
C64 I8 2^20 = 1048576 20.116 20.194 100.39% 21.653 107.64% 20.631 102.56%
C64 I8 2^24 = 16777216 129.099 128.937 99.87% 131.051 101.51% 130.131 100.80%
C64 I8 2^28 = 268435456 1731 1731 100.00% 1730 99.94% 1731 100.00%
C64 I16 2^16 = 65536 11.144 11.178 100.31% 11.437 102.63% 11.368 102.01%
C64 I16 2^20 = 1048576 20.543 20.479 99.69% 21.509 104.70% 22.413 109.10%
C64 I16 2^24 = 16777216 138.223 138.321 100.07% 144.949 104.87% 162.12 117.29%
C64 I16 2^28 = 268435456 2038 2038 100.00% 2137 104.86% 2301 112.90%
C64 I32 2^16 = 65536 11.346 11.251 99.16% 11.397 100.45% 11.383 100.33%
C64 I32 2^20 = 1048576 22.562 22.727 100.73% 22.861 101.33% 22.787 101.00%
C64 I32 2^24 = 16777216 170.436 170.499 100.04% 172.914 101.45% 172.916 101.46%
C64 I32 2^28 = 268435456 2528 2528 100.00% 2586 102.29% 2586 102.29%
C64 I64 2^16 = 65536 11.656 11.707 100.44% 13.545 116.21% 13.565 116.38%
C64 I64 2^20 = 1048576 25.454 25.372 99.68% 32.804 128.88% 33.026 129.75%
C64 I64 2^24 = 16777216 237.429 237.381 99.98% 276.199 116.33% 276.845 116.60%
C64 I64 2^28 = 268435456 3570 3571 100.03% 4194 117.48% 4200 117.65%
C64 I128 2^16 = 65536 15.752 15.865 100.72% 16.34 103.73% 15.994 101.54%
C64 I128 2^20 = 1048576 43.371 43.177 99.55% 44.374 102.31% 44.2 101.91%
C64 I128 2^24 = 16777216 421.998 421.845 99.96% 429.056 101.67% 427.997 101.42%
C64 I128 2^28 = 268435456

device_select_flagged

Results
T{ct} IsInPlace{ct} Elements{io} Entropy i32 u32 u32/i32 i64 i64/i32 u64 u64/i32
I8 FALSE 2^16 = 65536 1 9.995 10.418 104.23% 9.296 93.01% 9.905 99.10%
I8 FALSE 2^20 = 1048576 1 12.702 12.88 101.40% 13.763 108.35% 14.044 110.57%
I8 FALSE 2^24 = 16777216 1 60.62 60.536 99.86% 77.011 127.04% 77.395 127.67%
I8 FALSE 2^28 = 268435456 1 792.492 788.481 99.49% 1096 138.30% 1096 138.30%
I8 FALSE 2^16 = 65536 0.544 9.879 10.329 104.56% 9.201 93.14% 9.762 98.82%
I8 FALSE 2^20 = 1048576 0.544 12.48 12.797 102.54% 13.524 108.37% 13.894 111.33%
I8 FALSE 2^24 = 16777216 0.544 58.75 59.06 100.53% 75.243 128.07% 75.436 128.40%
I8 FALSE 2^28 = 268435456 0.544 764.061 763.908 99.98% 1063 139.13% 1063 139.13%
I8 FALSE 2^16 = 65536 0 9.628 10.18 105.73% 9.088 94.39% 9.159 95.13%
I8 FALSE 2^20 = 1048576 0 12.057 12.345 102.39% 13.783 114.32% 13.394 111.09%
I8 FALSE 2^24 = 16777216 0 52.13 51.971 99.69% 69.871 134.03% 69.613 133.54%
I8 FALSE 2^28 = 268435456 0 629.964 626.662 99.48% 950.144 150.83% 949.408 150.71%
I8 TRUE 2^16 = 65536 1 10.953 11.104 101.38% 10.385 94.81% 10.351 94.50%
I8 TRUE 2^20 = 1048576 1 13.73 13.715 99.89% 16.256 118.40% 15.897 115.78%
I8 TRUE 2^24 = 16777216 1 67.152 67.074 99.88% 103.498 154.12% 103.254 153.76%
I8 TRUE 2^28 = 268435456 1 916.669 910.642 99.34% 1531 167.02% 1531 167.02%
I8 TRUE 2^16 = 65536 0.544 11.06 10.677 96.54% 10.391 93.95% 10.142 91.70%
I8 TRUE 2^20 = 1048576 0.544 13.832 13.625 98.50% 16.04 115.96% 15.671 113.30%
I8 TRUE 2^24 = 16777216 0.544 65.984 65.799 99.72% 101.356 153.61% 101.077 153.18%
I8 TRUE 2^28 = 268435456 0.544 887.126 886.643 99.95% 1494 168.41% 1494 168.41%
I8 TRUE 2^16 = 65536 0 10.915 10.488 96.09% 10.091 92.45% 9.941 91.08%
I8 TRUE 2^20 = 1048576 0 13.441 13.252 98.59% 16.05 119.41% 15.69 116.73%
I8 TRUE 2^24 = 16777216 0 59.633 59.23 99.32% 95.516 160.17% 95.341 159.88%
I8 TRUE 2^28 = 268435456 0 757.158 754.197 99.61% 1380 182.26% 1380 182.26%
I16 FALSE 2^16 = 65536 1 10.589 10.877 102.72% 9.598 90.64% 9.529 89.99%
I16 FALSE 2^20 = 1048576 1 13.842 13.736 99.23% 14.541 105.05% 14.236 102.85%
I16 FALSE 2^24 = 16777216 1 77.725 77.586 99.82% 83.402 107.30% 83.178 107.02%
I16 FALSE 2^28 = 268435456 1 989.541 989.898 100.04% 1166 117.83% 1166 117.83%
I16 FALSE 2^16 = 65536 0.544 10.613 10.351 97.53% 9.685 91.26% 9.475 89.28%
I16 FALSE 2^20 = 1048576 0.544 14.054 13.832 98.42% 14.463 102.91% 14.13 100.54%
I16 FALSE 2^24 = 16777216 0.544 72.557 72.084 99.35% 80.977 111.60% 80.598 111.08%
I16 FALSE 2^28 = 268435456 0.544 907.356 905.404 99.78% 1133 124.87% 1132 124.76%
I16 FALSE 2^16 = 65536 0 10.129 9.929 98.03% 9.414 92.94% 9.196 90.79%
I16 FALSE 2^20 = 1048576 0 13.7 13.235 96.61% 14.139 103.20% 13.804 100.76%
I16 FALSE 2^24 = 16777216 0 64.863 64.404 99.29% 74.501 114.86% 74.266 114.50%
I16 FALSE 2^28 = 268435456 0 683.506 681.977 99.78% 984.721 144.07% 984.309 144.01%
I16 TRUE 2^16 = 65536 1 11.395 11.02 96.71% 10.361 90.93% 10.274 90.16%
I16 TRUE 2^20 = 1048576 1 15.223 15.025 98.70% 16.78 110.23% 16.405 107.76%
I16 TRUE 2^24 = 16777216 1 82.89 82.844 99.94% 110.122 132.85% 109.795 132.46%
I16 TRUE 2^28 = 268435456 1 1082 1084 100.18% 1620 149.72% 1619 149.63%
I16 TRUE 2^16 = 65536 0.544 11.329 11.059 97.62% 10.515 92.81% 10.201 90.04%
I16 TRUE 2^20 = 1048576 0.544 15.118 14.849 98.22% 16.625 109.97% 16.639 110.06%
I16 TRUE 2^24 = 16777216 0.544 78.322 77.925 99.49% 107.371 137.09% 107.354 137.07%
I16 TRUE 2^28 = 268435456 0.544 1021 1021 100.00% 1580 154.75% 1580 154.75%
I16 TRUE 2^16 = 65536 0 11.023 10.558 95.78% 10.025 90.95% 10.188 92.42%
I16 TRUE 2^20 = 1048576 0 14.565 14.311 98.26% 16.133 110.77% 16.424 112.76%
I16 TRUE 2^24 = 16777216 0 69.429 69.019 99.41% 100.17 144.28% 100.298 144.46%
I16 TRUE 2^28 = 268435456 0 820.309 819.437 99.89% 1420 173.11% 1420 173.11%
I32 FALSE 2^16 = 65536 1 10.112 10.135 100.23% 9.77 96.62% 9.695 95.88%
I32 FALSE 2^20 = 1048576 1 15.672 15.314 97.72% 16.033 102.30% 15.757 100.54%
I32 FALSE 2^24 = 16777216 1 104.585 104.221 99.65% 109.972 105.15% 109.806 104.99%
I32 FALSE 2^28 = 268435456 1 1516 1516 100.00% 1625 107.19% 1625 107.19%
I32 FALSE 2^16 = 65536 0.544 10.246 10.015 97.75% 9.732 94.98% 9.509 92.81%
I32 FALSE 2^20 = 1048576 0.544 15.771 15.206 96.42% 15.964 101.22% 15.502 98.29%
I32 FALSE 2^24 = 16777216 0.544 93.649 93.243 99.57% 98.324 104.99% 97.967 104.61%
I32 FALSE 2^28 = 268435456 0.544 1267 1267 100.00% 1358 107.18% 1358 107.18%
I32 FALSE 2^16 = 65536 0 9.66 9.541 98.77% 9.822 101.68% 9.787 101.31%
I32 FALSE 2^20 = 1048576 0 15.187 14.599 96.13% 15.234 100.31% 15.613 102.81%
I32 FALSE 2^24 = 16777216 0 78.986 78.558 99.46% 84.244 106.66% 84.49 106.97%
I32 FALSE 2^28 = 268435456 0 849.659 849.495 99.98% 1054 124.05% 1055 124.17%
I32 TRUE 2^16 = 65536 1 10.962 11.097 101.23% 10.425 95.10% 10.702 97.63%
I32 TRUE 2^20 = 1048576 1 17.145 17.019 99.27% 17.912 104.47% 18.234 106.35%
I32 TRUE 2^24 = 16777216 1 110.155 110.144 99.99% 130.047 118.06% 130.211 118.21%
I32 TRUE 2^28 = 268435456 1 1589 1589 100.00% 1906 119.95% 1907 120.01%
I32 TRUE 2^16 = 65536 0.544 10.684 11.006 103.01% 10.296 96.37% 10.333 96.71%
I32 TRUE 2^20 = 1048576 0.544 16.67 16.897 101.36% 17.654 105.90% 18.026 108.13%
I32 TRUE 2^24 = 16777216 0.544 99.444 99.745 100.30% 121.45 122.13% 121.81 122.49%
I32 TRUE 2^28 = 268435456 0.544 1330 1330 100.00% 1757 132.11% 1758 132.18%
I32 TRUE 2^16 = 65536 0 10.111 10.421 103.07% 10.255 101.42% 10.258 101.45%
I32 TRUE 2^20 = 1048576 0 15.683 16.06 102.40% 17.414 111.04% 17.773 113.33%
I32 TRUE 2^24 = 16777216 0 84.553 84.825 100.32% 108.812 128.69% 109.119 129.05%
I32 TRUE 2^28 = 268435456 0 966.425 966.334 99.99% 1498 155.00% 1499 155.11%
I64 FALSE 2^16 = 65536 1 10.445 10.184 97.50% 10.246 98.09% 10 95.74%
I64 FALSE 2^20 = 1048576 1 20.121 19.964 99.22% 20.442 101.60% 20.044 99.62%
I64 FALSE 2^24 = 16777216 1 180.796 181.002 100.11% 191.234 105.77% 191.101 105.70%
I64 FALSE 2^28 = 268435456 1 2744 2747 100.11% 2954 107.65% 2954 107.65%
I64 FALSE 2^16 = 65536 0.544 10.174 10.651 104.69% 10.187 100.13% 9.777 96.10%
I64 FALSE 2^20 = 1048576 0.544 19.672 19.795 100.63% 20.188 102.62% 19.834 100.82%
I64 FALSE 2^24 = 16777216 0.544 152.102 152.387 100.19% 161.845 106.41% 161.532 106.20%
I64 FALSE 2^28 = 268435456 0.544 2246 2246 100.00% 2419 107.70% 2419 107.70%
I64 FALSE 2^16 = 65536 0 9.368 9.909 105.77% 10.048 107.26% 9.707 103.62%
I64 FALSE 2^20 = 1048576 0 18.443 18.468 100.14% 19.652 106.56% 19.261 104.44%
I64 FALSE 2^24 = 16777216 0 112.768 112.864 100.09% 126.92 112.55% 126.679 112.34%
I64 FALSE 2^28 = 268435456 0 1384 1384 100.00% 1690 122.11% 1690 122.11%
I64 TRUE 2^16 = 65536 1 11.035 11.239 101.85% 11.371 103.04% 10.848 98.31%
I64 TRUE 2^20 = 1048576 1 21.012 21.313 101.43% 23.429 111.50% 23.181 110.32%
I64 TRUE 2^24 = 16777216 1 186.174 186.436 100.14% 219.016 117.64% 218.728 117.49%
I64 TRUE 2^28 = 268435456 1 2841 2841 100.00% 3324 117.00% 3326 117.07%
I64 TRUE 2^16 = 65536 0.544 11.062 11.354 102.64% 11.17 100.98% 10.738 97.07%
I64 TRUE 2^20 = 1048576 0.544 20.767 21.096 101.58% 23.228 111.85% 23.295 112.17%
I64 TRUE 2^24 = 16777216 0.544 158.215 158.444 100.14% 195.572 123.61% 195.6 123.63%
I64 TRUE 2^28 = 268435456 0.544 2355 2354 99.96% 2908 123.48% 2908 123.48%
I64 TRUE 2^16 = 65536 0 10.386 10.692 102.95% 10.847 104.44% 11.128 107.14%
I64 TRUE 2^20 = 1048576 0 19.307 19.884 102.99% 22.178 114.87% 22.674 117.44%
I64 TRUE 2^24 = 16777216 0 120.013 120.452 100.37% 168.491 140.39% 168.888 140.72%
I64 TRUE 2^28 = 268435456 0 1502 1502 100.00% 2428 161.65% 2428 161.65%
I128 FALSE 2^16 = 65536 1 10.824 10.612 98.04% 10.96 101.26% 11.33 104.67%
I128 FALSE 2^20 = 1048576 1 30.471 30.217 99.17% 33.423 109.69% 33.555 110.12%
I128 FALSE 2^24 = 16777216 1 346.419 345.94 99.86% 381.333 110.08% 380.909 109.96%
I128 FALSE 2^28 = 268435456 1 5405 5406 100.02% 6023 111.43% 6007 111.14%
I128 FALSE 2^16 = 65536 0.544 10.782 10.672 98.98% 10.965 101.70% 11.478 106.46%
I128 FALSE 2^20 = 1048576 0.544 27.85 27.636 99.23% 30.877 110.87% 31.098 111.66%
I128 FALSE 2^24 = 16777216 0.544 278.74 279.077 100.12% 315.154 113.06% 314.729 112.91%
I128 FALSE 2^28 = 268435456 0.544 4275 4280 100.12% 4862 113.73% 4862 113.73%
I128 FALSE 2^16 = 65536 0 10.473 10.198 97.37% 10.741 102.56% 10.989 104.93%
I128 FALSE 2^20 = 1048576 0 25.592 25.364 99.11% 28.565 111.62% 28.4 110.97%
I128 FALSE 2^24 = 16777216 0 184.311 185.224 100.50% 253.648 137.62% 253.715 137.66%
I128 FALSE 2^28 = 268435456 0 2589 2592 100.12% 3791 146.43% 3791 146.43%
I128 TRUE 2^16 = 65536 1 11.714 11.472 97.93% 12.611 107.66% 12.351 105.44%
I128 TRUE 2^20 = 1048576 1 31.781 31.491 99.09% 39.235 123.45% 38.88 122.34%
I128 TRUE 2^24 = 16777216 1 351.823 352.084 100.07% 458.808 130.41% 458.459 130.31%
I128 TRUE 2^28 = 268435456 1 5481 5493 100.22% 7163 130.69% 7165 130.72%
I128 TRUE 2^16 = 65536 0.544 11.422 11.209 98.14% 12.591 110.23% 12.335 107.99%
I128 TRUE 2^20 = 1048576 0.544 29.359 29.384 100.09% 37.618 128.13% 37.19 126.67%
I128 TRUE 2^24 = 16777216 0.544 290.386 290.97 100.20% 418.685 144.18% 418.244 144.03%
I128 TRUE 2^28 = 268435456 0.544 4481 4490 100.20% 6522 145.55% 6522 145.55%
I128 TRUE 2^16 = 65536 0 11.003 10.779 97.96% 12.596 114.48% 12.276 111.57%
I128 TRUE 2^20 = 1048576 0 27.196 27.277 100.30% 35.527 130.63% 35.246 129.60%
I128 TRUE 2^24 = 16777216 0 206.855 208.214 100.66% 366.605 177.23% 366.4 177.13%
I128 TRUE 2^28 = 268435456 0 2949 2971 100.75% 5634 191.05% 5640 191.25%
F32 FALSE 2^16 = 65536 1 10.093 10.377 102.81% 9.478 93.91% 9.986 98.94%
F32 FALSE 2^20 = 1048576 1 15.172 15.534 102.39% 15.61 102.89% 16.104 106.14%
F32 FALSE 2^24 = 16777216 1 104.098 104.33 100.22% 109.627 105.31% 109.917 105.59%
F32 FALSE 2^28 = 268435456 1 1513 1513 100.00% 1622 107.20% 1624 107.34%
F32 FALSE 2^16 = 65536 0.544 10.33 10.279 99.51% 9.313 90.15% 9.881 95.65%
F32 FALSE 2^20 = 1048576 0.544 15.462 15.674 101.37% 15.461 99.99% 15.742 101.81%
F32 FALSE 2^24 = 16777216 0.544 93.507 93.616 100.12% 97.712 104.50% 98.192 105.01%
F32 FALSE 2^28 = 268435456 0.544 1265 1265 100.00% 1355 107.11% 1356 107.19%
F32 FALSE 2^16 = 65536 0 9.931 9.62 96.87% 9.194 92.58% 9.653 97.20%
F32 FALSE 2^20 = 1048576 0 14.843 14.999 101.05% 15.156 102.11% 15.63 105.30%
F32 FALSE 2^24 = 16777216 0 78.724 78.722 100.00% 84.214 106.97% 84.52 107.36%
F32 FALSE 2^28 = 268435456 0 848.093 848.322 100.03% 1053 124.16% 1051 123.93%
F32 TRUE 2^16 = 65536 1 11.144 10.866 97.51% 10.397 93.30% 10.753 96.49%
F32 TRUE 2^20 = 1048576 1 16.704 16.86 100.93% 17.822 106.69% 18.232 109.15%
F32 TRUE 2^24 = 16777216 1 109.849 109.861 100.01% 129.661 118.04% 129.98 118.33%
F32 TRUE 2^28 = 268435456 1 1587 1587 100.00% 1901 119.79% 1903 119.91%
F32 TRUE 2^16 = 65536 0.544 11.007 10.878 98.83% 10.224 92.89% 10.345 93.99%
F32 TRUE 2^20 = 1048576 0.544 16.655 16.952 101.78% 17.595 105.64% 17.68 106.15%
F32 TRUE 2^24 = 16777216 0.544 99.531 99.614 100.08% 121.195 121.77% 121.47 122.04%
F32 TRUE 2^28 = 268435456 0.544 1328 1328 100.00% 1747 131.55% 1753 132.00%
F32 TRUE 2^16 = 65536 0 10.582 10.381 98.10% 10.442 98.68% 9.941 93.94%
F32 TRUE 2^20 = 1048576 0 15.909 16.015 100.67% 17.738 111.50% 17.384 109.27%
F32 TRUE 2^24 = 16777216 0 84.593 84.606 100.02% 108.872 128.70% 108.731 128.53%
F32 TRUE 2^28 = 268435456 0 964.432 964.378 99.99% 1487 154.18% 1493 154.81%
F64 FALSE 2^16 = 65536 1 10.024 10.402 103.77% 9.931 99.07% 10.299 102.74%
F64 FALSE 2^20 = 1048576 1 19.88 20.085 101.03% 20.03 100.75% 20.243 101.83%
F64 FALSE 2^24 = 16777216 1 180.422 181.02 100.33% 190.75 105.72% 190.869 105.79%
F64 FALSE 2^28 = 268435456 1 2744 2748 100.15% 2955 107.69% 2950 107.51%
F64 FALSE 2^16 = 65536 0.544 10.084 10.477 103.90% 9.967 98.84% 10.124 100.40%
F64 FALSE 2^20 = 1048576 0.544 19.411 20.162 103.87% 19.829 102.15% 20.06 103.34%
F64 FALSE 2^24 = 16777216 0.544 151.986 152.496 100.34% 161.334 106.15% 161.352 106.16%
F64 FALSE 2^28 = 268435456 0.544 2246 2246 100.00% 2416 107.57% 2413 107.44%
F64 FALSE 2^16 = 65536 0 9.59 9.659 100.72% 9.887 103.10% 9.709 101.24%
F64 FALSE 2^20 = 1048576 0 18.164 18.76 103.28% 19.428 106.96% 19.117 105.25%
F64 FALSE 2^24 = 16777216 0 112.634 113.08 100.40% 126.29 112.12% 126.039 111.90%
F64 FALSE 2^28 = 268435456 0 1383 1383 100.00% 1682 121.62% 1680 121.48%
F64 TRUE 2^16 = 65536 1 11.058 10.96 99.11% 11.132 100.67% 10.849 98.11%
F64 TRUE 2^20 = 1048576 1 21.084 21.173 100.42% 23.453 111.24% 23.031 109.23%
F64 TRUE 2^24 = 16777216 1 186.077 186.193 100.06% 218.024 117.17% 217.733 117.01%
F64 TRUE 2^28 = 268435456 1 2842 2840 99.93% 3313 116.57% 3313 116.57%
F64 TRUE 2^16 = 65536 0.544 11.339 10.842 95.62% 11.038 97.35% 10.729 94.62%
F64 TRUE 2^20 = 1048576 0.544 21.317 20.955 98.30% 23.042 108.09% 22.786 106.89%
F64 TRUE 2^24 = 16777216 0.544 158.347 158.071 99.83% 194.517 122.84% 194.228 122.66%
F64 TRUE 2^28 = 268435456 0.544 2353 2352 99.96% 2891 122.86% 2891 122.86%
F64 TRUE 2^16 = 65536 0 10.407 10.11 97.15% 11.129 106.94% 10.583 101.69%
F64 TRUE 2^20 = 1048576 0 19.943 19.746 99.01% 22.51 112.87% 22.179 111.21%
F64 TRUE 2^24 = 16777216 0 120.405 120.086 99.74% 167.749 139.32% 167.342 138.98%
F64 TRUE 2^28 = 268435456 0 1500 1500 100.00% 2410 160.67% 2410 160.67%
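The select tables sweep an Entropy column (1, 0.544, 0). Those values match the binary entropy of a bit that is 1 with probability 2^-k, which is what AND-ing k independent uniformly random words would produce; that generation scheme is an assumption here, not something stated in this issue. A minimal sketch of the arithmetic (`and_entropy` is a hypothetical name):

```python
import math


def and_entropy(k):
    """Hypothetical helper: entropy in bits of one bit formed by AND-ing k fair bits."""
    if k < 1:
        raise ValueError("k must be at least 1")
    p = 2.0 ** -k  # probability that all k bits are 1
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


for k in range(1, 6):
    print(k, round(and_entropy(k), 3))
```

Under this assumption k = 1 gives 1.0 and k = 3 gives the 0.544 used above; entropy 0 corresponds to a constant input (the limit as k grows), which the formula does not evaluate directly because log2(0) is undefined.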

device_select_if

Results
T{ct} IsInPlace{ct} Elements{io} Entropy i32 u32 u32/i32 i64 i64/i32 u64 u64/i32
I8 FALSE 2^16 = 65536 1 9.593 9.286 96.80% 9.412 98.11% 9.229 96.21%
I8 FALSE 2^20 = 1048576 1 12.372 11.839 95.69% 13.272 107.27% 12.927 104.49%
I8 FALSE 2^24 = 16777216 1 51.753 50.355 97.30% 71.368 137.90% 71.033 137.25%
I8 FALSE 2^28 = 268435456 1 694.494 674.973 97.19% 1038 149.46% 1037 149.32%
I8 FALSE 2^16 = 65536 0.544 9.579 9.489 99.06% 9.539 99.58% 9.426 98.40%
I8 FALSE 2^20 = 1048576 0.544 12.117 11.601 95.74% 12.919 106.62% 12.699 104.80%
I8 FALSE 2^24 = 16777216 0.544 48.466 47.774 98.57% 68.568 141.48% 68.269 140.86%
I8 FALSE 2^28 = 268435456 0.544 639.299 633.134 99.04% 987.549 154.47% 987.764 154.51%
I8 FALSE 2^16 = 65536 0 9.351 9.385 100.36% 9.409 100.62% 9.796 104.76%
I8 FALSE 2^20 = 1048576 0 11.646 11.431 98.15% 12.485 107.20% 12.878 110.58%
I8 FALSE 2^24 = 16777216 0 42.488 42.102 99.09% 63.775 150.10% 64.159 151.00%
I8 FALSE 2^28 = 268435456 0 526.733 524.933 99.66% 888.127 168.61% 888.236 168.63%
I8 TRUE 2^16 = 65536 1 10.381 10.382 100.01% 9.85 94.88% 10.322 99.43%
I8 TRUE 2^20 = 1048576 1 13.477 13.688 101.57% 15.215 112.90% 15.423 114.44%
I8 TRUE 2^24 = 16777216 1 61.16 59.945 98.01% 96.48 157.75% 96.627 157.99%
I8 TRUE 2^28 = 268435456 1 857.119 839.771 97.98% 1472 171.74% 1472 171.74%
I8 TRUE 2^16 = 65536 0.544 10.29 10.334 100.43% 9.981 97.00% 10.312 100.21%
I8 TRUE 2^20 = 1048576 0.544 12.947 13.39 103.42% 15.046 116.21% 15.182 117.26%
I8 TRUE 2^24 = 16777216 0.544 57.827 57.808 99.97% 93.575 161.82% 94.154 162.82%
I8 TRUE 2^28 = 268435456 0.544 802.315 798.158 99.48% 1417 176.61% 1417 176.61%
I8 TRUE 2^16 = 65536 0 10.051 10.256 102.04% 9.943 98.93% 10.503 104.50%
I8 TRUE 2^20 = 1048576 0 12.533 13.033 103.99% 14.786 117.98% 15.076 120.29%
I8 TRUE 2^24 = 16777216 0 52.096 52.376 100.54% 89.2 171.22% 89.499 171.80%
I8 TRUE 2^28 = 268435456 0 693.302 692.789 99.93% 1317 189.96% 1317 189.96%
I16 FALSE 2^16 = 65536 1 10.11 9.802 96.95% 9.41 93.08% 9.639 95.34%
I16 FALSE 2^20 = 1048576 1 13.546 13.262 97.90% 13.727 101.34% 13.61 100.47%
I16 FALSE 2^24 = 16777216 1 63.446 63.064 99.40% 77.355 121.92% 77.197 121.67%
I16 FALSE 2^28 = 268435456 1 883.8 875.282 99.04% 1109 125.48% 1109 125.48%
I16 FALSE 2^16 = 65536 0.544 9.468 10.04 106.04% 9.377 99.04% 9.528 100.63%
I16 FALSE 2^20 = 1048576 0.544 12.542 12.99 103.57% 13.45 107.24% 13.549 108.03%
I16 FALSE 2^24 = 16777216 0.544 56.02 56.201 100.32% 74.915 133.73% 75.007 133.89%
I16 FALSE 2^28 = 268435456 0.544 704.52 699.244 99.25% 1071 152.02% 1071 152.02%
I16 FALSE 2^16 = 65536 0 9.199 9.548 103.79% 9.205 100.07% 9.386
@elstehle
Collaborator

elstehle commented May 30, 2024

Priority-wise, I think we should focus on the algorithms that do not yet support a large number of items (see overview), that is:

  • device_select.cuh
  • device_partition.cuh
  • device_scan.cuh
  • device_segmented_sort.cuh
  • device_segmented_radix_sort.cuh
  • device_run_length_encode.cuh
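The "large number of items" limit comes down to the largest count each candidate offset type can hold. A rough sketch using a hypothetical helper, `smallest_offset_type`: it only checks that `num_items` itself fits, and ignores any intermediate values an algorithm might compute that could overflow earlier.

```python
# Largest value of each candidate offset type (two's complement for the signed ones)
OFFSET_MAX = {
    "i32": 2**31 - 1,
    "u32": 2**32 - 1,
    "i64": 2**63 - 1,
    "u64": 2**64 - 1,
}


def smallest_offset_type(num_items):
    """Hypothetical helper: narrowest candidate offset type that can represent num_items."""
    if num_items < 0:
        raise ValueError("num_items must be non-negative")
    for name in ("i32", "u32", "i64", "u64"):  # narrowest first
        if num_items <= OFFSET_MAX[name]:
            return name
    raise OverflowError(f"{num_items} items exceed every candidate offset type")


print(smallest_offset_type(2**28))  # largest input size in the tables above
print(smallest_offset_type(2**31))
print(smallest_offset_type(2**33))
```

Every input size in the tables above tops out at 2^28 = 268435456, which already fits in i32, so those runs measure the cost of carrying a wider offset type rather than of processing more items than i32 allows.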

Labels: None yet
Projects: Archived in project
Development: No branches or pull requests
Participants: 2