Add dispatcher that returns function pointer #44

Draft
fmcgg wants to merge 1 commit into master

Conversation

fmcgg commented Sep 14, 2024

I found this functionality useful when building a noise library and want to know if you would consider adding something like it. The use case: I have many noise functions, each with its own #[multiversion] attribute. They are composed at runtime and called in series to fill an array with noise. Going through the dispatcher for every call adds too much overhead, so instead I want to call the dispatcher once at construction time, get the function pointer, and skip dispatch while computing.

No intention to merge this as-is; it was just easier to show through a PR.
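
Roughly the pattern I'm after, using the dispatcher = "pointer" form from this PR (NoiseNode and the function names here are purely illustrative, not part of the PR; the returned pointer type is a guess based on how pointer_add() is used in the benchmark below):

use multiversion::multiversion;

// Illustrative noise function; the real ones are SIMD implementations,
// each with its own #[multiversion] attribute.
#[multiversion(targets = "simd", dispatcher = "pointer")]
pub fn value_noise(x: f32) -> f32 {
    x * 0.5
}

pub struct NoiseNode {
    // Resolved once at construction, so per-call dispatch is skipped.
    // Assuming the returned pointer is an unsafe fn, as the unsafe call
    // in the benchmark suggests.
    func: unsafe fn(f32) -> f32,
}

impl NoiseNode {
    pub fn new() -> Self {
        // With dispatcher = "pointer", calling the item with no arguments
        // returns the function pointer for the selected target.
        Self { func: value_noise() }
    }

    pub fn sample(&self, x: f32) -> f32 {
        unsafe { (self.func)(x) }
    }
}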

calebzulawski (Owner) commented

I could see something like this being useful. The only dispatcher overhead compared to this approach, however, should be an atomic load. Is there really that much overhead?
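
For reference, the per-call cost I mean is roughly the pattern below. This is a hand-written approximation, not the macro's actual expansion: a relaxed atomic load of a function pointer followed by an indirect call, with the first call going through feature detection.

use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

fn scalar_impl(x: f32) -> f32 {
    x + 1.0
}

// The selected implementation, filled in on first use.
static SELECTED: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());

pub fn dispatched(x: f32) -> f32 {
    let mut p = SELECTED.load(Ordering::Relaxed);
    if p.is_null() {
        // First call: pick an implementation (feature detection elided).
        p = scalar_impl as *mut ();
        SELECTED.store(p, Ordering::Relaxed);
    }
    // Every call pays the atomic load above plus this indirect call.
    let f: fn(f32) -> f32 = unsafe { std::mem::transmute(p) };
    f(x)
}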

fmcgg (Author) commented Sep 19, 2024

#![feature(portable_simd)]

use std::simd::prelude::*;

use multiversion::multiversion;

#[multiversion(targets = "simd", dispatcher = "indirect")]
pub fn indirect_add(res: Simd<f32, 8>) -> Simd<f32, 8> {
    res + Simd::splat(1.0)
}

// Every call to indirect_add goes through the dispatcher (an atomic load).
#[multiversion(targets = "simd")]
pub fn benchmark_indirect() {
    let mut res = Simd::<f32, 8>::splat(0.0);
    for _ in 0..1000 {
        res = indirect_add(res);
        res = indirect_add(res);
        res = indirect_add(res);
        res = indirect_add(res);
        res = indirect_add(res);
        res = indirect_add(res);
        res = indirect_add(res);
        res = indirect_add(res);
        res = indirect_add(res);
    }
}

#[multiversion(targets = "simd", dispatcher = "pointer")]
pub fn pointer_add(res: Simd<f32, 8>) -> Simd<f32, 8> {
    res + Simd::splat(1.0)
}

#[multiversion(targets = "simd")]
pub fn benchmark_pointer() {
    // Resolve the dispatcher once and reuse the returned function pointer.
    let add_function = pointer_add();
    let mut res = Simd::<f32, 8>::splat(0.0);
    for _ in 0..1000 {
        unsafe {
            res = (add_function)(res);
            res = (add_function)(res);
            res = (add_function)(res);
            res = (add_function)(res);
            res = (add_function)(res);
            res = (add_function)(res);
            res = (add_function)(res);
            res = (add_function)(res);
            res = (add_function)(res);
        }
    }
}
indirect/lib            time:   [36.351 µs 36.420 µs 36.494 µs]
                        change: [-3.5538% -3.0421% -2.4988%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

pointer/lib             time:   [23.154 µs 23.259 µs 23.360 µs]
                        change: [-2.4580% -1.5319% -0.1121%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Does this look correct? I don't know enough to say why, but I assumed it had to do with the atomic load needing to reach the closest shared cache on each call. The gap grows the longer the chain of calls is, but it disappears if you reduce the loop body to a single call, so the compiler may be doing something different there.

fmcgg (Author) commented Sep 19, 2024

Another concern I ran into again while making the benchmark: the indirect dispatcher doesn't allow const generics. I probably removed that restriction without knowing why it was there, but it made my function interfaces much nicer, I think.
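
For context, this is the kind of interface I mean. It is hypothetical: it assumes the dispatcher = "pointer" form from this PR and that the macro accepts const generics with it; add_one is just an example name.

#![feature(portable_simd)]

use std::simd::{LaneCount, Simd, SupportedLaneCount};

use multiversion::multiversion;

// One definition covers every supported lane count N. The indirect
// dispatcher presumably can't express this because it needs a static
// function pointer behind the scenes and statics can't be generic,
// whereas a returned function pointer is monomorphized per N.
#[multiversion(targets = "simd", dispatcher = "pointer")]
pub fn add_one<const N: usize>(res: Simd<f32, N>) -> Simd<f32, N>
where
    LaneCount<N>: SupportedLaneCount,
{
    res + Simd::splat(1.0)
}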
