Skip to content

Commit

Permalink
update readme
Browse files Browse the repository at this point in the history
  • Loading branch information
qddyy committed Jan 1, 2025
1 parent 2fec513 commit b08c987
Show file tree
Hide file tree
Showing 29 changed files with 304 additions and 161 deletions.
88 changes: 67 additions & 21 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,8 @@ options(
asciicast_knitr_svg = TRUE,
asciicast_padding_y = 0,
asciicast_start_wait = 0,
asciicast_end_wait = 1
asciicast_end_wait = 1,
asciicast_timeout = Inf
)
```

Expand Down Expand Up @@ -58,7 +59,6 @@ remotes::install_github("qddyy/LearnNonparam")

```{r, library, eval = FALSE}
library(LearnNonparam)
options(LearnNonparam.pmt_progress = TRUE)
```

- Construct a test object
Expand Down Expand Up @@ -130,34 +130,80 @@ LearnNonparam::pmts()
```
</details>

`define_pmt` allows users to define new permutation tests. Take the two-sample Cramér-Von Mises test as an example:
## Extending

```{asciicast, define}
t <- define_pmt(
`define_pmt` allows users to define new permutation tests. Take the two-sample Wilcoxon test as an example:

```{asciicast, define_r}
t_custom <- define_pmt(
# this is a two-sample permutation test
inherit = "twosample",
statistic = function(x, y) {
# (optional) pre-calculate certain constants that remain invariant during permutation
n_x <- length(x)
n_y <- length(y)
F_x <- seq_len(n_x) / n_x
G_y <- seq_len(n_y) / n_y
m <- length(x)
n <- length(y)
# return a closure to calculate the test statistic
function(x, y) {
x <- sort.int(x)
y <- sort.int(y)
F <- approxfun(x, F_x, "constant", 0, 1)
G <- approxfun(y, G_y, "constant", 0, 1)
sum(c(F_x - G(x), G_y - F(y))^2)
}
function(x, y) sum(x) / m - sum(y) / n
},
# reject the null hypothesis when the test statistic is large
rejection = "r",
name = "Two-Sample Cramér-Von Mises Test",
alternative = "samples are from different continuous distributions"
# reject the null hypothesis when the test statistic is too large or too small
rejection = "lr", n_permu = 1e5
)
```

Also, the statistic can be written in C++. Leveraging Rcpp sugars and C++14 features, only minor modifications are needed to make it compatible with C++ syntax.

```{asciicast, define_cpp}
t_cpp <- define_pmt(
inherit = "twosample", rejection = "lr", n_permu = 1e5,
statistic = "[](const auto& x, const auto& y) {
auto m = x.length();
auto n = y.length();
return [=](const auto& x, const auto& y) {
return sum(x) / m - sum(y) / n;
};
}"
)
```

It's easy to check that `t_custom` and `t_cpp` are equivalent:

```{asciicast, prepare_data}
x <- rnorm(10, mean = 0)
y <- rnorm(10, mean = 5)
```

```{asciicast, t_custom_res}
set.seed(0)
t_custom$test(x, y)$print()
```

```{asciicast, t_cpp_res}
set.seed(0)
t_cpp$test(x, y)$print()
```

## Performance

[coin](https://CRAN.R-project.org/package=coin) is a commonly used R package for performing permutation tests. Below is a benchmark:

```{asciicast, benchmark}
library(coin)
data <- c(x, y)
group <- factor(c(rep("x", length(x)), rep("y", length(y))))
options(LearnNonparam.pmt_progress = FALSE)
benchmark <- microbenchmark::microbenchmark(
R = t_custom$test(x, y),
Rcpp = t_cpp$test(x, y),
coin = wilcox_test(data ~ group, distribution = approximate(nresample = 1e5, parallel = "no"))
)
```

t$test(rnorm(10), runif(10))$print()
```{asciicast, benchmark_res}
benchmark
```

It can be seen that C++ brings significantly better performance than pure R, even surpassing the `coin` package (under sequential execution). However, all tests in this package are currently written in R with no plans for migration to C++ in the future. This is because the primary goal of this package is not to maximize performance but to offer a flexible framework for permutation tests.

## References
125 changes: 104 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,6 @@ remotes::install_github("qddyy/LearnNonparam")

``` r
library(LearnNonparam)
options(LearnNonparam.pmt_progress = TRUE)
```

- Construct a test object
Expand Down Expand Up @@ -162,42 +161,126 @@ See <code>pmts()</code> for tests implemented in this package.

</details>

## Extending

`define_pmt` allows users to define new permutation tests. Take the
two-sample Cramér-Von Mises test as an example:
two-sample Wilcoxon test as an example:

``` r
t <- define_pmt(
t_custom <- define_pmt(
# this is a two-sample permutation test
inherit = "twosample",
statistic = function(x, y) {
# (optional) pre-calculate certain constants that remain invariant during permutation
n_x <- length(x)
n_y <- length(y)
F_x <- seq_len(n_x) / n_x
G_y <- seq_len(n_y) / n_y
m <- length(x)
n <- length(y)
# return a closure to calculate the test statistic
function(x, y) {
x <- sort.int(x)
y <- sort.int(y)
F <- approxfun(x, F_x, "constant", 0, 1)
G <- approxfun(y, G_y, "constant", 0, 1)
sum(c(F_x - G(x), G_y - F(y))^2)
}
function(x, y) sum(x) / m - sum(y) / n
},
# reject the null hypothesis when the test statistic is large
rejection = "r",
name = "Two-Sample Cramér-Von Mises Test",
alternative = "samples are from different continuous distributions"
# reject the null hypothesis when the test statistic is too large or too small
rejection = "lr", n_permu = 1e5
)
```

<picture>
<source media="(prefers-color-scheme: dark)" srcset="man/figures/README/define_r-dark.svg">
<img src="man/figures/README/define_r.svg" width="100%" style="display: block; margin: auto;" />
</picture>

Also, the statistic can be written in C++. Leveraging Rcpp sugars and
C++14 features, only minor modifications are needed to make it
compatible with C++ syntax.

t$test(rnorm(10), runif(10))$print()
``` r
t_cpp <- define_pmt(
inherit = "twosample", rejection = "lr", n_permu = 1e5,
statistic = "[](const auto& x, const auto& y) {
auto m = x.length();
auto n = y.length();
return [=](const auto& x, const auto& y) {
return sum(x) / m - sum(y) / n;
};
}"
)
```

<picture>
<source media="(prefers-color-scheme: dark)" srcset="man/figures/README/define-dark.svg">
<img src="man/figures/README/define.svg" width="100%" style="display: block; margin: auto;" />
<source media="(prefers-color-scheme: dark)" srcset="man/figures/README/define_cpp-dark.svg">
<img src="man/figures/README/define_cpp.svg" width="100%" style="display: block; margin: auto;" />
</picture>

It’s easy to check that `t_custom` and `t_cpp` are equivalent:

``` r
x <- rnorm(10, mean = 0)
y <- rnorm(10, mean = 5)
```

<picture>
<source media="(prefers-color-scheme: dark)" srcset="man/figures/README/prepare_data-dark.svg">
<img src="man/figures/README/prepare_data.svg" width="100%" style="display: block; margin: auto;" />
</picture>

``` r
set.seed(0)
t_custom$test(x, y)$print()
```

<picture>
<source media="(prefers-color-scheme: dark)" srcset="man/figures/README/t_custom_res-dark.svg">
<img src="man/figures/README/t_custom_res.svg" width="100%" style="display: block; margin: auto;" />
</picture>

``` r
set.seed(0)
t_cpp$test(x, y)$print()
```

<picture>
<source media="(prefers-color-scheme: dark)" srcset="man/figures/README/t_cpp_res-dark.svg">
<img src="man/figures/README/t_cpp_res.svg" width="100%" style="display: block; margin: auto;" />
</picture>

## Performance

[coin](https://CRAN.R-project.org/package=coin) is a commonly used R
package for performing permutation tests. Below is a benchmark:

``` r
library(coin)

data <- c(x, y)
group <- factor(c(rep("x", length(x)), rep("y", length(y))))

options(LearnNonparam.pmt_progress = FALSE)
benchmark <- microbenchmark::microbenchmark(
R = t_custom$test(x, y),
Rcpp = t_cpp$test(x, y),
coin = wilcox_test(data ~ group, distribution = approximate(nresample = 1e5, parallel = "no"))
)
```

<picture>
<source media="(prefers-color-scheme: dark)" srcset="man/figures/README/benchmark-dark.svg">
<img src="man/figures/README/benchmark.svg" width="100%" style="display: block; margin: auto;" />
</picture>

``` r
benchmark
```

<picture>
<source media="(prefers-color-scheme: dark)" srcset="man/figures/README/benchmark_res-dark.svg">
<img src="man/figures/README/benchmark_res.svg" width="100%" style="display: block; margin: auto;" />
</picture>

It can be seen that C++ brings significantly better performance than
pure R, even surpassing the `coin` package (under sequential execution).
However, all tests in this package are currently written in R with no
plans for migration to C++ in the future. This is because the primary
goal of this package is not to maximize performance but to offer a
flexible framework for permutation tests.

## References

<div id="refs" class="references csl-bib-body hanging-indent">
Expand Down
1 change: 1 addition & 0 deletions man/figures/README/benchmark-dark.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions man/figures/README/benchmark.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions man/figures/README/benchmark_res-dark.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions man/figures/README/benchmark_res.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions man/figures/README/define_cpp-dark.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions man/figures/README/define_cpp.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions man/figures/README/define_r-dark.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions man/figures/README/define_r.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit b08c987

Please sign in to comment.