david_comparison.Rmd

---
title: "Testing Randomise and 3dttest++ on DT Threat Reactivity data"
author: "John Flournoy"
date: "`r Sys.Date()`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(RNifti)
library(ggplot2)
```

# Load images

```{r}
#afni test is made up of 24 volumes
#[ 3] For (control = 0) - (control = 1), (control = 0), and (control = 1):
#[x4] for mean, female, age, inc_needs:
#[x2] regression coef and Z score.
#
# 1: control0-control1_mean
# 2: control0-control1_Zscr
# 3: control0-control1_FEMALE
# 4: control0-control1_FEMALE_Zscr
# ...
fslrandomize1 <- RNifti::readNifti(file = '~/otherhome/data/randomise_test/randomise_tstat1.nii.gz')
fslrandomize2 <- RNifti::readNifti(file = '~/otherhome/data/randomise_test/randomise_tstat2.nii.gz')
fslrandomize1_tfce <- RNifti::readNifti(file = '~/otherhome/data/randomise_test/randomise_tfce_corrp_tstat1.nii.gz')
fslrandomize2_tfce <- RNifti::readNifti(file = '~/otherhome/data/randomise_test/randomise_tfce_corrp_tstat2.nii.gz')
afni3dttest <- RNifti::readNifti(file = '~/otherhome/data/3dttest_test/test.nii')
afni3dttest_ETAC <- RNifti::readNifti(file = '~/otherhome/data/3dttest_test/etac.SDL.ETACmask.global.2sid.5perc.nii.gz')
orig <- RNifti::readNifti(file = '/net/holynfs01/srv/export/mclaughlin/share_root/stressdevlab/dep_threat_pipeline/Group/ThreatReactivity/David/Spank/Fear_GT_Calm_Spank_n147.gfeat/cope1.feat/stats/zstat1.nii.gz')

mask <- RNifti::readNifti(file = '/net/holynfs01/srv/export/mclaughlin/share_root/stressdevlab/dep_threat_pipeline/Group/ThreatReactivity/David/Spank/Fear_GT_Calm_Spank_n147.gfeat/cope1.feat/mask.nii.gz')

corrected_p <- p.adjust(pnorm(abs(orig[mask != 0]), lower.tail = FALSE)*2, 'fdr')
orig_p_fdr <- orig
orig_p_fdr[] <- 0
orig_p_fdr[mask != 0] <- 1-corrected_p #to compare with FSL which also uses this convention
```

Let's not try to plot every voxel--we can sample 10,000 voxels to get a good sense of correspondence between methods.

```{r}
set.seed(2472)
i <- sample(1:length(fslrandomize1), size = 5e4)
```

# Comparing Randomise, 3dttest++, and the original parametric _t_-tests

## The Two Randomise _t_-stats

I think the two contrasts are just the inverse of one another. This is good, because it looks like TFCE is a one-sided operation.

```{r}
ggplot2::qplot(as.vector(fslrandomize1)[i], as.vector(fslrandomize2)[i]) + 
  ggplot2::geom_smooth()
```

## Randomise _t_-statistic versus TFCE _p_-value

As the tstat gets bigger in the positive direction, the (1-p-value) gets bigger (but not 1:1 given the spatial info)
notice nothing exceeds the significance threshold of .95 (one sided) or .975 (two-sided).

```{r}
ggplot2::qplot(as.vector(fslrandomize1)[i], as.vector(fslrandomize1_tfce)[i]) + 
  ggplot2::geom_smooth() + 
  ggplot2::coord_cartesian(y = c(0,1))
```

```{r}
ggplot2::qplot(as.vector(fslrandomize2)[i], as.vector(fslrandomize2_tfce)[i]) + 
  ggplot2::geom_smooth() + 
  ggplot2::coord_cartesian(y = c(0,1))
```

## Compare Randomise to the parametric tests

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig)[i], y = as.vector(fslrandomize1)[i]), aes(x = x, y = y)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), bins = 25, color = 'black', size = .25) + 
  ggplot2::scale_fill_continuous(breaks = log(c(10, 100, 1000, 10000)), labels = c(10, 100, 1000, 10000), name = 'Count') + 
  ggplot2::geom_smooth() + 
  ggplot2::geom_abline(intercept = 0, slope = 1) + 
  labs(x = 'Original parametric t-test', y = 'Randomise permutation Z-score')
```

Positive:

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig)[i], y = as.vector(fslrandomize1_tfce)[i]), aes(x = x, y = y)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), bins = 25, color = 'black', size = .25) + 
  ggplot2::scale_fill_continuous(breaks = log(c(10, 100, 1000, 10000)), labels = c(10, 100, 1000, 10000), name = 'Count') + 
  ggplot2::geom_smooth() + 
  # xlim(c(0,4)) + 
  ylim(c(0,1)) + 
  labs(x = 'Original parametric t-test', y = 'Randomise (corrected) TFCE 1 - p')
```

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig_p_fdr)[i], y = as.vector(fslrandomize1_tfce)[i]), aes(x = x, y = y)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), binwidth = c(.05, .05), color = 'black', size = .25) + 
  # geom_point() + 
  ggplot2::scale_fill_continuous(breaks = log(c(10, 100, 1000)), labels = c(10, 100, 1000), 
                                 name = 'Count', limits = c(0,log(1000))) + 
  ggplot2::geom_smooth() + 
  xlim(c(0,1)) + 
  ylim(c(0,1)) + 
  labs(x = 'FDR-corrected 1-p for parametric t-test', y = 'Randomise (corrected) TFCE 1 - p')
```

Negative:

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig)[i], y = as.vector(fslrandomize2_tfce)[i]), aes(x = x, y = y)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), bins = 25, color = 'black', size = .25) + 
  ggplot2::scale_fill_continuous(breaks = log(c(1, 10, 100, 1000, 10000)), labels = c(1, 10, 100, 1000, 10000), name = 'Count') + 
  ggplot2::geom_smooth() + 
  # xlim(c(-4,0)) + 
  ylim(c(0,1)) + 
  labs(x = 'Original parametric t-test', y = 'Randomise (corrected) TFCE 1 - p')
```

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig_p_fdr)[i], y = as.vector(fslrandomize2_tfce)[i]), aes(x = x, y = y)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), binwidth = c(.05, .05), color = 'black', size = .25) + 
  # geom_point() + 
  ggplot2::scale_fill_continuous(breaks = log(c(1, 10, 100, 1000)), labels = c(1, 10, 100, 1000), 
                                 name = 'Count', limits = c(0,log(1000))) + 
  ggplot2::geom_smooth() + 
  xlim(c(0,1)) + 
  ylim(c(0,1)) + 
  labs(x = 'FDR-corrected 1-p for parametric t-test', y = 'Randomise (corrected) TFCE 1 - p')
```

Positive and Negative together:

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig_p_fdr)[i], y = as.vector(fslrandomize2)[i]), 
                aes(x = x, y = y, group = y > 0)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), binwidth = c(.025, .25), color = 'black', size = .25) + 
  # geom_point() + 
  ggplot2::scale_fill_continuous(breaks = log(c(1, 10, 100, 1000, 10000)), labels = c(1, 10, 100, 1000, 10000), 
                                 name = 'Count') + 
  ggplot2::geom_smooth() + 
  labs(x = 'FDR-corrected 1-p for parametric t-test', y = 'Randomise Z')
```

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig_p_fdr)[i], 
                           y = as.vector(fslrandomize1_tfce)[i] + as.vector(fslrandomize2_tfce)[i]), 
                aes(x = x, y = y)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), binwidth = c(.025, .025), color = 'black', size = .25) + 
  # geom_point() + 
  ggplot2::scale_fill_continuous(breaks = log(c(1, 10, 100, 1000, 10000)), labels = c(1, 10, 100, 1000, 10000), 
                                 name = 'Count') + 
  ggplot2::geom_smooth() + 
  labs(x = 'FDR-corrected 1-p for parametric t-test', y = 'Randomise (corrected) TFCE 1 - p')
```

## Compare 3dttest++ to the parametric tests

Note that the documentation for 3dttest++ states:

```
+ Permutation with '-covariates' may not work the way you wish.
  In the past [pre-March 2020], covariates were NOT permuted along
  with their data. Now, covariate ARE permuted along with their data.
  This latter method seems more logical to me [RWCox].
```

This is not consistent with the methods providing the basis for FSL's Randomise as reviewed in:

>Winkler, A. M., Ridgway, G. R., Webster, M. A., Smith, S. M., & Nichols, T. E. (2014). Permutation inference for the general linear model. NeuroImage, 92, 381–397. https://doi.org/10.1016/j.neuroimage.2014.01.060

The intuition is that we are not building a null distribution for the whole regression (including covariates) but rather just for the predictor of interest. For that reason, the covariates should be allowed to account for the variance they can explain in each permutation.

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig)[i], y = as.vector(unlist(afni3dttest[,,,1,2]))[i]), aes(x = x, y = y)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), bins = 25, color = 'black', size = .25) + 
  ggplot2::scale_fill_continuous(breaks = log(c(10, 100, 1000, 10000)), labels = c(10, 100, 1000, 10000), name = 'Count') + 
  ggplot2::geom_smooth() + 
  ggplot2::geom_abline(intercept = 0, slope = 1) + 
  labs(x = 'Original parametric t-test', y = '3dttest++ permutation Z-score')
```

```{r}
#Compare to orig afni:
ggplot2::ggplot(data.frame(x = as.vector(orig_p_fdr)[i], y = as.vector(unlist(afni3dttest[,,,1,2]))[i]), 
                aes(x = x, y = y, group = y > 0)) + 
  ggplot2::geom_hex(aes(fill = log(..count..)), bins = 25, color = 'black', size = .25) + 
  ggplot2::scale_fill_continuous(breaks = log(c(10, 100, 1000, 10000)), labels = c(10, 100, 1000, 10000), name = 'Count') + 
  ggplot2::geom_smooth() + 
  labs(x = 'FDR corrected parametric t-test 1 - p', y = '3dttest++ permutation Z-score')
```

Applying ETAC to the permuted data did not result in any clusters:

```{r}
sum(afni3dttest_ETAC != 0)
```