Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect auxiliary stats on wilcoxauc with dense matrix (AvgExpr and LogFC) #7

Open
bnprks opened this issue May 28, 2020 · 0 comments

Comments

@bnprks
Copy link

bnprks commented May 28, 2020

Thanks for publishing this package! I've found it very useful to get fast wilcoxon tests, however I've noticed some of the auxiliary stats returned are incorrect which could cause issues for users.

Specifically, wilcoxauc returns incorrect AvgExpr values when called on a dense matrix. This appears to be because the function cpp_rank_matrix_dense directly modifies the matrix X, thus altering its value for later computations.

Furthermore, the results column listed as logFC is not actually returning the log fold change, but the difference in means. I think either the logFC column should be named differently to reflect what it does return (e.g. meanDiff), or its calculation should be changed to actually return the logFC between group means.

I've included a brief example to reproduce below:

X <- matrix(c(0,1,2,15,16,17), nrow=1)
y <- factor(c("A", "A", "A", "B", "B", "B"))
res <- presto::wilcoxauc(X, y)

X_sparse <- sparseMatrix(i=rep(1, 6), j=1:6, x=X[,])
res2 <- presto::wilcoxauc(X_sparse, y)

The values of res and res2 are now:

> res
   feature group avgExpr logFC statistic auc      pval      padj pct_in pct_out
1 Feature1     A       2    -3         0   0 0.0808556 0.0808556    100     100
2 Feature1     B       5     3         9   1 0.0808556 0.0808556    100     100
> res2
   feature group avgExpr logFC statistic auc      pval      padj pct_in pct_out
1 Feature1     A       1   -15         0   0 0.0808556 0.0808556    100     100
2 Feature1     B      16    15         9   1 0.0808556 0.0808556    100     100

Here, the avgExpr in res is the average rank of the values, rather than the actual average. Additionally, we see that logFC in both cases is equal to the difference in means rather than the logFC (which should be calculated as +/-4, assuming we are taking log2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant