<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<title>Getting Started with Stan</title>
<meta charset="utf-8" />
<meta name="author" content="Dan Ovando" />
<script src="libs/header-attrs/header-attrs.js"></script>
<link href="libs/remark-css/default-fonts.css" rel="stylesheet" />
<link href="libs/remark-css/default.css" rel="stylesheet" />
<link href="libs/tile-view/tile-view.css" rel="stylesheet" />
<script src="libs/tile-view/tile-view.js"></script>
<script src="libs/fabric/fabric.min.js"></script>
<link href="libs/xaringanExtra-scribble/scribble.css" rel="stylesheet" />
<script src="libs/xaringanExtra-scribble/scribble.js"></script>
<script>document.addEventListener('DOMContentLoaded', function() { window.xeScribble = new Scribble({"pen_color":["#FF0000"],"pen_size":3,"eraser_size":30,"palette":[]}) })</script>
<script src="libs/htmlwidgets/htmlwidgets.js"></script>
<link href="libs/jsoneditor/jsoneditor.min.css" rel="stylesheet" />
<script src="libs/jsoneditor/jsoneditor.min.js"></script>
<script src="libs/jsonedit-binding/jsonedit.js"></script>
<link rel="stylesheet" href="my-theme.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: center, middle, inverse, title-slide
.title[
# Getting Started with Stan
]
.author[
### Dan Ovando
]
.date[
### Updated 2022-06-23
]
---
# Objectives for Workshop
1. Basics of Bayes
- What is it and why to use it
2. Bayesian regression with `rstanarm`
- Diagnosing with `shinystan`
- Getting and plotting results with `tidybayes`/`ggdist`
3. Roll your own
- Writing your own models in Stan
- Not going to address `brms`
---
class: center, middle, inverse
### follow along at [danovando.github.io/learn-stan/slides](https://danovando.github.io/learn-stan/slides)
---
# Objectives for Workshop
<br>
<br>
<br>
.center[**To Bayes or not to Bayes should be a philosophical question, not a practical one**]
---
class: center, middle, inverse
# Basics of Bayes
---
# Basics of Bayes
Bayes Theorem:
<br>
<br>
$$p(model|data) = \frac{p(data|model) \times p(model)}{p(data)} $$
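As a quick numerical sketch (with made-up numbers for a hypothetical diagnostic test), Bayes' theorem is just arithmetic:

```r
p_model <- 0.01            # p(model): the prior
p_data_given_model <- 0.99 # p(data|model): the likelihood
p_data_given_not <- 0.05   # likelihood of the data if the model is false
# p(data) by the law of total probability
p_data <- p_data_given_model * p_model + p_data_given_not * (1 - p_model)
# posterior: p(model|data)
p_model_given_data <- p_data_given_model * p_model / p_data
p_model_given_data # about 0.17: strong evidence only moves a small prior so far
```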
---
# Basics of Bayes
`$$prob(thing|data) \propto prob(data|thing)prob(thing)$$`
<img src="https://imgs.xkcd.com/comics/bridge.png " width="70%" style="display: block; margin: auto;" />
`$$prob(crazy|jumping) \propto prob(jumping|crazy)prob(crazy)$$`
.center[if `\(friends = CRAZY\)`, then stay on bridge and eat cookies!]
.small[[xkcd](https://imgs.xkcd.com/comics/bridge.png)]
---
# Bayesian vs. Frequentist
.pull-left[
### Bayes
Data are fixed, parameters are random
What is the probability of a model given the data?
- e.g. conditional on my model and data, how likely is it that a stock is overfished?
Clean way to bring in prior knowledge
But, means you have to think about priors
] .pull-right[
### Frequentist
Data are random, parameters are fixed
How likely are the observed data if a given model is true?
What you probably learned in stats classes.
Can't really say anything about the probability of a parameter.
No need to think about priors
]
---
# A Likelihood Refresher
Likelihoods and model fitting are an entire course, but we need some common language to move forward.
What's missing from this regression equation?
`$$y_i = \beta{x_i}$$`
--
`$$y_i = \beta{x_i} + \epsilon_i$$`
What do we generally assume about the error term `\(\epsilon_i\)`?
--
OLS assumes that errors are i.i.d.: independent and identically distributed
`$$\epsilon_i \sim normal(0,\sigma)$$`
---
# A Likelihood Refresher
Another way to write that model is a data-generating model
`$$y_i \sim N(\beta{x_i},\sigma)$$`
This means that the likelihood (P(data|model)) can be calculated as
`$$\frac{1}{\sqrt{2{\pi}\sigma^2}}e^{-\frac{(y_i - \beta{x_i})^2}{2\sigma^2}}$$`
Or for those of you who prefer to speak code
`dnorm(y, beta * x, sigma)`
Most model fitting revolves around finding parameters that maximize likelihoods!
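To make that concrete, here's a minimal sketch (simulated data, made-up parameter names) that evaluates the normal log-likelihood with `dnorm()` and maximizes it with `optim()`:

```r
set.seed(42)
x <- runif(100, 0, 10)
y <- 0.5 * x + rnorm(100, 0, 1) # true beta = 0.5, sigma = 1
# negative log-likelihood of the model y ~ normal(beta * x, sigma)
nll <- function(pars) {
  -sum(dnorm(y, mean = pars["beta"] * x, sd = exp(pars["log_sigma"]), log = TRUE))
}
# log_sigma keeps the standard deviation positive during optimization
fit <- optim(c(beta = 0, log_sigma = 0), nll)
fit$par["beta"] # close to the true value of 0.5
```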
---
# I thought you said I needed a prior?
What we just looked at is a regression estimated via maximum likelihood (you'd get the same result from OLS).
To go Bayes, you need to specify priors on all parameters.
What are the parameters of this model?
`$$y_i \sim normal(\beta{x_i},\sigma)$$`
--
We can use these priors for the parameters
`$$\beta \sim normal(0,2.5)$$`
`$$\sigma \sim cauchy(0,2.5)$$`
And so our posterior (no longer just likelihood) is proportional to
`dnorm(y, beta * x, sigma) * dnorm(beta, 0, 2.5) * dcauchy(sigma, 0, 2.5)`
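In code, that unnormalized posterior (on the log scale to avoid underflow; simulated data, priors as above) might look like:

```r
set.seed(42)
x <- runif(100, 0, 10)
y <- 0.5 * x + rnorm(100, 0, 1)
# log posterior = log likelihood + log priors (up to a constant)
log_posterior <- function(beta, sigma) {
  sum(dnorm(y, beta * x, sigma, log = TRUE)) + # p(data|model)
    dnorm(beta, 0, 2.5, log = TRUE) +          # prior on beta
    dcauchy(sigma, 0, 2.5, log = TRUE)         # prior on sigma
}
log_posterior(0.5, 1) > log_posterior(2, 1) # TRUE: the true values fit better
```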
???
why only proportional?
notice a bit of a dirty secret here: the priors have to stop somewhere.
---
# A Quick Note on Priors
.pull-left[
Priors can be freaky:
We've been taught that our science should be absolutely objective
Now you're telling me I *have* to include "beliefs"??
* Somebody pour me a good strong p-value.
] .pull-right[
<img src="https://imgs.xkcd.com/comics/frequentists_vs_bayesians.png" width="80%" style="display: block; margin: auto;" />
]
---
# A Quick Note on Priors
"Best" case scenario
- You can include the results of another study as priors in yours
You usually know *something*
- What's your prior on the average length of a whale?
Does get harder for more esoteric parameters
- A uniform prior is NOT UNINFORMATIVE
(informative) Data quickly overwhelm priors
When in doubt, check sensitivity to different priors
- If your key results depend on priors, be prepared to defend them
- See [Schad et al. (2019)](https://pubmed.ncbi.nlm.nih.gov/32551748/)
---
# Breathe.
.pull-left[
If that all made you a little queasy, don't worry, we're back to code!
This can seem like deep stuff, but I really find it easier than the frequentist stats I was trained on.
The problem becomes more about making models of the world than remembering which test goes with which kind of problem.
I can't recommend these two books enough.
[Statistical Rethinking](https://xcelab.net/rm/statistical-rethinking/)
[Bayesian Models: A Statistical Primer for Ecologists](https://xcelab.net/rm/statistical-rethinking/)
].pull-right[
<img src="https://media.giphy.com/media/51Uiuy5QBZNkoF3b2Z/giphy.gif" width="80%" style="display: block; margin: auto;" />
]
---
# Basics of Bayes - MCMC
A big reason Bayes wasn't historically used as much is that, outside of a few special cases, Bayesian models have no analytical solution (that, and the debates about priors)
- Frequentist methods can usually solve for the answer (given some very specific assumptions, usually involving asymptotics)
Started to change when computers made Markov Chain Monte Carlo (MCMC) practical
- Monte Carlo - try lots of different numbers
- Markov Chain - an elegant way of choosing those numbers
---
# Basics of Bayes - MCMC
.pull-left[
MCMC can be shown to always converge!
Sounds like magic, but basically says, "If you try every number in existence, you'll find the solution"
The trick is finding an efficient way to explore the parameter space.
- Something that jumps around with purpose
] .pull-right[
<img src="http://giphygifs.s3.amazonaws.com/media/DpN771RFKeUp2/giphy.gif" style="display: block; margin: auto;" />
]
---
# Basics of Bayes - MCMC
```r
set.seed(42)
x <- 1:500 #make up independent data
true_param <- .5 # true relationship between x and y
y <- true_param * x # simulate y
steps <- 1e4 # number of MCMC steps to take
param <- rep(0, steps) # vector of parameter values
old_fit <- log(sum((y - (x * param[1]))^2)) # fit at starting guess
jumpyness <- 1 # controls random jumping around
for (i in 2:steps){
  proposal <- rnorm(1, param[i - 1], 0.2) # propose a new parameter
  new_fit <- log(sum((y - (x * proposal))^2)) # calculate fit of new parameter
  rand_accept <- log(runif(1, 0, jumpyness)) # adjust acceptance a bit
  if (new_fit < (old_fit - rand_accept)){ # accept in proportion to improvement
    param[i] <- proposal
    old_fit <- new_fit # update the benchmark to the accepted parameter's fit
  } else {
    param[i] <- param[i - 1]
  }
}
```
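Once the chain has run, inference is just summarizing the kept draws; a sketch, using stand-in normal draws in place of `param` from the chain above:

```r
set.seed(42)
param <- c(rep(0, 500), rnorm(9500, 0.5, 0.02)) # stand-in for the chain above
keep <- param[-(1:5000)]                        # drop the first half as warmup
c(mean = mean(keep), quantile(keep, c(0.025, 0.975))) # point estimate + 95% interval
```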
---
# Basics of Bayes - MCMC
<img src="slides_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />
---
# Enter Stan
Stan (primarily) uses a very elegant method called Hamiltonian Monte Carlo with a No-U-Turn Sampler (NUTS) to help Bayesian models converge quickly with relatively clear diagnostics.
We don't have time to go into it, but trust me, it's cool. See [Monnahan et al. 2018](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12681)
---
# Enter Stan
<img src="https://besjournals.onlinelibrary.wiley.com/cms/asset/5ae32b4d-686e-4ea2-9d8a-402aba0a02fe/mee312681-fig-0003-m.jpg" width="50%" style="display: block; margin: auto;" />
---
# Why Stan?
.pull-left[
- There are lots of Bayesian software options out there
- BUGs, JAGS, NIMBLE, greta
- Stan is my personal favorite
- Blazing fast.
- Amazing documentation and community
- Robust & unique diagnostic tools
- Good enough for Gelman
- Available in lots of languages
The Bayesian standard in social sciences, medicine etc.
Modern Bayesian textbooks built around it
] .pull-right[
![:scale 75%](https://besjournals.onlinelibrary.wiley.com/cms/asset/d0c113e4-43d7-4f1e-8b8a-f48a8cf54ce6/mee312681-fig-0001-m.jpg)
]
---
# Why Stan?
![:scale 25%](https://besjournals.onlinelibrary.wiley.com/cms/asset/6d13eaf0-e808-4db9-a274-65bc94ac4a88/mee312681-fig-0004-m.jpg)
---
# Why Stan?
.pull-left[
FYI, it's named after a person, [Stanislaw Ulam](https://en.wikipedia.org/wiki/Stanislaw_Ulam), inventor of the Monte Carlo method, so it's "Stan", not "stan", "STAN", or "S.T.A.N."
Cons...
- No easy way to handle discrete latent parameters (maybe coming? maybe in `cmdstan`)
- Less familiar in some parts of ecology?
- ?
] .pull-right[
![:scale 75%](https://upload.wikimedia.org/wikipedia/commons/thumb/8/82/Stanislaw_Ulam.tif/lossy-page1-800px-Stanislaw_Ulam.tif.jpg)
]
---
class: center, middle, inverse
# Bayesian regression with `rstanarm`
---
# Bayesian regression with `rstanarm`
Bayesian methods traditionally required writing the model code yourself, which made running, say, a regression pretty annoying.
- No one wants to run JAGS when `lm(y ~ x)` is on the table
`rstanarm` was developed to reduce the "it's a pain" reason for not using Bayesian regressions
- Modification of classic "applied regression modeling" (`arm`) package
---
# Bayesian regression with `rstanarm`
We're going to play with the `gapminder` dataset
<table>
<thead>
<tr>
<th style="text-align:left;"> country </th>
<th style="text-align:left;"> continent </th>
<th style="text-align:right;"> year </th>
<th style="text-align:right;"> lifeExp </th>
<th style="text-align:right;"> pop </th>
<th style="text-align:right;"> gdpPercap </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> Afghanistan </td>
<td style="text-align:left;"> Asia </td>
<td style="text-align:right;"> 1952 </td>
<td style="text-align:right;"> 28.801 </td>
<td style="text-align:right;"> 8425333 </td>
<td style="text-align:right;"> 779.4453 </td>
</tr>
<tr>
<td style="text-align:left;"> Afghanistan </td>
<td style="text-align:left;"> Asia </td>
<td style="text-align:right;"> 1957 </td>
<td style="text-align:right;"> 30.332 </td>
<td style="text-align:right;"> 9240934 </td>
<td style="text-align:right;"> 820.8530 </td>
</tr>
<tr>
<td style="text-align:left;"> Afghanistan </td>
<td style="text-align:left;"> Asia </td>
<td style="text-align:right;"> 1962 </td>
<td style="text-align:right;"> 31.997 </td>
<td style="text-align:right;"> 10267083 </td>
<td style="text-align:right;"> 853.1007 </td>
</tr>
<tr>
<td style="text-align:left;"> Afghanistan </td>
<td style="text-align:left;"> Asia </td>
<td style="text-align:right;"> 1967 </td>
<td style="text-align:right;"> 34.020 </td>
<td style="text-align:right;"> 11537966 </td>
<td style="text-align:right;"> 836.1971 </td>
</tr>
<tr>
<td style="text-align:left;"> Afghanistan </td>
<td style="text-align:left;"> Asia </td>
<td style="text-align:right;"> 1972 </td>
<td style="text-align:right;"> 36.088 </td>
<td style="text-align:right;"> 13079460 </td>
<td style="text-align:right;"> 739.9811 </td>
</tr>
<tr>
<td style="text-align:left;"> Afghanistan </td>
<td style="text-align:left;"> Asia </td>
<td style="text-align:right;"> 1977 </td>
<td style="text-align:right;"> 38.438 </td>
<td style="text-align:right;"> 14880372 </td>
<td style="text-align:right;"> 786.1134 </td>
</tr>
</tbody>
</table>
---
# Bayesian regression with `rstanarm`
Let's start with a simple model: predict log(life expectancy) as a function of log(gdp)
<img src="slides_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />
---
# Bayesian regression with `rstanarm`
```r
simple_lifexp_model <- rstanarm::stan_glm(lifeExp ~ gdpPercap,
data = gapminder,
refresh = 1000,
chains = 1)
```
```
##
## SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 8.4e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.84 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.060138 seconds (Warm-up)
## Chain 1: 0.159642 seconds (Sampling)
## Chain 1: 0.21978 seconds (Total)
## Chain 1:
```
---
# Bayesian Regression with `rstanarm`
```r
summary(simple_lifexp_model)
```
```
##
## Model Info:
## function: stan_glm
## family: gaussian [identity]
## formula: lifeExp ~ gdpPercap
## algorithm: sampling
## sample: 1000 (posterior sample size)
## priors: see help('prior_summary')
## observations: 1704
## predictors: 2
##
## Estimates:
## mean sd 10% 50% 90%
## (Intercept) 54.0 0.3 53.6 54.0 54.3
## gdpPercap 0.0 0.0 0.0 0.0 0.0
## sigma 10.5 0.2 10.3 10.5 10.7
##
## Fit Diagnostics:
## mean sd 10% 50% 90%
## mean_PPD 59.5 0.3 59.0 59.5 59.9
##
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
##
## MCMC diagnostics
## mcse Rhat n_eff
## (Intercept) 0.0 1.0 1142
## gdpPercap 0.0 1.0 1099
## sigma 0.0 1.0 1085
## mean_PPD 0.0 1.0 1011
## log-posterior 0.1 1.0 321
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
```
---
# Digging into `rstanarm`
`rstanarm` regression functions start with `stan_<model type>`
- `stan_glm`
- `stan_glmer`
- `stan_gamm4`
- `stan_lm`
In most ways the syntax works the same as the ML versions: going from a standard binomial glm to a Bayesian one is as simple as
`glm(diagnosis ~ traits, data = data, family = binomial())`
`stan_glm(diagnosis ~ traits, data = data, family = binomial())`
---
# Digging into `rstanarm`
This isn't the class to learn `glm`s / hierarchical modeling (see the books I keep plugging)
What we will go over are some of the stan-specific options and diagnostics that apply across any model type you're using, but starting from this simple regression.
---
# Key Stan options
.pull-left[
`iter`: # of iterations per chain
`warmup`: # of iter to use in warmup
`chains`: # of chains to run
`cores`: # of parallel cores to run chains on
`max_treedepth`: controls how far the model will look for a new proposal before giving up
<!-- - higher values allow model to explore a flatter posterior -->
`adapt_delta`: the target rate that new proposals are accepted
<!-- - higher means the model takes smaller steps -->
`seed` sets the seed for the run
] .pull-right[
```r
simple_lifexp_model <- rstanarm::stan_glm(
log(lifeExp) ~ log(gdpPercap),
data = gapminder,
iter = 2000,
warmup = 1000,
chains = 1,
cores = 1,
prior = normal(),
control =
list(max_treedepth = 15,
adapt_delta = 0.8),
seed = 42
)
```
]
---
# iter, warmup, and chains
The # of iterations you keep is `iter` - `warmup`
- No need to thin! (though you can)
- The only real reason to thin is to reduce the memory footprint of saved models
Balance between long enough to tune HMC / explore posterior, short enough to be fast. `warmup` defaults to half of `iter`
Sometimes slower is faster: giving Stan enough iterations to warm up properly can actually be faster than trying to get by with a small number of iterations.
Best practice with any kind of MCMC is to run multiple chains
- If working, should all converge to the same place
- 4 is a good number
- Chains can be run in parallel, but have startup costs
- If model is really fast, parallel may be slower
---
# `treedepth` and `adapt_delta`
`treedepth` controls how many steps the NUTS sampler will take before giving up on finding a new value (at which time it will go back to where it started). Increasing this can be helpful if the posterior is really flat.
`adapt_delta` controls how small the step sizes are. The higher `adapt_delta` is, the higher the target acceptance rate is, and therefore the smaller the step size.
- Basically, if the posterior has lots of small nooks and crannies, a low `adapt_delta` might result in step sizes that jump right over those points.
- This will produce a "divergence" warning, which is not good (we'll come back to this)
- Stan will suggest increasing `adapt_delta` if that happens
---
# Setting Priors
```r
simple_lifexp_model <- rstanarm::stan_glm(
log(lifeExp) ~ log(gdpPercap),
data = gapminder)
```
I thought you said Bayes requires priors? I don't see any in that regression.
`rstanarm` has some pretty clever selectors for [weakly informative priors](https://mc-stan.org/rstanarm/articles/priors.html)
---
# Setting Priors
```r
rstanarm::prior_summary(simple_lifexp_model)
```
```
## Priors for model 'simple_lifexp_model'
## ------
## Intercept (after predictors centered)
## Specified prior:
## ~ normal(location = 4.1, scale = 2.5)
## Adjusted prior:
## ~ normal(location = 4.1, scale = 0.58)
##
## Coefficients
## ~ normal(location = 0, scale = 2.5)
##
## Auxiliary (sigma)
## Specified prior:
## ~ exponential(rate = 1)
## Adjusted prior:
## ~ exponential(rate = 4.3)
## ------
## See help('prior_summary.stanreg') for more details
```
---
# Setting Priors
You can adjust these, and Stan recommends explicitly setting them since defaults may change.
see `?rstanarm::priors` and the vignette [here](https://mc-stan.org/rstanarm/articles/priors.html)
```r
lifexp_model <-
stan_glm(
log(lifeExp) ~ log(gdpPercap),
data = gapminder,
refresh = 0,
prior = normal(autoscale = TRUE), # prior on the model coefficients
prior_intercept = normal(autoscale = TRUE), # prior for any intercepts
    prior_aux = exponential(autoscale = TRUE) # in this case, the prior on sigma
)
```
---
# Setting Priors
Suppose I have a firm belief that GDP is completely uncorrelated with life expectancy.
```r
sp_lifexp_model <-
stan_glm(
log(lifeExp) ~ log(gdpPercap),
data = gapminder,
refresh = 0,
prior = normal(0,0.025), # prior on the model coefficients
prior_intercept = normal(autoscale = TRUE), # prior for any intercepts
    prior_aux = exponential(autoscale = TRUE) # in this case, the prior on sigma
)
```
---
.pull-left[
<br>
<br>
```r
plot(sp_lifexp_model, pars = "log(gdpPercap)") +
labs(title = "Strong Prior")
```
<img src="slides_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" />
].pull-right[
<br>
<br>
```r
plot(lifexp_model, pars = "log(gdpPercap)") +
labs(title = "Weak Prior")
```
<img src="slides_files/figure-html/unnamed-chunk-19-1.svg" style="display: block; margin: auto;" />
]
---
# Potential Exercise - Priors
Try out a couple different kinds of priors
- How informative do they have to be before they start substantially affecting results?
- How does this change as you reduce the sample size of the data?
- Look at `?rstanarm::priors`
- Test out the effect of different distributions for the priors
---
# Diagnosing `stan` fits
`rstanarm` is just calling Stan in the background.
This means that all the same diagnostics apply whether you're using `rstan`, `rstanarm`, `brms`, or any of the other Stan packages.
The problem: things like `lm` will always<sup>1</sup> "work"
Sampling algorithms like HMC have no such guarantee
* You need to make sure the algorithm has converged
Stan has numerous built-in diagnostics to help you do this.
.footnote[
[1] except when it doesn't
]
---
# Key Stan Diagnostics - Divergences
Divergences are the big one to watch out for.
Divergences are a warning that the model has missed some part of the parameter space (they're a handy product of the math behind HMC).
Chains with large numbers of divergences probably have unreliable results.
Simplest solution is to increase `adapt_delta`, which will slow down the model some.
If divergences persist, try increasing the warmup period, and if that fails, you probably need to think about the structure of your model.
A few divergences may still pop up once in a while: This doesn't automatically mean that your model doesn't work, but you should explore further to see what's going on. Thinning a model with a few divergences out of thousands of iterations can help.
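If you're scripting diagnostics rather than reading console output, `rstan` also exposes the divergence information directly (a sketch; `fit` here stands in for a `stanfit` object, e.g. `lifexp_model$stanfit`):

```r
library(rstan)
check_divergences(fit)          # prints a summary of divergent transitions
n_div <- get_num_divergent(fit) # the raw count, handy for automated checks
```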
---
# Key Stan Diagnostics - `max_treedepth`
The `max_treedepth` parameter of a Stan model controls the maximum number of steps that the NUTS algorithm will take in search of a new proposal.
Getting lots of `max_treedepth exceeded` warnings means that the algorithm hit the max number of steps rather than finding where the trajectory of the parameter space doubles back on itself.
This is more of an efficiency problem than a fundamental model fitting problem like divergences.
Solutions include increasing `max_treedepth`, trying a longer warmup period, and reparameterizing the model.
---
# More Key `stan` Diagnostics
* R-hat
Measures whether all the chains have mixed for each parameter. Parameters with high R-hat values probably haven't converged.
* Bulk ESS
Measures the effective sample size of each parameter. Too low suggests you might need more iterations, a better parameterization, or some thinning
* Energy
E-BFMI is a measure of warmup success. A warning here suggests you might need a model reparameterization or just a longer warmup.
**THESE ARE ALL ROUGH APPROXIMATIONS; SEE [here](https://mc-stan.org/misc/warnings.html) FOR MORE DETAILS**
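R-hat itself is simple enough to sketch in a few lines of base R (a simplified, non-split version of the formula; Stan's actual version splits chains and rank-normalizes):

```r
set.seed(42)
chains <- matrix(rnorm(4000), nrow = 1000, ncol = 4) # 4 well-mixed chains
rhat <- function(draws) {
  n <- nrow(draws)
  w <- mean(apply(draws, 2, var))     # within-chain variance
  b <- n * var(colMeans(draws))       # between-chain variance
  sqrt(((n - 1) / n * w + b / n) / w) # potential scale reduction factor
}
rhat(chains) # ~1 for mixed chains; shifting one chain's mean would push it >> 1
```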
---
# Diagnosing Stan models
<!-- This is one of the nicest features of Bayesian analysis: the diagnostics are more or less the same no matter what kind of model you're fitting. -->
Stan has numerous built-in functions for looking at these diagnostics, and will even include helpful suggestions about them! They generally start with `rstan::check_<something>`
```r
rstan::check_hmc_diagnostics(lifexp_model$stanfit)
```
```
##
## Divergences:
##
## Tree depth:
##
## Energy:
```
---
# Diagnosing with shinystan
The built-in functions are great for diagnosing large numbers of models, making plots, etc.
Stan also comes with a built-in model visualizer called `shinystan` that lets you explore all the standard model diagnostics, along with lots of other features
```r
rstanarm::launch_shinystan(lifexp_model)
```
---
# An Unhappy HMC
```r
sad_model <-
stan_glmer(
log(lifeExp) ~ log(gdpPercap) + (year | country),
data = gapminder,
cores = 4,
chains = 4,
refresh = 0,
adapt_delta = 0.2,
iter = 2000,
warmup = 400
)
```
---
# Happy HMC, Sad Model
What happens if we successfully run a bad model?
```r
library(gapminder)
library(rstanarm)
hist(log(gapminder$gdpPercap / 1e4))
gapminder$thing <- log(gapminder$gdpPercap / 1e4)
bad_model <- rstanarm::stan_glm(thing ~ year, family = gaussian(link = "log"), data = gapminder)
```
---
class: inverse, center, middle
# Analyzing Results