Lab 10 - Bayesian Methods

SOLUTIONS

Introduction

In this lab, you’ll practice creating Bayesian models.

Packages
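The original packages chunk is not shown; the list below is a reconstruction based on the namespaces used later in this lab (the color vector project_colors is assumed to be defined in the setup as well).

# packages assumed by the code in this lab
library(dplyr)       # data wrangling (readr, tibble, ggthemes are called via ::)
library(ggplot2)     # plotting
library(brms)        # Bayesian regression models
library(tidybayes)   # working with posterior draws
library(patchwork)   # supplies the / operator used to stack plots
# the aida package supplies aida::data_MT used in Exercise 1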

Exercise 1: Bayesian Regression

Data:

This data set contains roughly 2000 trials of a mouse-tracking experiment.

It is a preprocessed data set from an experiment conducted by Kieslich et al. (2020) in which participants classified specific animals into broader categories. The data set contains response times, MAD (maximum absolute deviation of the cursor trajectory), AUC (area under the curve of the trajectory), and other attributes, as well as all experimental conditions.

dolphin <- aida::data_MT

# aggregate
dolphin_agg <- dolphin |>
  dplyr::filter(correct == 1) |> # only correct values
  dplyr::group_by(subject_id) |> 
  dplyr::summarize(       # use the median AUC/MAD
    AUC = median(AUC, na.rm = TRUE),
    MAD = median(MAD, na.rm = TRUE)) |> 
  dplyr::ungroup() |> 
  dplyr::mutate(
    # the function scale centers and scales the data 
    AUC = scale(AUC),
    MAD = scale(MAD)
  )
  
# let's take a look
head(dolphin_agg)
# A tibble: 6 × 3
  subject_id AUC[,1] MAD[,1]
       <dbl>   <dbl>   <dbl>
1       1001   0.643   0.593
2       1002   0.732   0.367
3       1003  -0.839  -0.833
4       1004  -0.551  -0.535
5       1005   0.619   0.436
6       1006   0.748   1.02 

Plot AUC vs MAD

SOLUTION:
ggplot(data = dolphin_agg, 
       aes(x = MAD, 
           y = AUC)) + 
  geom_point(size = 3, alpha = 0.3) +
  theme_minimal()

This graph displays the joint distribution of the AUC and MAD values. We can see a strong positive relationship between AUC and MAD, which makes sense: the more the cursor deviates toward the competitor, the larger the overall area under the curve.
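As a quick numeric check of this claim (a small sketch, assuming dolphin_agg as constructed above), compute the sample correlation directly:

# correlation between the standardized AUC and MAD values
cor(dolphin_agg$AUC, dolphin_agg$MAD)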

Regress AUC against MAD with a Bayesian regression using the dolphin_agg data. Save the results in the fits directory as “model1.” Use tidybayes::summarise_draws to summarize the model.

SOLUTION:
model1 <- brms::brm(
  # model formula
  AUC ~ MAD, 
  # data
  data = dolphin_agg,
  file = "fits/model1"
)
# show summary in tidy format
tidybayes::summarise_draws(model1)
# A tibble: 6 × 10
  variable        mean   median      sd     mad       q5      q95  rhat ess_bulk
  <chr>          <dbl>    <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl>    <dbl>
1 b_Intercept  5.49e-4  5.58e-4 0.0328  0.0319   -0.0536   0.0552  1.00    3967.
2 b_MAD        9.39e-1  9.39e-1 0.0348  0.0344    0.882    0.996   1.00    4384.
3 sigma        3.49e-1  3.47e-1 0.0251  0.0248    0.311    0.393   1.00    3838.
4 Intercept    5.49e-4  5.58e-4 0.0328  0.0319   -0.0536   0.0552  1.00    3967.
5 lprior      -3.16e+0 -3.16e+0 0.00237 0.00223  -3.16    -3.16    1.00    3890.
6 lp__        -4.32e+1 -4.29e+1 1.30    1.02    -45.8    -41.8     1.00    2030.
# ℹ 1 more variable: ess_tail <dbl>

Extract the model coefficients, then redo the graph to add a geom_abline with the model slope and intercept.

SOLUTION:
# extract model parameters:
model_intercept <- summary(model1)$fixed[1,1]
model_slope <- summary(model1)$fixed[2,1]

# MAD on x and AUC on y, so the axes match the fitted regression AUC ~ MAD
ggplot(data = dolphin_agg, 
       aes(x = MAD, 
           y = AUC)) + 
  geom_abline(intercept = model_intercept, slope = model_slope, color = project_colors[2], linewidth = 1) +
  geom_point(size = 3, alpha = 0.3, color = project_colors[1]) +
  theme_minimal()

Use tidybayes::get_variables to get the model variables, and then use tidybayes::spread_draws with model1 to get the draws for “b_MAD” and “b_Intercept” as a tibble, bound to the variable posteriors1.

SOLUTION:
tidybayes::get_variables(model1)
 [1] "b_Intercept"   "b_MAD"         "sigma"         "Intercept"    
 [5] "lprior"        "lp__"          "accept_stat__" "stepsize__"   
 [9] "treedepth__"   "n_leapfrog__"  "divergent__"   "energy__"     
posteriors1 <- model1 |>
  tidybayes::spread_draws(b_MAD, b_Intercept) |>
  select(b_MAD, b_Intercept)

posteriors1
# A tibble: 4,000 × 2
   b_MAD b_Intercept
   <dbl>       <dbl>
 1 0.933     0.0294 
 2 0.938    -0.0331 
 3 0.991    -0.0276 
 4 0.942    -0.0118 
 5 0.928    -0.0234 
 6 0.895     0.00991
 7 0.923    -0.0467 
 8 0.917    -0.0472 
 9 0.962     0.103  
10 0.981     0.110  
# ℹ 3,990 more rows
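The draws in posteriors1 can be summarized directly; for example (a small sketch using the tibble above), the posterior mean of the slope and the posterior probability that it is positive:

posteriors1 |>
  dplyr::summarise(
    mean_b_MAD = mean(b_MAD),        # posterior mean of the slope
    p_positive = mean(b_MAD > 0))    # posterior probability that the slope is positive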

Repeat the last operation, but this time use ndraws = 100 to limit the number of draws (call this posteriors2). Again using the first graph, add a line with the intercept and slope corresponding to each row of posteriors2 (i.e. add 100 lines).

SOLUTION:
# wrangle data frame
posteriors2 <- model1 |>
  # parameter 'ndraws' requests 100 random subsamples
  tidybayes::spread_draws(b_MAD, b_Intercept, ndraws = 100) |>
  select(b_MAD, b_Intercept)
  
# plot
ggplot(data = dolphin_agg, 
       aes(x = MAD, 
           y = AUC)) + 
  geom_abline(data = posteriors2,
              aes(intercept = b_Intercept, slope = b_MAD), 
              color = project_colors[2], linewidth = 0.1, alpha = 0.4) +
  geom_point(size = 3, alpha = 0.3, color = project_colors[1])

Use model1 and tidybayes::gather_draws to get the draws for “b_MAD” as a tibble in long form, bound to the variable posteriors3. Rename ‘.variable’ to ‘parameter’ and ‘.value’ to ‘posterior’, and then keep only those two columns.

Calculate the mean, the lower and the upper bound of a 90% CrI, using the function tidybayes::hdi().

SOLUTION:
posteriors3 <- model1 |>
   # use the gather_draws() function for "long data"
   tidybayes::gather_draws(b_MAD) |> 
   # change names of columns
   dplyr::rename(parameter = .variable,
          posterior = .value) |> 
   # select only those columns that are relevant
   dplyr::select(parameter, posterior)

head(posteriors3)
# A tibble: 6 × 2
# Groups:   parameter [1]
  parameter posterior
  <chr>         <dbl>
1 b_MAD         0.933
2 b_MAD         0.938
3 b_MAD         0.991
4 b_MAD         0.942
5 b_MAD         0.928
6 b_MAD         0.895
posteriors3_agg <- posteriors3 |> 
  dplyr::group_by(parameter) |> 
  dplyr::summarise(
    # tidybayes::hdi() takes the interval width via the .width argument
    `90lowerCrI`   = tidybayes::hdi(posterior, .width = 0.90)[1],
    mean_posterior = mean(posterior),
    `90higherCrI`  = tidybayes::hdi(posterior, .width = 0.90)[2])

posteriors3_agg 
# A tibble: 1 × 4
  parameter `90lowerCrI` mean_posterior `90higherCrI`
  <chr>            <dbl>          <dbl>         <dbl>
1 b_MAD            0.870          0.939          1.01
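As a cross-check (a sketch, assuming posteriors3 as above), tidybayes::mean_hdi returns the same mean and 90% HDI bounds in a single call:

posteriors3 |>
  tidybayes::mean_hdi(posterior, .width = 0.90)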

Exercise 2: Elasticity

In this exercise we will estimate price elasticity. The data set contains price and monthly unit-volume data for four sites over 12 months. Since the unit volumes are counts of units sold, we’ll need a discrete pmf to model the data-generating process.

# read the data
dat <- 
  readr::read_csv("data/price_data.csv",show_col_types = FALSE) |> 
  dplyr::filter(site == "Windsor")
# plot the data
dat |> 
  ggplot(aes(x=price, y=volume, group=site)) +
  geom_point()

Since elasticity is defined as the percentage change in volume ($\Delta V/V$) for a given percentage change in price ($\Delta p/p$), with elasticity parameter $\beta$ we can write:

$$
\begin{align*}
\frac{\Delta V}{V} & = \beta\times\frac{\Delta p}{p} \\
\frac{\partial V}{V} & = \beta\times\frac{\partial p}{p} \\
\partial\log(V) & = \beta\times\partial\log(p)
\end{align*}
$$

This equation is the justification for the log-log regression model of elasticity. Integrating both sides gives $\log(V) = \beta\log(p) + c$, so the model has solution $V = Kp^{\beta}$, where $K = e^{c}$ is a constant.

As written, the value of $K$ is either the volume when $p = 1$, which may or may not be useful, or the volume when $\beta = 0$, which is uninteresting.

To make the interpretation of the constant $K$ more useful, the model can be written as

$$
\partial\log(V) = \beta\times\partial\log(p/p_{\text{baseline}});\qquad V = K\left(\frac{p}{p_{\text{baseline}}}\right)^{\beta}
$$

in which case the constant is interpreted as the volume when the price equals the baseline price; the elasticity parameter $\beta$ is unchanged.

This implies that our regression model should be $\mathrm{volume} = \log(K) + \beta\log(p)$, given the $\log$ link in the Poisson model. To analyze the data:

  • scale and transform the price data by dividing by the mean price and then taking the log. Assign the modified data to the variable dat01.

  • build and fit a Bayesian Poisson model with

    • a normal(0,5) prior on the intercept, and
    • a cauchy(0,10) prior on the independent variable with upper bound 0 (to ensure that elasticities are negative).
    • assign the fit to the variable windsor_01 and save the fit in the folder “fits/windsor_01”
  • once the model is fit, summarize the draws

SOLUTION:
dat01 <- dat |> 
  dplyr::mutate( price = log( price/mean(price)) )

windsor_01 <- 
  brm(data =
    dat01,
    family = poisson(),
    volume ~ 1 + price,
    prior = c(prior(normal(0, 5), class = Intercept),
              prior(cauchy(0, 10), ub = 0, class = b)
    ),
    iter = 4000, warmup = 1000, chains = 4, cores = 4,
    seed = 4,
    file = "fits/windsor_01")

tidybayes::summarise_draws(windsor_01)
# A tibble: 5 × 10
  variable      mean median     sd    mad     q5    q95  rhat ess_bulk ess_tail
  <chr>        <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 b_Intercept   4.18   4.18 0.0361 0.0360   4.12   4.24  1.00    6649.    6665.
2 b_price      -3.76  -3.76 1.35   1.38    -6.03  -1.58  1.00    5322.    3119.
3 Intercept     4.18   4.18 0.0360 0.0358   4.12   4.24  1.00    6660.    6704.
4 lprior       -5.78  -5.76 0.0881 0.0885  -5.94  -5.66  1.00    5496.    3277.
5 lp__        -58.3  -57.9  1.14   0.792  -60.5  -57.2   1.00    3802.    3857.
pairs(windsor_01, variable = c('b_Intercept', 'b_price'))
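To interpret the fitted elasticity (a rough sketch using the posterior mean of b_price from the summary above), translate it into the expected effect of a 1% price increase:

# posterior mean elasticity from the Poisson fit (roughly -3.8)
beta_hat <- fixef(windsor_01)["price", "Estimate"]
# multiplicative change in expected volume for a 1% price increase
exp(beta_hat * log(1.01))   # about 0.96, i.e. roughly a 4% drop in volume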

In a Poisson model, the mean and variance of the dependent variable are equal. That is clearly not the case here (check this), so we might not expect the Poisson generating process to be a good fit.
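A quick way to check this (a sketch, assuming dat01 as defined above):

# in a Poisson model these two numbers should be roughly equal
dat01 |>
  dplyr::summarise(mean_volume = mean(volume),
                   var_volume  = var(volume))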

An alternative discrete pmf to the Poisson data generation process is the negative binomial process.

  • build and fit a Bayesian Negative Binomial model using the same priors as in the Poisson model
  • assign the fit to the variable windsor_02 and save the fit in the folder “fits/windsor_02”
SOLUTION:
windsor_02 <- 
  brm(data =
    dat01,
    family = negbinomial(),
    volume ~ 1 + price,
    prior = c(prior(normal(0, 5), class = Intercept),
              prior(cauchy(0, 10), ub = 0, class = b)
    ),
    iter = 4000, warmup = 1000, chains = 4, cores = 4,
    seed = 4,
    file = "fits/windsor_02")

tidybayes::summarise_draws(windsor_02)
# A tibble: 6 × 10
  variable    mean median      sd     mad     q5     q95  rhat ess_bulk ess_tail
  <chr>      <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 b_Inter…  4.18e0   4.19 7.22e-2  0.0647   4.07   4.30   1.00    5451.    5032.
2 b_price  -3.93e0  -3.78 2.16e+0  2.21    -7.79  -0.720  1.00    4462.    2829.
3 shape     1.40e4  26.5  1.08e+6 16.2      8.92  88.3    1.00    5367.    4462.
4 Interce…  4.19e0   4.19 7.22e-2  0.0646   4.07   4.30   1.00    5445.    5062.
5 lprior   -1.17e1 -11.7  1.01e+0  0.892  -13.4  -10.2    1.00    5273.    4638.
6 lp__     -5.68e1 -56.4  1.47e+0  1.21   -59.7  -55.2    1.00    3606.    3985.
pairs(windsor_02, variable = c('b_Intercept', 'b_price') )

Since we have discrete outcomes, the continuous distribution that is the default output of brms::pp_check is not appropriate here.

Instead use the type = “rootogram” argument of brms::pp_check to plot posterior predictive checks for the Poisson and NB models.

Rootograms graphically compare frequencies of empirical distributions and expected (fitted) probability models. For the observed distribution the histogram is drawn on a square root scale (hence the name) and superimposed with a line for the expected frequencies.

SOLUTION:
p1 <- brms::pp_check(windsor_01, type = "rootogram") + xlim(20,120)
p2 <- brms::pp_check(windsor_02, type = "rootogram") + xlim(20,120)
# stack the two rootograms (the / operator comes from the patchwork package)
p1/p2

Finally, compare the two model fits using brms::loo and brms::loo_compare and comment on the model comparison based on all the analyses performed in this exercise.

SOLUTION:
brms::loo_compare(brms::loo(windsor_01), brms::loo(windsor_02))
           elpd_diff se_diff
windsor_02  0.0       0.0   
windsor_01 -6.5       4.0   

The loo comparison suggests that the negative binomial model is better than the Poisson model. However, as you can see from the plot of the data and the histograms (in the rootograms), there just isn’t much data, so you should proceed with caution. Note that the NB model does give higher likelihood to the largest value on the right of the histograms (just less than 100), consistent with it being the better model per the loo comparison.

One approach in this case is to use all the sites, estimating a hierarchical model for elasticity with a group-level elasticity and separate site-level adjustments. This might be more useful in practice; a sketch follows.
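A minimal sketch of such a model, assuming the full price_data set (all four sites, with a site column) and keeping the negative binomial likelihood; the priors and the object name elasticity_hier are illustrative choices, not prescribed by the lab:

# read all sites and apply the same price transformation within each site
dat_all <- readr::read_csv("data/price_data.csv", show_col_types = FALSE) |>
  dplyr::group_by(site) |>
  dplyr::mutate(price = log(price / mean(price))) |>
  dplyr::ungroup()

# group-level (population) elasticity plus site-level adjustments
elasticity_hier <- brms::brm(
  data = dat_all,
  family = negbinomial(),
  volume ~ 1 + price + (1 + price | site),
  prior = c(prior(normal(0, 5), class = Intercept),
            prior(cauchy(0, 10), ub = 0, class = b),
            prior(cauchy(0, 2), class = sd)),
  iter = 4000, warmup = 1000, chains = 4, cores = 4,
  seed = 4)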

Exercise 3: Zero-Inflated Poisson

Consider a manufacturing process as follows: production is one unit per day on average, and manufacturing equipment maintenance takes 20% of the days on average.

# define parameters
prob_maintenance <- 0.2  # 20% of days
rate_work        <- 1    # average 1 unit per day

# sample one year of production
n <- 365

# simulate days of maintenance
set.seed(365)
maintenance <- rbinom(n, size = 1, prob = prob_maintenance)

# simulate units completed
y <- (1 - maintenance) * rpois(n, lambda = rate_work)

dat <-
  tibble::tibble(maintenance = factor(maintenance, levels = 1:0), y = y)
  
dat %>% 
  ggplot(aes(x = y)) +
  geom_histogram(aes(fill = maintenance),
                 binwidth = 1, linewidth = 1/10, color = "grey92") +
  scale_fill_manual(values = ggthemes::canva_pal("Green fields")(4)[1:2]) +
  xlab("Units completed") +
  theme(legend.position = "none")

With this data, the likelihood of observing a zero on y (i.e., the likelihood that zero units were completed on a given day) is

$$
\begin{align*}
\operatorname{Pr}(0 | p, \lambda) & = \operatorname{Pr}(\text{maintenance} | p) + \operatorname{Pr}(\text{work} | p) \times \operatorname{Pr}(0 | \lambda) \\
& = p + (1 - p) \exp (- \lambda).
\end{align*}
$$

And the likelihood of a non-zero value $y$ is:

$$
\operatorname{Pr}(y | y > 0, p, \lambda) = \operatorname{Pr}(\text{maintenance} | p) (0) + \operatorname{Pr}(\text{work} | p) \operatorname{Pr}(y | \lambda) = (1 - p) \frac{\lambda^y \exp (- \lambda)}{y!}
$$

So letting $p$ be the probability that $y$ is zero and $\lambda$ be the mean of the Poisson part of the distribution, the zero-inflated Poisson (ZIPoisson) regression model might take the basic form:

$$
\begin{align*}
y_{i} & \sim \mathrm{ZIPoisson}(p_{i},\lambda_{i})\\
\mathrm{logit}(p_{i}) & = \alpha_{p}+\beta_{p}x_{i}\\
\log(\lambda_{i}) & = \alpha_{\lambda}+\beta_{\lambda}x_{i},
\end{align*}
$$

where both parameters in the likelihood, $p_i$ and $\lambda_i$, might get their own statistical model. In brms, $p_i$ is denoted by zi.
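To make this likelihood concrete, here is a small helper implementing the ZIPoisson pmf derived above (an illustrative sketch; brms does not need this function):

# zero-inflated Poisson pmf, following the two cases above
dzipois <- function(y, p, lambda) {
  ifelse(y == 0,
         p + (1 - p) * exp(-lambda),   # zero: maintenance day OR a work day with zero output
         (1 - p) * dpois(y, lambda))   # positive count: only possible on a work day
}

# e.g. with the simulation values p = 0.2 and lambda = 1
dzipois(0, p = 0.2, lambda = 1)   # 0.2 + 0.8 * exp(-1), about 0.49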

Create a Bayesian zero-inflated Poisson model (family = zero_inflated_poisson) with a normal(1, 0.5) prior on the intercept and a beta(2, 6) prior on zi (class = zi), and save the model in “fits/zi_model”.

SOLUTION:
zi_model <- 
  brms::brm(data = dat, 
      family = zero_inflated_poisson,
      y ~ 1,
      prior = c(prior(normal(1, 0.5), class = Intercept),
                prior(beta(2, 6), class = zi)),  # the brms default is beta(1, 1)
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 12,
      file = "fits/zi_model") 
print(zi_model)
 Family: zero_inflated_poisson 
  Links: mu = log; zi = identity 
Formula: y ~ 1 
   Data: dat (Number of observations: 365) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.02      0.09    -0.15     0.19 1.00     1333     1641

Further Distributional Parameters:
   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
zi     0.23      0.05     0.12     0.34 1.00     1463     1475

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
fixef(zi_model)
            Estimate  Est.Error       Q2.5     Q97.5
Intercept 0.02353553 0.08690357 -0.1504926 0.1878018

Express the fitted parameters on their natural scales:

lambda = exp(Intercept) = exp(0.02) ≈ 1.02 units per work day

p = zi = 0.23, the probability of a maintenance (zero-production) day

Both are close to the values used in the simulation (rate_work = 1 and prob_maintenance = 0.2).
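The same back-transformation can be applied to the full posterior rather than to point estimates (a sketch, assuming zi_model as fit above):

brms::as_draws_df(zi_model) |>
  dplyr::summarise(
    lambda = mean(exp(b_Intercept)),   # posterior mean rate of units per work day
    p      = mean(zi))                 # posterior mean probability of a maintenance day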

Exercise 4: Hierarchical Sales Model

Load the regional sales data below

dat <- readr::read_csv("data/regional_sales.csv", show_col_types = FALSE)

Build the following sales model. How are the terms (1|region_id) and (1|store_id) referred to in the output, and how do you interpret them?

SOLUTION:
# Fit hierarchical Bayesian model
sales_model <- brm(
  formula = log_sales ~ 1 + 
    scale(store_size) + 
    scale(competition_density) + 
    scale(marketing_spend) +
    seasonal_effect + 
    (1|region_id) + 
    (1|store_id),
  data = dat,
  family = gaussian(),
  prior = c(
    prior(normal(10, 1), class = "Intercept"),
    prior(normal(0, 0.5), class = "b"),
    prior(cauchy(0, 0.2), class = "sd"),
    prior(cauchy(0, 0.2), class = "sigma")
  ),
  chains = 4,
  cores = 4,
  seed = 456,
  file = "fits/sales_model"
)

The terms (1|region_id) and (1|store_id) are the group-level effects of the model, commonly called random effects or random intercepts in a mixed-effects model.

Random intercepts allow different baseline levels for different groups (in this case, regions and stores) while assuming these levels come from a common distribution. This creates a partial pooling of information across groups - a middle ground between treating groups completely separately and completely the same.

In the model specification:

  • (1|region_id) means each region gets its own intercept adjustment
  • (1|store_id) means each store gets its own intercept adjustment
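To see the estimated adjustments themselves (a sketch, assuming sales_model as fit above), brms exposes the group-level standard deviations and the per-group intercept offsets:

# standard deviations of the region- and store-level intercepts
VarCorr(sales_model)

# posterior summaries of the per-region intercept adjustments
ranef(sales_model)$region_id[, , "Intercept"]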

Create a tibble and bind it to the variable new_store_data. Give it the following columns:

  • store_size = 5000
  • competition_density = 5
  • marketing_spend = 10000
  • seasonal_effect = 0
  • region_id = 1
  • store_id = max(dat$store_id) + 1

Use the model and this data to predict sales for the new store and give the 95% credible interval for the prediction.

SOLUTION:
new_store_data <- tibble::tibble(
  store_size = 5000,
  competition_density = 5,
  marketing_spend = 10000,
  seasonal_effect = 0,
  region_id = 1,
  store_id = max(dat$store_id) + 1
)
# Make predictions
predict(sales_model, newdata = new_store_data, allow_new_levels=TRUE)
     Estimate Est.Error     Q2.5    Q97.5
[1,]  10.9829 0.0967135 10.79838 11.17192
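Because the outcome is log_sales, it can be useful to translate the prediction back to the sales scale using the posterior predictive draws (a sketch, assuming sales_model and new_store_data as above):

# posterior predictive draws of log_sales for the new store
pp <- brms::posterior_predict(sales_model, newdata = new_store_data,
                              allow_new_levels = TRUE)
# back-transform each draw and summarize on the sales scale
quantile(exp(pp[, 1]), probs = c(0.025, 0.5, 0.975))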

Grading

Total points available: 30 points.

Component        Points
Exercises 1 - 4  30