How to avoid using inlamemi

library(inlamemi)
library(INLA)

In this vignette, we show how to build the measurement error and missingness models that inlamemi can fit, but without using inlamemi. This is so that if you have a model that for some reason does not work within inlamemi, you can see some examples of how to fit these models “manually”.

There are two ways to structure the data for the models, the first option is the one taken in Muff et al (2015) and Skarstein et al (2023), where the matrices are constructed directly by hand. The other approach is the one that is used internally in inlamemi, where we use the function inla.stack() to define the data structure for each sub-model, and then join these together. This latter approach is a bit more modular, and so this is the approach we will show here.

A short explanation of inla.stack() for hierarchical modelling

The inla.stack() function is most commonly used for spatial modelling with stochastic partial differential equations (SPDE), since the data then is a bit more complex to structure. A description of using stacks in that context can be found in chapter 7.3.3 of Bayesian inference with INLA. However, they can also be quite useful when defining hierarchical models with multiple likelihoods, which traditionally need to be defined by setting up matrices where the structure of the matrices encode the structure of the model, as explained in chapter 6.4 of Bayesian inference with INLA. The stacks are eventually converted to these matrices by INLA, but for the modeller it may be more convenient to define the model through the stacks. one reason is that when setting up the matrices you need to know from the beginning how many levels you will have in the model, since this determines the number of columns in the matrices. However, with the stacks, each model level has its own stack, and previous stacks do not change if you add additional stacks. That is, the definition of stacks for each model level can be done independently.

The inla.stack function takes four arguments:

Classical and Berkson measurement error and missingness

In the first example, we use the dataset simple_data, generated in the vignette Simulated examples. This dataset has classical and Berkson measurement error, and missing data, and so the main model we want to fit is

\[ y_i = \beta_0 + \beta_x x_i + \beta_z z_i + \varepsilon_i, \] the Berkson measurement error is \[ 0 = -x_{true, i} + r_i + u_{b,i}, \] the classical measurement error model is \[ x_i = r_i + u_{c,i}, \] and the imputation model is

\[ 0 = -r_i + \alpha_0 + \alpha_z z_i + \varepsilon_{r,i}. \]

We begin by loading the data and defining the number of observations.

data <- simple_data
n <- nrow(data)

Next, we define all the priors and initial values.

# Prior for beta.x
prior.beta <- c(0, 1/1000) # N(0, 10^3)

# Priors for y, measurement error and true x-value precision
prior.prec.y <- c(10, 9)
prior.prec.u_b <- c(10, 9)
prior.prec.u_c <- c(10, 9)
prior.prec.r <- c(10, 9)

# Initial values
initial.prec.y <- 1
initial.prec.u_b <- 1
initial.prec.u_c <- 1
initial.prec.r <- 1

In inlamemi, we could then fit the model like this:

# Fit the model
simple_model <- fit_inlamemi(data = data,
                           formula_moi = y ~ x + z,
                           formula_imp = x ~ z,
                           family_moi = "gaussian",
                           error_type = c("berkson", "classical"),
                           prior.prec.moi = prior.prec.y,
                           prior.prec.berkson = prior.prec.u_b,
                           prior.prec.classical = prior.prec.u_c,
                           prior.prec.imp = prior.prec.r,
                           initial.prec.moi = initial.prec.y,
                           initial.prec.berkson = initial.prec.u_b,
                           initial.prec.classical = initial.prec.u_c,
                           initial.prec.imp = initial.prec.r)

If we instead want to do this without inlamemi, we start by defining the stacks for each model level.

For the main regression model, we have the model \[ y_i = \beta_0 + \beta_x x_i + \beta_z z_i + \varepsilon_i, \] which is defined in a stack like this:

stk_moi <- inla.stack(data = list(y_moi = data$y),
                      A = list(1),
                      effects = list(
                        list(beta.0 = rep(1, n),
                             beta.x = 1:n,
                             beta.z = data$z)),
                      tag = "moi")

Next, the Berkson measurement error model is \[ 0 = -x_{true, i} + r_i + u_{b,i}, \] and the corresponding stack is:

stk_b <- inla.stack(data = list(y_berkson = rep(0, n)),
                    A = list(1),
                    effects = list(
                      list(id.x = 1:n,
                           weight.x = -1,
                           id.r = 1:n,
                           weight.r = 1)),
                    tag = "berkson")

The classical measurement error model is \[ x_i = r_i + u_{c,i}, \]

and the stack is:

stk_c <- inla.stack(data = list(y_classical = data$x),
                    A = list(1),
                    effects = list(
                      list(id.r = 1:n,
                           weight.r = 1)),
                    tag = "classical")

Finally, the imputation model is \[ 0 = -r_i + \alpha_0 + \alpha_z z_i + \varepsilon_{r,i}. \] and the corresponding stack is

stk_imp <- inla.stack(data = list(y_imp = rep(0, n)),
                      A = list(1),
                      effects = list(
                        list(id.r = 1:n,
                             weight.r = rep(-1, n),
                             alpha.0 = rep(1, n),
                             alpha.z = data$z)),
                      tag = "imputation")

We then stack these together, which will give us the matrix formulation that we also could have specified manually.

stk_full <- inla.stack(stk_moi, stk_b, stk_c, stk_imp)

For this model, we have two latent effects, x and r, which are both specified in the formula. For further details on the formula, see the supplementary material of Skarstein et al (2023), which can be found online here: https://emmaskarstein.github.io/Missing-data-and-measurement-error/simulation_example.html

formula <- list(y_moi, y_berkson, y_classical, y_imp) ~ - 1 + beta.0 + beta.z +
  f(beta.x, copy = "id.x",
    hyper = list(beta = list(param = c(0, 1/1000), fixed = FALSE))) +
  f(id.x, weight.x, model = "iid", values = 1:n,
    hyper = list(prec = list(initial = -15, fixed = TRUE))) +
  f(id.r, weight.r, model="iid", values = 1:n,
    hyper = list(prec = list(initial = -15, fixed = TRUE))) +
  alpha.0 + alpha.z

Finally, we call the inla() function. This model consists of four Gaussian sub-models, and in the control.family argument we give the priors for each of these.

model_sim <- inla(formula, data = inla.stack.data(stk_full),
                  family = c("gaussian", "gaussian", "gaussian", "gaussian"),
                  control.family = list(
                    list(hyper = list(prec = list(initial = log(initial.prec.y),
                                                  param = prior.prec.y,
                                                  fixed = FALSE))),
                    list(hyper = list(prec = list(initial = log(initial.prec.u_b),
                                                  param = prior.prec.u_b,
                                                  fixed = FALSE))),
                    list(hyper = list(prec = list(initial = log(initial.prec.u_c),
                                                  param = prior.prec.u_c,
                                                  fixed = FALSE))),
                    list(hyper = list(prec = list(initial = log(initial.prec.r),
                                                  param = prior.prec.r,
                                                  fixed = FALSE)))
                  ),
                  control.predictor = list(compute = TRUE)
)

summary(model_sim)
#> Time used:
#>     Pre = 1.87, Running = 2.34, Post = 0.176, Total = 4.38 
#> Fixed effects:
#>          mean    sd 0.025quant 0.5quant 0.975quant  mode kld
#> beta.0  1.030 0.219      0.612    1.029      1.435 1.025   0
#> beta.z  1.911 0.388      1.225    1.901      2.560 1.910   0
#> alpha.0 1.033 0.051      0.934    1.033      1.132 1.033   0
#> alpha.z 2.025 0.052      1.922    2.025      2.127 2.025   0
#> 
#> Random effects:
#>   Name     Model
#>     id.x IID model
#>    id.r IID model
#>    beta.x Copy
#> 
#> Model hyperparameters:
#>                                             mean    sd 0.025quant 0.5quant 0.975quant  mode
#> Precision for the Gaussian observations    1.128 0.365      0.556    1.080       1.98 0.992
#> Precision for the Gaussian observations[2] 1.127 0.355      0.581    1.076       1.96 0.983
#> Precision for the Gaussian observations[3] 0.927 0.111      0.729    0.920       1.17 0.905
#> Precision for the Gaussian observations[4] 0.977 0.128      0.749    0.969       1.25 0.954
#> Beta for beta.x                            1.972 0.207      1.559    1.974       2.38 1.981
#> 
#> Marginal log-Likelihood:  -20699.96 
#>  is computed 
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

A model for missing data, missing not at random

In this example, dealing with only missing data, we show how to include another level in the model to potentially capture how the missingness mechanism may depend on available covariates.

In the dataset, simulated below, we let the value of the covariate x with missingness depend on another covariate z1 (corresponding to the imputation model), while the probability of an entry in x missing depends on another covariate z2 (corresponding to the missingness model).

set.seed(2024)
n <- 1000

z1 <- rnorm(n, mean = 0, sd = 1)
z2 <- rnorm(n, mean = 0, sd = 1)

alpha.0 <- 1; alpha.z1 <- 0.3
x <- rnorm(n, mean = alpha.0 + alpha.z1*z1, sd = 1)

gamma.0 <- -1.5; gamma.z2 <- -0.5
m_pred <- gamma.0 + gamma.z2*z2
m_prob <- exp(m_pred)/(1 + exp(m_pred))
m <- as.logical(rbinom(n, 1, prob = m_prob))
x_obs <- x
x_obs[m] <- NA

sum(is.na(x_obs))/length(x_obs)
#> [1] 0.183

beta.0 <- 1; beta.x <- 2; beta.z1 <- 2; beta.z2 <- 2
y <- beta.0 + beta.x*x + beta.z1*z1 + beta.z2*z2 + rnorm(n)

missing_data <- data.frame(y = y, x = x_obs, x_true = x, z1 = z1, z2 = z2)

The main model of interest is in this case \[ y_i = \beta_0 + \beta_x x_{true,i} + \beta_{z_1} z_{1,i} + \beta_{z_2} z_{2,i} + \varepsilon_i , \] and the stack is:

stk_moi <- inla.stack(data = list(y_moi = missing_data$y),
                      A = list(1),
                      effects = list(
                        list(beta.0 = rep(1, n),
                             beta.x = 1:n,
                             beta.z1 = missing_data$z1,
                             beta.z2 = missing_data$z2)),
                      tag = "moi")

The classical measurement error is just functioning as a link in this case, but the model is \[ x_{obs, i} = x_{true, i} + u_{c,i}, \] where the precision of \(u_{c,i}\) is set to be very high for those \(i\) where \(x_{obs,i}\) is observed. The stack for this level is:

stk_c <- inla.stack(data = list(y_classical = missing_data$x),
                    A = list(1),
                    effects = list(
                      list(id.x = 1:n,
                           weight.x = 1)),
                    tag = "classical")

Next we have the imputation model \[ 0 = -x_{true,i} + \alpha_0 + \alpha_{z_1} z_{1,i} + \varepsilon_{x,i}, \] and the stack is

stk_imp <- inla.stack(data = list(y_imp = rep(0, n)),
                      A = list(1),
                      effects = list(
                        list(id.x = 1:n,
                             weight.x = -1,
                             alpha.0 = rep(1, n),
                             alpha.z1 = missing_data$z1)),
                      tag = "imputation")

The missingness model is a binomial model describing how the probability of missing, \(p_m\), may depend on other covariates. The model is \[ \text{logit}(p_{m,i}) = \gamma_0 + \gamma_x x_{true,i} + \gamma_{z_2} z_{2,i}, \] and the stack is:

stk_mis <- inla.stack(data = list(y_mis = as.numeric(is.na(missing_data$x))),
                      A = list(1),
                      effects = list(
                        list(gamma.x = 1:n,
                             gamma.0 = rep(1, n),
                             gamma.z2 = missing_data$z2)),
                      tag = "missingness")

We join the stacks together:

stk_full <- inla.stack(stk_moi, stk_c, stk_imp, stk_mis)

In defining the formula, we now need to define the hyperparameter gamma.x, the same way as we define beta.x. This is a scaled copy of id.x.

formula <- list(y_moi, y_classical, y_imp, y_mis) ~ - 1 +
  beta.0 + beta.z1 + beta.z2 +
  f(beta.x, copy = "id.x",
    hyper = list(beta = list(param = c(0, 1/1000), fixed = FALSE))) +
  f(id.x, weight.x, model = "iid", values = 1:n,
    hyper = list(prec = list(initial = -15, fixed = TRUE))) +
  f(gamma.x, copy = "id.x",
    hyper = list(beta = list(param = c(0, 1/1000), fixed = FALSE))) +
  alpha.0 + alpha.z1 + gamma.0 + gamma.z2

Since we don’t have any measurement error in this case, we set the precision to be very high for the classical error model in the argument scale.

model_mnar <- inla(formula, data = inla.stack.data(stk_full),
                  family = c("gaussian", "gaussian", "gaussian", "binomial"),
                  scale = c(rep(1, n), rep(10^8, n), rep(1, n), rep(1, n)),
                  control.family = list(
                    list(hyper = list(prec = list(initial = log(1),
                                                  param = c(10, 9),
                                                  fixed = FALSE))),
                    list(hyper = list(prec = list(initial = log(1),
                                                  param = c(10, 9),
                                                  fixed = FALSE))),
                    list(hyper = list(prec = list(initial = log(1),
                                                  param = c(10, 9),
                                                  fixed = FALSE))),
                    list()),
                  control.predictor = list(compute = TRUE, link = 1)
)

summary(model_mnar)
#> Time used:
#>     Pre = 1.9, Running = 3.8, Post = 0.175, Total = 5.88 
#> Fixed effects:
#>            mean    sd 0.025quant 0.5quant 0.975quant   mode kld
#> beta.0    1.038 0.050      0.940    1.038      1.136  1.038   0
#> beta.z1   2.004 0.036      1.933    2.004      2.075  2.004   0
#> beta.z2   1.985 0.036      1.914    1.985      2.057  1.985   0
#> alpha.0   1.011 0.032      0.947    1.011      1.074  1.011   0
#> alpha.z1  0.292 0.033      0.228    0.292      0.355  0.292   0
#> gamma.0  -1.510 0.122     -1.754   -1.509     -1.275 -1.509   0
#> gamma.z2 -0.425 0.088     -0.597   -0.425     -0.252 -0.425   0
#> 
#> Random effects:
#>   Name     Model
#>     id.x IID model
#>    beta.x Copy
#>    gamma.x Copy
#> 
#> Model hyperparameters:
#>                                              mean    sd 0.025quant 0.5quant 0.975quant   mode
#> Precision for the Gaussian observations     0.983 0.048      0.890    0.983      1.079  0.984
#> Precision for the Gaussian observations[2]  1.130 0.357      0.590    1.077      1.981  0.977
#> Precision for the Gaussian observations[3]  1.023 0.047      0.932    1.022      1.119  1.020
#> Beta for beta.x                             1.953 0.035      1.884    1.953      2.021  1.955
#> Beta for gamma.x                           -0.036 0.089     -0.213   -0.036      0.138 -0.034
#> 
#> Marginal log-Likelihood:  -11663.19 
#>  is computed 
#> Posterior summaries for the linear predictor and the fitted values are computed
#> (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Looking at the summary, most importantly, note that the model picks up gamma.0 and gamma.z2, which were defined to be -1.5 and -0.5, respectively. In the data simulation, we did not construct the missingness of x to depend on the value of x, which would correspond to a “missing not at random” mechanism. The fact that the missingness of x depends on other observed covariates indicates a “missing at random” mechanism (as we simulated it to be). If the missingness of x did not depend on any other covariates, it would be “missing completely at random”.