Mestim
This vignette serves as a short introduction to computing the variance-covariance matrix of a multidimensional parameter using M-estimation and the empirical sandwich variance.
Denoting \(F\) a probability distribution, a \(p\)-dimensional M-estimator of \(\psi\)-type \(T\) solves the vector equation \[\int_\mathcal{Z}\psi\big(z, T(F)\big)dF(z)=\boldsymbol{0}.\] In practice, this means that the estimator \(T\) has an unbiased estimating function \(\psi(z,\theta)=\{\psi_1(z,\theta), \ldots, \psi_p(z,\theta)\}^T\) with solution \(\hat{\theta}~(p\times 1)\) solving in \(\theta\) the \((p\times 1)\) set of “stacked” estimating equations given by \[ \sum_{i=1}^{n}\psi(Z_i,\theta)=\boldsymbol{0}.\] For a \(\psi\)-type M-estimator \(T\) with estimate \(\hat{\theta}\), under suitable regularity conditions for \(\psi\), the central limit theorem and Slutsky’s theorem yield \[\sqrt{n}\big(\hat{\theta}-T(F)\big)\xrightarrow{\mathcal{D}}\mathcal{N}\big(0, \Sigma)\] where \[\Sigma=\Bigg[\mathbb{E}\bigg\{\frac{\partial \psi\big(Z,T(F)\big) }{\partial\theta^T}\bigg\}\Bigg]^{-1}\mathbb{E}\Big\{ \psi\big(Z,T(F)\big) \psi^T\big(Z,T(F)\big)\Big\}\Bigg[\mathbb{E}\bigg\{\frac{\partial \psi\big(Z,T(F)\big) }{\partial\theta^T}\bigg\}\Bigg]^{-T}.\] This implies that the \(p\)-dimensional M-estimator \(\hat{\theta}\) is an asymptotically normal, \(\sqrt{n}\)-consistent, estimator for \(T(F)\). See Boos and Stefanski (2013) for a full introduction to M-estimation.
Mestim
doesProvided with observed data \((Z_i)_{1≤i≤n}\), a \(p\)-dimensional vector of estimates \(\hat{\theta}\) and a \((p\times 1)\) unbiased estimating function \(\psi\), the Mestim
package computes the sandwich formula \[\hat{\Sigma}=\Bigg[n^{-1}\sum_{i=1}^n\bigg\{\frac{\partial \psi\big(Z_i,\hat{\theta}\big) }{\partial\theta^T}\bigg\}\Bigg]^{-1}
n^{-1}\sum_{i=1}^n\Big\{ \psi\big(Z_i,\hat{\theta}\big) \psi^T\big(Z_i,\hat{\theta}\big)\Big\}
\Bigg[n^{-1}\sum_{i=1}^n\bigg\{\frac{\partial \psi\big(Z_i,\hat{\theta}\big) }{\partial\theta^T}\bigg\}\Bigg]^{-T}.\] The estimated asymptotic variance-covariance matrix of \(\hat{\theta}\) is \(n^{-1} \hat{\Sigma}\), and so, we have \[\hat{\theta} \mathrel{\dot\sim} \mathcal{N}\big(0, n^{-1} \hat{\Sigma}).\] Under the hood, Mestim
algorithmically computes the Jacobian matrix of \(\psi\); derivatives and outer products in \(\hat{\Sigma}\) are then computed in parallel. To compute the asymptotic variance-covariance matrix, the analyst thus only need to provide a list detailing the “stacked” estimating functions in \(\psi\). Below, we give examples of growing complexity to illustrate how Mestim
can leverage the flexibility of M-estimation theory to calculate asymptotic standard errors (and confidence intervals) for parameter estimates \(\hat{\theta}\).
Let’s try to compute the asymptotic standard errors of estimated parameters in a logistic regression model. This simple example serves to get familiar with Mestim
commands.
Here we use \[X_1 \sim \mathcal{N}(0,1)\] \[X_2 \sim \mathcal{N}(0,3)\] \[Y|X \sim \mathcal{B}\Big(\text{expit}(\beta_1^{0}X_1+\beta_2^{0}X_2)\Big)\] with \(\beta_1^{0}=4\), \(\beta_2^{0}=6\).
NB: We use superscript \(^{0}\) to denote true values of the parameters.
<- function(n, seed=123)
gen_lr_dat
{set.seed(seed)
<- rnorm(n, sd = 1); X_2 <- rnorm(n, sd = 3) # generate x_1 and x_2
X_1 <- c(4,5) # generate true parameters
true_betas <- model.matrix(~-1+X_1+X_2) # build the design matrix
X <- rbinom(n, 1, 1/(1 + exp(-X %*% true_betas)) ) # generate Y from X and true_betas
Y <- data.frame(X_1=X_1, X_2=X_2, Y=Y) # build a simulated dataset
dat return(dat)
}
<- gen_lr_dat(5000)
dat head(dat)
## X_1 X_2 Y
## 1 -0.56047565 -1.482522 0
## 2 -0.23017749 3.382780 1
## 3 1.55870831 -3.440849 0
## 4 0.07050839 4.443056 1
## 5 0.12928774 2.748574 1
## 6 1.71506499 1.005393 1
Let’s fit a logistic regression model and put the estimated parameters in a list.
<- glm(Y~-1 + X_1 + X_2, data=dat, family = "binomial")
mod <- list(thetas_1=coef(mod)[1], thetas_2=coef(mod)[2]) thetas_hat
Recall that the estimated parameters \(\hat{\theta}=(\hat{\theta}_1, \hat{\theta}_2)^T\) from this logistic regression model jointly solve \[\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-{(\theta_1X_{1,i}+\theta_2X_{2,i})\big\}\Big]^{-1}}-Y_i\bigg)X_{1,i} =0\] and \[\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-{(\theta_1X_{1,i}+\theta_2X_{2,i})\big\}\Big]^{-1}}-Y_i\bigg)X_{2,i} =0.\] Therefore, we can identify \[\psi_1(Z_i,\theta_1)=\bigg(\Big[1+\exp\big\{-{(\theta_1X_{1,i}+\theta_2X_{2,i})\big\}\Big]^{-1}}-Y_i\bigg)X_{1,i}\] and \[\psi_2(Z_i,\theta_1)=\bigg(\Big[1+\exp\big\{-{(\theta_1X_{1,i}+\theta_2X_{2,i})\big\}\Big]^{-1}}-Y_i\bigg)X_{2,i}.\] With that in mind, let’s build a list for the unbiased estimating function \(\psi(z,\theta)=\Big(\psi_1(z,\theta_1), \psi_2(z,\theta_2)\Big)^T\).
<- expression( ((1/(1+exp(-(thetas_1 * X_1 + thetas_2 * X_2)))) - Y) * X_1 )
psi_1 <- expression( ((1/(1+exp(-(thetas_1 * X_1 + thetas_2 * X_2)))) - Y) * X_2 )
psi_2 <- list(psi_1, psi_2) psi
NB: parameters’ names (here thetas_1
and thetas_2
) must be consistent with the previous list.
We are finally ready to pass these arguments to the get_vcov
function form the Mestim
package.
library(Mestim)
<- get_vcov(data=dat, thetas=thetas_hat, M=psi) res
You can obtain the variance-covariance matrix from a get_vcov
result as follows
$vcov res
## thetas_1 thetas_2
## thetas_1 0.05239025 0.05366863
## thetas_2 0.05366863 0.06795271
get_vcov
is similar to that of glm
.
vcov(mod)
## X_1 X_2
## X_1 0.05698408 0.06143937
## X_2 0.06143937 0.07963092
This is indeed very close the results inres$vcov
.
Asymptotic standard errors are square root of the diagonal elements from the estimated variance-covariance matrix. These are stored in the se
attribute.
$se res
## thetas_1 thetas_2
## 0.2288892 0.2606774
Let’s generate synthetic observational data with treatment allocation \(A\), continuous outcome \(Y\) and a single confounder \(X\) such that \(Z=(X,A,Y)^T\).
Here we use \[X \sim \mathcal{N}(0,1)\] \[A|X \sim \mathcal{B}\Big(\text{expit}(2X)\Big)\] \[\epsilon \sim \mathcal{N}(0,20)\] \[Y|X,\epsilon = \gamma_1^{0} X + \gamma_2^{0} A + \gamma_3^{0} AX + \epsilon\] with \(\gamma_1^{0}=4\), \(\gamma_2^{0}=3\), and \(\gamma_3^{0}=2\).
NB: We use superscript \(^{0}\) to denote true values of the parameters.
<- function(n, seed=123)
gen_obs_dat
{set.seed(seed)
<- rnorm(n) # generate X
X <- rbinom(n, 1, 1/(1 + exp(- 2 * X)) ) # generate treatment allocation A
A <- model.matrix(~ -1 + X + A + A:X) # build the design matrix
X_mat <- c(4,3,2)
true_gammas <- rnorm(n,0,20) # generate gaussian noise
epsi <- (X_mat %*% true_gammas) + epsi # generate observed outcomes
Y <- data.frame(X=X, A=A, Y=Y) # build a simulated dataset
dat return(dat)
}
<- gen_obs_dat(5000)
dat head(dat)
## X A Y
## 1 -0.56047565 0 4.758148
## 2 -0.23017749 0 15.368124
## 3 1.55870831 1 2.018927
## 4 0.07050839 1 -50.422238
## 5 0.12928774 1 -18.163366
## 6 1.71506499 1 -11.819112
In this example, the goal is to calculate standard errors for the outcome regression average causal effect estimator \[\hat{\delta}=n^{-1}\sum_{i=1}^n\mathbb{\hat{E}}(Y|X=X_i, A=1)-\mathbb{\hat{E}}(Y|X=X_i, A=0).\]
For \(\mathbb{E}(Y|X, A)\), let’s specify the regression model \(m(X, A;\boldsymbol{\gamma})=\gamma_1X + \gamma_2A + \gamma_3AX\) and store the estimated parameters.
<- lm(Y~ -1 + X + A + A:X, data = dat)
m <- coef(m)[1]
gamma_1_hat <- coef(m)[2]
gamma_2_hat <- coef(m)[3] gamma_3_hat
Recall that the estimated parameters \(\hat{\boldsymbol{\gamma}}=(\hat{\gamma}_1,\hat{\gamma}_2,\hat{\gamma}_3)^T\) from this linear regression model jointly solve \[\sum_{i=1}^n (Y_i-\gamma_1X_i - \gamma_2A_i - \gamma_3A_iX_i)X_i=0,\] \[\sum_{i=1}^n (Y_i-\gamma_1X_i - \gamma_2A_i - \gamma_3A_iX_i)A_i=0,\] \[\sum_{i=1}^n (Y_i-\gamma_1X_i - \gamma_2A_i - \gamma_3A_iX_i)A_iX_i=0.\] Disregarding the summation sign, we can straightforwardly identify the three first elements of the estimating function \(\psi(z,\theta)\). Before building a list detailing the function \(\psi\), we need to identify the estimating function of our main parameter of interest \(\delta.\) To do so, recall that we can estimate \(\delta\) as \[\hat{\delta}=n^{-1}\sum_{i=1}^nm(X_i,1;\hat{\boldsymbol{\gamma}})-m(X_i,0;\hat{\boldsymbol{\gamma}}) \\ =n^{-1}\sum_{i=1}^n\{\hat{\gamma}_1+ \hat{\gamma}_2 \times 1 +\hat{\gamma}_3 \times 1 \times X_i\} - \{\hat{\gamma}_1+ \hat{\gamma}_2 \times 0 +\hat{\gamma}_3 \times 0 \times X_i\} \\ =n^{-1}\sum_{i=1}^n \hat{\gamma}_2 +\hat{\gamma}_3 X_i. \] Let’s first compute it.
<- mean(gamma_2_hat + gamma_3_hat*dat$X) delta_hat
Note that rearranging the last equality we have \[\sum_{i=1}^n \hat{\gamma}_2 +\hat{\gamma}_3 X_i - \hat{\delta} = 0\] which straightforwardly yields the last element of the estimating function \(\psi(z,\theta)\). Disregarding the summation sign yields the last estimating function which we can now “stack” with the previous ones. Let’s now build a list detailing the full function \(\psi(z,\theta)\).
<- expression( (Y - gamma_1*X - gamma_2*A - gamma_3*A*X) * X )
psi_1 <- expression( (Y - gamma_1*X - gamma_2*A - gamma_3*A*X) * A )
psi_2 <- expression( (Y - gamma_1*X - gamma_2*A - gamma_3*A*X) * A*X )
psi_3 <- expression( gamma_2 + gamma_3 * X - delta )
psi_4 <- list(psi_1, psi_2, psi_3, psi_4) psi
Let’s also store all the estimated parameters \(\hat{\theta}=(\hat{\gamma}_1,\hat{\gamma}_2,\hat{\gamma}_3,\hat{\delta})^T\) in a list.
<- list(gamma_1=gamma_1_hat,
thetas_hat gamma_2=gamma_2_hat,
gamma_3=gamma_3_hat,
delta=delta_hat)
Let’s pass the relevant arguments to get_vcov
and check results for the variance-covariance matrix.
<- get_vcov(data=dat, thetas=thetas_hat, M=psi)
res $vcov res
## gamma_1 gamma_2 gamma_3 delta
## gamma_1 1.686258e-01 -5.116353e-17 -0.1686258 2.291608e-05
## gamma_2 -4.167777e-17 2.510135e-01 -0.1497095 2.509786e-01
## gamma_3 -1.686258e-01 -1.497095e-01 0.4228791 -1.496732e-01
## delta 2.291608e-05 2.509786e-01 -0.1496732 2.512757e-01
Let’s see how the results compare with standard errors obtained from the bootstrap.
<- function(d, i=1:nrow(d)) {
boot_fun <-d[i,]
z<- lm(Y~ -1 + X + A + A:X, data = z)
mod <- coef(mod)[1]
gamma_1_hat <- coef(mod)[2]
gamma_2_hat <- coef(mod)[3]
gamma_3_hat <- mean(gamma_2_hat*1 + gamma_3_hat*1*z$X)
delta_hat return( c(gamma_1_hat, gamma_2_hat, gamma_3_hat, delta_hat) )
}<- Sys.time()
boot_start_time <- boot::boot(dat, boot_fun, R=999)
res_boot <- Sys.time()
boot_end_time paste("Bootstrapping took", round(as.numeric(boot_end_time - boot_start_time), 2), "seconds.")
## [1] "Bootstrapping took 3.33 seconds."
## [,1] [,2] [,3] [,4]
## [1,] 0.166959339 -0.00255421 -0.1636470 -0.002452947
## [2,] -0.002554210 0.24003558 -0.1433816 0.240057058
## [3,] -0.163647006 -0.14338163 0.4128908 -0.143436074
## [4,] -0.002452947 0.24005706 -0.1434361 0.240449603
This is pretty close to the results in res$vcov
that we obtained 52.1 times faster with Mestim
.
Let’s generate synthetic observational data for dynamic clinical decisions. We note \(S_t\) the observed state at time \(t\), \(A_t\) the binary action taken at time \(t\), and \(Y\) the terminal outcome for the sequence where higher values indicate worse disease symptoms. For illustrative purpose, we consider only \(T=2\) decision points so that we have data \(Z=(S_1,A_1,S_2,A_2,Y)^T.\)
The data are generated via the following hidden Markov process, where we only get to observe \(S_t\), which is a noisy version of the underlying state \(X_t\): \[X_1 \sim \mathcal{N}(0,0.1)\] \[X_{t+1}|X_t, A_t \sim \mathbf{1}_{X_t>0} \mathcal{N}(1.1X_t - 0.5A_t, 0.05) + \mathbf{1}_{X_t<0}X_t\] \[S_t|X_t \sim \mathcal{N}(X_t,0.1)\] \[A_1|S_1 \sim \mathcal{B}\Big(\text{expit}(-0.1+\log S_t)\Big)\] \[A_2|S_2,A_1 \sim \mathcal{B}\Big(\text{expit}(0.1+\log S_t + 3A_1)\Big)\]
\[Y|X_3,A_2,A_1 \sim \text{exp}\Bigg(\mathcal{N}\Big(X_3 + \lambda(A_2+A_1),0.1\Big)\Bigg).\] We consider that receiving treatment actions has a side effect penalty of \(\lambda=0.1\).
<- function(n, seed=456)
gen_dtr_dat
{set.seed(seed)
<- function(x) 1/(1+exp(-x))
expit
<- rnorm(n, sd=.1)
X_1 <- exp(rnorm(n, mean = X_1, sd = .1))
S_1 <- rbinom(n, size = 1, prob = expit(-.1+log(S_1)))
A_1
<- (X_1>0) * rnorm(n, mean = 1.1*X_1 - .5 * A_1, sd=.05) + (X_1<0) * X_1
X_2 <- exp(rnorm(n, mean = X_2, sd = .1))
S_2 <- rbinom(n, size = 1, prob = expit(.1+log(S_2)+3*A_1))
A_2
<- (X_2>0) * rnorm(n, mean = 1.1*X_2 - .5 * A_2, sd=.05) + (X_2<0) * X_2
X_3 <- exp(rnorm(n, mean = X_3 + .1*(A_1 + A_2), sd = .1)) #0.1 penalty for treating
Y
<- data.frame(S_1=S_1, A_1=A_1, S_2=S_2, A_2=A_2, Y)
dat return(dat)
}
<- gen_dtr_dat(5000)
dat head(dat)
## S_1 A_1 S_2 A_2 Y
## 1 0.8402181 1 0.9292966 1 1.046362
## 2 1.0033476 0 1.0623962 0 1.024897
## 3 1.0889107 0 1.0687287 0 1.152630
## 4 0.8716224 1 0.8938716 1 1.069209
## 5 0.9743708 1 0.9324451 1 1.189594
## 6 1.0222173 1 0.9174528 1 1.195341
Given any treatment action regime \(d=\{d_1,\ldots, d_T\}^T\), an estimator for \(\mathbb{E}_{Z\sim d}(Y)\) is \[\begin{equation} \hat{\mathcal{V}}_{IPW}(d)=n^{-1}\sum_{i=1}^nY_i\prod_{t=1}^T\frac{\mathbf{1}\big({d_t(H_{t,i})=A_{t,i}}\big)}{\hat{e}_t(H_{t,i})^{d_t(H_{t,i})}\big\{1-\hat{e}_t(H_{t,i})\big\}^{1-d_t(H_{t,i})} } \quad \quad (1) \end{equation}\]
where we use the history notation \(H_t=(S_{t},A_{t-1},S_{t-1}, \ldots,S_{1})^T\) and write the relevant generalization of the propensity score as \(e_t(H_t)=\mathbb{E}(A_{t}|H_t)\) for clarity.
In this example we consider the regime \(\tilde{d}(Z)=\{\tilde{d}_1(H_1)=\mathbf{1}_{S_1>1},\tilde{d}_2(H_2)=\mathbf{1}_{S_2>1}\}^T\). The goal is to calculate standard errors for \(\hat{\mathcal{V}}_{IPW}(\tilde{d})\). As \(T=2\), we need specify models for \(e_1(H_1)\) and \(e_2(H_2)\). Let’s specify the parametric regression models \[e_1(H_1;\boldsymbol{\delta})=\text{expit}\big(\delta_1+\delta_2\log S_1)\] and \[e_2(H_2;\boldsymbol{\phi})=\text{expit}\big(\phi_1+\phi_2\log S_2 +\phi_3A_1).\] We fit and store the estimated parameters as follows.
<- glm(A_1~I(log(S_1)), data=dat, family = "binomial")
e_1 <- coef(e_1)[1]
delta_1_hat <- coef(e_1)[2]
delta_2_hat
<- glm(A_2~I(log(S_2)) + A_1 , data=dat, family = "binomial")
e_2 <- coef(e_2)[1]
phi_1_hat <- coef(e_2)[2]
phi_2_hat <- coef(e_2)[3] phi_3_hat
As in example 1, recall that for \(e_1\) the estimated parameters \(\hat{\boldsymbol{\delta}}=(\hat{\delta}_1, \hat{\delta}_2)^T\) jointly solve \[\sum_{i=1}^{n}\Big[1+\exp\big\{-{(\delta_1+\delta_2 \log S_{1,i})\big\}\Big]^{-1}}-A_{1,i} =0,\] \[\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-{(\delta_1+\delta_2 \log S_{1,i})\big\}\Big]^{-1}}-A_{1,i}\bigg) \log S_{1,i}=0.\] Similarly for \(e_2\) the estimated parameters \(\hat{\boldsymbol{\phi}}=(\hat{\phi}_1, \hat{\phi}_2, \hat{\phi}_3)^T\) jointly solve \[\sum_{i=1}^{n}\Big[1+\exp\big\{-{(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{-1}}-A_{2,i} =0,\] \[\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-{(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{-1}}-A_{2,i}\bigg) \log S_{2,i}=0,\] \[\sum_{i=1}^{n}\bigg(\Big[1+\exp\big\{-{(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{-1}}-A_{2,i}\bigg) A_{1,i}=0.\]
Disregarding the summation sign, we can straightforwardly identify the five first elements of the estimating function \(\psi(z,\theta)\). Let’s store them before building our final list for \(\psi\).
Note that for programming convenience, we recommend to store all relevant variable transformations as columns in the original dataframe.
$log_S_1 <- log(dat$S_1) ; dat$log_S_2 <- log(dat$S_2) # For ease of programming
dat
<- expression( ((1/(1+exp(-(delta_1 + delta_2 * log_S_1)))) - A_1) * 1 )
psi_1 <- expression( ((1/(1+exp(-(delta_1 + delta_2 * log_S_1)))) - A_1) * log_S_1)
psi_2
<- expression( ((1/(1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))) - A_2) * 1 )
psi_3 <- expression( ((1/(1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))) - A_2) * log_S_2)
psi_4 <- expression( ((1/(1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))) - A_2) * A_1) psi_5
To obtain the last element of \(\psi\), we need to do algebraic manipulations. Denoting \(\mathcal{C}_{\tilde{d}, i}=\prod_{t=1}^2\mathbf{1}\big({\tilde{d}_t(H_{t,i})=A_{t,i}}\big)\) for simplicity, after substitution for \(\hat{e}_1(H_1;\boldsymbol{\hat{\delta}})\) and \(\hat{e}_2(H_2;\boldsymbol{\hat{\phi}})\), equation \((1)\) yields \[\begin{equation} \hat{\mathcal{V}}_{IPW}(\tilde{d})=\\ n^{-1}\sum_{i=1}^nY_i\mathcal{C}_{\tilde{d}, i} \frac{\Big[1+\exp\big\{-(\delta_1+\delta_2 \log S_{1,i})\big\}\Big]^{\tilde{d}_1(S_{1,i})} \Big[1+\exp\big\{-(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{\tilde{d}_2(S_{2,i})}} {\bigg(1-\Big[1+\exp\big\{-(\delta_1+\delta_2 \log S_{1,i})\big\}\Big]^{-1}\bigg)^{1-\tilde{d}_1(S_{1,i})}\bigg(1-\Big[1+\exp\big\{-(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{-1}\bigg)^{1-\tilde{d}_2(S_{2,i})}}. \end{equation}\]
Rearrangements of the equation above yield \[\begin{equation} \sum_{i=1}^nY_i\mathcal{C}_{\tilde{d}, i} \frac{\Big[1+\exp\big\{-(\delta_1+\delta_2 \log S_{1,i})\big\}\Big]^{\tilde{d}_1(S_{1,i})} \Big[1+\exp\big\{-(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{\tilde{d}_2(S_{2,i})}} {\bigg(1-\Big[1+\exp\big\{-(\delta_1+\delta_2 \log S_{1,i})\big\}\Big]^{-1}\bigg)^{1-\tilde{d}_1(S_{1,i})}\bigg(1-\Big[1+\exp\big\{-(\phi_1+\phi_2 \log S_{2,i}+\phi_3 A_{1,i})\big\}\Big]^{-1}\bigg)^{1-\tilde{d}_2(S_{2,i})}} \\ -\hat{\mathcal{V}}_{IPW}(\tilde{d})=0. \end{equation}\] We can now straightforwardly identify and store the last elements of the estimating function \(\psi\).
# The regime we are interested in
$d_1 <- dat$S_1>1
dat$d_2 <- dat$S_2>1
dat
# For ease of programming
$C_d <- with(dat, d_1==A_1 & d_2==A_2)
dat
# Store the last element of psi
<- expression( Y * C_d *
psi_6 1+exp(-(delta_1 + delta_2 * log_S_1)))^d_1 * # numerator
( (1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))^d_2 )
(
/ ( (1-(1+exp(-(delta_1 + delta_2 * log_S_1)))^(-1))^(1-d_1) * # denominator
1-(1+exp(-(phi_1 + phi_2 * log_S_2 + phi_3 * A_1)))^(-1))^(1-d_2) )
(- V
)
Let’s now build a list detailing the full function \(\psi(z,\theta)\).
<- list(psi_1, psi_2, psi_3, psi_4, psi_5, psi_6) psi
Now, let’s compute \(\hat{\mathcal{V}}_{IPW}(\tilde{d})\) and stack it in a list of all the estimated parameters \(\hat{\theta}=\Big(\hat{\delta}_1,\hat{\delta}_2,\hat{\phi}_1,\hat{\phi}_2,\hat{\phi}_3, \hat{\mathcal{V}}_{IPW}(\tilde{d})\Big)^T\).
For simplicity in computing \(\hat{\mathcal{V}}_{IPW}(\tilde{d})\), we recommend to use a slightly modified version of psi_6
as follows.
# Just delete "- V" from in the previous expression
# add _hat as appropriate
<- expression( Y * C_d *
ipw_estimator 1+exp(-(delta_1_hat + delta_2_hat * log_S_1)))^d_1 * # numerator
( (1+exp(-(phi_1_hat + phi_2_hat * log_S_2 + phi_3_hat * A_1)))^d_2 )
(
/ ( (1-(1+exp(-(delta_1_hat + delta_2_hat * log_S_1)))^(-1))^(1-d_1) * # denominator
1-(1+exp(-(phi_1_hat + phi_2_hat * log_S_2 + phi_3_hat * A_1)))^(-1))^(1-d_2) )
(
)
# Compute individual contribution and take the average
<- with(dat, mean(eval(ipw_estimator))) # Other ways to compute this quantity are OK too.
V_hat
<- list(delta_1=delta_1_hat,
thetas_hat delta_2=delta_2_hat,
phi_1=phi_1_hat,
phi_2=phi_2_hat,
phi_3=phi_3_hat,
V=V_hat)
Let’s pass the relevant arguments to get_vcov
and check results for the standard errors.
<- get_vcov(data=dat, thetas=thetas_hat, M=psi)
res $se res
## delta_1 delta_2 phi_1 phi_2 phi_3 V
## 0.02836275 0.19963843 0.03921097 0.22778301 0.12032851 0.03641272
Let’s see how the results compare with standard errors obtained from the bootstrap.
<- function(d, i=1:nrow(d)) {
boot_fun <-d[i,]
z<- glm(A_1~I(log(S_1)), data=z, family = "binomial")
e_1 <- glm(A_2~I(log(S_2)) + A_1 , data=z, family = "binomial")
e_2
<- coef(e_1)[1]
delta_1_hat <- coef(e_1)[2]
delta_2_hat <- coef(e_2)[1]
phi_1_hat <- coef(e_2)[2]
phi_2_hat <- coef(e_2)[3]
phi_3_hat
<- expression( z$Y * z$C_d *
ipw_estimator 1+exp(-(delta_1_hat + delta_2_hat * z$log_S_1)))^z$d_1 * # numerator
( (1+exp(-(phi_1_hat + phi_2_hat * z$log_S_2 + phi_3_hat * z$A_1)))^z$d_2 )
(
/ ( (1-(1+exp(-(delta_1_hat + delta_2_hat * z$log_S_1)))^(-1))^(1-z$d_1) * # denominator
1-(1+exp(-(phi_1_hat + phi_2_hat * z$log_S_2 + phi_3_hat * z$A_1)))^(-1))^(1-z$d_2) )
(
)
<- mean(eval(ipw_estimator))
V_hat return( c(delta_1_hat, delta_2_hat, phi_1_hat, phi_2_hat, phi_3_hat, V_hat) )
}<- Sys.time()
boot_start_time <- boot::boot(dat, boot_fun, R=999)
res_boot <- Sys.time()
boot_end_time paste("Bootstrapping took", round(as.numeric(boot_end_time - boot_start_time), 2), "seconds.")
## [1] "Bootstrapping took 18.06 seconds."
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot::boot(data = dat, statistic = boot_fun, R = 999)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* -0.10641382 -0.0002500751 0.02843059
## t2* 0.65733524 0.0080253525 0.20518095
## t3* 0.07486354 0.0005029474 0.03972660
## t4* 1.22872312 -0.0066237516 0.23196144
## t5* 3.12746280 0.0087706975 0.12208167
## t6* 0.83983316 0.0021767814 0.03644645
This is pretty close to the results in res$se
that we obtained 206.7 times faster with Mestim
.
Boos, D. D. and Stefanski, L. (2013). Essential Statistical Inference. Springer, New York. https://doi.org/10.1007/978-1-4614-4818-1.
Tsiatis, A. A., Davidian, M., Holloway, S. T. & Laber, E. B. (2019), Dynamic Treatment Regimes: Statistical Methods for Precision Medicine, CRC Press. https://doi.org/10.1201/9780429192692.