Bootstrapping and Instrumental Variables

Drew Dimmery

March 8, 2014

Structure

  • Rubber-duck debugging
  • Bootstrapping
    • I'll give some examples of the more complicated bootstrapping methods
  • Instrumental Variables
    • Basic 2SLS
    • Sensitivity Analysis for IV

Debugging

  • Rubber duck debugging:
    • Use when you can't figure out why your code doesn't work right
    • Find something inanimate to talk to
    • Explain what your code does, line by excruciating line
    • If you can't explain it, that's probably where the problem is.
    • This works ridiculously well.
    • You should also be able to tell your duck exactly what is stored in each variable at all times.
  • Check individual pieces of your code on data small enough that you know what the right answer should be, as in the toy check below.
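  • A toy example of this kind of check (the numbers here are made up): the inverse-probability weight formula used later in these slides, computed on two units where the answer is obvious.
toy.treat <- c(1, 0)  # one treated unit, one control unit
toy.pscore <- c(0.8, 0.2)  # made-up propensity scores
# the treated unit should get weight 1; the control should get 1/(1 - 0.2) = 1.25
toy.treat + (1 - toy.treat)/(1 - toy.pscore)
## [1] 1.00 1.25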

A note on the Bootstrap

  • I'm going to talk about the non-parametric bootstrap (which Cyrus talked about)
  • The key to this is that we don't want to make an assumption about what our data looks like.
  • So what looks more like our data than anything else we know of?
  • Our data.
  • We sample repeatedly from the empirical distribution of our data.
  • This is why we resample with replacement. Each draw is an independent draw from the empirical distribution of the data we've collected.
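  • A minimal sketch of the mechanics (simulated data, not from the slides): bootstrap the standard error of a sample mean by resampling the observations with replacement.
set.seed(1)
x <- rnorm(100, mean = 2, sd = 3)
# each replicate is an independent draw of n observations from the empirical distribution
boot.means <- replicate(1000, mean(sample(x, replace = TRUE)))
sd(boot.means)  # bootstrap SE; should be close to sd(x)/sqrt(100)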

Advanced bootstrapping

  • There are more advanced uses of the bootstrap in the construction of estimators. One example is in so-called bias reduction. (digression ahead)
  • If \(b\) is the bias of an estimator \(\hat{\theta}\) of \(\theta\), we could obtain an unbiased estimator by subtracting \(b\) from \(\hat{\theta}\).
  • Bootstrapping lets us estimate \(b\) with \({1\over B} \sum_{i=1}^B [\hat{\theta}_i^b - \hat{\theta}]\), where \(\hat{\theta}_i^b\) is the estimate computed on the \(i\)-th resample from the empirical distribution (see the sketch after this list).
  • Not guaranteed to do better, but it's an interesting approach.
  • Talk to me for more info.
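  • A minimal sketch of the idea (simulated data; the plug-in variance estimator, which divides by \(n\), is a convenient example of a biased estimator):
set.seed(2)
x <- rnorm(30, sd = 2)
plugin.var <- function(z) mean((z - mean(z))^2)  # biased: expectation is sigma^2 (n - 1)/n
theta.hat <- plugin.var(x)
theta.b <- replicate(2000, plugin.var(sample(x, replace = TRUE)))
b.hat <- mean(theta.b - theta.hat)  # bootstrap estimate of the bias
c(plugin = theta.hat, corrected = theta.hat - b.hat, unbiased = var(x))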

Bootstrap Example

  • We're going to start with Lalonde again.
  • We will do the IPW procedure as before, but this time get the "right" SEs.
require(MatchIt)
data(lalonde, package = "MatchIt")
p.model <- glm(treat ~ age + educ + black + hispan + married + nodegree + re74 + 
    re75, lalonde, family = "binomial")
pscore <- predict(p.model, type = "response")
# weight of 1 for treated units, 1/(1 - pscore) for controls
ipw <- lalonde$treat + (1 - lalonde$treat)/(1 - pscore)
ipw.mod <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree + 
    re74 + re75, lalonde, weights = ipw)
N <- nrow(lalonde)
bootstrapLalonde <- function(out = "coef") {
    # resample rows with replacement and redo both stages, so that
    # first-stage (propensity score) uncertainty is propagated to the SE
    b.samp <- sample(N, replace = TRUE)
    b.p.model <- glm(treat ~ age + educ + black + hispan + married + nodegree + 
        re74 + re75, lalonde[b.samp, ], family = "binomial")
    b.pscore <- predict(b.p.model, type = "response")
    b.ipw <- lalonde$treat[b.samp] + (1 - lalonde$treat[b.samp])/(1 - b.pscore)
    b.ipw.mod <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree + 
        re74 + re75, lalonde[b.samp, ], weights = b.ipw)
    if (out == "coef") {
        out <- coef(b.ipw.mod)[2]
    } else if (out == "t") {
        out <- summary(b.ipw.mod)$coefficients[2, 3]
    } else {
        stop("Unknown output statistic.")
    }
    out
}
b.samps.coef <- replicate(500, bootstrapLalonde())
b.samps.t <- replicate(500, bootstrapLalonde(out = "t"))

Check bootstrap output

  • We should expect a larger SE, since the bootstrap now includes the additional first-stage variation.
c(coef(ipw.mod)[2], mean(b.samps.coef))
## treat       
##  1332  1389
c(summary(ipw.mod)$coefficients[2, 2], sd(b.samps.coef))
## [1] 697.9 767.1
c(summary(ipw.mod)$coefficients[2, 3], mean(b.samps.t))
## [1] 1.909 1.904
  • How do we reconcile this? Is it a problem?

Cluster Bootstrapping

  • When treatment was assigned with clustering, our bootstrap must account for that.
  • So we adjust the resampling procedure.
  • We now move away from Lalonde and look at a replication of (and extension to) Green and Vavreck (2008, Political Analysis), which examines the performance of cluster-robust SEs in an experimental context.
  • Treatment was a GOTV advertising campaign
  • Outcome was 2004 turnout.
  • 23,869 individuals aged 19 or under, in 85 clusters.

Read in G/V data

dat <- read.csv(file = "GreenVavreck_PolAnalysis_2008_PA_Replication.csv", header = TRUE, 
    sep = ",")
f1 <- paste("tout1 ~ treat + ", paste(names(dat)[11:49], collapse = " + "))
ols1 <- lm(f1, dat)
summary(ols1)$coefficients[2, ]
##   Estimate Std. Error    t value   Pr(>|t|) 
##  0.0241120  0.0070322  3.4288223  0.0006072

Try Cluster Robust SEs

  • \({H \over H-1} { N-1 \over N -K} (X'X)^{-1} \left(\sum_{h=1}^H X_h' \hat{e}_h \hat{e}_h' X_h\right)(X'X)^{-1}\)
  • The "estimating function" evaluates \((Y_i-X_i'\beta)X_i\) that is, \(\hat{e}_i X_i\). It is \(N \times p\).
robust.se <- function(model, cluster) {
    require(sandwich)
    require(lmtest)
    M <- length(unique(cluster))  # number of clusters
    N <- length(cluster)
    K <- model$rank
    # finite-sample degrees-of-freedom correction
    dfc <- (M/(M - 1)) * ((N - 1)/(N - K))
    # sum the estimating function within each cluster
    uj <- apply(estfun(model), 2, function(x) tapply(x, cluster, sum))
    rcse.cov <- dfc * sandwich(model, meat = crossprod(uj)/N)
    rcse.se <- coeftest(model, rcse.cov)
    return(list(rcse.cov, rcse.se))
}
robust.se(ols1, dat$syscode)[[2]][2, ]
## Note: no visible binding for global variable 'lmtest'
##   Estimate Std. Error    t value   Pr(>|t|) 
##    0.02411    0.01405    1.71577    0.08622
ses <- c(model = summary(ols1)$coefficients[2, 2], cl.robust = robust.se(ols1, 
    dat$syscode)[[2]][2, 2])

Bootstrap SEs

  • We should expect the cluster robust sandwich estimator to do pretty well here
  • We have an actual experiment, and thus pretty plausible "true" clustering.
H <- length(unique(dat$syscode))
clus <- unique(dat$syscode)
obs.in.clus <- sapply(clus, function(h) which(dat$syscode == h))
clus <- as.character(clus)
names(obs.in.clus) <- clus
pairsBoot.GV <- function(out = "coef") {
    # resample whole clusters with replacement (the "pairs" cluster bootstrap)
    b.samp <- sample(clus, replace = TRUE)
    b.obs <- unlist(sapply(b.samp, function(h) obs.in.clus[[h]]))
    b.ols <- lm(f1, dat[b.obs, ])
    if (out == "coef") {
        out <- coef(b.ols)[2]
    } else if (out == "t") {
        out <- summary(b.ols)$coefficients[2, 3]
    } else {
        stop("Unknown output type.")
    }
    out
}
b.samps.coef <- replicate(500, pairsBoot.GV())
# b.samps.t <- replicate(500, pairsBoot.GV(out = 't'))
ses <- c(ses, pairs.boot = sd(b.samps.coef))
round(ses, 3)
##      model  cl.robust pairs.boot 
##      0.007      0.014      0.024

Wild Cluster Bootstrap

  • I'm not showing the residual cluster bootstrap
  • Clusters are not of equal size in this data, which that procedure requires
  • (And it's probably rare that they will be)
ols.fit <- predict(ols1)
ols.res <- residuals(ols1)
dat$syscode <- as.character(dat$syscode)
f.b <- paste("newY ~ treat + ", paste(names(dat)[11:49], collapse = " + "))
wildBoot.GV <- function(out = "coef") {
    # flip the sign of the residuals for a random half of the clusters
    # (Rademacher weights), then rebuild the outcome from the fitted values
    toflip <- rbinom(H, 1, 0.5)
    names(toflip) <- clus
    toflip <- names(toflip)[toflip == 1]
    b.resid <- ifelse(dat$syscode %in% toflip, -ols.res, ols.res)
    newY <- ols.fit + b.resid
    b.ols <- lm(f.b, cbind(newY, dat))
    if (out == "coef") {
        out <- coef(b.ols)[2]
    } else if (out == "t") {
        out <- summary(b.ols)$coefficients[2, 3]
    } else {
        stop("Unknown output type.")
    }
    out
}
b.samps.coef <- replicate(500, wildBoot.GV())
ses <- c(ses, wild.boot = sd(b.samps.coef))
round(ses, 3)
##      model  cl.robust pairs.boot  wild.boot 
##      0.007      0.014      0.024      0.015

Instrumental Variables

  • \(\rho = {\text{Cov}(Y_i,Z_i) \over \text{Cov}(S_i,Z_i)} = { { \text{Cov}(Y_i,Z_i) \over \text{Var}(Z_i)} \over {\text{Cov}(S_i,Z_i) \over \text{Var}(Z_i)}} = {\text{Reduced form} \over \text{First stage}}\)
  • If we have a perfect instrument, this estimator is consistent (a quick simulated check of the ratio form appears after this list).
  • But bias is a function of both the degree of violation of the exclusion restriction and the strength of the first stage.
  • 2SLS has finite-sample bias. (Cyrus showed this, but didn't dwell on it)
  • In particular, it can be shown that this bias is approximately:
    \({\sigma_{\eta \xi} \over \sigma_{\xi}^2}{1 \over F + 1}\)
    where \(\eta\) is the error in the structural model, \(\xi\) is the error in the first stage, and \(F\) is the first-stage F statistic.
  • With an irrelevant instrument (\(F=0\)), the bias is equal to that of OLS (regression of \(Y\) on \(X\)).
  • There are some bias corrections for this; we might talk about them next week.
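  • A minimal sketch of the ratio form with simulated data (none of this is from the AJR example below; the true effect is set to 2): the reduced-form and first-stage coefficients reproduce the 2SLS estimate, while OLS is biased by the confounder.
require(AER)
set.seed(3)
n <- 1000
z <- rbinom(n, 1, 0.5)  # instrument
u <- rnorm(n)  # unobserved confounder
s <- 0.5 * z + u + rnorm(n)  # first stage
y <- 2 * s + u + rnorm(n)  # structural equation; true effect is 2
c(ols = unname(coef(lm(y ~ s))[2]), ratio = unname(coef(lm(y ~ z))[2]/coef(lm(s ~ z))[2]), 
    tsls = unname(coef(ivreg(y ~ s, ~z))[2]))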

Setup IV example

  • For our example with IV, we will look at AJR (2001) - Colonial Origins of Comparative Development
  • Treatment is average protection from expropriation
  • Exogenous covariates are dummies for British/French colonial presence
  • Instrument is settler mortality
  • Outcome is log(GDP) in 1995
require(foreign)
dat <- read.dta("maketable5.dta")
dat <- subset(dat, baseco == 1)  # restrict to the base sample of 64 countries

Estimate IV via 2SLS

require(AER)
first <- lm(avexpr ~ logem4 + f_brit + f_french, dat)
iv2sls <- ivreg(logpgp95 ~ avexpr + f_brit + f_french, ~logem4 + f_brit + f_french, 
    dat)
require(car)
linearHypothesis(first, "logem4", test = "F")
## Linear hypothesis test
## 
## Hypothesis:
## logem4 = 0
## 
## Model 1: restricted model
## Model 2: avexpr ~ logem4 + f_brit + f_french
## 
##   Res.Df RSS Df Sum of Sq    F  Pr(>F)    
## 1     61 117                              
## 2     60  94  1        23 14.7 0.00031 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Examine Output

summary(iv2sls)
## 
## Call:
## ivreg(formula = logpgp95 ~ avexpr + f_brit + f_french | logem4 + 
##     f_brit + f_french, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2716 -0.7488  0.0728  0.7544  2.4004 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.372      1.388    0.99    0.327    
## avexpr         1.078      0.218    4.95  6.3e-06 ***
## f_brit        -0.778      0.354   -2.20    0.032 *  
## f_french      -0.117      0.355   -0.33    0.743    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.04 on 60 degrees of freedom
## Multiple R-Squared: 0.0483,  Adjusted R-squared: 0.000748 
## Wald test: 10.1 on 3 and 60 DF,  p-value: 1.82e-05

Sensitivity Analysis

  • Conley, Hansen and Rossi (2012)
  • Suppose that the exclusion restriction does NOT hold, and there exists a direct effect from the instrument to the outcome.
  • That is, the structural model is:
    \(Y = X\beta + Z\gamma + \epsilon\)
  • If \(\gamma\) is zero, the exclusion restriction holds (we're in a structural framework)
  • We can assume a particular value of \(\gamma\), take \(\tilde{Y} = Y - Z\gamma\) and estimate our model, gaining an estimate of \(\beta\).
  • This defines a sensitivity analysis on the exclusion restriction.
  • Subject to an assumption about the support of \(\gamma\), they suggest estimating over a grid of values in this domain and then taking the union of the confidence intervals across the grid as the combined confidence interval (which will cover at the nominal rate). A sketch of this union step follows the grid of point estimates below.
gamma <- seq(-1, 1, 0.25)
ExclSens <- function(g) {
    # subtract the assumed direct effect of the instrument from the outcome
    newY <- dat$logpgp95 - g * dat$logem4
    coef(ivreg(newY ~ avexpr + f_brit + f_french, ~logem4 + f_brit + f_french, 
        cbind(dat, newY)))[2]
}
sens.coefs <- sapply(gamma, ExclSens)
names(sens.coefs) <- gamma
round(sens.coefs, 3)
##     -1  -0.75   -0.5  -0.25      0   0.25    0.5   0.75      1 
## -0.793 -0.326  0.142  0.610  1.078  1.546  2.013  2.481  2.949
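  • The grid above only reports point estimates; a minimal sketch of the union-of-confidence-intervals step might look like the following (using confint on each ivreg fit; this is an illustration of the idea, not Conley, Hansen and Rossi's exact procedure).
ExclSensCI <- function(g) {
    newY <- dat$logpgp95 - g * dat$logem4
    fit <- ivreg(newY ~ avexpr + f_brit + f_french, ~logem4 + f_brit + f_french, 
        cbind(dat, newY))
    confint(fit)["avexpr", ]  # 95% CI for the effect of avexpr at this gamma
}
sens.cis <- sapply(gamma, ExclSensCI)
colnames(sens.cis) <- gamma
# combined interval: smallest lower bound and largest upper bound across the grid
c(lower = min(sens.cis[1, ]), upper = max(sens.cis[2, ]))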