Sensitivity and Effective Samples

Drew Dimmery

February 14, 2014

Plan for today

  • First homework
    • Some issues with identification / estimation
  • Second homework
    • Some basic asymptotics
  • Effective Samples in regression
    • Aronow & Samii (2013)
  • Confounding & Sensitivity Analysis
    • Imbens (2003)
    • And generalizations (which are easy)

Identification / Estimation

  • The "Four questions" may have been a little confusing.
  • Let me break the last two questions down again, as "Identification" and "Estimation."
  • "A regression is causal when the CEF it approximates is causal" - MHE
  • Identification consists (in this context) of the set of things you must believe in order to believe that the CEF is causal.
  • Thus, when we think about identification, we should think about assumptions.
  • Estimation is the process you use to estimate the CEF.

In practice

  • The Conditional Independence Assumption would fall in identification.
  • I'd also throw in assumptions of additivity / linearity in this pot.
  • Estimation would include the populations/samples of interest, and all statistical inference.
  • But there's a close connection between the two.
  • Problems in estimation can lead to changes in identifying assumptions.
  • So they aren't completely separable.

Basic Identities and Definitions

  • Convergence in probability:
    \(X_n\,{\buildrel p \over \to}\,X \text{ if } \forall \epsilon > 0 \quad \Pr(|X_n - X| > \epsilon) \to 0\text{ as }n \to \infty\)
  • A very basic decomposition of variance:
    \(\text{var}(X)=E[X^2]-E[X]^2\)
  • Something we do NOT know in general: \(E[XY]=E[X]\times E[Y]\)
  • This holds when \(X\) and \(Y\) are uncorrelated (e.g. independent), but not otherwise. Pretty much everybody assumed it to be true without justification.
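  • A two-line check in R makes the point (a hypothetical example, not from the homework):

set.seed(1)
x <- rnorm(1e+05)
y <- x + rnorm(1e+05)  # y is correlated with x
mean(x * y)  # about 1, since E[XY] = cov(X, Y) + E[X]E[Y]
mean(x) * mean(y)  # about 0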

Problem Set 2

  • We want to find the probability limit of \({D_d'D_d \over N}\)
  • Pretty much everyone correctly observed that this was:
    \[\begin{pmatrix} \frac{1}{N}\sum_i (X_i - \bar{X})^2 & \frac{1}{N}\sum_i (X_i - \bar{X})(Z_i - \bar{Z}) \\ \frac{1}{N}\sum_i (X_i - \bar{X})(Z_i - \bar{Z}) & \frac{1}{N}\sum_i (Z_i - \bar{Z})^2 \end{pmatrix}\]
  • We can't simply say that this is equal to the following, though:
    \[\begin{pmatrix} \frac{1}{N}\sum_i (X_i - p)^2 & \frac{1}{N}\sum_i (X_i - p)(Z_i - q) \\ \frac{1}{N}\sum_i (X_i - p)(Z_i - q) & \frac{1}{N}\sum_i (Z_i - q)^2 \end{pmatrix}\]
  • You should recognize immediately that these are not equal: \(\bar{X}\) and \(\bar{Z}\) are random variables, not the constants \(p\) and \(q\). What we can show is that each entry converges in probability to the corresponding limit.

Continued

  • The sample variance is BIASED when divided by \(N\) rather than \(N-1\).
  • It is still consistent, though, which is what is relevant to us.
  • The simple proof is as follows:
    \({\sum_i (X_i - \bar{X})^2 \over N} = \frac{1}{N} \left( \sum_i X_i^2 - 2 \sum_i \bar{X}X_i + \sum_i \bar{X}^2\right)\)
  • We can pull out constants from the sums:
    \(\frac{1}{N}\left(\sum_i X_i^2 - 2\bar{X}\sum_i X_i + \bar{X}^2 \sum_i 1 \right)\)
  • And simplify some things:
    \(\frac{\sum_i X_i^2}{N} - 2\bar{X}^2 + \bar{X}^2 = \frac{\sum_i X_i^2}{N} - \bar{X}^2\)
  • Now take expectations of each term to see the finite-sample bias exactly (the LLN, under which averages converge in probability to their expectations, will then deliver the probability limit):
    \(E\left[\frac{\sum_i X_i^2}{N} - \bar{X}^2\right] = E[X^2]-E[\bar{X}^2]\)
  • Use this identity: \(E[Z^2] = E[Z]^2+\text{var}(Z)\)
  • Which gives us: \(E[X]^2+\text{var}(X)\) and \(E[\bar{X}]^2+\text{var}(\bar{X})\)
  • Substituting in these identities:
    \(E[X]^2+\text{var}(X)-E[\bar{X}]^2-\text{var}(\bar{X})\)
  • Which gives us: \(p^2+p(1-p)-p^2-\frac{p(1-p)}{N} = \frac{N-1}{N}p(1-p)\)
  • Which converges in probability to \(p(1-p)\) (i.e. it is consistent)
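  • A quick simulation confirms both the bias and the consistency (a sketch with made-up data):

# Sample variance with a 1/N divisor: biased for small N, consistent as N grows
set.seed(2)
p <- 0.3
varN <- function(n) {
    x <- rbinom(n, 1, p)
    mean((x - mean(x))^2)
}
mean(replicate(10000, varN(5)))  # about (4/5) * p * (1 - p) = 0.168
mean(replicate(10000, varN(5000)))  # about p * (1 - p) = 0.21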

Finishing it up:

  • After that there weren't many issues
  • The off-diagonal terms converge in probability to zero because \(X\) and \(Z\) are independent.
  • So the limit is a diagonal matrix, which is easy to invert (just invert each diagonal element)
  • We then use a similar method to find \(D_d'Y\), and multiply them together, which completes the first part.
  • For the next part, observe that we know \(\text{var}(X_i)=p(1-p)\), so we only need to think about the expectation of the numerator.
  • We end up with the following:
    \(\text{cov}(X,Y)=E[XY]-E[X]E[Y] = E[Y|X=1]\Pr(X=1)-pE[Y] = pE[Y|X=1]-pE[Y]\)
  • We then expand \(E[Y]\) using the law of total expectation:
    \(pE[Y|X=1] - p\left(pE[Y|X=1] + (1-p)E[Y|X=0]\right)\)
  • Distribute and collect terms:
    \(p(1-p)\left( E[Y|X=1] - E[Y|X=0]\right)\)
  • By random assignment of X, we have:
    \(p(1-p)\left( E[Y_1 - Y_0] \right)\)
  • So dividing by \(p(1-p)\) will leave us with \(\rho\)
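  • The whole derivation is easy to verify by simulation (hypothetical potential outcomes, with true \(\rho = 2\)):

# With randomly assigned binary X, cov(X, Y)/var(X) recovers E[Y1 - Y0]
set.seed(3)
n <- 1e+05
p <- 0.4
y0 <- rnorm(n)
y1 <- y0 + 2  # constant treatment effect of 2
x <- rbinom(n, 1, p)
y <- x * y1 + (1 - x) * y0
cov(x, y)/var(x)  # about 2
coef(lm(y ~ x))["x"]  # the same quantity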

Effective Samples

  • We're going to be investigating how to check the properties of your effective sample in regression.
  • The key result is:
    \(\hat{\rho}_{reg}\,{\buildrel p \over \to}\,\frac{E[w_i \rho_i]}{E[w_i]} \text{ where } w_i = (D_i - E[D_i|X_i])^2\)
  • We estimate these weights with:
    \(\hat{w}_i = \hat{D}_i^2\), where \(\hat{D}_i\) is the \(i\)th residual from a regression of \(D\) on \(X\).
  • Because these estimates are noisy for any single unit, using them to reweight individual observations is a bad idea.
  • Instead, we just use them to get a sense for what the effective sample is by examining the weight allocated to particular strata.
  • We will now explore how to do this.
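  • Before the real data, a toy simulation (hypothetical: two strata with different treatment effects and different propensities) shows what the weighting does:

# Regression with a stratum dummy recovers the variance-of-treatment-weighted
# average of stratum effects (the Aronow & Samii limit), not the ATE
set.seed(4)
n <- 1e+05
x <- rbinom(n, 1, 0.5)  # stratum indicator
e <- ifelse(x == 1, 0.5, 0.05)  # treatment far more variable in stratum 1
treat <- rbinom(n, 1, e)
te <- ifelse(x == 1, 1, 3)  # true effects: 1 in stratum 1, 3 in stratum 0
y <- treat * te + rnorm(n)
coef(lm(y ~ treat + x))["treat"]  # about 1.32, far from the ATE of 2
weighted.mean(te, e * (1 - e))  # the w_i-weighted limit, also about 1.32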

Example paper

  • We will be looking at Egan and Mullin (2012)
  • This paper explores the effect of local weather variations on belief in global warming.
  • Very cool paper! The treatment (local weather variation) is plausibly as-if random.
  • But what is the effective sample?
  • In other words, where is weather (conditional on covariates) most variable?
  • That's what we'll explore.

Load in data

require(foreign)
d <- read.dta("gwdataset.dta")
## Warning: value labels ('q2') for 'jan07_q2' are missing
zips <- read.dta("zipcodetostate.dta")
zips <- unique(zips[, c("statenum", "statefromzipfile")])
pops <- read.csv("population_ests_2013.csv")
pops$state <- tolower(pops$NAME)
d$getwarmord <- as.double(d$getwarmord)
# And estimate primary model of interest:
out <- lm(getwarmord ~ ddt_week + educ_hsless + educ_coll + educ_postgrad + 
    educ_dk + party_rep + party_leanrep + party_leandem + party_dem + male + 
    raceeth_black + raceeth_hisp + raceeth_notwbh + raceeth_dkref + age_1824 + 
    age_2534 + age_3544 + age_5564 + age_65plus + age_dk + ideo_vcons + ideo_conservative + 
    ideo_liberal + ideo_vlib + ideo_dk + attend_1 + attend_2 + attend_3 + attend_5 + 
    attend_6 + attend_9 + as.factor(doi) + as.factor(statenum) + as.factor(wbnid_num), 
    d)

Base Model

  • We won't worry about standard errors yet.
summary(out)$coefficients[1:10, ]
##                Estimate Std. Error t value  Pr(>|t|)
## (Intercept)    1.945740   0.771479  2.5221 1.169e-02
## ddt_week       0.004858   0.002476  1.9621 4.980e-02
## educ_hsless    0.057740   0.024484  2.3583 1.839e-02
## educ_coll      0.021457   0.027095  0.7919 4.284e-01
## educ_postgrad  0.049387   0.030734  1.6069 1.081e-01
## educ_dk        0.102936   0.250495  0.4109 6.811e-01
## party_rep     -0.243900   0.036627 -6.6591 2.992e-11
## party_leanrep -0.092809   0.041812 -2.2197 2.648e-02
## party_leandem  0.147409   0.039024  3.7774 1.600e-04
## party_dem      0.175427   0.035267  4.9743 6.725e-07

Estimate D^2

  • We can simply square the residuals from a partial regression of \(D\) on the covariates to get \(\hat{D}_i^2\):
outD <- lm(ddt_week ~ educ_hsless + educ_coll + educ_postgrad + educ_dk + party_rep + 
    party_leanrep + party_leandem + party_dem + male + raceeth_black + raceeth_hisp + 
    raceeth_notwbh + raceeth_dkref + age_1824 + age_2534 + age_3544 + age_5564 + 
    age_65plus + age_dk + ideo_vcons + ideo_conservative + ideo_liberal + ideo_vlib + 
    ideo_dk + attend_1 + attend_2 + attend_3 + attend_5 + attend_6 + attend_9 + 
    as.factor(doi) + as.factor(statenum) + as.factor(wbnid_num), d)
D2 <- residuals(outD)^2

Effective Sample Statistics

  • We can use these estimated weights for examining the sample.
compare_samples <- d[, c("wave", "ddt_week", "ddt_twoweeks", "ddt_threeweeks", 
    "party_rep", "attend_1", "ideo_conservative", "age_1824", "educ_hsless")]
compare_samples <- apply(compare_samples, 2, function(x) c(mean(x), sd(x), weighted.mean(x, 
    D2), sqrt(weighted.mean((x - weighted.mean(x, D2))^2, D2))))
compare_samples <- t(compare_samples)
colnames(compare_samples) <- c("Nominal Mean", "Nominal SD", "Effective Mean", 
    "Effective SD")
compare_samples
##                   Nominal Mean Nominal SD Effective Mean Effective SD
## wave                   3.09694     1.4253        3.20788       1.5609
## ddt_week               3.83549     5.9047        5.11579      10.8980
## ddt_twoweeks           3.85506     5.4572        5.00137       9.2263
## ddt_threeweeks         3.96720     4.7690        5.10859       8.4348
## party_rep              0.29527     0.4562        0.28978       0.4537
## attend_1               0.11433     0.3182        0.12343       0.3289
## ideo_conservative      0.31133     0.4631        0.29325       0.4553
## age_1824               0.07196     0.2584        0.06881       0.2531
## educ_hsless            0.34151     0.4743        0.31220       0.4634

Effective Sample Maps

  • But one of the most interesting things is to see this visually.
  • Which parts of the US does the effective sample emphasize?
  • To get at this, we'll use some tools in R that make this incredibly easy.
  • In particular, we'll do this in ggplot2.
# Effective sample by state
wt.by.state <- tapply(D2, d$statenum, sum)
wt.by.state <- wt.by.state/sum(wt.by.state) * 100
wt.by.state <- cbind(D2 = wt.by.state, statenum = names(wt.by.state))
data_for_map <- merge(wt.by.state, zips, by = "statenum")
# Nominal sample by state (every observation gets equal weight)
wt.by.state <- tapply(rep(1, nrow(d)), d$statenum, sum)
wt.by.state <- wt.by.state/sum(wt.by.state) * 100
wt.by.state <- cbind(Nom = wt.by.state, statenum = names(wt.by.state))
data_for_map <- merge(data_for_map, wt.by.state, by = "statenum")
# Get correct state names
require(maps, quietly = TRUE)
data(state.fips)
data_for_map <- merge(state.fips, data_for_map, by.x = "abb", by.y = "statefromzipfile")
# cbind() coerced the weights to character; convert them back to numeric
data_for_map$D2 <- as.double(as.character(data_for_map$D2))
data_for_map$Nom <- as.double(as.character(data_for_map$Nom))
data_for_map$state <- sapply(as.character(data_for_map$polyname), function(x) strsplit(x, 
    ":")[[1]][1])
data_for_map$Diff <- data_for_map$D2 - data_for_map$Nom
data_for_map <- merge(data_for_map, pops, by = "state")
data_for_map$PopPct <- data_for_map$POPESTIMATE2013/sum(data_for_map$POPESTIMATE2013) * 
    100
data_for_map$PopDiffEff <- data_for_map$D2 - data_for_map$PopPct
data_for_map$PopDiffNom <- data_for_map$Nom - data_for_map$PopPct
data_for_map$PopDiff <- data_for_map$PopDiffEff - data_for_map$PopDiffNom
require(ggplot2, quietly = TRUE)
state_map <- map_data("state")

More setup

plotEff <- ggplot(data_for_map, aes(map_id = state))
plotEff <- plotEff + geom_map(aes(fill = D2), map = state_map)
plotEff <- plotEff + expand_limits(x = state_map$long, y = state_map$lat)
plotEff <- plotEff + scale_fill_continuous("% Weight", limits = c(0, 16), low = "white", 
    high = "black")
plotEff <- plotEff + labs(title = "Effective Sample")
# Shared theme for all of the maps: blank axes, grids, and backgrounds
map_theme <- theme(legend.position = c(0.2, 0.1), legend.direction = "horizontal", 
    axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), 
    axis.title = element_blank(), panel.background = element_blank(), plot.background = element_blank(), 
    panel.border = element_blank(), panel.grid = element_blank())
plotEff <- plotEff + map_theme

plotNom <- ggplot(data_for_map, aes(map_id = state))
plotNom <- plotNom + geom_map(aes(fill = Nom), map = state_map)
plotNom <- plotNom + expand_limits(x = state_map$long, y = state_map$lat)
plotNom <- plotNom + scale_fill_continuous("% Weight", limits = c(0, 16), low = "white", 
    high = "black")
plotNom <- plotNom + labs(title = "Nominal Sample")
plotNom <- plotNom + map_theme

And the maps

require(gridExtra, quietly = TRUE)
grid.arrange(plotNom, plotEff, ncol = 2)

Setup Comparison Plot

plotDiff <- ggplot(data_for_map, aes(map_id = state))
plotDiff <- plotDiff + geom_map(aes(fill = Diff), map = state_map)
plotDiff <- plotDiff + expand_limits(x = state_map$long, y = state_map$lat)
plotDiff <- plotDiff + scale_fill_gradient2("% Weight", low = "red", mid = "white", 
    high = "black")
plotDiff <- plotDiff + labs(title = "Effective Weight Minus Nominal Weight")
plotDiff <- plotDiff + map_theme

Difference in Weights

plotDiff

Population Comparison

plotEff <- ggplot(data_for_map, aes(map_id = state))
plotEff <- plotEff + geom_map(aes(fill = PopDiffEff), map = state_map)
plotEff <- plotEff + expand_limits(x = state_map$long, y = state_map$lat)
plotEff <- plotEff + scale_fill_gradient2("% Weight", limits = c(-2, 6), low = "red", 
    mid = "white", high = "black")
plotEff <- plotEff + labs(title = "Effective Sample")
plotEff <- plotEff + map_theme

plotNom <- ggplot(data_for_map, aes(map_id = state))
plotNom <- plotNom + geom_map(aes(fill = PopDiffNom), map = state_map)
plotNom <- plotNom + expand_limits(x = state_map$long, y = state_map$lat)
plotNom <- plotNom + scale_fill_gradient2("% Weight", limits = c(-2, 6), low = "red", 
    mid = "white", high = "black")
plotNom <- plotNom + labs(title = "Nominal Sample")
plotNom <- plotNom + map_theme

Population Comparison Plots

grid.arrange(plotNom, plotEff, ncol = 2)

Setup New Comparison Plot

plotDiff <- ggplot(data_for_map, aes(map_id = state))
plotDiff <- plotDiff + geom_map(aes(fill = PopDiff), map = state_map)
plotDiff <- plotDiff + expand_limits(x = state_map$long, y = state_map$lat)
plotDiff <- plotDiff + scale_fill_gradient2("% Weight", low = "red", mid = "white", 
    high = "black")
plotDiff <- plotDiff + labs(title = "Effective Weight Minus Nominal Weight")
plotDiff <- plotDiff + map_theme

Plot Difference

plotDiff

Sensitivity Analysis

  • The homework mentions a couple of places to find pre-existing code.
  • I'm going to walk you through a generalized version of the Imbens (2003) method.
  • It may be easier to use one of the canned routines for your homework, though.
  • We're going to keep working with Pat's data, since we already have it handy.
  • Imbens process:
    • Simulate (or imagine simulating) an unobserved confounder like the following:
      \(Y_d | X, U \sim \mathcal{N}(\tau d + \boldsymbol\beta'X + \delta U, \sigma^2)\)
      \(D | X, U \sim f(\boldsymbol\gamma'X + \alpha U)\) (with \(f\) known)
    • That is, \(Y_1, Y_0 \perp D | X, U\)
    • So we want to generate a confounder that enters both the treatment and the outcome additively and linearly.

Generate a confounder

  • For our analysis, \(Y\) is belief in global warming and \(D\) is local variation in temperature.
  • We want to standardize these variables first.
d$getwarmord <- scale(d$getwarmord)
d$ddt_week <- scale(d$ddt_week)
genConfound <- function(alpha, delta) {
    e <- rnorm(nrow(d), 0, 1)
    U <- alpha * d$ddt_week + delta * d$getwarmord + e
    return(U)
}
  • So we can vary partial correlations with \(D\) and \(Y\) by varying alpha and delta.
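  • Housekeeping: the model formulas below paste in a covariate string X, which the original slides define off-screen; it can be reconstructed from the base model above:

# Covariate specification as a single string, reconstructed from the base model
X <- paste(c("educ_hsless", "educ_coll", "educ_postgrad", "educ_dk", "party_rep", 
    "party_leanrep", "party_leandem", "party_dem", "male", "raceeth_black", "raceeth_hisp", 
    "raceeth_notwbh", "raceeth_dkref", "age_1824", "age_2534", "age_3544", "age_5564", 
    "age_65plus", "age_dk", "ideo_vcons", "ideo_conservative", "ideo_liberal", 
    "ideo_vlib", "ideo_dk", "attend_1", "attend_2", "attend_3", "attend_5", "attend_6", 
    "attend_9", "as.factor(doi)", "as.factor(statenum)", "as.factor(wbnid_num)"), 
    collapse = "+")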
U1 <- genConfound(0, 2)
U2 <- genConfound(10, 10)
c(D = cor(U1, d$ddt_week), Y = cor(U1, d$getwarmord))
##       D       Y 
## 0.04399 0.89670
c(D = cor(U2, d$ddt_week), Y = cor(U2, d$getwarmord))
##    D    Y 
## 0.72 0.72
c(D = coef(lm(paste0("ddt_week~U1+", X), d))["U1"], Y = coef(lm(paste0("getwarmord~U1+", 
    X), d))["U1"])
##     D.U1     Y.U1 
## 0.008375 0.390998
c(D = coef(lm(paste0("ddt_week~U2+", X), d))["U2"], Y = coef(lm(paste0("getwarmord~U2+", 
    X), d))["U2"])
##    D.U2    Y.U2 
## 0.03238 0.06690

Continued

  • More importantly, we can see how this changes our estimate of the treatment effect:
out <- lm(paste0("getwarmord~ddt_week+", X), d)
coef(out)["ddt_week"]
## ddt_week 
##  0.03618
coef(lm(paste0("getwarmord~ddt_week+U1+", X), d))["ddt_week"]
##  ddt_week 
## 0.0008224
coef(lm(paste0("getwarmord~ddt_week+U2+", X), d))["ddt_week"]
## ddt_week 
##  -0.9914
  • Now we want to do this over a larger number of values of alpha and delta
alphas <- rnorm(250, 0, 0.5)
deltas <- rnorm(250, 0, 0.5)
results <- NULL
for (i in seq_along(alphas)) {
    # Draw a confounder, record its correlations with D and Y, and re-estimate
    U <- genConfound(alphas[i], deltas[i])
    corD <- cor(U, d$ddt_week)
    corY <- cor(U, d$getwarmord)
    estTE <- coef(lm(paste0("getwarmord~ddt_week+U+", X), d))["ddt_week"]
    names(estTE) <- NULL
    res <- c(estTE = estTE, corD = corD, corY = corY)
    results <- rbind(results, res)
}
results <- cbind(results, TEchange = (results[, "estTE"] - coef(out)["ddt_week"]))

Plot Simulation Code

color <- ifelse(results[, "estTE"] <= 0.5 * coef(out)["ddt_week"], "red", NA)
color <- ifelse(is.na(color) & results[, "estTE"] >= 1.5 * coef(out)["ddt_week"], 
    "blue", color)
color <- ifelse(is.na(color), "green", color)
plot(results[, "corD"], results[, "corY"], col = color, xlab = "correlation with D", 
    ylab = "correlation with Y", xlim = c(-1, 1), ylim = c(-1, 1))
vars <- strsplit(X, "[+]", perl = TRUE)[[1]]
vars <- vars[grep("factor", vars, invert = TRUE)]
for (v in vars) {
    # Mark each observed (non-factor) covariate at its correlations with D and Y
    corD <- cor(d[[v]], d$ddt_week)
    corY <- cor(d[[v]], d$getwarmord)
    points(corD, corY, pch = "+", col = "black")
}
abline(v = 0, col = "grey", lty = 3)
abline(h = 0, col = "grey", lty = 3)

Plot of the Results

[Figure: simulated confounders plotted at their correlations with \(D\) and \(Y\). Red points cut the estimated effect at least in half, blue points inflate it by at least half, green points do neither; observed covariates are marked with +.]

Blackwell (2013)

  • Instead, imagine a function which defines the confounding.
  • \(q(d,x) = E[Y_i(d)|D_i = d, X_i = x] - E[Y_i(d)|D_i = 1 - d, X_i = x]\)
  • Treated counterfactuals always higher (lower): \(q(d,x;\alpha) = \alpha\)
  • Treated group potential outcomes always higher (lower): \(q(d,x;\alpha) = \alpha(2 d - 1)\)
  • Package on CRAN: causalsens
  • You should probably use this for the homework.
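  • The adjustment behind this is simple: subtract the hypothesized confounding from each observed outcome and re-fit the outcome model. A minimal sketch of the one-sided case, assuming propensity scores e from a treatment model (this mimics the method in Blackwell (2013), not causalsens's actual internals):

# Bias-adjusted outcome: y - q(d, x; alpha) * Pr(opposite treatment arm | X)
# For the one-sided confounding function, q(d, x; alpha) = alpha
adjustOneSided <- function(y, d, e, alpha) {
    y - alpha * ifelse(d == 1, 1 - e, e)
}
# Re-estimating the treatment effect on the adjusted outcome for each alpha
# traces out the sensitivity curve that causalsens plots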

Example

  • Remove the fixed effects, so that the estimate retains some sensitivity to confounding:
require(causalsens)
# Dichotomize the (standardized) treatment: above vs. below average deviation
d$ddt_week <- ifelse(d$ddt_week > 0, 1, 0)
out <- lm(paste0("getwarmord~ddt_week+", paste(vars, collapse = "+")), data = d)
coef(out)["ddt_week"]
## ddt_week 
##  0.04557
outD <- glm(paste0("ddt_week~", paste(vars, collapse = "+")), data = d, family = binomial())
alpha <- seq(-0.1, 0.1, by = 0.001)
SensAnalysis <- causalsens(out, outD, as.formula(paste0("~", paste(vars, collapse = "+"))), 
    data = d, alpha = alpha, confound = one.sided)

Sensitivity Plots

par(mfrow = c(1, 2))
plot(SensAnalysis, type = "raw", bty = "n")
plot(SensAnalysis, type = "r.squared", bty = "n")