Bias

Drew Dimmery

February 21, 2014

Structure of Class

Today we're going to talk about:
1. *apply functions
2. Marginal Effects
3. Sensitivity analysis for post-treament bias

The *apply family

These functions allow one to efficiently perform a large number of actions on data.
apply - performs actions on the rows or columns of a matrix/array (1 for rows, 2 for columns, 3 for ??)
sapply - performs actions on every element of a vector
tapply - performs actions on a vector by group
replicate - performs the same action a given number of times

apply

##   c d
## a 1 3
## b 2 4

apply(A, 1, sum)

## a b 
## 4 6

apply(A, 2, mean)

##   c   d 
## 1.5 3.5

sapply

vec

## [1] "a" "b" "c"

sapply(vec, function(x) paste0(x, ".vec"))

##       a       b       c 
## "a.vec" "b.vec" "c.vec"

Can be accomplished more simply with:

paste0(vec, ".vec")

## [1] "a.vec" "b.vec" "c.vec"

Why?
replicate is basically just sapply(1:N,funct) where funct never uses the index.

tapply

tapply(1:10, makeGroups(5, 2), mean)

##   1   2   3   4   5 
## 1.5 3.5 5.5 7.5 9.5

Local Linear Regression

\(W\) is an \(n\times p\) diagonal weighting matrix, \(h\) is a "bandwidth".
Diagonal entries are \(\frac{3}{4}\cdot(1-d^2)\cdot 1_{\{|d|\le 1\}}\) where \(d = \frac{X-c}{h}\)
\(\hat{\beta}_c = (X'WX)^{-1}X'WY\)
Covariance matrix is \(s^2(X'WX)^{-1}\)

loc.lin <- function(Y, X, c = 0, bw = sd(X)/2) {
    d <- (X - c)/bw
    W <- 3/4 * (1 - d^2) * (abs(d) < 1)
    W <- diag(W)
    X <- cbind(1, d)
    b <- solve(t(X) %*% W %*% X) %*% t(X) %*% W %*% Y
    sigma <- t(Y - X %*% b) %*% W %*% (Y - X %*% b)/(sum(diag(W) > 0) - 2)
    sigma <- solve(t(X) %*% W %*% X) * c(sigma)
    return(c(est = b[1], se = sqrt(diag(sigma))[1]))
}

Simulate some Data

set.seed(1023)  # Important for replication
X <- rnorm(1000, 0, 5)
Y <- sin(5 * X) * exp(abs(X)) + rnorm(1000)
dat <- data.frame(X, Y)
plot(X, Y, xlim = c(0, 5), ylim = c(-50, 50))

Look at the Kernel

x <- seq(-1, 1, 0.01)
y <- 3/4 * (1 - x^2)
plot(x, y, type = "l", xlab = "h", ylab = "weight")

Fit the Surface

X.est <- seq(0, 5, 0.1)
dat.llm <- sapply(X.est, function(x) loc.lin(Y, X, c = x, bw = 0.25))
plot(X, Y, xlim = c(0, 5), ylim = c(-50, 50), pch = 20)
lines(X.est, dat.llm[1, ], col = "red")
lines(X.est, dat.llm[1, ] + 1.96 * dat.llm[2, ], col = "pink")
lines(X.est, dat.llm[1, ] - 1.96 * dat.llm[2, ], col = "pink")

Introduce Example

We'll be working with data from a paper in the most recent issue of IO.
Helfer, L.R. and E. Voeten. (2014) "International Courts as Agents of Legal Change: Evidence from LGBT Rights in Europe"
The treatment we are interested in is the presence of absence of a ECtHR judgment.
The outcome is the adoption of progressive LGBT policy.
And there's a battery of controls, of course.
Voeten has helpfully put all replication materials online.

Prepare example

require(foreign, quietly = TRUE)
d <- read.dta("replicationdataIOLGBT.dta")
# Base specification
d$ecthrpos <- as.double(d$ecthrpos) - 1
d.lm <- lm(policy ~ ecthrpos + pubsupport + ecthrcountry + lgbtlaws + cond + 
    eumember0 + euemploy + coemembe + lngdp + year + issue + ccode, d)
d <- d[-d.lm$na.action, ]
d$issue <- as.factor(d$issue)
d$ccode <- as.factor(d$ccode)
summary(d.lm)$coefficients[1:11, ]

##                Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  -1.589e+00  4.956e-01 -3.2052 1.360e-03
## ecthrpos      6.501e-02  1.056e-02  6.1537 8.289e-10
## pubsupport    6.549e-03  2.743e-03  2.3877 1.700e-02
## ecthrcountry  1.297e-01  3.584e-02  3.6201 2.980e-04
## lgbtlaws      2.358e-02  6.281e-03  3.7548 1.759e-04
## cond          9.277e-02  1.796e-02  5.1657 2.509e-07
## eumember0    -8.586e-03  8.498e-03 -1.0105 3.123e-01
## euemploy      3.659e-03  1.269e-02  0.2883 7.731e-01
## coemembe      2.083e-02  7.277e-03  2.8623 4.227e-03
## lngdp        -7.522e-07  4.501e-07 -1.6711 9.477e-02
## year          8.020e-04  2.522e-04  3.1799 1.484e-03

Marginal Effects

There has seemed to be a bit of confusion over marginal effects.
The Blattman paper in HW3 uses marginal effects "well" in the sense of causal inference.
The Huber et al. paper uses them in a very standard way, but perhaps not the way we'd want to think about them in THIS class.
Use the builtin predict function; it will make your life easier.

d.lm.interact <- lm(policy ~ ecthrpos * pubsupport + ecthrcountry + lgbtlaws + 
    cond + eumember0 + euemploy + coemembe + lngdp + year + issue + ccode, d)
frame0 <- frame1 <- model.frame(d.lm.interact)
frame0[, "ecthrpos"] <- 0
frame1[, "ecthrpos"] <- 1
meff <- mean(predict(d.lm.interact, newd = frame1) - predict(d.lm.interact, 
    newd = frame0))
meff

## [1] 0.08197

Why might this be preferable to "setting things at their means/medians"?
It's essentially integrating over the sample's distribution of observed characteristics.
(And if the sample is a SRS from the population [or survey weights make it LOOK like it is], this will then get you the marginal effect on the population of interest)

Delta Method

Note 1: We know that our vector of coefficients are asymptotically multivariate normal.
Note 2: We can approximate the distribution of many (not just linear) functions of these coefficients using the delta method.
Delta method says that you can approximate the distribution of \(h(b_n)\) with \(\bigtriangledown{h}(b)'\Omega\bigtriangledown{h}(b)\) Where \(\Omega\) is the asymptotic variance of \(b\).
In practice, this means that we just need to be able to derive the function whose distribution we wish to approximate.

Trivial Example

We're interested in the ratio of the coefficient on ecthrpos to that of pubsupport.
Call it \(b_2 \over b_3\). The gradient is \((\frac{1}{b_3}, \frac{b_2}{b_3^2})\)
Estimate this easily in R with:

grad <- c(1/coef(d.lm)[3], coef(d.lm)[2]/coef(d.lm)[3]^2)
grad

## pubsupport   ecthrpos 
##        334       8046

se <- sqrt(t(grad) %*% vcov(d.lm)[2:3, 2:3] %*% grad)
est <- coef(d.lm)[2]/coef(d.lm)[3]
c(estimate = est, std.error = se)

## estimate.ecthrpos         std.error 
##             24.09             35.33

require(car)
deltaMethod(d.lm, "ecthrpos/pubsupport")

##                     Estimate    SE
## ecthrpos/pubsupport    24.09 35.55

Linear Functions

But for most "marginal effects", you don't need to use the delta method.
Just remember your rules for variances.
\(\text{var}(aX+bY) = a^2\text{var}(X) + b^2\text{var}(Y) + 2ab\text{cov}(X,Y)\)
If you are just looking at changes with respect to a single variable, you can just multiply standard errors.
That is, a change in a variable of 3 units means that the standard error for the marginal effect would be 3 times the standard error of the coefficient.
This isn't what Clarify does, though. It is weird.

Zelig for Marginal Effects

(Zelig works like Clarify. [gee, I wonder why?])

# install.packages('Zelig', repos='http://r.iq.harvard.edu', type='source')
require(Zelig, quietly = TRUE)
d.zg <- zelig(policy ~ ecthrpos * pubsupport + ecthrcountry + lgbtlaws + cond + 
    eumember0 + euemploy + coemembe + lngdp + year + issue + ccode, d, model = "ls", 
    cite = FALSE)
x0 <- setx(d.zg, ecthrpos = 0)
x1 <- setx(d.zg, ecthrpos = 1)
out <- sim(d.zg, x = x0, x1 = x1)
c(mean(out$qi$fd), meff)

## [1] 0.08150 0.08197

Sensitivity Analysis

We're adding to Cyrus's discussion on post-treatment bias with a sensitivity analysis.
This is also in Rosenbaum (1984), which he mentioned in class.
The variable which one might think could induce post-treatment bias in our example is that of "public acceptance".

Rosenbaum Bounding

In general Rosenbaum is a proponent of trying to "bound" biases.
He does this in his "normal" sensitivity analysis method, and we do the same, here.
We will assume a "surrogate" for \(U\) (necessary for CIA), which is observed post-treatment.
The surrogate has two potential outcomes: \(S_1\) and \(S_0\)
It is presumed to have a linear response on the outcome.
(As are the other observed covariates)
This gives us the following two regression models: \(E[Y_1|S_1 = s , X = x] = \mu_1 + \beta' x + \gamma's\) and
\(E[Y_0|S_0 = s , X = x] = \mu_0 + \beta' x + \gamma's\)
This gives us:
\(\tau = E[ (\mu_1 + \beta' X + \gamma'S_1) - (\mu_0 + \beta' X + \gamma'S_0)]\)
Which is equal to the following useful expression:
\(\tau = \mu_1 - \mu_0 + \gamma'( E[S_1 - S_0])\)
For us, this means that \(\tau = \beta_1 + \beta_2 E[S_1 - S_0]\)

Back to example

Our surrogate is public acceptance.
But it can be swayed by court opinions, right? This is at least plausible.
Let's try and get some reasonable bounds on \(\tau\).

sdS <- sd(d$pubsupport)
Ediff <- c(-1.5 * sdS, -sdS, -sdS/2, 0, sdS/2, sdS, 1.5 * sdS)
tau <- coef(d.lm)[2] + coef(d.lm)[3] * Ediff
names(tau) <- c("-1.5", "-1", "-.5", "0", ".5", "1", "1.5")
tau

##    -1.5      -1     -.5       0      .5       1     1.5 
## 0.06621 0.06818 0.07015 0.07212 0.07409 0.07606 0.07803

But with this method, you don't necessarily have to assume that the regression functions are this rigid.
Can you think about how one might relax some assumptions?