Why might this be preferable to “setting things at their means/medians”?
It’s essentially integrating over the sample’s distribution of observed characteristics.
(And if the sample is a simple random sample from the population, or survey weights make it behave like one, this gives you the marginal effect for the population of interest.)
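To make "integrating over the sample" concrete, here is a minimal sketch comparing the average marginal effect to the effect at the means for a logit. The model, data, and coefficient values are all simulated for illustration; nothing here comes from the notes' data.

```r
# Average marginal effect (AME) vs. marginal effect at the means (MEM)
# for a logit; everything here is simulated for illustration.
set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.3)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 1.2 * x2))
fit <- glm(y ~ x1 + x2, family = binomial)
b   <- coef(fit)

# AME: evaluate the derivative at every observation, then average
ame <- mean(dlogis(predict(fit)) * b["x1"])

# MEM: average the covariates first, then evaluate the derivative once
mem <- as.numeric(dlogis(c(1, mean(x1), mean(x2)) %*% b) * b["x1"])

c(AME = ame, MEM = mem)
```

Because the logistic density is nonlinear, the two quantities generally differ; the AME answers the population-level question directly.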
Delta Method
Note 1: We know that our vector of coefficients is asymptotically multivariate normal.
Note 2: We can approximate the distribution of many (not just linear) functions of these coefficients using the delta method.
The delta method says that you can approximate the asymptotic variance of \(h(b_n)\) with \(\nabla h(b)'\Omega\nabla h(b)\), where \(\Omega\) is the asymptotic variance of \(b\).
In practice, this means that we just need to be able to differentiate the function whose distribution we wish to approximate.
Trivial Example
Maybe we’re interested in the ratio of the coefficient on ecthrpos to that of pubsupport.
Call it \(b_2 \over b_3\). The gradient is \((\frac{1}{b_3}, -\frac{b_2}{b_3^2})\)
Estimate this easily in R with:
# Gradient of b2/b3 at the estimates (note the minus sign on the second term)
grad <- c(1 / coef(d.lm)[3], -coef(d.lm)[2] / coef(d.lm)[3]^2)
grad
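Since the data behind `d.lm` aren't reproduced here, the sketch below builds a simulated stand-in regression to show the rest of the calculation: the gradient combines with the coefficient covariance matrix (via `vcov`) to give a delta-method standard error for the ratio.

```r
# Delta-method SE for a ratio of coefficients, on a simulated stand-in
# for d.lm (the original data aren't reproduced in these notes).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 2 * x2 + rnorm(n)   # true ratio b2/b3 = 0.25
d.lm <- lm(y ~ x1 + x2)

b    <- coef(d.lm)
est  <- b[2] / b[3]                       # the ratio of interest
grad <- c(1 / b[3], -b[2] / b[3]^2)       # gradient wrt (b2, b3)
V    <- vcov(d.lm)[2:3, 2:3]              # covariance block for (b2, b3)
se   <- sqrt(t(grad) %*% V %*% grad)      # delta-method standard error
c(estimate = as.numeric(est), se = as.numeric(se))
```

The same recipe works for any differentiable function of the coefficients: swap in the appropriate gradient.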
If you are just looking at a change in a single variable, the function is linear in that coefficient, so the delta method reduces to scaling: you just multiply the standard error.
That is, for a change of 3 units in a variable, the standard error of the marginal effect is 3 times the standard error of the coefficient.
If we have a perfect instrument, this is consistent (asymptotically unbiased).
But the finite-sample bias depends both on violations of the exclusion restriction and on the strength of the first stage.
2SLS has finite sample bias. (Cyrus showed this, but didn’t dwell on it)
In particular, it can be shown that this bias is approximately \({\sigma_{\eta \xi} \over \sigma_{\xi}^2}{1 \over F + 1}\), where \(\eta\) is the error in the structural model, \(\xi\) is the error in the first stage, and \(F\) is the first-stage \(F\) statistic.
With an irrelevant instrument (\(F=0\)), the bias is equal to that of OLS (regression of \(Y\) on \(X\)).
There are some bias corrections for this; we might talk about them next week.
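A quick simulation (with invented parameters) illustrates the pattern: when the first stage is weak, the 2SLS estimate is pulled toward the biased OLS estimate, and the pull fades as \(F\) grows.

```r
# Finite-sample bias of 2SLS vs. first-stage strength; all parameters
# here are invented for illustration. True beta is 1 throughout.
set.seed(7)
sim_once <- function(n, pi1) {
  z   <- rnorm(n)
  xi  <- rnorm(n)                       # first-stage error
  eta <- 0.8 * xi + rnorm(n, sd = 0.6)  # structural error, correlated with xi
  x   <- pi1 * z + xi                   # first stage
  y   <- x + eta                        # structural model, beta = 1
  c(ols = unname(coef(lm(y ~ x))[2]),
    iv  = cov(z, y) / cov(z, x))        # 2SLS with a single instrument
}
res_weak   <- replicate(2000, sim_once(50, pi1 = 0.1))  # F near zero
res_strong <- replicate(2000, sim_once(50, pi1 = 1.0))  # strong first stage
bias_weak   <- apply(res_weak,   1, median) - 1
bias_strong <- apply(res_strong, 1, median) - 1
bias_weak    # 2SLS bias close to the OLS bias
bias_strong  # 2SLS bias close to zero; OLS still badly biased
```

Medians are used rather than means because the 2SLS estimator has very heavy tails when the instrument is weak.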
Setup IV example
For our example with IV, we will start with AJR (2001) - Colonial Origins of Comparative Development
Treatment is average protection from expropriation
Exogenous covariates are dummies for British/French colonial presence
Instrument is settler mortality
Outcome is log(GDP) in 1995
require(foreign, quietly = TRUE)
dat <- read.dta("maketable5.dta")
dat <- subset(dat, baseco == 1)
Estimate IV via 2SLS
require(AER, quietly = TRUE)
first  <- lm(avexpr ~ logem4 + f_brit + f_french, data = dat)
iv2sls <- ivreg(logpgp95 ~ avexpr + f_brit + f_french,
                ~ logem4 + f_brit + f_french, data = dat)
require(car)
linearHypothesis(first,"logem4",test="F")
## Linear hypothesis test
##
## Hypothesis:
## logem4 = 0
##
## Model 1: restricted model
## Model 2: avexpr ~ logem4 + f_brit + f_french
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 61 116.983
## 2 60 94.013 1 22.969 14.659 0.0003101 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Suppose that the exclusion restriction does NOT hold, and there exists a direct effect from the instrument to the outcome.
That is, the structural model is: \(Y = X\beta + Z\gamma + \epsilon\)
If \(\gamma\) is zero, the exclusion restriction holds (we’re in a structural framework)
We can assume a particular value of \(\gamma\), take \(\tilde{Y} = Y - Z\gamma\), and estimate our model, obtaining an estimate of \(\beta\).
This defines a sensitivity analysis on the exclusion restriction.
Subject to an assumption about the support of \(\gamma\), Conley, Hansen, and Rossi (2012) suggest estimating over a grid of values in this range, and then taking the union of the confidence intervals for each value of \(\gamma\) as the combined confidence interval (which will have at least nominal coverage).
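A minimal sketch of that grid procedure on simulated data (not the AJR data): to stay self-contained it does 2SLS by hand for one endogenous regressor and one instrument, whereas in practice you would use `AER::ivreg` as above. The assumed support \([0, 0.6]\) for \(\gamma\) is invented for the example.

```r
# Grid-over-gamma sensitivity analysis, sketched on simulated data with
# true beta = 2 and a true direct effect gamma = 0.3.
set.seed(3)
n   <- 500
z   <- rnorm(n)
xi  <- rnorm(n)
eta <- 0.5 * xi + rnorm(n)
x   <- z + xi
y   <- 2 * x + 0.3 * z + eta

# Hand-rolled 2SLS CI for one endogenous regressor and one instrument
iv_ci <- function(y, x, z, level = 0.95) {
  zc <- z - mean(z); xc <- x - mean(x); yc <- y - mean(y)
  b  <- sum(zc * yc) / sum(zc * xc)                # IV slope
  u  <- y - (mean(y) - b * mean(x)) - b * x        # structural residuals
  se <- sqrt(sum(u^2) / (length(y) - 2) * sum(zc^2) / sum(zc * xc)^2)
  b + c(-1, 1) * qnorm(1 - (1 - level) / 2) * se
}

# Re-estimate over the assumed support of gamma and take the union of CIs
grid <- seq(0, 0.6, by = 0.05)
cis  <- sapply(grid, function(g) iv_ci(y - g * z, x, z))
union_ci <- c(min(cis[1, ]), max(cis[2, ]))
union_ci  # wider than any single CI, but robust over the assumed gamma range
```

The union interval trades precision for robustness: it remains valid for every \(\gamma\) in the assumed support, including the true value.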