Regression Discontinuity

Drew Dimmery

April 4, 2014

Structure

  • RDD interpretation
  • RDD estimation
  • Placebo tests
  • Sorting
  • Other stuff

Interpretation

  • It's a LATE!
  • A different kind of LATE!
  • It can be interpreted as a weighted average over all units (Lee & Lemieux 2010)
  • \((W,U)\) are observed and unobserved factors which explain all heterogeneity.
  • \(X=c\) is the cutpoint on the running variable, \(Y\) is the outcome \(\lim_{\epsilon \downarrow 0} E[Y|X=c+\epsilon] - \lim_{\epsilon\uparrow 0} E[Y|X=c+\epsilon]\)
    \(= \sum_{w,u} \tau(w,u) p(W=w,U=u|X=c)\)
    \(= \sum_{w,u} \tau(w,u) {f(c|W=w,U=u) \over f(c)} p(W=w,U=u)\)
  • What does this mean?
  • It's a weight of individual treatment effects weighted by the likelihood that a unit will lie near the threshhold on the running variable.
  • Keep this in mind as you interpret results.

Estimation

  • If only someone wrote a package to do this...
  • http://github.com/ddimmery/rdd
  • The current best pracices is to use local polynomial regression.
  • Typically linear
  • There are also some interesting methods using randomization inference, though. (Cattaneo et al n.d.)

Replication

  • I'll be replicating the recent Meyersson paper that's been making noise.
  • Replication materials
  • The paper shows a (local) result that when Islamic parties won elections in Turkey, this resulted in better outcomes for women.
  • Running variable: vote margin (but not exclusively 2 party system as in Lee)
  • Outcome that we'll look at: high school education
require(foreign, quietly = TRUE)
d <- read.dta("regdata0.dta")
summary(d$iwm94)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    -1.0    -0.5    -0.3    -0.3    -0.1     1.0     544

Explore data

  • Plot the raw data.
with(d, plot(iwm94, hischshr1520f, pch = 19, cex = 0.2, xlim = c(-0.5, 0.5)))
left.lm <- lm(hischshr1520f ~ iwm94, d, subset = iwm94 < 0)
right.lm <- lm(hischshr1520f ~ iwm94, d, subset = iwm94 >= 0)
left.x <- seq(-0.5, 0, 0.01)
right.x <- -left.x
lines(left.x, predict(left.lm, newd = data.frame(iwm94 = left.x)), col = "red")
lines(right.x, predict(right.lm, newd = data.frame(iwm94 = right.x)), col = "red")

Estimation

  • So the basic estimation would just take the difference of the intercepts from left.lm and right.lm.
  • And there's an equivalency to just running a single regression as Cyrus showed in class.
  • But I'm just going to use rdd
require(rdd, quietly = TRUE)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Loading required package: car
## Loading required package: survival
## Loading required package: splines
rd.out <- RDestimate(hischshr1520f ~ iwm94, d)
rd.out
## 
## Call:
## RDestimate(formula = hischshr1520f ~ iwm94, data = d)
## 
## Coefficients:
##      LATE    Half-BW  Double-BW  
##    0.0296     0.0250     0.0228

Full Results

summary(rd.out)
## 
## Call:
## RDestimate(formula = hischshr1520f ~ iwm94, data = d)
## 
## Type:
## sharp 
## 
## Estimates:
##            Bandwidth  Observations  Estimate  Std. Error  z value
## LATE       0.24       1020          0.0296    0.0124      2.39   
## Half-BW    0.12        589          0.0250    0.0165      1.52   
## Double-BW  0.48       2050          0.0228    0.0101      2.26   
##            Pr(>|z|)   
## LATE       0.0169    *
## Half-BW    0.1286     
## Double-BW  0.0240    *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## F-statistics:
##            F      Num. DoF  Denom. DoF  p       
## LATE        4.99  3         1016        3.86e-03
## Half-BW     1.70  3          585        3.30e-01
## Double-BW  25.77  3         2046        4.44e-16

Plot it

plot(rd.out, range = c(-0.4, 0.4))
title(xlab = "Islamic Party Vote Margin", ylab = "Female High School Education Share")

Placebo tests

  • Do placebo tests on other covariates and other outcomes.
  • They're "placebo" because there "shouldn't" be an effect on them (except occasionally by chance)
# Age 19+
RDestimate(ageshr19 ~ iwm94, d)[c("est", "se")]
## $est
##      LATE   Half-BW Double-BW 
## -0.003737  0.006946 -0.004117 
## 
## $se
## [1] 0.010314 0.013783 0.008307
# Log Population
RDestimate(lpop1994 ~ iwm94, d)[c("est", "se")]
## $est
##      LATE   Half-BW Double-BW 
##   0.06921  -0.04339   0.03000 
## 
## $se
## [1] 0.2384 0.3276 0.1879
# Household Size
RDestimate(shhs ~ iwm94, d)[c("est", "se")]
## $est
##      LATE   Half-BW Double-BW 
## -0.006963  0.321148 -0.091759 
## 
## $se
## [1] 0.3543 0.5431 0.2557

More Placebos

# Men in 2000
RDestimate(hischshr1520m ~ iwm94, d)[c("est", "se")]
## $est
##      LATE   Half-BW Double-BW 
##  0.009632  0.016188  0.007619 
## 
## $se
## [1] 0.009037 0.011807 0.007435
# Women in 1990 (pre-treatment)
RDestimate(c90hischshr1520f ~ iwm94, d)[c("est", "se")]
## $est
##      LATE   Half-BW Double-BW 
## 0.0079389 0.0007974 0.0130517 
## 
## $se
## [1] 0.012239 0.017631 0.009308
# Men in 1990 (pre-treatment)
RDestimate(c90hischshr1520m ~ iwm94, d)[c("est", "se")]
## $est
##      LATE   Half-BW Double-BW 
##  0.005930  0.002779  0.003861 
## 
## $se
## [1] 0.009770 0.013259 0.007891

Sorting

  • As Cyrus discussed, density tests are also a good way to examine the possibility of sorting.
DCdensity(d$iwm94, verbose = TRUE, plot = FALSE)
## Assuming cutpoint of zero.
## Using calculated bin size:  0.009 
## Using calculated bandwidth:  0.165 
## Log difference in heights is  -0.095  with SE  0.147 
##   this gives a z-stat of  -0.650 
##   and a p value of  0.515
## [1] 0.5154

Density Plot

DCdensity(d$iwm94)
## [1] 0.5154

Fuzzy designs

  • I don't have an example for this, but it's quite easy.
  • Do it the same way as before, but with RDestimate(Y~runvar+treatment)

Overall

  • Some big things for RDD:
    • Lots of plots
    • Think about locality in interpretation
    • Use your covariates for robustness/placebo tests
    • Everything should be robust to different bandwidths, etc
    • If effects start disappearing as bw goes down, that's a bad sign.
    • Your bandwidth is probably to wide.
  • If there's still more time, maybe I'll go through some high points of the rdd code.