Rambachan and Roth (2020) Design-based uncertainty for quasi-experiments, arXiv.

Section 2: A finite population model for quasi-experiments
  • Consider a finite population of size  N.*1
  • Binary treatment:  D_i; potential outcomes:  Y_i(1), Y_i(0). The potential outcomes are assumed to be fixed (non-random) values.
  • Observed outcome:  Y_i = D_i Y_i(1) + (1 - D_i)Y_i(0).
  • Each unit  i receives the treatment independently with probability  p_i. The  p_i are unknown and may be correlated with the potential outcomes:  p_i = g(Y_i(1), Y_i(0), W_i), where  W_i denotes pre-treatment covariates.
  • Let  N_1 and  N_0 denote the sizes of the treatment group and the control group, respectively.
  • Writing  \mathbf{D} = (D_1, \ldots, D_N), we have

 \Pr\left( \mathbf{D} = \mathbf{d} \mid \sum_{i = 1}^N D_i = N_1 \right) \propto \prod_{i = 1}^N p_i^{d_i} (1 - p_i)^{1 - d_i}

s.t.  \sum_{i = 1}^N d_i = N_1 (zero otherwise)

  • Also define  \pi_i = \Pr(D_i = 1 \mid \sum_{i = 1}^N D_i = N_1).
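The conditional assignment distribution in Section 2 can be made concrete by brute-force enumeration over a tiny population. The following is a minimal Python sketch, not code from the paper: the function names and the values in `p` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import prod


def conditional_assignment_dist(p, n1):
    """Pr(D = d | sum(D) = n1), keyed by the tuple of treated indices."""
    n = len(p)
    weights = {}
    for treated in combinations(range(n), n1):
        chosen = set(treated)
        # Unnormalised weight: prod_i p_i^{d_i} (1 - p_i)^{1 - d_i}
        weights[treated] = prod(
            p[i] if i in chosen else 1.0 - p[i] for i in range(n)
        )
    total = sum(weights.values())
    return {treated: w / total for treated, w in weights.items()}


def marginal_probs(p, n1):
    """pi_i = Pr(D_i = 1 | sum(D) = n1)."""
    pi = [0.0] * len(p)
    for treated, prob in conditional_assignment_dist(p, n1).items():
        for i in treated:
            pi[i] += prob
    return pi


# Hypothetical treatment probabilities for N = 4 units (illustration only).
p = [0.2, 0.5, 0.5, 0.8]
pi = marginal_probs(p, n1=2)
pi_equal = marginal_probs([0.3] * 4, n1=2)
```

By construction the entries of `pi` sum to `n1`, and when all `p_i` are equal every `pi_i` equals `n1 / N`.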
Section 3: Simple difference-in-means
  • Simple difference-in-means (SDIM) estimator:

 \displaystyle \hat t \equiv \frac{1}{N_1} \sum_{i = 1}^N D_i Y_i - \frac{1}{N_0} \sum_{i = 1}^N (1 - D_i) Y_i

  • Defining  t_i = Y_i(1) - Y_i(0), we obtain

\begin{align} \mathbb{E} \left[ \hat t \mid \sum_{i = 1}^N D_i = N_1 \right] & = \frac{1}{N_1} \sum_{i = 1}^N \pi_i (Y_i(0) + t_i) - \frac{1}{N_0} \sum_{i = 1}^N (1 - \pi_i) Y_i(0) \\& = \frac{1}{N_1}  \sum_{i = 1}^N \pi_i  t_i + \frac{N}{N_0 N_1} \sum_{i = 1}^N \pi_i Y_i(0) - \frac{1}{N_0} \sum_{i = 1}^N Y_i(0) \end{align}

Here,

 \displaystyle \frac{1}{N_1}  \sum_{i = 1}^N \pi_i  t_i = \mathbb{E} \left[ \frac{1}{N_1}  \sum_{i = 1}^N D_i  t_i \mid \sum_{i = 1}^N D_i = N_1 \right] \equiv t_{ATT}

= expected SATT (sample average treatment effect on the treated)

\begin{align}  \frac{N}{N_0 N_1} \sum_{i = 1}^N \pi_i Y_i(0) - \frac{1}{N_0} \sum_{i = 1}^N Y_i(0) & = \frac{N}{N_0 N_1} \sum_{i = 1}^N \pi_i Y_i(0) - \frac{N}{N_0 N_1} \sum_{i = 1}^N \frac{N_1}{N} Y_i(0) \\ & =   \frac{N^2}{N_0 N_1} \left( \frac{1}{N}\sum_{i = 1}^N \left[ \pi_i Y_i(0) - \frac{N_1}{N} Y_i(0) \right] \right) \\ & = \frac{N^2}{N_0 N_1} [\text{covariance of $\pi_i$ and $Y_i(0)$}] \end{align}

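  • Therefore,  \mathbb{E} [ \hat t \mid \sum_{i = 1}^N D_i = N_1 ] = t_{ATT} + \text{Bias}, where the bias is the covariance term; it is zero when every unit is treated with equal probability ( \pi_i = N_1/N for all  i).
  • The remainder of the section covers a variance bound and the asymptotic normality of  \hat t- \mathbb{E} [ \hat t \mid \sum_{i = 1}^N D_i = N_1 ].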
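The decomposition of the expected SDIM into  t_{ATT} plus a covariance-driven bias can be checked numerically by enumerating every assignment with exactly  N_1 treated units. This is a minimal sketch under made-up inputs: the function name `sdim_decomposition` and the potential outcomes `y0`, `y1` are hypothetical, not taken from the paper.

```python
from itertools import combinations
from math import prod


def sdim_decomposition(p, y1, y0, n1):
    """Return (E[SDIM | sum(D) = n1], t_ATT, bias) by exact enumeration."""
    n = len(p)
    n0 = n - n1
    total = 0.0
    expected_sdim = 0.0
    pi = [0.0] * n
    for treated in combinations(range(n), n1):
        chosen = set(treated)
        # Unnormalised probability of this assignment.
        w = prod(p[i] if i in chosen else 1.0 - p[i] for i in range(n))
        treated_mean = sum(y1[i] for i in chosen) / n1
        control_mean = sum(y0[i] for i in range(n) if i not in chosen) / n0
        expected_sdim += w * (treated_mean - control_mean)
        total += w
        for i in chosen:
            pi[i] += w
    expected_sdim /= total
    pi = [x / total for x in pi]
    # t_ATT = (1/N_1) sum_i pi_i t_i
    t_att = sum(pi[i] * (y1[i] - y0[i]) for i in range(n)) / n1
    # Finite-population covariance of pi_i and Y_i(0).
    cov = sum((pi[i] - n1 / n) * y0[i] for i in range(n)) / n
    bias = n * n / (n0 * n1) * cov
    return expected_sdim, t_att, bias


# Hypothetical potential outcomes for N = 4 units (illustration only).
y0 = [1.0, 2.0, 3.0, 4.0]
y1 = [2.0, 2.5, 4.5, 6.0]
e_sdim, t_att, bias = sdim_decomposition([0.2, 0.5, 0.5, 0.8], y1, y0, n1=2)
e_eq, t_att_eq, bias_eq = sdim_decomposition([0.5] * 4, y1, y0, n1=2)
```

In the first call the treatment probabilities rise with `y0`, so the bias term is positive; in the second call all probabilities are equal and the bias term vanishes up to floating-point error.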
Section 4: Difference-in-differences

Omitted.

Section 5: Instrumental variables
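  • Let  Z_i \in \{0,1\} be an instrument and write the potential treatment as  D_i (z) \in \{0,1\}. The observed outcome is  Y_i = Y_i(D_i(Z_i)).
  • Let  N^Z_1 and  N^Z_0 denote the sizes of the groups with  Z = 1 and  Z = 0, respectively.
  • Writing  \mathbf{Z} = (Z_1, \ldots, Z_N), with  p_i here read as the probability that  Z_i = 1, we have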

 \Pr\left( \mathbf{Z} = \mathbf{z} \mid \sum_{i = 1}^N Z_i = N_1^Z \right) \propto \prod_{i = 1}^N p_i^{z_i} (1 - p_i)^{1 - z_i}  s.t.  \sum_{i = 1}^N z_i = N_1^Z

  • Assume monotonicity:  D_i(1) \ge D_i(0) for all  i.
  • The 2SLS estimator is  \hat \beta_{2SLS} \equiv \hat t_{RF}/ \hat t_{FS}, with

\begin{align} \hat t_{RF} & \equiv \frac{1}{N^Z_1} \sum_{i=1}^N Z_i Y_i - \frac{1}{N^Z_0} \sum_{i=1}^N (1 - Z_i) Y_i \\ \hat t_{FS} & \equiv \frac{1}{N^Z_1} \sum_{i=1}^N Z_i D_i - \frac{1}{N^Z_0} \sum_{i=1}^N (1 - Z_i) D_i \end{align}

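  • Here,  \hat t_{RF} is the SDIM with respect to  Z_i, so the same calculation as above, with  \pi_i^Z \equiv \Pr(Z_i = 1 \mid \sum_{i = 1}^N Z_i = N_1^Z) in place of  \pi_i, gives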

\begin{align} \mathbb{E} \left[ \hat t_{RF} \mid \sum_{i = 1}^N Z_i = N_1^Z \right] & = \frac{1}{N_1^Z} \sum_{i = 1}^N \pi_i^Z (Y_i(D_i(1)) - Y_i(D_i(0))) \\ & \quad + \frac{N^2}{N_0^Z N_1^Z} \underbrace{[\text{covariance of $\pi_i^Z$ and $Y_i(D_i(0))$}]}_{= \: N^{-1} \sum_i (\pi_i^Z - N_1^Z/N)Y_i(D_i(0))}  \end{align}

  • Furthermore, defining the set of compliers as  \mathcal{C} = \{i \mid D_i(1) > D_i(0) \}, the monotonicity assumption implies

 \displaystyle \frac{1}{N_1^Z} \sum_{i = 1}^N \pi_i^Z (Y_i(D_i(1)) - Y_i(D_i(0))) = \frac{1}{N_1^Z} \sum_{i \in \mathcal{C}} \pi_i^Z (Y_i(1) - Y_i(0))

  • Similarly,

\begin{align} \mathbb{E} \left[ \hat t_{FS} \mid \sum_{i = 1}^N Z_i = N_1^Z \right] & = \frac{1}{N_1^Z} \sum_{i \in \mathcal{C}} \pi_i^Z + \frac{N^2}{N_0^Z N_1^Z} \underbrace{[\text{covariance of $\pi_i^Z$ and $D_i(0)$}] }_{= \: N^{-1} \sum_i (\pi_i^Z - N_1^Z/N) D_i(0)}\end{align}

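  • Therefore, if  \pi_i^Z = N_1^Z /N for all  i, both covariance terms vanish and the following holds:

 \displaystyle \beta_{2SLS} \equiv \frac{\mathbb{E} \left[ \hat t_{RF} \mid \sum_{i = 1}^N Z_i = N_1^Z \right] }{\mathbb{E} \left[ \hat t_{FS} \mid \sum_{i = 1}^N Z_i = N_1^Z \right]} = \frac{\sum_{i \in \mathcal{C}} \pi_i^Z (Y_i(1) - Y_i(0))}{\sum_{i \in \mathcal{C}} \pi_i^Z} = \text{“$\pi_i^Z$-weighted” LATE}.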

  • The remainder of the section covers the asymptotic normality of  \sqrt{N}(\hat \beta_{2SLS} - \beta_{2SLS}).
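The ratio of the expected reduced form to the expected first stage can likewise be computed exactly for a toy population and compared with the weighted LATE over compliers. This is only an illustrative sketch: the function `iv_expectations`, the compliance types in `d0`/`d1`, and the outcome values are hypothetical assumptions, not inputs from the paper.

```python
from itertools import combinations
from math import prod


def iv_expectations(p, d0, d1, y0, y1, n1z):
    """Exact E[RF], E[FS] given sum(Z) = n1z, plus the pi^Z-weighted LATE."""
    n = len(p)
    n0z = n - n1z
    total = e_rf = e_fs = 0.0
    pi = [0.0] * n
    for assigned in combinations(range(n), n1z):
        on = set(assigned)
        off = [i for i in range(n) if i not in on]
        # Unnormalised probability of this instrument assignment.
        w = prod(p[i] if i in on else 1.0 - p[i] for i in range(n))
        d = [d1[i] if i in on else d0[i] for i in range(n)]    # D_i(Z_i)
        y = [y1[i] if d[i] == 1 else y0[i] for i in range(n)]  # Y_i(D_i(Z_i))
        e_rf += w * (sum(y[i] for i in on) / n1z - sum(y[i] for i in off) / n0z)
        e_fs += w * (sum(d[i] for i in on) / n1z - sum(d[i] for i in off) / n0z)
        total += w
        for i in on:
            pi[i] += w
    e_rf, e_fs = e_rf / total, e_fs / total
    pi = [x / total for x in pi]
    compliers = [i for i in range(n) if d1[i] > d0[i]]
    weighted_late = sum(pi[i] * (y1[i] - y0[i]) for i in compliers) / sum(
        pi[i] for i in compliers
    )
    return e_rf, e_fs, weighted_late


# Hypothetical units: a never-taker, two compliers, an always-taker.
d0, d1 = [0, 0, 0, 1], [0, 1, 1, 1]
y0, y1 = [1.0, 2.0, 3.0, 4.0], [2.0, 2.5, 4.5, 6.0]
rf_eq, fs_eq, late_eq = iv_expectations([0.5] * 4, d0, d1, y0, y1, n1z=2)
rf_un, fs_un, late_un = iv_expectations([0.2, 0.5, 0.5, 0.8], d0, d1, y0, y1, n1z=2)
```

With equal instrument probabilities the weights are constant, so `rf_eq / fs_eq` coincides with the simple average effect among compliers. With the unequal probabilities in the second call the covariance terms are nonzero and `rf_un / fs_un` differs from `late_un`.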

 

 

*1: For example, state- or prefecture-level data.