Leung (2020) Causal inference under approximate neighborhood interference, arXiv.

Setup

A finite population of $n$ units connected through a network $\mathbf{A}$.
Treatment vector: $\mathbf{d} = (d_i)_{i=1}^n \in \{0, 1\}^n$. Observed treatment: $\mathbf{D} = (D_i)_{i=1}^n$.
Potential outcome: $Y_i(\mathbf{d})$.
With cross-sectional data, only one realization of $\mathbf{D}$ is observed. In that case the "exposure effect" $n^{-1} \sum_{i=1}^n (Y_i(\mathbf{d}) - Y_i(\mathbf{d}'))$ cannot be estimated unless some restriction is placed on $Y_i(\cdot)$.
Much of the existing literature assumes that $\mathbf{D}$ affects outcomes only through some low-dimensional sufficient statistic. This is called an exposure mapping.

Exposure mapping

$T_i \equiv T(i, \mathbf{D}, \mathbf{A})$ such that $Y_i(\mathbf{D}) = \tilde Y_i(T_i)$

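A typical example is $T_i = (D_i, \sum_{j} A_{i,j}D_j)$. This is unit $i$'s own treatment together with the number of treated neighbors (for a 0/1 adjacency matrix).
The exposure effect in this case: $n^{-1} \sum_{i=1}^n (\tilde Y_i(t) - \tilde Y_i(t'))$ for $t, t' \in \mathcal{T}$.
This paper considers the case where the exposure mapping is misspecified*1. The question is: when $Y_i(\mathbf{D}) \neq \tilde Y_i(T_i)$, under what conditions can a meaningful causal parameter still be estimated?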
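Below is a minimal Python sketch of the typical exposure mapping $T_i = (D_i, \sum_j A_{i,j} D_j)$. It assumes a hypothetical 4-unit line network 0-1-2-3 with a 0/1 adjacency matrix `A`. The function name `exposure` and the treatment vector are illustrative and do not come from the paper.

```python
# Hypothetical 4-unit line network 0-1-2-3 (symmetric 0/1 adjacency matrix).
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

def exposure(i, D, A):
    """T_i = (D_i, sum_j A_ij D_j): own treatment and number of treated neighbors."""
    return (D[i], sum(a * d for a, d in zip(A[i], D)))

D = [1, 0, 1, 1]  # one hypothetical treatment assignment
T = [exposure(i, D, A) for i in range(len(D))]
print(T)  # [(1, 0), (0, 2), (1, 1), (1, 1)]
```

Units 2 and 3 receive the same exposure value (1, 1) even though their neighborhoods differ. The mapping deliberately discards that information. If outcomes depend on it, the mapping is misspecified.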

Throughout, $n$ is finite, and everything except $\mathbf{D}$ is treated as nonrandom. This is design-based uncertainty (cf. Abadie et al. (2020) ECTA).

Definition: Unit-level exposure effect

$\tau_i(t,t') = \mu_i(t) - \mu_i(t')$ , where $\displaystyle \mu_i(t) = \sum_{\mathbf{d} \in \{0,1\}^n} Y_i(\mathbf{d})\Pr(\mathbf{D} = \mathbf{d} \mid T_i = t)$

The estimand of interest: misspecification-robust exposure effect

$\tau(t,t') = \mu(t) - \mu(t')$ , where $\displaystyle \mu(t) = \frac{1}{n}\sum_{i=1}^n \mu_i(t)$
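To make the estimand concrete, here is a brute-force sketch that computes $\mu_i(t)$, $\mu(t)$ and $\tau(t,t')$ by enumerating all $2^n$ assignments.

Everything in it is an illustrative assumption, not taken from the paper:
- a 4-unit line network;
- an iid Bernoulli(0.5) design;
- the exposure mapping $T_i = (D_i, \sum_j A_{i,j} D_j)$;
- a hypothetical outcome $Y_i(\mathbf{d}) = d_i + 0.5\sum_j A_{i,j} d_j + 0.1\sum_j d_j$. Its last term is not captured by $T_i$, so the mapping is misspecified.

The sketch uses $\Pr(\mathbf{D} = \mathbf{d} \mid T_i = t) = \Pr(\mathbf{D} = \mathbf{d})\mathbf{1}\{T(i,\mathbf{d},\mathbf{A}) = t\}/\Pr(T_i = t)$, which requires $\Pr(T_i = t) > 0$.

```python
from itertools import product

# Hypothetical 4-unit line network 0-1-2-3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
n = len(A)
p = 0.5  # assumed design: D_i iid Bernoulli(p)
ALL = list(product((0, 1), repeat=n))  # every d in {0,1}^n

def exposure(i, d):
    """T(i, d, A) = (d_i, number of treated neighbors)."""
    return (d[i], sum(a * x for a, x in zip(A[i], d)))

def Y(i, d):
    """Hypothetical potential outcome Y_i(d); the 0.1 * sum(d) term is not captured by T_i."""
    return d[i] + 0.5 * exposure(i, d)[1] + 0.1 * sum(d)

def prob(d):
    """Pr(D = d) under the iid Bernoulli(p) design."""
    k = sum(d)
    return p ** k * (1 - p) ** (n - k)

def mu_i(i, t):
    """mu_i(t) = sum_d Y_i(d) Pr(D = d | T_i = t); requires Pr(T_i = t) > 0."""
    pr_t = sum(prob(d) for d in ALL if exposure(i, d) == t)
    return sum(Y(i, d) * prob(d) for d in ALL if exposure(i, d) == t) / pr_t

def mu(t):
    return sum(mu_i(i, t) for i in range(n)) / n

def tau(t, t2):
    return mu(t) - mu(t2)

print(mu_i(0, (1, 1)))      # approximately 1.8
print(tau((1, 1), (0, 1)))  # approximately 1.1
```

$\mu_i(t)$ averages over the treatments that $T_i$ does not pin down, here those of non-neighbors. As a result, $\tau((1,1),(0,1))$ comes out to 1.1 rather than the direct coefficient 1. The extra 0.1 arises because unit $i$'s own treatment also enters the global term.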

The right-hand side in the definition of $\mu_i(t)$ equals $\mathbb{E}[Y_i \mid T_i = t]$*2. So the usual IPW estimator is a natural way to estimate $\tau(t,t')$:

$\hat \tau(t,t') = \hat \mu(t) - \hat \mu(t')$ , where $\displaystyle \hat \mu(t) = \frac{1}{n}\sum_{i=1}^n Y_i \frac{\mathbf{1}\{T_i = t\}}{\Pr(T_i = t)}$
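The following sketches the IPW estimator on the same kind of toy setup. All of these are assumptions:
- a 4-unit line network;
- an iid Bernoulli(0.5) design;
- the hypothetical outcome $Y_i(\mathbf{d}) = d_i + 0.5\sum_j A_{i,j} d_j + 0.1\sum_j d_j$.

The propensities $\Pr(T_i = t)$ are known from the design and are computed here by enumeration. Averaging $\hat\tau$ over all $2^n$ assignments checks design-based unbiasedness. For this model, $\tau((1,1),(0,1)) = 1.1$ for every unit: the own-treatment effect 1 plus 0.1 through the global term.

```python
import random
from itertools import product

# Hypothetical toy setup: line network 0-1-2-3, D_i iid Bernoulli(0.5).
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
n, p = len(A), 0.5
ALL = list(product((0, 1), repeat=n))

def exposure(i, d):
    return (d[i], sum(a * x for a, x in zip(A[i], d)))

def Y(i, d):
    # hypothetical potential outcome, not a function of T_i alone
    return d[i] + 0.5 * exposure(i, d)[1] + 0.1 * sum(d)

def prob(d):
    k = sum(d)
    return p ** k * (1 - p) ** (n - k)

def pr_T(i, t):
    """Propensity Pr(T_i = t), known from the design."""
    return sum(prob(d) for d in ALL if exposure(i, d) == t)

def mu_hat(D, t):
    """IPW: (1/n) sum_i Y_i 1{T_i = t} / Pr(T_i = t), with observed Y_i = Y_i(D)."""
    return sum(Y(i, D) / pr_T(i, t) for i in range(n) if exposure(i, D) == t) / n

def tau_hat(D, t, t2):
    return mu_hat(D, t) - mu_hat(D, t2)

# One realized assignment
rng = random.Random(0)
D = tuple(int(rng.random() < p) for _ in range(n))
print(D, tau_hat(D, (1, 1), (0, 1)))

# Exact expectation over all 2^n assignments (design-based unbiasedness)
expected = sum(prob(d) * tau_hat(d, (1, 1), (0, 1)) for d in ALL)
print(expected)  # approximately 1.1
```

A single draw of $\hat\tau$ can be far from 1.1. Only its design expectation matches the estimand, even though the exposure mapping is misspecified.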

Approximate Neighborhood Interference

Definition: K-neighborhood

$\mathcal{N}_{\mathbf{A}}(i, K) = \{j \mid \text{ shortest path length between } i \text{ and } j \le K\}$
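Below is a minimal breadth-first-search sketch of $\mathcal{N}_{\mathbf{A}}(i, K)$, assuming an unweighted 0/1 adjacency matrix. The function name `k_neighborhood` and the 6-unit path network are illustrative. Unit $i$ itself (path length 0) belongs to its own neighborhood.

```python
from collections import deque

def k_neighborhood(A, i, K):
    """N_A(i, K): units whose shortest-path distance to i is at most K (i included)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] < K:  # only expand units strictly inside the radius
            for v, a in enumerate(A[u]):
                if a and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return set(dist)

# Hypothetical example: path network 0-1-2-3-4-5
A = [[int(abs(i - j) == 1) for j in range(6)] for i in range(6)]
print(sorted(k_neighborhood(A, 2, 1)))  # [1, 2, 3]
print(sorted(k_neighborhood(A, 0, 2)))  # [0, 1, 2]
```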

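For each $i$, let $D_{\mathcal{N}_{\mathbf{A}}(i, s)}$ denote the subvector of $\mathbf{D}$ on the $s$-neighborhood of $i$.
Next, let $\mathbf{D}'$ be an independent copy of $\mathbf{D}$. For each $i$, let $D'_{-\mathcal{N}_{\mathbf{A}}(i, s)}$ denote the components of $\mathbf{D}'$ outside the $s$-neighborhood of $i$. Finally, define $\mathbf{D}^{(s)} = (D_{\mathcal{N}_{\mathbf{A}}(i, s)}, D'_{-\mathcal{N}_{\mathbf{A}}(i, s)})$. This vector depends on $i$, but the notation suppresses the index.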
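The sketch below shows how $\mathbf{D}^{(s)}$ is assembled for a given unit $i$. It assumes a 0/1 adjacency matrix and an iid Bernoulli(0.5) design for both $\mathbf{D}$ and $\mathbf{D}'$. `coupled_assignment` is a hypothetical helper name. The neighborhood is computed by breadth-first search on shortest path lengths.

```python
import random
from collections import deque

def k_neighborhood(A, i, K):
    """Units within shortest-path distance K of unit i (BFS on a 0/1 adjacency matrix)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] < K:
            for v, a in enumerate(A[u]):
                if a and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return set(dist)

def coupled_assignment(D, D_prime, A, i, s):
    """D^(s) for unit i: D on N_A(i, s), the independent draw D' everywhere else."""
    nbhd = k_neighborhood(A, i, s)
    return [D[j] if j in nbhd else D_prime[j] for j in range(len(D))]

A = [[int(abs(i - j) == 1) for j in range(6)] for i in range(6)]  # path 0-1-2-3-4-5
rng = random.Random(1)
D = [int(rng.random() < 0.5) for _ in range(6)]
D_prime = [int(rng.random() < 0.5) for _ in range(6)]  # drawn independently of D
print(D, D_prime, coupled_assignment(D, D_prime, A, i=2, s=1))
```

With $i = 2$ and $s = 1$, units 1, 2 and 3 keep their realized treatments, while units 0, 4 and 5 take the values from $\mathbf{D}'$.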

Assumption (Approximate Neighborhood Interference: ANI)

$\displaystyle \lim_{s \to \infty} \limsup_{n \to \infty} \max_{1 \le i \le n} \mathbb{E}|Y_i(\mathbf{D}) - Y_i(\mathbf{D}^{(s)})| = 0$

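Interpretation of ANI: the influence of units sufficiently far away in the network is nearly negligible*3. The influence need not be exactly zero.
ANI holds naturally in models such as the linear-in-means model (Prop. 1).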
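The following is a Monte Carlo sketch of the ANI quantity. All choices are assumptions made for illustration:
- a path network of 15 units;
- an iid Bernoulli(0.5) design for both $\mathbf{D}$ and $\mathbf{D}'$;
- a hypothetical outcome $Y_i(\mathbf{d}) = \sum_j 0.5^{\ell(i,j)} d_j$, where $\ell(i,j)$ is the shortest path length.

Every unit affects every other unit. Still, a path has at most two units at each distance, so units more than $s$ away contribute at most $2\sum_{k>s} 0.5^k = 2 \cdot 0.5^s$. The quantity therefore decays geometrically in $s$, uniformly in $n$, which is the kind of behavior ANI requires.

```python
import random
from collections import deque

def distances(A, i):
    """Shortest-path lengths from unit i to every reachable unit (BFS)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v, a in enumerate(A[u]):
            if a and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def Y(d, dist_i):
    """Hypothetical outcome: unit j's treatment enters with weight 0.5 ** (path length)."""
    return sum(0.5 ** k * d[j] for j, k in dist_i.items())

def theta(A, s, reps=500, seed=0):
    """Monte Carlo estimate of max_i E|Y_i(D) - Y_i(D^(s))| with D_i iid Bernoulli(0.5)."""
    rng = random.Random(seed)
    n = len(A)
    dists = [distances(A, i) for i in range(n)]
    worst = 0.0
    for i in range(n):
        total = 0.0
        for _ in range(reps):
            D = [int(rng.random() < 0.5) for _ in range(n)]
            D_prime = [int(rng.random() < 0.5) for _ in range(n)]  # independent copy
            # D^(s): keep D on N_A(i, s), take D' elsewhere (unreachable units count as far)
            D_s = [D[j] if dists[i].get(j, s + 1) <= s else D_prime[j] for j in range(n)]
            total += abs(Y(D, dists[i]) - Y(D_s, dists[i]))
        worst = max(worst, total / reps)
    return worst

n = 15
A = [[int(abs(i - j) == 1) for j in range(n)] for i in range(n)]  # path network
for s in range(6):
    print(s, round(theta(A, s), 4), "bound:", 2 * 0.5 ** s)
```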
Define $Z_i = (\mathbf{1}\{T_i = t\}/\Pr(T_i = t) - \mathbf{1}\{T_i = t'\}/\Pr(T_i = t'))Y_i$ so that $\hat \tau (t, t') = n^{-1}\sum_{i=1}^n Z_i$.
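Writing $\hat\tau$ as a sample mean of the $Z_i$ is what allows a network LLN/CLT to be applied. The small sketch below uses made-up inputs for 4 units. `z_values` is a hypothetical helper, and the outcomes, exposures and propensities are illustrative numbers, not data from the paper.

```python
def z_values(Y_obs, T, pr_t, pr_t2, t, t2):
    """Z_i = (1{T_i = t}/Pr(T_i = t) - 1{T_i = t'}/Pr(T_i = t')) * Y_i for each unit."""
    return [((1.0 if T_i == t else 0.0) / a - (1.0 if T_i == t2 else 0.0) / b) * y
            for y, T_i, a, b in zip(Y_obs, T, pr_t, pr_t2)]

# Hypothetical data: observed outcomes, exposures, and design propensities.
Y_obs = [2.0, 1.0, 3.0, 0.5]
T = [(1, 1), (0, 1), (1, 1), (0, 0)]
pr_t = [0.25] * 4   # assumed Pr(T_i = (1, 1))
pr_t2 = [0.5] * 4   # assumed Pr(T_i = (0, 1))
Z = z_values(Y_obs, T, pr_t, pr_t2, (1, 1), (0, 1))
tau_hat = sum(Z) / len(Z)
print(Z, tau_hat)  # [8.0, -2.0, 12.0, 0.0] 4.5
```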

Theorem 1. ANI + regularity conditions $\Longrightarrow$ $\{Z_i\}$ is $\psi$-weakly dependent in the sense of Kojevnikov et al. (202x)*4.

Kojevnikov et al. (202x): if network data are $\psi$-weakly dependent, an LLN and a CLT hold.
As a result, under additional assumptions, we obtain $|\hat \tau (t, t') - \tau (t, t')| \overset{p}{\to} 0$ (Theorem 2) and $\sigma_n^{-1}\sqrt{n}(\hat \tau (t, t') - \tau (t, t')) \overset{d}{\to} N(0,1)$ (Theorem 3).
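A toy simulation consistent with Theorems 2 and 3 follows. It only illustrates the behavior and does not check the paper's regularity conditions. The assumed setup is:
- a path network of $n$ units;
- an iid Bernoulli(0.5) design;
- $t = (1,1)$ and $t' = (0,1)$;
- a hypothetical outcome $Y_i(\mathbf{d}) = d_i + 0.5\sum_j A_{i,j} d_j + 0.1\, n^{-1}\sum_j d_j$.

The global term is not captured by $T_i$, so the exposure mapping is misspecified. For this model, every propensity equals 0.25 and $\tau(t,t') = 1 + 0.1/n$ exactly. The bias should stay near zero, the standard deviation should shrink with $n$, and $\sqrt{n}$ times the standard deviation should stay roughly stable.

```python
import random
import statistics

def simulate(n, reps=300, seed=0):
    """Draws of tau_hat((1,1),(0,1)) on a path network of n >= 2 units, D_i iid Bernoulli(0.5).

    Returns (tau, draws), where tau is the exact estimand for the hypothetical outcome
    Y_i(d) = d_i + 0.5 * (treated neighbors) + 0.1 * mean(d).
    """
    rng = random.Random(seed)
    nbrs = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
    t, t2 = (1, 1), (0, 1)
    prop = 0.25  # Pr(T_i = t) = Pr(T_i = t2) = 0.25 for every unit in this design
    tau = 1 + 0.1 / n
    draws = []
    for _ in range(reps):
        D = [int(rng.random() < 0.5) for _ in range(n)]
        mean_D = sum(D) / n
        total = 0.0
        for i in range(n):
            m = sum(D[j] for j in nbrs[i])
            T_i = (D[i], m)
            y = D[i] + 0.5 * m + 0.1 * mean_D          # observed outcome Y_i
            total += ((T_i == t) - (T_i == t2)) / prop * y  # Z_i (equal propensities)
        draws.append(total / n)
    return tau, draws

for n in (100, 400, 1600):
    tau, draws = simulate(n)
    bias = statistics.mean(draws) - tau
    sd = statistics.stdev(draws)
    print(n, round(bias, 4), round(sd, 4), round(sd * n ** 0.5, 3))
```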

*1: Prior work on misspecified exposure mappings includes Chin (2019) and Sävje (2019), both arXiv preprints.

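*2: Since $Y_i = \sum_{\mathbf{d}} Y_i(\mathbf{d}) \mathbf{1}\{\mathbf{D} = \mathbf{d}\}$ and each $Y_i(\mathbf{d})$ is nonrandom, $\mathbb{E}[Y_i \mid T_i = t] = \sum_{\mathbf{d}} Y_i(\mathbf{d}) \Pr(\mathbf{D} = \mathbf{d} \mid T_i = t) = \mu_i(t)$.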

*3: In $\mathbf{D}^{(s)}$, the treatments of units more than $s$ away from $i$, $D'_{-\mathcal{N}_{\mathbf{A}}(i, s)}$, are redrawn at random.

*4: Kojevnikov, Marmer & Song (202x), Limit theorems for network dependent random variables, Journal of Econometrics.