Gold, Lederer & Tao (2020), "Inference for high-dimensional instrumental variables regression," Journal of Econometrics.

Introduction

  • Model:
    •  \mathbf{y} = \mathbf{X}\beta + \mathbf{u},  \beta = (\beta_j)_{j = 1}^{p_x}
    •  E[\mathbf{u} \mid \mathbf{Z}] = 0
    • Doubly high-dimensional setting: both  \mathbf{X} and  \mathbf{Z} are high-dimensional.
  • Goal: statistical inference on the individual coefficients  \beta_j.
  • The paper considers a two-stage de-biased LASSO in the spirit of van de Geer et al. (2014 AoS).
  • Belloni et al. (2018 arXiv) study a similar setting, but their method is a different one, based on Neyman orthogonality.

Two-stage estimation

  • For each  i = 1, \ldots, n,

\begin{align*}y_i & = x_i^\top \beta + u_i \\x_{ij} & = z_i^\top \alpha^{j} + v_{ij}\end{align*}

  • Exclusion restriction:  E[u_i \mid z_i ] = 0,  E[ v_i \mid z_i ] = 0.
  • In matrix form,

\begin{align*} \mathbf{y} & = \mathbf{X}\beta + \mathbf{u} \\  \mathbf{X} & = \mathbf{Z}\mathbf{A} + \mathbf{V} \equiv   \mathbf{D} + \mathbf{V} \end{align*}

where  \mathbf{X} \in \mathbb{R}^{n \times p_x},  \mathbf{D} = E[\mathbf{X} \mid \mathbf{Z}] \in \mathbb{R}^{n \times p_x},  \mathbf{Z} \in \mathbb{R}^{n \times p_z}, and  \mathbf{A} \in \mathbb{R}^{p_z \times p_x}.

  •  p_x \le p_z for identification.
  • Let  \hat{\Sigma}_z = \mathbf{Z}^\top \mathbf{Z}/n.

Assumption 2.2:  z_i are i.i.d. sub-Gaussian and satisfy  E(z_i) = 0.

Assumption 2.4:  \mathbf{v}^j and  \mathbf{u} are sub-Gaussian.
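A minimal simulation of this data-generating process may help fix notation; all dimensions, sparsity levels, and coefficient values below are illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_x, p_z = 200, 50, 80          # p_x <= p_z, both large relative to n

# Sparse first-stage coefficient matrix A (p_z x p_x): 3 active instruments per column
A = np.zeros((p_z, p_x))
for j in range(p_x):
    A[rng.choice(p_z, 3, replace=False), j] = 1.0

Z = rng.standard_normal((n, p_z))  # E[z_i] = 0, sub-Gaussian (Assumption 2.2)
V = rng.standard_normal((n, p_x))  # first-stage errors
u = 0.5 * V[:, 0] + rng.standard_normal(n)  # endogeneity: u correlated with V

beta = np.zeros(p_x)
beta[:3] = [1.0, -1.0, 0.5]        # sparse structural coefficients

D = Z @ A                          # D = E[X | Z]
X = D + V                          # X = ZA + V
y = X @ beta + u
```

The correlation between `u` and `V` is what makes OLS of `y` on `X` inconsistent and motivates the instruments.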

  • First-stage estimator:  \hat{\alpha}^j \equiv \hat{\alpha}(x^j, \mathbf{Z}),  \hat{\mathbf{A}} = (\hat{\alpha}^1, \ldots, \hat{\alpha}^{p_x}),  \hat{\mathbf{D}} = \mathbf{Z}\hat{\mathbf{A}}.
  • Let  \hat{\Sigma}_d = \hat{\mathbf{D}}^\top \hat{\mathbf{D}}/n.
  • Second-stage estimator:  \hat{\beta} \equiv \hat{\beta}(\mathbf{y}, \hat{\mathbf{D}}).

One-step update

  • First consider the case where  \mathbf{y} = \mathbf{X}\beta + \mathbf{u} is estimated directly by OLS (no instruments):

 \tilde{\beta} = \hat{\beta} + \hat{\Theta} \mathbf{X}^\top (\mathbf{y} - \mathbf{X}\hat{\beta})/n, where  \hat{\Theta} is the inverse of  \hat{\Sigma}_x = \mathbf{X}^\top \mathbf{X}/n.

  • If  \hat{\Sigma}_x is nonsingular and  \hat{\beta} is the OLS estimator, the second term is zero by the OLS normal equations. When  p_x > n, however,  \hat{\Sigma}_x is not invertible, so substituting  \mathbf{y} = \mathbf{X}\beta + \mathbf{u} into the update gives

 \tilde{\beta} = \beta + \hat{\Theta} \mathbf{X}^\top \mathbf{u}/n + \underbrace{(\hat{\Theta}\hat{\Sigma}_x - I_{p_x} )(\beta - \hat{\beta})}_{f/\sqrt{n}}.

 \Longrightarrow \sqrt{n}(\tilde{\beta}_j - \beta_j) = \frac{1}{\sqrt{n}} \sum_{i = 1}^n \hat{\theta}_j^\top x_i u_i + f_j.

  • It then suffices to show that  f = o_P(1).
  • When  \hat{\beta} is the LASSO,  \tilde{\beta} is called the "desparsified LASSO" or "debiased LASSO."
  • One-step update for the IV model:  \tilde{\beta} = \hat{\beta} + \hat{\Theta} \hat{\mathbf{D}}^\top (\mathbf{y} - \mathbf{X}\hat{\beta})/n, where  \hat{\Theta} is an estimator of  \Theta = \Sigma_d^{-1}.
  • Lemma 3.1:  \sqrt{n}(\tilde{\beta} - \beta) = \Theta \mathbf{D}^\top \mathbf{u}/\sqrt{n} + \text{remainders}.
  • How to construct  \hat{\Theta}? When  p_x > n,  \hat{\Sigma}_d is singular. => Use the CLIME estimator of Cai et al. (2011 JASA).
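As a sanity check on the one-step update formula: in a low-dimensional setting where  \hat{\Sigma}_x is invertible and  \hat{\Theta} is its exact inverse, the update maps any initial estimator to OLS exactly (the data and initial estimator below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ beta + 0.1 * rng.standard_normal(n)

Sigma_x = X.T @ X / n
Theta = np.linalg.inv(Sigma_x)          # exact inverse, available since p < n

beta_init = rng.standard_normal(p)      # any initial estimator
beta_tilde = beta_init + Theta @ X.T @ (y - X @ beta_init) / n

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Algebraically, `beta_tilde` collapses to  (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y} regardless of `beta_init`; the high-dimensional case differs only in that  \hat{\Theta} can be no more than an approximate inverse, leaving the remainder  f.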

Theorem 3.4: Under several high-level conditions,  \sqrt{n}(\tilde{\beta}_j - \beta_j)/\omega_j \rightsquigarrow N(0,1).
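The CLIME estimator mentioned above solves, for each column  j, the linear program  \min \|\theta\|_1 subject to  \|\hat{\Sigma}\theta - e_j\|_\infty \le \lambda. A simplified sketch (omitting CLIME's final symmetrization step; matrix and  \lambda are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def clime(Sigma, lam):
    """Column-wise CLIME: min ||theta||_1  s.t.  ||Sigma @ theta - e_j||_inf <= lam.
    Solved as an LP in (u, v) with theta = u - v, u >= 0, v >= 0."""
    p = Sigma.shape[0]
    Theta = np.zeros((p, p))
    c = np.ones(2 * p)                                # objective: sum(u) + sum(v) = ||theta||_1
    A_ub = np.block([[Sigma, -Sigma], [-Sigma, Sigma]])
    for j in range(p):
        e = np.zeros(p)
        e[j] = 1.0
        # Sigma @ theta <= lam + e_j  and  -Sigma @ theta <= lam - e_j
        b_ub = np.concatenate([lam + e, lam - e])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub)        # bounds default to (0, None)
        Theta[:, j] = res.x[:p] - res.x[p:]
    return Theta

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Theta_hat = clime(Sigma, lam=0.01)   # close to inv(Sigma) when lam is small
```

For small  \lambda and a well-conditioned  \Sigma the solution is near  \Sigma^{-1}; the point of CLIME is that the LP remains well-defined even when  \hat{\Sigma}_d is singular.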

Two-stage LASSO

  • First-stage estimator:  \hat{\alpha}^j \equiv  \arg\min_{\alpha} || x^j - \mathbf{Z} \alpha||_2^2/(2n) + r_j ||\alpha||_1.
  • Second-stage estimator:  \hat{\beta} \equiv  \arg\min_{b} || \mathbf{y} - \hat{\mathbf{D}} b ||_2^2/(2n) + r_\beta || b ||_1.
  • The rest of the paper concerns how to choose the tuning parameters  r_j, \; r_\beta so that the conditions of Theorem 3.4 hold, the compatibility condition (see Ch. 6, Buhlmann and van de Geer, 2011), etc.
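The two LASSO stages can be sketched with scikit-learn, whose `Lasso` objective  \tfrac{1}{2n}\|y - Xw\|_2^2 + \alpha\|w\|_1 matches the normalization above (with `alpha` playing the role of  r_j or  r_\beta); the simulated design and penalty levels are illustrative, not the paper's recommended choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p_x, p_z = 300, 10, 15
A = np.zeros((p_z, p_x))
for j in range(p_x):
    A[rng.choice(p_z, 2, replace=False), j] = 1.5
Z = rng.standard_normal((n, p_z))
V = 0.5 * rng.standard_normal((n, p_x))
X = Z @ A + V
beta = np.zeros(p_x)
beta[0], beta[1] = 2.0, -1.5
u = 0.5 * V[:, 0] + 0.5 * rng.standard_normal(n)   # endogenous error
y = X @ beta + u

# First stage: LASSO of each x^j on Z, giving A_hat and D_hat = Z @ A_hat
A_hat = np.column_stack(
    [Lasso(alpha=0.05).fit(Z, X[:, j]).coef_ for j in range(p_x)])
D_hat = Z @ A_hat

# Second stage: LASSO of y on the fitted instruments D_hat
beta_hat = Lasso(alpha=0.05).fit(D_hat, y).coef_
```

Note this gives the initial estimator  \hat{\beta} only; the inference results require the one-step update  \tilde{\beta} with a CLIME-type  \hat{\Theta} on top of it.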