03 — Onatski (2010)

Determining the Number of Factors from Empirical Distribution of Eigenvalues

Review of Economics and Statistics 92(4), 1004–1016 · Kei Matsumae · 2026-05-15

What this paper does

The Bai-Ng (2002) information criteria (IC) work when factors are strong. When factors are weak (eigenvalues bounded or growing slowly), BN tends to over-select.
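Onatski proposes an estimator that does not need the strong-factor assumption: the edge distribution (ED) estimator. These notes also cover the eigenvalue-ratio (ER) selector, which the literature usually credits to Ahn and Horenstein (2013) rather than to this paper.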
ER is simple: $\lambda_k / \lambda_{k+1}$ peaks at $k = K$.
Uses random matrix theory (Marchenko-Pastur) for the null distribution.
Direct ancestor of CRW (2023)'s $\hat K$ selector.
For AOF: the right $K$-selector for weak-factor regimes, which likely means the Japan small-stock universe, and for rolling-window estimation in any market.

1. Why this paper exists

Bai-Ng (2002) assumes strong factors: the top-$K$ eigenvalues of the sample covariance grow linearly with $N$, while the rest stay bounded. In practice, especially in finance, factors can be "weak": they matter for the cross-section, but their eigenvalues stay bounded or grow slowly, which shrinks the gap between signal and noise eigenvalues.

Under weak factors:

BN penalties pick up noise eigenvalues as factors → over-selection.
Bai (2003) asymptotic distributions can fail.
Need a procedure based on the shape of the eigenvalue distribution, not the level.

Random matrix theory gives the tool. Marchenko-Pastur (1967) describes the limiting eigenvalue distribution of a sample covariance from i.i.d. noise. Onatski uses it as the null.

2. Setup

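Same approximate factor model: $X_{it} = \lambda_i' F_t + e_{it}$, with $i = 1, \dots, N$ and $t = 1, \dots, T$. Stack the data into a $T \times N$ matrix $X$ and define the $N \times N$ sample covariance $S = X'X/T$ with eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_N$. Note the overloaded symbol: $\lambda_i$ in the model equation is a loading vector, while $\lambda_k$ with a rank subscript is a scalar eigenvalue of $S$.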
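A minimal sketch of this setup, assuming Gaussian factors, loadings and noise purely for illustration. The helper names `simulate_panel` and `sample_eigenvalues` and the `strength` knob are hypothetical and do not come from the paper.

```python
import numpy as np

def simulate_panel(N, T, K, strength=1.0, sigma=1.0, seed=0):
    """Draw a T x N panel X = F @ Lam.T + e with K factors.

    Gaussian draws are an illustrative assumption, not part of the model.
    """
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, K))               # factors, T x K
    Lam = strength * rng.standard_normal((N, K))  # loadings, N x K
    e = sigma * rng.standard_normal((T, N))       # idiosyncratic noise, T x N
    return F @ Lam.T + e

def sample_eigenvalues(X):
    """Eigenvalues of S = X'X / T, sorted in descending order."""
    T = X.shape[0]
    S = X.T @ X / T
    # eigvalsh returns ascending eigenvalues of a symmetric matrix
    return np.linalg.eigvalsh(S)[::-1]

X = simulate_panel(N=100, T=400, K=3)
eig = sample_eigenvalues(X)
print(eig[:5])  # three large "signal" eigenvalues, then the noise bulk
```

With `strength=1.0` the three factor eigenvalues are of order $N$, so this is the strong-factor case. Lowering `strength` is one way to move toward the weak-factor regime.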

3. Key insight — Marchenko-Pastur null

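Under "no factors, i.i.d. idiosyncratic errors with variance $\sigma^2$", as $N, T \to \infty$ with $N/T \to c \in (0, \infty)$, the empirical eigenvalue distribution of $S$ converges to the Marchenko-Pastur (MP) law, whose continuous part is supported on $[\sigma^2(1 - \sqrt c)^2,\, \sigma^2(1 + \sqrt c)^2]$. When $c > 1$ the law also puts mass $1 - 1/c$ at zero, because $S$ then has rank at most $T < N$.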

So without factors, eigenvalues bunch in a known continuous interval. With factors that are strong enough relative to the noise, the top-$K$ eigenvalues separate from the bulk and float above $\sigma^2(1 + \sqrt c)^2$.

Imagine plotting the sorted eigenvalues. Under no factors, they form a smooth distribution that fills the MP interval. Each sufficiently strong factor pushes one eigenvalue out of the bulk and above the upper edge, so the number of detectable factors is the number of outliers. A factor that is too weak leaves its eigenvalue inside the bulk, where eigenvalue-based selectors cannot distinguish it from noise.
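The outlier picture can be checked numerically. The sketch below computes the MP edges and counts eigenvalues above the upper edge. It assumes $\sigma^2$ is known and uses a hypothetical 10% safety `margin` to absorb finite-sample fluctuation at the edge, so it illustrates the idea and is not the paper's calibrated procedure.

```python
import numpy as np

def mp_edges(N, T, sigma2=1.0):
    """Lower and upper edges of the Marchenko-Pastur bulk for c = N / T."""
    root_c = np.sqrt(N / T)
    return sigma2 * (1 - root_c) ** 2, sigma2 * (1 + root_c) ** 2

def count_outliers(eigvals, N, T, sigma2=1.0, margin=0.10):
    """Count eigenvalues above the MP upper edge, inflated by a margin.

    `margin` is an illustrative tuning knob, not a quantity from the paper.
    """
    _, upper = mp_edges(N, T, sigma2)
    return int(np.sum(np.asarray(eigvals) > upper * (1 + margin)))

rng = np.random.default_rng(1)
N, T = 200, 800                       # c = 0.25, so the edges are 0.25 and 2.25
noise = rng.standard_normal((T, N))
eig_noise = np.linalg.eigvalsh(noise.T @ noise / T)[::-1]

# Same noise plus two strong factors
F = rng.standard_normal((T, 2))
Lam = rng.standard_normal((N, 2))
X = F @ Lam.T + noise
eig_factor = np.linalg.eigvalsh(X.T @ X / T)[::-1]

lower, upper = mp_edges(N, T)
n_out_noise = count_outliers(eig_noise, N, T)
n_out_factor = count_outliers(eig_factor, N, T)
print(lower, upper, n_out_noise, n_out_factor)
```

The pure-noise panel should show no outliers and the two-factor panel should show two. In real data $\sigma^2$ is unknown and the errors are not i.i.d., which is why the paper works with the shape of the spectrum near the edge instead of a fixed cutoff.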

4. The two tests

4.1 ED (Edge Distribution) test

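ED asks whether $K$ exceeds a candidate $k_0$ by looking at how far the $(k_0 + 1)$-th sample eigenvalue sits from the upper edge of the bulk. In the paper this is implemented as an estimator rather than a formal test: $\hat K(\delta) = \max\{k \leq k_{\max} : \lambda_k - \lambda_{k+1} \geq \delta\}$, with $\hat K(\delta) = 0$ if no gap reaches $\delta$. The threshold $\delta$ is calibrated from the sample eigenvalues near the edge, so $\sigma^2$ does not need to be known. The Tracy-Widom distribution from random matrix theory governs the fluctuations of the largest noise eigenvalues; the formal Tracy-Widom-calibrated test of $H_0: K = k_0$ is in the companion paper, Onatski (2009).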
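A minimal sketch of the gap form in which the ED estimator is usually written, $\hat K(\delta) = \max\{k \leq k_{\max} : \lambda_k - \lambda_{k+1} \geq \delta\}$. The threshold `delta` is passed in by hand here, which is an assumption made to keep the example short; the paper calibrates it from the eigenvalues near the edge, and that step is not reproduced.

```python
import numpy as np

def ed_select(eigvals, k_max, delta):
    """Largest k <= k_max whose gap lambda_k - lambda_{k+1} is at least delta.

    Returns 0 when no gap reaches delta. `delta` is supplied by the caller
    in this sketch instead of being calibrated from the data.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    gaps = lam[:k_max] - lam[1:k_max + 1]   # gaps[k-1] = lambda_k - lambda_{k+1}
    hits = np.nonzero(gaps >= delta)[0]
    return int(hits[-1]) + 1 if hits.size else 0

toy = [10.0, 9.0, 8.0, 1.0, 0.9, 0.8]       # three separated values, then a tight cluster
k_hat = ed_select(toy, k_max=4, delta=0.5)
k_none = ed_select(toy, k_max=4, delta=20.0)
print(k_hat, k_none)
```

Unlike a ratio rule that searches over $k \geq 1$, this rule can return zero factors.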

4.2 ER (Eigenvalue Ratio) test — easier and what CRW uses

$$ ER(k) = \frac{\lambda_k}{\lambda_{k+1}}, \qquad \hat K = \arg\max_{1 \leq k \leq k_{\max}} ER(k). $$

If true $K = 3$: $\lambda_3$ is a "signal" eigenvalue, $\lambda_4$ is "noise" — their ratio is large. For $k \neq 3$, numerator and denominator are the same kind, so the ratio is moderate.

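Consistency of the ratio estimator is established by Ahn and Horenstein (2013). Onatski's own consistency result is for the ED estimator, and it holds under weaker conditions than BN, including the weak-factor regime.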
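The ratio rule is short enough to write out. A minimal sketch, assuming the eigenvalues are already computed and that `k_max` (a user-chosen cap, smaller than the number of eigenvalues) is given; the function name `er_select` is hypothetical.

```python
import numpy as np

def er_select(eigvals, k_max):
    """Return (k_hat, ratios) with ratios[k-1] = lambda_k / lambda_{k+1}."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    if k_max >= lam.size:
        raise ValueError("k_max must be smaller than the number of eigenvalues")
    ratios = lam[:k_max] / lam[1:k_max + 1]
    return int(np.argmax(ratios)) + 1, ratios

# Toy spectrum with true K = 3: three "signal" values, then a noise cluster
toy = [10.0, 9.0, 8.0, 1.0, 0.9, 0.8]
k_hat, ratios = er_select(toy, k_max=4)
print(k_hat, np.round(ratios, 3))
```

Here the ratio at $k = 3$ is $8/1 = 8$, while the other ratios stay near 1.1, so the argmax lands on 3. Because the search runs over $1 \leq k \leq k_{\max}$, the rule as written can never return $\hat K = 0$.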

5. Empirical findings

Monte Carlo: Onatski's estimator outperforms BN when factor strengths are heterogeneous (some strong, some borderline). On Stock-Watson macro data, it selects fewer factors than BN. Applied to US stock returns (FF25 and individual stocks), it finds 1–3 factors depending on the universe, which matches CRW's later finding of 1–2 factors on individual stocks.

6. Connection to other papers in this series

Bai-Ng (2002) — use when factors are strong. ER is the weak-factor counterpart.
Bai (2003) — relies on strong-factor assumption; under weak factors its CIs are wrong.
CRW (2023) — direct adaptation of ER on the slope matrix $\hat\Gamma$ (size $J \times T$) instead of the raw covariance. Proves consistency for fixed $T$. This is the critical jump enabling rolling sub-sample analysis.
Kelly-Pruitt-Su (2019) IPCA — uses BN-style criteria, possibly over-selecting (they advocate 5; CRW's ER finds 1–2). Some of the gap is which selector you trust.

7. What this gives the AOF model

ER is the right selector when any of:

Factors are weak (small/micro caps; sector sub-panels)
$T$ is small (rolling windows)
You want to be conservative about over-selection (extra factors → more loadings to estimate → more noise)

Use case	Selector
US full panel, large $N$ and $T$	BN $\text{IC}_{p2}$ (default; ER as robustness check)
US large-cap subset	Either; agreement is good signal
US micro-cap subset	ER (weak factors likely)
JP full panel	ER (factors less strong than US in our priors)
Any rolling window ($T < 120$)	ER (CRW fixed-$T$ result applies)
Sector deep-dives (energy, biotech)	ER

Report both BN and ER selections side-by-side as a diagnostic — agreement strengthens the claim; disagreement is a red flag worth investigating.
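The side-by-side diagnostic can be sketched as below. It assumes the panel is already demeaned and standardised, uses the standard BN form $\text{IC}_{p2}(k) = \ln V(k) + k \frac{N+T}{NT} \ln \min(N, T)$, and runs on a simulated strong-factor panel. The function names and the simulated design are illustrative, not AOF code.

```python
import numpy as np

def sorted_eigenvalues(X):
    """Eigenvalues of X'X / T in descending order, for a T x N panel X."""
    T = X.shape[0]
    return np.linalg.eigvalsh(X.T @ X / T)[::-1]

def select_bn_icp2(X, k_max):
    """Bai-Ng IC_p2 over k = 0..k_max."""
    T, N = X.shape
    eig = sorted_eigenvalues(X)
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    # V(k) is the mean squared residual after removing k principal components,
    # which equals the sum of the remaining eigenvalues divided by N.
    ic = [np.log(eig[k:].sum() / N) + k * penalty for k in range(k_max + 1)]
    return int(np.argmin(ic))

def select_er(X, k_max):
    """Eigenvalue-ratio selector over k = 1..k_max."""
    eig = sorted_eigenvalues(X)
    ratios = eig[:k_max] / eig[1:k_max + 1]
    return int(np.argmax(ratios)) + 1

# Simulated panel with three strong factors (illustrative design)
rng = np.random.default_rng(42)
T, N, K = 400, 100, 3
X = (rng.standard_normal((T, K)) @ rng.standard_normal((N, K)).T
     + rng.standard_normal((T, N)))

k_bn = select_bn_icp2(X, k_max=8)
k_er = select_er(X, k_max=8)
status = "agree" if k_bn == k_er else "DISAGREE: investigate"
print(f"BN IC_p2: {k_bn}  ER: {k_er}  ({status})")
```

On this strong-factor design both selectors should return 3. The diagnostic earns its keep on the weak-factor and short-window panels in the table above, where the two rules can part ways.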

8. Reading next

CRW (2023) — direct sequel; ER selector on slope matrix, fixed-$T$ regime.
Connor & Linton (2007) — semiparametric factor models in finance.

