02 — Bai (2003)

Inferential Theory for Factor Models of Large Dimensions

Econometrica 71(1), 135–171 · Kei Matsumae · 2026-05-15

What this paper does

Companion to Bai-Ng (2002): given $K$, how confident should we be in the PCA-estimated factors $\hat F_t$ and loadings $\hat\lambda_i$?
Derives rates of convergence and asymptotic normality for both, when $N, T \to \infty$ jointly.
Factors identified only up to a $K\times K$ rotation $H$ — what's consistent is $\hat F_t$ for $H'F_t$ and $\hat\lambda_i$ for $H^{-1}\lambda_i$. Common component $\hat\lambda_i'\hat F_t$ converges to $\lambda_i'F_t$ with no rotation indeterminacy.
Rates: $\sqrt N$ for factors, $\sqrt T$ for loadings, and $\min(\sqrt N, \sqrt T)$ for the common component.
For AOF: analytical SEs for factor returns in the US large-$T$ regime; bootstrap (CRW) for small-$T$ JP rolling windows.

1. Why this paper exists

After Bai-Ng (2002) you know how many factors to pull out. But you don't yet know:

How accurate is $\hat F_t$? Can I put error bars on factor returns?
How accurate is $\hat\lambda_i$? Can I test "is $\lambda_i$ different from zero"?
How does the rotation indeterminacy affect inference?

Bai (2003) answers all three. It's the inference layer for PCA factor estimation in the same approximate factor model.

1.5. Background — what's an "approximate factor model"?

Bai's paper uses the phrase "approximate factor model" as if everyone knows what it means. Worth pinning down before the formal setup. Asset-pricing models come in three flavours of increasing realism:

(1) One-factor / CAPM-style

$$ r_{it} = \alpha_i + \beta_i\, R_{mt} + \varepsilon_{it} $$

One observed factor (market return), constant loading $\beta_i$ per stock, constant intercept $\alpha_i$. Classical assumption: $\varepsilon_{it}$ is uncorrelated across stocks and i.i.d. over time, so the noise covariance $\text{Cov}(\varepsilon)$ is a diagonal matrix: no stock's idiosyncratic shock is correlated with any other's.

(2) Multi-factor (Fama-French and descendants)

$$ r_{it} = \alpha_i + \beta_i'\, F_t + \varepsilon_{it} $$

$K$ observed factors (market, size, value, momentum, profitability, …). Still assumes $\varepsilon_{it}$ are cross-sectionally uncorrelated — factors are supposed to capture all the comovement.

Clean on the chalkboard. Doesn't survive real data: even after controlling for FF5, residuals across stocks remain correlated — sector-specific news, supply-chain shocks, fund-flow effects all leak through.

(3) Approximate factor model (Chamberlain & Rothschild 1983) — what Bai uses

Same equation form, but the noise assumption is relaxed:

$$ X_{it} = \lambda_i'\, F_t + e_{it} $$

$e_{it}$ is allowed to have weak cross-sectional and serial correlation — sector spillovers OK, mild persistence OK.
Formal condition: the largest eigenvalue of the $N \times N$ noise covariance matrix $\text{Cov}(e_t)$ stays bounded as $N$ grows.
Meanwhile, the factor part behaves differently: the $N \times N$ covariance of the common components, $\Lambda\,\text{Cov}(F_t)\,\Lambda'$, has top-$K$ eigenvalues that scale linearly in $N$.

The word "approximate" refers to this relaxation. Ross's "exact" APT (1976) demanded uncorrelated $e_{it}$; Chamberlain-Rothschild's "approximate" version allows weak correlation but draws the line at the eigenvalue-scaling gap above.
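A minimal numerical sketch of the two eigenvalue conditions. Assumptions (not from the paper): one factor with unit variance and every loading equal to 1, so the common-component covariance is the all-ones matrix, and a hypothetical banded noise covariance $\rho^{|i-j|}$ with $\rho = 0.5$ standing in for weak spillovers between neighbouring stocks. The factor eigenvalue equals $N$ exactly, while the noise eigenvalue creeps up but stays below $(1+\rho)/(1-\rho) = 3$ however large $N$ gets.

```python
import numpy as np

def top_eigenvalue(cov: np.ndarray) -> float:
    """Largest eigenvalue of a symmetric covariance matrix."""
    return float(np.linalg.eigvalsh(cov)[-1])   # eigvalsh sorts ascending

def factor_cov(n: int) -> np.ndarray:
    """Common-component covariance for K = 1, unit factor variance, all loadings = 1 (assumption)."""
    lam = np.ones((n, 1))
    return lam @ lam.T

def banded_noise_cov(n: int, rho: float = 0.5) -> np.ndarray:
    """Hypothetical weakly correlated noise: Cov(e_i, e_j) = rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

for n in (10, 100, 1000):
    # factor eigenvalue grows like N; noise eigenvalue stays below (1 + rho) / (1 - rho) = 3
    print(n, round(top_eigenvalue(factor_cov(n)), 6), round(top_eigenvalue(banded_noise_cov(n)), 3))
```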

Why finance needs the relaxation. In equity panels, residuals are always weakly correlated; that is just reality. An exact factor model has no room for this: any leftover residual correlation has to be explained by adding more factors, so forcing exact-factor assumptions onto real data misspecifies the model and distorts both the factor count and the estimated factors. The approximate-factor relaxation tolerates weak residual correlation, so identification through the eigenvalue gap still works.

Vocabulary that follows

This eigenvalue-scaling gap is the identification source when $K$ is unknown and $F_t$ is latent. From here, the standard terminology:

Term	Meaning
Strong factors	Top-$K$ eigenvalues scale linearly in $N$. Signal grows with universe; PCA picks them out cleanly.
Weak factors	Top-$K$ eigenvalues bounded or growing slowly. S/N stays bounded; standard PCA-based criteria can mis-select $K$. (Onatski 2010 designs an estimator for this regime.)
Common component	$\lambda_i' F_t$ — the part factors explain.
Idiosyncratic component	$e_{it}$ — the rest. Weak correlation allowed, independence not assumed.
Large-$N$, large-$T$ asymptotics	Both panel dimensions grow. Large $N$ identifies factors via the cross-section (top eigenvalues separate from the bulk). Large $T$ identifies loadings via the time series (each $\lambda_i$ is estimated from $T$ monthly observations).

Why this matters for AOF

CAPM and Fama-French sit in case (2): factors are pre-specified and named. Approximate factor models sit in case (3) and do not pre-specify what the factors are; they are whatever the top-$K$ eigenvalues of the return covariance pick out. The model is agnostic about factor meaning but cleanly identifies how many factors there are and how each stock loads on them, up to the rotation discussed in §3.

When AOF runs PCA-style estimators (regressed-PCA, IPCA), we are in case (3). The papers in this learning base all work under approximate-factor assumptions and use the eigenvalue gap as their identification source. The rest of Bai's paper builds the inference layer on top of this setup.

1.6. Unpacking the basics — what those terms actually mean

§1.5 still relies on terms (covariance matrix, eigenvalue, AIC/BIC, "consistent") that need their own ground-up explanation. Here is each, with finance-concrete examples.

(a) "Noise covariance $\text{Cov}(\varepsilon)$ is a diagonal matrix" — what does that actually say?

A covariance matrix for $N$ stocks' noise terms is an $N \times N$ grid:

The diagonal entry $(i, i)$ = stock $i$'s own variance — the size of its idiosyncratic noise.
The off-diagonal entry $(i, j)$ = covariance between stock $i$'s noise and stock $j$'s noise — how much they move together after factors are stripped out.

A diagonal matrix has all off-diagonals = 0 — i.e., no two stocks' noises are correlated. Strong claim.

Finance intuition. Are Toyota's and Sony's company-specific shocks (earnings surprises, scandals, new-product cycles) truly independent? In real data, no: JPY-USD moves, BoJ policy and the global business cycle leak through as "factor-residual" effects that hit both at once. So in practice the diagonal assumption breaks. Chamberlain-Rothschild relaxed it: residual correlation strong enough to act like an extra factor is still ruled out, but weak correlation between residuals is fine.
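To see what a non-diagonal residual covariance looks like, here is a small simulation. Everything in it is hypothetical: `toyota`, `sony` and `utility` are simulated series, and the omitted shock `fx` stands in for a JPY-USD move that a market-only model leaves out. After regressing each stock on the market, the two exporters' residuals stay clearly correlated, while the third stock's residuals are roughly uncorrelated with theirs.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 600                                   # months (hypothetical)
mkt = rng.normal(0.0, 4.0, T)             # observed market factor
fx = rng.normal(0.0, 1.0, T)              # omitted shared shock, a stand-in for a JPY-USD move

# Two exporters share the omitted shock; the third (hypothetical) stock does not.
toyota = 1.0 * mkt + 1.0 * fx + rng.normal(0.0, 1.0, T)
sony = 1.2 * mkt + 1.0 * fx + rng.normal(0.0, 1.0, T)
utility = 0.5 * mkt + rng.normal(0.0, 1.0, T)

def market_residuals(r: np.ndarray, m: np.ndarray) -> np.ndarray:
    """OLS of r on [1, m]; the residuals play the role of epsilon_it in a CAPM-style model."""
    Z = np.column_stack([np.ones_like(m), m])
    coef, *_ = np.linalg.lstsq(Z, r, rcond=None)
    return r - Z @ coef

E = np.column_stack([market_residuals(s, mkt) for s in (toyota, sony, utility)])
resid_corr = np.corrcoef(E, rowvar=False)  # 3 x 3; a diagonal Cov(eps) would make this ~identity
print(np.round(resid_corr, 2))
```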

(b) "Largest eigenvalue bounded vs. linearly scaling in $N$" — what's an eigenvalue and why does the scaling matter?

An eigenvalue of a covariance matrix is the variance of the data along one principal axis (its eigenvector). The largest eigenvalue = "how big is the most dominant direction in which the data moves".

Eigenvalues of $\text{Cov}(\Lambda F_t) = \Lambda\,\text{Cov}(F_t)\,\Lambda'$, the $N \times N$ covariance of the common components → the magnitudes of the $K$ axes that factors drive.
Eigenvalues of $\text{Cov}(e_t)$ → the magnitudes of all the residual directions.

Why one grows and the other doesn't.

Add a stock. The new stock loads on the same factors as everyone else (market, size, value, …). One more vector pointing in the same direction → variance along the factor direction grows by roughly one stock's worth of factor variance. After $N$ stocks, the factor eigenvalue is of order $N$: linear in $N$.
Add a stock. Its noise is roughly independent of everyone else's. Noise doesn't pile up in any single direction. Each noise eigenvalue stays at the size of one stock's noise. Bounded in $N$.

Consequence. The gap between factor and noise eigenvalues grows with $N$. With 10 stocks you can barely tell signal from noise. With 5,000 stocks, the top factor eigenvalues can be hundreds of times larger than the noise eigenvalues. That gap is what makes PCA work, and it is exactly what the eigenvalue-ratio selector $\hat K$ measures.
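The growing gap can be checked on simulated data. A minimal sketch under hypothetical choices: $K = 3$ strong factors with standard-normal loadings, unit-variance i.i.d. noise and $T = 200$. `eigen_gap` is an illustrative helper that returns the ratio of the $K$-th to the $(K{+}1)$-th eigenvalue of the sample covariance $X'X/T$; the ratio rises steadily as $N$ grows.

```python
import numpy as np

def eigen_gap(n: int, t: int = 200, k: int = 3, seed: int = 0) -> float:
    """Ratio of the k-th to the (k+1)-th eigenvalue of X'X/T for a simulated strong-factor panel."""
    rng = np.random.default_rng(seed)
    F = rng.normal(size=(t, k))          # latent factors
    Lam = rng.normal(size=(n, k))        # every stock loads on every factor (strong factors)
    e = rng.normal(size=(t, n))          # i.i.d. unit-variance idiosyncratic noise
    X = F @ Lam.T + e                    # T x N panel
    # Squared singular values / T are the eigenvalues of X'X/T, already in descending order.
    eig = np.linalg.svd(X, compute_uv=False) ** 2 / t
    return float(eig[k - 1] / eig[k])

for n in (10, 100, 1000, 5000):
    print(n, round(eigen_gap(n), 1))
```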

(c) Vocabulary fleshed out with examples

Strong factor. Example: market beta. Every stock loads positively on the market (some more, some less). As $N$ grows, more loadings stack in the same direction → eigenvalue scales linearly.

Weak factor. Example: a niche sector theme like the 2024 AI-CapEx narrative. Meaningful for ~50 stocks, ~zero for the other ~4,950. Because most added stocks load ~zero on it, the eigenvalue barely grows with $N$ and stays small. Standard PCA can mistake it for noise (or noise for it).
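To make the strong/weak contrast concrete, a deterministic sketch with hypothetical loadings: a market-like factor on which every stock loads 1.0, and a theme on which only the first 50 stocks load. With one factor of unit variance the covariance is $\lambda\lambda'$, whose only non-zero eigenvalue is $\lambda'\lambda$: it equals $N$ for the strong factor and stays at 50 for the weak one, whatever $N$ is.

```python
import numpy as np

def top_eigenvalue(loadings: np.ndarray) -> float:
    """Largest eigenvalue of the one-factor covariance lambda lambda' (factor variance = 1).

    lambda lambda' has rank one, so its only non-zero eigenvalue is lambda'lambda."""
    return float(loadings @ loadings)

def strong_loadings(n: int) -> np.ndarray:
    """Market-like factor (hypothetical): every stock loads 1.0."""
    return np.ones(n)

def weak_loadings(n: int, m: int = 50) -> np.ndarray:
    """Niche theme (hypothetical): only the first m stocks load 1.0, the rest load 0."""
    lam = np.zeros(n)
    lam[:min(m, n)] = 1.0
    return lam

for n in (100, 1000, 5000):
    print(n, top_eigenvalue(strong_loadings(n)), top_eigenvalue(weak_loadings(n)))
```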

Common component $\lambda_i' F_t$. Example: of Toyota's 3% return this month, suppose market contributes +0.5%, size factor +0.3%, value factor −0.1%. Total of 0.7% is the common component — explained by factors.

Idiosyncratic component $e_{it}$. The remaining $3.0\% - 0.7\% = 2.3\%$ — Toyota-specific news (CEO change, EV announcement, recall). Not assumed independent of Honda or Nissan — same-industry residual correlation is allowed.
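The decomposition is just a dot product plus a subtraction. The loadings and factor returns below are hypothetical, chosen only so that the per-factor contributions reproduce the example (+0.5%, +0.3%, −0.1%).

```python
import numpy as np

# Hypothetical loadings and factor returns (in %), chosen so the per-factor
# contributions match the example: market +0.5, size +0.3, value -0.1.
loadings = np.array([1.0, 0.5, -0.4])        # lambda_i: market, size, value
factor_returns = np.array([0.5, 0.6, 0.25])  # F_t this month, in %

contributions = loadings * factor_returns    # per-factor pieces of lambda_i' F_t
common = float(contributions.sum())          # common component lambda_i' F_t
total_return = 3.0                           # Toyota's return this month, in %
idiosyncratic = total_return - common        # e_it
print(round(common, 2), round(idiosyncratic, 2))   # 0.7 2.3
```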

Large-$N$, large-$T$ asymptotics. Statistical jargon for "both panel dimensions are big enough that asymptotic theory applies". CRSP US monthly: $N \approx 5{,}000$, $T \approx 700$, both large. J-Quants Japan: $N \approx 4{,}000$, $T \approx 200$, both large. By contrast, "S&P 500 monthly over 5 years" has $N = 500$ but only $T = 60$: the time dimension is too short for the large-$T$ approximations to be trusted.

(d) Why Bai-Ng was needed — what scree plot and AIC/BIC don't do

Scree plot. Literally a chart of eigenvalues in descending order. You eyeball where they "elbow" (the kink between fast and slow decay) and pick $\hat K$ at the elbow.

eigenvalue
 ↑
 |  *
 |   *
 |    *  ← elbow at K=3
 |     ·  ·  ·  ·  ·  ·
 |
 +─────────→ rank

Problem: subjective. Two analysts looking at the same scree plot can pick $K=3$ vs. $K=5$ depending on visual judgement. No reproducibility, no formal test.

AIC / BIC. Standard model-selection criteria from time-series econometrics: minimise error + penalty × parameter-count.

Problem: AIC/BIC were derived assuming "$N$ fixed, $T \to \infty$" or fully parametric models. In factor models both $N$ and $T$ grow, and residuals are allowed to be weakly correlated. Standard AIC/BIC penalties, which depend on $N$ or $T$ alone, do not match the asymptotic behaviour of $V(k)$, the average squared residual after fitting $k$ factors, in this regime. Result: standard AIC/BIC can over-select (drifting to $k = k_{\max}$) or under-select (drifting to $k = 0$); they are not consistent.

"Consistent" criterion. Statistical term: as $N, T \to \infty$, $\Pr(\hat K = K) \to 1$. Bai-Ng (2002) give the first criteria provably satisfying this in the approximate-factor regime. Their innovation: a new rate condition, under which the penalty $g(N,T)$ must go to zero but more slowly than $\min(N,T)^{-1}$ (i.e. $\min(N,T)\,g(N,T) \to \infty$), and the $\text{PC}_p$ / $\text{IC}_p$ families designed to satisfy it.

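A compact sketch of one member of the $\text{IC}_p$ family, $\text{IC}_{p2}(k) = \ln V(k) + k\,\frac{N+T}{NT}\ln\min(N,T)$. Its penalty goes to zero while $\min(N,T)$ times the penalty diverges, so it meets the rate condition. Assumptions: the panel is already demeaned (the simulated one has mean zero), $k_{\max} = 8$ is an arbitrary cap, and $V(k)$ comes from the rank-$k$ SVD fit, which coincides with the PCA fit. `residual_variances` and `select_k_icp2` are illustrative names.

```python
import numpy as np

def residual_variances(X: np.ndarray, kmax: int) -> np.ndarray:
    """V(k) for k = 0..kmax: mean squared residual after removing the first k principal components.

    By Eckart-Young, the squared error of the best rank-k fit is the sum of the remaining
    squared singular values, so one SVD covers every k."""
    T, N = X.shape
    s2 = np.linalg.svd(X, compute_uv=False) ** 2
    return np.array([s2[k:].sum() / (N * T) for k in range(kmax + 1)])

def select_k_icp2(X: np.ndarray, kmax: int = 8) -> int:
    """Bai-Ng IC_p2: argmin over k of ln V(k) + k * (N + T) / (N * T) * ln(min(N, T))."""
    T, N = X.shape
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    ic = np.log(residual_variances(X, kmax)) + penalty * np.arange(kmax + 1)
    return int(np.argmin(ic))

# Usage on a simulated panel with K = 3 strong factors (hypothetical sizes).
rng = np.random.default_rng(0)
T, N, K = 150, 200, 3
X = rng.normal(size=(T, K)) @ rng.normal(size=(N, K)).T + rng.normal(size=(T, N))
print(select_k_icp2(X))
```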
2. The model

$$ X_{it} = \lambda_i' F_t + e_{it}, \qquad i=1,\dots,N,\; t=1,\dots,T. $$

Assumptions: $K$ known (or consistently estimated with Bai-Ng (2002)); factors "strong" ($\Lambda'\Lambda/N$ converges to a positive-definite matrix, so its eigenvalues stay bounded away from zero); $e_{it}$ allows weak cross-sectional and serial correlation.

3. The rotation matrix $H$

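Without restrictions, factors are not identified. Stack the data as a $T \times N$ matrix $X = F\Lambda' + e$, where $F$ is $T \times K$ with rows $F_t'$ and $\Lambda$ is $N \times K$ with rows $\lambda_i'$. For any invertible $K \times K$ matrix $H$, $X = (FH)(H^{-1}\Lambda') + e$ is observationally equivalent: row $t$ of $FH$ is $(H'F_t)'$ and column $i$ of $H^{-1}\Lambda'$ is $H^{-1}\lambda_i$. PCA picks a particular normalisation ($\hat F$ is $\sqrt T$ times the top-$K$ eigenvectors of $XX'$, so $\hat F'\hat F/T = I_K$, and $\hat\Lambda = X'\hat F/T$), under which $\hat F_t$ estimates $H'F_t$ with

$$ H = \frac{\Lambda'\Lambda}{N} \, \frac{F'\hat F}{T} \, V_{NT}^{-1}, $$

where $V_{NT}$ is the $K \times K$ diagonal matrix of the top-$K$ eigenvalues of $XX'/(NT)$. $H$ involves the unobserved $F$ and $\Lambda$, so it cannot be computed from data, but it converges to an invertible matrix, which is all the inference results need.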
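A minimal sketch of the PCA step and of $H$, on simulated data so that the true $F$ and $\Lambda$ are known (with real returns they are not, which is why $H$ cannot be computed in practice). `pca_factors` is an illustrative helper, not a library function; the sample sizes and Gaussian draws are hypothetical. It checks the normalisations $\hat F'\hat F/T = I$ and $\hat\Lambda'\hat\Lambda/N = V_{NT}$ (an exact identity once $\hat\Lambda = X'\hat F/T$), and that $\hat F$ is close to $FH$, i.e. row by row $\hat F_t \approx H'F_t$.

```python
import numpy as np

def pca_factors(X: np.ndarray, k: int):
    """PCA estimator on a T x N panel with the normalisation F_hat'F_hat/T = I.

    Returns F_hat (T x k), Lam_hat = X'F_hat/T (N x k) and V (k x k diagonal
    matrix of the top-k eigenvalues of XX'/(NT))."""
    T, N = X.shape
    vals, vecs = np.linalg.eigh(X @ X.T / (N * T))   # eigh returns ascending eigenvalues
    top = np.argsort(vals)[::-1][:k]                  # indices of the k largest
    F_hat = np.sqrt(T) * vecs[:, top]
    Lam_hat = X.T @ F_hat / T
    return F_hat, Lam_hat, np.diag(vals[top])

# Simulated panel (hypothetical sizes); F and Lam are known only because we drew them.
rng = np.random.default_rng(1)
T, N, K = 300, 500, 2
F = rng.normal(size=(T, K))
Lam = rng.normal(size=(N, K))
X = F @ Lam.T + rng.normal(size=(T, N))

F_hat, Lam_hat, V = pca_factors(X, K)
H = (Lam.T @ Lam / N) @ (F.T @ F_hat / T) @ np.linalg.inv(V)   # rotation matrix H
rel_err = np.linalg.norm(F_hat - F @ H) / np.linalg.norm(F_hat)  # small: F_hat_t ~ H'F_t
print(round(float(rel_err), 3))
```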

So when you report $\hat F$, you are reporting an estimate of the rotated factor $H'F_t$, not of $F_t$, and coefficients in regressions on $\hat F$ are rotated accordingly. The common component is different: $(H^{-1}\lambda_i)'(H'F_t) = \lambda_i'F_t$ for every invertible $H$, so the fitted value $\hat\lambda_i'\hat F_t$ estimates a rotation-invariant quantity and $R^2$-style measures are clean.
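A tiny numerical check of the invariance, using an arbitrary invertible $H$ (det = 5.5) and random $F$, $\Lambda$ (all hypothetical). The rotated pair moves the factors themselves but reproduces $F\Lambda'$ exactly, so the first check prints `True` and the second prints `False`.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, K = 5, 4, 2
F = rng.normal(size=(T, K))            # rows are F_t'
Lam = rng.normal(size=(N, K))          # rows are lambda_i'
H = np.array([[2.0, 1.0],
              [0.5, 3.0]])             # arbitrary invertible K x K matrix (det = 5.5)

F_rot = F @ H                          # rows are (H'F_t)'
Lam_rot = Lam @ np.linalg.inv(H).T     # rows are (H^{-1} lambda_i)'

print(np.allclose(F @ Lam.T, F_rot @ Lam_rot.T))   # True: common component unchanged
print(np.allclose(F, F_rot))                        # False: the factors themselves moved
```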

4. Main results

4.1 Factor estimate

$$ \sqrt{N} \,\bigl( \hat F_t - H' F_t \bigr) \;\xrightarrow{d}\; \mathcal{N}(0, \, V_t), $$

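where $V_t$ is computable from cross-sectional moments of $\lambda_i$ and $e_{it}$. The result requires $\sqrt N / T \to 0$, i.e. $T$ must not be too short relative to $N$. Rate $\sqrt N$: more stocks give a sharper factor estimate.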
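A sketch of a plug-in estimator for $V_t$, under an extra assumption beyond the paper's general setup: $e_{it}$ is cross-sectionally uncorrelated but possibly heteroskedastic. In the PCA normalisation it becomes $V_{NT}^{-1}\hat\Gamma_t V_{NT}^{-1}$ with $\hat\Gamma_t = \frac1N\sum_i \hat e_{it}^2\hat\lambda_i\hat\lambda_i'$; with weak cross-sectional correlation a different estimator would be needed. `factor_standard_errors` is an illustrative name, the simulation sizes are hypothetical, and the bands cover $H'F_t$, not $F_t$.

```python
import numpy as np

def pca_factors(X, k):
    """PCA with F_hat'F_hat/T = I; returns F_hat (T x k), Lam_hat (N x k), V (k x k)."""
    T, N = X.shape
    vals, vecs = np.linalg.eigh(X @ X.T / (N * T))
    top = np.argsort(vals)[::-1][:k]
    F_hat = np.sqrt(T) * vecs[:, top]
    return F_hat, X.T @ F_hat / T, np.diag(vals[top])

def factor_standard_errors(X, k):
    """Plug-in SEs for F_hat_t, ASSUMING e_it is cross-sectionally uncorrelated.

    Avar of sqrt(N)(F_hat_t - H'F_t) is estimated by V^{-1} Gamma_t V^{-1} with
    Gamma_t = (1/N) sum_i e_hat_it^2 lam_hat_i lam_hat_i'; SE = sqrt(diag / N)."""
    T, N = X.shape
    F_hat, Lam_hat, V = pca_factors(X, k)
    E = X - F_hat @ Lam_hat.T                  # residuals e_hat, T x N
    V_inv = np.linalg.inv(V)
    se = np.empty((T, k))
    for t in range(T):
        Gamma_t = (Lam_hat * E[t][:, None] ** 2).T @ Lam_hat / N
        se[t] = np.sqrt(np.diag(V_inv @ Gamma_t @ V_inv) / N)
    return F_hat, se

# Simulated panel with hypothetical sizes; i.i.d. noise satisfies the extra assumption.
rng = np.random.default_rng(3)
T, N, K = 300, 500, 2
F, Lam = rng.normal(size=(T, K)), rng.normal(size=(N, K))
X = F @ Lam.T + rng.normal(size=(T, N))
F_hat, se = factor_standard_errors(X, K)
lower, upper = F_hat - 1.96 * se, F_hat + 1.96 * se   # 95% bands for H'F_t, not F_t
```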

4.2 Loading estimate

$$ \sqrt{T} \,\bigl( \hat \lambda_i - H^{-1} \lambda_i \bigr) \;\xrightarrow{d}\; \mathcal{N}(0, \, W_i), $$

$W_i$ comes from time-series moments of $F_t$ and $e_{it}$, and the result requires $\sqrt T / N \to 0$. Rate $\sqrt T$: longer samples give sharper loadings.
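A matching sketch for the loadings, assuming $e_{it}$ is serially uncorrelated (heteroskedasticity allowed). Because $\hat F'\hat F/T = I$, the asymptotic variance of $\sqrt T(\hat\lambda_i - H^{-1}\lambda_i)$ can be estimated by $\hat\Phi_i = \frac1T\sum_t \hat e_{it}^2\hat F_t\hat F_t'$. The joint Wald statistic $T\,\hat\lambda_i'\hat\Phi_i^{-1}\hat\lambda_i$ tests $\lambda_i = 0$ against a $\chi^2_K$ critical value; a joint test is used because single coordinates of $\hat\lambda_i$ depend on the rotation. `loading_inference` is an illustrative name, and the 50 zero-loading stocks are a hypothetical construction to show both outcomes.

```python
import numpy as np

def pca_factors(X, k):
    """PCA with F_hat'F_hat/T = I; returns F_hat (T x k), Lam_hat (N x k), V (k x k)."""
    T, N = X.shape
    vals, vecs = np.linalg.eigh(X @ X.T / (N * T))
    top = np.argsort(vals)[::-1][:k]
    F_hat = np.sqrt(T) * vecs[:, top]
    return F_hat, X.T @ F_hat / T, np.diag(vals[top])

def loading_inference(X, k):
    """Plug-in SEs and joint Wald statistics for each lam_hat_i, ASSUMING e_it is serially uncorrelated.

    Since F_hat'F_hat/T = I, Avar of sqrt(T)(lam_hat_i - H^{-1} lam_i) is estimated by
    Phi_i = (1/T) sum_t e_hat_it^2 F_hat_t F_hat_t'."""
    T, N = X.shape
    F_hat, Lam_hat, _ = pca_factors(X, k)
    E = X - F_hat @ Lam_hat.T
    se = np.empty((N, k))
    wald = np.empty(N)
    for i in range(N):
        Phi_i = (F_hat * E[:, i][:, None] ** 2).T @ F_hat / T
        se[i] = np.sqrt(np.diag(Phi_i) / T)
        # Joint test of lam_i = 0 (equivalently H^{-1} lam_i = 0): chi-square(k) under the null
        wald[i] = T * Lam_hat[i] @ np.linalg.solve(Phi_i, Lam_hat[i])
    return Lam_hat, se, wald

rng = np.random.default_rng(4)
T, N, K = 300, 500, 2
F = rng.normal(size=(T, K))
Lam = rng.normal(size=(N, K))
Lam[:50] = 0.0                      # hypothetical: the first 50 stocks have no factor exposure
X = F @ Lam.T + rng.normal(size=(T, N))
Lam_hat, se, wald = loading_inference(X, K)
CHI2_2_95 = 5.991                   # 95% critical value of chi-square with 2 degrees of freedom
reject = wald > CHI2_2_95
```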

4.3 Common component (rotation-invariant)

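$$ \min(\sqrt N, \sqrt T) \,\bigl( \hat \lambda_i' \hat F_t - \lambda_i' F_t \bigr) \;\xrightarrow{d}\; \mathcal{N}(0, \, U_{it}). $$

$U_{it}$ combines the factor-estimation error (variance of order $1/N$) and the loading-estimation error (variance of order $1/T$), so whichever panel dimension is smaller sets the rate.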
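Putting the two error sources together gives a CI for the common component that needs no knowledge of $H$. A sketch assuming $e_{it}$ is uncorrelated across stocks and over time (heteroskedasticity allowed), with $\widehat{\text{Var}}(\hat C_{it}) = \frac1N\hat\lambda_i' V_{NT}^{-1}\hat\Gamma_t V_{NT}^{-1}\hat\lambda_i + \frac1T \hat F_t'\hat\Phi_i\hat F_t$, where $\hat C_{it} = \hat\lambda_i'\hat F_t$, $\hat\Gamma_t = \frac1N\sum_j \hat e_{jt}^2\hat\lambda_j\hat\lambda_j'$ and $\hat\Phi_i = \frac1T\sum_s \hat e_{is}^2\hat F_s\hat F_s'$. The $1/N$ term is the factor-estimation part and the $1/T$ term the loading-estimation part. `common_component_se` is an illustrative name; simulation sizes are hypothetical.

```python
import numpy as np

def pca_factors(X, k):
    """PCA with F_hat'F_hat/T = I; returns F_hat (T x k), Lam_hat (N x k), V (k x k)."""
    T, N = X.shape
    vals, vecs = np.linalg.eigh(X @ X.T / (N * T))
    top = np.argsort(vals)[::-1][:k]
    F_hat = np.sqrt(T) * vecs[:, top]
    return F_hat, X.T @ F_hat / T, np.diag(vals[top])

def common_component_se(X, k):
    """C_hat = lam_hat_i' F_hat_t and its SE for every (i, t), ASSUMING e_it has no
    cross-sectional or serial correlation (White-style plug-in estimators)."""
    T, N = X.shape
    F_hat, Lam_hat, V = pca_factors(X, k)
    E2 = (X - F_hat @ Lam_hat.T) ** 2                                # squared residuals, T x N
    Gamma = np.einsum("tn,nj,nl->tjl", E2, Lam_hat, Lam_hat) / N      # Gamma_t for each t
    Phi = np.einsum("tn,tj,tl->njl", E2, F_hat, F_hat) / T            # Phi_i for each i
    A = Lam_hat @ np.linalg.inv(V)                                    # rows: (V^{-1} lam_hat_i)'
    var_factor = np.einsum("nj,tjl,nl->nt", A, Gamma, A) / N           # order 1/N part
    var_loading = np.einsum("tj,njl,tl->nt", F_hat, Phi, F_hat) / T    # order 1/T part
    return Lam_hat @ F_hat.T, np.sqrt(var_factor + var_loading)       # both N x T

rng = np.random.default_rng(5)
T, N, K = 300, 500, 2
F, Lam = rng.normal(size=(T, K)), rng.normal(size=(N, K))
X = F @ Lam.T + rng.normal(size=(T, N))
C_hat, se = common_component_se(X, K)
lower, upper = C_hat - 1.96 * se, C_hat + 1.96 * se   # 95% CI for lambda_i' F_t, no H needed
```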

5. Practical consequences

Question	Answer
Can I report a 95% CI on $\hat F_t$?	Yes, but on $H'F_t$, not $F_t$. Usually OK if you only need to explain returns.
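Can I test $\lambda_i = 0$?	Yes, as a joint test on all $K$ loadings against $H^{-1}\lambda_i = 0$, which is equivalent because $H$ is invertible. Single coordinates of $\hat\lambda_i$ are rotated, so one-coefficient tests are not rotation-invariant.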
Can I report a CI on fitted return $\hat\lambda_i'\hat F_t$?	Yes — clean, rotation-invariant.
What if $T$ is small (rolling window)?	Bai's asymptotics fail. Use bootstrap (CRW 2023).

6. Connection to other papers in this series

Bai-Ng (2002) chooses $K$; Bai (2003) does inference given $K$. Companion papers.
Kelly-Pruitt-Su (2019): IPCA extends this to a characteristics-instrumented setup, with similar large-$N$, large-$T$ asymptotics plus regressors.
CRW (2023) notes that Bai's theory needs $T\to\infty$ and fails for short panels, and develops a bootstrap that works for fixed $T$: the AOF small-window use case.

7. What this gives the AOF model

Use Bai (2003) analytic SEs where the panel is long ($T \geq 120$) and the universe is stable:

US full-sample factor returns 1968→today
JP full-sample factor returns 1990→today
Monthly publishable factor-return series for the AOF research library — point + analytic 95% CI

Use the CRW bootstrap for:

Rolling 5-year windows (factor-structure drift studies)
Sub-sector / sub-universe panels (cleantech, semiconductors) — $N$ small
Stress / rapid-regime tests where $H$ stability is suspect

Both coexist in the production pipeline — pick by sample size at run time.
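The run-time switch can be as simple as the routing function below. This is a hypothetical sketch: `PanelSpec`, `choose_inference` and the $N$ cut-off of 300 are illustrative assumptions; only the $T \geq 120$ threshold and the three bootstrap cases come from the text above.

```python
from dataclasses import dataclass

# T >= 120 comes from the text; the N cut-off is a hypothetical placeholder.
MIN_T_ANALYTIC = 120
MIN_N_ANALYTIC = 300

@dataclass(frozen=True)
class PanelSpec:
    T: int                        # months in the estimation window
    N: int                        # stocks in the cross-section
    universe_stable: bool = True  # False when membership churns heavily
    stress_regime: bool = False   # True when H stability is suspect

def choose_inference(spec: PanelSpec) -> str:
    """Route a panel to 'analytic' (Bai 2003 SEs) or 'bootstrap' (CRW-style) inference."""
    if spec.stress_regime or not spec.universe_stable:
        return "bootstrap"
    if spec.T >= MIN_T_ANALYTIC and spec.N >= MIN_N_ANALYTIC:
        return "analytic"
    return "bootstrap"

print(choose_inference(PanelSpec(T=660, N=5000)))   # long full-sample panel -> analytic
print(choose_inference(PanelSpec(T=60, N=4000)))    # 5-year rolling window -> bootstrap
```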

8. Reading next

Onatski (2010) — when factor strengths are weak (Bai's strong-factor assumption fails).
Kelly-Pruitt-Su (2019) — applies the same asymptotic machinery to IPCA.
