04 — Connor & Linton (2007)

Semiparametric Estimation of a Characteristic-Based Factor Model of Common Stock Returns

Journal of Empirical Finance 14, 694–717 · Kei Matsumae · 2026-05-15

What this paper does

The first semiparametric factor model in finance. It sets up the model that IPCA (Kelly-Pruitt-Su 2019) and regressed-PCA (Chen-Roussanov-Wang 2023, "CRW") refine and extend.
Factor loadings $\beta_{ik} = g_k(z_i)$ are unknown smooth functions of observable characteristics $z_i$ (size, B/M, beta).
Estimator: an iterated two-step kernel-based procedure. First a nonparametric regression recovers the loading functions, then PCA on the projected returns extracts the latent factors.
Empirically: the characteristic-based factors largely mimic the Fama-French (1993) factors; the joint test on pricing errors lacks the power to separate mispricing from risk.
For AOF: the conceptual seed. The estimator itself is superseded: IPCA and CRW are faster, easier to implement, and do not require time-invariant $z$.

1. Why this paper exists

Pre-2007 asset pricing had two camps:

Latent factor camp (Chamberlain & Rothschild 1983, Connor & Korajczyk 1986): PCA on returns; factors are abstract.
Characteristic factor camp (Fama-French 1993, 1996): pre-specified long-short portfolios sorted on characteristics.

Latent factors lack economic content. Characteristic factors are ad hoc: why these characteristics, and why this functional form? Connor & Linton propose a bridge: loadings are functions of characteristics, but the functions are estimated nonparametrically and the factors remain latent.

2. The model

$$ r_{it} \;=\; \alpha_i \;+\; \sum_{k=1}^{K} g_k(z_i)\, F_{kt} \;+\; \varepsilon_{it}. $$

Symbol	Meaning
$r_{it}$	excess return
$z_i$	time-invariant characteristics (size, B/M, beta)
$g_k(\cdot)$	$k$-th loading function (unknown, smooth)
$F_{kt}$	$k$-th latent factor return
$\alpha_i$	stock fixed effect (mispricing if non-zero)

Two restrictions later papers relax:

$z_i$ is time-invariant. Real characteristics (size, momentum, B/M) change month-to-month. IPCA and CRW allow $z_{it}$.
No pricing-error function $\alpha(z)$ — alphas are fixed effects $\alpha_i$, not characteristic-driven.
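
To make the notation concrete, here is a minimal simulation sketch of the model in this section. Everything specific in it is an assumption for illustration and does not come from the paper: a single scalar characteristic, $K = 2$, the loading functions $g_1(z) = 1$ and $g_2(z) = z^2$, the volatility levels, and the function name `simulate_panel`.

```python
import numpy as np

def simulate_panel(n_stocks=200, n_periods=120, seed=0):
    """Simulate r_it = alpha_i + sum_k g_k(z_i) F_kt + eps_it with K = 2."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, size=n_stocks)        # one time-invariant characteristic
    # Hypothetical loading functions g_1(z) = 1 and g_2(z) = z^2 (illustration only).
    G = np.column_stack([np.ones(n_stocks), z**2])   # loadings, shape (N, K)
    F = rng.normal(0.0, 0.05, size=(n_periods, 2))   # latent factor returns, shape (T, K)
    alpha = np.zeros(n_stocks)                       # no mispricing in this toy panel
    eps = rng.normal(0.0, 0.02, size=(n_stocks, n_periods))
    R = alpha[:, None] + G @ F.T + eps               # excess returns, shape (N, T)
    return z, G, F, R

z, G, F, R = simulate_panel()
print(R.shape)  # (200, 120)
```

Because `z` has no time index, the loading matrix `G` is fixed across periods, which is exactly the first restriction listed above.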

3. The estimator — kernel approach

3.1 Step 1 — Estimate loading functions

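Each $g_k(\cdot)$ is estimated by local (kernel-weighted) least squares over the $N$ stocks and $T$ periods. Holding the factors fixed and setting the fixed effect $\alpha_i$ aside for exposition, the loadings at a target point $z_0$ solve

$$ \big(\hat g_1(z_0),\dots,\hat g_K(z_0)\big) = \arg\min_{g_1,\dots,g_K} \sum_{i=1}^{N}\sum_{t=1}^{T} K_h(z_i - z_0)\,\Big(r_{it} - \sum_{k=1}^{K} g_k\, F_{kt}\Big)^2, $$

where $K_h$ is a kernel with bandwidth $h$ (unrelated to the factor count $K$). Because $F_{kt}$ is itself unobserved, this step is applied iteratively, alternating with the factor update in 3.2. It is the kernel analogue of a cross-sectional regression weighted by characteristic similarity.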
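
The local regression in this step can be sketched as follows. This is a local-constant version with a Gaussian kernel on a toy panel, assuming the factors are known and $\alpha_i = 0$; the kernel choice, the bandwidth, the toy loading functions, and the names `gaussian_kernel` and `local_loadings` are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_loadings(R, F, z, z0, h):
    """Local-constant estimate of (g_1(z0), ..., g_K(z0)) given factors F.

    R: (N, T) returns, F: (T, K) factors, z: (N,) scalar characteristic.
    Minimising sum_i sum_t w_i (r_it - g'F_t)^2 is equivalent to regressing
    the kernel-weighted average return near z0 on the factors.
    """
    w = gaussian_kernel((z - z0) / h)      # similarity weights, shape (N,)
    r_bar = (w @ R) / w.sum()              # weighted average return, shape (T,)
    g_hat, *_ = np.linalg.lstsq(F, r_bar, rcond=None)
    return g_hat

# Toy panel with hypothetical loadings g_1(z) = 1, g_2(z) = z^2.
rng = np.random.default_rng(0)
N, T = 500, 200
z = rng.uniform(-1.0, 1.0, size=N)
G = np.column_stack([np.ones(N), z**2])
F = rng.normal(0.0, 0.05, size=(T, 2))
R = G @ F.T + rng.normal(0.0, 0.02, size=(N, T))

g_hat = local_loadings(R, F, z, z0=0.8, h=0.1)
print(g_hat)  # should be close to the true values (1, 0.64)
```

Repeating the call over a grid of `z0` values traces out the estimated loading functions; in the actual procedure `F` would come from the factor step rather than being known.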

3.2 Step 2 — Extract factors

Given $\hat g_k(\cdot)$, the model is linear in $F_t$. PCA on the projected return matrix gives $\hat F_t$. Steps 1 and 2 are iterated to convergence.
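
One way to read "PCA on the projected return matrix" is sketched below: project returns onto the column space of the estimated loading matrix, then take the leading principal components. This is an interpretation under stated assumptions, not the paper's code: the PCA is uncentered, the true loadings stand in for $\hat g$, and the name `factors_from_projection` is hypothetical.

```python
import numpy as np

def factors_from_projection(R, G_hat, n_factors):
    """Uncentered PCA on returns projected onto span(G_hat).

    R: (N, T) returns, G_hat: (N, K) loadings evaluated at each stock's z_i.
    Returns factor estimates of shape (T, n_factors), identified only up to
    an invertible linear transformation.
    """
    Q, _ = np.linalg.qr(G_hat)                 # orthonormal basis of span(G_hat)
    R_proj = Q @ (Q.T @ R)                     # projected returns, shape (N, T)
    _, s, Vt = np.linalg.svd(R_proj, full_matrices=False)
    return Vt[:n_factors].T * s[:n_factors]    # scaled principal components

# Toy panel with hypothetical loadings g_1(z) = 1, g_2(z) = z^2.
rng = np.random.default_rng(1)
N, T = 300, 120
z = rng.uniform(-1.0, 1.0, size=N)
G = np.column_stack([np.ones(N), z**2])
F = rng.normal(0.0, 0.05, size=(T, 2))
R = G @ F.T + rng.normal(0.0, 0.02, size=(N, T))

F_hat = factors_from_projection(R, G, n_factors=2)

# The estimated factors should span the true ones: regress F on F_hat.
coef, *_ = np.linalg.lstsq(F_hat, F, rcond=None)
r2 = 1.0 - ((F - F_hat @ coef) ** 2).sum() / (F ** 2).sum()
print(round(float(r2), 3))
```

Since PCA factors are identified only up to rotation, the check compares spanned spaces rather than the factors column by column.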

Why kernels are awkward in practice: the bandwidth must be chosen, estimation is slow when $N$ is large, and the curse of dimensionality bites with multiple characteristics (feasible only for about $M \leq 3$ characteristics). This is why CRW switch to sieve-based regression: the same nonparametric flexibility, much faster.
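
The speed argument for sieves can be illustrated with a small sketch: approximate each loading function by a basis expansion and recover all coefficients in one least-squares solve, with no loop over target points and no bandwidth. This is not CRW's regressed-PCA estimator. It assumes known factors and $\alpha_i = 0$, and swaps in a cubic polynomial basis for B-splines; `poly_basis` and `sieve_coefficients` are hypothetical names.

```python
import numpy as np

def poly_basis(z, degree):
    """Polynomial sieve basis with columns 1, z, ..., z^degree."""
    return np.vander(np.atleast_1d(z), degree + 1, increasing=True)

def sieve_coefficients(R, F, z, degree=3):
    """Least-squares fit of R ~ Phi(z) B F', so that g_k(z) ~ Phi(z) B[:, k].

    R: (N, T) returns, F: (T, K) factors, z: (N,) scalar characteristic.
    Returns B of shape (degree + 1, K).
    """
    Phi = poly_basis(z, degree)                        # (N, J)
    return np.linalg.pinv(Phi) @ R @ np.linalg.pinv(F).T

# Toy panel with hypothetical loadings g_1(z) = 1, g_2(z) = z^2.
rng = np.random.default_rng(2)
N, T = 500, 200
z = rng.uniform(-1.0, 1.0, size=N)
G = np.column_stack([np.ones(N), z**2])
F = rng.normal(0.0, 0.05, size=(T, 2))
R = G @ F.T + rng.normal(0.0, 0.02, size=(N, T))

B = sieve_coefficients(R, F, z)
g_hat = poly_basis(0.8, 3) @ B      # loadings at z0 = 0.8, shape (1, 2)
print(g_hat)                        # should be close to the true values (1, 0.64)
```

The tuning parameter here is the number of basis terms rather than a bandwidth, and evaluating the fitted function at any new point is a single matrix product.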

4. Empirical findings

Sample: US individual stocks, 1962–2001; characteristics = (log size, log B/M, market beta).

Three factors are selected. They match the Fama-French market, size, and value factors well, especially the first two.
Loading functions are mildly nonlinear. The size loading is roughly log-linear; the B/M loading is U-shaped, with extreme value and extreme growth stocks both loading positively on the same factor.
Pricing errors $\alpha_i$ are jointly significant, but the joint test has limited power, so it cannot cleanly separate risk from mispricing.

5. Connection to other papers in this series

flowchart TB
    CL["Connor & Linton (2007)<br/>Time-invariant z, kernel, no α(z)"]
    IPCA["Kelly-Pruitt-Su (2019) IPCA<br/>+ linear-in-z, + time-varying z<br/>+ α(z)"]
    CRW["Chen-Roussanov-Wang (2023)<br/>+ nonparametric α and β<br/>+ fixed-T asymptotics<br/>+ α = 0 test"]
    AOF["AOF Quant Model<br/>(replication target)"]
    FNW["Freyberger-Neuhierl-Weber (2020)<br/>which characteristics matter?"]
    CL --> IPCA --> CRW --> AOF
    FNW --> AOF
    style CL fill:#fff6e3,stroke:#b8651e

IPCA modernises: kernel → linear-in-$z$ projection; allows time-varying $z_{it}$; adds characteristic-driven $\alpha(z)$; fits jointly via alternating least squares.
CRW modernises further: nonparametric flexibility via sieves; drops large-$T$ requirement; makes $\alpha = 0$ formally testable.
Freyberger-Neuhierl-Weber (2020) answers the question CL couldn't: which characteristics, when many are candidates? Adaptive group LASSO + 36-characteristic shortlist.
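
For reference, the linear-in-$z$ specification that IPCA substitutes for the kernel step can be written as follows. This is a sketch of the standard IPCA form, using IPCA's timing convention (characteristics dated $t$, returns dated $t+1$); the coefficient matrices $\Gamma_\alpha$ and $\Gamma_\beta$ belong to that model and do not appear in CL.

$$ r_{i,t+1} \;=\; \alpha(z_{it}) \;+\; \beta(z_{it})^\top f_{t+1} \;+\; \varepsilon_{i,t+1}, \qquad \alpha(z_{it}) = z_{it}^\top \Gamma_\alpha, \quad \beta(z_{it}) = \Gamma_\beta^\top z_{it}. $$

Setting $\Gamma_\alpha = 0$ removes characteristic-driven pricing errors, and replacing $z_{it}$ with a basis expansion of the characteristics gives the nonparametric direction that CRW take.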

6. What this gives the AOF model

CL (2007) contributes the conceptual frame, not the estimator:

The right way to think about characteristics is as covariates of loadings — not as factors themselves.
Loadings should be allowed to be nonlinear in characteristics; restricting to linear is a testable assumption, not a structural one.
The same panel can support a latent-factor interpretation and a characteristic-based interpretation simultaneously; the two are not mutually exclusive.

What we do not use:

Kernel estimator (replaced by sieve / B-spline projection in CRW).
Time-invariant $z_i$ assumption (replaced by time-varying $z_{it}$).
"Factors mimic Fama-French" interpretation — useful intuition but not the production target.

7. Reading next

Kelly-Pruitt-Su (2019) — the modern linearised version of the same idea.
Freyberger-Neuhierl-Weber (2020) — answers the "which characteristics" question.

