AOF Quant Model — Blueprint

Synthesis of the 7-paper learning base into a concrete model architecture

Kei Matsumae · 2026-05-15 · v0.1 (draft for internal alignment)

0. Purpose

This synthesis composes the seven papers into a concrete factor-model architecture for the Abundia Opportunity Fund (AOF) quant stack. Every design choice below is tagged with the paper that justifies it. Intent:

Define the target model — what the engineering team builds.
Define the research roadmap — what the analyst team validates, in what order.

Opinionated. Where the papers disagree (most notably KPS vs CRW on whether mispricing exists), this document picks a side and explains why.

1. Model architecture — production target

1.1 The core model

$$ r_{i, t+1} \;=\; \alpha(z_{it}) \;+\; \beta(z_{it})' f_{t+1} \;+\; \varepsilon_{i, t+1} $$

Component	Specification	Justification
$z_{it}$	15 characteristics (full-sample) or 7 (large-cap), lagged, rank-transformed to $[-0.5, 0.5]$ monthly	05 FNW 2020: joint-significance selection
$\alpha(z), \beta(z)$	nonparametric via sieve basis $\phi(z) =$ (1, $z$, linear B-splines with 1–2 knots)	08 CRW 2023: nonlinearity matters
$f_t$	$K$-vector of latent factors	04 Connor-Linton 2007: latent + characteristic-driven
$K$	selected from data — see §1.3	01 BN 2002 or 03 Onatski 2010 depending on regime
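
As an illustration of the $z_{it}$ and $\phi(z)$ rows above, here is a minimal sketch of the monthly rank transform and the sieve basis. Two details are assumptions, not taken from the papers: the rank convention `(rank - 1) / (n - 1) - 0.5`, and the use of hinge terms `max(z - k, 0)`, which span the same piecewise-linear space as linear B-splines with the same knots. The function names and sample values are hypothetical.

```python
from typing import List, Sequence


def rank_transform(values: Sequence[float]) -> List[float]:
    """Map one month's cross-section of a characteristic to [-0.5, 0.5].

    Assumed convention: (rank - 1) / (n - 1) - 0.5, ties broken by input order.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    ranked = [0.0] * n
    for rank, idx in enumerate(order):  # rank is 0-based here
        ranked[idx] = rank / (n - 1) - 0.5
    return ranked


def sieve_basis(z: float, knots: Sequence[float] = (0.0,)) -> List[float]:
    """Evaluate (1, z, max(z - k, 0) for each knot) at a ranked value z."""
    return [1.0, z] + [max(z - k, 0.0) for k in knots]


# Made-up raw characteristic values for five stocks in one month.
book_to_market = [0.3, 1.2, 0.7, 2.5, 0.1]
z = rank_transform(book_to_market)
phi = [sieve_basis(zi) for zi in z]  # one basis row per stock
```

With one knot at zero, each characteristic contributes three basis columns, so the basis dimension grows linearly in the number of characteristics and knots.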

1.2 The estimator — regressed-PCA

Use CRW (2023) two-step regressed-PCA, not IPCA, as the primary estimator. Rationale:

Issue	IPCA	Regressed-PCA
Functional form	linear only	nonparametric
Asymptotic regime	needs large $T$	works for fixed $T$
Rolling-window analysis	breaks	works
Empirical $K$ on US individual stocks	5 (advocated)	1–2 (selected)
Mispricing test	linear-restricted	nonparametric (more flexible)

Decisive arguments: nonparametric flexibility and small-$T$ asymptotics. The latter is critical for AOF because the JP sample is shorter than the US one, rolling-window analysis is essential, and sector/sub-universe deep-dives have small $N$.
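
The two-step idea can be sketched as follows. This is a simplified reading of regressed-PCA, not a replication of CRW's code: step 1 runs a cross-sectional regression of returns on the sieve basis each period, and step 2 applies PCA to the demeaned coefficient series. The function name, the normalisation $\hat B'\hat B = I$, and the simulated data are assumptions for illustration only.

```python
import numpy as np


def regressed_pca(returns, bases, k):
    """Two-step sketch.

    returns: list of T arrays, each (N_t,), next-period returns.
    bases:   list of T arrays, each (N_t, J), sieve basis from lagged characteristics.
    k:       number of latent factors.
    Returns (a_hat (J,), b_hat (J, k), factors (T, k)).
    """
    # Step 1: period-by-period cross-sectional OLS of returns on the basis.
    coefs = np.vstack(
        [np.linalg.lstsq(phi, r, rcond=None)[0] for phi, r in zip(bases, returns)]
    )  # shape (T, J)
    mean_coef = coefs.mean(axis=0)
    demeaned = coefs - mean_coef
    # Step 2: PCA of the demeaned coefficient series.
    cov = demeaned.T @ demeaned / coefs.shape[0]
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    b_hat = eigvecs[:, ::-1][:, :k]           # top-k eigenvectors, orthonormal
    # Alpha coefficients: the part of the mean coefficient orthogonal to b_hat.
    a_hat = mean_coef - b_hat @ (b_hat.T @ mean_coef)
    factors = coefs @ b_hat                   # shape (T, k)
    return a_hat, b_hat, factors


# Synthetic one-factor panel (all numbers are made up).
rng = np.random.default_rng(0)
T, N, K = 120, 400, 1
a_true = np.array([0.01, 0.0, 0.02])
b_true = np.array([[0.0], [1.0], [0.0]])
f_true = rng.normal(0.0, 0.05, size=(T, K))
bases, returns = [], []
for t in range(T):
    z = rng.uniform(-0.5, 0.5, size=N)
    phi = np.column_stack([np.ones(N), z, np.maximum(z, 0.0)])
    r = phi @ (a_true + b_true @ f_true[t]) + rng.normal(0.0, 0.01, size=N)
    bases.append(phi)
    returns.append(r)

a_hat, b_hat, f_hat = regressed_pca(returns, bases, K)
```

The loadings and factors are identified only up to sign (and rotation when $K > 1$), so comparisons against a benchmark should use absolute correlations or projections.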

1.3 $K$-selection

Run both selectors and report side-by-side as a diagnostic:

Bai-Ng $IC_{p2}$ (01): large-panel default.
Eigenvalue-ratio (03 → 08): robust to weak factors and small $T$.

Sample	$K$ used
US full panel, $N$ and $T$ both large	$K = $ BN-selected (typically 1–3 from $\hat\Gamma$)
JP full panel	$K = $ ER-selected (likely 1–2)
Rolling 5-year window	$K = $ ER-selected, re-selected each window
Sector sub-panel	$K = $ ER-selected with $k_{\max} = 5$
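
A minimal sketch of the two selectors, written against a list of eigenvalues so that it is independent of how the covariance matrix is built. The $IC_{p2}$ penalty follows the standard Bai-Ng form $k \cdot \frac{N+T}{NT} \ln \min(N, T)$. Function names and the example spectrum are hypothetical, and which panel dimensions play the role of $N$ and $T$ in the regressed-PCA setting is left as an input.

```python
import math
from typing import Sequence


def select_k_eigenvalue_ratio(eigvals: Sequence[float], k_max: int) -> int:
    """Return the k in 1..k_max that maximises lambda_k / lambda_{k+1}."""
    lam = sorted(eigvals, reverse=True)
    if k_max + 1 > len(lam):
        raise ValueError("need at least k_max + 1 eigenvalues")
    ratios = [lam[k - 1] / lam[k] for k in range(1, k_max + 1)]
    return ratios.index(max(ratios)) + 1


def select_k_bai_ng_icp2(eigvals: Sequence[float], n: int, t: int, k_max: int) -> int:
    """Return the k in 0..k_max that minimises IC_p2(k) = ln V(k) + k * penalty.

    V(k) is the variance left after k principal components, i.e. the sum of
    the eigenvalues beyond the k-th.
    """
    lam = sorted(eigvals, reverse=True)
    if k_max >= len(lam):
        raise ValueError("k_max must be smaller than the number of eigenvalues")
    penalty = (n + t) / (n * t) * math.log(min(n, t))
    scores = [math.log(sum(lam[k:])) + k * penalty for k in range(k_max + 1)]
    return scores.index(min(scores))


# Made-up spectrum: two strong factors on top of a flat noise floor.
spectrum = [10.0, 8.0] + [0.1] * 48
k_er = select_k_eigenvalue_ratio(spectrum, k_max=5)
k_bn = select_k_bai_ng_icp2(spectrum, n=50, t=200, k_max=8)
```

Reporting `k_er` and `k_bn` side by side is the diagnostic; a persistent gap between them is a hint that weak factors or a short sample are in play.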

1.4 Inference layer

Bai (2003) analytic SEs (02) — for full-sample US and JP runs. Cheaper, faster, deployed wherever $T \geq 120$.
CRW weighted bootstrap (08) — for rolling windows, sector sub-panels, and any specification test ($\alpha = 0$, linearity). Always for the $\alpha = 0$ Wald test.

Bootstrap weights: $w_i \sim \text{Exp}(1)$ i.i.d. per CRW.
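
To make the weighting concrete, here is a hedged sketch of how $\text{Exp}(1)$ weights could enter a single cross-sectional regression. It covers only the first (regression) step for one period; in a full panel the same $w_i$ would presumably be reused for stock $i$ in every period before rerunning the PCA step and the test statistic. Function names and the simulated data are hypothetical.

```python
import numpy as np


def weighted_coefficients(phi, r, w):
    """Weighted cross-sectional least squares: (Phi' W Phi)^{-1} Phi' W r."""
    pw = phi * w[:, None]  # scale each stock's basis row by its weight
    return np.linalg.solve(pw.T @ phi, pw.T @ r)


def bootstrap_coefficients(phi, r, n_draws, seed=0):
    """Redraw w_i ~ Exp(1) i.i.d. across stocks and re-estimate each time."""
    rng = np.random.default_rng(seed)
    n, j = phi.shape
    draws = np.empty((n_draws, j))
    for b in range(n_draws):
        w = rng.exponential(scale=1.0, size=n)
        draws[b] = weighted_coefficients(phi, r, w)
    return draws


# Synthetic single-period cross-section (all numbers are made up).
rng = np.random.default_rng(1)
z = rng.uniform(-0.5, 0.5, size=300)
phi = np.column_stack([np.ones(300), z])
r = phi @ np.array([0.01, 0.03]) + rng.normal(0.0, 0.05, size=300)

draws = bootstrap_coefficients(phi, r, n_draws=200)
boot_se = draws.std(axis=0, ddof=1)  # bootstrap standard errors per coefficient
```

Because the weights are strictly positive with mean one, every draw keeps the full cross-section, which avoids the empty-cell problems a resampling bootstrap can hit in small sector sub-panels.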

1.5 Output products

Product	Use
$\hat f_t$ time series, $T \times K$	Factor return library — published monthly
$\hat\alpha(z)$ surface	Mispricing map — fed into trading book
$\hat\beta(z)$ surface	Risk model — fed into risk system
Pure-$\alpha$ portfolio	$w_t = \Phi_t (\Phi_t' \Phi_t)^{-1} \hat a$ (an $N$-vector of weights), rebalanced monthly; trading expression of thesis
Specification tests	$\alpha=0$ p-value, linearity p-values for each characteristic
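
A short sketch of the pure-$\alpha$ weights for one rebalance date. Because portfolio weights must be an $N$-vector, the sketch assumes the projection form $w_t = \Phi_t (\Phi_t' \Phi_t)^{-1} \hat a$, where $\Phi_t$ is the $N \times J$ basis matrix and $\hat a$ is orthogonal to the loading directions $\hat B$. The function name and all inputs are made up for illustration.

```python
import numpy as np


def pure_alpha_weights(phi, a_hat):
    """Return the N-vector w = Phi (Phi' Phi)^{-1} a_hat."""
    return phi @ np.linalg.solve(phi.T @ phi, a_hat)


# Synthetic basis for 200 stocks: (1, z, max(z, 0)).
rng = np.random.default_rng(2)
z = rng.uniform(-0.5, 0.5, size=200)
phi = np.column_stack([np.ones(200), z, np.maximum(z, 0.0)])

a_hat = np.array([0.01, 0.0, 0.02])  # hypothetical alpha coefficients
b_hat = np.array([0.0, 1.0, 0.0])    # hypothetical loading direction, orthogonal to a_hat

w = pure_alpha_weights(phi, a_hat)
factor_exposure = w @ (phi @ b_hat)  # equals a_hat' b_hat, so zero by construction
expected_alpha = w @ (phi @ a_hat)   # equals a_hat' a_hat
```

The identity $\Phi_t' w_t = \hat a$ is what makes the portfolio factor-neutral: its exposure to $\beta(z) = \Phi_t \hat B$ is $\hat a' \hat B = 0$, while its model-implied return is $\hat a' \hat a$.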

2. Where IPCA fits

Not as the primary model, but as a parallel benchmark. Production runs IPCA alongside regressed-PCA. Disagreements between the two are flagged for analyst review.

Why keep IPCA in production:

Faster (no sieve basis, smaller dimensionality).
Interpretable to non-quant audiences (linear loadings).
Well-understood literature — disagreement is a reportable result, not a black box.

3. Where KKN fits

Use KKN's arbitrage portfolio formulation as:

Phase-1 implementation: build before the full CRW model. ~50 lines of Python. If the after-cost Sharpe ratio is meaningful, the thesis is validated cheaply.
Out-of-sample reality check: once CRW is running, expect the returns of the CRW pure-$\alpha$ portfolio and the KKN arbitrage portfolio to correlate 0.6–0.8 on the same window. A lower correlation should be investigated.
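
The two checks above could be wired up roughly as follows. This is a sketch, not the KKN estimator itself: it assumes monthly net-of-cost return series are already available, annualises with $\sqrt{12}$, and treats the risk-free rate as zero (reasonable for a long-short book, but an assumption). The function names, the 0.6 floor as a default argument, and the sample returns are illustrative.

```python
import math
from typing import Sequence


def annualized_sharpe(monthly_returns: Sequence[float]) -> float:
    """Mean over sample standard deviation of monthly returns, scaled by sqrt(12)."""
    n = len(monthly_returns)
    mean = sum(monthly_returns) / n
    var = sum((x - mean) ** 2 for x in monthly_returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(12)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def needs_review(crw_returns, kkn_returns, floor: float = 0.6) -> bool:
    """Flag the window when the two portfolios correlate below the floor."""
    return correlation(crw_returns, kkn_returns) < floor


# Made-up monthly net returns on the same window.
crw = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]
kkn = [0.02, -0.03, 0.05, 0.01, 0.03, -0.02]
flag = needs_review(crw, kkn)
sharpe = annualized_sharpe(kkn)
```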

4. Implementation phases

gantt
    dateFormat YYYY-MM-DD
    title AOF Quant Model Roadmap
    axisFormat %b-%Y
    section Phase 0 - Data
    Kelly IPCA panel :p0a, 2026-05-15, 7d
    J-Quants ingest :p0b, 2026-05-15, 21d
    EDINET XBRL pipeline :p0c, 2026-05-22, 21d
    section Phase 1 - Replication
    Reproduce CRW Tables I–IV :p1a, after p0a, 14d
    Reproduce KPS IPCA :p1b, after p1a, 7d
    KKN-style arb portfolio :p1c, after p1a, 7d
    section Phase 2 - Extension
    Extend US panel post-2014 :p2a, after p1a, 14d
    Build JP panel 2014→today :p2b, after p0c, 14d
    Regressed-PCA on both :p2c, after p2b, 14d
    section Phase 3 - Production
    Monthly pipeline :p3a, after p2c, 28d
    Output products :p3b, after p3a, 14d
    section Phase 4 - Research
    Rolling K stability :p4a, after p3b, 28d
    Multi-region joint :p4b, after p4a, 56d

5. Open research questions

Does JP have the same 1–2 factor structure as US? Unknown — no published full-universe JP regressed-PCA exists. AOF can be first to publish.
Which JP characteristics survive joint selection? Unknown. FNW (2020) is US-only. Re-run their adaptive group LASSO on JP panel.
What is the post-cost JP $\alpha$-portfolio Sharpe? Unknown. JP transaction costs are higher than in the US (small-cap turnover, ADV constraints); this is the make-or-break question for the thesis.
Does the factor structure differ across macro regimes? Use rolling-window estimation enabled by CRW's fixed-$T$ result. Look for breaks at QE start, COVID, etc.
Multi-region joint estimation. Can we estimate a joint US+JP factor structure where some factors are global and some are local?
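
For the regime question, one possible starting point is to re-select $K$ by eigenvalue ratio in every rolling window and look at where the selected value changes. The sketch below assumes the input is the $T \times J$ series of first-step regression coefficients, a 60-month window, and $k_{\max} = 5$; the function name and the simulated one-factor data are hypothetical.

```python
import numpy as np


def rolling_k(coefs, window=60, k_max=5):
    """Eigenvalue-ratio choice of K in each rolling window of a (T, J) series."""
    t_total = coefs.shape[0]
    selected = []
    for start in range(t_total - window + 1):
        block = coefs[start:start + window]
        demeaned = block - block.mean(axis=0)
        # Eigenvalues of the within-window covariance, largest first.
        eig = np.linalg.eigvalsh(demeaned.T @ demeaned / window)[::-1]
        ratios = eig[:k_max] / eig[1:k_max + 1]
        selected.append(int(np.argmax(ratios)) + 1)
    return selected


# Synthetic coefficient series driven by a single strong factor.
rng = np.random.default_rng(0)
T, J = 90, 10
factor = rng.normal(0.0, 1.0, size=(T, 1))
coefs = factor @ np.ones((1, J)) + rng.normal(0.0, 0.1, size=(T, J))

ks = rolling_k(coefs, window=60, k_max=5)
```

A window-by-window plot of `ks` against known event dates would be a cheap first look before any formal break test.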

6. What this is and is not

This synthesis IS: the opinionated design document for AOF's first-generation quant model; a research roadmap with explicit phase boundaries; a reference for engineering and analyst teams.

This synthesis IS NOT: a final research paper (JP empirical claims are conjectures); a replacement for paper-by-paper reading (per-paper notes carry technical detail); static (will be revised as Phase 1 results come in).

Appendix — design choices tagged to papers

Decision	Paper	Why
Regressed-PCA as primary	08 CRW 2023	Nonparametric flexibility, fixed-$T$ asymptotics
Sieve basis = (1, z, linear B-splines)	08 CRW 2023	Empirically tested; minimal complexity capturing nonlinearity
$K$ via Bai-Ng for large panels	01 BN 2002	Standard, well-tested
$K$ via ER for small-$T$ / weak factors	03 Onatski 2010, 08 CRW	Consistency under conditions BN doesn't cover
36-char (or shortlist 15/7) for $z_{it}$	05 FNW 2020	Joint-significance selected; consensus academic panel
Lagged characteristics, 6-month delay	Fama-French convention	Point-in-time correctness
Rank to $[-0.5, 0.5]$ each month	05 FNW 2020, 08 CRW 2023	Eliminates outliers, makes chars comparable
Bai (2003) SE for full-sample inference	02 Bai 2003	Cheap, fast
CRW weighted bootstrap for $\alpha=0$ Wald	08 CRW 2023	Handles rotation indeterminacy correctly
Parallel IPCA benchmark	06 KPS 2019	Robustness, communicability
Parallel KKN arbitrage portfolio	07 KKN 2020	Phase-1 quick win + ongoing reality check
JP factor structure as open research	—	No published evidence; AOF first-mover opportunity

