AI Search Evaluation ⑪ Freshness — How Recent the Cited Content Is

Applied Sciences · Engineering

Freshness measures the number of days elapsed between a cited source's publication (or last-updated) date and the observation date. A smaller value means a more recent source. It is the primary indicator of temporal quality across AI search engines.

Why It Matters

Industries whose facts change over time (statistics, campaigns, regulatory changes) are harmed immediately when an engine cites stale sources.
Low-freshness engines require clients to publish new articles frequently to stay cited.
Grounded in temporal QA research: TimeQA (Chen et al. 2021) and StreamingQA (Liska et al. 2022).

Measurement

Elapsed days = observation date − publication/last-updated date of the cited page.
Aggregate as a distribution per engine: median, p25, p75.
Median ≤ 30 days → high freshness; median > 1 year → low freshness.
Also classify each numeric match against ground_truth.last_verified as match / newer / older.
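
The first three measurement steps can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names are hypothetical, "1 year" is assumed to mean 365 days, and the "unclassified" label for medians between the two thresholds is an assumption, since the thresholds above leave that band unnamed.

```python
from datetime import date
from statistics import quantiles


def elapsed_days(observed_on: date, published_on: date) -> int:
    """Freshness in days: observation date minus publication/last-updated date."""
    return (observed_on - published_on).days


def summarize(days: list[int]) -> dict[str, float]:
    """Return p25, median, and p75 of elapsed days for one engine.

    statistics.quantiles needs at least two data points on most Python versions.
    """
    p25, median, p75 = quantiles(days, n=4, method="inclusive")
    return {"p25": p25, "median": median, "p75": p75}


def freshness_label(median_days: float) -> str:
    """Apply the two thresholds from the text (1 year assumed to be 365 days)."""
    if median_days <= 30:
        return "high"
    if median_days > 365:
        return "low"
    return "unclassified"  # assumption: the source names no label for this band


# Hypothetical elapsed-day values for the citations of one engine.
sample = [10, 20, 30, 40, 50]
stats = summarize(sample)
label = freshness_label(stats["median"])
```

The "inclusive" quantile method treats the sample as the full set of observed citations; a different interpolation method would shift p25 and p75 slightly on small samples.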

Worked Example (hypothetical, 35 time-varying queries)

ChatGPT Search: median 45 days — most recent.
Claude: median 120 days. Gemini: median 80 days.
AI Overview: median 210 days — cites articles 6+ months old even for 'latest' queries.
Risk: IR figures or fiscal-year data presented by AI Overview may be persistently outdated.
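
Applying the stated thresholds to these hypothetical medians shows that none of the four engines is at or below 30 days or above one year, so the ranking here is relative rather than a high/low verdict. The sketch below assumes 365 days for one year and an average month of 30.44 days; the band name "between" is illustrative only.

```python
# Hypothetical medians (in days) taken from the worked example above.
medians = {
    "ChatGPT Search": 45,
    "Claude": 120,
    "Gemini": 80,
    "AI Overview": 210,
}

AVG_DAYS_PER_MONTH = 30.44  # approximation used only for readability


def band(median_days: float) -> str:
    """Thresholds from the Measurement section; 'between' is an illustrative name."""
    if median_days <= 30:
        return "high"
    if median_days > 365:
        return "low"
    return "between"


# Rank engines from most recent (smallest median) to least recent.
ranked = sorted(medians.items(), key=lambda item: item[1])

for engine, days in ranked:
    months = days / AVG_DAYS_PER_MONTH
    print(f"{engine}: {days} days (~{months:.1f} months), band={band(days)}")
```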

Role in the ai-search Project

Applied to the estimated 30–40 of the 78 factual-static questions that contain time-varying phrases, plus the Track A temporal-fresh category.
Primary indicator for construct C5 Temporal Freshness Tracking.
The metric best suited to daily longitudinal (continuous time-series) tracking across engines.
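
One way to picture the daily longitudinal tracking is as a median-per-engine-per-day series. The sketch below is an assumption about how such records might be shaped: the tuple layout (engine, observation date, publication date), the function name, and the engine name "engine_a" are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def daily_medians(records):
    """Group citations by (engine, observation date) and take the median elapsed days.

    records: iterable of (engine, observed_on, published_on) tuples (hypothetical layout).
    """
    buckets = defaultdict(list)
    for engine, observed_on, published_on in records:
        buckets[(engine, observed_on)].append((observed_on - published_on).days)
    # Sorting by key yields one chronological series per engine.
    return {key: median(values) for key, values in sorted(buckets.items())}


# Hypothetical citations observed on two consecutive days.
records = [
    ("engine_a", date(2025, 1, 10), date(2025, 1, 1)),
    ("engine_a", date(2025, 1, 10), date(2024, 12, 11)),
    ("engine_a", date(2025, 1, 11), date(2025, 1, 1)),
]
series = daily_medians(records)
```

Keeping the observation date in the key means the same question set can be re-run each day and compared across engines without recomputing earlier days.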

→ Freshness (elapsed days since the cited source was published or last updated) is the sharpest single signal for whether an AI search engine keeps pace with the real world.

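AI Search Evaluation Metric ⑪ Freshness: How New the Cited Content Is

Freshness is the number of days elapsed from the publication date (or last-updated date) of the cited content to the observation point. A smaller value means the engine is citing newer information, which makes freshness the primary indicator of an AI engine's temporal quality.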

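Why It Matters

In industries that handle time-varying information, such as statistics, campaigns, and regulatory amendments, citing an old source immediately produces an off-target answer.
Clients using low-freshness engines should be advised to refresh their articles frequently so that they continue to be cited.
The metric applies temporal evaluation research, TimeQA (Chen et al. 2021) and StreamingQA (Liska et al. 2022), to practical work.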

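Measurement Method

Freshness (elapsed days) = observation point − publication/last-updated date of the cited content.
Aggregate per engine as a distribution (median, p25, p75).
A median within 30 days is judged high freshness; a median over 1 year is judged low freshness.
Agreement with ground_truth.last_verified is also tallied as a three-way classification: match / newer / older.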

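Calculation Example (hypothetical scenario, 35 questions containing time-varying phrases)

ChatGPT Search: median 45 days (most recent). Gemini: median 80 days.
Claude: median 120 days. AI Overview: median 210 days (about 7 months old).
AI Overview often cites articles more than 6 months old even for queries asking for the 'latest' information.
The risk that AI Overview presents current-period IR data or fiscal-year figures in an outdated state is especially high.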

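Position Within the ai-search Project

Applied to an estimated 30–40 of the 78 factual-static questions that contain time-varying phrases, plus Track A (temporal freshness category).
Primary indicator for construct C5, Temporal Freshness Tracking.
The metric best suited to cross-engine daily longitudinal measurement (longitudinal: tracking the same subjects continuously).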

→ Freshness (elapsed days) is the sharpest single indicator of whether an AI engine is keeping up with changes in the real world.
