How AI Search Decides What to Cite — and What It Ignores
AI search engines are not search engines. They are answer compilers with aggressive content filters, volatile citation behavior, and almost no correlation with traditional SEO metrics. Here is what the data actually shows.
Compiled by Aviel Fahl
Key Findings
AI search engines filter out roughly 95% of retrieved content before generating an answer, and only about 15% of retrieved pages earn a visible citation. Traditional SEO authority (Domain Authority, backlink counts) explains almost nothing about which pages get cited (r² = 0.05). What matters instead: entity recognition lifts citation rates by up to 267%, cosine-similarity between query and passage is 7.3× more predictive than domain authority, and ChatGPT's cited sources are on average 25.7% fresher than Google's organic results. Each AI platform — Google AI Overviews, ChatGPT, Perplexity, Gemini — uses a different index, different ranking logic, and different source preferences, with as little as 11% domain overlap between them.
- **~5%** of retrieved content reaches the user
- **15%** of retrieved pages earn a citation
- **r² = 0.05**: traffic explains almost nothing about citation
- **30%** brand visibility retention per answer
The Pipeline: How AI Search Actually Selects Content
Every AI search answer goes through a multi-stage reduction pipeline. Understanding these stages explains why most content never gets cited, regardless of its organic ranking.
Dan Petrovic at DEJAN AI reverse-engineered Google's Vertex AI Search pipeline and found a five-step process: the user query decomposes into fan-out sub-queries, each sub-query retrieves candidate pages, those pages get trimmed to condensed versions, the condensed snippets become LLM context, and the model generates an answer with citations.
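As a minimal, runnable sketch of that five-stage reduction (every function and the toy corpus below are illustrative stand-ins, not Vertex AI Search APIs):

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str

# Toy corpus standing in for the retrieval index.
CORPUS = [
    Page("https://acme.example/pricing",
         "Acme pricing starts at $10 per month. Annual billing saves 20 percent. We ship fast."),
    Page("https://acme.example/about",
         "Acme was founded in 2019. Our team loves customers."),
]

def fan_out(query: str) -> list[str]:
    # Stage 1: decompose the user query into sub-queries (trivial stand-in).
    return [query, f"{query} cost", f"{query} reviews"]

def retrieve(sub_query: str) -> list[Page]:
    # Stage 2: retrieve candidate pages by naive keyword overlap.
    terms = set(sub_query.lower().split())
    return [p for p in CORPUS if terms & set(p.text.lower().split())]

def condense(page: Page, sub_query: str) -> str:
    # Stage 3: trim the page to query-relevant sentences; this is
    # where most of a page's text gets filtered out.
    terms = set(sub_query.lower().split())
    kept = [s for s in page.text.split(". ") if terms & set(s.lower().split())]
    return " ".join(kept)

def answer(query: str) -> str:
    snippets = []
    for sq in fan_out(query):
        for page in retrieve(sq):
            if text := condense(page, sq):
                snippets.append(f"{text} [{page.url}]")
    # Stage 4: deduplicated snippets become the LLM's grounding context.
    context = "\n".join(dict.fromkeys(snippets))
    # Stage 5: generation with citations (stubbed out here).
    return f"ANSWER GROUNDED IN:\n{context}"

print(answer("acme pricing"))
```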
At each stage, content is filtered. The numbers are stark: across five test domains, only 32% of total page characters from cited pages survived into the final answer — the content survival rate. The variation was extreme: one domain retained 65% of its content, another only 21%. What survived: service descriptions, pricing structures, process instructions. What got filtered: navigation, promotional claims, unrelated product categories, customer review quotations.
Separately, AirOps analyzed 548,534 pages retrieved by ChatGPT and found that only 82,108 (15%) earned any citation at all. Retrieval does not equal citation — the reranking layer between retrieval and citation is an aggressive filter.
Combine these two filters and the math is clear: roughly 15% of retrieved pages get cited, and of those, roughly 32% of their text survives. About 5% of retrieved content reaches the end user.
The grounding budget
Google's grounding budget per query is approximately 1,929 words (median), according to DEJAN's SRO synthesis of 7,060 queries. The #1 source receives about 531 words (28% of the budget); #5 gets 266 words. Most pages receive 200-600 words of grounding regardless of their original length. Pages under 1,000 words retain 61% of their content; pages over 3,000 words retain only 13%. Grounding plateaus at ~540 words. This is the strongest empirical argument for density over length.
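A back-of-envelope model of that budget, using only the retention figures above (the bucket boundaries and the linear interpolation for mid-length pages are assumptions, not DEJAN's actual model):

```python
PLATEAU = 540  # grounding plateaus at ~540 words per source

def estimated_grounded_words(page_words: int) -> float:
    """Estimate how many words of a page survive into grounding."""
    if page_words < 1_000:
        retention = 0.61          # <1,000-word pages retain 61%
    elif page_words > 3_000:
        retention = 0.13          # >3,000-word pages retain 13%
    else:
        # Assumption: interpolate linearly between the two buckets.
        t = (page_words - 1_000) / 2_000
        retention = 0.61 + t * (0.13 - 0.61)
    return min(page_words * retention, PLATEAU)

for n in (500, 1_000, 2_000, 5_000):
    print(f"{n:>5}-word page -> ~{estimated_grounded_words(n):.0f} grounded words")
```

Whatever the page length, the output caps at the ~540-word plateau: writing past it buys nothing.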
Google uses extractive summarization — exact sentences from source pages, not paraphrases. DEJAN confirmed this by fine-tuning a DeBERTa model to replicate the behavior. The system applies query-focused selection with a heavy lead bias: opening paragraphs are extracted near-wholesale. Every sentence needs to function as a standalone extractable claim — a principle explored in depth in content engineering for AI extraction. Pronouns and anaphora ("it," "they") create extraction failures because the model cannot resolve them outside the original context.
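A crude way to audit for this failure mode is to flag sentences that open with an unresolved pronoun or demonstrative. The trigger list below is an illustrative heuristic, not DEJAN's DeBERTa classifier:

```python
import re

# Sentences that open with an unresolved pronoun or demonstrative break
# when extracted out of context. The trigger list is a heuristic.
ANAPHORA = re.compile(r"^(It|They|This|These|Those|He|She|Such)\b", re.I)

def flag_dependent_sentences(text: str) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if ANAPHORA.match(s)]

doc = ("Acme's API rate limit is 100 requests per minute. "
       "It resets every 60 seconds. This makes burst traffic risky.")
print(flag_dependent_sentences(doc))
# ['It resets every 60 seconds.', 'This makes burst traffic risky.']
```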
The Decoupling: Organic Rankings and AI Citations Are Diverging
In mid-2025, Ahrefs found that 76% of AI Overview citations came from pages ranking in Google's organic top 10. By February 2026, that number was 38%. The remaining 62% came from positions 11-100 (31.2%) and beyond position 100 (31%). The cause: Google's Gemini 3 upgrade in January 2026, which expanded the query fan-out system to pull from a dramatically wider source pool.
The decoupling is even more pronounced across platforms.
| Platform | Domain Overlap with Google Top 10 | URL Overlap |
|---|---|---|
| Perplexity | 91%+ | 82% |
| Google AI Overviews | 86% | 67% |
| Google AI Mode | ~54% | ~35% |
| ChatGPT | Lowest | ~10% |
ChatGPT has only 10% URL overlap with Google's top 10. An arXiv study found just 4% domain overlap between GPT-4o and Google. These are functionally different retrieval systems — a divergence that compounds the gap between Google's public statements and its internal behavior.
But there's a nuance that matters. AirOps measured the relationship in the other direction: of pages ChatGPT does cite, 55.8% rank somewhere in Google's top 20 for at least one query (including fan-out sub-queries). Pages that rank #1 in Google get cited at 43.2% versus 12.4% for pages beyond position 20 — a 3.5x advantage.
This is not a contradiction. Google has vastly more ranking pages than ChatGPT has citations, so most Google-ranked pages are never cited. But pages with strong fundamentals tend to surface in both systems because both reward similar quality signals. The relationship is a correlation through shared quality, not a causal pathway from rank to citation.
The strongest signal
Profound analyzed 250M+ AI responses and found that traditional SEO metrics explain almost nothing about AI citation behavior: traffic r²=0.05 and backlinks r²=0.038. Entity richness (267% citation lift), cosine similarity to the query (7.3x at 0.88+), and content clarity (+32.83%) are far stronger predictors.
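Cosine similarity itself is simple to compute once you have embeddings. A minimal sketch with stand-in vectors (which embedding model Profound used is not disclosed here):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; real ones come from an embedding model.
query_vec = np.array([0.12, 0.85, 0.03, 0.51])
passage_vec = np.array([0.10, 0.80, 0.05, 0.55])

sim = cosine_similarity(query_vec, passage_vec)
# Per Profound, passages at 0.88+ similarity were far likelier to be cited.
print(f"similarity {sim:.3f} -> {'above' if sim >= 0.88 else 'below'} the 0.88 band")
```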
Platform Divergence: There Is No Single "AI Search"
86% of top-mentioned sources are not shared across ChatGPT, Perplexity, and AI Overviews. Only 7 of the top 50 domains appear in all three platforms' top 50. Each platform has distinct retrieval architecture, source preferences, and citation behavior.
Citation rates vary by roughly 46x across the platforms measured: Grok cites sources in 27% of responses, while ChatGPT cites in just 0.59%. A brand visible on one platform may be invisible on another.
| Platform | Citation Rate |
|---|---|
| Grok | 27.01% |
| Perplexity | 13.05% |
| Google AI Mode | 9.09% |
| Gemini | 6.38% |
| Google AI Overview | 2.11% |
| Copilot | 1.27% |
| ChatGPT | 0.59% |
Even Google's own AI products diverge. AI Mode, AI Overviews, and Gemini cite very differently despite sharing an owner. Ahrefs compared 730K response pairs and found 86% semantic similarity but only 13.7% citation overlap — they agree on what to say but cite completely different sources. AI Mode cited 143% more unique domains than AI Overviews by January 2026, includes 2.5x more brand mentions, and behaves more like ChatGPT than like AI Overviews. There is also a self-citation bias: AI Mode cites Google's own properties at 17.42% (tripled from 5.7% previously), raising questions about platform neutrality in source selection.
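Ahrefs' exact overlap formula isn't specified here, but one standard way to compute citation overlap between two products' answers to the same prompt is Jaccard similarity over cited domains, sketched below with hypothetical domain sets:

```python
def citation_overlap(cited_a: set[str], cited_b: set[str]) -> float:
    # Jaccard similarity: shared domains / all domains cited by either.
    union = cited_a | cited_b
    return len(cited_a & cited_b) / len(union) if union else 0.0

ai_mode = {"reddit.com", "wikipedia.org", "youtube.com", "acme.example"}
ai_overviews = {"wikipedia.org", "nytimes.com", "quora.com"}
print(f"citation overlap: {citation_overlap(ai_mode, ai_overviews):.1%}")  # 16.7%
```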
Social media citation influence is also platform-specific. Tinuiti/Profound's Q1 2026 report found Reddit drove 24% of Perplexity's citations in January 2026 but effectively 0% of Gemini's. YouTube matters for Gemini but barely registers on ChatGPT. Reddit's citation share grew 73-100%+ across all verticals between October 2025 and January 2026.
The source preference differences are structural:
- AI Overviews: Over-indexes on UGC — YouTube (9.5%), Reddit (7.4%), Quora (3.6%)
- ChatGPT: Wikipedia dominant (16.3%); leans toward publishers and news sources
- Perplexity: YouTube (16.1%) + Wikipedia (12.5%); broadest international corpus
- AI Mode: Favors commercial and authoritative sources; highest entity density
What Gets Cited: Content Format and Structure
Content format is one of the strongest predictors of AI citation. Structured content — tables, comparison formats, guides with clear section hierarchy — consistently outperforms narrative prose.
| Content Format | Citation Rate |
|---|---|
| Comprehensive guides with data tables | 67% |
| Product comparison pages | 60-70% |
| Structured how-to guides | 54% |
| Comparative listicles | 32.5% (share of all citations, not a per-page rate) |
| Narrative how-to guides | 25-40% |
| Opinion pieces | 18% |
The structural advantage is measurable at the element level, according to Onely's compiled research and the AirOps 2026 State of AI Search:
- Semantic HTML tables increase citation rates approximately 2.5x versus paragraph text
- ChatGPT citations include tables 2.3x more frequently than traditional search (30% vs 13%)
- FAQ-structured content shows 28-40% higher citation probability
- Sequential headings correlate with 2.8x higher citation likelihood
- Pages with title-query alignment of 50%+ see 2.2x citation lift
DEJAN's reverse-engineering of AI Mode found that AI snippet selection caps at approximately 160 characters. The selection algorithm prioritizes semantic relevance, structural importance (HTML hierarchy), content density, and value proposition detection. Customer-centric language ("you," "your team") gets selected more frequently.
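A toy version of that selection step, scoring sentences by term overlap under the 160-character cap (the scorer is a stand-in; the real selector also weighs HTML hierarchy, density, and value propositions):

```python
def pick_snippet(page_text: str, query: str, cap: int = 160) -> str:
    # Toy scorer: query-term overlap normalized by sentence length.
    terms = set(query.lower().split())
    best, best_score = "", 0.0
    for sentence in page_text.split(". "):
        if len(sentence) > cap:
            continue  # hard cap: ~160 characters per snippet
        overlap = len(terms & set(sentence.lower().split()))
        score = overlap / (len(sentence.split()) or 1)
        if score > best_score:
            best, best_score = sentence, score
    return best

page = ("Your team can deploy in under five minutes. "
        "Founded in 2019, we have offices in three cities. "
        "Deploy previews run on every pull request.")
print(pick_snippet(page, "how fast can my team deploy"))
# -> "Your team can deploy in under five minutes"
```

Note that the winning sentence is the customer-centric one ("Your team"), consistent with DEJAN's finding.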
Content also exists as what Petrovic calls "semantic topography" — different regions of the same page live at different semantic coordinates. When AI systems decompose a query into sub-queries, each sub-query surfaces different passages from the same page. A query about "risks of X" surfaces avoidance language; a query about "benefits of X" surfaces benefit language. A well-structured page can serve multiple fan-out queries simultaneously if each section is independently extractable — the same architectural principle behind programmatic SEO.
What does not work
Ahrefs tested three pages of AI-generated content on ahrefs.com (DR 91). None ranked for target keywords. A competing Ahrefs page on a different topic outranked the AI-generated page about the actual topic — evidence for information gain scoring. Even exceptional domain authority cannot compensate for a lack of original information.
The Authority Paradox: High DR, Lower Conversion
Ahrefs found that ChatGPT's most-cited pages have a median DR of 90. That makes it sound like domain authority matters. But AirOps measured something different: the rate at which retrieved pages actually convert to citations.
| Domain Authority Range | Citation Rate (Retrieved to Cited) |
|---|---|
| 0-80 (flat across all buckets) | 21.5-23.6% |
| 80-100 | 15.0% |
Citation rate is consistent at 21.5-23.6% across DA 0-80, and it actually drops to 15% for DA 80-100 sites. High-authority sites get retrieved more often and accumulate more total citations through sheer volume, but they convert from retrieval to citation at a lower rate, probably because they cover topics broadly rather than addressing specific queries precisely.
Both data points can be true simultaneously. High-DR sites accumulate more total citations (volume-weighted). Mid-authority sites compete on citation rate (conversion-weighted). This is a base-rate effect, not a contradiction. The practical implication: mid-authority sites (DA 20-80) can compete on citation if they nail topical precision — as the Banksparency case study demonstrates with 10K+ monthly visits on a low-authority domain. The barrier is relevance, not domain authority.
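Worked numbers make the base-rate effect concrete. The retrieval volumes below are invented for illustration; only the conversion rates come from AirOps:

```python
# Retrieval volumes are invented; only the conversion rates are AirOps'.
high_dr_retrievals, mid_dr_retrievals = 10_000, 1_000
high_dr_rate, mid_dr_rate = 0.15, 0.22

print(f"high-DR total citations: {high_dr_retrievals * high_dr_rate:.0f}")  # 1500
print(f"mid-DR total citations:  {mid_dr_retrievals * mid_dr_rate:.0f}")    # 220
# High DR wins on volume; mid DR wins on conversion rate. Both are true.
```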
Further evidence: 67% of ChatGPT's top 1,000 citations are structurally uninfluenceable — Wikipedia (29.7%), homepages (23.8%), app stores, reference sites. Only 32.3% represent opportunities where content optimization or outreach could make a difference. Most brands experience what RankScience calls a ghost citation problem — they get cited as evidence sources without ever being recommended as a brand.
Freshness Is a Primary Signal, Not a Tiebreaker
AI assistants cite content that is 25.7% fresher than what organic search results surface. Ahrefs analyzed 16.975M cited URLs and found ChatGPT's citations are 458 days newer on average. Google's AI Overviews are the exception — they actually prefer slightly older content.
| Platform | Avg Days Since Publication | Difference from Organic |
|---|---|---|
| Google AIO (top 3) | 1,432 | +16 (prefers older) |
| Organic SERP | 1,416 | baseline |
| Perplexity | 1,166 | -250 |
| Gemini | 1,118 | -298 |
| Copilot | 1,056 | -360 |
| ChatGPT (references) | 1,023 | -393 |
| ChatGPT (citations) | 958 | -458 (strongest) |
76.4% of ChatGPT's top-cited pages were updated within 30 days. 89.7% were updated in 2025. Seer Interactive found that 65% of AI bot crawl hits target content published within the past year, and 50% of Perplexity citations are from 2025 content alone.
This is not just a correlation. Reverse-engineering of ChatGPT's configuration revealed a `use_freshness_scoring_profile: true` flag that cannot be disabled. Freshness scoring is an active layer that can override content quality: adding fake publication dates boosted AI visibility by up to 95 rank positions.
The average cited page is still 2.9 years old — freshness is not everything. But for non-Google AI platforms, content refresh cadence is a direct lever for visibility. Pages not updated quarterly are 3x more likely to lose citations.
Query Fan-Out: The Invisible Keyword Problem
When a user asks an AI search engine a question, the system does not search for that question. It decomposes the query into multiple sub-queries — between 2.9 (AirOps) and 10.7 (Gemini 3, Seer Interactive) on average — and retrieves results for each one independently. Google Patent US11663201B2 defines 8 variant types: Equivalent, Follow-up, Generalization, Specification, Canonicalization, Translation, Entailment, and Clarification.
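A sketch of that decomposition, with the patent's eight variant types as an enum and a few toy templates (the real system generates variants with a model, not string formatting):

```python
from enum import Enum

class QueryVariant(Enum):
    # The eight variant types defined in US11663201B2.
    EQUIVALENT = "equivalent"
    FOLLOW_UP = "follow_up"
    GENERALIZATION = "generalization"
    SPECIFICATION = "specification"
    CANONICALIZATION = "canonicalization"
    TRANSLATION = "translation"
    ENTAILMENT = "entailment"
    CLARIFICATION = "clarification"

def fan_out(query: str) -> dict[QueryVariant, str]:
    # Toy templates for four of the eight types.
    return {
        QueryVariant.EQUIVALENT: query,
        QueryVariant.SPECIFICATION: f"{query} for small teams",
        QueryVariant.GENERALIZATION: query.rsplit(" ", 1)[0],
        QueryVariant.FOLLOW_UP: f"{query} pricing 2026",
    }

for variant, q in fan_out("best crm software").items():
    print(f"{variant.name:16} {q}")
```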
95% of these fan-out queries have zero traditional search volume. They are invisible to every keyword research tool on the market — which is why diagnostic methodology needs to account for AI retrieval paths. But they are the primary retrieval pathway: 89.6% of ChatGPT searches generate 2+ fan-out queries, and 32.9% of cited pages appear only in fan-out results — not in the original query's top 20.
This is the "invisible keyword" problem. Nearly a third of AI citations come from queries the user never typed and that conventional measurement cannot detect. The acceleration is rapid: Gemini 2.5 averaged 6.01 sub-queries; Gemini 3 (January 2026) increased that by 78% to 10.7. Each model generation widens the retrieval surface, meaning content previously invisible to AI search becomes reachable without any changes on the page itself.
Fan-out behavior is not random. It varies by intent:
- Definition queries stay close to the original phrasing (51.6% near-verbatim)
- Research queries add temporal modifiers — 21.3% of fan-out queries contain a year
- Comparison queries decompose most aggressively (38.4% sub-question splitting)
The recency injection is notable: AI-generated sub-queries inject temporal bias even when users don't ask for it. The term "2026" appeared 184x more often than "2025" in Gemini 3's sub-queries.
Content that addresses multiple facets of a topic has multiplicative retrieval entry points. This is the structural case for comprehensive, well-sectioned content — not because longer is better (grounding plateaus at 540 words), but because more semantic coverage means more fan-out queries can match. Writesonic's GPT-5.4 analysis found that 75% of cited domains don't appear in traditional Google or Bing results, confirming that fan-out retrieval operates independently of conventional SERPs.
Traffic Impact: Fewer Clicks, Higher Conversion
AI Overviews reduce organic clicks by approximately 34.5% across queries where they appear. In B2B SaaS, the figure is steeper: Kevin Indig measured a 56.6% click decline across 10 sites and ~450M impressions since the March 2025 AIO rollout intensification. Zero-click rates in AI Mode reach 92-94%, with users clicking only once per 20 prompts — a dynamic explored further in the zero-click paradox.
But the clicks that do happen are dramatically more valuable.
- **16.8%** Claude conversion rate
- **14.2%** ChatGPT conversion rate
- **12.4%** Perplexity conversion rate
- **2.8%** Google organic conversion rate
AI visitors arrive pre-briefed by the AI's context. They exhibit deeper engagement, faster conversion, and lower bounce rates. This is the "educated click" pattern: fewer clicks total, but each click carries 5x the conversion value of a traditional organic visit.
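The per-click arithmetic, using the conversion rates reported above:

```python
clicks = 100
chatgpt_rate, organic_rate = 0.142, 0.028  # conversion rates reported above

print(f"ChatGPT referrals: {clicks * chatgpt_rate:.1f} conversions")  # 14.2
print(f"organic clicks:    {clicks * organic_rate:.1f} conversions")  # 2.8
print(f"per-click value:   {chatgpt_rate / organic_rate:.1f}x")       # 5.1x
```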
The traffic measurement problem compounds this. AI referrer attribution is systematically broken across platforms — Google AI Mode strips referrers entirely (confirmed as a bug by John Mueller), ChatGPT strips them for paid accounts, and desktop apps for Perplexity and Copilot also drop attribution. AI traffic shows up as "Direct" in GA4, which means the actual traffic impact of AI search is being undercounted and misclassified. Seer Interactive's 2026 analysis found a 70.6% misclassification rate — the majority of AI search traffic is attributed to wrong channels in standard GA4 configurations. Without custom channel groupings and UTM parameters, any traffic impact analysis of AI search is operating on fundamentally inaccurate data.
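A minimal sketch of such a custom channel classifier, assuming a hand-maintained hostname list (illustrative and incomplete, and by definition blind to the platforms that strip referrers entirely):

```python
from urllib.parse import urlparse

# Illustrative AI-referrer hostnames: incomplete by design, and blind to
# platforms that strip referrers entirely (those sessions land in Direct).
AI_HOSTS = {
    "chatgpt.com", "chat.openai.com", "perplexity.ai",
    "gemini.google.com", "copilot.microsoft.com",
}

def classify_channel(referrer: str | None, utm_source: str | None) -> str:
    # UTM parameters survive referrer stripping, so check them first.
    if utm_source and utm_source.lower() in {"chatgpt", "perplexity", "gemini"}:
        return "AI Search"
    if referrer:
        host = (urlparse(referrer).hostname or "").removeprefix("www.")
        return "AI Search" if host in AI_HOSTS else "Referral"
    return "Direct"  # stripped referrers are misclassified here

print(classify_channel("https://chatgpt.com/", None))  # AI Search
print(classify_channel(None, "perplexity"))            # AI Search
print(classify_channel(None, None))                    # Direct
```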
What This Means for Practitioners
The data converges on a few clear principles:
Density over length. The grounding budget is ~1,929 words. Content past the first 540 words of grounding has diminishing returns. Pages under 1,000 words retain 61% of their content in AI answers; pages over 3,000 words retain 13%. Front-load the substance.
Structure for extraction. Every sentence should function as a standalone, citable claim. Use semantic HTML — tables increase citation rates 2.5x. Use clear heading hierarchies. Avoid pronoun chains that break when a sentence is extracted out of context.
Optimize for fan-out, not just the primary query. A single user query generates 3-28 sub-queries. Content that covers multiple facets of a topic — with each section independently extractable — has more entry points into the AI retrieval pipeline.
Freshness is a lever. For non-Google AI platforms, content updated within 3 months gets cited at nearly 2x the rate of older content. Quarterly refresh cadence is a baseline, not a luxury.
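A small audit sketch for that cadence: scan a sitemap's lastmod dates and flag anything older than a 90-day window (the threshold mirrors the "within 3 months" figure above; the sitemap schema is standard, but this assumes lastmod is populated and accurate):

```python
from datetime import date, timedelta
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def stale_urls(sitemap_xml: str, max_age_days: int = 90) -> list[str]:
    # Flag pages whose <lastmod> is older than the refresh window.
    cutoff = date.today() - timedelta(days=max_age_days)
    stale = []
    for url in ET.fromstring(sitemap_xml).iter(f"{NS}url"):
        loc = url.findtext(f"{NS}loc")
        lastmod = url.findtext(f"{NS}lastmod")
        if loc and lastmod and date.fromisoformat(lastmod[:10]) < cutoff:
            stale.append(loc)
    return stale

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/guide</loc><lastmod>2024-01-15</lastmod></url>
</urlset>"""
print(stale_urls(sitemap))  # ['https://example.com/guide']
```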
Platform-specific strategies are not optional. A roughly 46x spread in citation rates across platforms means a single "AI optimization" strategy will fail. Each platform has different source preferences, retrieval architectures, and content biases.
Mid-authority sites can compete. Citation rate is flat from DA 0-80 and actually drops for DA 80-100. The barrier is topical precision, not domain authority.
Traditional SEO metrics are near-irrelevant for AI citation. Traffic and backlinks explain less than 5% of citation behavior. But the fundamentals that drive organic quality — entity richness, semantic relevance, clear structure — transfer because both systems reward similar content characteristics. The clinical diagnostic framework maps these shared quality signals systematically.
Measurement is broken. AI traffic is systematically undercounted. Citation visibility is volatile — only 30% brand retention per answer, 20% across five consecutive runs. Any AI visibility strategy requires ongoing monitoring, not one-time optimization.