Research Synthesis

Topical Authority: From SEO Folklore to Confirmed Signal

For years, “topical authority” occupied the same space as “domain authority” in SEO discourse: a concept everyone referenced, nobody could define precisely, and skeptics dismissed as a correlation artifact. Then Google's internal API documentation leaked. What emerged was not one signal but at least seven named attributes, spread across several overlapping systems, that quantify how deeply a domain covers a subject.

Compiled by Aviel Fahl · Last updated April 1, 2026

Key Findings

Google's 2024 API leak confirmed topical authority as a real, multi-signal system. siteFocusScore, siteRadius, and site-level topic embeddings all quantify how deeply a domain covers a subject. Text relevance shows the strongest correlation with rankings (0.47 across 16,298 keywords). Fan-out query coverage has a 0.77 Spearman correlation with AI citation likelihood across 36 million AI Overviews. The system is not a single toggleable score; it is the combined output of topic embeddings, NsrChunks, ClusterUplift, and pairwise quality comparisons. Content depth and entity coverage shift the vector. Volume alone does not.


0.47 · Text relevance correlation (strongest factor)
0.77 · Fan-out coverage ↔ AI citation (Spearman)
57% · Faster visibility for high-authority content
Confirmed topical authority signals in API

The API Leak Ended the Debate

In May 2024, thousands of pages of internal Google Search API documentation, uploaded to GitHub by an automated bot, became public. The Content API Warehouse leak revealed 2,596 modules containing 14,014 attributes across 2,500+ pages, the most comprehensive view of Google's ranking infrastructure ever disclosed. Among those attributes: siteFocusScore, siteRadius, and site2vecEmbeddingEncoded.

iPullRank's analysis and Hobo Web's breakdown interpreted what these signals do. siteFocusScore quantifies how dedicated a site is to a single topic (specialist vs. generalist). siteRadius measures how far an individual page deviates from the site's central theme. And site2vecEmbeddingEncoded is a compressed vector embedding of the site's overall theme, with pageEmbeddings measuring each page against that site-level vector.
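None of the leaked attributes come with formulas, so the relationship between page embeddings, a site-level vector, and focus or radius scores can only be sketched. The Python below is a minimal illustration under loud assumptions: the site vector is the plain average of page vectors, a page's radius is its cosine distance from that average, and the focus score is the mean cosine similarity of all pages to it. The function names and the toy three-dimensional vectors are hypothetical, not Google's.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def site_vector(page_vectors):
    """Hypothetical stand-in for site2vec: the mean of all page vectors."""
    n = len(page_vectors)
    return [sum(column) / n for column in zip(*page_vectors)]


def page_radius(page_vec, site_vec):
    """Hypothetical siteRadius proxy: cosine distance of one page from the site vector."""
    return 1.0 - cosine(page_vec, site_vec)


def focus_score(page_vectors):
    """Hypothetical siteFocusScore proxy: mean page similarity to the site vector."""
    centre = site_vector(page_vectors)
    return sum(cosine(p, centre) for p in page_vectors) / len(page_vectors)


# Toy axes (assumed): [personal finance, cooking, travel].
finance_pages = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.95, 0.0, 0.05]]
off_topic_page = [0.0, 0.1, 0.9]
drifting_site = finance_pages + [off_topic_page]

centre = site_vector(drifting_site)
print(round(page_radius(off_topic_page, centre), 2))    # large: the outlier
print(round(page_radius(finance_pages[0], centre), 2))  # small: on-theme
print(focus_score(finance_pages) > focus_score(drifting_site))  # True
```

In this toy model, adding one off-topic page both creates a high-radius outlier and pulls the site vector away from finance, which lowers the focus proxy for the whole site.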

The attributes themselves are not inference. They are named in production API documentation that Google acknowledged as authentic, even while cautioning that the data was “out-of-context, outdated, or incomplete”. That caution is fair: the documentation does not reveal active weights or deployment status. But the architecture exists, and the signals exist. The debate about whether Google measures topical coherence at the site level is over.

What changed

Before the leak, skeptics like Kevin Indig argued that topical authority was an “SEO ghost concept,” a narrative practitioners imposed on correlation data. After the leak confirmed siteFocusScore and siteRadius, Indig reversed his position publicly, proposing Topic Share as the primary operational metric for tracking topical authority over time.

A Multi-Signal System, Not One Signal

There is no single “topical authority” score. The effect SEOs observe comes from at least four overlapping systems, each operating at a different level of granularity.

QualityAuthorityTopicEmbeddings positions a site in a mathematical vector space relative to every other site, and these embeddings propagate all the way to SuperRoot, the final ranking layer. Nearby embeddings indicate topically related sites; distant embeddings indicate unrelated ones. This is how Google can tell that a personal finance blog and a bank's advice section cover the same territory, even if they share zero backlinks.
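The “nearby vs. distant” logic is ordinary vector similarity. A minimal sketch, assuming made-up four-dimensional topic vectors (the real embedding dimensions and values are not in the leak): rank other sites by cosine similarity to a target site. No link data enters the calculation, which is why two sites with zero shared backlinks can still land next to each other.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)  # n-argument hypot needs Python 3.8+
    return dot / norm if norm else 0.0


# Hypothetical topic embeddings; axes loosely stand for
# [personal finance, banking products, recipes, home decor].
SITE_EMBEDDINGS = {
    "personal-finance-blog": [0.9, 0.4, 0.1, 0.0],
    "bank-advice-section": [0.8, 0.5, 0.0, 0.1],
    "recipe-site": [0.0, 0.1, 0.9, 0.3],
}


def nearest_neighbours(name, embeddings):
    """Other sites ordered from most to least similar to `name`."""
    target = embeddings[name]
    scored = [
        (other, cosine(target, vector))
        for other, vector in embeddings.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for site, similarity in nearest_neighbours("personal-finance-blog", SITE_EMBEDDINGS):
    print(f"{site}: {similarity:.2f}")
```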

NSR (Normalized Site Rank) is a 63+ field site-level quality scoring system. Within it, NsrChunks breaks the site into topical sections and evaluates each independently. A blog section can have a completely different NSR chunk score than product pages on the same domain. ClusterUplift groups sites with similar sites and applies collective boosts or demotions to the entire cluster. If the cluster has a quality problem, every site in it gets demoted, even clean ones.
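The leak names NsrChunks and ClusterUplift but not their math. The sketch below illustrates only the two structural claims in the paragraph above, with invented scoring rules: each section is scored independently (here, the mean of its pages' quality scores), and one cluster-wide adjustment, decided by the cluster's average, is applied to every member site, so a clean site can be demoted by its neighbours. The thresholds, step size, and section paths are hypothetical.

```python
from statistics import mean

# Hypothetical per-page quality scores (0-1), grouped by site section.
SITE_SECTIONS = {
    "/blog/": [0.4, 0.5, 0.3],
    "/products/": [0.8, 0.9, 0.85],
}


def chunk_scores(sections):
    """Score each section on its own; here simply the mean page quality."""
    return {path: mean(scores) for path, scores in sections.items()}


def cluster_adjustment(cluster_scores, threshold=0.5, step=0.1):
    """One shared boost or demotion, decided by the cluster's average quality."""
    return step if mean(cluster_scores) >= threshold else -step


def adjusted_score(site_score, cluster_scores):
    """A site's score after the cluster-wide adjustment is applied."""
    return site_score + cluster_adjustment(cluster_scores)


print(chunk_scores(SITE_SECTIONS))
# A clean site (0.9) clustered with two weak sites still receives the demotion.
print(adjusted_score(0.9, [0.9, 0.3, 0.2]))
```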

PairwiseQ comparisons favor sites with deeper topical coverage when matched head-to-head against competitors. And NLP entity coverage, the breadth and depth of entity recognition within content, maps onto Google's Entity-Based Ranking patent (US10235423), which describes assigning composite scores from knowledge graph metrics weighted by entity type.

Source: Google API Leak (May 2024), iPullRank / Hobo Web analysis
Signal	What It Measures	Level
siteFocusScore	How dedicated a site is to a single topic	Site
siteRadius	How much a page deviates from the site's central theme	Page → Site
site2vecEmbeddingEncoded	Compressed vector embedding of a site's overall theme	Site
pageEmbeddings	Per-page vectors compared against site embeddings	Page
QualityAuthorityTopicEmbeddings	Multi-dimensional vector positioning site relative to all others	Site
NsrChunks	Independent quality evaluation per topical section of a site	Section
ClusterUplift	Collective quality boosts/demotions applied to similar-site clusters	Cluster

When SEOs say “I tested topical authority and it didn't work,” they usually tested one dimension of a multi-dimensional system. Publishing 50 thin articles is unlikely to move the topic embedding vector; twenty deeply comprehensive, entity-rich, properly interlinked articles are far more likely to. Depth and entity coverage shift the vector; volume alone does not.

Signal versioning

All of these signals are versioned, and Google runs live experiments with different weightings simultaneously. A site can therefore rank differently under different experimental versions of the same signals, another source of ranking fluctuation with no obvious external cause.
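To see how versioning alone can reorder results, here is a toy example: two sites with fixed, invented signal values, scored under two hypothetical experiment arms that weight the same signals differently. The signal names are shorthand rather than leak attribute names, and combining signals by weighted sum is an assumption.

```python
# Hypothetical signal values (0-1) for two sites; nothing about the sites changes.
SIGNALS = {
    "site-a": {"focus": 0.9, "nsr": 0.6, "clicks": 0.5},
    "site-b": {"focus": 0.6, "nsr": 0.8, "clicks": 0.7},
}

# Two hypothetical experiment arms that weight the same signals differently.
VERSIONS = {
    "control": {"focus": 0.6, "nsr": 0.2, "clicks": 0.2},
    "experiment": {"focus": 0.2, "nsr": 0.4, "clicks": 0.4},
}


def score(signals, weights):
    """Weighted sum of signal values (an assumed combination rule)."""
    return sum(weights[name] * value for name, value in signals.items())


def ranking(version):
    """Site names ordered best-first under one experiment arm."""
    weights = VERSIONS[version]
    return sorted(SIGNALS, key=lambda site: score(SIGNALS[site], weights), reverse=True)


print(ranking("control"))     # ['site-a', 'site-b']
print(ranking("experiment"))  # ['site-b', 'site-a']
```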

Three Site Archetypes

Hobo Web's analysis of the API leak data identified three archetypes based on how siteFocusScore and siteRadius interact:

Source: Hobo Web, API Leak Analysis (2024)
Archetype	Description	Implication
Perfect Topicality	Every page has low siteRadius: tight coherence around a single theme	Maximum siteFocusScore. The specialist advantage.
High Focus with Topical Drift	Strong core topic, but outlier pages raise siteRadius	Pruning or improving off-topic pages strengthens calculated authority.
Generalist with Niche Core	Expertise diluted by tangential content coverage	The blog-around-everything pattern. Depth obscured by breadth.

The practical implication is that pruning or improving off-topic content should strengthen calculated authority. A site that already has depth on its core topic may not need more core content; it may need to remove content that dilutes the signal. This aligns with the Panda patent (US9031929), which evaluates quality at the site level: thin or irrelevant pages drag down the entire domain's score.
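Hobo Web's archetypes are qualitative, but the logic can be sketched as a rule over per-page radius values. A minimal sketch, assuming radii on a 0-1 scale and two invented cut-offs (a page counts as drifting above 0.3, and a site with more than 20% drifting pages is treated as a generalist); neither number comes from the leak. The same cut-off yields a list of pruning or improvement candidates.

```python
def classify_site(page_radii, drift_cutoff=0.3, max_drift_share=0.2):
    """Map per-page radius values to one of the three archetypes (cut-offs hypothetical)."""
    drifting = [r for r in page_radii if r > drift_cutoff]
    share = len(drifting) / len(page_radii)
    if share == 0:
        return "Perfect Topicality"
    if share <= max_drift_share:
        return "High Focus with Topical Drift"
    return "Generalist with Niche Core"


def pruning_candidates(radius_by_url, drift_cutoff=0.3):
    """URLs whose radius exceeds the cut-off: candidates to prune, merge, or rewrite."""
    return sorted(url for url, r in radius_by_url.items() if r > drift_cutoff)


site = {"/budgeting/": 0.05, "/saving/": 0.08, "/index-funds/": 0.1,
        "/taxes/": 0.12, "/best-hiking-boots/": 0.7}
print(classify_site(list(site.values())))  # High Focus with Topical Drift
print(pruning_candidates(site))            # ['/best-hiking-boots/']
```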

Text Relevance Dominates Rankings

The Semrush 2024 Ranking Factors Study analyzed 16,298 English keywords across the top 20 positions, evaluating 65 factors. Text relevance (how closely a page's content matches the query) showed the strongest correlation with rankings at 0.47, well ahead of the next-strongest factor, URL organic traffic (0.33), and more than double domain authority (0.21).

Source: Semrush (16,298 keywords), Surfer SEO (260K SERPs), Ahrefs (5 tools)
Factor	Correlation	Source
Text relevance	0.47	Semrush (16,298 keywords)
URL organic traffic	0.33	Semrush (16,298 keywords)
Domain authority	0.21	Semrush (16,298 keywords)
Content quality score	0.17	Semrush (16,298 keywords)
Content comprehensiveness	0.17	Surfer SEO (260K SERPs)
Content tool score → ranking	Weak	Ahrefs (5 tools tested)


Pages ranking for one keyword have significantly better odds of ranking for related keywords, supporting the topical cluster thesis. But there is an important nuance: content comprehensiveness alone shows only a 0.17 correlation with rankings (Surfer SEO, 260K SERPs). And when Ahrefs tested five content optimization tools (Surfer, Frase, NeuronWriter, Clearscope, AI Content Helper), they found weak correlations across the board. Content tool scores do not strongly predict rankings on their own.

The distinction matters: text relevance (are you writing about the thing the user searched for?) correlates strongly with rankings. Content comprehensiveness (did you cover every subtopic?) is necessary but not sufficient. Topical authority appears to act as a multiplier on relevance: deep topical coverage increases the probability that any individual page achieves high text relevance for its target query.
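“Multiplier” has a concrete reading. In the hypothetical model below (not a published formula), topical authority scales a page's relevance rather than adding to it. Authority therefore contributes nothing to a page that is irrelevant to the query, and a highly relevant page on a weaker site can still beat a barely relevant page on a stronger one. The constant `k` is an arbitrary illustrative choice.

```python
def modelled_score(relevance, topical_authority, k=0.5):
    """Hypothetical multiplicative model: relevance scaled by (1 + k * authority).

    Both inputs are assumed to lie in [0, 1].
    """
    return relevance * (1.0 + k * topical_authority)


print(modelled_score(0.0, 1.0))  # 0.0: authority cannot rescue an irrelevant page
print(modelled_score(0.8, 0.2))  # relevant page, weaker site
print(modelled_score(0.3, 1.0))  # weakly relevant page, strong site
```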

A separate Surfer SEO / WLDM study (~260,000 SERPs) found that page-level topical authority was the largest on-page ranking factor, stronger than domain monthly traffic volume. This distinguishes page-level from domain-level signals: a highly relevant page on a lower-authority domain can outperform an irrelevant page on a stronger domain.

Authority Accelerates Visibility

A Graphite study tracking 332 URLs published across 12 domains in June–July 2023 found that content published on domains with high topical authority gains visibility 57% faster, is 62% more likely to get traffic within the first week, and reaches impression milestones 30% faster.

Source: Multiple sources (2024–2025)
Metric	Result	Source	Confidence
Visibility speed (high TA vs. low)	57% faster	Graphite (332 URLs)	Moderate
First-week traffic likelihood	62% more likely	Graphite (332 URLs)	Moderate
Impression milestone speed	30% faster	Graphite (332 URLs)	Moderate
Niche Expertise algorithm weight	~13%	First Page Sage (2025)	Moderate
Topic cluster traffic (case study)	500 → 190K monthly	HubSpot (2024)	Moderate (single case study)
Fan-out coverage ↔ AI citation	0.77 Spearman	Surfer SEO (36M AIOs)	High


The sample is small (332 URLs), but the controlled methodology and consistent direction across metrics give it moderate confidence. First Page Sage's 2025 ranking factor analysis weights Niche Expertise, defined as having 10+ authoritative pages around the same hub keyword, at approximately 13% of the ranking algorithm, the fourth-highest factor. It also introduces the concept of “Net DR”: a DR 40 domain outranking a DR 70 domain nearly always has substantially higher Niche Expertise.

This is quantitative support for what the clinical diagnostic framework calls the evidence-builder loop: authority priors built on earlier wins compound returns on subsequent content. Sequencing matters. Publishing into topics where you already have depth yields faster ROI than scattershot coverage across new topics.

Practitioner note

NavBoost operates on a rolling 13-month window of click data, segmented per topic. Topical authority compounds behavioral signals within a topic cluster. Each new page benefits from the accumulated click history of existing pages in the same cluster. New domains face a structural disadvantage: no click history means no NavBoost signal, regardless of content quality.
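A rolling per-topic window is easy to picture in code. The sketch assumes click events already labelled with a topic and treats “13 months” as the current calendar month plus the 12 before it; both the labelling and the window boundary are assumptions, since the exact NavBoost mechanics are not public. A topic with no events in the window simply has no entry, mirroring the new-domain problem described above.

```python
from collections import defaultdict
from datetime import date


def month_start(d, months_back):
    """First day of the calendar month `months_back` months before d's month."""
    index = d.year * 12 + (d.month - 1) - months_back
    return date(index // 12, index % 12 + 1, 1)


def topic_click_totals(events, today, window_months=13):
    """Sum clicks per topic for events inside the rolling window ending today.

    events: iterable of (date, topic, clicks) tuples (hypothetical format).
    """
    start = month_start(today, window_months - 1)  # current month + 12 earlier ones
    totals = defaultdict(int)
    for day, topic, clicks in events:
        if start <= day <= today:
            totals[topic] += clicks
    return dict(totals)


EVENTS = [
    (date(2025, 3, 31), "budgeting", 50),  # just outside the window
    (date(2025, 4, 1), "budgeting", 10),
    (date(2026, 3, 15), "budgeting", 5),
    (date(2026, 1, 2), "investing", 7),
]
print(topic_click_totals(EVENTS, today=date(2026, 4, 1)))
# {'budgeting': 15, 'investing': 7}
```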

Fan-Out Coverage and AI Citation

The strongest empirical link between topical coverage and AI visibility comes from Surfer SEO's AI Citation Report (2025), analyzing 36 million AI Overviews and 46 million citations. Pages ranking for fan-out queries, the sub-queries AI systems generate when decomposing a user's question, are 161% more likely to be cited in AI responses, with a Spearman correlation of 0.77 between fan-out query coverage and citation likelihood.
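Because 0.77 is a Spearman coefficient, it measures how consistently pages with more fan-out coverage also rank higher on citations, not a linear slope. For readers who want to compute the same statistic on their own data, here is a dependency-free sketch: rank both variables (averaging ranks across ties) and take the Pearson correlation of the ranks. The example numbers are invented, not Surfer's data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented per-page data: share of fan-out sub-queries covered vs. AI citations.
coverage = [0.1, 0.3, 0.5, 0.7, 0.9]
citations = [0, 2, 1, 5, 8]
print(round(spearman(coverage, citations), 2))  # 0.9
```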

This is where topical authority and AI citation mechanics converge. When Google's AI systems decompose a query into sub-queries (averaging 10.7 per prompt in Gemini 3), content addressing multiple facets of a topic captures more of those sub-queries. Topical depth creates fan-out coverage as a structural byproduct.

But only 27% of fan-out sub-queries are stable across repeated searches, so optimizing for specific sub-queries is unreliable. Optimize for topical coverage, and fan-out capture follows.
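The stability figure depends on how “stable” is defined, and the report's exact definition is not given here. A minimal sketch, assuming stability means “appears in every repeated run of the same prompt” and coverage means “share of one run's sub-queries for which the site has a ranking page”; the sub-query strings are invented. It illustrates why topic-wide coverage is the sturdier target: the stable core is small, but a site covering the whole topic catches sub-queries from any run.

```python
def stable_share(runs):
    """Share of distinct sub-queries that appear in every repeated run."""
    run_sets = [set(run) for run in runs]
    seen_anywhere = set().union(*run_sets)
    seen_everywhere = set.intersection(*run_sets)
    return len(seen_everywhere) / len(seen_anywhere)


def fanout_coverage(sub_queries, ranked_queries):
    """Share of one run's sub-queries for which the site has a ranking page."""
    wanted = set(sub_queries)
    return len(wanted & set(ranked_queries)) / len(wanted)


RUNS = [
    ["roth ira limits", "roth vs traditional", "roth ira income cap"],
    ["roth ira limits", "backdoor roth"],
    ["roth ira limits", "roth vs traditional", "roth conversion tax"],
]
NARROW_SITE = {"roth ira income cap"}  # targets one specific sub-query
TOPICAL_SITE = {"roth ira limits", "roth vs traditional", "roth ira income cap",
                "backdoor roth", "roth conversion tax"}

print(stable_share(RUNS))  # 0.2: one of five distinct sub-queries recurs every time
print([round(fanout_coverage(run, NARROW_SITE), 2) for run in RUNS])
print([fanout_coverage(run, TOPICAL_SITE) for run in RUNS])
```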

Source: Surfer SEO, Digital Bloom, AiModeBoost (2025)
Metric	Value	Source
Fan-out coverage → AI citation likelihood	161% more likely	Surfer SEO (36M AIOs, 46M citations)
Fan-out sub-query stability	Only 27% stable	Surfer SEO (36M AIOs)
Brand search volume → AI citation	0.334 correlation	Digital Bloom (680M+ citations)
Entity-rich vs. keyword-optimized content	267% more AI citations	AiModeBoost
Entity ID matching (Wikidata Q-IDs)	8.9x citation increase	AiModeBoost
Multi-platform presence (4+ channels)	2.8x more AI mentions	Digital Bloom (680M+ citations)

The Digital Bloom 2025 AI Visibility Report (680M+ citations) adds a related finding: brand search volume, not backlinks, is the strongest predictor of AI citations, at a 0.334 correlation. Brands present on four or more platforms are 2.8x more likely to appear in ChatGPT responses. This extends the topical authority thesis into AI retrieval: visibility depends not only on what you write but on whether the AI system recognizes you as an entity worth citing on that topic.

AiModeBoost's entity research (67,394 content pieces) connects this to entity mechanics directly: entity-rich content achieves 267% more AI citations than keyword-optimized content, and entity ID matching (Wikidata Q-IDs, Knowledge Graph MIDs) produces an 8.9x citation increase. AiModeBoost also reports an 89% correlation between knowledge graph alignment and AI visibility.

The concentration effect is extreme. Goodie AI's September 2025 analysis of 5.7 million citations found that the top 50 domains capture approximately 53% of all citations, while 40,000+ sites share the remainder. Brands in the top quartile for web mentions receive 10x more AI Overview citations than brands in the next quartile down. The relationship is non-linear: below the top quartile, additional mentions yield little extra citation volume. This is not a gradual gradient but a threshold effect, in which concentrated brand presence triggers disproportionate citation volume.
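Concentration figures like “top 50 domains capture about 53%” are top-k shares. A short sketch of the calculation on an invented list of cited domains (one entry per citation); with real data, k would be 50 and the list would hold millions of entries.

```python
from collections import Counter


def top_k_share(cited_domains, k):
    """Fraction of all citations captured by the k most-cited domains."""
    counts = Counter(cited_domains)
    captured = sum(n for _, n in counts.most_common(k))
    return captured / sum(counts.values())


citations = ["a.com"] * 5 + ["b.com"] * 3 + ["c.com", "d.com"]
print(top_k_share(citations, 2))  # 0.8
```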

Third-party validation layers amplify the signal. A ConvertMate study (methodology undisclosed, treat as directional) found that active profiles on Trustpilot, G2, and Capterra correlate with a 3x higher ChatGPT citation rate. Review platforms function as validation layers for comparison and evaluation queries, the intent categories where AI systems need to establish entity credibility before citing a source.

The content type mix shifts dramatically by query intent. Omniscient Digital analyzed 23,387 sources (January 2026) and found that for branded queries, third-party validation dominates: reviews and social proof account for 57% of citations, directories 17%, product pages only 12%, and thought leadership 5.4%. The implication for topical authority is that brand-level authority in AI search depends on an ecosystem of third-party signals, not just owned content depth.

Platform context matters as much as intent. Yext's analysis of 6.8 million citations (via Surfer SEO, February 2026) found brand-controlled sources account for 86% of citations when intent is controlled for. Reddit, often cited as dominant in AI results, accounts for only 2% in that context. However, Reddit's share jumps to 46.7% of evidence citations on Perplexity specifically. Reddit dominance is platform-specific and intent-specific, not a general authority signal. The takeaway: owned content and structured data perform better across AI systems than user-generated discussion, except on platforms that privilege forum-style evidence.

Information Gain: Authority's Complement

Google's Information Gain patent (US20200349181A1, filed 2018, granted June 2024) describes a system that scores documents on how much novel content they contain beyond what was already presented to the user. Documents scoring near zero can be demoted or excluded entirely.
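The patent describes a learned scoring model; the sketch below swaps in a much simpler, set-based stand-in purely to make the idea concrete. It assumes each document has already been reduced to a set of concepts or entities (how is left open), scores a candidate by the share of its concepts the user has not yet seen, and drops candidates below an arbitrary floor. None of this is the patent's actual method, and the document names are hypothetical.

```python
def information_gain(doc_concepts, already_seen):
    """Share of a document's concepts absent from documents already shown."""
    doc = set(doc_concepts)
    if not doc:
        return 0.0
    seen = set().union(*already_seen) if already_seen else set()
    return len(doc - seen) / len(doc)


def rerank_by_gain(candidates, shown, floor=0.1):
    """Drop candidates whose gain is below `floor`, then order the rest by gain."""
    scored = [(name, information_gain(c, shown)) for name, c in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= floor]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


SHOWN = [{"index funds", "fees", "diversification"}]
CANDIDATES = {
    "rehash": {"index funds", "fees"},
    "original-survey": {"index funds", "survey of 500 investors", "fee drag by age"},
}
print(rerank_by_gain(CANDIDATES, SHOWN))  # only the original survey survives
```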

Information gain is distinct from topical authority. Authority is credibility through consistent expertise, a track record of deep, accurate coverage within a topic. Information gain is the uniqueness of a specific contribution. They are complementary, not competing signals.

Clearscope defines information gain operationally as “concepts and entities on the fringe of Google's Knowledge Graph for the topic,” created through survey data, case studies, interviews, personal stories, and novel perspectives. A site with high topical authority that publishes the same synthesis as everyone else scores low on information gain. A site with moderate authority that publishes original research or proprietary data scores high.

The strongest content strategy combines both: deep topical authority (to be recognized as credible on the subject) plus novel data or analysis (to be worth citing over competitors). This is why programmatic SEO architecture built on proprietary data has a structural advantage: each page generated from the template carries information gain by default, because the underlying data does not exist elsewhere.

The Overclustering Trap

Topical authority is not a license to publish everything tangentially related to your subject. Off-topic content actively dilutes the signal. Because NsrChunks evaluates topical sections independently, a blog section that mixes core-topic posts with off-topic content drags down that section's chunk score, even if the off-topic content is individually high quality.

Keyword Insights documented a case where a large travel client narrowed its property type pages from 413 to 85, cutting approximately 15 million URLs. The result was a 110% organic traffic increase almost immediately. The pruning removed pages that were splitting signals and diluting the domain's topical coherence.

But the reverse error is equally dangerous. Ahrefs identified 9,700 cases of “keyword cannibalization” and found that almost none needed fixing. In a sample of 80 keywords with multiple ranking pages, only 1 actually required consolidation. Multiple rankings create both cannibalization and diversification effects simultaneously.

Diagnostic approach

Not all keyword overlap is cannibalization, and over-consolidation can be as harmful as over-clustering. Diagnose in Google Search Console (GSC): are multiple pages splitting impressions while CTR declines? If impressions are growing and CTR is stable, the multiple rankings are diversification, not cannibalization. Distinguish between the two before acting.
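The diagnostic can be written down as a rule over two Search Console periods for a single query. A minimal sketch, assuming page-level impressions and clicks have already been exported for both periods; the 10% CTR tolerance, the dictionary format, and the labels are invented for illustration and should be tuned to your own data.

```python
def totals(period):
    """Total impressions and overall CTR for one period: {url: (impressions, clicks)}."""
    impressions = sum(i for i, _ in period.values())
    clicks = sum(c for _, c in period.values())
    return impressions, (clicks / impressions if impressions else 0.0)


def diagnose(previous, current, ctr_tolerance=0.10):
    """Label overlapping rankings for one query as cannibalization or diversification."""
    if len(current) < 2:
        return "single page"
    prev_impressions, prev_ctr = totals(previous)
    curr_impressions, curr_ctr = totals(current)
    ctr_declining = curr_ctr < prev_ctr * (1 - ctr_tolerance)
    if ctr_declining:
        return "cannibalization"  # pages splitting impressions while CTR falls
    if curr_impressions > prev_impressions:
        return "diversification"  # impressions growing, CTR holding steady
    return "inconclusive"  # flat impressions, stable CTR: keep watching


before = {"/guide/": (1000, 50), "/comparison/": (200, 10)}
split = {"/guide/": (700, 20), "/comparison/": (600, 10)}
growing = {"/guide/": (1100, 55), "/comparison/": (500, 25)}
print(diagnose(before, split))    # cannibalization
print(diagnose(before, growing))  # diversification
```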

If a site has broad content coverage but underperforms on core queries, the diagnostic methodology should check for topical dilution before assuming the core content itself is the problem. The binding constraint may be overclustering, not content quality.

Measuring It: Topic Share

The API leak tells us what Google measures internally. But practitioners cannot access siteFocusScore or NsrChunks directly. Kevin Indig's Topic Share framework fills the gap: the percentage of organic traffic a domain captures from all keywords within a defined topic, compared with competitors.

Measurement (via Ahrefs): identify the head entity or term (must have Knowledge Panel presence), extract matching keywords (10+ monthly search volume), upload the keyword set in Keyword Explorer, then pull “Traffic Share by Domains.” Topic Share = your domain's aggregate traffic percentage within that topic.
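Once the keyword set is fixed, Topic Share itself is simple arithmetic. A minimal sketch, assuming rows of (keyword, monthly volume, domain, estimated traffic) from whatever keyword tool you use; the row format and the example domains are hypothetical, and the 10-search volume floor follows the method described above.

```python
from collections import defaultdict


def topic_share(rows, min_volume=10):
    """Each domain's percentage of total estimated organic traffic within a topic.

    rows: (keyword, monthly_volume, domain, estimated_traffic) tuples
    (an assumed export shape, not a specific tool's format).
    """
    traffic = defaultdict(float)
    for _keyword, volume, domain, estimated in rows:
        if volume >= min_volume:  # skip keywords below the volume floor
            traffic[domain] += estimated
    total = sum(traffic.values())
    return {d: 100 * t / total for d, t in traffic.items()} if total else {}


ROWS = [
    ("ecommerce platform", 5000, "your-site.example", 1100),
    ("ecommerce platform", 5000, "rival.example", 1000),
    ("ecommerce platform", 5000, "others.example", 7900),
    ("obscure term", 5, "your-site.example", 999),  # excluded: volume < 10
]
print(topic_share(ROWS))
# {'your-site.example': 11.0, 'rival.example': 10.0, 'others.example': 79.0}
```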

The metric composites rank position, search volume competitiveness, multi-keyword rankings, SERP feature capture, and snippet performance into a single number. It does not map to any single Google signal, but it reflects the outcome of the multi-signal system.

Benchmarks from Indig's data: in ecommerce (29K+ keywords), Shopify holds 11% Topic Share and BigCommerce 10%. In spend analysis (142 keywords), Jaggaer holds 15% and Sievo 13%. A monthly measurement cadence is appropriate; more frequent readings mostly capture noise.

Limitation: Topic Share is a proxy for competitive share, not a direct measurement of Google's internal authority signals. A site could have high Topic Share from brand searches alone. Pair it with entity coverage assessment and content depth analysis for a complete picture.

What This Means for Practitioners

The evidence converges on several actionable conclusions:

Depth beats breadth. Google's topic embeddings appear to reward sites that go deep on a subject, not sites that go wide across many. Twenty deeply comprehensive, entity-rich articles are likely to move the site2vec embedding vector more than fifty thin ones. The N-gram Quality Prediction patent (US9767157) describes detecting thin content via phrase-frequency fingerprinting, a plausible mechanism behind why volume without depth fails.

Pruning is a lever. Removing or noindexing off-topic content should improve the siteFocusScore calculation, and the outcome data supports this. The Keyword Insights case study showed a 110% traffic increase from narrowing 413 property type pages to 85. IBM, Progressive, and DoorDash have all reported organic traffic gains after pruning.

Sequencing matters. The evidence-builder loop is supported by the data: high topical authority content gains visibility 57% faster, and NavBoost's per-topic click signals compound within topic clusters. Start with queries where you already have depth, accumulate authority priors, then extend into adjacent topics. The clinical framework's ceiling vs. weight distinction applies: topical authority is a ceiling issue, not a weight issue. Link building alone cannot overcome weak topical coherence.

Entity coverage is the mechanism. Topical authority requires entity recognition depth, not surface-level topic coverage. All relevant entities must be associated with detailed attributes, facts, and relationships. Koray Tugberk's semantic SEO methodology operationalizes this: match entity types between question and answer, reduce dependency hops, disambiguate entities, boost salience through co-occurring terms, and avoid unclear antecedents. His case studies show results without link building or brand power. GetWordly.com went from zero to 128,000 organic traffic in 123 days through semantic methodology alone.

Source: Oncrawl / Koray Tugberk (2023–2024)
Project	Result	Method
GetWordly.com	0 → 128,000 organic traffic in 123 days	Semantic SEO methodology
Interingilizce.com	10,000 → 200,000+ monthly in 5 months	Semantic SEO methodology
Third client	600% growth in 5 months (10,000 → 70,000 monthly)	Semantic SEO methodology

AI citation follows topical authority. Fan-out coverage (0.77 Spearman) is the strongest predictor of AI citation among the studies reviewed here. Entity-rich content gets 267% more AI citations, and knowledge graph alignment correlates at 89% with AI visibility. Building topical authority is simultaneously building AI citation infrastructure: the same depth that signals expertise to Google's traditional ranking systems also creates the coverage AI systems need for extractive grounding.

Cluster quality drags affect innocent sites. ClusterUplift means your competitive position is partially determined by the quality of the sites Google groups you with. If you are clustered with low-quality sites in the same niche, the cluster-level demotion applies to you regardless of your individual signals. This may explain why entire niches get hammered in updates while others are untouched, and why differentiating from the cluster (through novel data, better entity coverage, or superior technical implementation) is a competitive necessity.