Research Synthesis

Topical Authority: From SEO Folklore to Confirmed Signal

For years, “topical authority” occupied the same space as “domain authority” in SEO discourse: a concept everyone referenced, nobody could define precisely, and skeptics dismissed as a correlation artifact. Then Google's internal API documentation leaked. What emerged was not one score but seven overlapping signals that quantify how deeply a domain covers a subject.

By Aviel Fahl · March 2026 · ~12 min read

Key Findings

Google's 2024 API leak confirmed topical authority as a real, multi-signal system. siteFocusScore, siteRadius, and site-level topic embeddings all quantify how deeply a domain covers a subject. Text relevance is the single strongest ranking factor at 0.47 correlation across 16,298 keywords. Fan-out query coverage has a 0.77 Spearman correlation with AI citation likelihood across 36 million AI Overviews. The system is not one toggleable score; it is the combined output of topic embeddings, NsrChunks, ClusterUplift, and pairwise quality comparisons. Content depth and entity coverage shift the vector. Volume alone does not.


0.47: Text relevance correlation (strongest factor)
0.77: Fan-out coverage ↔ AI citation (Spearman)
57%: Faster visibility for high-authority content
7: Confirmed topical authority signals in API

The API Leak Ended the Debate


In May 2024, an automated bot uploaded thousands of pages of internal Google Search API documentation to GitHub. The Content API Warehouse leak revealed 2,596 modules containing 14,014 attributes across 2,500+ pages — the most comprehensive view of Google's ranking infrastructure ever disclosed. Among those attributes: siteFocusScore, siteRadius, and site2vecEmbeddingEncoded.

iPullRank's analysis and Hobo Web's breakdown identified what these signals do. siteFocusScore quantifies how dedicated a site is to a single topic — specialist vs. generalist. siteRadius measures how much an individual page deviates from the site's central theme. And site2vecEmbeddingEncoded is a compressed vector embedding of the site's overall theme, with pageEmbeddings measuring each page against that site-level vector.

This was not inference. These are named attributes in production API documentation that Google acknowledged as authentic, even while cautioning the data was “out-of-context, outdated, or incomplete”. The cautionary framing is technically fair — we cannot determine active weights or deployment status. But the architecture exists. The signals exist. The debate about whether Google measures topical coherence at the site level is over.
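The leak names these attributes but does not document how they are computed. As a rough intuition only, the sketch below assumes a site-level vector is the mean of its page vectors, treats "radius" as cosine distance from that mean, and treats "focus" as one minus the average radius. The function names, the formulas, and the three-dimensional vectors are all illustrative assumptions, not Google's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def site_centroid(page_vectors):
    """Toy stand-in for a site-level embedding: the mean of the page vectors."""
    n = len(page_vectors)
    return [sum(col) / n for col in zip(*page_vectors)]

def page_radius(page_vector, centroid):
    """Toy 'radius': cosine distance of one page from the site centroid."""
    return 1.0 - cosine(page_vector, centroid)

def focus_score(page_vectors):
    """Toy 'focus': 1 minus the average page radius (higher means tighter)."""
    centroid = site_centroid(page_vectors)
    radii = [page_radius(p, centroid) for p in page_vectors]
    return 1.0 - sum(radii) / len(radii)

# Hypothetical page embeddings; real embeddings have far more dimensions.
specialist = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.85, 0.15, 0.0]]
generalist = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]

specialist_focus = focus_score(specialist)
generalist_focus = focus_score(generalist)
print(f"specialist focus: {specialist_focus:.3f}")
print(f"generalist focus: {generalist_focus:.3f}")
```

Under these assumptions the specialist site scores close to 1 because its pages point in nearly the same direction, while the generalist site scores noticeably lower.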

What changed

Before the leak, skeptics like Kevin Indig argued that topical authority was an “SEO ghost concept” — a narrative practitioners imposed on correlation data. After the leak confirmed siteFocusScore and siteRadius, Indig reversed his position publicly, proposing Topic Share as the primary operational metric for tracking topical authority over time.

Not One Signal — A Multi-Signal System


There is no single “topical authority” score. The effect SEOs observe comes from at least four overlapping systems, each operating at a different level of granularity.

QualityAuthorityTopicEmbeddings positions a site in mathematical vector space relative to every other site. These propagate all the way to SuperRoot — the final ranking layer. Similar embeddings mean topically related; distant embeddings mean unrelated. This is how Google knows a personal finance blog and a bank's advice section are covering the same territory, even if they share zero backlinks.

NSR (Normalized Site Rank) is a 63+ field site-level quality scoring system. Within it, NsrChunks breaks the site into topical sections and evaluates each independently. A blog section can have a completely different NSR chunk score than product pages on the same domain. ClusterUplift groups sites with similar sites and applies collective boosts or demotions to the entire cluster. If the cluster has a quality problem, every site in it gets demoted — even clean ones.

PairwiseQ comparisons favor sites with deeper topical coverage when matched head-to-head against competitors. And NLP entity coverage — the breadth and depth of entity recognition within content — feeds Google's Entity-Based Ranking patent (US10235423), which assigns composite scores from knowledge graph metrics weighted by entity type.

Source: Google API Leak (May 2024), iPullRank / Hobo Web analysis
Signal | What It Measures | Level
siteFocusScore | How dedicated a site is to a single topic | Site
siteRadius | How much a page deviates from the site's central theme | Page → Site
site2vecEmbeddingEncoded | Compressed vector embedding of a site's overall theme | Site
pageEmbeddings | Per-page vectors compared against site embeddings | Page
QualityAuthorityTopicEmbeddings | Multi-dimensional vector positioning site relative to all others | Site
NsrChunks | Independent quality evaluation per topical section of a site | Section
ClusterUplift | Collective quality boosts/demotions applied to similar-site clusters | Cluster

When SEOs say “I tested topical authority and it didn't work,” they tested one dimension of a multi-dimensional system. Publishing 50 thin articles does not move the topic embedding vector. Twenty deeply comprehensive, entity-rich, properly interlinked articles will. Depth and entity coverage shift the vector; volume alone does not.

Signal versioning

All of these signals are versioned — Google runs live experiments with different weightings simultaneously. A site can literally rank differently under different experimental versions of the same signals. This is another source of ranking fluctuation with no obvious external cause.

Three Site Archetypes


Hobo Web's analysis of the API leak data identified three archetypes based on how siteFocusScore and siteRadius interact:

Source: Hobo Web, API Leak Analysis (2024)
Archetype | Description | Implication
Perfect Topicality | Every page has low siteRadius: tight coherence around a single theme | Maximum siteFocusScore. The specialist advantage.
High Focus with Topical Drift | Strong core topic, but outlier pages raise siteRadius | Pruning or improving off-topic pages strengthens calculated authority.
Generalist with Niche Core | Expertise diluted by tangential content coverage | The blog-around-everything pattern. Depth obscured by breadth.

The practical implication is that pruning or improving off-topic content directly strengthens calculated authority. A site does not need to add more content about its core topic if it already has depth — it may need to remove content that dilutes the signal. This aligns with the Panda patent (US9031929), which evaluates quality at the site level: thin or irrelevant pages drag down the entire domain's score.
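The pruning logic can be illustrated with a small sketch. It assumes each page already has a radius value between 0 and 1 (hypothetical numbers, since siteRadius is not exposed to practitioners), and the thresholds `tight`, `outlier`, and `drift_share` are invented purely to separate the three archetypes.

```python
def classify_site(radii, tight=0.15, outlier=0.5, drift_share=0.3):
    """Map per-page radius values to one of the three archetypes.

    All thresholds are hypothetical and chosen only for illustration.
    """
    outliers = [r for r in radii if r > outlier]
    if all(r <= tight for r in radii):
        return "Perfect Topicality"
    if len(outliers) / len(radii) <= drift_share:
        return "High Focus with Topical Drift"
    return "Generalist with Niche Core"

def prune(pages, outlier=0.5):
    """Keep only pages whose radius is at or below the outlier threshold."""
    return {url: r for url, r in pages.items() if r <= outlier}

# Hypothetical URLs and radius values.
pages = {
    "/guides/mortgage-rates": 0.05,
    "/guides/refinancing": 0.08,
    "/guides/closing-costs": 0.10,
    "/blog/office-party-photos": 0.80,
}

before = classify_site(list(pages.values()))
kept = prune(pages)
after = classify_site(list(kept.values()))
print(before, "->", after)
```

In this toy example, removing the single off-topic page moves the site from the drift archetype to perfect topicality without adding any new content.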

Text Relevance Dominates Rankings


The Semrush 2024 Ranking Factors Study analyzed 16,298 English keywords across the top 20 positions, evaluating 65 factors. Text relevance, meaning how closely a page's content matches the query, showed the strongest correlation with rankings at 0.47, well ahead of the next-strongest factor in the table below (URL organic traffic at 0.33).

Source: Semrush (16,298 keywords), Surfer SEO (260K SERPs), Ahrefs (5 tools)
Factor | Correlation | Source
Text relevance | 0.47 | Semrush (16,298 keywords)
URL organic traffic | 0.33 | Semrush (16,298 keywords)
Domain authority | 0.21 | Semrush (16,298 keywords)
Content quality score | 0.17 | Semrush (16,298 keywords)
Content comprehensiveness | 0.17 | Surfer SEO (260K SERPs)
Content tool score → ranking | Weak | Ahrefs (5 tools tested)
[Chart: Ranking factor correlations]

Pages ranking for one keyword have significantly better odds of ranking for related keywords, supporting the topical cluster thesis. But there is an important nuance: content comprehensiveness alone shows only a 0.17 correlation with rankings (Surfer SEO, 260K SERPs). And when Ahrefs tested five content optimization tools (Surfer, Frase, NeuronWriter, Clearscope, AI Content Helper), they found weak correlations across the board. Content tool scores do not strongly predict rankings on their own.

The distinction matters: text relevance (are you writing about the thing the user searched for?) is strong. Content comprehensiveness (did you cover every subtopic?) is necessary but not sufficient. Topical authority appears to operate as a multiplier on relevance — deep topical coverage increases the probability that any individual page achieves high text relevance for its target query.

A separate Surfer SEO / WLDM study (~260,000 SERPs) found that page-level topical authority was the largest on-page ranking factor — stronger than domain monthly traffic volume. This distinguishes page-level from domain-level signals: a highly relevant page on a lower-authority domain can outperform an irrelevant page on a stronger domain.

Authority Accelerates Visibility


A Graphite study tracking 332 URLs published across 12 domains in June–July 2023 found that content published on domains with high topical authority gains visibility 57% faster, is 62% more likely to get traffic within the first week, and reaches impression milestones 30% faster.

Source: Multiple sources (2024–2025)
Metric | Result | Source | Confidence
Visibility speed (high TA vs. low) | 57% faster | Graphite (332 URLs) | Moderate
First-week traffic likelihood | 62% more likely | Graphite (332 URLs) | Moderate
Impression milestone speed | 30% faster | Graphite (332 URLs) | Moderate
Niche Expertise algorithm weight | ~13% | First Page Sage (2025) | Moderate
Topic cluster traffic increase (avg) | +43% | HubSpot (2024) | Moderate (self-reported)
Fan-out coverage ↔ AI citation | 0.77 Spearman | Surfer SEO (36M AIOs) | High
[Chart: Topical authority accelerates visibility]

The sample is small (332 URLs), but the controlled methodology and consistent direction across metrics give it moderate confidence. First Page Sage's 2025 ranking factor analysis weights Niche Expertise — defined as having 10+ authoritative pages around the same hub keyword — at approximately 13% of the ranking algorithm, the fourth-highest factor. They introduce the concept of “Net DR”: a DR 40 domain outranking a DR 70 domain nearly always has substantially higher Niche Expertise.

This is the quantitative evidence for what the clinical diagnostic framework calls the evidence-builder loop: authority priors built on earlier wins compound returns on subsequent content. Sequencing matters. Publishing into topics where you already have depth yields faster ROI than scattershot coverage across new topics.

Practitioner note

NavBoost operates on a rolling 13-month window of click data, segmented per topic. Topical authority compounds behavioral signals within a topic cluster — each new page benefits from the accumulated click history of existing pages in the same cluster. New domains face a structural disadvantage: no click history means no NavBoost signal, regardless of content quality.
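To make the rolling-window idea concrete, here is a minimal sketch that sums clicks per topic over the most recent 13 months. The record layout, the monthly granularity, and the function names are assumptions for illustration; how NavBoost actually stores or weights click data is not public.

```python
from collections import defaultdict
from datetime import date

WINDOW_MONTHS = 13  # rolling window length described in the text

def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def clicks_in_window(click_log, today):
    """Sum clicks per topic for records inside the rolling window.

    click_log: iterable of (month_start_date, topic, clicks) tuples.
    """
    totals = defaultdict(int)
    for month, topic, clicks in click_log:
        age = months_between(month, today)
        if 0 <= age < WINDOW_MONTHS:
            totals[topic] += clicks
    return dict(totals)

# Hypothetical click history for an established domain.
log = [
    (date(2024, 1, 1), "mortgages", 500),  # outside the window, ignored
    (date(2025, 3, 1), "mortgages", 120),
    (date(2026, 2, 1), "mortgages", 300),
    (date(2026, 2, 1), "insurance", 40),
]

totals = clicks_in_window(log, today=date(2026, 3, 1))
new_domain_totals = clicks_in_window([], today=date(2026, 3, 1))
print(totals)
print(new_domain_totals)
```

The empty result for a domain with no history mirrors the structural disadvantage described above: there is nothing in the window to aggregate, whatever the content quality.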

Fan-Out Coverage and AI Citation


The strongest empirical link between topical coverage and AI visibility comes from Surfer SEO's AI Citation Report (2025), analyzing 36 million AI Overviews and 46 million citations. Pages ranking for fan-out queries — the sub-queries AI systems generate when decomposing a user's question — are 161% more likely to be cited in AI responses, with a Spearman correlation of 0.77 between fan-out query coverage and citation likelihood.

This is where topical authority and AI citation mechanics converge. When Google's AI systems decompose a query into sub-queries (averaging 10.7 per prompt in Gemini 3), content addressing multiple facets of a topic captures more of those sub-queries. Topical depth creates fan-out coverage as a structural byproduct.

But only 27% of fan-out sub-queries are stable across repeated searches. The implication: you cannot optimize for specific sub-queries. You optimize for topical coverage, and fan-out capture follows.
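Both quantities in this section can be expressed as simple set ratios. The sketch below assumes you have logged the sub-queries generated across repeated runs of the same prompt and know which queries your site ranks for; the queries and function names are hypothetical, and this is not how Surfer SEO computed its figures.

```python
def fanout_coverage(sub_queries, ranking_queries):
    """Share of fan-out sub-queries for which the site has a ranking page."""
    sub = set(sub_queries)
    if not sub:
        return 0.0
    return len(sub & set(ranking_queries)) / len(sub)

def stable_share(runs):
    """Share of all observed sub-queries that appear in every run."""
    if not runs:
        return 0.0
    sets = [set(r) for r in runs]
    seen = set.union(*sets)
    stable = set.intersection(*sets)
    return len(stable) / len(seen) if seen else 0.0

# Hypothetical sub-queries from two runs of the same prompt.
runs = [
    ["best crm for startups", "crm pricing", "crm integrations", "crm vs spreadsheet"],
    ["best crm for startups", "crm pricing", "free crm tools", "crm onboarding time"],
]
site_ranks_for = {"best crm for startups", "crm pricing", "crm integrations"}

coverage = fanout_coverage(runs[0], site_ranks_for)
stability = stable_share(runs)
print(f"coverage of run 1: {coverage:.2f}")
print(f"stable share across runs: {stability:.2f}")
```

Because the stable share is low, chasing the exact sub-queries from any single run is a moving target, whereas broad topical coverage raises expected coverage across runs.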

Source: Surfer SEO, Digital Bloom, iPullRank (2025)
Metric | Value | Source
Fan-out coverage → AI citation likelihood | 161% more likely | Surfer SEO (36M AIOs, 46M citations)
Fan-out sub-query stability | Only 27% stable | Surfer SEO (36M AIOs)
Brand search volume → AI citation | 0.334 correlation | Digital Bloom (680M+ citations)
Entity-rich vs. keyword-optimized content | 267% more AI citations | iPullRank
Entity ID matching (Wikidata Q-IDs) | 8.9x citation increase | iPullRank
Multi-platform presence (4+ channels) | 2.8x more AI mentions | Digital Bloom (680M+ citations)
[Chart: AI citation predictors]

The Digital Bloom 2025 AI Visibility Report (680M+ citations) adds a related finding: within that dataset, brand search volume, not backlinks, is the strongest predictor of AI citations at 0.334 correlation. Brands present on four or more platforms are 2.8x more likely to appear in ChatGPT responses. This extends the topical authority thesis into AI retrieval: it is not just about what you write, but whether the AI system recognizes you as an entity worth citing on that topic.

iPullRank's research connects this to entity mechanics directly: entity-rich content achieves 267% more AI citations compared to keyword-optimized content, and entity ID matching (Wikidata Q-IDs, Knowledge Graph MIDs) produces an 8.9x citation increase. The correlation between knowledge graph alignment and AI visibility is 89%.

Information Gain: Authority's Complement


Google's Information Gain patent (US20200349181A1, filed 2018, granted June 2024) describes a system that scores documents on how much novel content they contain beyond what was already presented to the user. Documents scoring near zero can be demoted or excluded entirely.

Information gain is distinct from topical authority. Authority is credibility through consistent expertise — a track record of deep, accurate coverage within a topic. Information gain is the uniqueness of a specific contribution. They are complementary, not competing signals.

Clearscope defines information gain operationally as “concepts and entities on the fringe of Google's Knowledge Graph for the topic” — created through survey data, case studies, interviews, personal stories, and novel perspectives. A site with high topical authority that publishes the same synthesis as everyone else scores low on information gain. A site with moderate authority that publishes original research or proprietary data scores high.
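A deliberately simplified way to reason about this is to treat information gain as the share of a document's entities that the reader has not already encountered. The patent describes a learned scoring model, so the set-difference formula, the entity lists, and the function name below are illustrative assumptions only.

```python
def information_gain(candidate_entities, seen_documents):
    """Fraction of the candidate's entities absent from every document already seen."""
    candidate = set(candidate_entities)
    if not candidate:
        return 0.0
    seen = set().union(*seen_documents)
    return len(candidate - seen) / len(candidate)

# Hypothetical entity sets for documents the user has already been shown.
seen = [
    {"topical authority", "siteFocusScore", "backlinks"},
    {"topical authority", "siteRadius"},
]

# A rehash covers only entities that already appeared elsewhere.
rehash = {"topical authority", "siteFocusScore", "siteRadius"}
# An original piece adds proprietary data and a case study.
original = {
    "topical authority",
    "siteFocusScore",
    "2025 publisher survey",
    "pruning case study",
}

rehash_gain = information_gain(rehash, seen)
original_gain = information_gain(original, seen)
print(rehash_gain, original_gain)
```

In this toy model the rehash scores zero regardless of how authoritative its publisher is, while the piece with original material scores higher.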

The strongest content strategy combines both: deep topical authority (to be recognized as credible on the subject) plus novel data or analysis (to be worth citing over competitors). This is why programmatic SEO architecture built on proprietary data has a structural advantage — every page template generates information gain automatically because the underlying data does not exist elsewhere.

The Overclustering Trap


Topical authority is not a license to publish everything tangentially related to your subject. Off-topic content actively dilutes the signal. The NsrChunks mechanism evaluates topical sections independently: a blog section full of off-topic content drags down the chunk score for the site's core topic, even if the off-topic content is individually high quality.

Keyword Insights documented a case where a large travel client narrowed property type pages from 413 to 85, reducing approximately 15 million URLs. Result: a 110% organic traffic increase almost immediately. The pruning removed pages that were splitting signals and diluting the domain's topical coherence.

But the reverse error is equally dangerous. Ahrefs identified 9,700 cases of “keyword cannibalization” and found that almost none needed fixing. In a sample of 80 keywords with multiple ranking pages, only 1 actually required consolidation. Multiple rankings create both cannibalization and diversification effects simultaneously.

Diagnostic approach

Not all keyword overlap is cannibalization, and over-consolidation can be as harmful as overclustering. Diagnose via Google Search Console (GSC): are multiple pages splitting impressions with declining CTR? If impressions are growing and CTR is stable, the multiple rankings are diversification, not cannibalization. Distinguish between the two before acting.
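The check described above can be sketched as a small decision rule. It assumes you have exported, for one query, the combined impressions and clicks of all ranking pages for two comparison periods; the dictionary layout, the 10% tolerance for a "stable" CTR, and the labels are hypothetical choices rather than an established standard.

```python
def ctr(period):
    """Click-through rate for a period dict with 'clicks' and 'impressions'."""
    if not period["impressions"]:
        return 0.0
    return period["clicks"] / period["impressions"]

def diagnose_overlap(previous, current, tolerance=0.10):
    """Label a multi-page query as cannibalization, diversification, or unclear.

    `tolerance` is the relative CTR drop still treated as stable (an assumption).
    """
    ctr_declining = ctr(current) < ctr(previous) * (1 - tolerance)
    impressions_growing = current["impressions"] > previous["impressions"]
    if ctr_declining:
        return "possible cannibalization"
    if impressions_growing:
        return "diversification"
    return "inconclusive"

# Hypothetical query-level totals summed across all ranking pages.
healthy = diagnose_overlap(
    {"impressions": 10_000, "clicks": 400},
    {"impressions": 12_000, "clicks": 470},
)
suspect = diagnose_overlap(
    {"impressions": 10_000, "clicks": 400},
    {"impressions": 10_500, "clicks": 250},
)
print(healthy)
print(suspect)
```

Only the second case, where CTR falls well below its earlier level, would justify investigating consolidation.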

If a site has broad content coverage but underperforms on core queries, the diagnostic methodology should check for topical dilution before assuming the core content itself is the problem. The binding constraint may be overclustering, not content quality.

Measuring It: Topic Share


The API leak tells us what Google measures internally. But practitioners cannot access siteFocusScore or NsrChunks directly. Kevin Indig's Topic Share framework fills the gap: the percentage of organic traffic a domain captures from all keywords within a defined topic, compared to competitors.

Measurement (via Ahrefs): identify the head entity or term (must have Knowledge Panel presence), extract matching keywords (10+ monthly search volume), upload the keyword set in Keyword Explorer, then pull “Traffic Share by Domains.” Topic Share = your domain's aggregate traffic percentage within that topic.

The metric composites rank position, search volume competitiveness, multi-keyword rankings, SERP feature capture, and snippet performance into a single number. It does not map to any single Google signal, but it reflects the outcome of the multi-signal system.
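Arithmetically, Topic Share is each domain's estimated traffic divided by the total estimated traffic across the topic's keyword set. The sketch below assumes a keyword-tool export flattened into (keyword, domain, estimated traffic) rows; the row layout, domains, and numbers are hypothetical and do not reproduce any specific Ahrefs report format.

```python
from collections import defaultdict

def topic_share(rows):
    """Return each domain's percentage of total estimated traffic for a topic.

    rows: iterable of (keyword, domain, estimated_monthly_traffic) tuples.
    """
    traffic = defaultdict(float)
    for _keyword, domain, estimated in rows:
        traffic[domain] += estimated
    total = sum(traffic.values())
    if not total:
        return {}
    return {domain: 100 * t / total for domain, t in traffic.items()}

# Hypothetical export for a small topic keyword set.
rows = [
    ("spend analysis software", "a.example", 300),
    ("spend analysis tools", "a.example", 100),
    ("spend analysis software", "b.example", 400),
    ("spend cube", "c.example", 200),
]

shares = topic_share(rows)
for domain, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{domain}: {pct:.1f}%")
```

Running the same calculation on the same keyword set each month gives a comparable trend line over time.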

Benchmarks from Indig's data: in ecommerce (29K+ keywords), Shopify holds 11% Topic Share and BigCommerce 10%. In spend analysis (142 keywords), Jaggaer holds 15% and Sievo 13%. Measure monthly; anything more frequent is noise.

Limitation: Topic Share is a proxy for competitive share, not a direct measurement of Google's internal authority signals. A site could have high Topic Share from brand searches alone. Pair it with entity coverage assessment and content depth analysis for a complete picture.

What This Means for Practitioners


The evidence converges on several actionable conclusions:

Depth beats breadth. Google's topic embeddings reward sites that go deep on a subject, not sites that go wide across many. Twenty deeply comprehensive, entity-rich articles will move the site2vec embedding vector more than fifty thin ones. The N-gram Quality Prediction patent (US9767157) detects thin content via phrase-frequency fingerprinting — the mechanism behind why volume without depth fails.

Pruning is a lever. Removing or noindexing off-topic content improves the siteFocusScore calculation. This is not hypothetical: the Keyword Insights case study showed a 110% traffic increase after narrowing property type pages from 413 to 85. IBM, Progressive, and DoorDash have all reported organic traffic gains post-pruning.

Sequencing matters. The evidence-builder loop is real: high topical authority content gains visibility 57% faster. NavBoost's per-topic click signals compound within topic clusters. Start with queries where you already have depth, accumulate authority priors, then extend into adjacent topics. The clinical framework's ceiling vs. weight distinction applies: topical authority is a ceiling issue, not a weight issue. No amount of link building overcomes weak topical coherence.

Entity coverage is the mechanism. Topical authority is not just about “writing about the topic” — it is about entity recognition depth. All relevant entities must be associated with detailed attributes, facts, and relationships. Koray Tugberk's semantic SEO methodology operationalizes this: match entity types between question and answer, reduce dependency hops, disambiguate entities, boost salience through co-occurring terms, and avoid unclear antecedents. His case studies show results without link building or brand power — GetWordly.com went from zero to 128,000 organic traffic in 123 days through semantic methodology alone.

Source: Oncrawl / Koray Tugberk (2023–2024)
Project | Result | Method
GetWordly.com | 0 → 128,000 organic traffic in 123 days | Semantic SEO methodology
Interingilizce.com | 10,000 → 200,000+ monthly in 5 months | Semantic SEO methodology
Unnamed client | 195% traffic, 340% clicks, 430% impressions in 2 months | Semantic SEO methodology

AI citation follows topical authority. Fan-out coverage (0.77 Spearman) is the strongest known predictor of AI citation. Entity-rich content gets 267% more AI citations. Knowledge graph alignment correlates at 89% with AI visibility. Building topical authority is simultaneously building AI citation infrastructure — the same depth that signals expertise to Google's traditional ranking systems also creates the coverage that AI systems need for extractive grounding.

Cluster quality drags affect innocent sites. ClusterUplift means your competitive position is partially determined by the quality of the sites Google groups you with. If you are clustered with low-quality sites in the same niche, the cluster-level demotion applies to you regardless of your individual signals. This explains why entire niches get hammered in updates while other niches are untouched — and why differentiating from the cluster (through novel data, better entity coverage, or superior technical implementation) is not just good practice but a competitive necessity.