Selection Rate Optimization (SRO) is a new discipline focused on visibility in AI-powered search by measuring how often content is selected for grounding.
Traditional search engine optimization is changing. It is no longer enough to rank number one on a search page. In the era of artificial intelligence, a new discipline has emerged called Selection Rate Optimization, or SRO.
When you ask an AI assistant a question, it does not just show your website. It retrieves a few sources and extracts specific sentences to build its answer. This process is called grounding. If the AI retrieves your page but does not actually select and use your sentences, your brand remains invisible.
DEJAN's analysis of Gemini grounding data indicates that Google uses exact, word-for-word sentences from your pages. But there is a catch. For any given query, the AI operates on a strict budget of about two thousand words, shared across all sources. The higher you rank, the more of that budget you get. However, on average, only about a third of a page's content actually survives this filter.
This means that long, wordy pages are no longer effective. Instead, content density is everything. A tight, eight-hundred-word article can see over half its content used, while a massive four-thousand-word guide sees almost all of its text discarded.
To win in AI search, front-load your most important facts. Write clear, self-contained sentences that can stand alone as direct answers, and build a strong brand reputation that the AI trusts.
SRO (Selection Rate Optimization) is a term coined by DEJAN for a new discipline that addresses visibility in AI-powered search (Google AI Mode, Gemini Chat, AI Overviews). It is the AI-native successor to traditional SEO click-through-rate optimization.
The core premise: ranking #1 in traditional search is necessary but no longer sufficient. In AI search, your page content goes through a grounding pipeline that extracts only select sentences to feed to the generative model. If your content isn’t selected and grounded, you’re invisible — even if you rank.
Selection Rate (SR) measures how often an AI system selects and incorporates a specific source from the total set of grounding results it retrieves.
SR = (Number of selections / Total available results) × 100
SR is the Gen AI equivalent of CTR. Unlike CTR, which requires a user click, SR captures the AI’s implicit selection behavior — what information actually influences outputs versus what gets retrieved but ignored.
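As a minimal sketch of the SR formula above, the calculation can be written as a small Python function. The function name and the sample counts are illustrative assumptions, not figures from DEJAN's data.

```python
def selection_rate(selections: int, total_results: int) -> float:
    """Return SR as a percentage: (selections / total available results) * 100."""
    if total_results <= 0:
        raise ValueError("total_results must be positive")
    if not 0 <= selections <= total_results:
        raise ValueError("selections must be between 0 and total_results")
    return selections / total_results * 100


# Hypothetical counts: a source appeared in 40 grounding result sets
# and was actually selected in 12 of them.
print(f"SR = {selection_rate(12, 40):.1f}%")  # prints "SR = 30.0%"
```

Tracking the two counts separately, rather than only the ratio, makes it possible to tell a source that is rarely retrieved from one that is retrieved often but rarely used.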
DEJAN reverse-engineered Google’s Gemini grounding pipeline by examining raw groundingSupports and groundingChunks from the API. The pipeline operates in this sequence:
Key insight: Because snippets are query-dependent, the same page yields different extractions for different fanout queries.
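To make the idea of examining groundingChunks and groundingSupports concrete, here is a rough Python sketch that counts how many answer segments cite each retrieved source. The payload is hand-written for illustration, and the nested field names (`web`, `uri`, `segment`, `groundingChunkIndices`) are assumptions about the response shape that should be checked against the current API reference.

```python
from collections import Counter

# Hand-written stand-in for the grounding metadata of one response.
# Real responses will differ; this only mirrors the assumed shape.
response_metadata = {
    "groundingChunks": [
        {"web": {"uri": "https://example.com/a", "title": "Page A"}},
        {"web": {"uri": "https://example.com/b", "title": "Page B"}},
        {"web": {"uri": "https://example.com/c", "title": "Page C"}},
    ],
    "groundingSupports": [
        {"segment": {"text": "First grounded sentence."},
         "groundingChunkIndices": [0]},
        {"segment": {"text": "Second grounded sentence."},
         "groundingChunkIndices": [0, 1]},
    ],
}


def support_counts(metadata: dict) -> dict[str, int]:
    """Map each retrieved source URI to the number of supports citing it."""
    counts = Counter()
    for support in metadata.get("groundingSupports", []):
        for idx in support.get("groundingChunkIndices", []):
            counts[idx] += 1
    chunks = metadata.get("groundingChunks", [])
    return {chunk["web"]["uri"]: counts.get(i, 0) for i, chunk in enumerate(chunks)}


counts = support_counts(response_metadata)
selected = sum(1 for n in counts.values() if n > 0)
print(counts)
print(f"{selected} of {len(counts)} retrieved sources were used")
```

In this made-up payload, Page C is retrieved but never cited, which is exactly the "retrieved but ignored" case that SR is meant to expose.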
Google uses extractive (not abstractive) summarization for grounding. This means it pulls exact sentences from your page — it does not rewrite or paraphrase your content for the grounding context.
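The difference between extractive and abstractive selection is easier to see in code. The sketch below is a toy extractive selector in Python: it scores each sentence by word overlap with the query and returns the best ones verbatim, within a word budget. The overlap scoring, the sentence splitter, and all names are assumptions chosen for illustration; this is not Google's scoring method.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract(page_text: str, query: str, word_budget: int) -> list[str]:
    """Return verbatim sentences, best query overlap first, within the budget."""
    query_tokens = tokens(query)
    sentences = split_sentences(page_text)
    scores = [len(query_tokens & tokens(s)) for s in sentences]
    # Highest score first; ties keep the earlier sentence first.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if scores[i] > 0 and used + length <= word_budget:
            chosen.append(i)
            used += length
    # Emit in page order; sentences are copied, never rewritten.
    return [sentences[i] for i in sorted(chosen)]


page = (
    "Our shop opened in 2010. Custom cycling jerseys ship in ten days. "
    "We also sell socks. Jerseys can be customized with team logos."
)
result = extract(page, "custom cycling jerseys", word_budget=10)
print(result)
```

With a 10-word budget only the strongest sentence fits, and it comes back exactly as written on the page. An abstractive system would instead generate new wording.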
¶ markers are treated as sentences and scored alongside prose.
DEJAN successfully fine-tuned microsoft/deberta-v3-large to produce results similar to Google’s extraction behavior.
A pivotal finding from analysis of 7,060 queries with 2,275 tokenized pages and 883,262 total snippets:
Each query operates under a fixed grounding budget of approximately 2,000 words total, distributed across sources by relevance rank.
| Percentile | Total Words Per Query |
|---|---|
| p25 | 1,546 |
| p50 (median) | 1,929 |
| p75 | 2,325 |
| p95 | 2,798 |
This budget is remarkably consistent regardless of the number of sources used or the length of individual pages. The average grounding chunk is ~15.5 words.
The fixed budget is divided among sources based on relevance ranking:
| Rank | Median Words | Share of Total |
|---|---|---|
| #1 | 531 | 28% |
| #2 | 433 | 23% |
| #3 | 378 | 20% |
| #4 | 330 | 17% |
| #5 | 266 | 13% |
The #1 source gets 2× the grounding of the #5 source. You’re competing for share of a fixed pie, not expanding it.
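The rank figures above can be sanity-checked with a few lines of Python. This sketch recomputes each rank's share from the published median word counts and compares it with the published share. The variable names are illustrative, and shares derived from medians are only an approximation of however the original shares were calculated.

```python
# Median grounding words and published shares per rank, copied from the figures above.
median_words_by_rank = {1: 531, 2: 433, 3: 378, 4: 330, 5: 266}
published_share_pct = {1: 28, 2: 23, 3: 20, 4: 17, 5: 13}

total_words = sum(median_words_by_rank.values())  # 1938, close to the ~2,000-word budget

recomputed_share_pct = {
    rank: words / total_words * 100 for rank, words in median_words_by_rank.items()
}

for rank, share in recomputed_share_pct.items():
    print(f"#{rank}: recomputed {share:.1f}% vs published {published_share_pct[rank]}%")

top_vs_fifth = median_words_by_rank[1] / median_words_by_rank[5]
print(f"#1 receives {top_vs_fifth:.2f}x the words of #5")
```

The recomputed shares land within about one percentage point of the published ones, and the five medians sum to 1,938 words, which sits close to the 1,929-word median total per query.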
On average, only about one-third of a page’s content makes it through the AI search filter into the grounding context. But this varies dramatically by page length:
| Page Length | Avg Grounding Words | Coverage |
|---|---|---|
| <1K words | 370 | 61% |
| 1–2K words | 492 | 35% |
| 2–3K words | 532 | 22% |
| 3K+ words | 544 | 13% |
Grounding plateaus at ~540 words / ~3,500 characters. Pages over 2,000 words see sharply diminishing returns — more content dilutes your coverage percentage without increasing what gets selected.
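A back-of-envelope check in Python shows how the plateau plays out. Dividing average grounding words by coverage gives a rough implied page length for each bucket. This is an assumption-laden estimate, because a ratio of averages is not the same as an average of per-page ratios. The names below are illustrative.

```python
# (label, average grounding words, coverage) copied from the figures above.
buckets = [
    ("<1K words", 370, 0.61),
    ("1-2K words", 492, 0.35),
    ("2-3K words", 532, 0.22),
    ("3K+ words", 544, 0.13),
]


def implied_page_length(grounded_words: float, coverage: float) -> float:
    """Rough page length implied by grounded words and coverage fraction."""
    return grounded_words / coverage


implied = {label: implied_page_length(g, c) for label, g, c in buckets}
for label, length in implied.items():
    print(f"{label}: roughly {length:.0f} words per page")

# Compare the shortest and longest buckets.
grounding_growth = buckets[-1][1] / buckets[0][1]
length_growth = implied["3K+ words"] / implied["<1K words"]
print(f"Page length grows about {length_growth:.1f}x, grounded words about {grounding_growth:.2f}x")
```

Under these assumptions the longest pages are nearly seven times the length of the shortest ones but gain less than 50% more grounded words, which is the diminishing return described above.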
Based on DEJAN’s annotated analysis of actual grounding extractions:
The primary bias affecting SR is the model’s internal relevance perception of the grounding entity (brand, site, source). This is essentially the model’s pre-existing “worldview” about how relevant a source is for a given topic — formed during training and fine-tuning.
If a brand is perceived as highly relevant for a topic (e.g., “custom cycling jerseys”), it’s much more likely to achieve a higher SR when supplied as a grounding source. A brand with low primary bias for that topic will be deprioritized even if it appears in the result set.
DEJAN developed a “Tree Walker” algorithm that walks the probability paths of what a model wants to say about a brand, identifying high-uncertainty spots — token positions where the model is least confident about associating a concept with the brand. These represent opportunities for brand-association strengthening.
Analysis of 158 grounding responses revealed a power-law relationship between snippet count and snippet length:
$$\bar{L} = 1283.15 \times N^{-0.07}$$
Writing the fit as $\bar{L} = a N^{-\beta}$, the exponent β ≈ 0.07 shows a weak but consistent compression effect: as more snippets are added, average snippet length decreases slightly. The system emphasizes coverage over brevity and compresses only mildly, which suggests balanced aggregation rather than aggressive summarization.
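To get a feel for how weak this compression is, the fitted curve can be evaluated directly. The Python sketch below uses the two constants from the formula above. The function name is illustrative, and the length unit is whatever the original fit used, which the text does not state.

```python
A = 1283.15   # scale constant from the fitted formula
BETA = 0.07   # decay exponent from the fitted formula


def avg_snippet_length(snippet_count: int) -> float:
    """Average snippet length predicted by the power-law fit for N snippets."""
    if snippet_count < 1:
        raise ValueError("snippet_count must be at least 1")
    return A * snippet_count ** -BETA


for n in (1, 2, 5, 10, 20):
    print(f"N={n:>2}: predicted average length {avg_snippet_length(n):.1f}")

# Each doubling of N multiplies the average length by 2 ** -BETA.
shrink_per_doubling = 1 - 2 ** -BETA
print(f"Each doubling of N shortens the average snippet by about {shrink_per_doubling:.1%}")
```

A doubling of the snippet count trims average length by under 5%, so the fit describes a gentle squeeze rather than a hard per-snippet cap.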
Total text volume remains relatively stable across responses, implying word-limit constraints operate at the response level rather than per snippet.
A tight 800-word page can get 50%+ of its content grounded. A 4,000-word page gets ~13%. Focus on making every sentence count rather than adding volume.
The lead/positional bias means sentences appearing early and standalone are much more likely to be extracted. Put your most important, query-relevant statements at the top.
Clear, factual, self-contained statements perform best. Each sentence should be able to stand alone as a useful answer fragment. Avoid sentences that rely heavily on surrounding context to make sense.
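One rough way to audit a draft for self-contained sentences is to flag any sentence that opens with a word pointing back at earlier text. The Python sketch below does this with a small, hand-picked opener list. Both the list and the heuristic are assumptions for illustration and will produce false positives and misses.

```python
import re

# Hypothetical list of openers that usually lean on a previous sentence.
DANGLING_OPENERS = {
    "this", "that", "these", "those", "it", "they",
    "however", "also", "therefore", "such",
}


def flag_context_dependent(text: str) -> list[str]:
    """Return sentences whose first word suggests they rely on prior context."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for sentence in sentences:
        first_word = re.match(r"[A-Za-z']+", sentence)
        if first_word and first_word.group(0).lower() in DANGLING_OPENERS:
            flagged.append(sentence)
    return flagged


draft = (
    "Selection Rate measures how often a source is selected. "
    "It is the Gen AI equivalent of CTR. "
    "However, results vary by page length."
)
flagged = flag_context_dependent(draft)
for sentence in flagged:
    print("Needs its subject restated:", sentence)
```

Flagged sentences are candidates for rewriting with an explicit subject, for example "Selection Rate is the Gen AI equivalent of CTR", so that each one still makes sense if extracted alone.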
The AI decomposes prompts into sub-queries. Structure content to directly address multiple facets of intent, not just the primary keyword.
Google’s system ingests ToC entries, headers, and navigation artifacts as “sentences.” Clean, well-structured pages reduce noise competing with your actual content for selection.
Dan Petrovic suggested testing “small modular content pieces that can be assembled into different content units like lego blocks” — controlling completeness of context and avoiding undesirable narrative fragmentation.
Primary bias is the biggest lever on SR and it’s rooted in model training data. Invest in the traditional off-page and on-page signals that shape how models perceive your brand’s topical authority.