---
okf_version: "0.1"
title: blog.dejan.ai
description: Articles and concepts on AI visibility and AI-centric SEO.
---

# Articles

- [Generative Self-Retrieval: How AI Models Recall Brand Facts From Memory](posts/generative-self-retrieval.md): When an AI answers about your brand from memory, generative self-retrieval decides whether it recalls you correctly or invents a plausible wrong answer.
- [Primary Bias](posts/primary-bias.md): Primary bias is what an AI model already believes about your brand before it searches: an ungrounded confidence baked into training that becomes the biggest factor in whether your content is selected in AI answers.
- [Grounding Snippets](posts/grounding-snippets.md): Grounding snippets are the short, extractive sentences AI systems pull from your page to build their answers: the atomic unit of AI-search visibility, where most of your page never makes the cut.
- [Selection Rate Optimization](posts/selection-rate-optimization.md): Selection Rate Optimization is the AI-search counterpart to click-through-rate optimization: improving how often AI systems choose your content as the source they ground their answers in and cite.
- [Relevance Engineering](posts/relevance-engineering.md): Relevance Engineering is the practice of building a page's semantic relevance to a query with embeddings and vector math, treating search visibility as an engineering problem rather than keyword optimization.
- [AIO (AI Optimization)](posts/aio.md): AIO (AI Optimization) is shaping your content and presence so AI systems favor them in generated responses — one route to AI Visibility.
- [GEO (Generative Engine Optimization)](posts/geo.md): GEO (Generative Engine Optimization) is optimizing for LLM-powered search and chat so your brand and pages appear in generated answers.
- [AEO (Answer Engine Optimization)](posts/aeo.md): AEO (Answer Engine Optimization) is optimizing to be the answer that AI assistants and answer engines give to a question.
- [AI SEO](posts/ai-seo.md): AI SEO is optimizing your content and brand so AI systems surface them when answering questions — the broad practice behind AI Visibility.
- [The Open Knowledge Format (OKF)](posts/the-open-knowledge-format-okf.md): Google Cloud's Open Knowledge Format is an open, vendor-neutral way to package the context AI systems need, as plain markdown files any model or agent can read.
- [AI Visibility](posts/ai-visibility.md): AI Visibility is the outcome of AI-centric SEO: being cited and recommended by AI systems through brand mentions and source citations in answers.
- [Teaching a Model to Reason Before It Learns to Talk](posts/teaching-a-model-to-reason-before-it-learns-to-talk.md): An exploration of building tiny, logic-first models using cellular automata to challenge the transformer paradigm and identify the primitives of reasoning.
- [How Search Grounding Biased an LLM Against YouTube](posts/how-search-grounding-biased-an-llm-against-youtube.md): An analysis of how Claude's webinar platform recommendations were influenced by affiliate-driven content, and a correction regarding YouTube's live features.
- [How AI Search Grounding Actually Works: Google vs OpenAI vs Anthropic](posts/grounding.md): An analysis of how Google, OpenAI, and Anthropic handle web grounding, comparing their search processes, citation rates, and how they process page content.
- [Emotion Geometry of Google’s AI Models](posts/emotions-gemma.md): A replication study of Anthropic’s emotion research on Google’s Gemma 4 31B model, finding that internal emotion representations organize along a valence axis.
- [Google’s (still) doesn’t see your live page.](posts/googles-still-doesnt-see-your-live-page.md)
- [Gemma 4 Brand Authority Map](posts/gemma-4-brand-authority-map.md): A comparison of brand recall between Google's Gemma 4 and Gemini 3 Flash models, analyzing how open-weight and closed models prioritize different brands.
- [Chrome’s New Shopping Classifier](posts/google-shopping-classifier.md): An analysis of Google's shopping classifier model in Chrome, detailing its content extraction pipeline, chunking logic, and impact on e-commerce SEO.
- [AI Brand Authority Index: Ranking 2.9 Million Brands by Associative Embeddedness in Gemini’s Memory](posts/brands.md): This research presents a methodology for quantifying brand authority in large language model memory using Personalized PageRank and directed association graphs.
- [TurboQuant: From Paper to Triton Kernel in One Session](posts/turboquant.md): An implementation and technical analysis of Google's TurboQuant algorithm, testing KV cache compression on Gemma 3 4B using PyTorch and custom Triton kernels.
- [Clickbait Titles Exploit Attention Through Latent Entities](posts/latent-entities.md): Clickbait titles function by withholding a latent entity—the subject, reason, process, or outcome—to force a click and resolve an artificial information gap.
- [Fanout Query Analysis](posts/fanout-query-analysis.md): An analysis of 365,920 fanout queries from Google, OpenAI, and Amazon reveals how different AI models generate internal search queries for web grounding.
- [Reverse Prompting: Reconstructing Prompts from AI-Generated Text](posts/reverse-prompting.md): A fine-tuned Gemma 3 270M model reconstructs the most likely prompts from AI-generated responses using synthetic data and contrastive search configurations.
- [Rufus – Under the Hood. What Drives Amazon’s AI Shopping Assistant?](posts/rufus.md): An overview of the technical architecture behind Amazon's Rufus, covering its query planning, RAG-based retrieval, custom LLMs, and streaming responses.
- [Is Query Length a Reliable Predictor of Search Volume?](posts/query-length-vs-volume.md): An analysis of 39.6 million Amazon search queries reveals that query length is an unreliable predictor of search volume compared to semantic content.
- [Search Grounding is Transient](posts/search-grounding-is-transient.md): Google’s AI search and Gemini use a single-turn transient architecture that purges raw web snippets from working memory immediately after a response is sent.
- [SRO & Grounding Snippets](posts/sro-grounding-snippets.md): Selection Rate Optimization (SRO) is a new discipline focused on visibility in AI-powered search by measuring how often content is selected for grounding.
- [What extraction method is Google using to build grounding snippets?](posts/what-extraction-method-is-google-using-to-build-grounding-snippets.md): An analysis of Google's Gemini grounding pipeline, examining how extractive summarization selects query-focused sentences to build grounding context from web sources.
- [Implicit Queries in AI Search](posts/implicit-queries-in-ai-search.md): An analysis of Google patent US11769017B1, detailing a system that uses context and implied input engines to proactively generate and push AI summaries.
- [Sorry Google, I was wrong.](posts/sorry-google-i-was-wrong.md): An analysis of a $2,000 Gemini API bill caused by the URL Context tool, which ingests entire web pages as input tokens without providing size estimates.
- [AI Search Has a Spam Problem](posts/ai-spam.md): This article examines GEO spam, a method of manipulating AI-generated answers through self-referential content and engineered claims designed for grounding.
- [WebMCP](posts/webmcp.md): WebMCP is a proposed web standard that allows websites to expose structured tools to AI agents via declarative and imperative APIs for better reliability.
- [Bias and Prejudice in AI Search](posts/bias-and-prejudice-in-ai-search.md): An exploration of primary bias in AI, defined as a model's inherent confidence in an entity based on training data, and its impact on brand selection rates.
- [Most People Don’t Read](posts/most-people-dont-read.md): A qualitative study comparing self-reported reading habits against actual user behavior, tracking mouse movements, scroll patterns, and time on page.
- [Google’s Trajectory: 2026 and Beyond](posts/googles-trajectory-2026-and-beyond.md): Google's shift toward agentic AI involves Gemini robotics, A2UI for secure interfaces, and the AP2 protocol for autonomous agent payments and commerce.
- [Google’s Ranking Signals](posts/googles-ranking-signals.md): Overview of search ranking factors including popularity signals, PCTR models, semantic relevance, keyword matching, freshness, and various search modes.
- [How big are Google’s grounding chunks?](posts/how-big-are-googles-grounding-chunks.md): Analysis of how Google selects content to ground Gemini-powered AI shows a fixed 2,000-word budget per query, where relevance rank determines word share.
- [Google’s AI Uses Schema?](posts/googles-ai-schema.md): An investigation into whether Google uses structured data to ground Gemini in AI search, exploring the relationship between LD+JSON and RAG grounding sources.
- [Dynamic Visual Layouts](posts/dvl.md): Dynamic visual layout (DVL) is a generative user interface where layouts are created on demand to suit specific queries, shifting the focus from SEO to information.
- [Grounding Snippet Extraction Tool](posts/grounding-snippet-extraction-tool.md): The Gemini Grounding Tool identifies which URLs and specific sentences Google's AI extracts to ground its answers, helping optimize content for AI search.
- [How Long Are Web Pages?](posts/how-long-are-web-pages.md): An analysis of 44,684 web pages reveals a median content length of 3,201 tokens and an average of 10,403 tokens, highlighting implications for AI systems.
- [Google AI Search Update: Completely New Grounding Format](posts/google-ai-search-update-completely-new-grounding-format.md): An observation of a new, custom grounding context format for Gemini that deviates from the traditional index-based model used in previous prompt types.
- [AI Mode, Content & Search Index](posts/ai-mode-content-search-index.md): Tests suggest Google’s AI Mode uses a proprietary content store rather than retrieving live web content from the search index during the query fan out process.
- [How user prompts shape your content visibility in AI search.](posts/how-user-prompts-shape-your-content-visibility-in-ai-search.md): An analysis of how AI search rankers use semantic alignment to surface different content zones within a single article based on query specificity and intent.
- [Report: How People Use AI at Work](posts/report-ai-workplace.md): An analysis of qualitative interviews with 1,250 professionals exploring how the general workforce, creatives, and scientists integrate AI into their work.
- [How do people use AI assistants?](posts/how-do-people-use-ai-assistants.md): An analysis of 3.9 million AI chat sessions reveals that most interactions are short, non-commercial, and involve users seeking help with writing, learning, or coding.
- [Ricursive: The Most Interesting AI Company You Haven’t Heard Of](posts/ricursive.md): Ricursive Intelligence, founded by Anna Goldie and Azalia Mirhoseini, aims to automate chip design using AI to enable recursive self-improvement in hardware.
- [Better Vector Clustering With Head Noun Extraction](posts/better-clustering.md): An exploration of how standard embeddings can create a semantic soup by grouping search queries by adjectives rather than head nouns during clustering.
- [Advanced Prompting Techniques for AI SEO](posts/advanced-prompting-techniques.md): Explore prompt engineering techniques for SEO, including zero-shot, few-shot, role, and chain-of-thought prompting to improve content and automate tasks.
- [To block or not to block? Bot is the question.](posts/ai-bots.md): An overview of AI bots, distinguishing between training data scrapers used for LLM development and agentic bots designed for autonomous, goal-oriented tasks.
- [Gemini 3 hallucinates fan-out queries](posts/gemini-3-hallucinates-fan-out-queries.md): An analysis of Gemini 3 API responses reveals the model fabricating search queries to justify its answers, demonstrating persistent hallucination behaviors.
- [AI SEO Deep Dive – Tom Critchlow & Dan Petrovic](posts/ai-seo-deep-dive-tom-critchlow-dan-petrovic.md): A deep-dive conversation with Tom Critchlow on the mechanics of AI search, focusing on Selection Rate Optimization (SRO) and how to influence LLM behavior.
- [OpenAI’s Sparse Circuits Breakthrough and What It Means for AI SEO](posts/openais-sparse-circuits-breakthrough-and-what-it-means-for-ai-seo.md): OpenAI research on sparse circuits shows AI models can be built with fewer connections, making them more interpretable and easier to analyze for AI SEO.
- [How GPT Sees the Web](posts/how-gpt-sees-the-web.md): A technical walkthrough of how GPT handles web search, including snippets, expansions, context size settings, and the sliding window mechanism for retrieval.
- [BlockRank: A Faster, Smarter Way to Rank Documents with LLMs](posts/blockrank.md): BlockRank is a novel method for in-context ranking that uses structured sparse attention and contrastive training to improve LLM efficiency and accuracy.
- [In AI SEO #10 is the new #1](posts/in-ai-seo-10-is-the-new-1.md): An empirical study analyzing how Google's AI Mode uses text snippets from multiple sources, finding that snippets are more prompt-aligned than full web pages.
- [How much of your content survives the AI Search filter?](posts/ai-search-filter.md): An analysis of the Google grounding process, detailing how user prompts and source snippets are processed by models and measuring citation coverage rates.
- [Browsing vs Content Fetcher](posts/browsing-vs-content-fetcher.md): Google's AI Mode uses browsing for single URL retrieval and content_fetcher for batch processing of multiple structured sources within a workflow.
- [From Free-Text to Likert Distributions: A Practical Guide to SSR for Purchase Intent](posts/purchase-intent.md): Semantic Similarity Rating (SSR) maps LLM free-text responses to Likert distributions to improve purchase intent realism and match human response patterns.
- [Claude System Internals](posts/claude-system-internals.md): An exploration of the internal processes of Claude, including system prompts, token budgets, search grounding algorithms, and hidden reasoning blocks.
- [CAPS: A Content Attribution Payment Scheme for the AI Era](posts/caps.md): The collapse of the web's economic model due to AI is addressed through the Content Attribution Payment Scheme, a framework for micropayments and grounding.
- [AI Search Citation Mining](posts/ai-search-citation-mining.md): Raw data dump from a citation mining pipeline demo featuring 60 prompts across AEO, AI marketing, AI optimization, AI SEO, and AIO using GPT-5 and Gemini.
- [Using GPT-5 Structured Output Markers to Detect AI-Generated Content Online](posts/ai-reveal.md): Publishing unedited AI-generated text can leak internal GPT-5 structured output markers like turn0search21, which can lead to SEO and reputational risks.
- [TimesFM-ICF](posts/timesfm-icf.md): Google Research's TimesFM-ICF uses in-context fine-tuning to achieve high-performance time-series forecasting without the need for traditional model training.
- [Chrome Screen AI Protos](posts/chrome-screen-ai-protos.md): A directory of protocol buffer files covering various machine intelligence technologies, including OCR, vision, face detection, and image classification.
- [RexBERT](posts/rexbert.md): RexBERT is a domain-specialized language model trained on e-commerce text to optimize product titles, descriptions, attribute extraction, and semantic search.
- [Annotated Page Content (APC)](posts/annotated-page-content-apc.md): Annotated Page Content (APC) is a structured protobuf representation of a webpage's layout and content, designed for actionable and efficient downstream use.
- [Deconstructing DomDistiller: How Chrome’s Reader Mode Algorithm Impacts Technical SEO](posts/deconstructing-domdistiller-how-chromes-reader-mode-algorithm-impacts-technical-seo.md): An analysis of Chrome's DomDistiller engine explains how it uses heuristics, DOM traversal, and semantic HTML to isolate main content from page boilerplate.
- [LLM is a Presentation Layer in AI Search](posts/llm-is-a-presentation-layer-in-ai-search.md): Large language models act as a presentation layer on top of classic information retrieval. They rely on crawling, indexing, and ranking to prevent hallucinations.
- [Gemini App Tools – A Technical Overview](posts/gemini-app-tools-a-technical-overview.md): Gemini acts as an orchestration layer that manages a large language model by deconstructing prompts into tasks for tools like Code Interpreter and APIs.
- [EmbeddingGemma: The Game-Changing Model Every SEO Professional Needs to Know](posts/embeddinggemma.md): Google's EmbeddingGemma is a multilingual embedding model that mirrors Gemini's architecture to provide insights into semantic search and query intent.
- [Primary Bias on Selection Rate in AI Search](posts/sr.md): Selection Rate measures how often AI systems select specific items from grounding results. The post explores primary bias, model relevance, and the Tree Walker algorithm.
- [The Latent History of AI Boom](posts/ai-boom.md): An exploration of how the transition from RNNs to transformers and the discovery of double descent enabled the scaling of large language models like GPT.
- [AI Overviews = Dialogflow Agent?](posts/ai-overviews-dialogflow-agent.md): An analysis of AI Overview leaks suggesting that Google's implementation may be based on the Dialogflow agentic framework, specifically regarding intent priority.
- [Fan-Out Query Search Volume Prediction Using Deep Learning](posts/fan-out-query-search-volume-prediction-using-deep-learning.md): A deep learning approach using a Query Demand Estimator to automatically predict search volume ranges for long-tail queries generated by a fan-out model.
- [Comprehensive Guide to Identifying AI Comment Bots](posts/comment-bots.md): Identify AI-generated comments through statistical analysis of sentiment, formulaic linguistic patterns, repetitive vocabulary, and a lack of human imperfection.
- [What is “Help Me Write” in Chrome?](posts/chrome-help-me-write.md): Help Me Write is Google Chrome's AI-powered assistant that generates context-aware text suggestions for short-form content like emails, posts, and forms.
- [Introducing Tree Walker](posts/tree-walker.md): Tree Walker is an analysis tool designed to deconstruct how AI models like Gemini perceive brands by uncovering word uncertainty and probabilistic language paths.
- [Does Schema Help With “AI”?](posts/does-schema-help-with-ai.md): An experiment testing whether OpenAI's browsing tool provides GPT-5 with grounding context from page schema or only extracts plain text and markdown content.
- [Your website is about to start talking. Are you ready for this?](posts/your-website-is-about-to-start-talking-are-you-ready-for-this.md): Explore how Chrome's built-in Gemini Nano model uses semantic HTML and the accessibility tree to enable private, on-device AI conversations on websites.
- [Inside Chrome’s Semantic Engine: A Technical Analysis of History Embeddings](posts/inside-chromes-semantic-engine-a-technical-analysis-of-history-embeddings.md): Technical analysis of Chrome's history embeddings system, detailing the DocumentChunker algorithm, passage extraction, and the 1540-dimensional vector pipeline.
- [What does an SEO do in the AI age?](posts/what-does-an-seo-do-in-the-ai-age.md): Modern search engines use a hybrid structure consisting of a strategic Agentic Layer for decision-making and an Interpretative Layer for generative synthesis.
- [Understanding and Control](posts/understanding-and-control.md): AI optimization relies on mechanistic interpretability to understand internal neural computations and model steering to actively control model behavior.
- [People call them AI. That’s it.](posts/people-call-them-ai-thats-it.md): Social media poll results from 864 votes show that while AI is the dominant label for tools like ChatGPT and Claude, users remain divided on preferred terms.
- [GPT-5 Made SEO Irreplaceable](posts/gpt-5-made-seo-irreplaceable.md): OpenAI is shifting its model design to prioritize reasoning and intelligence over memorized world knowledge, relying on tools and retrieval for information.
- [Google’s Query Fan-Out System – A Technical Overview](posts/googles-query-fan-out-system-a-technical-overview.md): This article describes a system that replicates Google's query fan-out approach by using generative neural networks to automatically create intelligent search variants.
- [GPT-5 System Prompt](posts/gpt-5-system-prompt.md): Here it is. Credit to https://x.com/elder_plinius/status/1953583554287562823 and H/T to https://x.com/DarwinSantosNYC for spotting it.
- [Journalism Is Dead. Say Hello to Gournalism.](posts/journalism-is-dead-say-hello-to-gournalism.md): Explores the rise of Gournalism, a shift toward generative, AI-produced content optimized for machine consumption and algorithmic indexing.
- [Human Friendly Content is AI Friendly Content](posts/human-friendly-content-is-ai-friendly-content.md): Explore the parallels between human and AI attention mechanisms and learn how to optimize content for both through scannable structures and hierarchy.
- [Analysis of Gemini Embed Task-Based Dimensionality Deltas](posts/analysis-of-gemini-embed-task-based-dimensionality-deltas.md): An analysis of Gemini Embed optimization modes, including classification, retrieval, and semantic similarity, through vector embedding dimension visualization.
- [Dynamic per-label thresholds for large-scale search query classification with Otsu’s method](posts/otsu.md): Explore how to use Otsu's algorithm to solve the problem of inconsistent confidence thresholds in search-query intent classifiers using dynamic, per-label tuning.
- [Prompt Engineer’s Guide to Gemini Schemas](posts/prompt-engineers-guide-to-gemini-schemas.md): A technical guide to the Gemini API GenerateContentResponse schema, detailing the structure of candidates, usage metadata, safety ratings, and parsed data.
- [Top 10 Most Recent Papers by MUVERA Authors](posts/top-10-most-recent-papers-by-muvera-authors.md): A collection of recent research papers and focus areas for MUVERA authors Laxman Dhulipala, Majid Hadian, Jason Lee, and Rajesh Jayaram.
- [Training Gemma‑3‑1B Embedding Model with LoRA](posts/gemma-embed.md): Gemma-Embed is a bespoke 256-dim embedding model created by fine-tuning google/gemma-3-1b-pt with LoRA to enable high-fidelity query reformulation.
- [Training a Query Fan-Out Model](posts/training-a-query-fan-out-model.md): Google generates high-quality query reformulations by traversing the mathematical latent space between queries and documents to train the qsT5 model.
- [Cosine Similarity or Dot Product?](posts/cosine-similarity-or-dot-product.md): An examination of the Chrome codebase reveals that the history_embeddings component uses the dot product of normalized vectors to perform similarity searches.
- [Universal Query Classifier](posts/universal-query-classifier.md): A zero-shot, multi-label search query classifier that maps queries to any user-provided label taxonomy without the need for retraining or bespoke models.
- [Another failed attempt to kill SEO](posts/geo-to-seo.md): An analysis of the term Generative Engine Optimization (GEO) and a critique of industry rebranding efforts following opinions shared by Andreessen Horowitz personnel.
- [Vector Embedding Optimization](posts/vector-embedding-optimization.md): An evaluation of four embedding methods comparing speed, storage, and accuracy. Results show MRL truncation maintains high accuracy while reducing file size.
- [Dissecting Gemini’s Tokenizer and Token Scores](posts/gemini-toknizer.md): Explore how Google’s Gemini processes text using subword tokenization. Use this tool to inspect SentencePiece log-likelihood scores for common and rare tokens.
- [There’s a small army of on-device models coming to Chrome](posts/theres-a-small-army-of-on-device-models-coming-to-chrome.md): Technical interpretations and parameter breakdowns for various AI models, including Gemini, Gemma, ULM, and StableLM, covering architecture and scale.
- [AI Mode Site Search](posts/ai-mode-site-search.md): Explore Vertex AI website search features, including Enterprise edition tools like extractive answers, image search, and advanced LLM capabilities for summaries.
- [Multi-Step Research Agent](posts/multi-step-research-agent.md): An implementation of Google's query fan-out in an agentic framework used to research the machine learning and SEO services offered by DEJAN Marketing.
- [Query Fan-Out Prompt Implementation in Google’s Open-Source Agentic Framework](posts/query-fan-out-prompt.md): Google’s Gemini Fullstack LangGraph Quickstart uses Gemini 2.5 and LangGraph to build a citation-driven research agent with a React and FastAPI architecture.
- [From Hallucinations to Clicks](posts/from-hallucinations-to-clicks.md): An automated method for mapping LLM-hallucinated URLs to valid pages using keyword matching and semantic similarity via vector embeddings and cosine similarity.
- [What is GEO?](posts/what-is-geo.md): Generative Engine Optimisation (GEO) is a term used to describe SEO for AI assistants and generative search engines, often based on a single research paper.
- [AI Mode & Page Indexing](posts/ai-mode-page-indexing.md): Tests indicate Google's AI Mode uses a proprietary content store rather than the live web, as it fails to fetch indexed pages that are otherwise ranking.
- [AI Mode is Not Live Web](posts/ai-mode-is-not-live-web.md): An experiment testing Google's AI Mode suggests it may rely on Google's existing index or cached web data rather than performing live HTTP requests for all URLs.
- [How AI Mode Selects Snippets](posts/how-ai-mode-selects-snippets.md): An analysis of how Google selects content for AI Mode snippets, identifying patterns in value propositions, HTML structure, and semantic selection criteria.
- [AI Mode Internals](posts/ai-mode-internals.md): An exploration of Google's AI Mode and Gemini tools, including its use of Google Search, Python libraries, and how it processes date, time, and location data.
- [The Future of Google](posts/the-future-of-google.md): Sundar Pichai discusses Google's AI strategy, the evolution of Search, upcoming AR glasses, the impact of AI on web traffic, and the future of robotics.
- [The Inner Workings of GPT’s file_search Tool](posts/gpt-file_search-tool.md): The file_search tool allows GPT models to extract precise information from uploaded documents using structured queries and provides citations for verification.
- [Live Blog: Hacking Gemini Embeddings](posts/live-blog-hacking-gemini-embeddings.md): An experimental study reproducing the vec2vec research paper by attempting to translate and align Gemini and MxbAI embedding spaces using unsupervised methods.
- [Google’s New URL Context Tool](posts/googles-new-url-context-tool.md): Google's Gemini now uses a combination of search and browsing tools to fetch and read specific web pages, allowing it to ground responses in real-world data.
- [LLM-Based Search Volume Prediction](posts/llm-search-volume.md): An analysis comparing Google Gemini's keyword volume predictions against actual Google Search Console data reveals weak-to-moderate correlation and limited accuracy.
- [How Google grounds its LLM, Gemini.](posts/gemini-grounding.md): An analysis of Gemini's internal grounding processes, revealing its structured indexing method, operational stages, and use of external verification tools.
- [Google Lens Modes](posts/google-lens-modes.md): The lns_mode parameter classifies Google Lens queries into text, unimodal, or multimodal modes to help route requests and support AI Mode functionality.
- [Content Substance Classification](posts/content-substance-classification.md): Cyberfluff is a novel approach for detecting low-quality web content using curriculum-driven contrastive pretraining to distinguish fluff from substance.
- [Chrome’s New Embedding Model: Smaller, Faster, Same Quality](posts/chromes-new-embedding-model.md): Chrome's latest update features a new text embedding model that is 57% smaller than its predecessor, using int8 quantization to maintain search quality.
- [AI Content Detection](posts/ai-content-detection.md): DEJAN-LM is an AI content detection model trained on 20 million sentences, using a combined deep learning and heuristic approach to identify advanced AI text.
- [I think Google got it wrong with “Generate → Ground” approach.](posts/generate-then-ground.md): An analysis of Google's RARR framework compared to retrieval-first approaches like RAG and FiD, focusing on reducing LLM hallucinations through grounding.
- [Introducing Grounding Classifier](posts/grounding-classifier.md): An analysis of Gemini 2.5 Pro's search grounding capabilities and the development of a prompt grounding classifier trained on 10,000 collected prompts.
- [Advanced Interpretability Techniques for Tracing LLM Activations](posts/advanced-interpretability-techniques-for-tracing-llm-activations.md): This page explores mechanistic interpretability techniques, including activation logging, causal tracing through activation patching, and attention head analysis.
- [Temperature Parameter for Controlling AI Randomness](posts/temperature-parameter-for-controlling-ai-randomness.md): The temperature parameter in generative AI models influences randomness and creativity by rescaling the probability distribution of potential next words.
- [Probability Threshold for Top-p (Nucleus) Sampling](posts/probability-threshold-for-top-p-nucleus-sampling.md): Top-p sampling, or nucleus sampling, is a parameter used in generative AI to control text randomness by selecting words based on a cumulative probability.
- [How Google Decides When to Use Gemini Grounding for User Queries](posts/how-google-decides-when-to-use-gemini-grounding-for-user-queries.md): Google uses dynamic retrieval to decide when Gemini models should use grounding. A prediction score and configurable threshold determine if a query needs search data.
- [Cross-Model Circuit Analysis: Gemini vs. Gemma Comparison Framework](posts/cross-model-circuit-analysis-gemini-vs-gemma-comparison-framework.md): A framework for comparative circuit analysis between Google's Gemini and Gemma models to identify how different architectures represent brand information.
- [Neural Circuit Analysis Framework for Brand Mention Optimization](posts/neural-circuit-analysis-framework-for-brand-mention-optimization.md): This framework uses open-weight models like Gemma 3 Instruct to perform mechanistic brand positioning through direct neural circuit and activation analysis.
- [Strategic Brand Positioning in LLMs: A Methodological Framework for Prompt Engineering and Model Behavior Analysis](posts/strategic-brand-positioning-in-llms-a-methodological-framework-for-prompt-engineering-and-model-behavior-analysis.md): This paper presents a methodological framework for analyzing and optimizing brand mentions in large language models through systematic prompt probing and analysis.
- [AlexNet: The Deep Learning Breakthrough That Reshaped Google’s AI Strategy](posts/alexnet-the-deep-learning-breakthrough-that-reshaped-googles-ai-strategy.md): Google and the Computer History Museum open-sourced the AlexNet code, highlighting its role in launching deep learning and shaping Google's AI-first strategy.
- [The Next Chapter of Search: Get Ready to Influence the Robots](posts/the-next-chapter-of-search-get-ready-to-influence-the-robots.md): Explore the evolving landscape of SEO, focusing on how AI, conversational search, and Large Language Models are changing brand representation and visibility.
- [Revealed: The exact search result data sent to Google’s AI.](posts/hacking-gemini.md): An analysis of Gemini's grounding capabilities, addressing issues with hallucinations, guardrails, and the discovery of multi-passage snippet context.
- [Beyond Rank Tracking: Analyzing Brand Perceptions Through Language Model Association Networks](posts/beyond-rank-tracking-analyzing-brand-perceptions-through-language-model-association-networks.md): The DEJAN methodology uses large language models to analyze brand perception and semantic associations, moving beyond traditional keyword rank tracking.
- [Teaching AI Models to Be Better Search Engines: A New Approach to Training Data](posts/teaching-ai-models-to-be-better-search-engines-a-new-approach-to-training-data.md): A recent patent application describes a method for training AI models to better understand human queries by using LLMs to automatically generate training data.
- [Self-Supervised Quantized Representation for KG-LLM Integration](posts/self-supervised-quantized-representation-for-kg-llm-integration.md): Self-Supervised Quantized Representation (SSQR) integrates knowledge graphs with large language models by compressing entity information into discrete codes.
- [What does Gemini think about your brand?](posts/what-does-gemini-think-about-your-brand.md): Chrome Dev includes a quantized Gemini model for tasks like scam prevention. This analysis examines its on-device execution and reverse-engineered prompts.
- [Google’s Privacy Sandbox: Navigating the Cookieless Future](posts/googles-privacy-sandbox-navigating-the-cookieless-future.md): An examination of Google's Privacy Sandbox, focusing on the technical details and privacy implications of the Topics API and the FLEDGE API.
- [Why deep learning works.](posts/why-deep-learning-works.md): An excerpt from François Chollet’s Deep Learning with Python exploring the manifold hypothesis and how structured information enables deep learning to work.
- [Introducing VecZip: Embedding Compression Algorithm](posts/introducing-veczip-embedding-compression-algorithm.md): VecZip is a novel compression method by DEJAN AI that reduces embedding dimensionality by retaining unique dimensions to improve AI performance and storage efficiency.
- [Site Engagement Metrics](posts/site-engagement-metrics.md): The Google Site Engagement Metrics Framework in Chromium tracks user interactions, engagement scores, and browsing behavior using UMA histograms.
- [Beyond Links: Understanding Page Transitions in Chrome](posts/transitions.md): Explore Chrome page transition types and qualifiers to understand user intent, navigation pathways, and the SEO implications of different browser behaviors.
- [Both humans and AI return similar results when asked for a random number](posts/random-numbers.md): A comparison of 200,000 random numbers provided by humans and Google's Gemma-2-2b-it model reveals significant overlaps and patterns in number selection.
- [Chrome AI Frameworks & Models](posts/chrome-ai-models.md): A comprehensive list of Chrome's on-device machine learning models, including specialized tools for language processing, page analysis, and content safety.
- [Attention Is All You Need](posts/attention-is-all-you-need.md): A discussion of the Attention Is All You Need paper, covering the Transformer architecture, multi-head attention, and its impact on machine translation.
- [The State of AI](posts/the-state-of-ai.md): The 2024 State of AI report explores the rise of open models, benchmarking challenges, neurosymbolic systems, model efficiency, and global AI developments.
- [ILO](posts/ilo.md): The ILO app is a Streamlit-based tool for managing SEO data through URL population, GSC data fetching, query intent classification, and traffic projections.
- [Resource-Efficient Binary Vector Embeddings With Matryoshka Representation Learning](posts/resource-efficient-binary-vector-embeddings-with-matryoshka-representation-learning.md): An analysis of reducing vector embedding storage through Matryoshka Representation Learning and binary embeddings to optimize SEO text feature extraction.
- [Query Intent via Retrieval Augmentation and Model Distillation](posts/query-intent-via-retrieval-augmentation-and-model-distillation.md): QUILL enhances query intent classification by using retrieval augmentation and a two-stage distillation process to balance model performance and efficiency.
- [Search Query Quality Classifier](posts/search-query-quality-classifier.md): A search query classifier built on the ALBERT architecture that identifies well-formed queries with 80% accuracy, a 10% improvement over Google's LSTM-based model.
- [How Gemini Selects Results](posts/how-gemini-selects-results.md): An explanation of how Gemini's internal algorithms use relevance scoring, recency bias, user intent, and stochasticity to retrieve and present information.
- [Gemini System Prompt](posts/gemini-system-prompt.md): Gemini Advanced provides access to the Gemini 1.5 Pro model, featuring a 1 million token context window for analyzing up to 1,500 pages of information.

# Concepts

- [AEO (Answer Engine Optimization)](concepts/aeo-answer-engine-optimization.md): Answer Engine Optimization. A name for optimizing to appear in AI answer engines.
- [AI SEO](concepts/ai-seo.md): A common name for AI-centric SEO: the practice of working toward AI Visibility.
- [AI Visibility](concepts/ai-visibility.md): The broad outcome of AI-centric SEO: being seen, cited, and recommended by AI systems when they answer questions.
- [AIO (AI Optimization)](concepts/aio-ai-optimization.md): AI Optimization. Another name for the practice of improving AI Visibility.
- [Frequency](concepts/frequency.md): How often your citations and mentions occur across AI queries and over time.
- [GEO (Generative Engine Optimization)](concepts/geo-generative-engine-optimization.md): Generative Engine Optimization. A name for optimizing to appear in generative AI engines.
- [Generative self-retrieval](concepts/generative-self-retrieval.md): A reasoning model surfacing facts from its own parameters by generating them as text, where those generated facts then condition and improve its final answer.
- [Grounding](concepts/grounding.md): The process by which an AI system answers with current information: it runs a search, retrieves pages, reads short extracts, and writes its answer from them rather than from memory.
- [Grounding Bias](concepts/grounding-bias.md): A form of secondary bias: how much an AI model defers to, or discounts, retrieved sources once they are grounded into the context, rather than leaning on what it already believed.
- [Grounding Snippet](concepts/grounding-snippet.md): A short, extractive selection of sentences pulled from a page and supplied to an AI model as the evidence it grounds an answer on; the atomic unit of visibility in AI search.
- [Number of Citations](concepts/number-of-citations.md): The absolute count of times your pages are cited in AI answers.
- [Number of Mentions](concepts/number-of-mentions.md): The absolute count of times your brand is named in AI answers.
- [Open Knowledge Format](concepts/open-knowledge-format.md): An open, vendor-neutral format for packaging the knowledge AI systems need, as plain markdown files any model or agent can read.
- [Primary Bias](concepts/primary-bias.md): An AI model's ungrounded confidence in an entity, formed during training and present before any retrieval; in AI search it is the largest single factor in whether a source is selected.
- [Query Fanout](concepts/query-fanout.md): The step where an AI system splits one prompt into several single-intent sub-queries, each retrieving its own sources; a page can be grounded for one angle of a question and absent for another.
- [Rank](concepts/rank.md): How prominently your citations and mentions appear within an AI answer.
- [Relevance Engineering](concepts/relevance-engineering.md): Engineering a page's semantic relevance to a query with embeddings and vector math, treating visibility as an engineering problem rather than keyword optimization.
- [Secondary Bias](concepts/secondary-bias.md): The post-retrieval layer of selection: how an AI model treats, weights, and is swayed by content once it has been retrieved. Unlike primary bias, it is addressable now.
- [Selection Rate](concepts/selection-rate.md): The rate at which an AI system chooses a given source from the available grounding candidates when composing an answer; the AI-search equivalent of click-through rate.
- [Selection Rate Optimization](concepts/selection-rate-optimization.md): The AI-search counterpart to CTR optimization: improving how often AI systems select your content as the grounding source for their generated answers.
- [Share of Citations](concepts/share-of-citations.md): The proportion of cited sources in AI answers that are yours.
- [Share of Mentions](concepts/share-of-mentions.md): The proportion of brand mentions in AI answers that are yours.
- [Share of Voice](concepts/share-of-voice.md): Your overall slice of the AI answer space for a topic, relative to competitors.
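
The count, share, and selection-rate concepts listed above are simple ratios, and a small sketch may make the definitions concrete. The sample data, the domain and brand names, and the helper functions `share` and `selection_rate` are all hypothetical illustrations, not part of any DEJAN tool or published method. The formulas are one plausible reading of the one-line definitions.

```python
from collections import Counter


def share(counts: Counter, key: str) -> float:
    """Proportion of all counted items that belong to `key` (0.0 if empty)."""
    total = sum(counts.values())
    return counts[key] / total if total else 0.0


def selection_rate(times_selected: int, times_candidate: int) -> float:
    """Assumed reading: selections divided by appearances as a grounding candidate."""
    return times_selected / times_candidate if times_candidate else 0.0


# Hypothetical sample: each AI answer records the domains it cited
# and the brands it named.
answers = [
    {"citations": ["example.com", "rival.com"], "mentions": ["Example", "Rival"]},
    {"citations": ["rival.com"], "mentions": ["Rival"]},
    {"citations": ["example.com", "example.com", "other.org"], "mentions": ["Example"]},
]

citations = Counter(d for a in answers for d in a["citations"])
mentions = Counter(b for a in answers for b in a["mentions"])

number_of_citations = citations["example.com"]        # absolute count: 3
share_of_citations = share(citations, "example.com")  # 3 of 6 cited sources
number_of_mentions = mentions["Example"]              # absolute count: 2
share_of_mentions = share(mentions, "Example")        # 2 of 4 brand mentions

# Hypothetical: the page was a grounding candidate 8 times and selected 2 times.
rate = selection_rate(times_selected=2, times_candidate=8)
```

Counts are absolute, while shares normalize against everything cited or mentioned in the same set of answers, so the two can move in different directions as the sample of queries changes.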