
AI Knowledge Modes: Memory, Search, Fetch

Ask the same model the same question about your brand three times and you can get three different answers. With no tools, it answers from what it already knows. With search enabled, it answers from a fresh candidate set. With a supplied URL, it answers from the page it can actually read. The wrong move is to flatten those into one vague score called AI visibility. The right move is to separate the knowledge source first.

The SEO and GEO (generative engine optimization) industry still talks as if every AI answer comes from one place. Call this the monolithic GEO fallacy: collapsing three independent retrieval mechanisms into one undifferentiated "AI visibility" bucket. Vendors do not make that mistake. Anthropic exposes web_search_20260209 and web_fetch_20260209. OpenAI exposes web_search on the Responses API. Google gives Gemini google_search and url_context. The product names differ, but the split is consistent: the model can remember, search, or fetch.

That gives us the working trichotomy: Memory, Search, and Fetch. They are three optimization playbooks, three failure modes, and three time-scales. They do not substitute for each other.

The trichotomy

Every AI answer about your brand is composed from up to three knowledge modes: Memory (training data), Search (live web retrieval), and Fetch (direct URL reads). Each fails differently. Each rewards different work.

These modes are orthogonal to the 7-layer funnel. The layers tell you where the brand drops out along the path: Trained-on, Retrieved, Fetched, Cited, Sentiment & Accuracy, Recommended, Acted-on. The modes tell you which knowledge supply line fed that layer. They are not upstream or downstream of the layers. They are a separate axis.

Mode 1 — Memory

What it is: Memory is knowledge baked into the model weights at pretraining time. It is static between model releases. It is active when no tools fire, when the prompt is a recall task, and when search or fetch fails and the model falls back to what it thinks it knows.

How to test: send the prompt with no tools enabled. No search, no URL context, no browsing. The response is the cleanest observable version of Memory. Ask basic entity questions, category prompts, comparison prompts, stale-fact checks, and disambiguation prompts against similarly named companies.

Failure modes: the brand is absent, confused with a similarly named entity, framed by stale 2023-era facts, or outweighed by a competitor whose category association is stronger. This is why a technically perfect website can still lose an unaided prompt. The model's prior is not yet on your side.

Optimization playbook: entity discipline and public consensus. Wikipedia where appropriate, Wikidata with consistent sameAs links, schema.org Organization markup, authoritative About pages, GitHub and Hacker News presence for technical brands, and sustained coverage in publications that become training data. Andrew Holland's phrase "fame engineering" is useful here: Memory changes when the public web repeatedly says the same thing about you.

Time-to-impact: months to years, gated by vendor model-release cycles. Memory only updates when the next base model is trained on a corpus that includes your new evidence. It is the slowest mode to move and the most permanent when it does.

Mode 2 — Search

What it is: Search is live web retrieval performed mid-answer. ChatGPT exposes web search without disclosing the current API backend, Claude has documented Brave Search in specific surfaces (notably the Government MCP integration), and Gemini uses Google Search grounding. The model rewrites the user's prompt into one or more queries, receives a candidate set of URLs, then chooses which sources to cite or synthesise.

How to test: enable the search tool and run the prompt portfolio. The response should include citations or at least observable source use. The important diagnostic is not just whether your brand appears, but whether your domain entered the candidate set and whether the answer used you instead of a competitor, review site, or forum thread.

Failure modes: your URL is not retrieved, the wrong page is retrieved, a competitor is cited instead, or a reviews aggregator such as G2, Reddit, Trustpilot, or an industry directory becomes the source of truth. Search failures often look like recommendation failures, but the real issue happened before synthesis.

Optimization playbook: classic SEO still matters because retrieval still starts with a search index. Add /llms.txt and /llms-full.txt. Allow AI crawlers in robots.txt, including GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot, Google-Extended (Gemini search grounding control), and Claude-SearchBot + Claude-User (Claude's retrievals; the older anthropic-ai UA is deprecated). Put a 40-60-word answer-first summary near the top of each important page. Use structured comparison tables, specific headings, fresh dates, and pages built around the questions the model actually rewrites into search queries.
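The robots.txt part of this playbook is easy to check offline. The following is a minimal sketch, assuming Python's standard urllib.robotparser is an acceptable stand-in for how the crawlers read the file (each vendor's parser may resolve conflicting rules differently). The sample robots.txt, the example.com URL, and the function name blocked_crawlers are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the playbook above.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot",
    "Google-Extended", "Claude-SearchBot", "Claude-User",
]

# Hypothetical robots.txt, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def blocked_crawlers(robots_txt, url, agents=AI_CRAWLERS):
    """Return the crawler tokens that this robots.txt forbids from reading url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network call is made.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Lists every crawler from AI_CRAWLERS that cannot read the pricing page.
    print(blocked_crawlers(SAMPLE_ROBOTS_TXT, "https://example.com/pricing"))
```

Running the same check against each important URL, rather than only the homepage, catches path-level rules that block a single template.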

Time-to-impact: 1-6 weeks. Search moves faster than Memory because content, rankings, snippets, freshness, and source mix can change inside the current model generation. A version-controlled query portfolio gives the feedback loop.

Mode 3 — Fetch

What it is: Fetch is direct URL reading. The model takes a URL from the prompt, from a prior search result, or from its own hypothesis and pulls that page server-side into context. Search asks, which URLs should I consider? Fetch asks, can I read this URL now?

How to test: enable the fetch or URL-context tool and include a URL in the prompt. Ask the model to answer from that page, extract facts, compare products, or summarize terms. If the page is readable, the answer should reflect the page. If it is not, the model will either admit the fetch failed or quietly fall back to Memory.
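The three test setups (no tools for Memory, search enabled for Search, fetch enabled plus a URL for Fetch) can be sketched as one expansion step over a prompt. This is an illustrative sketch only: Mode, Probe, expand, and the generic tool labels "search" and "fetch" are hypothetical names, not any vendor's tool identifiers, and no API is called.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MEMORY = "memory"
    SEARCH = "search"
    FETCH = "fetch"


@dataclass(frozen=True)
class Probe:
    prompt: str
    mode: Mode
    tools: tuple      # generic tool labels (assumption), empty for Memory
    urls: tuple = ()  # only used by Fetch probes


def expand(prompt, modes, urls=()):
    """Turn one prompt into one probe per knowledge mode it should be tested in."""
    probes = []
    for mode in modes:
        if mode is Mode.MEMORY:
            # No tools at all: the answer can only come from model weights.
            probes.append(Probe(prompt, mode, tools=()))
        elif mode is Mode.SEARCH:
            probes.append(Probe(prompt, mode, tools=("search",)))
        elif mode is Mode.FETCH:
            if not urls:
                raise ValueError("Fetch probes need at least one URL")
            probes.append(Probe(prompt, mode, tools=("fetch",), urls=tuple(urls)))
    return probes


if __name__ == "__main__":
    for probe in expand(
        "What does Acme charge per seat?",
        [Mode.MEMORY, Mode.SEARCH, Mode.FETCH],
        urls=["https://example.com/pricing"],
    ):
        print(probe.mode.value, probe.tools, probe.urls)
```

Keeping the mode on the probe itself means every stored answer can later be grouped by the knowledge source that produced it.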

Failure modes: JavaScript-only category pages return empty shells, a slow time to first byte (TTFB) causes the fetch to time out, cookie banners block the text, product facts live only in PDFs, redirects blur the canonical page, or robots.txt blocks ClaudeBot and GPTBot. In 2026, client-side-rendered category pages are the most common Fetch failure because many AI fetchers still do not execute JavaScript.

Optimization playbook: server-side render the core content. Keep TTFB under 800ms. Use clean semantic HTML with <main>, <article>, headings, links, and tables visible in the initial HTML. Return strict status codes. Put the facts the model needs in JSON-LD and human-readable text. Use /llms.txt to point at canonical URLs. Do not hide category, product, price, availability, or comparison facts behind client-side rendering.
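One way to test the "visible in the initial HTML" rule is to read a page the way a non-JavaScript fetcher would: parse the raw HTML, ignore script and style bodies, and see which facts survive. The sketch below uses only Python's standard html.parser; the two sample pages, the fact strings, and the names VisibleText and missing_facts are hypothetical. It checks human-readable text only, so facts that exist solely in JSON-LD are reported as missing.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text a fetcher sees without executing JavaScript."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style/template.
        if not self._skip_depth and data.strip():
            self.chunks.append(data)


def missing_facts(html, facts):
    """Return the facts that do not appear in the initial HTML's visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [fact for fact in facts if fact.lower() not in text]


# Hypothetical pages: one server-rendered, one client-rendered shell.
SSR_PAGE = "<html><body><main><h1>Acme Widget</h1><p>Price: $49 per month</p></main></body></html>"
CSR_PAGE = '<html><body><div id="root"></div><script>render({"price": "$49 per month"})</script></body></html>'

if __name__ == "__main__":
    facts = ["Acme Widget", "$49 per month"]
    print("SSR missing:", missing_facts(SSR_PAGE, facts))
    print("CSR missing:", missing_facts(CSR_PAGE, facts))
```

In practice the html argument would be the raw response body from a plain HTTP GET, fetched without a headless browser so that the check matches what a non-rendering fetcher receives.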

Time-to-impact: 1 day to 2 weeks. Fetch is the fastest mode to fix. It is usually a template, rendering, robots, redirect, status-code, or content-structure issue, which makes it the biggest quick win for technically broken sites.

Why one number still works

The natural worry is that three modes means three dashboards. It does not. AI Share of Recommendation (SOR) decomposes by mode. Compute Memory-SOR, Search-SOR, and Fetch-SOR, then weight them by use case. Category discovery weights Search higher. Reputation prompts weight Memory higher. Support, product, and action prompts often weight Fetch higher because the answer should come from the current URL.

The board metric can still be one number. The explanation gets sharper: overall AI SOR is 18%, driven by Search 27%, Fetch 14%, Memory 8%. That says the Memory leg is the bottleneck. The fix is not another round of page-speed work. It is entity consistency, third-party consensus, and long-arc reputation building.
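The roll-up itself is a weighted average. The sketch below reuses the per-mode figures from the example (Search 27%, Fetch 14%, Memory 8%). The weights are an assumption: this post does not state a weighting, and 0.4 / 0.4 / 0.2 was picked only because it reproduces the 18% headline. Treating the lowest per-mode SOR as the bottleneck is likewise a simplification.

```python
def weighted_sor(per_mode, weights):
    """Weighted average of per-mode SOR values (both dicts keyed by mode name)."""
    if set(per_mode) != set(weights):
        raise ValueError("per_mode and weights must cover the same modes")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(per_mode[m] * weights[m] for m in per_mode) / total_weight


def bottleneck(per_mode):
    """Name the mode with the lowest SOR (a simple, assumed definition)."""
    return min(per_mode, key=per_mode.get)


# Per-mode SOR in percent, taken from the example above.
PER_MODE = {"Search": 27.0, "Fetch": 14.0, "Memory": 8.0}

# Hypothetical use-case weights, chosen to reproduce the 18% roll-up.
WEIGHTS = {"Search": 0.4, "Fetch": 0.4, "Memory": 0.2}

if __name__ == "__main__":
    print(f"Overall AI SOR: {weighted_sor(PER_MODE, WEIGHTS):.1f}%")
    print("Bottleneck mode:", bottleneck(PER_MODE))
```

Because the function normalizes by the weight total, the same per-mode numbers can be re-weighted per use case (for example, more Memory weight for reputation prompts) without the weights having to sum to exactly 1.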

How this maps to the 7 layers

The 7 layers are the funnel. The modes are the source axis. Every cell in the matrix can be tagged via Memory, via Search, or via Fetch. That makes the diagnosis two-dimensional: where did the prompt fail, and which knowledge mode supplied the failing evidence?

Layer                      Primary mode(s)
1. Trained-on              Memory
2. Retrieved               Search
3. Fetched                 Fetch
4. Cited                   Search + Fetch
5. Sentiment & Accuracy    Memory + Search
6. Recommended             All three
7. Acted-on                Fetch + Search (downstream)
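For tooling, the layer-to-mode mapping can be held as a small lookup table. The sketch below encodes the rows as listed; the names LAYER_MODES and layers_for are hypothetical, and "All three" is expanded to Memory, Search, and Fetch.

```python
# Primary mode(s) per funnel layer, in funnel order.
LAYER_MODES = {
    "Trained-on": ("Memory",),
    "Retrieved": ("Search",),
    "Fetched": ("Fetch",),
    "Cited": ("Search", "Fetch"),
    "Sentiment & Accuracy": ("Memory", "Search"),
    "Recommended": ("Memory", "Search", "Fetch"),
    "Acted-on": ("Fetch", "Search"),
}


def layers_for(mode):
    """List the layers, in funnel order, that a given mode primarily feeds."""
    return [layer for layer, modes in LAYER_MODES.items() if mode in modes]


if __name__ == "__main__":
    for mode in ("Memory", "Search", "Fetch"):
        print(mode, "->", layers_for(mode))
```

Reading the table in this direction shows which layers are worth re-checking after a change to a single mode.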

What changes when you adopt this

How Combot measures it

Every night, every prompt in the portfolio is run against each model in each mode the prompt cares about: no-tools probes for Memory, search-enabled probes for Search, and fetch-enabled probes with URL lists for Fetch. Each result carries the mode tag. The dashboard surfaces Memory-SOR, Search-SOR, Fetch-SOR, and the weighted roll-up. Alerts fire on per-mode 2-sigma deltas, not on the rolled-up number alone.
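A per-mode 2-sigma alert can be sketched as follows. This is not Combot's implementation. It assumes one reading of "2-sigma delta": the latest nightly value sits more than two sample standard deviations from the mean of a trailing history window. The function names and the sample numbers are hypothetical.

```python
from statistics import mean, stdev


def two_sigma_alert(history, latest, sigmas=2.0):
    """True if latest deviates from the history mean by more than sigmas * stdev."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu = mean(history)
    sd = stdev(history)  # sample standard deviation
    if sd == 0:
        # A perfectly flat history: any change at all is notable.
        return latest != mu
    return abs(latest - mu) > sigmas * sd


def mode_alerts(history_by_mode, latest_by_mode):
    """Return the modes whose latest SOR breaches the 2-sigma band."""
    return [
        mode
        for mode, latest in latest_by_mode.items()
        if two_sigma_alert(history_by_mode[mode], latest)
    ]


if __name__ == "__main__":
    # Hypothetical nightly SOR values in percent.
    history = {
        "Memory": [8, 8, 9, 8, 7, 8],
        "Search": [27, 26, 28, 27, 26, 28],
        "Fetch": [14, 15, 14, 13, 14, 14],
    }
    latest = {"Memory": 8, "Search": 22, "Fetch": 14}
    print("Modes in alert:", mode_alerts(history, latest))
```

Checking each mode separately matters because a drop in one mode can be diluted by the weighting in the rolled-up number and never cross a threshold there.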

When the AI changes its mind about your brand, the question we want to answer is not "did it change?" but "which knowledge source pulled it that way?"


Further reading: 7 layers · AI Share of Recommendation · Technical SEO of LLMs