Tools for Modes: how the major LLMs implement Memory, Search, and Fetch

The three Knowledge Modes — Memory, Search, Fetch — only become real because vendors ship tools that implement them. This post walks the per-vendor tool inventory in 2026, with the history of how we got here, and the basic technical stack so the rest of the GEO playbook makes sense.

1. Why "tools for modes"

Our Knowledge Modes framework names three sources of any AI answer: parametric Memory baked into model weights, real-time Search against an index, and Fetch of a specific URL. Modes are an analytical abstraction. They become operationally real only when a vendor exposes APIs for them — a web_search tool, a web_fetch tool, a way to turn both off and probe pure memory. Without those primitives, "Search Mode" is a marketing slide, not something a measurement platform can run against on a Tuesday morning.

The five vendors that matter most for Generative Engine Optimisation in 2026 — Anthropic, OpenAI, Google, Perplexity, and xAI — implement the three Modes very differently. The differences are partly engineering choices and partly history. Both matter for any team trying to measure or optimise their AI visibility.

2. The anatomy of an LLM tool call

Every vendor's Search and Fetch implementation rides on the same generic message-loop pattern, even when the API field names differ. A request lands with a system prompt, a user message, and a list of available tools. The model decides whether to answer directly from weights or to emit a tool_use block that names a tool and supplies arguments. The host (your code, or a vendor-hosted runtime) executes the tool, returns a tool_result back into the conversation, and the model continues. The loop runs until the model emits a final text answer.

Anthropic and OpenAI both expose this loop directly: the Anthropic tool-use overview and OpenAI's Responses API reference document the shape per request. Gemini exposes a similar loop through function calling. Each vendor adds hosted tools on top of this — server-side primitives the platform runs for you (search, fetch, code execution, file retrieval) so you don't have to wire your own Brave or Bing key.

A note on "RAG". Classical Retrieval-Augmented Generation (Lewis et al., 2020) means an LLM developer owns a curated corpus, embeds it into a vector store, and trains the retriever and generator together. The web_search tool pattern is RAG-shaped but architecturally different: the LLM delegates a query to an external search engine (Brave, Google, Bing) that runs its own classical IR stack over the public web, and the search result is dropped back into the model's context as a tool result. We call this tool-call retrieval (or "agentic RAG"). It matters because the things you optimise are not the same: classical RAG cares about your embedded chunks; tool-call retrieval cares about whether bots can crawl your site at all.

3. Memory: no tools, just weights

Memory is the only Mode that is not implemented as a tool. It's what the model returns when no tools fire. Measuring Memory therefore means preventing tools from firing, which every vendor lets you do with one specific control:

Anthropic: omit tools from the request, or use tool-choice controls to prevent invocation.
OpenAI: set tool_choice: "none" in the Responses API.
Google Gemini: set tool_config.function_calling_config.mode = "NONE" for function tools, and omit google_search / url_context.
Perplexity: pass disable_search: true to the Sonar API. Caveat: Perplexity is search-native, so a Memory-only result is less comparable to its consumer surface than the other vendors.
xAI Grok: omit the web_search tool from the request; an explicit no-tool mode is not verified in the public docs.

These are the configurations Combot uses for controlled Memory-mode probes (see Measuring Memory for the methodology).

4. Anthropic Claude — the late arrival, with the Brave story

Anthropic launched Claude in March 2023 but did not ship a native web_search tool to the API for nearly two years. Model Context Protocol (MCP) arrived in November 2024 as the open standard for external tool integration. The first documented native web_search generation uses the web_search_20250305 version string, indicating a March 2025 generation; treat the exact release date as a vendor-changelog detail unless cited separately. The latest generation is web_search_20260318 (and web_fetch_20260318 for explicit URL fetches), which adds response_inclusion control on top of the dynamic-filtering capability introduced in the web_search_20260209 / web_fetch_20260209 generation; the basic versions (web_search_20250305 and web_fetch_20250910) remain available. Combot's own probes deploy the 20260209 generation.

That is roughly a 21-month gap behind OpenAI's first browsing tool. The gap had practical consequences. For most of 2023 and 2024, anyone who needed Claude to read the live web wrote a custom RAG pipeline; Anthropic's models were positioned as Memory-first thinking partners, and the brand was built on that posture.

When Anthropic shipped, it shipped with unusually clean primitives. The Search tool returns first-class citations blocks on final text content (content[].citations[].url, cited_text, title) and exposes usage counters at usage.server_tool_use.web_search_requests. The Fetch tool is explicitly URL-oriented and is documented as not supporting dynamically rendered JavaScript sites — a useful constraint that tells you what to render server-side if you want Claude to read you (see Lean Render).

The most discussed wrinkle is the index backend. Anthropic explicitly documents Brave Search API as the web-search backend for its Claude for Government MCP integration, and Anthropic's published subprocessor evidence separately lists Brave Search. These are two distinct disclosure points — do not extrapolate them into a universal claim about every Claude surface. Brave positions its Search API as an independent agentic backend, and we treat observed Brave visibility as a practical signal for Claude Search mode in our monitoring. See the server-logs post for the deeper Bravebot correlation work.

5. OpenAI GPT — Browse with Bing, SearchGPT, and the unified `web_search`

OpenAI was first to mainstream LLM-driven web browsing. Browse with Bing launched in beta for ChatGPT Plus in May 2023, was paused that summer after users found ways to bypass publisher paywalls, and returned later in the year. In July 2024 OpenAI unveiled SearchGPT as a prototype testing a more conversational, citation-heavy search experience, and that prototype folded into the generally available ChatGPT search in October 2024.

For developers, the canonical tool today is the Responses API's web_search. (The legacy web_search_preview name still appears in references but should not be used as the current canonical.) OpenAI also exposes file_search for retrieval from uploaded vector stores and code_interpreter for sandboxed execution; these are orthogonal to web Search but use the same hosted-tool plumbing.

Citations land at output[].type === "message" with annotations of type url_citation carrying url, title, and span indices. A separate web_search_call output records that Search was invoked, and an optional sources field lists URLs consulted but not necessarily cited. Notably, the public web_search guide does not disclose the current backend index owner; widely repeated claims that the modern API is Bing-backed should be treated as unverified outside Microsoft co-branded surfaces.

6. Google Gemini — search was always native

Gemini launched in December 2023 with a structural advantage no competitor has: it is wired to the same Google Search index that powers the consumer search product. The google_search tool grounds Gemini answers in live results and returns rich groundingMetadata — webSearchQueries, groundingChunks[].web.uri, and a groundingSupports[] array that maps text spans in the answer back to source chunks.

For explicit URL reading, Gemini exposes url_context, which fetches up to 20 URLs per call and returns retrieval metadata at candidates[].url_context_metadata.url_metadata[]. The two tools are combinable. Google introduced the Google-Extended robots product token in September 2023, separating publishers' Google Search inclusion from their Gemini grounding eligibility — the cleanest policy lever in the industry.

For practitioners, Google SEO is more directly relevant to Gemini than to any non-Google retrieval path — but indexability is not the same as grounding eligibility or citation. Measure Gemini Search directly rather than assuming organic ranking translates linearly.

7. Perplexity — RAG as the product

Perplexity is not a model with optional tools. It is a search-native answering engine launched in late 2022. Its commercial API, Sonar, opened developer access in 2025 and exposes search controls — search_domain_filter, search_recency_filter, enable_search_classifier, disable_search — rather than a separate web_search tool string. Crawling is split across two documented agents: PerplexityBot for search-result indexing and Perplexity-User for user-triggered fetches (see the crawler docs).

The architectural consequence is that you cannot meaningfully measure Perplexity in Memory-only mode the way you can Claude or GPT. Even with disable_search: true, the product comparison is unequal. For citation parsing, the Sonar API field paths should be confirmed against the live reference before any implementation pass — Combot does not currently maintain a parser for Perplexity citations.

8. xAI Grok — live X integration

Grok launched in November 2023 for X Premium+ subscribers. Its public differentiator is real-time access to the X (Twitter) data stream, which sets it apart from web-index-based competitors. The current xAI developer surface is the SDK-facing Web Search tool (Grok's older "live search" route now redirects there), and the SDK exposes a web_search primitive with optional image understanding.

Beyond that, public docs are thin. The exact REST tool JSON shape, X-integration internals, citation field paths, and JavaScript execution behaviour are not fully documented in the sources we verified for this post. Combot does not currently probe Grok in production; teams that need to measure Grok visibility should treat xAI claims about real-time X access as the marketed behaviour and validate empirically with their own prompts.

9. How they differ — at a glance

Vendor	Memory control	Search tool	Fetch tool	Citation path	Index backend (disclosed)
Anthropic	Omit tools	`web_search_20260209`	`web_fetch_20260209`	`content[].citations[]`	Brave (Government MCP only, per subprocessor list)
OpenAI	`tool_choice: "none"`	`web_search`	Use `web_search`	`message.content[].annotations[]` type `url_citation`	Not disclosed in the API guide
Google Gemini	`function_calling_config: NONE`	`google_search`	`url_context` (≤20 URLs/call)	`groundingMetadata.groundingChunks[]`	Google Search index
Perplexity	`disable_search: true`	Sonar API (search-native)	`Perplexity-User` (product fetcher)	Field paths to verify per release	Perplexity-operated search stack; backend internals not fully disclosed
xAI Grok	Omit `web_search`	`web_search` (SDK-facing)	Partial X-search references	Not fully verified	Marketed real-time X access + documented `web_search`; backend internals limited in public docs

10. What this means for measurement and optimisation

The tools determine what is measurable. If a vendor doesn't ship a probe-able Memory-only mode, you cannot isolate parametric recall; if it doesn't expose structured citations, you cannot run reliable cited-URL extraction. The Combot stack treats Anthropic, OpenAI, and Google as the three reliably probe-able vendors today, with Perplexity and Grok added as product-context signals.

From an optimisation perspective, the lesson is that the three Modes need three different work streams. Memory rewards durable evidence on Wikipedia, schema.org sameAs, and the open corpora that feed pre-training. Search rewards bot-accessible, RAG-friendly content with current freshness signals. Fetch rewards server-rendered HTML and Open Graph metadata that survive a single HTTP GET. The same brand can be excellent at one Mode and invisible in the other two — and only a vendor-aware measurement stack will tell you which.

For the practitioner playbook per Mode, see the GEO Matrix series: Optimising Memory, Optimising Search, Optimising Fetch, and the three matching Measure posts.

Foundation: AI Knowledge Modes — Memory · Search · Fetch

Series: Optimising Memory · Optimising Search · Optimising Fetch · Measuring Memory · Measuring Search · Measuring Fetch

Tool versions and behaviour current as of 2026-05-19. Vendor docs change frequently; treat dated specifics in this post as a snapshot, not a contract.