Most teams measuring AI visibility today are measuring the wrong thing. They run a few branded prompts in ChatGPT, screenshot the answer, and check whether their brand name appears. That's the equivalent of measuring SEO in 2008 by typing your company name into Google and seeing if you're number one. It's a true measurement. It's also commercially almost worthless.
The right question isn't "are we mentioned?" It's "are we recommended where it counts?" And that question has a name: AI Share of Recommendation.
The definition
AI Share of Recommendation is the percentage of relevant prompts where the brand is recommended as a viable or preferred choice, weighted by prompt value, model / platform importance, geography, and funnel intent.
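One way to write that definition as a formula is sketched below. The symbols are illustrative, and combining the four weights by multiplication is an assumption, not something the definition prescribes.

```latex
\[
\mathrm{SoR} \;=\; \frac{\sum_{i} w_i \, r_i}{\sum_{i} w_i},
\qquad
w_i = v_i \cdot m_i \cdot g_i \cdot f_i,
\qquad
r_i \in \{0, 1\}
\]
```

Here $i$ indexes one observation (a prompt run on one platform for one geography), $r_i = 1$ when the brand is recommended as a viable or preferred choice, and $v_i$, $m_i$, $g_i$, $f_i$ stand for prompt value, model / platform importance, geography, and funnel intent.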
It's a single number you can put on a board slide. But unlike "mentioned in ChatGPT," it forces three real disciplines on the team measuring it.
1. A prompt portfolio
You can't compute a share without a denominator. The denominator is a set of prompts — not a single query, not a leaderboard, a portfolio. Branded ("what is Acme?"), category ("best CRM for distributed sales teams"), comparison ("Acme vs Competitor"), problem ("I need a CRM that handles X"), local ("CRM Helsinki"), trust ("is Acme reliable"), and action ("buy a CRM today") prompts all behave differently. Skipping any of them hides whole failure modes.
Equally important: the prompt portfolio has to be version-controlled. The moment you start rewording prompts to chase better answers, your trend line becomes meaningless. We learned this the hard way. Treat prompts like tests: one canonical set, change-controlled.
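A minimal sketch of what "treat prompts like tests" could look like in practice. The `Prompt` fields, the ID scheme, and the fingerprint approach are assumptions for illustration; the idea is only that any rewording produces a new portfolio version, so a trend line can be tied to the exact prompt set that produced it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# The seven prompt families named in the text above.
FAMILIES = frozenset(
    {"branded", "category", "comparison", "problem", "local", "trust", "action"}
)


@dataclass(frozen=True)
class Prompt:
    prompt_id: str  # hypothetical stable identifier
    family: str     # one of FAMILIES
    text: str       # the exact wording sent to the model


def portfolio_fingerprint(prompts):
    """Hash the canonical prompt set so any rewording changes the version."""
    rows = sorted((asdict(p) for p in prompts), key=lambda row: row["prompt_id"])
    canonical = json.dumps(rows, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def missing_families(prompts):
    """Return the families the portfolio does not cover yet."""
    return set(FAMILIES) - {p.family for p in prompts}


portfolio = [
    Prompt("branded-001", "branded", "what is Acme?"),
    Prompt("category-001", "category", "best CRM for distributed sales teams"),
]
```

Storing the fingerprint next to each day's result makes it obvious when two points on the trend line were measured against different prompt sets, and `missing_families(portfolio)` flags the failure modes the portfolio cannot see yet.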
2. Weighting that reflects business value
Not every prompt counts the same. "What is Acme?" is a reputation check. "Best CRM for sales teams under 50 people" is a demand-generation surface that represents real budget, so it should carry more weight. Weights can come from Google Search Console (GSC) volume, Ads CPC, sales-team-attested intent, or a combination. The point is that the weighting is explicit and the team agrees on it.
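As a rough sketch of an explicit weighting scheme, the snippet below normalizes each signal and blends them with agreed shares. The signal names, the sample numbers, and the 40/40/20 mix are all made up for illustration.

```python
def normalize(raw):
    """Scale raw values so they sum to 1 across the portfolio."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("signal must have a positive total")
    return {prompt_id: value / total for prompt_id, value in raw.items()}


def blend_weights(signals, mix):
    """Blend normalized signals into one weight per prompt."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    normalized = {name: normalize(raw) for name, raw in signals.items()}
    prompt_ids = next(iter(normalized.values())).keys()
    return {
        pid: sum(mix[name] * normalized[name][pid] for name in mix)
        for pid in prompt_ids
    }


# Illustrative inputs only: two prompts, three hypothetical signals.
signals = {
    "search_volume": {"what-is-acme": 1000, "best-crm-small-teams": 3000},
    "cpc": {"what-is-acme": 0.5, "best-crm-small-teams": 4.5},
    "sales_intent": {"what-is-acme": 1, "best-crm-small-teams": 4},
}
mix = {"search_volume": 0.4, "cpc": 0.4, "sales_intent": 0.2}
weights = blend_weights(signals, mix)
```

With these sample inputs the category prompt ends up with roughly 0.82 of the total weight and the branded prompt with roughly 0.18. Keeping `mix` in one reviewed place is what makes the weighting explicit.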
Advanced setups add a second weighting axis: knowledge mode. The same prompt scores differently depending on whether the model answered from static memory, a live web search, or a direct URL fetch, and the right mix depends on intent. Category discovery weights Search higher; product and pricing prompts weight Fetch higher; reputation prompts weight Memory higher. The rolled-up number still sits on the board slide, but the per-mode drill-down tells the team which family of fixes a regression calls for.
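The per-mode weighting could be sketched as a small lookup table. The intent labels and the 0.6 / 0.2 / 0.2 splits below are placeholders that only preserve the ordering described above; they are not measured values.

```python
# Hypothetical mode weights per intent; each row sums to 1.
MODE_WEIGHTS = {
    "category": {"memory": 0.2, "search": 0.6, "fetch": 0.2},
    "product_pricing": {"memory": 0.2, "search": 0.2, "fetch": 0.6},
    "reputation": {"memory": 0.6, "search": 0.2, "fetch": 0.2},
}


def mode_weighted_score(intent, scores_by_mode):
    """Combine per-mode recommendation scores (0..1) for one prompt.

    Modes that were not observed are skipped and the remaining
    weights are renormalized, so a missing mode does not count as zero.
    """
    weights = MODE_WEIGHTS[intent]
    observed = {mode: w for mode, w in weights.items() if mode in scores_by_mode}
    total = sum(observed.values())
    if total == 0:
        raise ValueError("no scored modes for this prompt")
    return sum(scores_by_mode[mode] * w for mode, w in observed.items()) / total


# Recommended only when the model searched the live web.
category_score = mode_weighted_score(
    "category", {"memory": 0.0, "search": 1.0, "fetch": 0.0}
)
```

The same per-mode scores give about 0.6 for a category prompt but only about 0.2 for a reputation prompt, which is the kind of difference the per-mode drill-down is meant to surface.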
3. Recommendation, not mention
A mention is "Acme is one of the providers in this space." A recommendation is "I'd choose Acme because of X." The first is noise; the second drives behavior. Distinguishing them requires a structured second-pass LLM evaluation: extract the brands, record rank position, classify recommendation strength, and extract the stated reasons. It's not free, but it's the difference between a vanity metric and a useful one.
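A sketch of the structured record such a second pass might return, and how it separates a mention from a recommendation. The JSON shape and field names are assumptions, and the call to the evaluating model is deliberately left out; this covers only validation of its output.

```python
import json
from dataclasses import dataclass
from typing import List

# Ordered from weakest to strongest.
STRENGTHS = ("none", "weak", "explicit")


@dataclass
class BrandVerdict:
    brand: str
    rank: int            # position of the brand in the answer, 1 = first
    strength: str        # one of STRENGTHS
    reasons: List[str]   # reasons the answer gave, if any


def parse_evaluation(raw_json: str) -> List[BrandVerdict]:
    """Validate the evaluator's JSON output (hypothetical schema)."""
    verdicts = []
    for item in json.loads(raw_json)["brands"]:
        if item["strength"] not in STRENGTHS:
            raise ValueError(f"unknown strength: {item['strength']!r}")
        verdicts.append(
            BrandVerdict(
                brand=item["brand"],
                rank=int(item["rank"]),
                strength=item["strength"],
                reasons=list(item.get("reasons", [])),
            )
        )
    return verdicts


def is_recommended(verdicts, brand, min_strength="weak"):
    """True when the brand reaches at least min_strength; a bare mention does not."""
    floor = STRENGTHS.index(min_strength)
    return any(
        v.brand == brand and STRENGTHS.index(v.strength) >= floor for v in verdicts
    )


sample = json.dumps({"brands": [
    {"brand": "Acme", "rank": 1, "strength": "explicit",
     "reasons": ["built for distributed teams"]},
    {"brand": "Competitor", "rank": 2, "strength": "none", "reasons": []},
]})
verdicts = parse_evaluation(sample)
```

In this sample, `is_recommended(verdicts, "Acme")` is true while `is_recommended(verdicts, "Competitor")` is false, even though both brands are mentioned.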
Why this metric beats the alternatives
The competing candidates are mention rate, citation share, and share-of-voice (SOV) style brand share. Each is useful as a component. None of them is sufficient alone.
| Metric | What it tells you | What it misses |
|---|---|---|
| Mention rate | The model knows you exist | Whether it recommends you |
| Citation share | Your URLs are trusted enough to link | The reason given for the link |
| Brand SOV | Volume vs competitors | Quality of framing, weighted value |
| Share of Recommendation | Commercial usefulness, weighted | (it's the integration of the above) |
Share of Recommendation is the composite; the others are its components. Track all of them, but report Share of Recommendation to leadership.
What changes when you adopt this
Three things change immediately:
- Content priorities shift. If the brand has a 90% mention rate but a 25% recommendation rate, the gap isn't a content-volume problem. It's a comparison and differentiation problem. You stop writing "ultimate guide to X" articles and start writing comparison and use-case pages that give the model a clear reason to recommend you.
- The blame distribution changes. "We're invisible in ChatGPT" becomes "we're recommended on 30% of branded prompts and 4% of category prompts," which is actionable per prompt family, by team.
- The dashboard changes. Instead of a feed of screenshots, you get a number, a trend, and a drill-down to the prompts that moved it. That's a dashboard an executive will actually open.
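The per-family breakdown behind those statements can be sketched in a few lines. The row layout `(family, mentioned, recommended)` is an assumption about how evaluated answers might be stored, and the sample rows are invented.

```python
from collections import defaultdict


def rates_by_family(rows):
    """Compute mention and recommendation rates per prompt family.

    rows: iterable of (family, mentioned, recommended) tuples,
    one per evaluated answer.
    """
    counts = defaultdict(lambda: {"n": 0, "mentioned": 0, "recommended": 0})
    for family, mentioned, recommended in rows:
        bucket = counts[family]
        bucket["n"] += 1
        bucket["mentioned"] += int(mentioned)
        bucket["recommended"] += int(recommended)
    return {
        family: {
            "mention_rate": bucket["mentioned"] / bucket["n"],
            "recommendation_rate": bucket["recommended"] / bucket["n"],
        }
        for family, bucket in counts.items()
    }


# Invented sample: well known on branded prompts, rarely recommended on category ones.
rows = [
    ("branded", True, True),
    ("branded", True, False),
    ("category", True, False),
    ("category", True, False),
    ("category", False, False),
    ("category", True, True),
]
report = rates_by_family(rows)
```

A wide gap between `mention_rate` and `recommendation_rate` within one family is the signal that points at a differentiation problem instead of a content-volume problem.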
How Combot computes it
Each night, every prompt in the portfolio runs against Claude, GPT, Gemini, and Perplexity. A second-pass evaluation classifies each answer: brand mention rank, recommendation strength (none / weak / explicit), reasons given, citation presence. Results land in BigQuery. The weighted recommendation rate per prompt is aggregated to a single per-day number per brand. A 30-day rolling average smooths the model-drift noise. Alerts fire when the number moves more than 2 standard deviations from the trend.
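The aggregation, smoothing, and alerting steps might look roughly like this. It is a standard-library sketch, not the production pipeline: the function names are hypothetical, and reading "the trend" as the trailing mean of the last 30 daily values, with the standard deviation taken over the same window, is an assumption.

```python
from statistics import mean, stdev


def daily_share(rows):
    """Weighted share for one day; rows are (weight, recommended) pairs."""
    total = sum(weight for weight, _ in rows)
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return sum(weight for weight, recommended in rows if recommended) / total


def rolling_mean(series, window=30):
    """Trailing mean; early points use however many days exist so far."""
    return [
        mean(series[max(0, i - window + 1): i + 1]) for i in range(len(series))
    ]


def should_alert(history, today, window=30, threshold=2.0):
    """Flag today's value when it sits more than threshold sigmas from the trailing mean."""
    recent = history[-window:]
    if len(recent) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(recent), stdev(recent)
    if sigma == 0:
        return today != mu
    return abs(today - mu) > threshold * sigma


history = [0.30, 0.31, 0.29, 0.30, 0.32, 0.28]
```

With this sample history, a drop to 0.20 triggers an alert while 0.31 does not. A longer window makes the alert less sensitive to day-to-day model drift at the cost of reacting more slowly to a real regression.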
That number is what we put on the homepage of the dashboard. Everything else is drill-down. Mentions, citations, sentiment, fact accuracy — they explain why the share moved. They don't replace it.
Further reading: AI Knowledge Modes · The 7 layers of AI visibility · Source mapping · Fact-checking the machines
