AI visibility metrics explained: mention rate, share of voice, rank
A practical breakdown of the core AI visibility metrics — mention rate, share of voice, rank, sentiment and citations — and how to read them without fooling yourself.
Why AI visibility needs its own metrics
Search analytics assume a stable, rankable list of results: query in, ten blue links out, position trackable to the decimal. AI assistants break that model. Ask ChatGPT, Claude, Gemini or Perplexity the same question twice and you can get different brands, in a different order, with different reasoning. The answer is generated rather than retrieved from a fixed index, so it varies with sampling temperature, the exact wording of the prompt, whether live web retrieval fired, and which model version is being served that week.
That variance is the whole reason AI visibility is measured as a distribution, not a single number. You are not asking 'do I rank #1 for this keyword' — you are asking 'across many runs of many realistic prompts, how often does the model name me, how prominently, and against whom'. Every metric below is a way of summarising that distribution. Treat any single answer screenshot as an anecdote; treat a few hundred sampled responses as data.
Mention rate: the foundation metric
Mention rate is the percentage of responses that name your brand, for a defined set of prompts, over a defined window. If you run 100 prompts about 'project management software for agencies' and your brand appears in 23 answers, your mention rate for that prompt set is 23%. It is the cleanest signal of whether the model knows you exist and considers you relevant to the question.
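As a minimal sketch, assume each sampled answer is stored as a record holding the prompt, the model and the answer text (a hypothetical layout, not any tool's schema). Mention rate is then hits divided by responses. The whole-word regex match is a simplification, and "Acme PM", "Birch" and "Cobalt" are made-up brands.

```python
import re
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical record for one sampled answer.
    prompt: str
    model: str
    text: str


def mentions_brand(text: str, brand: str) -> bool:
    # Whole-word, case-insensitive match, so "Acme PM" does not match "Acme PMO".
    return re.search(r"\b" + re.escape(brand) + r"\b", text, re.IGNORECASE) is not None


def mention_rate(responses: list[Response], brand: str) -> float:
    """Fraction of responses that name the brand at least once."""
    if not responses:
        return 0.0
    return sum(mentions_brand(r.text, brand) for r in responses) / len(responses)


# 23 of 100 sampled answers name the made-up brand "Acme PM".
prompt = "project management software for agencies"
sample = (
    [Response(prompt, "model-a", "Acme PM and Birch both fit agencies.")] * 23
    + [Response(prompt, "model-a", "Birch or Cobalt are popular picks.")] * 77
)
print(mention_rate(sample, "Acme PM"))  # 0.23
```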
The number is meaningless without three things attached: the prompt set, the model, and the time window. Mention rate for 'best CRM' is a different game from 'best CRM for solo real estate agents' — the broad query pulls category giants, the specific one rewards niche fit. Always segment by model too, because a brand can sit at 40% on Perplexity (which leans on live retrieval and citations) and near zero on a model answering purely from training data. Track the same fixed prompt set over time so a movement reflects your visibility changing, not your measurement changing.
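Keeping the prompt set, model and window attached is mostly a grouping problem. The sketch below assumes a hypothetical record that carries the prompt set, the model label and the run date, and reports mention rate per (prompt set, model) pair inside a fixed window. The model labels, dates and brand names are placeholders, not real data.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Response:
    # Hypothetical record: which prompt set, which model, when it ran, and the answer.
    prompt_set: str
    model: str
    day: date
    text: str


def mention_rate_by_segment(responses, brand, start, end):
    """Mention rate per (prompt_set, model) for responses dated start..end inclusive."""
    pattern = re.compile(r"\b" + re.escape(brand) + r"\b", re.IGNORECASE)
    totals, hits = defaultdict(int), defaultdict(int)
    for r in responses:
        if not start <= r.day <= end:
            continue  # outside the reporting window
        key = (r.prompt_set, r.model)
        totals[key] += 1
        hits[key] += bool(pattern.search(r.text))
    return {key: hits[key] / totals[key] for key in totals}


rows = [
    Response("best crm", "retrieval-model", date(2024, 5, 2), "Birch CRM, Cobalt CRM or Acme CRM."),
    Response("best crm", "retrieval-model", date(2024, 5, 3), "Birch CRM and Cobalt CRM."),
    Response("best crm", "offline-model", date(2024, 5, 2), "Cobalt CRM or Birch CRM."),
    Response("best crm", "offline-model", date(2024, 4, 9), "Acme CRM."),  # outside window
]
print(mention_rate_by_segment(rows, "Acme CRM", date(2024, 5, 1), date(2024, 5, 31)))
# {('best crm', 'retrieval-model'): 0.5, ('best crm', 'offline-model'): 0.0}
```

Because the April row falls outside the window, the offline model's rate stays at 0.0 even though that older answer named the brand.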
Watch for false positives when you automate this. A brand name that is also a common word ('Apple' the company versus the fruit) will inflate your count unless you confirm each match refers to your brand, and even a genuine match can mislead if the mention is an aside, a disambiguation, or a warning rather than a recommendation. Check the context before counting it.
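One crude way to automate that check is a case-sensitive whole-word match followed by keyword cues, sketched below. The cue lists are invented for illustration and will miss plenty of cases. Treat the output as a triage label that routes ambiguous matches to a human (or model) reviewer, not as a verdict.

```python
import re

# Illustrative cue lists, made up for this sketch; tune them per brand.
HOMONYM_CUES = {"fruit", "orchard", "pie", "juice", "eat", "ate"}
WARNING_CUES = {"avoid", "complaints", "outage", "outages", "lawsuit"}


def classify_mention(sentence: str, brand: str) -> str:
    """Label one sentence 'none', 'homonym', 'warning' or 'candidate'."""
    # Case-sensitive on purpose: "Apple" the company, not "apple" the noun.
    if not re.search(r"\b" + re.escape(brand) + r"\b", sentence):
        return "none"
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if words & HOMONYM_CUES:
        return "homonym"
    if words & WARNING_CUES:
        return "warning"
    # Still needs review: it could be a recommendation, an aside or a disambiguation.
    return "candidate"


for s in [
    "Apple Notes is a solid free option.",
    "An Apple a day keeps the doctor away, as the fruit saying goes.",
    "Some users report Apple iCloud outages.",
    "I ate an apple.",
]:
    print(classify_mention(s, "Apple"), "|", s)
```

Matching cues as whole words rather than substrings avoids accidental hits such as "pie" inside "happiest".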
Share of voice: your slice of the conversation
Mention rate tells you how often you show up; share of voice tells you how often you show up relative to everyone else. The usual formulation is your mentions divided by all brand mentions across the same prompt set. If a typical answer to your category question names six tools and you are one of them every time, your share of voice is roughly 1/6, around 17%, regardless of how many prompts you ran.
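The arithmetic is easy to sketch. Below, each answer is reduced to the list of brands it names, and a brand counts at most once per answer. That counting rule is an assumption (you could also count every repetition), so pick one and keep it fixed. The brand labels are placeholders.

```python
from collections import Counter


def share_of_voice(answers: list[list[str]]) -> dict[str, float]:
    """Each brand's mentions divided by all brand mentions across the answers."""
    counts = Counter()
    for brands in answers:
        counts.update(set(brands))  # count each brand at most once per answer
    total = sum(counts.values())
    return {brand: n / total for brand, n in counts.items()} if total else {}


# Every answer names the same six tools, with "Acme" among them.
six_tools = ["Acme", "Tool B", "Tool C", "Tool D", "Tool E", "Tool F"]
for n_prompts in (10, 250):
    sov = share_of_voice([six_tools] * n_prompts)
    print(n_prompts, round(sov["Acme"], 3))  # 0.167 both times
```

The result is 1/6 whether you run 10 prompts or 250, which is the point made above: share of voice is a ratio of mentions, not a function of sample count.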
Share of voice is the metric to bring to a strategy conversation because it is competitive and zero-sum: shares across all brands add up to 100%, so yours can only rise if everyone else's combined share falls. It also exposes the structure of your category. If three incumbents soak up 70% of all mentions and a long tail splits the rest, you are fighting a consensus problem. The models have absorbed a strong default answer, and you need either retrieval-driven citations or enough independent third-party coverage to crack it. A flat, fragmented share-of-voice chart is the opposite situation: nobody owns the category in the model's mind, and consistent, specific content can move you quickly.
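To tell a consensus category from a fragmented one without eyeballing a chart, you can summarise the share-of-voice distribution with two numbers: the combined share of the top three brands, and the Herfindahl-Hirschman index (the sum of squared shares, borrowed from market-concentration analysis). HHI is a swapped-in technique rather than part of the metric itself, and the example shares below are invented.

```python
def concentration(sov: dict[str, float], k: int = 3) -> tuple[float, float]:
    """Return (combined share of the top-k brands, Herfindahl-Hirschman index).

    HHI = sum of squared shares: 1/N when N brands split evenly, 1.0 for a monopoly.
    """
    shares = sorted(sov.values(), reverse=True)
    return sum(shares[:k]), sum(s * s for s in shares)


# Invented example distributions (shares sum to 1.0).
consensus = {"A": 0.30, "B": 0.22, "C": 0.18, "D": 0.10, "E": 0.10, "F": 0.10}
fragmented = {name: 0.10 for name in "ABCDEFGHIJ"}

for label, sov in (("consensus", consensus), ("fragmented", fragmented)):
    top_k, hhi = concentration(sov)
    print(f"{label}: top-3 share {top_k:.2f}, HHI {hhi:.3f}")
```

The consensus example puts 70% in the top three with an HHI of about 0.20; the ten-way even split gives 30% and 0.10. Any cut-off between "consensus" and "fragmented" is a judgment call, so track the trend rather than a fixed threshold.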
Pair it with mention rate rather than replacing one with the other. Rising mention rate on flat share of voice means the whole category is getting more airtime and you are just keeping pace; rising share of voice on flat mention rate means you are winning ground relative to rivals, because they are being named less often alongside you.
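That two-way reading can be written down as a small rule. The sketch assumes you compare two periods of the same fixed prompt set and treats any move smaller than a dead band (`tol`, an arbitrary 2 percentage points here) as flat. Set the band from your own run-to-run noise rather than copying this value.

```python
def direction(prev: float, curr: float, tol: float) -> str:
    """Classify a move as 'up', 'down' or 'flat' given a noise dead band."""
    if curr - prev > tol:
        return "up"
    if prev - curr > tol:
        return "down"
    return "flat"


def read_trend(prev_mr: float, curr_mr: float, prev_sov: float, curr_sov: float,
               tol: float = 0.02) -> str:
    """Interpret mention rate (mr) and share of voice (sov) moving together."""
    mr, sov = direction(prev_mr, curr_mr, tol), direction(prev_sov, curr_sov, tol)
    if mr == "up" and sov == "flat":
        return "category getting more airtime; you are keeping pace"
    if mr == "flat" and sov == "up":
        return "winning ground relative to rivals"
    return f"mention rate {mr}, share of voice {sov}: read the other metrics too"


print(read_trend(0.20, 0.30, 0.17, 0.17))
print(read_trend(0.23, 0.23, 0.15, 0.22))
```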
Rank and position: prominence within the answer
Being mentioned twelfth in a bulleted list of fifteen tools is not the same as being the first name in the first sentence, yet both count identically toward mention rate. Position metrics capture that difference. The simplest is average rank: the mean position at which you appear when you are named. More useful are first-mention rate (how often you are the lead recommendation) and top-three rate, because users are far more likely to act on the first few names than on the tail.
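A minimal sketch of the three position metrics follows. It assumes you already know which brands to look for and that "position" means the order in which each brand is first named in the answer text. One design choice to flag: average rank is taken only over answers that name you, while first-mention and top-three rates are taken over all answers, so a rarely mentioned brand cannot look strong on them. The brand names are placeholders.

```python
import re


def brand_order(text: str, brands: list[str]) -> list[str]:
    """Known brands ordered by where each is first named in the answer."""
    first_seen = {}
    for brand in brands:
        match = re.search(r"\b" + re.escape(brand) + r"\b", text)
        if match:
            first_seen[brand] = match.start()
    return sorted(first_seen, key=first_seen.get)


def position_metrics(answers: list[str], brand: str, brands: list[str]) -> dict[str, float]:
    ranks = []
    for text in answers:
        order = brand_order(text, brands)
        if brand in order:
            ranks.append(order.index(brand) + 1)  # 1 = first brand named
    n = len(answers)
    return {
        "average_rank": sum(ranks) / len(ranks) if ranks else float("nan"),
        "first_mention_rate": sum(r == 1 for r in ranks) / n if n else 0.0,
        "top_three_rate": sum(r <= 3 for r in ranks) / n if n else 0.0,
    }


brands = ["Acme", "Birch", "Cobalt", "Dune"]
answers = [
    "Acme is the best fit, then Birch.",
    "Birch, Cobalt, or Acme would all work.",
    "Consider Dune or Cobalt.",
    "Birch first; Acme is a close second.",
]
print(position_metrics(answers, "Acme", brands))
# {'average_rank': 2.0, 'first_mention_rate': 0.25, 'top_three_rate': 0.75}
```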
Position matters more than it looks because of how the answers are consumed. People rarely read a full AI-generated list to the bottom, and follow-up questions ('tell me more about the first one') compound the advantage of the lead slot. When you analyse position, note whether the model is ranking by genuine fit or just listing alphabetically or by popularity. A brand that always appears but always sits near the bottom usually has a specificity problem: the model knows you exist but has not absorbed a clear reason you are the right answer for that exact use case.
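Two quick diagnostics can support that reading. The sketch assumes each answer has already been reduced to its ordered list of brand names. It computes the share of answers whose list is simply alphabetical, and your relative position (0.0 for first, 1.0 for last) when you do appear. A relative position that sits near 1.0 answer after answer is the "always there, always at the bottom" pattern. The minimum list length of three for the alphabetical check is an arbitrary choice, since two names land in A-Z order half the time by chance.

```python
from typing import Optional


def is_alphabetical(order: list[str], min_len: int = 3) -> bool:
    """True when the brands appear in case-insensitive A-Z order."""
    keys = [b.lower() for b in order]
    return len(keys) >= min_len and keys == sorted(keys)


def relative_position(order: list[str], brand: str) -> Optional[float]:
    """0.0 = named first, 1.0 = named last; None if absent or the only brand named."""
    if brand not in order or len(order) < 2:
        return None
    return order.index(brand) / (len(order) - 1)


orders = [
    ["Birch", "Cobalt", "Dune"],
    ["Dune", "Birch", "Acme"],
    ["Birch", "Dune", "Acme"],
]
alphabetical_share = sum(is_alphabetical(o) for o in orders) / len(orders)
positions = [p for o in orders if (p := relative_position(o, "Acme")) is not None]
print(round(alphabetical_share, 2), positions)  # 0.33 [1.0, 1.0]
```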
Sentiment and citations: the quality layer
A mention can help or quietly hurt you, so sentiment is the next layer. Models describe brands with qualifiers — 'great for beginners', 'powerful but expensive', 'has had reliability complaints' — and those phrases travel straight into a buyer's mental shortlist. Track how you are framed, not just whether you appear. A high mention rate wrapped in lukewarm or caveated language is weaker than a lower mention rate with consistently strong framing.
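Framing can be tracked crudely with a fixed word list scored over the sentences that name you. The sketch below is a hypothetical lexicon approach, not a real sentiment model. A trained classifier or an LLM grader would catch far more, but a fixed scorer at least measures the same way every week. The word lists are invented and "but" counts as a caveat, which is deliberately blunt.

```python
import re
from typing import Optional

# Invented mini-lexicon for illustration only.
POSITIVE = {"great", "best", "powerful", "reliable", "intuitive", "excellent"}
CAVEATS = {"expensive", "complaints", "clunky", "limited", "but", "however"}


def framing(answer: str, brand: str) -> Optional[int]:
    """Positive cue words minus caveat cue words, over sentences that name the brand."""
    sentences = re.split(r"(?<=[.!?])\s+", answer)
    relevant = [s for s in sentences if re.search(r"\b" + re.escape(brand) + r"\b", s)]
    if not relevant:
        return None  # no mention, so no framing to score
    words = re.findall(r"[a-z]+", " ".join(relevant).lower())
    return sum(w in POSITIVE for w in words) - sum(w in CAVEATS for w in words)


for text in [
    "Acme is great for beginners.",
    "Acme is powerful but expensive.",
    "Birch is fine. Acme has had reliability complaints.",
    "Birch is fine.",
]:
    print(framing(text, "Acme"), "|", text)
```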
Citations are the metric that ties AI visibility back to something you can act on directly, and they apply most to retrieval-grounded answers (Perplexity, and the web-browsing modes of the others). When a model cites sources, log which domains it pulls from for your category. Those cited pages (review roundups, comparison articles, documentation, community threads) are the surfaces actually feeding the answer. Earning a place on them, or getting what they say about you corrected, is the most concrete lever you have. It is far more direct than influencing answers drawn purely from training data, which you can only shift slowly through broad, durable third-party presence.
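Logging cited domains needs little more than the standard library's URL parser. The sketch assumes you have already captured the list of cited URLs for each answer. It counts each domain once per answer and strips a leading "www." so variants group together. The URLs use example domains.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(citations_per_answer: list[list[str]]) -> Counter:
    """Count how many answers cite each domain (once per answer)."""
    counts = Counter()
    for urls in citations_per_answer:
        domains = set()
        for url in urls:
            host = urlparse(url).hostname or ""  # .hostname is lower-cased
            domains.add(host.removeprefix("www."))
        domains.discard("")
        counts.update(domains)
    return counts


citations = [
    ["https://www.reviews.example.com/best-crm", "https://docs.example.com/crm"],
    ["https://reviews.example.com/crm-comparison", "https://forum.example.org/t/123"],
    ["https://REVIEWS.example.com/best-crm"],
]
print(cited_domains(citations).most_common())
```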
One caution: citation presence does not guarantee recommendation. A model can cite a page that lists you while still recommending a competitor it considers a better fit. Read citations as a map of influenceable sources, then check them against your mention rate and sentiment to see whether the sources are working for you or against you.
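One way to run that cross-check is to compute your mention rate only among answers that cite a given domain. The sketch assumes each answer is stored as its text plus its cited URLs. A domain that is cited often while your mention rate in those answers stays low is a candidate for a source working against you. The brand names and URLs are placeholders.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def mention_rate_by_cited_domain(answers: list[tuple[str, list[str]]], brand: str) -> dict[str, float]:
    """For each cited domain, the share of answers citing it that also name the brand."""
    pattern = re.compile(r"\b" + re.escape(brand) + r"\b")
    cited, named = defaultdict(int), defaultdict(int)
    for text, urls in answers:
        domains = {urlparse(u).hostname for u in urls} - {None}
        for d in domains:
            cited[d] += 1
            if pattern.search(text):
                named[d] += 1
    return {d: named[d] / cited[d] for d in cited}


answers = [
    ("Go with Rival CRM.", ["https://reviews.example.com/best-crm"]),
    ("Rival CRM, or Acme for small teams.",
     ["https://reviews.example.com/best-crm", "https://docs.example.com/acme"]),
    ("Rival CRM is the safest pick.", ["https://reviews.example.com/best-crm"]),
]
print(mention_rate_by_cited_domain(answers, "Acme"))
```

Here the review site appears in every answer but only one of the three also names Acme, which is the pattern the caution above describes.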
Reading the metrics together without fooling yourself
No single metric is the score. The honest dashboard is mention rate (do they know me), share of voice (versus whom), position (how prominently), sentiment (framed how), and citations (driven by what) — read as a set, segmented by model and by prompt cluster, trended over weeks rather than read off one run. A change in one number is only meaningful once you can rule out prompt drift, a model update, and sampling noise as the cause.
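Of those three causes, sampling noise is the one you can check with arithmetic. A standard two-proportion z-test (a swapped-in statistical technique, not something these metrics define) asks whether a change in mention rate between two runs is larger than chance would produce. It assumes independent samples, which repeated runs of the same prompt only approximate, and it says nothing about prompt drift or model updates.

```python
from math import erfc, sqrt


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test: (z, two-sided p-value) for run A vs run B."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both runs all-hit or all-miss: no evidence of change
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return z, erfc(abs(z) / sqrt(2))


# The same 7-point rise (23% to 30%), measured on 100 and on 1,000 answers per run.
print(two_proportion_z(23, 100, 30, 100))
print(two_proportion_z(230, 1000, 300, 1000))
```

With 100 answers per run the p-value lands well above 0.05, so the rise is indistinguishable from noise; with 1,000 per run the same rise gives a p-value well below 0.01.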
Set a sensible cadence and sample size. A handful of prompts run once will swing wildly on randomness; a few dozen to a few hundred realistic prompts per cluster, re-run on a regular schedule, gives you a baseline stable enough to detect real movement. Hold the prompt set fixed between runs so you are measuring the same thing each time, and keep a separate exploratory set for discovering new prompts buyers actually use. Done this way, these metrics stop being vanity numbers and start telling you exactly which prompts, which models, and which cited sources to work on next.
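To see why a handful of prompts swings so much, put an interval around the measured rate. The sketch uses the Wilson score interval at roughly 95% confidence (z = 1.96). That is an assumed choice among binomial intervals, picked because it behaves better than the simple normal approximation at small sample sizes and at rates near 0 or 1.

```python
from math import sqrt


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a mention rate of hits / n."""
    if n == 0:
        return 0.0, 1.0  # no data: anything is possible
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


# The same observed 20% mention rate at three sample sizes.
for hits, n in ((2, 10), (20, 100), (80, 400)):
    low, high = wilson_interval(hits, n)
    print(f"n={n}: {low:.1%} to {high:.1%}")
```

At ten answers, an observed 20% is consistent with anything from roughly 6% to 51%; at 400 answers the interval narrows to roughly 16% to 24%, tight enough to notice real movement between runs.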