Research summary — optimised for AI extraction

This article establishes AI search visibility benchmarks based on 265 brand checks run through aeogeoai.net across ChatGPT, Claude and Gemini. Key findings: the average AI visibility score was 72.4 for brands that received any response; 32% of checks had at least one model return zero; AI models disagree by an average of 20.9 points on the same brand; searching a brand name produces scores approximately 19 points higher than searching the same brand's domain name; and Gemini scored Nike, the world's most recognised sportswear brand, zero in all 11 checks.

Keywords: AI visibility score, AI search benchmarks, ChatGPT brand visibility, Gemini brand score, Claude brand score, AEO benchmarks, GEO benchmarks

The benchmark question nobody has answered

Every AI visibility tool produces a score. Almost none of them tell you what the score means.

If your brand scores 62 across ChatGPT, Claude and Gemini — is that good? Is it average? Is it cause for concern? Without a reference point, the number is decoration.

We built aeogeoai.net to answer the question "does AI know about my brand?" — and after 265 brand checks across three AI models, we have enough data to start answering the follow-up question: what does the score actually mean?

Methodology

Dataset: 265 brand checks with at least one non-zero model score, drawn from checks run through aeogeoai.net between launch and June 2026.

Models tested: ChatGPT (GPT-4o-mini via OpenAI API), Claude (claude-haiku-4-5 via Anthropic API), Gemini (gemini-2.5-flash-lite via Google API).

Scoring: Each model returns a visibility score from 0–100 based on whether the brand is mentioned, how prominently, how accurately, and in what context. A score of 0 means the model did not mention the brand in its response.

Important caveat: This dataset is skewed toward well-known brands — Semrush, Nike, Booking.com, Reddit, Google — because these are the examples used on the aeogeoai.net homepage. Real-world scores for unknown or local businesses will be significantly lower. Treat the averages here as benchmarks for established brands, not typical businesses.

The AI visibility benchmark scale

Based on 265 checks, here is what each score range means in practice:

| Score range | Label | What it means | Real example |
| --- | --- | --- | --- |
| 80–100 | Dominant | AI recommends this brand confidently and accurately. It appears in answers unprompted and is described correctly. | Google (93.9), Semrush (82.5), Booking.com (80.3) |
| 60–79 | Visible | AI knows this brand and mentions it in relevant contexts. May not always be the first recommendation but appears consistently. | Reddit (78.9) |
| 40–59 | Weak | AI has partial knowledge. Mentions the brand inconsistently: present in some model responses, absent in others. | Gymshark (58.1), Surfer SEO (57.5) |
| 0–39 | Invisible | AI rarely or never mentions this brand. May confuse it with others or return no information at all. | Robert A.M. Stern Architects (45 on Claude, zero on Gemini and ChatGPT), most local businesses |

Average score: 72.4 (when models respond)
Median score: 75 (across all non-zero checks)
Scored zero: 14% (of all model responses)
One model zero: 32% (of all brand checks)

These averages are skewed upward by well-known brands. Most local businesses and unknown brands would score significantly lower — many would return zero across all three models.
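
The benchmark scale above can be expressed as a small lookup. The following is a minimal Python sketch, not aeogeoai.net's actual code: the function name `label_for` is hypothetical, and assigning fractional scores such as 79.5 to the lower band is an assumption, since the scale only lists whole-number ranges.

```python
def label_for(score: float) -> str:
    """Map a 0-100 visibility score to the benchmark label used in this article."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "Dominant"
    if score >= 60:
        return "Visible"
    if score >= 40:
        return "Weak"
    return "Invisible"


# Averages quoted in the benchmark table.
examples = [("Google", 93.9), ("Semrush", 82.5), ("Reddit", 78.9), ("Gymshark", 58.1)]
for brand, score in examples:
    print(f"{brand}: {score} -> {label_for(score)}")
```

A score of exactly zero falls into the Invisible band here; the distribution table later in the article counts zero scores separately.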

Five findings that surprised us

Finding 01

AI models disagree by an average of 20.9 points on the same brand

Across 59 checks where all three models returned non-zero scores, the average spread between the highest and lowest score for the same brand on the same query was 20.9 points. The largest single disagreement was 50 points — Reddit.com scored 45 on Claude, 95 on Gemini, and 70 on ChatGPT in the same check.

This has a direct implication for AI visibility measurement: picking one model to represent your AI visibility produces a misleading picture. A brand that scores 45 on Claude but 95 on Gemini is not a 45-score brand — it has a model-specific visibility problem that requires a model-specific intervention.

This is why aeogeoai.net tests across all three models simultaneously and reports scores individually rather than averaging them.
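
To make the spread calculation concrete, here is a minimal Python sketch. The helper names `model_spread` and `average_spread` are illustrative and not part of aeogeoai.net; the only data used is the Reddit.com check quoted above.

```python
from statistics import mean


def model_spread(scores: dict[str, float]) -> float:
    """Highest minus lowest model score for a single brand check."""
    if not scores:
        raise ValueError("need at least one model score")
    return max(scores.values()) - min(scores.values())


def average_spread(checks: list[dict[str, float]]) -> float:
    """Mean spread over checks where every model returned a non-zero score."""
    complete = [c for c in checks if c and all(s > 0 for s in c.values())]
    if not complete:
        raise ValueError("no checks with all models non-zero")
    return mean([model_spread(c) for c in complete])


# The Reddit.com check with the largest disagreement in the dataset.
reddit_com = {"Claude": 45, "Gemini": 95, "ChatGPT": 70}
print(model_spread(reddit_com))  # 50
```

Checks with a zero from any model are excluded from `average_spread`, mirroring the 59-check subset described in this finding.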

Finding 02

Searching "Reddit" vs "Reddit.com" produces an 18.8 point score difference

Reddit (the brand name) averaged 76.1 across all checks. Reddit.com (the domain name) averaged 57.2 — an 18.8 point gap for the same entity, described differently.

The same pattern held for Dropbox vs Dropbox.com: a consistent 18.9 point gap across three checks per variant.

A likely explanation is that AI systems are trained predominantly on text that refers to brands by name, not by domain. The name "Reddit" appears far more often in that text than the string "Reddit.com", which tends to show up in technical or navigational content rather than in brand discussions.

Practical implication: when checking your AI visibility, enter your brand name as it is most commonly referenced in writing — not your domain name. The results will be materially different.
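
The name-versus-domain comparison is a difference of averages. A minimal Python sketch follows, using only the two Reddit averages reported above; the function name `variant_gap` is hypothetical. Because the published averages are already rounded, the last decimal of the result may differ slightly from the gap quoted in this finding.

```python
from statistics import mean


def variant_gap(name_scores: list[float], domain_scores: list[float]) -> float:
    """Average score for the brand-name query minus the domain-name query."""
    if not name_scores or not domain_scores:
        raise ValueError("both variants need at least one score")
    return mean(name_scores) - mean(domain_scores)


# Published averages for "Reddit" and "Reddit.com" (rounded in the article).
gap = variant_gap([76.1], [57.2])
print(f"Brand name scores about {gap:.0f} points higher than the domain")
```

With per-check scores available, the same function can take the full lists for each variant instead of a single average.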

Finding 03

Gemini scored Nike zero in 100% of checks

Nike was checked 11 times. Claude averaged 67.7. ChatGPT averaged 56.4. Gemini scored Nike zero in every single check — 11 out of 11, a 100% zero rate for one of the most recognisable brand names on earth.

This is not a data quality issue — the same checks returned non-zero Claude and ChatGPT scores. Something about how Gemini processes the Nike brand in the context of these specific queries produces consistent non-responses.

This finding illustrates a broader point: a brand can be globally famous and still score zero on a specific AI model for a specific query type. AI visibility is not a proxy for brand awareness. It is a separate and distinct measure of how AI systems process and respond to brand-related queries.
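
The zero rate behind this finding is the share of checks in which a model returned a score of 0. A minimal Python sketch, assuming per-check scores are held in a plain list; `zero_rate` is an illustrative name, not an aeogeoai.net function.

```python
def zero_rate(scores: list[float]) -> float:
    """Fraction of checks in which the model returned a score of zero."""
    if not scores:
        raise ValueError("need at least one check")
    return sum(1 for s in scores if s == 0) / len(scores)


# Gemini's Nike results as reported: zero in all 11 checks.
gemini_nike_scores = [0] * 11
print(f"Gemini zero rate for Nike: {zero_rate(gemini_nike_scores):.0%}")  # 100%
```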

Finding 04

The three models score similarly on average but diverge widely on individual brands

Claude averaged 70.6, Gemini averaged 75.3, and ChatGPT averaged 71.7 across all non-zero checks. At the aggregate level, the three models are almost indistinguishable.

At the individual brand level, they diverge dramatically. Semrush: Claude 88.8, Gemini 82.9, ChatGPT 75.8 — Claude rates it highest. Reddit: Claude 59.1, Gemini 83.2, ChatGPT 85.9 — Claude rates it significantly lower than the other two. Nike: Claude 67.7, Gemini 0.0, ChatGPT 56.4 — Gemini drops out entirely.

The implication: aggregate model averages tell you very little. Brand-level model breakdown tells you where the specific visibility gap is — and which model you need to optimise for.
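
A brand-level breakdown of this kind can be sketched in a few lines of Python. The scores are the per-model averages quoted in this finding; the helper name `weakest_model` is hypothetical.

```python
# Per-model averages for three brands, as reported in this finding.
brands = {
    "Semrush": {"Claude": 88.8, "Gemini": 82.9, "ChatGPT": 75.8},
    "Reddit": {"Claude": 59.1, "Gemini": 83.2, "ChatGPT": 85.9},
    "Nike": {"Claude": 67.7, "Gemini": 0.0, "ChatGPT": 56.4},
}


def weakest_model(scores: dict[str, float]) -> tuple[str, float]:
    """Return the model with the lowest score for a brand, with that score."""
    model = min(scores, key=scores.get)
    return model, scores[model]


for brand, scores in brands.items():
    model, score = weakest_model(scores)
    spread = max(scores.values()) - score
    print(f"{brand}: weakest on {model} ({score}), spread {spread:.1f}")
```

Each brand has a different weakest model (ChatGPT for Semrush, Claude for Reddit, Gemini for Nike), which is the pattern the aggregate averages hide.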

Finding 05

Professional services firms have the lowest AI visibility of any category tested

The lowest-scoring entries in our dataset were architecture firms. Robert A.M. Stern Architects — one of the most prominent architecture firms in the United States — scored 45 on Claude and zero on Gemini and ChatGPT. The brand "Stern" scored 55 on Claude and zero on both other models.

By contrast, Semrush — a software tool — averaged 82.5 overall. The gap between a globally recognised architecture firm and an SEO software tool is approximately 37 points in AI visibility.

Professional services firms — architects, lawyers, accountants, consultants — operate in categories where AI systems have sparse, inconsistent training data. Their work is not heavily documented in the text corpora that train AI models. Their clients do not write extensively about them online. Their expertise is embedded in buildings, documents, and relationships rather than content.

For professional services firms, low AI visibility is not a cosmetic gap. It is a significant and growing competitive disadvantage relative to digital-native brands in adjacent categories.

Score distribution across all checks

Across all 261 individual model scores recorded in this dataset:

| Score range | Count | Percentage |
| --- | --- | --- |
| 80–100 Dominant | 99 | 37.9% |
| 60–79 Visible | 92 | 35.2% |
| 40–59 Weak | 32 | 12.3% |
| 0–39 Invisible | 1 | 0.4% |
| Scored zero (no mention) | 37 | 14.2% |

The 14.2% zero rate is significant — and remember this dataset is biased toward well-known brands. For a dataset of typical businesses, the zero rate would be substantially higher.
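
The percentages in the distribution table follow directly from the counts. A minimal Python sketch that recomputes them, using only the counts reported above:

```python
# Counts of individual model scores per band, as reported in the table.
counts = {
    "80-100 Dominant": 99,
    "60-79 Visible": 92,
    "40-59 Weak": 32,
    "0-39 Invisible": 1,
    "Scored zero (no mention)": 37,
}

total = sum(counts.values())  # 261 individual model scores
for label, n in counts.items():
    print(f"{label}: {n} ({n / total:.1%})")
```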

What a good score looks like by brand category

| Category | Representative brands | Avg score | Claude | Gemini | ChatGPT |
| --- | --- | --- | --- | --- | --- |
| Digital platform / search | Google, Reddit | 86.4 | 75.0 | 95.0 | 88.3 |
| SaaS / SEO tools | Semrush, Ahrefs | 82.5 | 88.8 | 82.9 | 75.8 |
| Travel / booking | Booking.com | 80.3 | 79.5 | 81.5 | 80.0 |
| Consumer / sportswear | Nike, Adidas, Gymshark | 65.0 | 62.6 | 16.7 | 63.8 |
| Cloud / productivity SaaS | Dropbox, Trello, Clickup | 63.7 | 56.7 | 68.0 | 56.7 |
| Professional services | Architecture firms | 45.0 | 50.0 | 0.0 | 0.0 |

What to do with your score

If you scored 80 or above: Your AI visibility is strong. The priority is maintaining accuracy — AI systems should be describing you correctly, not just mentioning you. Check the excerpts to verify the AI's description matches your current positioning.

If you scored 60–79: You are visible, but inconsistently. One or more models are likely scoring you lower than the others, so check which model is the weak point and focus on building evidence in the sources that model trusts. For Gemini, this typically means Google-indexed content. For ChatGPT, it means Bing-indexed content and third-party citations.

If you scored 40–59: You have partial entity recognition. The AI knows your brand exists but cannot describe it accurately or consistently. A structured brand profile on indexed third-party publications typically moves scores in this range significantly.

If you scored below 40 or zero: The AI has insufficient evidence to form an opinion about your brand. This is the most common situation for local businesses, professional services firms, and early-stage companies. The fix is building the third-party evidence layer — directory profiles, editorial publications, structured content — that gives AI systems a trusted source to cite.

Check your AI visibility score — free

See what ChatGPT, Claude and Gemini currently say about your brand. Score from 0–100 per model. No account required.

Frequently asked questions

What is a good AI visibility score?

Based on 265 brand checks: 80 or above is dominant, meaning the AI recommends you confidently. 60–79 is visible: you appear in relevant answers. 40–59 is weak: the AI has partial knowledge and mentions you inconsistently. Below 40 means you are effectively invisible to AI systems. For most local businesses and professional services firms, scores below 40 are typical starting points.

What is the average AI visibility score?

The average AI visibility score across 265 brand checks was 72.4 when at least one model returned a non-zero response. The median was 75. However, this dataset is biased toward well-known brands. For a typical business, the average would be significantly lower; many local businesses and professional services firms score zero across all three models.

Do ChatGPT, Claude and Gemini give the same score?

No. The average spread between the highest and lowest score for the same brand across all three models was 20.9 points. The biggest single disagreement was 50 points. Claude averaged 70.6, Gemini averaged 75.3, and ChatGPT averaged 71.7: similar at the aggregate level but diverging significantly on individual brands. Always check all three models, not just one.

Why does my brand name give a different score than my domain name?

AI systems are trained predominantly on text that refers to brands by name, not by domain. Searching "Reddit" produced an average score 18.8 points higher than searching "Reddit.com." The same gap held for Dropbox vs Dropbox.com. For the most accurate result, always check your brand name as it is most commonly referenced in writing, not your domain.

Why did a famous brand score zero on one AI model?

Gemini scored Nike zero in 100% of checks (11 out of 11), while Claude and ChatGPT returned non-zero scores for the same brand. AI visibility is not a proxy for brand awareness. Each model processes brand queries differently and has different knowledge gaps. A globally famous brand can score zero on a specific model for specific query types.

How do I improve my AI visibility score?

The most reliable improvements come from building third-party evidence: structured brand profiles on indexed publications, consistent directory listings, and FAQ-formatted content on your website. For model-specific gaps, focus on the sources each model trusts: Google-indexed content for Gemini, Bing-indexed content for ChatGPT. Check your score at aeogeoai.net before and after any intervention to measure impact.