Methodology

How the benchmark measures whether AI systems can actually cite a brand.

The CiteOps benchmark does not reward vanity traffic signals. It scores whether a site is crawlable, quotable, decision-ready, and trustworthy enough that answer engines can cite it with confidence. If the data is not fresh enough to support that claim, the benchmark shows a Refreshing or Blocked state instead of presenting stale data as live.

Agent readiness

Weight 42%

Measures whether answer engines can crawl, parse, and trust the site's machine-readable canon. This includes robots.txt policy, the llms.txt canon, schema coverage, canonical-URL hygiene, and basic semantic structure.

Decision surface

Weight 33%

Measures whether the site exposes the pages that answer the questions buyers actually ask: pricing, comparisons, glossary-style definitions, docs, implementation pages, and answer-first content.

Trust density

Weight 25%

Measures proof, freshness, authorship, named-entity consistency, and public corroboration. The same brand facts should appear consistently across the public site and supporting surfaces.
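As an illustration only, the three weights above can be combined into a single score. This sketch assumes each pillar is already scored on a 0 to 100 scale and that the composite is a plain linear weighted sum; the methodology does not specify how the pillars are aggregated, and the names `WEIGHTS` and `composite_score` are hypothetical.

```python
# Pillar weights as published above; they sum to 1.0.
WEIGHTS = {
    "agent_readiness": 0.42,
    "decision_surface": 0.33,
    "trust_density": 0.25,
}


def composite_score(pillar_scores: dict[str, float]) -> float:
    """Combine per-pillar scores (assumed 0 to 100) into one weighted score.

    Hypothetical helper: assumes a simple linear weighted sum, which the
    methodology does not confirm.
    """
    missing = WEIGHTS.keys() - pillar_scores.keys()
    if missing:
        raise ValueError(f"missing pillar scores: {sorted(missing)}")
    for name, score in pillar_scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score out of range: {score}")
    # Weighted sum over the three published pillars only.
    return sum(WEIGHTS[name] * pillar_scores[name] for name in WEIGHTS)


print(round(composite_score(
    {"agent_readiness": 80, "decision_surface": 60, "trust_density": 40}
), 2))  # 63.4
```

Under this assumption a site scoring 80 / 60 / 40 lands at 0.42 × 80 + 0.33 × 60 + 0.25 × 40 = 63.4, which shows how heavily agent readiness dominates the total.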

1. Crawl and access layer

The benchmark starts by checking whether major AI crawlers can reach the site at all. If GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, or Google-Extended (a robots.txt control token rather than a separate crawler) is blocked or partially blocked, the discovery surface collapses before content quality even matters.

This is why robots.txt policy, CDN or WAF behavior, and the availability of root files such as robots.txt and llms.txt are scored early. A technically beautiful site that the relevant agents cannot crawl is not citation-ready.
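A first-pass access check can be sketched with Python's standard `urllib.robotparser` module, which evaluates robots.txt rules per user-agent token. This is a minimal sketch, not the benchmark's crawler: it parses robots.txt text you supply rather than fetching it, so it says nothing about CDN or WAF behavior, and `blocked_agents` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens named in the methodology. Google-Extended is a robots.txt
# control token rather than a separate fetcher, but it is checked the same way.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """Return the AI agents that the given robots.txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_agents(SAMPLE, "https://example.com/pricing"))  # ['GPTBot']
```

Running the same check against several key URLs (homepage, pricing, docs) is one way to surface the "partially blocked" case. Note that the standard-library parser uses simple prefix matching and does not support `*` or `$` wildcards inside paths, so a production check would need a fuller RFC 9309 parser.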

2. Machine-readable canon

Next, CiteOps looks for a machine-readable briefing layer: llms.txt, llms-full.txt, canonical URLs, schema blocks, and quote-ready answer structures. These surfaces reduce ambiguity for models and make brand facts easier to lift accurately.

The system does not treat any one file as magic. Instead, it checks whether the overall canon is coherent enough that a model can move from the homepage to pricing, methodology, glossary, answers, and proof surfaces without losing the thread.
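One way to picture this coherence check is as reachability over internal links: starting from the homepage, can a crawler follow links to every core surface? The sketch below assumes the site has already been reduced to a dictionary mapping page labels to the pages they link to; the labels and the `unreachable_surfaces` helper are hypothetical, and link reachability is only one part of coherence.

```python
from collections import deque

# Hypothetical labels for the core surfaces named in the paragraph above.
REQUIRED_SURFACES = {"pricing", "methodology", "glossary", "answers", "proof"}


def unreachable_surfaces(links: dict[str, list[str]], start: str = "home") -> set[str]:
    """Breadth-first search over internal links; return required surfaces never reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return REQUIRED_SURFACES - seen


site = {
    "home": ["pricing", "methodology"],
    "methodology": ["glossary"],
    "glossary": ["methodology"],
}
print(sorted(unreachable_surfaces(site)))  # ['answers', 'proof']
```

In this example the answers and proof pages may exist, but a model starting from the homepage never finds a path to them.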

3. Decision pages

Answer engines often cite the page that best resolves a buying decision, not the page with the most traffic. That means pricing, comparisons, alternatives, implementation docs, benchmarks, and answer pages matter more than generic awareness content.

A site can have a strong homepage and still lose the prompt that actually matters if the comparison, pricing, and methodology pages are thin or missing.
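A simple coverage audit makes the "thin or missing" distinction concrete. In this sketch, `pages` maps each decision page type to the word count of its strongest page, and the 300-word threshold for "thin" is a hypothetical placeholder; the benchmark does not publish a cutoff, and word count is only a rough proxy for depth.

```python
# Decision page types named in the methodology. The threshold is a
# hypothetical stand-in for "thin"; the benchmark does not publish one.
DECISION_PAGES = ("pricing", "comparisons", "alternatives", "implementation", "benchmarks", "answers")
THIN_WORDS = 300


def decision_gaps(pages: dict[str, int]) -> dict[str, str]:
    """Map each weak decision page type to 'missing' or 'thin'.

    `pages` maps a page type to the word count of its best page.
    """
    gaps = {}
    for kind in DECISION_PAGES:
        words = pages.get(kind)
        if words is None:
            gaps[kind] = "missing"
        elif words < THIN_WORDS:
            gaps[kind] = "thin"
    return gaps


print(decision_gaps({"pricing": 120, "comparisons": 900, "implementation": 1500}))
# {'pricing': 'thin', 'alternatives': 'missing', 'benchmarks': 'missing', 'answers': 'missing'}
```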

4. Proof and freshness

CiteOps checks whether the site looks maintained and provable. Freshness signals, dated changelog entries, visible methodology upkeep, case studies, public proof pages, and quantified outcomes all raise trust density.

A frozen site with vague copy is harder to cite than a maintained site with fewer pages but stronger proof. This is especially true for Perplexity and other engines that react strongly to recency and sourceability.
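Freshness can be modeled as a signal that decays with age. The sketch below assumes exponential decay with a hypothetical 90-day half-life, applied to a page's last-updated date (for example, its newest dated changelog entry); neither the decay shape nor the half-life comes from the benchmark.

```python
from datetime import date

# Hypothetical half-life: a freshness signal loses half its weight every 90 days.
HALF_LIFE_DAYS = 90


def freshness(last_updated: date, today: date, half_life: float = HALF_LIFE_DAYS) -> float:
    """Exponential-decay freshness in (0, 1]; 1.0 means updated today."""
    age_days = max((today - last_updated).days, 0)  # clamp future dates to age 0
    return 0.5 ** (age_days / half_life)


print(freshness(date(2024, 1, 1), date(2024, 3, 31)))  # 0.5 (exactly 90 days old)
```

A decay curve rewards steady upkeep over a single burst of updates, which matches the point that a maintained site can outscore a larger frozen one.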

5. Entity graph

The final layer is entity coherence: named organization, founder or operator context, research partner, benchmark presence, glossary terms, methodology pages, and off-site corroboration. AI systems cite entities, not isolated files.

The benchmark therefore rewards sites that make the brand easy to anchor across multiple surfaces, on-site and beyond.
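Cross-surface agreement can be checked by collecting the same brand facts from each surface and flagging any fact that has more than one distinct value. In this sketch the surface names, fact keys, and the brand "Acme Analytics" are hypothetical, and normalization is limited to trimming whitespace and lowercasing.

```python
from collections import defaultdict


def conflicting_facts(surfaces: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Return each brand fact that has more than one distinct value across surfaces.

    `surfaces` maps a surface name (e.g. "homepage") to the facts it states,
    e.g. {"founded": "2021"}. Values are trimmed and lowercased before comparing.
    """
    values: dict[str, set[str]] = defaultdict(set)
    for facts in surfaces.values():
        for key, value in facts.items():
            values[key].add(value.strip().lower())
    return {key: vals for key, vals in values.items() if len(vals) > 1}


surfaces = {
    "homepage": {"name": "Acme Analytics", "founded": "2021"},
    "about": {"name": "Acme Analytics", "founded": "2020"},
    "directory_listing": {"name": "acme analytics "},
}
print({k: sorted(v) for k, v in conflicting_facts(surfaces).items()})
# {'founded': ['2020', '2021']}
```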

High-scoring brand

A high-scoring brand usually has permissive AI-crawler access, strong schema, obvious pricing, clear alternatives content, visible freshness, and a denser entity graph. The score does not come from one trick. It comes from many small surfaces agreeing with each other.

Middle-tier brand

A middle-tier brand often has decent docs and a good homepage but a thin llms.txt canon, weaker comparisons, or no public methodology. The recommendation is usually to strengthen the decision layer and make the site easier to quote directly.

Low-scoring brand

A low-scoring brand usually blocks crawlers, lacks pricing clarity, hides product facts inside generic marketing copy, or has no stable answer-first pages. The fix path starts with access and machine-readable basics before expanding outward.

Benchmark states

Live

Managed or hosted benchmark rows were refreshed recently enough to represent current public reality.

Curated verified

A reviewed starter row is serving while the managed benchmark continues to backfill. Provenance stays explicit.

Refreshing

The benchmark source is being rerun or repopulated. The site does not pretend old data is live.

Blocked

A real runtime or access problem prevented a trustworthy refresh. Public proof pages remain truthful.
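The four states can be read as a small decision procedure. This sketch assumes a hypothetical seven-day window for "refreshed recently enough" and one possible precedence order (fresh data first, then a failed refresh, then a curated row, otherwise Refreshing); the benchmark publishes neither. The property it preserves is that stale data is never labeled Live.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class BenchmarkState(Enum):
    LIVE = "Live"
    CURATED_VERIFIED = "Curated verified"
    REFRESHING = "Refreshing"
    BLOCKED = "Blocked"


# Hypothetical window for "refreshed recently enough".
MAX_AGE = timedelta(days=7)


def benchmark_state(
    last_refresh: Optional[datetime],
    now: datetime,
    refresh_failed: bool,
    curated_row_available: bool,
) -> BenchmarkState:
    """Pick a display state using an assumed precedence order."""
    if last_refresh is not None and now - last_refresh <= MAX_AGE:
        return BenchmarkState.LIVE
    if refresh_failed:
        # A runtime or access problem prevented a trustworthy refresh.
        return BenchmarkState.BLOCKED
    if curated_row_available:
        # A reviewed starter row serves while the managed benchmark backfills.
        return BenchmarkState.CURATED_VERIFIED
    # Otherwise the source is being rerun; old data is not presented as live.
    return BenchmarkState.REFRESHING


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
print(benchmark_state(now - timedelta(days=30), now,
                      refresh_failed=False, curated_row_available=True).value)
# Curated verified
```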

Next step

Read the benchmark with the methodology in mind, then run your own proof page.