How AI Search Engines Like Perplexity and ChatGPT Choose Which Sources to Cite

Why some websites get cited by Perplexity, ChatGPT, and Gemini while others are ignored — and the content and structural signals that influence AI source selection.

Core principle

AI answer engines are not search engines in the traditional sense. They do not return a list of links and let the user decide. They select a small number of sources, extract information from them, and synthesise an answer. Getting cited means being selected at that extraction step — not just appearing in search results.

How AI answer engines retrieve and select sources

The retrieval and selection process varies by system, but the common pattern has two stages:

Stage 1 — Retrieval. The AI system queries a search index (often Bing, in the case of Perplexity and early ChatGPT search) or its own crawler index to generate a candidate set of pages relevant to the query. Standard SEO signals — domain authority, topical relevance, indexation — determine which pages enter this candidate set.

Stage 2 — Selection and extraction. From the candidate set, the system evaluates which pages contain the most reliable, relevant, and extractable answer to the query. Pages that answer the question directly, with clear structure and factual specificity, are favoured over pages that discuss the topic generally.

Appearing in the candidate set requires standard SEO. Being selected and cited requires something additional.
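The two stages can be illustrated with a toy sketch. This is not how Perplexity, ChatGPT, or any real system is implemented: the `Page` class, the `indexed` flag, the keyword-overlap scoring, and the 300-character "lead" window are all assumptions chosen only to make the retrieve-then-select pattern concrete.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    indexed: bool = True  # hypothetical flag: is the page in the search index?


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def retrieve(query: str, pages: list[Page], limit: int = 10) -> list[Page]:
    """Stage 1 (toy): keep indexed pages sharing at least one term with the query."""
    terms = tokenize(query)
    scored = [(len(terms & tokenize(p.text)), p) for p in pages if p.indexed]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:limit]]


def select(query: str, candidates: list[Page], cite: int = 2,
           lead_chars: int = 300) -> list[Page]:
    """Stage 2 (toy): prefer candidates whose opening text covers the query terms."""
    terms = tokenize(query)

    def lead_coverage(page: Page) -> float:
        lead = tokenize(page.text[:lead_chars])
        return len(terms & lead) / len(terms) if terms else 0.0

    return sorted(candidates, key=lead_coverage, reverse=True)[:cite]
```

In this sketch a page can pass `retrieve` on overall relevance and still lose in `select` if its answer only appears after a long preamble, which mirrors the distinction between entering the candidate set and being cited.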

Signals that influence AI citation

Direct, specific answers near the top of the page

AI systems extract answers from the text of the page. A page that answers the query in its first two paragraphs — rather than building toward an answer across several sections — is easier to extract from and more likely to be cited.

This is the opposite of a common long-form SEO pattern where the answer is buried after extensive preamble. For AI citation, lead with the answer.

Clear heading hierarchy

AI crawlers parse heading structure to understand the organisation of a page. An h1 that matches the query topic, h2 headings that map to subtopics, and h3 subheadings for detail create a machine-readable outline that makes extraction reliable.

Skipped heading levels, decorative headings that do not reflect content structure, and walls of text without hierarchy all reduce extractability.
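Skipped heading levels are easy to detect automatically. The following is a minimal sketch using Python's standard `html.parser`; the names `HeadingCollector` and `audit_headings` are hypothetical, and the check covers only skipped levels, not decorative headings or unstructured text.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings = []   # list of (level, text) tuples
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = None


def audit_headings(html: str) -> list[str]:
    """Return one message for each heading that jumps more than one level deeper."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    problems = []
    previous = 0  # 0 means no heading has been seen yet
    for level, text in parser.headings:
        if level > previous + 1:
            where = f"after h{previous}" if previous else "at the start of the page"
            problems.append(f"h{level} '{text}' skips a level {where}")
        previous = level
    return problems
```

Running `audit_headings` on a page's HTML returns an empty list when each heading is at most one level deeper than the one before it.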

Factual density and specificity

AI systems favour sources that contain specific, verifiable claims — numbers, dates, named entities, defined processes — over sources that describe topics in general terms. A page that states "clinics using automated WhatsApp reminders report 30–50% reduction in no-shows" is more citable than one that states "automated reminders can reduce no-shows."

Specificity signals that the content is based on real knowledge rather than generated filler.

Topical authority

AI systems weight sources that demonstrate consistent, deep coverage of a topic domain over sources that cover many unrelated topics. A website with twenty posts on dental clinic technology is a more credible citation source for a dental technology query than a website with one dental post among two hundred unrelated articles.

This is the mechanism behind topical authority strategies in SEO — and it applies with equal force to AI citation. See AI SEO for SaaS Websites for a content architecture approach to building topical depth.

E-E-A-T signals

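Google's E-E-A-T framework (Experience, Expertise, Authoritativeness, Trustworthiness) was developed for human quality raters but maps closely to what AI systems use to evaluate source credibility: content that demonstrates first-hand experience, cites verifiable data, identifies named authors, and is associated with a credible entity.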

Schema markup

Structured data helps AI systems extract and attribute information accurately, particularly the FAQPage, HowTo, Article, and Organization schema types.

See Structured Data for SaaS for implementation detail on the schema types that affect both Google and AI visibility.
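As an illustration of what such markup looks like, here is a minimal sketch that builds schema.org FAQPage JSON-LD from question-answer pairs. The helper names `faq_jsonld` and `as_script_tag` are hypothetical; the `FAQPage`, `Question`, and `Answer` types and the `mainEntity` and `acceptedAnswer` properties come from schema.org.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialise (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in a script element for embedding in a page."""
    # Escape "</" so answer text cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'
```

The output of `as_script_tag(faq_jsonld(pairs))` can be placed in the page's HTML. The marked-up questions and answers should match text that is visible on the page.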

Crawl accessibility for AI bots

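AI companies operate dedicated crawlers or robots.txt tokens: GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), and Google-Extended (the robots.txt token Google uses to control use of content for Gemini; it is not a separate crawler). If these are blocked in robots.txt, the corresponding system cannot access the site and is unlikely to cite it, regardless of content quality.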

Check your robots.txt and verify that AI crawler user agents are not listed under Disallow. If they were blocked as a precaution during earlier periods of uncertainty about AI scraping, review whether that policy still reflects your goals.
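This check can be scripted with Python's standard `urllib.robotparser`. The sketch below is a minimal example under stated assumptions: the function name `blocked_ai_agents` and the `example.com` URL are hypothetical, and the agent list uses `Google-Extended`, which is the token Google documents for Gemini. The standard-library parser follows the original robots.txt conventions, so its verdicts on unusual rules may differ from a given crawler's own interpretation.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens to test; extend this list to match your own policy.
AI_USER_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]


def blocked_ai_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that the given robots.txt text bars from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_USER_AGENTS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    print(blocked_ai_agents(sample))
```

To audit a live site, pass the contents of its `/robots.txt` file as `robots_txt` and a representative page URL as `url`.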

What this means in practice

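A page optimised for AI citation answers the query directly near the top, uses a clear heading hierarchy, states specific and verifiable facts, belongs to a site with deep coverage of the topic, shows E-E-A-T signals, carries relevant schema markup, and is accessible to AI crawlers.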

Most of these overlap with good content practice. The main adjustments relative to traditional SEO are: leading with the answer rather than building toward it, prioritising specificity over length, and ensuring AI crawlers are not blocked.

Summary

AI answer engines select sources based on retrievability, extractability, and credibility. The content signals that drive citation — direct answers, clear structure, factual specificity, topical authority, and schema markup — are also good SEO signals. The technical requirement specific to AI citation is ensuring the relevant crawler user agents have access.

For SaaS companies, the implication is that content depth within a defined topic domain is more valuable than broad coverage. A site that covers one topic consistently and in depth is likely to be cited more often than a site that covers everything once.

AKORNET builds SEO and AI visibility into all four of its SaaS products. Learn more at akor.net →

FAQ

How do AI search engines like Perplexity decide which sources to cite?

AI answer engines typically retrieve candidate pages via a search index, then evaluate them for relevance, authority, and content quality. Pages with clear structure, direct answers to the query, factual density, and established topical authority are favoured. Schema markup and clean heading hierarchies improve machine readability and citation probability.

Does Google E-E-A-T affect AI search citation?

E-E-A-T signals — Experience, Expertise, Authoritativeness, Trustworthiness — influence both Google ranking and AI source selection. Content that demonstrates first-hand experience, cites verifiable data, identifies named authors, and is associated with a credible entity is more likely to be retrieved and cited by AI systems.

Do AI crawlers use schema markup?

Yes. Structured data, particularly FAQPage, HowTo, Article, and Organization schema, helps AI systems extract and attribute information accurately. FAQ schema in particular maps directly to the question-answer format that AI answer engines use.

Is it possible to optimise specifically for AI citation rather than Google ranking?

The signals overlap significantly. Content optimised for AI citation — clear structure, direct answers, factual depth, topical authority — also tends to rank well in Google. The main addition for AI citation is ensuring content is crawlable by AI-specific bots (GPTBot, PerplexityBot, ClaudeBot) and structured for machine extraction.
