Synthetic query seeding for AEO with LLM-generated exploratory queries

Why this matters now

On Sep 8, 2025, Spotify released the AudioBoost study showing that LLM-generated synthetic queries, when indexed in autocomplete and retrieval, lifted audiobook impressions, clicks, and exploratory query completions for a cold-start catalog. Treating autocomplete and retrieval as rankable surfaces measurably improved exploration. See the Spotify AudioBoost study for details. Earlier work from Spotify found that graph-based query suggestions increased coverage and clicks on exploratory queries, underscoring autocomplete as a high-leverage entry point; see the graph learning for exploratory suggestions approach.

Thesis

Discovery now begins before a results page. Autocomplete menus, dynamic facets, related questions, and in-app Q&A prompts are all rankable surfaces. Seeding them with intent-rich synthetic queries lets brands win high-intent moments before traditional SEO has a chance to load.

The synthetic query matrix

Cover the full space of early exploration:

Season or event
Use case
Price band
Symptoms or problems
Jobs to be done
Audience segment
Locale or language
Compatibility or fit
Benefits and tradeoffs
Constraints like time and space

Example intents:

Seasonal: patio heater for small balcony winter use
Use case: laptop for video editing while traveling
Price band: noise canceling headphones under 150
Symptoms: itchy eyes relief for spring allergies
JTBD: organize receipts for quarterly taxes
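
One way to turn the matrix into generation targets is to cross a few dimensions and treat each combination as a cell to prompt for. The sketch below is a minimal illustration under assumptions: the dimension values and the `matrix_cells` helper are hypothetical, and a real matrix would likely prune combinations that make no sense for the catalog.

```python
from itertools import product
from typing import Iterator

# Hypothetical slice of the matrix: three dimensions, two values each.
matrix = {
    "season": ["winter", "spring"],
    "use_case": ["small balcony", "travel"],
    "price_band": ["under 150", "150 to 300"],
}

def matrix_cells(matrix: dict[str, list[str]]) -> Iterator[dict[str, str]]:
    """Yield one dict per combination of dimension values."""
    keys = list(matrix)
    for values in product(*(matrix[k] for k in keys)):
        yield dict(zip(keys, values))

cells = list(matrix_cells(matrix))
# 2 x 2 x 2 dimensions give 8 cells, each a candidate prompt target.
```

Each cell can then be passed to the generation step as a set of constraints, which keeps coverage explicit rather than left to the model.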

Generation workflow

1) Source ground truth

Pull product and service metadata, reviews, UGC, customer support logs, past queries, SKU attributes, inventory and pricing, store availability, and policy constraints.

2) Prompt the LLM for exploratory coverage

Template: Generate 25 natural language queries a shopper might type for {category or symptom} that reflect early exploration. Vary specificity, attribute combinations, and constraints. Include phrases common in {locale}. Range across price bands and jobs to be done. Exclude brand slurs and regulated claims.
Add guardrails: policy snippets, banned claims list, medical or legal disclaimers, locale vocabulary, and competitive rules.
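
The template and guardrails above can be assembled programmatically so every generation run carries the same constraints. This is a minimal sketch, assuming a hypothetical `build_prompt` helper and a plain list of banned claims; it only builds the prompt string and does not call any particular LLM API.

```python
from string import Template

# Prompt text adapted from the template above; placeholders are filled per run.
PROMPT = Template(
    "Generate $n natural language queries a shopper might type for $topic "
    "that reflect early exploration. Vary specificity, attribute combinations, "
    "and constraints. Include phrases common in $locale. Range across price "
    "bands and jobs to be done. Exclude brand slurs and regulated claims."
)

def build_prompt(topic: str, locale: str, banned_claims: list[str], n: int = 25) -> str:
    """Fill the template and append the banned claims as an explicit list."""
    prompt = PROMPT.substitute(n=n, topic=topic, locale=locale)
    if banned_claims:
        rules = "\n".join(f"- {claim}" for claim in banned_claims)
        prompt += "\n\nNever include any of these claims:\n" + rules
    return prompt

# Illustrative call; the topic, locale, and banned claim are made up.
example_prompt = build_prompt("patio heaters", "en-GB", ["cures allergies"])
```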

3) Score, dedupe, and cluster

Normalize each query and detect its language. Remove near duplicates with MinHash or cosine-similarity thresholds. Cluster by intent and attribute bundle. Score each query for novelty against historical logs and for expected answerability given current content.
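
As a rough illustration of the dedupe and novelty steps, the sketch below uses exact Jaccard similarity on token sets. This is a simplification: MinHash approximates the same Jaccard measure at scale, and cosine similarity on embeddings would be a different technique. The function names and the 0.8 threshold are assumptions, and the pairwise loop is quadratic, so it suits small batches only.

```python
import re

def normalize(query: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", query.lower()).split())

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dedupe(queries: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first query seen in each near-duplicate group."""
    kept: list[tuple[str, set[str]]] = []
    for q in queries:
        tokens = set(normalize(q).split())
        if all(jaccard(tokens, seen) < threshold for _, seen in kept):
            kept.append((q, tokens))
    return [q for q, _ in kept]

def novelty(query: str, historical: list[str]) -> float:
    """1.0 means no token overlap with any logged query; 0.0 means a token-set match."""
    if not historical:
        return 1.0
    tokens = set(normalize(query).split())
    return 1.0 - max(jaccard(tokens, set(normalize(h).split())) for h in historical)

candidates = [
    "Patio heater for small balcony",
    "patio heater for small balcony!",
    "laptop for video editing",
]
unique = dedupe(candidates)
```

Here the second candidate differs from the first only in case and punctuation, so it is dropped, while the laptop query survives.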

4) Map queries to answers

For each synthetic query choose a best landing object: answer snippet, buying guide module, comparison table, PDP section, store locator, or expert Q&A.
Generate one short answer, one follow-up suggestion, and canonical facets to preselect.
Attach schema such as FAQPage, HowTo, or Product variant markup, and add medical disclaimers where appropriate. Create retrieval chunks tied to SKUs and policies so answer engines can ground responses.
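
To make the mapping concrete, the sketch below holds one query-to-answer record and serializes a batch as schema.org FAQPage JSON-LD. The `QueryMapping` fields and the sample values are assumptions for illustration; only the FAQPage, Question, and Answer structure follows schema.org.

```python
import json
from dataclasses import dataclass, field

@dataclass
class QueryMapping:
    """Hypothetical record linking a synthetic query to its landing object."""
    query: str
    landing_object: str                      # e.g. "buying_guide", "pdp_section"
    short_answer: str
    follow_up: str
    facets: dict[str, str] = field(default_factory=dict)
    sku_ids: list[str] = field(default_factory=list)

def to_faq_jsonld(mappings: list[QueryMapping]) -> str:
    """Serialize mappings as a schema.org FAQPage JSON-LD document."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": m.query,
                "acceptedAnswer": {"@type": "Answer", "text": m.short_answer},
            }
            for m in mappings
        ],
    }
    return json.dumps(doc, indent=2)

# Illustrative record; the answer text and SKU id are placeholders.
mapping = QueryMapping(
    query="patio heater for small balcony winter use",
    landing_object="buying_guide",
    short_answer="Placeholder answer grounded in SKU attributes.",
    follow_up="Do you need tip over shutoff?",
    facets={"space": "small balcony"},
    sku_ids=["H-1"],
)
jsonld = to_faq_jsonld([mapping])
```

Keeping `sku_ids` on the record is what lets a later governance check confirm that every answer is tied to inventory.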

5) Seed every rankable surface

Autocomplete and suggested searches: push top intent clusters into site QAC (query autocomplete) with labels like "for small rooms" or "under 200". Mirror to retailer partner QAC programs where available.
Facets and filters: add synonyms and composite facet presets that reflect the synthetic intents. Example preset: "small balcony safe heaters", with power 1200 to 1500 W and tip-over shutoff.
On-site answers and Q&A: prewrite short answers tied to inventory and policy. Surface them in chat widgets and on category landing pages.
Collections and landing pages: auto-build curated collections for each high-value cluster and link to them from autocomplete and related-search modules.
Emerging answer engines: provide structured answers and SKU-grounded evidence in feeds or APIs where supported. Optimize for answer boxes that show clarifying prompts and follow-ups. For parallel tactics at the browser level, see the Chrome-as-Answer AEO playbook.
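
The composite facet preset from the heater example can be expressed as a small filter over product attributes. This is a sketch under assumptions: the `FacetPreset` class, the `power_w` and `tip_over_shutoff` field names, and the three catalog rows are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FacetPreset:
    """A named bundle of facet constraints that a suggestion can preselect."""
    label: str
    min_power_w: int
    max_power_w: int
    requires_tip_over_shutoff: bool

    def matches(self, product: dict) -> bool:
        in_power_range = self.min_power_w <= product.get("power_w", 0) <= self.max_power_w
        has_shutoff = product.get("tip_over_shutoff", False)
        return in_power_range and (has_shutoff or not self.requires_tip_over_shutoff)

preset = FacetPreset("small balcony safe heaters", 1200, 1500, True)

# Made-up catalog rows for illustration.
catalog = [
    {"sku": "H-1", "power_w": 1500, "tip_over_shutoff": True},
    {"sku": "H-2", "power_w": 2000, "tip_over_shutoff": True},   # too powerful
    {"sku": "H-3", "power_w": 1300, "tip_over_shutoff": False},  # no shutoff
]
matching = [p["sku"] for p in catalog if preset.matches(p)]
```

A preset that matches no in-stock products is a signal to hold the corresponding suggestion back rather than seed it.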

Test and learn like a performance channel

Measure both offline and online, and iterate:

Offline: retrievability uplift, intent coverage, novelty vs logs, answerability rate, toxicity and policy compliance.
Online: autocomplete click-through rate, exploratory query completion rate, search refinement reduction, PDP views per search, add-to-cart rate, margin-weighted conversion, store pickup starts, customer support deflection.
Use the AudioBoost deltas as directional benchmarks for early phases when you seed both autocomplete and retrieval.
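
Two of the online metrics can be computed from simple counters per experiment arm. The sketch below assumes hypothetical counter names and made-up numbers, and it defines the refinement rate as refinements per search, which is one reasonable reading rather than a standard. It reports relative lift only and says nothing about statistical significance.

```python
from dataclasses import dataclass

@dataclass
class ArmStats:
    """Raw counters for one experiment arm (hypothetical field names)."""
    suggestion_impressions: int
    suggestion_clicks: int
    searches: int
    refinements: int

    @property
    def autocomplete_ctr(self) -> float:
        if self.suggestion_impressions == 0:
            return 0.0
        return self.suggestion_clicks / self.suggestion_impressions

    @property
    def refinement_rate(self) -> float:
        if self.searches == 0:
            return 0.0
        return self.refinements / self.searches

def relative_lift(control: float, treatment: float) -> float:
    """Relative change of treatment over control, e.g. 0.2 for a 20 percent lift."""
    if control == 0:
        raise ValueError("control metric is zero; relative lift is undefined")
    return (treatment - control) / control

# Illustrative numbers only, not taken from any study.
control = ArmStats(suggestion_impressions=1000, suggestion_clicks=50, searches=400, refinements=120)
treatment = ArmStats(suggestion_impressions=1000, suggestion_clicks=60, searches=400, refinements=100)

ctr_lift = relative_lift(control.autocomplete_ctr, treatment.autocomplete_ctr)
refinement_change = relative_lift(control.refinement_rate, treatment.refinement_rate)
```

A positive `ctr_lift` together with a negative `refinement_change` is the pattern the seeding is meant to produce.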

Governance and risk controls

Hallucination control: require evidence snippets and SKU ties for any claim. Block unsupported medical, legal, or safety statements. Insert disclaimers for symptom-based guidance and route to licensed content where needed.
Brand and compliance: enforce canonical terminology, regional compliance flags, and accessibility language. Maintain a denylist and a required-phrases list per market. Extend governance into crawling surfaces by treating robots.txt as paid AEO.
Freshness: auto-retire queries that lead to out-of-stock items or expired promotions. Refresh seasonally and ahead of events.
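
The freshness rule lends itself to a small scheduled check. This is a minimal sketch, assuming a hypothetical `SeededQuery` shape and a stock lookup keyed by SKU; a query is retired when every tied SKU is out of stock or when its promotion has ended. A query with no SKU ties is also retired here, in line with the requirement above that every claim be tied to a SKU.

```python
from datetime import date
from typing import Optional, TypedDict

class SeededQuery(TypedDict):
    text: str
    sku_ids: list[str]
    promo_ends: Optional[date]   # None when the query is not tied to a promotion

def should_retire(q: SeededQuery, stock: dict[str, int], today: date) -> bool:
    """Retire when all tied SKUs are out of stock or the promotion has expired."""
    out_of_stock = all(stock.get(sku, 0) <= 0 for sku in q["sku_ids"])
    promo_expired = q["promo_ends"] is not None and q["promo_ends"] < today
    return out_of_stock or promo_expired

# Made-up data for illustration.
stock = {"H-1": 3, "H-2": 0}
today = date(2025, 6, 1)
seeded: list[SeededQuery] = [
    {"text": "patio heater for small balcony", "sku_ids": ["H-1", "H-2"], "promo_ends": None},
    {"text": "compact heater clearance", "sku_ids": ["H-2"], "promo_ends": None},
    {"text": "winter heater deal", "sku_ids": ["H-1"], "promo_ends": date(2025, 1, 1)},
]
live = [q["text"] for q in seeded if not should_retire(q, stock, today)]
```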

Operating cadence

Weekly: regenerate the top 20 percent of clusters by traffic and margin. Update QAC and related search modules.
Monthly: expand matrix coverage and rotate collections. Rerun dedupe and novelty checks.
Seasonal: preseed 6 to 8 weeks before peaks like back-to-school or holiday.
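
Selecting the weekly regeneration set can be as simple as ranking clusters and taking the top fifth. The sketch below assumes a combined score of traffic times margin, which is only one way to read "by traffic and margin"; the `top_clusters` helper and the cluster fields are hypothetical.

```python
import math

def top_clusters(clusters: list[dict], fraction: float = 0.2) -> list[dict]:
    """Return the top fraction of clusters ranked by traffic * margin (at least one)."""
    if not clusters:
        return []
    ranked = sorted(clusters, key=lambda c: c["traffic"] * c["margin"], reverse=True)
    return ranked[: max(1, math.ceil(len(ranked) * fraction))]

# Made-up clusters for illustration.
clusters = [
    {"name": "small balcony heaters", "traffic": 900, "margin": 0.30},
    {"name": "travel editing laptops", "traffic": 400, "margin": 0.10},
    {"name": "headphones under 150", "traffic": 1200, "margin": 0.15},
    {"name": "allergy eye relief", "traffic": 300, "margin": 0.40},
    {"name": "receipt organizers", "traffic": 100, "margin": 0.50},
]
weekly = [c["name"] for c in top_clusters(clusters)]
```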

Quick start in 30 days

Week 1: define the intent matrix and success metrics. Integrate product and policy data.
Week 2: generate and score 5k to 10k synthetic queries. Map to answers and facets.
Week 3: seed site QAC, related searches, and Q&A modules. Launch an A/B experiment. Extend to visual entry points with camera-first AEO with Amazon Lens.
Week 4: analyze uplift, prune low performers, export winning clusters to retailer partners and answer engines.

Stack blueprint

LLM orchestration with policy prompts and toxicity filters
Vector store and lexical index for dual retrieval
Query store for QAC and suggestion APIs
Feature flags and experimentation
Observability for coverage and compliance

What good looks like

Autocomplete includes attribute-rich intents that feel human and local.
Exploratory query completions rise and refinements fall.
Category pages and Q&A show grounded answers with clear next steps.
Retailer app search exposes your curated presets.
Answer engines surface your concise, supported answers before a traditional results page loads.

Summary

Spotify’s AudioBoost offers production evidence that LLM-generated synthetic queries can lift exploration when seeded into autocomplete and retrieval. Applying the same strategy across brand-owned surfaces, retailer apps, and answer engines creates a durable early-discovery moat for AEO.