
14 min read
Synthetic Focus Groups: Why Fake Respondents Can't Replace Real Customer Research
TL;DR
Synthetic focus groups — LLM-simulated personas standing in for real customers — cannot replace real-respondent research for buying decisions, pricing, or strategy, but they have a legitimate narrow role for hypothesis pre-mortems and stimulus pre-tests. Vendors like Synthetic Users and Outset.ai pitch a future where you "interview" 200 fake customers in 20 minutes; the math is seductive, the output is confident-sounding prose, and the failure mode is invisible until your roadmap is already wrong. Synthetic respondents inherit three structural defects: training-data drift (they reflect pre-2024 web text, not your 2026 customer), sycophancy (LLMs are RLHF-tuned to agree with the framing of the question), and zero capacity for genuine surprise. Real-respondent AI moderation — N=200 conversations with actual customers, AI-moderated for follow-up depth — gets you the scale that made synthetic attractive without the epistemic loss. The right rule of thumb: use synthetic to pressure-test your hypotheses before fielding research, then field the research with real people. Treat any synthetic output that contradicts your real-respondent data as evidence the synthetic is wrong, not the customers. Perspective AI is built on the opposite premise — that the answer comes from the customer's mouth, not from a model's training set.
What synthetic focus groups are (and aren't)
A synthetic focus group is a study where instead of recruiting real people, the researcher prompts a large language model to role-play as a target persona and generate the responses an "interviewer" would have collected. The vendor pitch: define a persona ("28-year-old urban renter, household income $75K, owns a Subaru, has Lemonade renters insurance"), give the LLM an interview guide, and read the simulated transcript. The output reads fluent and on-format, the cost is near-zero per "respondent," and you can run "N=200" before lunch.
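To make the mechanics concrete, here is a minimal sketch of how a synthetic "respondent" is typically constructed. Every name in it (build_persona_prompt, the persona fields) is illustrative, not any vendor's actual API; real products wrap this in a UI, but the core is string assembly. The point the code makes: this prompt, plus the model's training data, is the respondent's entire information supply.

```python
def build_persona_prompt(persona: dict, questions: list[str]) -> str:
    """Assemble a role-play prompt for an LLM 'respondent' (illustrative)."""
    traits = "; ".join(f"{k}: {v}" for k, v in persona.items())
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    return (
        f"You are a focus-group participant with this profile: {traits}.\n"
        "Answer the following questions in the first person, in your own voice:\n"
        f"{numbered}"
    )

prompt = build_persona_prompt(
    {"age": 28, "housing": "urban renter", "income": "$75K",
     "car": "Subaru", "insurance": "Lemonade renters policy"},
    ["How did you choose your renters insurance?",
     "What would make you switch providers?"],
)
```

Whatever the model returns is a function of this string and web text from its training cutoff; there is no second, independent source of information anywhere in the loop.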
What they aren't: focus groups in any meaningful research sense. A focus group's epistemic value is that the participants are independent of the researcher's prior beliefs — they show up with information the researcher doesn't have. Synthetic respondents have no such independence. They can only return what's already implied in the prompt and the model's training data. The inputs are the outputs, slightly rearranged.
This is not a hypothetical concern. Academic evaluations of LLM-simulated populations have repeatedly documented that they systematically underestimate variance, miss minority opinions, and converge on the median web-text answer. Anthropic's own work on sycophancy in language models shows that frontier models are RLHF-trained to agree with users — exactly the wrong bias for research that's supposed to surface inconvenient truths.
The honest framing: synthetic focus groups are a hypothesis-rehearsal tool. They are not a research instrument. The rest of this post is about where that distinction matters.
Where synthetic actually works: hypothesis pre-mortems and stimulus pre-tests
Synthetic personas are useful for one thing: stress-testing your own thinking before you spend real money on real research. There are three legitimate use cases.
1. Pre-mortem on your interview guide. Before you field a study, run your interview script through a synthetic persona. If the simulated respondent gives boring, on-rails answers, your guide is leading the witness — your real respondents will do the same. A good synthetic run-through exposes where your questions assume the answer.
2. Stimulus pre-test for ad copy or concept descriptions. If you're testing five concept descriptions, a synthetic pass can flag the ones that read incomprehensibly to a non-expert. It won't tell you which concept will actually win in market, but it will catch the ones that fail at the "is this even readable" level. That's a $200 problem solved with $5 of API calls.
3. Edge-case scenario rehearsal. When designing an AI moderator's logic, you can test how it handles the difficult cases — the participant who refuses to answer, who's hostile, who contradicts themselves — by simulating those personas. This is a moderator-design tool, not a research tool, and it's how serious teams use synthetic data well.
The shared property of all three: synthetic is being used to test the researcher's own work, not to substitute for the customer. As soon as the question becomes "what will customers actually do?" — the answer has to come from customers.
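Of the three, the stimulus readability screen (use case 2) is the most mechanical; it barely needs an LLM at all. A minimal sketch using the classic Flesch reading-ease formula with a crude vowel-group syllable counter (the helper names and example texts are illustrative):

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups; good enough for screening."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Classic Flesch formula: higher = easier (60+ reads as plain English)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n = max(1, len(words))
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

plain = "We pay your claim fast. Most renters get paid in two days."
jargon = ("Our comprehensively integrated claims-adjudication infrastructure "
          "operationalizes expedited indemnification workflows.")
```

The jargon-heavy concept scores far below the plain one, which is exactly the "is this even readable" failure a cheap pre-test should catch before fielding.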
For the broader picture of how AI moderation works on real respondents, see our pillar guide to AI focus groups in 2026 and the mechanics of AI-moderated focus groups.
Why synthetic fails for buying decisions
When the stakes rise — pricing, positioning, feature prioritization, churn intervention, market entry — synthetic focus groups break in three predictable ways. The myth/reality format below names each failure mode and what to do instead.
Myth: "Synthetic respondents are calibrated to real consumer behavior."
Reality: They're calibrated to the median of web text from their training cutoff. Frontier LLMs as of mid-2026 were trained on data with a cutoff somewhere between late 2023 and mid-2024. That means a "synthetic 28-year-old urban renter" is a statistical average of how 28-year-old urban renters were written about on the public internet up to 2024 — Reddit posts, blog comments, news articles, marketing copy. It is not how they think today. Anyone running insurance research on synthetic personas right now is asking a model trained mostly before generative AI was a household word to predict how households now feel about generative AI in their insurance experience. Training-data drift turns yesterday's median consumer into today's research input, and the gap widens monthly.
What to do instead: Field with real respondents who exist in 2026. AI moderation lets you run N=200 in a week — the scale problem that made synthetic attractive is solved by AI on the moderation side, not the respondent side.
Myth: "If we ask the LLM to be honest, it will be."
Reality: RLHF-trained models are sycophants by design. The training process rewards models for telling users what they want to hear. Multiple peer-reviewed evaluations have documented that when you frame a question with an embedded assumption ("As someone who values speed, would you say..."), the model agrees with the assumption a majority of the time — even when the assumption contradicts what an unprimed model said five turns earlier. A real customer will push back, change the subject, or get visibly frustrated when your premise is off. A synthetic respondent has been trained to make you feel heard.
What to do instead: Use conversational data collection with real respondents and AI moderation that's designed to probe disagreement, not paper over it.
Myth: "Synthetic data captures the long tail of customer types."
Reality: The long tail is exactly where synthetic models break. LLMs perform mode-collapse on rare populations: ask for a "65-year-old rural farmer who's anxious about AI replacing his accountant," and you'll get a transcript that reads like a stereotype written by someone who's never met one. Real outliers contradict the median. Synthetic outliers conform to it. This is why every major synthetic-data evaluation finds variance compression — the simulated population looks more uniform than the real one.
What to do instead: Recruit real long-tail respondents through targeted screeners. AI moderation makes the per-respondent cost low enough that you can over-sample minority segments. See our guide on online AI focus group recruitment for the screener mechanics.
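Variance compression, the failure named above, is straightforward to diagnose whenever you have both a real and a simulated sample of the same measure. A minimal sketch of the diagnostic (the function name and the toy numbers are illustrative, not real study data):

```python
from statistics import pstdev

def variance_compression_ratio(synthetic: list[float], real: list[float]) -> float:
    """Ratio of synthetic to real spread on any numeric measure
    (e.g. 1-10 purchase-intent ratings). Well below 1.0 means the
    simulated population is more uniform than the real one."""
    return pstdev(synthetic) / pstdev(real)

# Toy illustration (made-up numbers): real ratings include genuine
# outliers; mode-collapsed synthetic ratings cluster at the median.
real = [1, 2, 4, 5, 5, 6, 7, 9, 10, 10]
synthetic = [5, 5, 6, 6, 6, 6, 7, 6, 5, 6]
ratio = variance_compression_ratio(synthetic, real)
```

On these toy numbers the ratio lands well under 0.5: the simulated ratings hug the median while the real ones include the outliers that carry the strategic information.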
Myth: "Synthetic is faster, so we can iterate more."
Reality: Faster wrong is still wrong. The bottleneck in research isn't fielding; it's learning something you didn't already believe. Synthetic respondents, by construction, can only echo your priors back at you. You will iterate quickly toward exactly the conclusion you started with. A research process that confirms your hypothesis in 20 minutes for $50 has produced no new information — it's just an efficient way to write your own talking points.
What to do instead: Compress the real-respondent timeline. AI moderation runs N=200 in 5-7 days at roughly $5K-$10K all-in, vs. 6 weeks and $40K+ for traditional moderated focus groups. The speed advantage that pulled teams toward synthetic exists in the real-respondent AI workflow too, without the epistemic loss. See Customer Research at Scale.
Myth: "We'll just calibrate against real data once and then run synthetic."
Reality: Calibration on a known dataset is the easy part — and it doesn't transfer. A model fine-tuned to match a 2024 baseline will drift the moment the real population's preferences move. The whole point of customer research is to detect those moves. A synthetic system, by definition, can't.
What to do instead: Treat any synthetic output that disagrees with your real-respondent data as evidence the synthetic is wrong. Use synthetic for hypothesis pre-mortems only.
What good real-respondent AI research looks like
The real win isn't synthetic vs. traditional focus groups. It's AI-moderated real-respondent research vs. traditional moderated focus groups. That comparison is where AI is actually changing the economics of customer research.
A modern AI-moderated study looks like this. The researcher writes a 6-8 question outline with the strategic objectives clearly named. A recruitment partner (or first-party customer panel) supplies 100-300 qualified respondents. The AI moderator opens each conversation, asks the planned questions, and probes follow-ups based on what each individual respondent says — handling vague answers, surfacing the "why" behind one-word responses, and noticing when someone is contradicting themselves. The transcripts come back coded, themed, and quoted. A senior researcher reads the synthesis, judges the patterns, and makes the strategic call.
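The probing behavior described above reduces to a decide-then-ask loop. A deliberately simplified sketch (the vagueness heuristic and all names are illustrative, not Perspective AI's implementation):

```python
def needs_probe(answer: str) -> bool:
    """Flag answers too thin to code: very short, or a bare hedge."""
    hedges = {"yes", "no", "maybe", "fine", "idk", "sure"}
    words = answer.lower().strip(".!? ").split()
    return len(words) <= 3 or " ".join(words) in hedges

def moderate(questions, get_answer, max_probes=2):
    """Ask each planned question; probe the 'why' when the answer is thin."""
    transcript = []
    for q in questions:
        answer = get_answer(q)
        transcript.append((q, answer))
        probes = 0
        while needs_probe(answer) and probes < max_probes:
            follow_up = f"Could you say more about why? You said: '{answer}'"
            answer = get_answer(follow_up)
            transcript.append((follow_up, answer))
            probes += 1
    return transcript

# Stub respondent: first answer is thin, the probed answer is substantive.
answers = iter(["Fine.", "Honestly the claims process took six weeks "
                         "and nobody called me back."])
t = moderate(["How was your last claims experience?"], lambda q: next(answers))
```

A production moderator replaces needs_probe with an LLM judgment and generates context-aware follow-ups, but the control flow is plausibly the same shape: ask, assess, probe, cap the probes.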
The cost: $5K-$15K for N=200, depending on incentives and panel quality. Timeline: 5-10 business days. Output: 200 real transcripts that contain genuine surprises, real disagreements, and the messy "I don't know, it depends" answers that contain the strategic information.
For a head-to-head on cost, depth, and decision quality between AI moderation and traditional focus groups, see AI vs Focus Groups in 2026. For the bolder version of the argument that the 8-person conference room itself should go, see Replace Focus Groups With AI. For the buyer's lens on platform selection, see How to Evaluate an AI Focus Group Platform and the 12 platforms ranked by research depth.
This is the lane Perspective AI is built for: AI customer interviews at scale, with real respondents, designed for the moments when forms and surveys flatten the truth. We are explicitly not a synthetic-data vendor. The product is built on the assumption that the answer lives in the customer's voice, not in a model's training set.
When to use synthetic, when to use real, when to use both
A simple decision rule for research leaders:
1. Pre-mortem on an interview guide, readability screen on stimulus, edge-case rehearsal for moderator logic: synthetic is fine, and cheap.
2. Pricing, positioning, feature prioritization, churn intervention, market entry: real respondents only.
3. Unsure which side a question falls on: treat it as customer-side and field it with real people.
The pattern: synthetic for researcher-side questions, real for customer-side questions. The moment the answer would actually change a roadmap, a price, or a positioning decision — talk to a customer.
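The rule is mechanical enough to write down. An illustrative sketch (the category strings simply paraphrase this post, nothing more):

```python
RESEARCHER_SIDE = {"interview-guide pre-mortem", "stimulus readability screen",
                   "moderator edge-case rehearsal"}
CUSTOMER_SIDE = {"pricing", "positioning", "feature prioritization",
                 "churn intervention", "market entry"}

def choose_method(question_type: str) -> str:
    """Synthetic for researcher-side questions; real respondents for
    anything that would change a roadmap, a price, or positioning."""
    if question_type in RESEARCHER_SIDE:
        return "synthetic (rehearsal only)"
    if question_type in CUSTOMER_SIDE:
        return "real respondents, AI-moderated"
    # Unknown question type: default to the side with real information.
    return "real respondents, AI-moderated"
```

Note the default branch: when in doubt, the cost of wrongly fielding real research is a week and a modest budget; the cost of wrongly trusting synthetic is a wrong roadmap.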
For the broader argument that surveys and synthetic both miss the same thing, see AI vs Surveys: Why Conversations Win for Real Customer Research. For the operational playbook on running real-respondent qualitative at scale, see AI Qualitative Research and UX Research at Scale.
Frequently Asked Questions
Are synthetic focus groups ever valid for real customer research?
No. Synthetic focus groups should not be treated as a substitute for real customer voice on any question whose answer would change a buying, pricing, positioning, or roadmap decision. They are a researcher-side tool — useful for pre-mortems on your interview guide, stimulus screens, and moderator design — but the moment the question becomes "what will customers actually do," the answer has to come from real respondents. Treat synthetic output that disagrees with real-respondent data as evidence the synthetic is wrong, not the customers.
What's the difference between synthetic users and AI-moderated interviews?
Synthetic users are LLM-simulated personas generating fake transcripts; AI-moderated interviews are real customers being interviewed by an AI moderator. The difference is which side of the conversation is artificial. With synthetic, the respondent is a model — output is bounded by training data and prompt framing. With AI moderation, the respondent is a real human and the AI is doing the moderation, follow-up, and probing — output contains genuine information from a person who exists. Vendors like Synthetic Users and Outset.ai sell the first; Perspective AI builds the second.
Why does training-data drift matter so much?
Training-data drift matters because frontier LLMs are trained on web text with a cutoff months or years before deployment, and customer preferences move faster than that. A model with a late-2023 training cutoff cannot tell you anything reliable about how 2026 customers feel about generative AI in financial services, AI-assisted home buying, or AI claims handling — because the training data predates the lived experience the question is asking about. Every month that passes widens the gap between the synthetic median and the real one.
Isn't sycophancy a solvable problem with better prompting?
Better prompting can soften sycophancy on a single turn but cannot eliminate it across a multi-turn interview, because RLHF-trained models are structurally biased toward agreement with the user. Anthropic's own published research and independent academic evaluations have documented this consistently. You can prompt a model to "disagree honestly" and get one or two pushbacks, but across a 20-question interview the cumulative drift toward agreement reasserts itself. Real customers don't have this problem — they will get frustrated, contradict you, and walk away from premises they reject.
Could synthetic focus groups improve enough to replace real ones in the future?
The structural problems — sycophancy, training-data drift, mode collapse on minority segments — are properties of how these models are built, not bugs to be patched. Even if frontier models keep improving, they will continue to reflect the median of their training data, which is by definition the past. The only way to get the present customer's voice into research is to ask the present customer. Synthetic will keep getting better at researcher-side tasks (pre-mortems, stimulus screens), but the role of real-respondent research is structurally safe.
How do I explain this to a stakeholder pushing for synthetic?
Reframe the conversation around what synthetic actually buys you: it doesn't replace real research, it makes your real research better by stress-testing your interview guide before you field it. The stakeholder probably wants the cost and speed advantage they read about. Show them that AI-moderated real-respondent research delivers most of the cost/speed gains (N=200 in a week for ~$10K) without the epistemic loss. Then show them what good AI moderation actually produces: real transcripts, real surprises, real disagreement. The pitch isn't "no to synthetic," it's "synthetic in its right lane, real customers everywhere it matters."
Conclusion
Synthetic focus groups are not a research method; they are a hypothesis-rehearsal tool that some vendors have over-pitched as a research method. The distinction matters because the cost of getting it wrong is invisible — you don't know your synthetic study was wrong until you launch the wrong product, ship the wrong price, or miss the real reason customers are churning. The structural defects (training-data drift, sycophancy, mode collapse, no genuine surprise) are properties of the technology, not bugs that will be patched.
The good news: the scale and speed advantage that made synthetic attractive is fully available on the real-respondent side. AI-moderated customer interviews run N=200 in a week, at a fraction of traditional focus group cost, with the actual customer's voice in the transcript. That's the research method synthetic focus groups were trying to be.
Perspective AI is built on real customer voice — AI-moderated interviews at scale, designed to capture the why, the it-depends, and the surprises that synthetic models structurally cannot produce. If your research question would change a real decision, start a real-respondent study — or see how Perspective AI works end-to-end. Synthetic has its uses; this isn't one of them.