What Are Synthetic Individuals And Why They Can Represent Real Humans

Part I — What synthetic individuals are
Synthetic individuals (also called synthetic users, synthetic humans, virtual respondents, or digital humans) are constrained AI agents designed to simulate a specific person and how that person would respond in a defined context.
Two words matter:
- Constrained: a profile + boundaries (who they are, what they know, what they care about, what they won’t claim).
- Context: the same “person” should react differently to a pricing choice vs. an onboarding flow vs. a sales objection.
Not personas! Not roleplay!
People usually start with “LLM roleplay.” It’s fluent but unstable. Synthetic individuals are closer to executable personas: you can run them repeatedly, vary the stimulus, and compare outputs across iterations.
My own mental model: a persona is a slide; a synthetic individual is a simulator.
It’s closer to a flight simulator than a focus group: you run it to discover failure modes and sharpen hypotheses.
| Approach | What it is | Strength | Typical failure |
|---|---|---|---|
| Static personas | A doc describing a segment | Alignment | Not executable; goes stale |
| LLM roleplay | One-off prompt: “pretend you are…” | Ideation | Drifts; overconfident |
| Synthetic individuals | Profile + boundaries (+ state) | Repeatable simulation | Needs calibration |
| Real humans | Lived experience | Truth, novelty | Slow; biased; costly |
What’s inside a synthetic individual?
- Seed: demographic attributes + constraints.
- Soul: traits + stances + preferences.
- Memory: life events + what just happened, trust, fatigue.
- Engine: an LLM for language and reasoning.
- Governance: consistency checks, safety, and “I don’t know” defaults.
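The components above can be sketched as a simple data structure. This is a minimal illustration, not an established schema: the class name, fields, and prompt format are all assumptions chosen to mirror the seed/soul/memory/governance breakdown, with governance expressed as inlined rules including the “I don’t know” default.

```python
from dataclasses import dataclass, field

@dataclass
class SyntheticIndividual:
    # Seed: demographic attributes plus hard constraints.
    seed: dict
    # Soul: traits, stances, and preferences.
    soul: dict
    # Memory: life events plus recent interaction state (trust, fatigue).
    memory: list = field(default_factory=list)

    def build_prompt(self, stimulus: str) -> str:
        """Assemble a constrained prompt; governance rules are inlined."""
        rules = (
            "Stay in character. If asked about something outside your "
            "profile or memory, say 'I don't know' rather than invent."
        )
        return (
            f"Profile: {self.seed}\n"
            f"Stances: {self.soul}\n"
            f"Recent events: {self.memory[-3:]}\n"
            f"Rules: {rules}\n\n"
            f"Stimulus: {stimulus}\nResponse:"
        )

# The same "person" gets different stimuli; the profile and rules stay fixed.
maria = SyntheticIndividual(
    seed={"age": 34, "role": "ops manager", "region": "DACH"},
    soul={"risk_tolerance": "low", "cares_about": "predictable costs"},
    memory=["churned from a tool after a surprise price increase"],
)
prompt = maria.build_prompt("Would you accept usage-based pricing?")
```

The point of the structure is that the engine (the LLM) only ever sees the stimulus through this constrained frame, which is what separates a synthetic individual from one-off roleplay.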
SOP: a practical definition of “good enough”
“Is it real?” is a trap question. My rule is: trust synthetic output at the level you can calibrate.
A useful calibration loop is Synthetic–Organic Parity (SOP): run the same instrument with a human benchmark cohort and a matched synthetic cohort, then compare distributions and correlations (not just averages). Repeat as context changes.
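The distribution-level comparison in the SOP loop can be sketched in a few lines. This is an illustrative stdlib-only example, assuming 1-7 Likert responses to the same survey item from both cohorts; `ks_distance` and `sop_report` are hypothetical names, and a real setup would also compare correlations across items, as noted above.

```python
from statistics import mean

def ks_distance(a, b):
    """Max gap between the two empirical CDFs: a simple, scale-free
    check that whole distributions match, not just their averages."""
    pooled = sorted(set(a) | set(b))
    cdf = lambda xs, v: sum(x <= v for x in xs) / len(xs)
    return max(abs(cdf(a, v) - cdf(b, v)) for v in pooled)

def sop_report(human, synthetic):
    """Compare a human benchmark cohort to a matched synthetic cohort."""
    return {
        "mean_gap": abs(mean(human) - mean(synthetic)),
        "ks_distance": ks_distance(human, synthetic),
    }

# Example: identical means, but the KS distance still exposes a
# distributional difference the averages alone would hide.
human = [5, 6, 4, 7, 5, 6, 3, 5]
synthetic = [5, 5, 4, 6, 5, 7, 4, 5]
report = sop_report(human, synthetic)
print(report)  # mean_gap is 0.0, ks_distance is 0.125
```

Re-running this report as context changes (new segment, new instrument) is what turns parity from a belief into a measured, recalibrated quantity.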
Where synthetic individuals help today
- Concept and messaging exploration: iterate fast before you buy attention.
- Discussion-guide rehearsal: find weak questions and missing probes.
- Segment contrast: see how objections and trade-offs differ by audience.
- Sales rehearsal: pressure-test objections for a defined ICP.
Where they don’t yet
- High-emotion truth: trauma, grief, end-of-life.
- Physical behavior observation: real environments, real friction.
- True novelty with no meaningful reference data.
- Compliance-critical studies where real consent is required.
Part II — Why synthetic individuals can represent real humans
“Represent” doesn’t mean “be conscious” or “be identical.” It means preserving decision-relevant patterns: how trade-offs cluster, how preferences shift by segment, and how people justify choices in language.
1) Direct evidence for subgroup-level proxying in surveys
Argyle et al. (Out of One, Many, 2023) test whether language models can proxy sub-populations. With proper conditioning, they generate “silicon samples” that emulate subgroup response distributions and relationship patterns observed in human data (for the domain studied).
2) LLMs make agents legible in natural language
Park et al. (2023) show “generative agents” that store experiences, reflect, and plan, producing believable individual and social behaviors in a sandbox. For product teams, this matters because simulation can now speak the language of interviews, objections, and narratives.
3) Large-scale experimental evidence for behavioral fidelity
- Replication at scale: Cui, Li & Zhou (2025) replicated 156 scenario-based experiments in psychology and management using three LLMs, reporting 73–81% replication of main effects.
- Preference estimation: Brand, Israeli & Ngwe (HBS Working Paper, 2023; revised 2025) show willingness-to-pay estimates derived from LLM response distributions are realistic and comparable to human studies, and that incorporating prior survey data via fine-tuning can further improve alignment in similar contexts.
- Strategic social behavior: Palatsi et al. (2025) build a “digital twin” of game-theoretic experiments and report an LLM configuration that reproduces human cooperation patterns with high fidelity—then uses the calibrated simulator to generate preregisterable predictions for new game settings.
Put differently: synthetic individuals become plausible proxies when (a) the task is language-mediated, (b) the agent is constrained, and (c) you treat parity as a measurable target (SOP), not a belief.
Best practice: hybrid research (speed first, humans for validation)
The most pragmatic strategy is:
- Synthetic for speed: exploration, iteration, rehearsal.
- Humans for validation: high-stakes decisions and emotional nuance.
- SOP over time: treat parity as a continuous calibration problem.