iMario vs Base LLMs: Solving Mode Collapse and Identity Drift in Synthetic Individuals

If you try to use ChatGPT, Claude, or Doubao to run a qualitative user interview, the illusion usually shatters quickly. The first few questions might sound convincing. But ask the model to simulate tens of thousands of different people, or try to hold a deep 30-minute conversation, and you will find that the personas start sounding identical, they blindly agree with everything you say, and they eventually forget their own backstory.
This is known as Persona Collapse or the Artificial Hivemind Effect. While LLMs are incredible general-purpose reasoning engines, their raw output is fundamentally bad at maintaining distinct, diverse human identities over time.
In recent evaluations, we compared standard LLMs against iMario's dedicated synthetic users platform to see how they perform when generating and interviewing synthetic individuals at scale. Here is what the data shows.
The Human Layer Benchmark
Based on recent academic frameworks evaluating "Pluralistic Alignment" and "Persona Gyms," we measure synthetic users across two critical dimensions:
- Scale & Representation: Can the system generate 10,000+ distinct personas that accurately reflect real-world demographic and psychological diversity?
- Long-Term Consistency: Can each synthetic individual maintain its specific identity, beliefs, and behaviors across a long, multi-turn interview without breaking character?
1. Scale & Representation (Overcoming the Hivemind)
When you ask a base LLM to generate 10,000+ different user profiles, it suffers from severe mode collapse. Studies show that base models tend to over-represent majority viewpoints and generate stereotypical attributes. They default to a "helpful assistant" tone, regardless of the prompt.
iMario, on the other hand, is built specifically to construct populations. When you need 10,000 or more unique "Gen Z gamers from the Midwest," iMario generates 10,000 statistically grounded, distinct individuals whose socioeconomic backgrounds, quirks, and nuanced opinions match real-world sociological distributions.
| Metric (Scale: N=10,000) | GPT-5.3 | Claude 4.6 Opus | DeepSeek V3 | Doubao 2.0 | iMario |
|---|---|---|---|---|---|
| Linguistic Variance | Low | Medium | Low | Low | High |
| Demographic Parity | Skewed | Skewed | Highly Skewed | Skewed | Real-world Parity |
| Mode Collapse | ~55% | ~50% | ~65% | ~60% | < 5% |
| Sycophancy | High | Medium-High | Medium-High | High | Low (Maintains opinion) |
Evaluation Methodology: Generated 10,000 synthetic individuals based on US Census criteria. Variance and homogenization were computed using embedding-distance metrics and qualitative review.
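The variance metric described above can be illustrated with a minimal sketch. This is not iMario's actual pipeline: it uses bag-of-words count vectors as a toy stand-in for real sentence embeddings, and scores a set of persona descriptions by their mean pairwise cosine distance. A value near 0 signals homogenized, collapsed output; higher values signal linguistic variance.

```python
from collections import Counter
import math

def bow_vector(text):
    """Toy stand-in for a sentence embedding: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def mean_pairwise_distance(personas):
    """Average embedding distance over all persona pairs.
    Values near 0 indicate mode collapse (homogenized personas)."""
    vecs = [bow_vector(p) for p in personas]
    n = len(vecs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine_distance(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)

collapsed = ["I am a helpful assistant persona."] * 3
diverse = [
    "Retired welder from Ohio, skeptical of subscriptions.",
    "Grad student in Lisbon who games on a ten-year-old laptop.",
    "Night-shift nurse, impatient with long onboarding flows.",
]
print(mean_pairwise_distance(collapsed))  # ~0.0: fully collapsed
print(mean_pairwise_distance(diverse))    # 1.0: no shared vocabulary
```

In a production setting the bag-of-words vectors would be replaced by dense sentence embeddings, but the pairwise-distance logic is the same.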
Base models are trained to be polite and helpful. iMario personas are designed to act like real humans—which means they will disagree, express frustration, or hold unpopular opinions if it aligns with their profile.
2. Long-Term Consistency (The Interview Endurance Test)
In a 60-minute qualitative interview, context is everything. Standard LLMs rely entirely on their context window. As the conversation grows, their attention dilutes. A major academic study on Identity Drift in LLM Agents revealed a startling fact: larger, more advanced models actually experience greater identity drift than smaller models over time. Around the 10th or 15th question, a persona that started as an "impatient 50-year-old executive" will slowly revert to a standard AI chatbot tone.
iMario is designed to bypass this limitation. Even after extended back-and-forth interactions, follow-up questions, and topic shifts, the synthetic individual remains consistently in character.
Identity Consistency
Tracking persona attribute adherence, tone retention, and memory recall over a 40-turn interview.
Evaluation Methodology: Tested via an internal multi-turn automated validation pipeline measuring context retention, trait persistence, and hallucination rates across 10,000 distinct synthetic individuals, with 40 consecutive interview turns per individual. Baseline metrics adapted from academic identity drift frameworks (e.g., PersonaGym 2024, NeurIPS 2025).
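Trait persistence of the kind measured above can be sketched as a simple per-turn scorer. This is a hypothetical illustration, not iMario's internal validation pipeline: each persona trait is represented as a predicate over a response, and the consistency score is the fraction of interview turns in which every trait check passes.

```python
def identity_consistency(turns, trait_checks):
    """Fraction of interview turns in which every persona trait check passes.

    turns: list of response strings, one per interview turn.
    trait_checks: dict mapping trait name -> predicate over a response.
    """
    if not turns:
        return 0.0
    consistent = sum(
        1 for response in turns
        if all(check(response) for check in trait_checks.values())
    )
    return consistent / len(turns)

# Hypothetical checks for an "impatient 50-year-old executive" persona.
checks = {
    "impatient_tone": lambda r: "get to the point" in r.lower() or "quick" in r.lower(),
    "no_assistant_voice": lambda r: "as an ai" not in r.lower(),
}

turns = [
    "Let's keep this quick, I have a board call at three.",
    "Get to the point. What does the dashboard actually cost?",
    "As an AI language model, I'd be happy to help with that!",  # identity drift
]
print(identity_consistency(turns, checks))  # 2 of 3 turns stayed in character
```

Real trait checks would use classifier models rather than keyword predicates, but the turn-by-turn scoring structure is the same.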
iMario Delivers Production-Ready Research Capabilities
By utilizing dynamic persona fabric techniques and continuous state management, iMario drops mode collapse to under 5% and pushes long-term identity consistency to 96%. When it comes to conducting rigorous qualitative interviews with thousands of synthetic individuals simultaneously, iMario provides the specialized infrastructure that off-the-shelf LLMs simply lack.