Beyond "Average": How iMario Is Building a Million Synthetic Individuals
Imagine you are designing a new consumer product. You want to know how the users will react to its design, features or price. You ask a standard LLM like GPT-5 to "simulate 100 different people" to test.
What happens? You usually get a hundred versions of a polite, slightly cautious, middle-of-the-road "Average Joe." This is what researchers call "Mode Collapse." In the real world, people are messy, contradictory, and diverse. They are activists, skeptics, grandmas who don't trust technology, and teenagers who live on TikTok, rednote, etc.
What if you could observe how millions of synthetic individuals - each with a unique background, personality, and desires - interact, react, and make choices. This isn't science fiction anymore.

iMario Persona Engine
Traditional AI agents are often "hand-crafted", creating these detailed personas was a manual, painstaking process. You tell the AI: "You are a 40-year-old marketing expert with rich experience of the FMCG industry in the US." with his life stories possibly. This works for a handful of characters, but for millions of unique individuals, it's a scaling nightmare.
iMario.ai introduces a way to realize this, specialized LLM-based pipelines that turn simple "seed" information into full-blown synthetic personas. A persona is not just a description; it’s a structured, machine-readable dataset of characteristics, biographies, and preferences.
The Two-Step "Digital Birth"
-
Stage 1: The Blueprint (Macro)
Instead of writing a biography immediately, the AI first picks "coordinates" in a high-dimensional space of traits (age, political views, openness to change, etc.). It uses an autoregressive approach, meaning every new person it creates intentionally tries to be different from the ones created before. -
Stage 2: The Flesh (Micro)
Once the traits are set, the AI fills in the details - names, life stories, specific memories, and even how they talk.
Injecting Diversity: The Art of Attribute Sampling
The real breakthrough, however, is not just generating a persona, but how to generate a diverse population of personas. If you just ask an LLM to generate 1,000 "young people interested in NEV," you will likely get 1,000 highly similar profiles. Why? Because LLMs are trained on the statistical modes of their data. They will gravitate towards the most common representation.
We use a powerful sampling strategy to define a desired distribution for each attribute and, crucially, the joint distribution between attributes.
In a simple random sampling, the population might lack critical data points and under-represent whole groups. In a structured sampling, like stratified or importance sampling, you can explicitly target those gaps and generate a population that perfectly reflects your desired distribution. You can ensure that your synthetic individuals has the correct mix of ages, professions, ethnicities, and opinions, making the whole system more realistic and useful.
This control over attribute sampling is what truly unlocks diversity at scale. It means you can generate not just many personas, but all kinds of personas.
What Makes A Persona Real? (Coherence and Realism)
Of course, diversity is meaningless if the individuals make no sense. A 20-year-old student with a Ph.D. and 30 years of management experience is useless. To ensure coherence, iMario employs structured prompting and verification techniques.
The "chain-of-thought" (CoT) prompting style, where the LLM is encouraged to break down its reasoning into smaller steps, is implicit in our sequential generation process. By generating attributes one by one, the model is guided towards internally consistent results. A prompt to generate a character's interests can be made contingent on the previously generated age and location.
iMario is equipped with checks that detect and prune logically flawed personas. Imagine a check that Flags: (Attribute A = 20 years old) + (Attribute B = Ph.D.) as highly unlikely or impossible. This automated verification is crucial for maintaining the quality and realism of the population. The generated biographies must also capture the nuance of a real person's "voice" - a feature that LLMs excel at.
Imaging A New World
What is the ultimate destination for these millions of diverse, high-fidelity synthetic people?
-
A "digital twin" of your city. Urban planners could populate it with diverse synthetic residents, each with a complex set of behaviors and transportation needs. They could then model the impact of a new bus line or a changes in zoning laws on this realistic population.
-
Market researchers could conduct massive, detailed focus groups with synthetic consumers who perfectly mirror the demographics and psychographics of their target market, getting initial feedback on a new product without the cost and time of human testing.
-
Game developers could populate an entire game world with distinct, complex NPCs who have deep backstories, complex relationships with other characters, and unique long-term goals.
-
In the social sciences, researchers can run controlled simulations to study the spread of ideas, the formation of social norms, or the dynamics of political polarized groups.
-
And more...
We are moving towards a future where we can create complex "synthetic communities" that are indistinguishable from human society in terms of structure, diversity, and behavioral patterns.
Last: The Ethics of Digital Doppelgängers
To be honest, the biggest challenge is still "the bias". Since synthetic individuals are powered by LLMs, which are trained on internet data, they risk amplifying existing societal biases. A synthetic individual might disproportionately generate male engineers and female nurses, or associate certain interests with specific ethnic groups. The same attribute sampling method that enables diversity could be misused to intentionally skew distributions, creating an inherently biased population.
Another concern is misuse. These digital personas are so realistic that they could be used for malicious purposes, such as generating large volumes of astroturfing content (fake product reviews, fabricated social media posts) or creating highly convincing digital companions for social engineering attacks (scamming).