How 10,000 Synthetic Patients at mock.health Stack Up.
We ran 10K patients through Census, CDC, and comorbidity benchmarks. 15/15 pairs correct. 18/20 prevalences in range. Honest numbers.
the nighthawk · 12 min read · 2026-03-23
We generated 10,000 synthetic patients and compared them against published population health benchmarks from the CDC, CMS, AHA, and US Census Bureau. No cherry-picking. No "representative examples." Every number below comes from the full 10,000-patient population with complete bundle analysis.
Most synthetic data generators claim "realistic" output. Few show their work. Here's ours — including the parts where we fall short.
Why Synthea's Architecture Falls Short
Synthea is the most widely used open-source synthetic patient generator, and it's a genuinely impressive project. It works by running JSON state machines — one per disease. There's a module for diabetes, a module for COPD, a module for hypertension, and so on. Each module independently decides whether a patient develops that condition based on hardcoded probabilities.
The problem is the word "independently."
In the real world, a 45-year-old doesn't accumulate diabetes, hypertension, and heart failure through three separate coin flips. These conditions form a metabolic cascade: obesity raises the risk of insulin resistance, which raises the risk of hypertension, which raises the risk of coronary disease, which raises the risk of heart failure. Synthea's architecture can express these dependencies — modules can check what other modules have done — but someone has to hand-author every pairwise interaction. With 25+ chronic conditions, that's hundreds of clinical relationships to manually encode and calibrate. Nobody has done it. In practice, most modules make their onset decisions alone.
The result: patients whose conditions are structurally valid but statistically wrong. You get diabetics without hypertension, 30-year-olds with dementia, and comorbidity correlations near zero where they should be strongly positive. The FHIR parses fine. A clinician would flag it in seconds. (This is the same problem with publicly available sandboxes — structurally valid but clinically empty.)
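The failure mode is easy to demonstrate. A minimal sketch (the probabilities are illustrative, not Synthea's actual module parameters): when two conditions are sampled independently, the diabetes rate is the same whether or not the patient is hypertensive; conditioning one draw on the other produces the enrichment real cohorts show.

```python
import random

random.seed(0)
N = 100_000

# Independent draws, Synthea-style: P(diabetes) ignores hypertension status.
htn = [random.random() < 0.47 for _ in range(N)]
dm_indep = [random.random() < 0.11 for _ in range(N)]

# Conditioned draws: diabetes risk depends on hypertension (made-up numbers).
dm_cond = [random.random() < (0.18 if h else 0.07) for h in htn]

def p_given(dm, flag):
    """P(diabetes | hypertension == flag)."""
    hits = [d for d, h in zip(dm, htn) if h == flag]
    return sum(hits) / len(hits)

print(f"independent: P(dm|htn)={p_given(dm_indep, True):.3f} "
      f"P(dm|~htn)={p_given(dm_indep, False):.3f}")  # both near 0.11
print(f"conditioned: P(dm|htn)={p_given(dm_cond, True):.3f} "
      f"P(dm|~htn)={p_given(dm_cond, False):.3f}")   # roughly 0.18 vs 0.07
```

The first pair of numbers is what "three separate coin flips" buys you: zero association, no matter how carefully each marginal rate was calibrated.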
How Our Markov Module Works
Instead of hand-authoring clinical rules, we learned them from data.
We estimated transition probabilities from 4.4 million patient journeys in the Medical Expenditure Panel Survey (MEPS), a nationally representative longitudinal survey of healthcare utilization conducted by AHRQ. The core idea is a Markov model: what happens to a patient next depends on their current state — age, sex, and which conditions are already active — not the full history of how they got there. This turns out to be a surprisingly good fit for chronic disease progression. (It's a simplification — real disease isn't purely Markovian — but the transition matrices learned from millions of real patient-months capture enough of the signal to produce realistic populations.)
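Estimating such transition probabilities is, at its core, counting. A toy sketch of the idea — the three journeys below are invented for illustration, not MEPS data, and the real pipeline stratifies by age and sex over millions of patient-months:

```python
from collections import Counter

# Toy longitudinal journeys: one list of monthly condition sets per patient.
journeys = [
    [set(), {"obesity"}, {"obesity"}, {"obesity", "t2d"}],
    [{"obesity"}, {"obesity"}, {"obesity", "t2d"}, {"obesity", "t2d"}],
    [set(), set(), set(), set()],
]

# Count month-to-month transitions into t2d, stratified by prior obesity.
counts = Counter()
for journey in journeys:
    for prev, curr in zip(journey, journey[1:]):
        if "t2d" in prev:
            continue  # only count months where onset is still possible
        stratum = "obese" if "obesity" in prev else "not_obese"
        counts[(stratum, "t2d" in curr)] += 1

for stratum in ("obese", "not_obese"):
    onset = counts[(stratum, True)]
    at_risk = onset + counts[(stratum, False)]
    print(f"P(t2d onset/month | {stratum}) = {onset / at_risk:.2f}")
```

The Markov assumption is what makes this tractable: because the next month depends only on the current state, every patient-month is a training example, and the comorbidity structure falls out of the stratified counts rather than hand-authored rules.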
The model tracks 25 chronic condition groups simultaneously through a shared bitmask. Every simulated month, it runs six learned sub-models. The two that matter most:
Chronic onset decides whether the patient develops new conditions, conditioned on what they already have. This is where comorbidity correlations come from. Obesity raises the onset probability for Type 2 diabetes. Type 2 diabetes raises the probability of coronary artery disease. CAD raises the probability of heart failure. We didn't author these rules — they fell out of what real patients in the MEPS actually experienced. Multiply 25 conditions × 12 age/sex strata × 8 risk factor combinations and you have a model that captures thousands of pairwise clinical interactions that nobody would hand-curate.
Condition-specific medications and procedures determine what each condition generates. A Type 2 diabetic in the 50–64 age group has an 80% monthly probability of an antidiabetic prescription and a 60% probability of a blood pressure medication. A CHF patient has a 45% monthly probability of an echocardiogram. A diabetic gets A1C monitoring. Each probability is condition-specific, age-stratified, and sex-stratified — all estimated from prescribing and procedure patterns in the MEPS.
The remaining sub-models handle acute events (ED visits, hospitalizations), utilization tiers (low utilizers vs. the top 5% "frequent flyer" pattern), and medication complexity escalation (patients accumulate drug classes over time the way real patients do — metformin first, then BP meds, then statins, then insulin). All learned, not authored.
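The monthly loop described above can be sketched in a few lines. This is an illustrative toy, not the production model: only three of the 25 condition bits are shown, the onset probabilities are invented, and only the three medication/procedure probabilities are the ones quoted in this post for the 50–64 stratum.

```python
import random

# Three of the 25 condition bits, for illustration.
T2D, HTN, CHF = 1 << 0, 1 << 1, 1 << 2

def onset_prob(condition, state):
    """Monthly onset probability conditioned on the current bitmask.

    The real matrices are learned from MEPS and stratified by age/sex;
    these numbers are invented for the sketch.
    """
    if condition == HTN:
        return 0.004 if state & T2D else 0.002
    if condition == CHF:
        return 0.001 if state & (T2D | HTN) else 0.0002
    return 0.001

# Monthly event probabilities quoted in the post for the 50-64 stratum.
MED_PROBS = {
    T2D: [("antidiabetic_rx", 0.80), ("bp_med_rx", 0.60)],
    CHF: [("echocardiogram", 0.45)],
}

def step_month(state, rng):
    """One simulated month: chronic onset, then condition-driven events."""
    for cond in (T2D, HTN, CHF):
        if not state & cond and rng.random() < onset_prob(cond, state):
            state |= cond  # condition onsets and persists
    events = [name
              for cond, items in MED_PROBS.items() if state & cond
              for name, p in items if rng.random() < p]
    return state, events

rng = random.Random(42)
state, rx_months = T2D, 0  # start as a 50-64 Type 2 diabetic
for _ in range(120):       # simulate ten years month by month
    state, events = step_month(state, rng)
    rx_months += "antidiabetic_rx" in events
print(f"months with an antidiabetic fill: {rx_months}/120")
```

The shared bitmask is the load-bearing design choice: because every onset decision reads the full condition state, the comorbidity correlations measured later in this post come from this one mechanism rather than from pairwise hand-tuning.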
The Setup
We generated 10,000 patients using age/sex-stratified Markov models. The strata were designed to approximate US Census 2020 proportions:
| Stratum | Generated | Census | Delta |
|---|---|---|---|
| 0–17 | 22.0% | 22.0% | 0.0pp |
| 18–34 | 21.0% | 21.2% | −0.2pp |
| 35–49 | 19.0% | 19.5% | −0.5pp |
| 50–64 | 19.0% | 19.4% | −0.4pp |
| 65–79 | 13.0% | 12.7% | +0.3pp |
| 80+ | 6.0% | 5.2% | +0.8pp |
Maximum deviation: 0.8 percentage points. Sex split landed at 52.0% female / 47.9% male versus the Census 50.8/49.2 — a 1.2pp delta. Both are within the margin you'd expect from Synthea's base demographic engine and intentional rounding of generation quotas.
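The delta column reduces to a few lines of arithmetic over the table's own numbers:

```python
# Generated vs. Census 2020 stratum shares from the table above (percent).
strata = {
    "0-17":  (22.0, 22.0), "18-34": (21.0, 21.2), "35-49": (19.0, 19.5),
    "50-64": (19.0, 19.4), "65-79": (13.0, 12.7), "80+":   (6.0, 5.2),
}
deltas = {s: round(gen - cen, 1) for s, (gen, cen) in strata.items()}
max_dev = max(abs(d) for d in deltas.values())
print(f"max deviation: {max_dev}pp")  # 0.8pp, in the 80+ stratum
```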
The whole run took 15 minutes on a single machine. No GPUs required for structural generation. Zero failed batches. 100% yield.
What 10,000 Patients Look Like
Each patient is a complete FHIR R4 Bundle — not a flat table with demographics, but a full clinical record. Across the population:
Over 818,000 encounters averaging 82 per patient (median: 66). Nearly 670,000 medication requests across 7,823 unique formulations. More than 8.1 million FHIR resources total, every one validating against US Core profiles.
The median patient has 9 coded conditions, 66 encounters spanning 40 years of clinical history, and 6 active prescriptions. The distribution is right-skewed, as it should be — 29% of patients carry the maximum 10 tracked conditions, reflecting the heavy disease burden in older age strata, while 9% have zero chronic conditions, reflecting the healthy pediatric and young adult population.
Condition Prevalence
This is where most synthetic data generators fall apart. It's easy to generate some hypertension. The question is whether the rate matches what CDC and CMS report in real populations.
We compared our 10,000-patient population (7,800 adults) against published adult prevalence rates:
| Condition | Synthetic | Published | Ratio | Reference |
|---|---|---|---|---|
| Depressive disorder | 9.0% | 8.4% | 1.07x | NIMH Statistics: 8.4% major depressive episode in adults |
| Anxiety disorder | 15.9% | 19.2% | 0.83x | NIMH Statistics: 19.2% any anxiety disorder |
| Asthma | 14.3% | 8.0% | 1.79x | CDC NHIS 2022: 8.0% current asthma |
| Obesity | 30.4% | 42.0% | 0.72x | CDC NHANES 2017–2020: 42.0% of adults |
| Hypertension | 34.6% | 47.4% | 0.73x | CDC NHIS 2022: 47.4% of adults |
| CKD | 11.0% | 15.0% | 0.73x | CDC CKD Surveillance: 15.0% all stages |
| A-fib | 2.9% | 4.0% | 0.73x | AHA Heart Statistics 2023: 3–4% age-adjusted |
| Type 2 diabetes | 7.1% | 11.3% | 0.63x | CDC Diabetes Report 2022: 11.3% diagnosed |
| Prediabetes | 23.5% | 38.1% | 0.62x | CDC Diabetes Report 2022: 38.1% by lab criteria |
| Hyperlipidemia | 20.4% | 33.7% | 0.61x | CDC NHANES: 33.7% on lipid-lowering therapy |
| COPD | 3.2% | 4.7% | 0.68x | CDC NHIS 2022: 4.7% of adults |
| Coronary disease | 3.8% | 6.0% | 0.63x | AHA Heart Statistics 2023: 6.0% CHD |
| Chronic liver disease | 2.7% | 4.5% | 0.60x | AASLD: ~4.5% estimated prevalence |
| Cancer | 3.2% | 5.5% | 0.58x | NCI SEER: 5.5% cancer prevalence |
| Heart failure | 1.2% | 2.1% | 0.57x | AHA Heart Statistics 2023: 2.1% |
| Substance abuse | 4.4% | 7.9% | 0.56x | SAMHSA NSDUH 2022: 7.9% SUD past year |
| PAD | 4.3% | 6.5% | 0.66x | AHA Heart Statistics 2023: 6.5% age 40+ |
| Stroke | 1.6% | 3.0% | 0.53x | AHA Heart Statistics 2023: 3.0% prevalence |
| Dementia | 2.9% | 6.7% | 0.43x | Alzheimer's Association 2023: 6.7% of adults 65+ |
| Type 1 diabetes | 1.0% | 0.5% | 2.08x | CDC Diabetes Report 2022: ~0.5% Type 1 |
18 out of 20 conditions fall within 0.5–2.0x of published rates. Mean prevalence ratio across all conditions: 0.79x.
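The headline numbers can be recomputed from the table. One wrinkle: at the table's rounding, Type 1 diabetes works out to exactly 2.0x (the unrounded ratio is 2.08x), so this sketch uses an open interval to reproduce the 18/20 count.

```python
# (synthetic %, published %) for the 20 conditions in the table above.
prevalence = {
    "depression": (9.0, 8.4),      "anxiety": (15.9, 19.2),
    "asthma": (14.3, 8.0),         "obesity": (30.4, 42.0),
    "hypertension": (34.6, 47.4),  "ckd": (11.0, 15.0),
    "afib": (2.9, 4.0),            "t2d": (7.1, 11.3),
    "prediabetes": (23.5, 38.1),   "hyperlipidemia": (20.4, 33.7),
    "copd": (3.2, 4.7),            "cad": (3.8, 6.0),
    "liver_disease": (2.7, 4.5),   "cancer": (3.2, 5.5),
    "heart_failure": (1.2, 2.1),   "substance_abuse": (4.4, 7.9),
    "pad": (4.3, 6.5),             "stroke": (1.6, 3.0),
    "dementia": (2.9, 6.7),        "t1d": (1.0, 0.5),
}
ratios = {c: synth / pub for c, (synth, pub) in prevalence.items()}
in_range = sum(0.5 < r < 2.0 for r in ratios.values())
mean_ratio = sum(ratios.values()) / len(ratios)
print(f"{in_range}/20 in range, mean ratio {mean_ratio:.2f}x")
```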
The systematic undershoot — most ratios sitting between 0.5x and 0.8x rather than clustering around 1.0x — is expected and has a clear explanation. Our transition matrices were estimated from coded encounter data in the MEPS: conditions that were diagnosed, documented, and billed. Published prevalence rates, especially for conditions like prediabetes (38.1%) and CKD (15.0%), include screening-detected cases that may never appear as a coded diagnosis in an actual EHR. A patient whose A1C is 5.8% has prediabetes by lab criteria, but their chart might never carry that ICD-10 code. Our model generates the latter, not the former.
The two outliers tell specific stories. Dementia at 0.43x reflects the difficulty of modeling a condition whose prevalence denominator is restricted to adults 65+ while our population includes all ages. Type 1 diabetes at 2.08x likely reflects coding ambiguity in the MEPS — some insulin-dependent Type 2 diabetics get coded under Type 1 ICD-10 codes, inflating the learned onset probabilities. (We're honestly not 100% sure about the T1D explanation — it could also be a training data artifact we haven't fully diagnosed.)
Depression at 1.07x is worth highlighting. Getting major depressive disorder within 7% of the NIMH reference rate — without any condition-specific tuning for that particular diagnosis — is a strong signal that the Markov model's learned onset probabilities are capturing real epidemiological patterns, not just noise.
The Comorbidity Test
Getting individual prevalence rates right is necessary but not sufficient. The real test: do conditions show up together the way they do in actual patients? A hypertensive diabetic with CKD is one patient, not three independent coin flips.
We tested 15 clinically established comorbidity pairs using phi coefficients (φ), a measure of statistical association between binary variables. Each pair was selected because the positive correlation is well-documented in clinical literature:
| Pair | φ | Reference |
|---|---|---|
| Hypertension × CKD | +0.328 | KDIGO 2021 Clinical Practice Guideline: hypertension present in 67–92% of CKD patients |
| Obesity × Hypertension | +0.254 | AHA Scientific Statement 2021: obesity accounts for 65–78% of primary hypertension |
| Depression × Anxiety | +0.250 | NIMH Comorbidity: ~60% of those with major depression also meet criteria for an anxiety disorder |
| Hypertension × Type 2 diabetes | +0.169 | ADA Standards of Care 2023: hypertension affects ~75% of adults with diabetes |
| Hypertension × Coronary disease | +0.148 | AHA Heart Disease Statistics 2023: hypertension is the leading modifiable risk factor for CHD |
| Obesity × Type 2 diabetes | +0.123 | CDC Diabetes Report 2022: ~89% of adults with diabetes are overweight or obese |
| Type 2 diabetes × CKD | +0.102 | USRDS 2022 Annual Data Report: diabetes is the leading cause of CKD, accounting for ~38% of ESKD |
| CHF × Atrial fibrillation | +0.083 | Framingham Heart Study: AF prevalence ~25–50% in heart failure populations |
| Depression × Substance abuse | +0.077 | SAMHSA 2022 NSDUH: 37.9% of adults with SUD had a concurrent mental illness |
| Heart failure × Coronary disease | +0.076 | AHA 2023: ischemic heart disease is the etiology in ~50% of HF cases |
| PAD × Coronary disease | +0.053 | PARTNERS Study: PAD patients have 2–6x elevated risk of MI and coronary death |
| Coronary disease × Stroke | +0.034 | AHA 2023: CHD and stroke share atherosclerotic etiology and risk factors |
| Chronic liver disease × Substance abuse | +0.029 | AASLD Practice Guidance 2023: alcohol-associated liver disease accounts for ~50% of cirrhosis deaths in the US |
| COPD × Coronary disease | +0.022 | GOLD Report 2023: cardiovascular disease is the leading cause of death in mild-moderate COPD |
| Dementia × Stroke | +0.013 | Lancet Commission on Dementia 2020: stroke approximately doubles the risk of subsequent dementia |
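For reference, φ comes straight from the 2×2 contingency counts of the two condition flags. A self-contained sketch — the eight-patient toy cohort below is made up, not drawn from the 10K population:

```python
def phi_coefficient(pairs):
    """Phi for two binary variables given as (a, b) 0/1 pairs.

    phi = (n11*n00 - n10*n01) / sqrt(n1. * n0. * n.1 * n.0)
    """
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    num = n11 * n00 - n10 * n01
    den = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    return num / den if den else 0.0

# Toy cohort: (hypertension, CKD) flags for eight hypothetical patients.
cohort = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1)]
print(round(phi_coefficient(cohort), 3))
```

Note that φ values look small relative to familiar correlation coefficients: for rare binary conditions, even strong clinical associations cap out well below 1.0, which is why the magnitudes in the table matter less than the signs and the ordering.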
15 out of 15 pairs show the correct positive correlation direction. The strongest associations — hypertension/CKD (+0.328), obesity/hypertension (+0.254), depression/anxiety (+0.250) — land where the literature says they should be.
This falls out of the Markov architecture. The model tracks all active conditions through a shared bitmask, so the conditional probability of developing CKD given existing hypertension reflects the actual statistical relationship from MEPS patient journeys. We didn't hand-tune correlation targets — they're emergent properties of a model trained on longitudinal encounter data.
Disease Onset Ages
Temporal consistency is the axis that breaks most synthetic generators. A 20-year-old with dementia or a 5-year-old with COPD might pass structural validation but will fail clinical review immediately.
We measured median onset ages across the full 10,000-patient population against expected clinical ranges derived from epidemiological literature:
| Condition | Median Onset | Expected Range | Reference |
|---|---|---|---|
| Asthma | 25 | 5–30 | CDC NHIS 2022: prevalence peaks in childhood; ~50% of cases onset before age 12 |
| Anxiety | 25 | 15–40 | NIMH Statistics: median age of onset 11 for phobias, 21–35 for GAD/panic |
| Obesity | 28 | 25–55 | CDC NHANES 2017–2020: prevalence rises sharply from age 20, peaks 40–59 |
| Depression | 29 | 20–45 | NIMH Statistics: highest prevalence in 18–25 age group; median onset mid-20s |
| Prediabetes | 36 | 35–60 | CDC Diabetes Report 2022: prevalence increases from age 35, peaks 45–64 |
| Hypertension | 39 | 35–65 | AHA Heart Statistics 2023: prevalence doubles between ages 35–44 and 45–54 |
| Hyperlipidemia | 41 | 35–65 | CDC NHANES: elevated cholesterol prevalence rises from ~12% at 20–39 to ~40% at 40–59 |
| Type 2 diabetes | 48 | 40–70 | ADA Standards of Care 2023: screening recommended from age 35; peak incidence 45–64 |
| CKD | 50 | 50–75 | USRDS 2022 ADR: CKD prevalence ~6% at age 40–59, ~25% at 60–69, ~35% at 70+ |
| COPD | 55 | 45–75 | GOLD Report 2023: typically diagnosed after age 40; prevalence peaks 65–74 |
| Coronary disease | 57 | 45–75 | AHA 2023: CHD prevalence 1.3% at 20–39, rising to 19.1% at 60–79 |
| A-fib | 61 | 55–80 | Framingham Heart Study: AF prevalence ~0.5% at 50–59, ~9% at 80–89 |
| CHF | 65 | 55–80 | AHA 2023: HF prevalence ~1% at 40–59, ~6–10% at 60+ |
13 of the 14 tested conditions — every one in the table above — have median onset ages within expected clinical ranges; the fourteenth, dementia, is the exception covered below. The progression from childhood asthma → young adult anxiety/depression → middle-age metabolic disease → late-life cardiac/neurological conditions matches textbook epidemiology.
The one miss is dementia, whose median onset age of 57 falls below the expected 65–90 range. This is a known artifact of boosting dementia's onset coefficient to bring its prevalence closer to the 6.7% reference rate — higher onset probability across broader age ranges pulls the median down. It's a tradeoff we chose intentionally: better prevalence accuracy at the cost of onset age precision for one condition.
Encounter Patterns
Annualized encounter rates from the full population:
| Metric | Synthetic | Published | Reference |
|---|---|---|---|
| All visits/person/year | 2.3 | 3.5 | CDC/NCHS NAMCS 2019: 880.5M office visits ÷ 252M adults ≈ 3.5/person/year |
| ED visits/1,000/year | 200 | 430 | CDC NHAMCS 2021: ~139M ED visits; CDC-reported rate ≈ 430/1,000/year |
| Inpatient stays/1,000/year | 44 | 104 | HCUP NIS 2019: 34.4M weighted discharges ÷ 330M population ≈ 104/1,000/year |
The ambulatory visit rate (2.3 vs 3.5) is in the right neighborhood. ED visits are undercounted (200 vs 430/1,000) and inpatient stays run at about 40% of published rates.
Both gaps trace to the same thing. The Markov model was trained on chronic disease management trajectories — longitudinal records of patients being managed for specific conditions. These capture regular follow-ups and hospitalizations for acute exacerbations. What they don't capture are standalone ED visits for injuries, acute infections, poisonings, and social/behavioral crises that have no longitudinal trajectory. A 22-year-old who visits the ED for a broken wrist and never comes back doesn't generate a trajectory. Neither does a 35-year-old with an anxiety attack who gets discharged after four hours. (This is the kind of gap that's easy to describe and genuinely hard to fix well.)
For use cases focused on chronic disease modeling, clinical trials simulation, or EHR system testing, the encounter distribution is fit-for-purpose. For population health analytics that depend on accurate ED utilization, it would need calibration — most likely by adding a separate acute event layer that generates non-trajectory ED visits at age/sex-appropriate rates.
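What could that acute event layer look like? A minimal sketch under stated assumptions: the per-age-band annual rates below are placeholders (not calibrated values), and each month is treated as an independent small-probability draw, approximating a Poisson process.

```python
import random

# Placeholder annual non-trajectory ED visit rates per person, by age band.
# Illustrative numbers only -- NOT calibrated benchmarks.
ACUTE_ED_RATE = {(0, 17): 0.15, (18, 34): 0.20, (35, 64): 0.18, (65, 120): 0.30}

def acute_ed_visits(age, months, rng):
    """Sample standalone ED visits (injuries, acute infections) for one
    patient over `months` months: each month carries rate/12 probability."""
    rate = next(r for (lo, hi), r in ACUTE_ED_RATE.items() if lo <= age <= hi)
    return sum(rng.random() < rate / 12 for _ in range(months))

rng = random.Random(7)
# Ten thousand simulated 22-year-olds, one year each.
visits = [acute_ed_visits(22, 12, rng) for _ in range(10_000)]
print(f"ED visits per 1,000 per year: {sum(visits) / 10:.0f}")
```

The point of keeping this as a separate layer is that it needs no longitudinal trajectory at all: the broken-wrist patient from the paragraph above gets one visit and no follow-up state, which is exactly the pattern the chronic-disease Markov model cannot produce.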
Why This Matters
If you're building an EHR integration, a clinical decision support tool, a population health dashboard, or a FHIR-based analytics pipeline, you need test data. Not five hand-crafted patients — thousands of them, with realistic disease burdens, correlated comorbidities, and plausible clinical timelines. The alternative is waiting months for a de-identified dataset that arrives with half the fields stripped and a BAA that took longer to negotiate than the software took to build.
mock.health provides complete FHIR R4 bundles — conditions, encounters, medications, procedures, labs, imaging studies, clinical notes — that pass US Core validation and hold up under the kind of statistical scrutiny we just walked through.
We publish these numbers because the bar for synthetic data should be higher than "it parses." If you're evaluating synthetic data for your team, ask the vendor to show you their comorbidity correlations and prevalence ratios against published benchmarks. Those are the numbers that matter.
Explore the data or get in touch.
Full methodology is reproducible. The 10K population is available via the mock.health API.
Related posts
- Your Clinical AI Agent Needs More Than 5 Patients — Your prior auth agent works in testing. Then it meets a 68-year-old with CKD, hypertension, and a specialist referral — and crashes.
- Building a FHIR API Gateway: What HAPI Won't Do for You — HAPI stores FHIR and runs queries. It doesn't auth users, enforce access, or fix URLs behind a load balancer. Here's the gateway layer.
- The FHIR Sandbox Problem: Why Open Epic Isn't Enough — You opened a Patient resource and found TEST TEST. The sandbox is built for certification, not demos. Here's what's missing and the fix.