FHIR Server Performance Heatmap

How six open-source FHIR R4 servers scale under load — HAPI FHIR, Microsoft FHIR Server, Medplum, Aidbox, Blaze, and Spark. Independent. Reproducible. Continuously run.

What we measure

Two workloads run at four checkpoints (1K → 4K → 16K → 64K synthetic patients). CRUD measures the p50 (median) latency of create, read, update, and delete operations at steady state. Search measures the ok-only p50, counting successful responses only, across five queries that every server supports. Each evidence row also records p95 and p99 to show tail latency. Ingest is the setup tax, not a published workload: the vendors' recommended bulk-load path is $import, not transaction POSTs. Read the methodology.
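To make "ok-only p50" concrete, here is a minimal sketch of summarizing one query's latency samples. It is not the runner's code. The names Sample, percentile, and ok_only_summary are hypothetical, "ok" is assumed to mean an HTTP 2xx status, and the nearest-rank percentile definition is an assumption (the methodology may interpolate instead). The sample values are illustrative, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    status: int        # HTTP status code returned by the server
    latency_ms: float  # wall-clock request latency in milliseconds


def percentile(sorted_values, p):
    """Nearest-rank percentile (assumed definition) of an ascending list."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("no samples")
    rank = max((p * n + 99) // 100, 1)  # integer ceil(p * n / 100), at least 1
    return sorted_values[rank - 1]


def ok_only_summary(samples):
    """p50/p95/p99 over successful responses only (assumed: 2xx); errors are dropped."""
    ok = sorted(s.latency_ms for s in samples if 200 <= s.status < 300)
    return {p: percentile(ok, p) for p in (50, 95, 99)}


# Illustrative samples: two fast 503 errors mixed in with four successes.
samples = [
    Sample(200, 12.0), Sample(200, 10.0), Sample(503, 1.0),
    Sample(200, 30.0), Sample(201, 11.0), Sample(503, 2.0),
]
print(ok_only_summary(samples))  # {50: 11.0, 95: 30.0, 99: 30.0}
```

In this example, counting the two fast failures would pull the median down to 10.0 ms. Filtering to successful responses keeps quick error replies from making a server look faster than it is at answering the query.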

Log-scaling curves

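Each server detail page shows log-log charts, one per workload, so the scaling curve is visible at a glance. On log-log axes any power law draws a straight line whose slope is the exponent. A flat line means latency does not depend on dataset size, and a slope of 1 means latency grows linearly with it. An upward bend exposes cost that grows faster than the trend at smaller sizes. Why log-log? Because order-of-magnitude differences are the story, and the Synthea populations already span 64× in size.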
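The slope reading can be checked numerically. The sketch below fits a least-squares line to log(latency) versus log(patients) and also reports the slope between consecutive checkpoints, where a rising slope corresponds to an upward bend. The functions loglog_slope and segment_slopes are hypothetical helpers, not part of the published runner, and the latency values are made up for illustration.

```python
import math


def loglog_slope(sizes, latencies_ms):
    """Least-squares slope of log(latency) against log(size).

    If latency = c * size**k, the points lie on a straight line of slope k.
    """
    if len(sizes) != len(latencies_ms) or len(sizes) < 2:
        raise ValueError("need at least two paired points")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in latencies_ms]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def segment_slopes(sizes, latencies_ms):
    """Slope between each pair of consecutive checkpoints on log-log axes."""
    points = list(zip(sizes, latencies_ms))
    return [
        math.log(t1 / t0) / math.log(n1 / n0)
        for (n0, t0), (n1, t1) in zip(points, points[1:])
    ]


sizes = [1_000, 4_000, 16_000, 64_000]  # the four patient-count checkpoints
print(round(loglog_slope(sizes, [2.0, 8.0, 32.0, 128.0]), 3))  # 1.0 (linear growth)
print([round(s, 3) for s in segment_slopes(sizes, [5.0, 5.0, 10.0, 40.0])])  # [0.0, 0.5, 1.0]
```

The first curve is a straight line of slope 1. The second starts flat and steepens at each checkpoint, which is the upward bend the charts are meant to expose.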

Why this exists

Existing FHIR benchmarks are run by FHIR vendors. mock.health doesn't ship a FHIR server. The runner, the methodology, and the hardware are all open — vendors can submit corrections via PR.
