Testing FHIR Integrations Without a Hospital
You can't get hospital access without a working integration. You can't build a working integration without hospital data. Here's how to break the catch-22.
mock.health · 10 min read · 2026-04-08
You're building a FHIR integration. Maybe it's a patient portal, a prior auth workflow, an RPM platform that writes vitals back to the chart. You need to test it against data that looks like what a hospital actually produces.
You don't have access to a hospital.
This is the catch-22 that every FHIR startup lives in for 6-12 months. You can't get production EHR access without a working, tested integration. You can't build a working, tested integration without production-quality data to test against. Epic's app marketplace review takes 2-4 months. Oracle Health takes 3-6. MEDITECH takes longer. And all of them want evidence that your software works before they give you the data you need to prove it works.
So what do you test against in the meantime?
What You Actually Need to Test
Before reaching for a solution, get specific about what "testing" means for a FHIR integration. Not everything requires hospital-grade data.
Structural validity — do your FHIR resources parse correctly? Do they conform to the US Core profile you claim to support? This is the table stakes layer. A Patient resource without a meta.profile declaration, or a Condition with a code from the wrong ValueSet, will fail validation at the EHR. You can catch 80% of these issues with a FHIR validator and zero patient data.
Semantic correctness — are you using the right code systems? SNOMED CT for conditions, LOINC for observations, RxNorm for medications. Are your terminology bindings correct? Is your HbA1c observation coded as LOINC 4548-4 with a valueQuantity in %, or did someone on your team hardcode a display string and skip the coding entirely? (It happens more than you'd think.)
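To make that antipattern concrete, here's a small Python sketch of a properly coded HbA1c Observation next to a check that catches the display-string-only shortcut. The LOINC code (4548-4) and UCUM unit are from the text above; the resource id and value are illustrative.

```python
# Minimal US Core-style lab Observation for an HbA1c result.
# The resource id and the 7.2% value are made up for illustration.
hba1c = {
    "resourceType": "Observation",
    "id": "example-hba1c",
    "status": "final",
    "category": [{
        "coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "laboratory",
        }]
    }],
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "4548-4",
            "display": "Hemoglobin A1c/Hemoglobin.total in Blood",
        }]
    },
    "subject": {"reference": "Patient/example"},
    "valueQuantity": {
        "value": 7.2,
        "unit": "%",
        "system": "http://unitsofmeasure.org",
        "code": "%",
    },
}

def has_proper_coding(obs: dict) -> bool:
    """Reject the hardcoded-display-string antipattern: every
    code.coding entry must carry both a system and a code."""
    codings = obs.get("code", {}).get("coding", [])
    return bool(codings) and all("system" in c and "code" in c for c in codings)

print(has_proper_coding(hba1c))                      # True
print(has_proper_coding({"code": {"text": "A1c"}}))  # False
```

A check like this belongs in the same CI run as your structural validation: it's cheap, and it catches the skipped-coding bug before the EHR does.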
Clinical realism — this is where most test environments fall apart. Your app needs to handle a 68-year-old diabetic with CKD stage 3, hypertension, hyperlipidemia, and 4 years of declining eGFR. Not because you're testing edge cases, but because that's a typical Medicare patient. If your app only works on healthy 30-year-olds with a single encounter and no medications, it doesn't work.
Auth flow — SMART on FHIR with PKCE is non-negotiable for EHR launches. Your OAuth implementation needs to handle the full flow: discovery → authorization → token exchange → scoped access. This requires a server that actually implements SMART, not just a FHIR endpoint with an API key.
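The PKCE half of that flow is easy to get subtly wrong, so it's worth unit-testing in isolation. A minimal sketch of generating the verifier/challenge pair per RFC 7636, using only the standard library; the authorization and token endpoint details are out of scope here.

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge
    per RFC 7636: challenge = BASE64URL(SHA256(verifier)), unpadded."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# The challenge goes in the authorization request; the verifier is sent
# later in the token exchange so the server can recompute and compare.
```

Common failure modes to assert against: padded base64, `+`/`/` instead of the URL-safe alphabet, or a verifier shorter than the spec's 43-character minimum.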
Here's the uncomfortable truth: most test environments give you structural validity and nothing else. The clinical realism layer — the thing that determines whether your app survives contact with real patient charts — is almost always missing.
The Testing Pyramid for FHIR
Steal this from software engineering and apply it to FHIR integrations. Three tiers, each catching different classes of bugs.
Tier 1: Parse and Validate (unit-level)
What it catches: malformed resources, missing required fields, wrong data types, profile violations.
You don't need a server for this. The HAPI FHIR Validator runs locally and validates against any StructureDefinition. Point it at US Core 6.1 profiles and feed it your output resources.
# Validate a Bundle against US Core
java -jar validator_cli.jar patient-bundle.json \
-ig hl7.fhir.us.core#6.1.0 \
-profile http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient
If you're generating FHIR resources (write-side integrations), this should run in CI on every commit. If you're consuming them (read-side), use it to validate your parsing logic handles all the fields US Core declares as must-support.
Tier 2: Integration (API-level)
What it catches: auth failures, search parameter bugs, pagination issues, reference resolution errors, _include/_revinclude logic.
This requires a running FHIR server with SMART on FHIR auth. You need to test the full request lifecycle: discovery, authorization, token exchange, scoped queries, and handling of OperationOutcome errors.
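As a sketch of the discovery step: SMART servers advertise their OAuth endpoints at `<fhir-base>/.well-known/smart-configuration`. The config document below is trimmed and its values are illustrative, not from any real server.

```python
# A trimmed .well-known/smart-configuration response (illustrative values;
# in practice you GET it from <fhir-base>/.well-known/smart-configuration).
smart_config = {
    "authorization_endpoint": "https://auth.example.com/authorize",
    "token_endpoint": "https://auth.example.com/token",
    "code_challenge_methods_supported": ["S256"],
    "capabilities": ["launch-standalone", "client-public"],
}

def discover(config: dict) -> tuple[str, str]:
    """Pull the two endpoints the OAuth flow needs, failing loudly
    if the server doesn't advertise S256 PKCE support."""
    if "S256" not in config.get("code_challenge_methods_supported", []):
        raise ValueError("server does not advertise S256 PKCE")
    return config["authorization_endpoint"], config["token_endpoint"]

auth_url, token_url = discover(smart_config)
```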
Things that break at this tier and nowhere else:
- Your search query uses `Observation?category=laboratory` but the server indexes it as `Observation?category=http://terminology.hl7.org/CodeSystem/observation-category|laboratory`. Both are valid. Only one returns results.
- You request `Patient/$everything` and get back a Bundle with 2,000 entries and three pages of pagination links. Your client follows `Bundle.link.where(relation='next')` correctly — or it doesn't.
- Your SMART scopes request `patient/Observation.read` but the server only grants `patient/Observation.rs`. Your token works, but your Observation query returns a 403 because `read` and `search` are separate grants. (Welcome to scope negotiation.)
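The pagination case in particular is worth testing without a server. Here's a sketch of a client-side iterator that follows `next` links across a searchset Bundle; `fetch` stands in for whatever HTTP client you use, and the two-page bundle below is fabricated for the example.

```python
from typing import Callable, Iterator

def iter_bundle_entries(bundle: dict, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every entry across a paginated searchset Bundle by
    following Bundle.link entries whose relation is 'next'."""
    while True:
        yield from bundle.get("entry", [])
        next_links = [l["url"] for l in bundle.get("link", [])
                      if l.get("relation") == "next"]
        if not next_links:
            return
        bundle = fetch(next_links[0])

# Two-page fake server for illustration.
pages = {
    "page2": {"resourceType": "Bundle",
              "entry": [{"resource": {"id": "b"}}], "link": []},
}
first = {
    "resourceType": "Bundle",
    "entry": [{"resource": {"id": "a"}}],
    "link": [{"relation": "next", "url": "page2"}],
}
ids = [e["resource"]["id"] for e in iter_bundle_entries(first, pages.__getitem__)]
print(ids)  # ['a', 'b']
```

A client that only ever reads `bundle["entry"]` passes this test on page one and silently drops the other 1,900 resources in production.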
Tier 3: Realistic (clinical-level)
What it catches: logic errors that only surface with complex patients. UI rendering issues with large datasets. Performance problems with realistic data volumes. Clinical workflow gaps.
This is the tier most teams skip, and it's the tier that bites them in production. You need patients that look like real patients:
| What you're testing | What the data needs |
|---|---|
| Lab trending UI | 3+ years of longitudinal labs with realistic values, reference ranges, and interpretation flags |
| Medication reconciliation | Patients on 8-12 active medications with start dates, dosages, and historical discontinuations |
| Problem list display | 5-15 active conditions with proper SNOMED coding and onset dates |
| Clinical notes viewer | Discharge summaries, progress notes, radiology reports with narrative text (not "FINDINGS: Normal") |
| Prior auth workflow | Complex patients who actually get denied — comorbidities, specialist referrals, high-cost medications |
| Imaging integration | ImagingStudy resources with DICOM references and DiagnosticReports with real radiology report text |
If your Tier 3 test data is a single patient named "Test Cancer" with one condition and no medications, your Tier 3 tests aren't testing anything.
Your Options
Option 1: Run Your Own HAPI Server
You've probably already done this. Most teams start here — HAPI in Docker, a handful of Synthea patients loaded in, maybe a script that generates 10-50 bundles. It works. For a while.
docker run -p 8080:8080 hapiproject/hapi:latest
You now have a FHIR R4 server at localhost:8080/fhir. Load it with Synthea-generated bundles:
# Generate 10 patients with Synthea
cd synthea && ./run_synthea -p 10
# POST each bundle to HAPI
for f in output/fhir/*.json; do
curl -X POST http://localhost:8080/fhir \
-H "Content-Type: application/fhir+json" \
-d @"$f"
done
When this is enough: Internal tooling, early prototyping, Tier 1 validation. If you just need a FHIR endpoint to parse resources against, HAPI in Docker is hard to beat. Darren Devitt recommends it as the default open-source choice, and we agree from operational experience.
When it isn't: HAPI out of the box has no SMART on FHIR auth. No US Core profile validation on write. And the Synthea defaults produce patients with single conditions and minimal clinical depth — a diabetic without HbA1c trends, hypertension, or CKD. You'll invest time configuring Synthea modules, tuning parameters, adding custom data. At some point the test data pipeline becomes its own project — one you maintain alongside the product you're actually building. That maintenance cost is invisible at first and real within a few months.
Option 2: Vendor Sandboxes
Open Epic, Oracle Health (Cerner) Code, and the SMART Health IT Sandbox all provide FHIR endpoints with SMART auth.
When this is enough: Testing your OAuth flow against a real vendor's auth server. Confirming your app can launch from an EHR context. Tier 2 integration testing for the specific vendor you're targeting.
When it isn't: We wrote an entire post about this. The short version — Open Epic gives you 8 patients with sparse data. "Test Cancer" at 123 Main St. No comorbidity patterns. No longitudinal labs. No imaging. No clinical notes. The sandbox exists for certification, not for building products.
Cerner's sandbox is similar. The SMART Health IT sandbox loads ~100 Synthea patients — structurally valid but clinically flat.
None of these support write-side testing against realistic data. If you're building a prior auth workflow or an RPM platform that writes Observations back to the chart, the vendor sandbox has nothing for you.
Option 3: Clinically Realistic Sandbox
This is what we built mock.health to be: a US Core 6.1-compliant FHIR R4 server with SMART on FHIR auth and patients generated from 4.4M real patient journey patterns.
What "clinically realistic" means concretely:
# A patient with correlated comorbidities
curl -s https://api.mock.health/fhir/Condition?patient=example \
-H "Authorization: Bearer $TOKEN" | jq '.entry[].resource.code.coding[0].display'
"Type 2 diabetes mellitus"
"Essential hypertension"
"Chronic kidney disease, stage 3"
"Hyperlipidemia"
"Diabetic retinopathy"
These conditions travel together because the generation engine learned that pattern from real CMS claims data. The patient also has 4 years of declining eGFR, metformin → insulin progression, and a nephrology referral. That's what a hospital chart actually looks like.
Compare to an Open Epic sandbox patient:
"Pain in throat"
The difference matters when your app needs to display a problem list, reconcile medications, or decide whether a referral needs prior authorization.
When this is enough: Demos, investor presentations, pre-production integration testing, write-side validation, Tier 3 clinical testing. If you need to walk into a hospital and show your app handling a complex patient, this is what you test against.
When it isn't: mock.health is synthetic data. It won't reproduce the specific idiosyncrasies of Epic's FHIR implementation, Oracle Health's scope negotiation quirks, or the particular flavor of CCDA-to-FHIR conversion your target health system uses. When you get production access (and you will), you'll encounter vendor-specific extensions, unexpected code systems, and data quality issues that no sandbox can fully simulate. Plan for trash.
What Changes When You Get Production Access
Your sandbox-tested code won't ship to production unmodified. Here's what you'll encounter:
Vendor-specific extensions. Epic uses urn:oid:1.2.840.114350.1.13.0.1.7.10.698084.130 for their internal patient class. Oracle Health has their own set. These aren't in the spec, they're not in any sandbox, and you'll need to handle them gracefully — parse what you recognize, ignore what you don't, never reject a resource because it has unexpected extensions.
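A sketch of that "parse what you recognize, ignore what you don't" posture. The registry mapping and helper name are hypothetical; the OID is the Epic patient-class extension mentioned above.

```python
# Hypothetical registry of extension URLs your app understands.
KNOWN_EXTENSIONS = {
    "urn:oid:1.2.840.114350.1.13.0.1.7.10.698084.130": "epic-patient-class",
}

def extract_extensions(resource: dict) -> dict:
    """Parse the extensions you recognize, silently skip the rest.
    Never raise on an unknown URL: production resources will have them."""
    out = {}
    for ext in resource.get("extension", []):
        name = KNOWN_EXTENSIONS.get(ext.get("url"))
        if name is not None:
            out[name] = ext.get("valueString") or ext.get("valueCode")
    return out

patient = {
    "resourceType": "Patient",
    "extension": [
        {"url": "urn:oid:1.2.840.114350.1.13.0.1.7.10.698084.130",
         "valueString": "Inpatient"},
        # Unknown vendor extension: tolerated, not rejected.
        {"url": "https://example.com/some-vendor-thing", "valueCode": "x"},
    ],
}
print(extract_extensions(patient))  # {'epic-patient-class': 'Inpatient'}
```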
Data quality variance. We wrote a whole post about this. The spec says one thing. Production says another. Conditions coded in ICD-9 instead of SNOMED. Observations without reference ranges. Resources that validate against the schema but are clinically nonsensical. Your validation layer needs to handle all of it.
The approval gauntlet. Every EHR vendor has their own app review process. Epic requires a security questionnaire, a SOC 2 report (or plan), penetration testing documentation, and a live demo. The review takes 2-4 months and the process is opaque. Oracle Health is similar but slower. MEDITECH varies by site.
The architecture you built and tested against a sandbox will survive this. The specific data handling will need adaptation. That's normal — the point of sandbox testing isn't to simulate production perfectly, it's to build the 90% that doesn't depend on vendor-specific behavior so you're not starting from zero when production access arrives.
Put It Together
The testing pyramid for a FHIR integration:
| Tier | What | Tool | Catches |
|---|---|---|---|
| 1 | Parse + validate | HAPI Validator in CI | Malformed resources, profile violations |
| 2 | Auth + API | SMART-enabled sandbox | OAuth bugs, search parameter issues, scope errors |
| 3 | Clinical realism | Realistic synthetic data | Logic errors with complex patients, UI issues, workflow gaps |
Run Tier 1 on every commit. Run Tier 2 when you change auth or query logic. Run Tier 3 before every demo and before you apply for production access.
The teams that get through the EHR approval process fastest are the ones who show up with evidence that their integration handles complex patients — not just "Test Cancer" with a sore throat.
mock.health — SMART-enabled FHIR sandbox with the clinical density to get through Tier 3 testing. API key in 60 seconds →
Related posts
- How to Make Claude Write Valid Synthea Modules — LLMs generate valid Synthea JSON but hallucinate the medical codes. Here's a Claude Code skill that grounds every SNOMED and LOINC lookup.
- Build a SMART on FHIR App in 30 Minutes — Build a standalone SMART on FHIR app — OAuth 2.0 + PKCE, patient context, live vital signs charts — in a single HTML file with zero build tools.
- FHIR, USCDI & US Core: How They Fit Together — FHIR says how to send data. USCDI says what data. US Core says exactly how to format it. Here's how the three standards fit together.