Testing FHIR Integrations Without a Hospital
You can't get hospital access without a working integration. You can't build a working integration without hospital data. Here's how to break the catch-22.
mock.health · 10 min read · 2026-04-08
You're building a FHIR integration. Maybe it's a patient portal, a prior auth workflow, an RPM platform that writes vitals back to the chart. You need to test it against data that looks like what a hospital actually produces.
You don't have access to a hospital.
This is the catch-22 that every FHIR startup lives in for 6-12 months. You can't get production EHR access without a working, tested integration. You can't build a working, tested integration without production-quality data to test against. Epic's app marketplace review takes 2-4 months. Oracle Health takes 3-6. MEDITECH takes longer. And all of them want evidence that your software works before they give you the data you need to prove it works.
So what do you test against in the meantime?
What You Actually Need to Test
Before reaching for a solution, get specific about what "testing" means for a FHIR integration. Not everything requires hospital-grade data.
Structural validity — do your FHIR resources parse correctly? Do they conform to the US Core profile you claim to support? This is the table stakes layer. A Patient resource without a meta.profile declaration, or a Condition with a code from the wrong ValueSet, will fail validation at the EHR. You can catch 80% of these issues with a FHIR validator and zero patient data.
Semantic correctness — are you using the right code systems? SNOMED CT for conditions, LOINC for observations, RxNorm for medications. Are your terminology bindings correct? Is your HbA1c observation coded as LOINC 4548-4 with a valueQuantity in %, or did someone on your team hardcode a display string and skip the coding entirely? (It happens more than you'd think.)
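To make that antipattern concrete, here's a small Python sketch of a properly coded HbA1c Observation next to a check that catches the display-string-only shortcut. The LOINC code (4548-4) and UCUM unit are from the text above; the resource id and value are illustrative.

```python
# Minimal US Core-style lab Observation for an HbA1c result.
# The resource id and the 7.2% value are made up for illustration.
hba1c = {
    "resourceType": "Observation",
    "id": "example-hba1c",
    "status": "final",
    "category": [{
        "coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "laboratory",
        }]
    }],
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "4548-4",
            "display": "Hemoglobin A1c/Hemoglobin.total in Blood",
        }]
    },
    "subject": {"reference": "Patient/example"},
    "valueQuantity": {
        "value": 7.2,
        "unit": "%",
        "system": "http://unitsofmeasure.org",
        "code": "%",
    },
}

def has_proper_coding(obs: dict) -> bool:
    """Reject the hardcoded-display-string antipattern: every
    code.coding entry must carry both a system and a code."""
    codings = obs.get("code", {}).get("coding", [])
    return bool(codings) and all("system" in c and "code" in c for c in codings)

print(has_proper_coding(hba1c))                      # True
print(has_proper_coding({"code": {"text": "A1c"}}))  # False
```

A check like this belongs in the same CI run as your structural validation: it's cheap, and it catches the skipped-coding bug before the EHR does.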
Clinical realism — this is where most test environments fall apart. Your app needs to handle a 68-year-old diabetic with CKD stage 3, hypertension, hyperlipidemia, and 4 years of declining eGFR. Not because you're testing edge cases, but because that's a typical Medicare patient. If your app only works on healthy 30-year-olds with a single encounter and no medications, it doesn't work.
Auth flow — SMART on FHIR with PKCE is non-negotiable for EHR launches. Your OAuth implementation needs to handle the full flow: discovery → authorization → token exchange → scoped access. This requires a server that actually implements SMART, not just a FHIR endpoint with an API key.
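The PKCE half of that flow is easy to get subtly wrong, so it's worth unit-testing in isolation. A minimal sketch of generating the verifier/challenge pair per RFC 7636, using only the standard library; the authorization and token endpoint details are out of scope here.

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge
    per RFC 7636: challenge = BASE64URL(SHA256(verifier)), unpadded."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# The challenge goes in the authorization request; the verifier is sent
# later in the token exchange so the server can recompute and compare.
```

Common failure modes to assert against: padded base64, `+`/`/` instead of the URL-safe alphabet, or a verifier shorter than the spec's 43-character minimum.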
Here's the uncomfortable truth: most test environments give you structural validity and nothing else. The clinical realism layer — the thing that determines whether your app survives contact with real patient charts — is almost always missing.
The Testing Pyramid for FHIR
Steal this from software engineering and apply it to FHIR integrations. Three tiers, each catching different classes of bugs.
Tier 1: Parse and Validate (unit-level)
What it catches: malformed resources, missing required fields, wrong data types, profile violations.
You don't need a server for this. The HAPI FHIR Validator runs locally and validates against any StructureDefinition. Point it at US Core 6.1 profiles and feed it your output resources.
# Validate a Bundle against US Core
java -jar validator_cli.jar patient-bundle.json \
-ig hl7.fhir.us.core#6.1.0 \
-profile http://hl7.org/fhir/us/core/StructureDefinition/us-core-patient
If you're generating FHIR resources (write-side integrations), this should run in CI on every commit. If you're consuming them (read-side), use it to validate your parsing logic handles all the fields US Core declares as must-support.
Tier 2: Integration (API-level)
What it catches: auth failures, search parameter bugs, pagination issues, reference resolution errors, _include/_revinclude logic.
This requires a running FHIR server with SMART on FHIR auth. You need to test the full request lifecycle: discovery, authorization, token exchange, scoped queries, and handling of OperationOutcome errors.
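As a sketch of the discovery step: SMART servers advertise their OAuth endpoints at `<fhir-base>/.well-known/smart-configuration`. The config document below is trimmed and its values are illustrative, not from any real server.

```python
# A trimmed .well-known/smart-configuration response (illustrative values;
# in practice you GET it from <fhir-base>/.well-known/smart-configuration).
smart_config = {
    "authorization_endpoint": "https://auth.example.com/authorize",
    "token_endpoint": "https://auth.example.com/token",
    "code_challenge_methods_supported": ["S256"],
    "capabilities": ["launch-standalone", "client-public"],
}

def discover(config: dict) -> tuple[str, str]:
    """Pull the two endpoints the OAuth flow needs, failing loudly
    if the server doesn't advertise S256 PKCE support."""
    if "S256" not in config.get("code_challenge_methods_supported", []):
        raise ValueError("server does not advertise S256 PKCE")
    return config["authorization_endpoint"], config["token_endpoint"]

auth_url, token_url = discover(smart_config)
```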
Things that break at this tier and nowhere else:
- Your search query uses `Observation?category=laboratory` but the server indexes it as `Observation?category=http://terminology.hl7.org/CodeSystem/observation-category|laboratory`. Both are valid. Only one returns results.
- You request `Patient/$everything` and get back a Bundle with 2,000 entries and three pages of pagination links. Your client follows `Bundle.link.where(relation='next')` correctly — or it doesn't.
- Your SMART scopes request `patient/Observation.read` but the server only grants `patient/Observation.rs`. Your token works, but your Observation query returns a 403 because `read` and `search` are separate grants. (Welcome to scope negotiation.)
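The pagination case in particular is worth testing without a server. Here's a sketch of a client-side iterator that follows `next` links across a searchset Bundle; `fetch` stands in for whatever HTTP client you use, and the two-page bundle below is fabricated for the example.

```python
from typing import Callable, Iterator

def iter_bundle_entries(bundle: dict, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every entry across a paginated searchset Bundle by
    following Bundle.link entries whose relation is 'next'."""
    while True:
        yield from bundle.get("entry", [])
        next_links = [l["url"] for l in bundle.get("link", [])
                      if l.get("relation") == "next"]
        if not next_links:
            return
        bundle = fetch(next_links[0])

# Two-page fake server for illustration.
pages = {
    "page2": {"resourceType": "Bundle",
              "entry": [{"resource": {"id": "b"}}], "link": []},
}
first = {
    "resourceType": "Bundle",
    "entry": [{"resource": {"id": "a"}}],
    "link": [{"relation": "next", "url": "page2"}],
}
ids = [e["resource"]["id"] for e in iter_bundle_entries(first, pages.__getitem__)]
print(ids)  # ['a', 'b']
```

A client that only ever reads `bundle["entry"]` passes this test on page one and silently drops the other 1,900 resources in production.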
Tier 3: Realistic (clinical-level)
What it catches: logic errors that only surface with complex patients. UI rendering issues with large datasets. Performance problems with realistic data volumes. Clinical workflow gaps.
This is the tier most teams skip, and it's the tier that bites them in production. You need patients that look like real patients:
| What you're testing | What the data needs |
|---|---|
| Lab trending UI | 3+ years of longitudinal labs with realistic values, reference ranges, and interpretation flags |
| Medication reconciliation | Patients on 8-12 active medications with start dates, dosages, and historical discontinuations |
| Problem list display | 5-15 active conditions with proper SNOMED coding and onset dates |
| Clinical notes viewer | Discharge summaries, progress notes, radiology reports with narrative text (not "FINDINGS: Normal") |
| Prior auth workflow | Complex patients who actually get denied — comorbidities, specialist referrals, high-cost medications |
| Imaging integration | ImagingStudy resources with DICOM references and DiagnosticReports with real radiology report text |
If your Tier 3 test data is a single patient named "Test Cancer" with one condition and no medications, your Tier 3 tests aren't testing anything.
Your Options
Option 1: Run Your Own HAPI Server
You've probably already done this. Most teams start here — HAPI in Docker, a handful of Synthea patients loaded in, maybe a script that generates 10-50 bundles. It works. For a while.
docker run -p 8080:8080 hapiproject/hapi:latest
You now have a FHIR R4 server at localhost:8080/fhir. Load it with Synthea-generated bundles:
# Generate 10 patients with Synthea
cd synthea && ./run_synthea -p 10
# POST each bundle to HAPI
for f in output/fhir/*.json; do
curl -X POST http://localhost:8080/fhir \
-H "Content-Type: application/fhir+json" \
-d @"$f"
done
When this is enough: Internal tooling, early prototyping, Tier 1 validation. If you just need a FHIR endpoint to parse resources against, HAPI in Docker is hard to beat. Darren Devitt recommends it as the default open-source choice, and we agree from operational experience.
When it isn't: HAPI out of the box has no SMART on FHIR auth. No US Core profile validation on write. And the Synthea defaults produce patients with single conditions and minimal clinical depth — a diabetic without HbA1c trends, hypertension, or CKD. You'll invest time configuring Synthea modules, tuning parameters, adding custom data. At some point the test data pipeline becomes its own project — one you maintain alongside the product you're actually building. That maintenance cost is invisible at first and real within a few months.
Option 2: Vendor Sandboxes
Open Epic, Oracle Health (Cerner) Code, and the SMART Health IT Sandbox all provide FHIR endpoints with SMART auth.
When this is enough: Testing your OAuth flow against a real vendor's auth server. Confirming your app can launch from an EHR context. Tier 2 integration testing for the specific vendor you're targeting.
When it isn't: We wrote an entire post about this. The short version — Open Epic gives you 8 patients with sparse data. "Test Cancer" at 123 Main St. No comorbidity patterns. No longitudinal labs. No imaging. No clinical notes. The sandbox exists for certification, not for building products.
Cerner's sandbox is similar. The SMART Health IT sandbox loads ~100 Synthea patients — structurally valid but clinically flat.
None of these support write-side testing against realistic data. If you're building a prior auth workflow or an RPM platform that writes Observations back to the chart, the vendor sandbox has nothing for you.
Option 3: Clinically Realistic Sandbox
This is what we built mock.health to be: a US Core 6.1-compliant FHIR R4 server with SMART on FHIR auth and patients generated from 4.4M real patient journey patterns.
What "clinically realistic" means concretely:
# A patient with correlated comorbidities
curl -s https://api.mock.health/fhir/Condition?patient=example \
-H "Authorization: Bearer $TOKEN" | jq '.entry[].resource.code.coding[0].display'
"Type 2 diabetes mellitus"
"Essential hypertension"
"Chronic kidney disease, stage 3"
"Hyperlipidemia"
"Diabetic retinopathy"
These conditions travel together because the generation engine learned that pattern from real CMS claims data. The patient also has 4 years of declining eGFR, metformin → insulin progression, and a nephrology referral. That's what a hospital chart actually looks like.
Compare to an Open Epic sandbox patient:
"Pain in throat"
The difference matters when your app needs to display a problem list, reconcile medications, or decide whether a referral needs prior authorization.
When this is enough: Demos, investor presentations, pre-production integration testing, write-side validation, Tier 3 clinical testing. If you need to walk into a hospital and show your app handling a complex patient, this is what you test against.
When it isn't: mock.health is synthetic data. It won't reproduce the specific idiosyncrasies of Epic's FHIR implementation, Oracle Health's scope negotiation quirks, or the particular flavor of CCDA-to-FHIR conversion your target health system uses. When you get production access (and you will), you'll encounter vendor-specific extensions, unexpected code systems, and data quality issues that no sandbox can fully simulate. Plan for trash.
What Changes When You Get Production Access
Your sandbox-tested code won't ship to production unmodified. Here's what you'll encounter:
Vendor-specific extensions. Epic uses urn:oid:1.2.840.114350.1.13.0.1.7.10.698084.130 for their internal patient class. Oracle Health has their own set. These aren't in the spec, they're not in any sandbox, and you'll need to handle them gracefully — parse what you recognize, ignore what you don't, never reject a resource because it has unexpected extensions.
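A sketch of that "parse what you recognize, ignore what you don't" posture. The registry mapping and helper name are hypothetical; the OID is the Epic patient-class extension mentioned above.

```python
# Hypothetical registry of extension URLs your app understands.
KNOWN_EXTENSIONS = {
    "urn:oid:1.2.840.114350.1.13.0.1.7.10.698084.130": "epic-patient-class",
}

def extract_extensions(resource: dict) -> dict:
    """Parse the extensions you recognize, silently skip the rest.
    Never raise on an unknown URL: production resources will have them."""
    out = {}
    for ext in resource.get("extension", []):
        name = KNOWN_EXTENSIONS.get(ext.get("url"))
        if name is not None:
            out[name] = ext.get("valueString") or ext.get("valueCode")
    return out

patient = {
    "resourceType": "Patient",
    "extension": [
        {"url": "urn:oid:1.2.840.114350.1.13.0.1.7.10.698084.130",
         "valueString": "Inpatient"},
        # Unknown vendor extension: tolerated, not rejected.
        {"url": "https://example.com/some-vendor-thing", "valueCode": "x"},
    ],
}
print(extract_extensions(patient))  # {'epic-patient-class': 'Inpatient'}
```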
Data quality variance. We wrote a whole post about this. The spec says one thing. Production says another. Conditions coded in ICD-9 instead of SNOMED. Observations without reference ranges. Resources that validate against the schema but are clinically nonsensical. Your validation layer needs to handle all of it.
The approval gauntlet. Every EHR vendor has their own app review process. Epic requires a security questionnaire, a SOC 2 report (or plan), penetration testing documentation, and a live demo. The review takes 2-4 months and the process is opaque. Oracle Health is similar but slower. MEDITECH varies by site.
The architecture you built and tested against a sandbox will survive this. The specific data handling will need adaptation. That's normal — the point of sandbox testing isn't to simulate production perfectly, it's to build the 90% that doesn't depend on vendor-specific behavior so you're not starting from zero when production access arrives.
Put It Together
The testing pyramid for a FHIR integration:
| Tier | What | Tool | Catches |
|---|---|---|---|
| 1 | Parse + validate | HAPI Validator in CI | Malformed resources, profile violations |
| 2 | Auth + API | SMART-enabled sandbox | OAuth bugs, search parameter issues, scope errors |
| 3 | Clinical realism | Realistic synthetic data | Logic errors with complex patients, UI issues, workflow gaps |
Run Tier 1 on every commit. Run Tier 2 when you change auth or query logic. Run Tier 3 before every demo and before you apply for production access.
The teams that get through the EHR approval process fastest are the ones who show up with evidence that their integration handles complex patients — not just "Test Cancer" with a sore throat.
mock.health — SMART-enabled FHIR sandbox with the clinical density to get through Tier 3 testing. API key in 60 seconds →
Related posts
- How to Make Claude Write Valid Synthea Modules — LLMs generate valid Synthea JSON but hallucinate the medical codes. Here's a Claude Code skill that grounds every SNOMED and LOINC lookup.
- Build a SMART on FHIR App in 30 Minutes — Build a standalone SMART on FHIR app — OAuth 2.0 + PKCE, patient context, live vital signs charts — in a single HTML file with zero build tools.
- FHIR, USCDI & US Core: How They Fit Together — FHIR says how to send data. USCDI says what data. US Core says exactly how to format it. Here's how the three standards fit together.