The Contents of That Dumpster Are Private
ONC mandates it. CMS requires it. Every EHR vendor supports it. But production FHIR data is trash — here's what that actually looks like.
the nighthawk · 6 min read · 2026-03-25
via Nikhil Krishnan / Out of Pocket
Nikhil Krishnan wrote a great piece recently arguing that healthcare's open-source moment has arrived. The business models exist (Red Hat, GitLab). AI is lowering the contribution barrier. The ecosystem is ready.
He's right. But there's a punchline missing from the open-standards conversation that anyone who's actually built against healthcare APIs knows intimately:
The standard exists. Nobody follows it the same way.
FHIR (Fast Healthcare Interoperability Resources) is the anointed standard. ONC mandates it. CMS requires it for payer Patient Access APIs. Every major EHR vendor will tell you they support it. And technically, they do — the way a restaurant "supports" vegetarians by offering a side salad.
Same Patient, Different Reality
Flexpa recently published a fascinating comparison: the same patient's health data pulled through two different API pathways. The results should make anyone building healthcare software nervous:
- TEFCA path: 192 FHIR resources, but only 1 condition — "pain in throat"
- ONC (g)(10) path: 166 resources, 11 conditions including rhinitis — but missed a documented cat dander allergy entirely
Same patient. Same standard. Wildly different clinical picture. If you were building a care coordination app, which version of this patient would you trust?
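One way to make that divergence concrete is to diff the clinical content of the two bundles instead of eyeballing resource counts. A minimal sketch in Python, using plain dicts as stand-ins for FHIR JSON (the bundles and the SNOMED codes below are illustrative, not Flexpa's actual payloads):

```python
def condition_codes(bundle):
    """Collect (system, code, display) for every Condition in a FHIR Bundle (parsed JSON)."""
    out = set()
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "Condition":
            continue
        # Condition.code is a CodeableConcept; its codings live under "coding".
        for coding in res.get("code", {}).get("coding", []):
            out.add((coding.get("system"), coding.get("code"), coding.get("display")))
    return out

def diff_bundles(a, b):
    """Conditions present in one pathway's pull but not the other."""
    ca, cb = condition_codes(a), condition_codes(b)
    return {"only_in_a": ca - cb, "only_in_b": cb - ca}

# Toy version of the comparison: one pathway sees only "pain in throat",
# the other also surfaces rhinitis.
tefca = {"entry": [{"resource": {"resourceType": "Condition",
    "code": {"coding": [{"system": "http://snomed.info/sct",
                         "code": "162397003", "display": "Pain in throat"}]}}}]}
g10 = {"entry": [
    {"resource": {"resourceType": "Condition",
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": "162397003", "display": "Pain in throat"}]}}},
    {"resource": {"resourceType": "Condition",
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": "70076002", "display": "Rhinitis"}]}}},
]}

delta = diff_bundles(tefca, g10)
```

Run this against every patient you pull through two pathways and you get a per-patient discrepancy report instead of a vague feeling that something is off.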
This isn't an edge case. It's the norm.
The Spec Says One Thing. Production Says Another.
The January 2026 deadline for USCDI v3 and US Core 6.1.0 was supposed to fix this. Certified health IT modules must now support the latest FHIR profiles with richer data elements — sexual orientation, gender identity, social determinants of health. On paper, progress.
In practice, Flexpa's State of Payer Patient Access APIs report paints a different picture:
- Major state Medicaid agencies — Arizona, Colorado, Illinois, Indiana, Massachusetts, New York, Pennsylvania, Texas — remain non-compliant
- Not a single BCBS plan operated by HCSC has had a patient successfully complete authorization and FHIR data retrieval
- 53 payers list refresh token support in their SMART configurations but don't actually issue refresh tokens
- UnitedHealthcare has nearly 100 patients who simply cannot access their own data through the mandated API
These aren't obscure edge cases. These are the largest payers and state programs in the country.
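The refresh-token mismatch, at least, is mechanically checkable: compare what a payer's `.well-known/smart-configuration` advertises against what its token endpoint actually returns. A sketch, assuming parsed JSON for both (the config and token response here are fabricated; in practice you'd fetch them over HTTP during authorization):

```python
def advertises_offline_access(smart_config):
    """True if the SMART configuration claims refresh-token support.
    SMART App Launch signals this via the 'permission-offline' capability
    and/or the refresh_token grant type."""
    caps = smart_config.get("capabilities", [])
    grants = smart_config.get("grant_types_supported", [])
    return "permission-offline" in caps or "refresh_token" in grants

def refresh_token_mismatch(smart_config, token_response):
    """Flag a payer that advertises refresh tokens but never issues one."""
    return advertises_offline_access(smart_config) and "refresh_token" not in token_response

# Fabricated example mirroring the report's finding: the config promises
# offline access, but the token response contains no refresh_token.
config = {"capabilities": ["launch-standalone", "permission-offline", "permission-patient"]}
token = {"access_token": "abc", "token_type": "Bearer", "expires_in": 3600}
```

A check like this belongs in your connection-health monitoring: if a payer's advertised capabilities and actual behavior diverge, you want an alert, not a support ticket from a user whose session silently expired.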
The Normalization Tax
Even when the APIs work, the data that comes back requires heavy lifting before it's usable.
Zus Health reports that pulling records from clinical networks yields about 200 documents per patient, mapping to over 1,000 raw FHIR resources. Conditions arrive as a grab bag of ICD-9, ICD-10, and SNOMED codes — sometimes all three for the same condition. Without normalization, deduplication, and enrichment, the volume is pure noise.
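Normalizing that grab bag means picking a canonical code system and collapsing duplicates onto it. A minimal sketch, assuming a SNOMED-preferred policy and a tiny hardcoded crosswalk (real crosswalks come from UMLS or a terminology service, and the specific mappings below are illustrative only):

```python
SNOMED = "http://snomed.info/sct"
ICD10 = "http://hl7.org/fhir/sid/icd-10-cm"
ICD9 = "http://hl7.org/fhir/sid/icd-9-cm"

# Illustrative crosswalk only: maps (system, code) -> a canonical SNOMED code.
CROSSWALK = {
    (ICD10, "J30.9"): "70076002",  # allergic rhinitis, unspecified
    (ICD9, "477.9"): "70076002",
}

def canonical_key(condition):
    """Prefer a native SNOMED coding; otherwise map ICD codes through the crosswalk."""
    codings = condition.get("code", {}).get("coding", [])
    for c in codings:
        if c.get("system") == SNOMED:
            return (SNOMED, c.get("code"))
    for c in codings:
        mapped = CROSSWALK.get((c.get("system"), c.get("code")))
        if mapped:
            return (SNOMED, mapped)
    # No mapping available: fall back to the first coding and surface for review.
    if codings:
        return (codings[0].get("system"), codings[0].get("code"))
    return None

def dedupe_conditions(conditions):
    """Keep one Condition per canonical key."""
    seen = {}
    for cond in conditions:
        key = canonical_key(cond)
        if key not in seen:
            seen[key] = cond
    return list(seen.values())
```

Three source records coded in ICD-9, ICD-10, and SNOMED collapse to one condition under this policy. The hard part in production isn't the dedupe loop, it's maintaining the crosswalk, which is exactly why this became a product category.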
Particle Health built an entirely proprietary C-CDA-to-FHIR converter because open-source libraries didn't meet their quality bar. When you're aggregating records from multiple sources, the C-CDA files are so full of duplicate and contradictory data that deduplication becomes a core product feature, not a nice-to-have.
This is the dirty secret: an entire category of healthcare infrastructure companies exists primarily to clean up the mess that "standardized" data exchange creates. The standard is the floor, not the ceiling — and the floor has holes in it.
Plan for Trash
Krishnan's thesis is that open source can unlock healthcare innovation by lowering barriers to entry. I agree. But the real barrier is the gap between the specification and the implementation — and that gap isn't closing anytime soon.
FHIR is a good standard. US Core is a good implementation guide. The ONC's HTI-1 rule keeps tightening the spec — USCDI v3, US Core 6.1.0, SMART App Launch 2.0. But tighter specs only help if implementations actually conform. Right now, "FHIR-compliant" is a checkbox on a certification form, not a meaningful guarantee that the data you receive will be complete, consistent, or even parseable.
So here's the engineering takeaway: stop hoping for clean data. Plan for trash.
Every application that consumes FHIR from external systems needs a validation layer that isn't optional. Not "we'll add validation later," not "we'll handle edge cases in v2." Day-one, first-class validation:
- Profile validation on ingest. Every resource that crosses your system boundary gets validated against the US Core profile you expect. Not just schema validation — profile-level. Does this Observation actually have a category? Does this Condition use a code from a required ValueSet? You need to know the moment it arrives, not when a clinician files a bug report.
- Completeness checks per data class. USCDI defines 22 data classes. When you pull a patient's record, how many of those classes are actually populated? If you're getting Conditions but no Allergies, is that because the patient has no allergies, or because the source system doesn't export them? You need to track and surface that distinction.
- Automated regression testing against real-world variance. Your integration tests shouldn't run against a single pristine FHIR server. They should run against data that looks like what production actually sends — missing fields, mixed code systems, DSTU2 holdovers, C-CDA documents pretending to be FHIR. If your test data is cleaner than your production data, your tests are lying to you. (Here's what testing against realistic synthetic data actually looks like.)
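A minimal sketch of what the first two checks can look like on day one, assuming plain FHIR JSON and hand-rolled rules (a real pipeline would validate against the published US Core StructureDefinitions with a proper FHIR validator; the field lists and data-class mapping below are a hand-picked illustrative subset, not the full profiles or the full USCDI list):

```python
# Illustrative must-have fields per resource type, NOT the complete US Core profiles.
REQUIRED_FIELDS = {
    "Observation": ["status", "category", "code"],
    "Condition": ["code", "subject"],
    "AllergyIntolerance": ["code", "patient"],
}

def profile_errors(resource):
    """Return missing required fields for a single resource."""
    rtype = resource.get("resourceType", "<unknown>")
    required = REQUIRED_FIELDS.get(rtype, [])
    return [f"{rtype}.{field} missing" for field in required if field not in resource]

# A few data classes mapped to the resource types that carry them (illustrative subset).
DATA_CLASSES = {
    "Problems": {"Condition"},
    "Allergies and Intolerances": {"AllergyIntolerance"},
    "Vital Signs": {"Observation"},
}

def completeness_report(bundle):
    """Which data classes are populated at all in this pull?
    A False here means 'absent or not exported', a distinction you surface, not hide."""
    present = {e.get("resource", {}).get("resourceType") for e in bundle.get("entry", [])}
    return {cls: bool(types & present) for cls, types in DATA_CLASSES.items()}
```

Wire `profile_errors` into your ingest path so every nonconforming resource is logged with its source, and feed `completeness_report` into per-connection dashboards: an Allergies class that flips from populated to empty for one payer is a signal worth paging on.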
The companies actually moving the needle in interoperability — Flexpa, Zus, Particle, Onyx — all learned this the hard way. They built proprietary normalization, deduplication, and validation pipelines because the alternative was shipping broken products. The standard isn't bad. Conformance is just unreliable, and your architecture needs to account for that from day one.
The Dumpster Metaphor, Extended
The meme says the contents of the dumpster are private. That's the HIPAA joke, and it's funny. But the deeper joke is:
Even if you get permission to open the dumpster, what's inside is still trash.
The dumpster is labeled FHIR R4. The contents are a mix of R4, DSTU2 holdovers, proprietary extensions, missing required fields, and code systems that vary by vendor. The standard exists. The reference implementation is published. But the production systems treat the spec as a suggestion.
Don't wait for the spec to get stricter, or for ONC enforcement to get teeth, or for vendors to suddenly care about data quality. Build like you know the dumpster is full of trash — because it is — and validate everything that comes out of it.
Which means your test data needs to look like what production actually sends — not pristine sandbox patients with perfect coding. If your integration tests pass against clean data and break against the dumpster, you'll find out in production instead of in CI. mock.health generates FHIR data with the clinical density and variance you'll see in the real world. Free tier, no sales call →
Related posts
- FHIR, USCDI, and US Core: What They Are, How They Fit — FHIR says how to send data. USCDI says what data. US Core says exactly how to format it. Here's how the three standards fit together.
- Your FHIR Architecture Determines Your Test Data Strategy — Facade, hybrid, or FHIR-native — each architecture breaks differently. Here's what to test for each model and what breaks when you don't.
- FHIR Is Not Enough: Real-Time Integrations Still Need HL7v2 — FHIR answers 'what does this patient look like now?' To hear about changes as they happen, you'll still need to remember how HL7v2 works.