The Contents of That Dumpster Are Private
ONC mandates it. CMS requires it. Every EHR vendor supports it. But production FHIR data is trash — here's what that actually looks like.
the nighthawk · 6 min read · 2026-03-25
via Nikhil Krishnan / Out of Pocket
Nikhil Krishnan wrote a great piece recently arguing that healthcare's open-source moment has arrived. The business models exist (Red Hat, GitLab). AI is lowering the contribution barrier. The ecosystem is ready.
He's right. But there's a punchline missing from the open-standards conversation that anyone who's actually built against healthcare APIs knows intimately:
The standard exists. Nobody follows it the same way.
FHIR (Fast Healthcare Interoperability Resources) is the anointed standard. ONC mandates it. CMS requires it for payer Patient Access APIs. Every major EHR vendor will tell you they support it. And technically, they do — the way a restaurant "supports" vegetarians by offering a side salad.
Same Patient, Different Reality
Flexpa recently published a fascinating comparison: the same patient's health data pulled through two different API pathways. The results should make anyone building healthcare software nervous:
- TEFCA path: 192 FHIR resources, but only 1 condition — "pain in throat"
- ONC (g)(10) path: 166 resources, 11 conditions including rhinitis — but missed a documented cat dander allergy entirely
Same patient. Same standard. Wildly different clinical picture. If you were building a care coordination app, which version of this patient would you trust?
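One way to make that divergence concrete is to diff the clinical content of the two bundles instead of eyeballing resource counts. A minimal sketch in Python, using plain dicts as stand-ins for FHIR JSON (the bundles and the SNOMED codes below are illustrative, not Flexpa's actual payloads):

```python
def condition_codes(bundle):
    """Collect (system, code, display) for every Condition in a FHIR Bundle (parsed JSON)."""
    out = set()
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "Condition":
            continue
        # Condition.code is a CodeableConcept; its codings live under "coding".
        for coding in res.get("code", {}).get("coding", []):
            out.add((coding.get("system"), coding.get("code"), coding.get("display")))
    return out

def diff_bundles(a, b):
    """Conditions present in one pathway's pull but not the other."""
    ca, cb = condition_codes(a), condition_codes(b)
    return {"only_in_a": ca - cb, "only_in_b": cb - ca}

# Toy version of the comparison: one pathway sees only "pain in throat",
# the other also surfaces rhinitis.
tefca = {"entry": [{"resource": {"resourceType": "Condition",
    "code": {"coding": [{"system": "http://snomed.info/sct",
                         "code": "162397003", "display": "Pain in throat"}]}}}]}
g10 = {"entry": [
    {"resource": {"resourceType": "Condition",
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": "162397003", "display": "Pain in throat"}]}}},
    {"resource": {"resourceType": "Condition",
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": "70076002", "display": "Rhinitis"}]}}},
]}

delta = diff_bundles(tefca, g10)
```

Run this against every patient you pull through two pathways and you get a per-patient discrepancy report instead of a vague feeling that something is off.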
This isn't an edge case. It's the norm.
The Spec Says One Thing. Production Says Another.
The January 2026 deadline for USCDI v3 and US Core 6.1.0 was supposed to fix this. Certified health IT modules must now support the latest FHIR profiles with richer data elements — sexual orientation, gender identity, social determinants of health. On paper, progress.
In practice, Flexpa's State of Payer Patient Access APIs report paints a different picture:
- Major state Medicaid agencies — Arizona, Colorado, Illinois, Indiana, Massachusetts, New York, Pennsylvania, Texas — remain non-compliant
- Not a single BCBS plan operated by HCSC has had a patient successfully complete authorization and FHIR data retrieval
- 53 payers list refresh token support in their SMART configurations but don't actually issue refresh tokens
- UnitedHealthcare has nearly 100 patients who simply cannot access their own data through the mandated API
These aren't obscure edge cases. These are the largest payers and state programs in the country.
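The refresh-token mismatch, at least, is mechanically checkable: compare what a payer's `.well-known/smart-configuration` advertises against what its token endpoint actually returns. A sketch, assuming parsed JSON for both (the config and token response here are fabricated; in practice you'd fetch them over HTTP during authorization):

```python
def advertises_offline_access(smart_config):
    """True if the SMART configuration claims refresh-token support.
    SMART App Launch signals this via the 'permission-offline' capability
    and/or the refresh_token grant type."""
    caps = smart_config.get("capabilities", [])
    grants = smart_config.get("grant_types_supported", [])
    return "permission-offline" in caps or "refresh_token" in grants

def refresh_token_mismatch(smart_config, token_response):
    """Flag a payer that advertises refresh tokens but never issues one."""
    return advertises_offline_access(smart_config) and "refresh_token" not in token_response

# Fabricated example mirroring the report's finding: the config promises
# offline access, but the token response contains no refresh_token.
config = {"capabilities": ["launch-standalone", "permission-offline", "permission-patient"]}
token = {"access_token": "abc", "token_type": "Bearer", "expires_in": 3600}
```

A check like this belongs in your connection-health monitoring: if a payer's advertised capabilities and actual behavior diverge, you want an alert, not a support ticket from a user whose session silently expired.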
The Normalization Tax
Even when the APIs work, the data that comes back requires heavy lifting before it's usable.
Zus Health reports that pulling records from clinical networks yields about 200 documents per patient, mapping to over 1,000 raw FHIR resources. Conditions arrive as a grab bag of ICD-9, ICD-10, and SNOMED codes — sometimes all three for the same condition. Without normalization, deduplication, and enrichment, the volume is pure noise.
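Normalizing that grab bag means picking a canonical code system and collapsing duplicates onto it. A minimal sketch, assuming a SNOMED-preferred policy and a tiny hardcoded crosswalk (real crosswalks come from UMLS or a terminology service, and the specific mappings below are illustrative only):

```python
SNOMED = "http://snomed.info/sct"
ICD10 = "http://hl7.org/fhir/sid/icd-10-cm"
ICD9 = "http://hl7.org/fhir/sid/icd-9-cm"

# Illustrative crosswalk only: maps (system, code) -> a canonical SNOMED code.
CROSSWALK = {
    (ICD10, "J30.9"): "70076002",  # allergic rhinitis, unspecified
    (ICD9, "477.9"): "70076002",
}

def canonical_key(condition):
    """Prefer a native SNOMED coding; otherwise map ICD codes through the crosswalk."""
    codings = condition.get("code", {}).get("coding", [])
    for c in codings:
        if c.get("system") == SNOMED:
            return (SNOMED, c.get("code"))
    for c in codings:
        mapped = CROSSWALK.get((c.get("system"), c.get("code")))
        if mapped:
            return (SNOMED, mapped)
    # No mapping available: fall back to the first coding and surface for review.
    if codings:
        return (codings[0].get("system"), codings[0].get("code"))
    return None

def dedupe_conditions(conditions):
    """Keep one Condition per canonical key."""
    seen = {}
    for cond in conditions:
        key = canonical_key(cond)
        if key not in seen:
            seen[key] = cond
    return list(seen.values())
```

Three source records coded in ICD-9, ICD-10, and SNOMED collapse to one condition under this policy. The hard part in production isn't the dedupe loop, it's maintaining the crosswalk, which is exactly why this became a product category.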
Particle Health built an entirely proprietary C-CDA-to-FHIR converter because open-source libraries didn't meet their quality bar. When you're aggregating records from multiple sources, the C-CDA files are so full of duplicate and contradictory data that deduplication becomes a core product feature, not a nice-to-have.
This is the dirty secret: an entire category of healthcare infrastructure companies exists primarily to clean up the mess that "standardized" data exchange creates. The standard is the floor, not the ceiling — and the floor has holes in it.
Plan for Trash
Krishnan's thesis is that open source can unlock healthcare innovation by lowering barriers to entry. I agree. But the real barrier is the gap between the specification and the implementation — and that gap isn't closing anytime soon.
FHIR is a good standard. US Core is a good implementation guide. The ONC's HTI-1 rule keeps tightening the spec — USCDI v3, US Core 6.1.0, SMART App Launch 2.0. But tighter specs only help if implementations actually conform. Right now, "FHIR-compliant" is a checkbox on a certification form, not a meaningful guarantee that the data you receive will be complete, consistent, or even parseable.
So here's the engineering takeaway: stop hoping for clean data. Plan for trash.
Every application that consumes FHIR from external systems needs a validation layer that isn't optional. Not "we'll add validation later," not "we'll handle edge cases in v2." Day-one, first-class validation:
- Profile validation on ingest. Every resource that crosses your system boundary gets validated against the US Core profile you expect. Not just schema validation — profile-level. Does this Observation actually have a category? Does this Condition use a code from a required ValueSet? You need to know the moment it arrives, not when a clinician files a bug report.
- Completeness checks per data class. USCDI defines 22 data classes. When you pull a patient's record, how many of those classes are actually populated? If you're getting Conditions but no Allergies, is that because the patient has no allergies, or because the source system doesn't export them? You need to track and surface that distinction.
- Automated regression testing against real-world variance. Your integration tests shouldn't run against a single pristine FHIR server. They should run against data that looks like what production actually sends — missing fields, mixed code systems, DSTU2 holdovers, C-CDA documents pretending to be FHIR. If your test data is cleaner than your production data, your tests are lying to you. (Here's what testing against realistic synthetic data actually looks like.)
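A minimal sketch of what the first two checks can look like on day one, assuming plain FHIR JSON and hand-rolled rules (a real pipeline would validate against the published US Core StructureDefinitions with a proper FHIR validator; the field lists and data-class mapping below are a hand-picked illustrative subset, not the full profiles or the full USCDI list):

```python
# Illustrative must-have fields per resource type, NOT the complete US Core profiles.
REQUIRED_FIELDS = {
    "Observation": ["status", "category", "code"],
    "Condition": ["code", "subject"],
    "AllergyIntolerance": ["code", "patient"],
}

def profile_errors(resource):
    """Return missing required fields for a single resource."""
    rtype = resource.get("resourceType", "<unknown>")
    required = REQUIRED_FIELDS.get(rtype, [])
    return [f"{rtype}.{field} missing" for field in required if field not in resource]

# A few data classes mapped to the resource types that carry them (illustrative subset).
DATA_CLASSES = {
    "Problems": {"Condition"},
    "Allergies and Intolerances": {"AllergyIntolerance"},
    "Vital Signs": {"Observation"},
}

def completeness_report(bundle):
    """Which data classes are populated at all in this pull?
    A False here means 'absent or not exported', a distinction you surface, not hide."""
    present = {e.get("resource", {}).get("resourceType") for e in bundle.get("entry", [])}
    return {cls: bool(types & present) for cls, types in DATA_CLASSES.items()}
```

Wire `profile_errors` into your ingest path so every nonconforming resource is logged with its source, and feed `completeness_report` into per-connection dashboards: an Allergies class that flips from populated to empty for one payer is a signal worth paging on.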
The companies actually moving the needle in interoperability — Flexpa, Zus, Particle, Onyx — all learned this the hard way. They built proprietary normalization, deduplication, and validation pipelines because the alternative was shipping broken products. The standard isn't bad. Conformance is just unreliable, and your architecture needs to account for that from day one.
The Dumpster Metaphor, Extended
The meme says the contents of the dumpster are private. That's the HIPAA joke, and it's funny. But the deeper joke is:
Even if you get permission to open the dumpster, what's inside is still trash.
The dumpster is labeled FHIR R4. The contents are a mix of R4, DSTU2 holdovers, proprietary extensions, missing required fields, and code systems that vary by vendor. The standard exists. The reference implementation is published. But the production systems treat the spec as a suggestion.
Don't wait for the spec to get stricter, or for ONC enforcement to get teeth, or for vendors to suddenly care about data quality. Build like you know the dumpster is full of trash — because it is — and validate everything that comes out of it.
Which means your test data needs to look like what production actually sends — not pristine sandbox patients with perfect coding. If your integration tests pass against clean data and break against the dumpster, you'll find out in production instead of in CI. mock.health generates FHIR data with the clinical density and variance you'll see in the real world. Free tier, no sales call →
Related posts
- FHIR, USCDI, and US Core: What They Are, How They Fit — FHIR says how to send data. USCDI says what data. US Core says exactly how to format it. Here's how the three standards fit together.
- Your FHIR Architecture Determines Your Test Data Strategy — Facade, hybrid, or FHIR-native — each architecture breaks differently. Here's what to test for each model and what breaks when you don't.
- FHIR Is Not Enough: Real-Time Integrations Still Need HL7v2 — FHIR answers 'what does this patient look like now?' To hear about changes as they happen, you'll still need to remember how HL7v2 works.