FHIR Terminology Isn't Portable: 7 Servers, 5 Behaviors

I sent the same ValueSet/$expand and CodeSystem/$lookup requests to seven FHIR servers. I got five different HTTP status codes back. If your app uses terminology operations, you do not have a portable backend.

mock.health · 8 min read · 2026-04-26

If your app does any of these things — rendering a display string for a SNOMED code in a patient chart, expanding a value set into an autocomplete picker, validating that an ICD-10 code is in the allowed subset for a specific form — it depends on FHIR terminology operations.

ValueSet/$expand and CodeSystem/$lookup are the two workhorses. $expand takes a value set reference and returns the member codes. $lookup takes a code system and a code and returns the code's display string, along with other properties. Both are defined in the FHIR R4 terminology module. Neither is mandatory, but both are widely treated as operations any serious FHIR server should support.

I sent the same request to seven FHIR servers. I got five different HTTP status codes back. If you assumed your app was backend-portable because "FHIR is a standard," this post is the corrective.

The test

The setup is the fhir-server-compare matrix: HAPI FHIR, Aidbox, Medplum, Microsoft FHIR Server, Blaze, Spark, and HFS. Each in its own Docker container, each pinned by sha256, each given a single Synthea patient to warm the tables. Then:

GET ValueSet/$expand?url=http://loinc.org/vs/LL715-4

LL715-4 is a LOINC "answer list" for the PHQ-9 depression screening — a small, well-known value set that should be present on any server that supports LOINC.
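A minimal sketch of the same request from a client. The canonical URL travels as a query parameter, so it is percent-encoded here. A successful response is a ValueSet whose expansion.contains lists the member codes, possibly nested for hierarchical value sets. The base URL and the helper names are illustrative assumptions, not part of any FHIR library.

```python
from urllib.parse import urlencode


def expand_url(base: str, valueset_url: str) -> str:
    """Build a ValueSet/$expand GET URL, percent-encoding the canonical URL."""
    # urlencode escapes ':' and '/' inside the value, so the canonical URL
    # cannot be confused with the request path.
    return f"{base.rstrip('/')}/ValueSet/$expand?{urlencode({'url': valueset_url})}"


def expanded_codes(valueset: dict) -> list[tuple[str, str, str]]:
    """Flatten ValueSet.expansion.contains into (system, code, display) tuples."""
    out: list[tuple[str, str, str]] = []

    def walk(entries: list) -> None:
        for entry in entries:
            # Grouping entries in a hierarchical expansion may carry no code.
            if "code" in entry:
                out.append((entry.get("system", ""), entry["code"], entry.get("display", "")))
            walk(entry.get("contains", []))

    walk(valueset.get("expansion", {}).get("contains", []))
    return out
```

For example, `expand_url("http://localhost:8080/fhir", "http://loinc.org/vs/LL715-4")` builds the request above against a hypothetical local server.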

Here is what I got back:

Server	Status	Response
HAPI	200	Expanded 14 answer codes
Aidbox	200	Expanded 14 answer codes (LOINC pre-loaded)
Medplum	400	`ValueSet not found`
MS FHIR	404	Operation not implemented
Blaze	422	`terminology service not configured`
Spark	501	`operation not supported`
HFS	200	Expanded 14 answer codes

Five different status codes for the same operation. Three servers did the right thing. Four did something else. The "something else" is not a single failure mode — it is a menu of them, each requiring different client-side handling.

The $lookup case is worse. I sent:

GET CodeSystem/$lookup?system=http://loinc.org&code=8480-6

8480-6 is systolic blood pressure. Every clinical app on earth needs this display string. The results:

Server	Status	Response
HAPI	200	`Systolic blood pressure`
Aidbox	400	`CodeSystem LOINC not loaded` (until you run the import)
Medplum	400	`Not implemented`
MS FHIR	404	Operation not implemented
Blaze	422	`terminology service not configured`
Spark	501	`operation not supported`
HFS	200	`Systolic blood pressure`

Only two of the seven servers answer "what is the display string for code 8480-6?" out of the box. A third, Aidbox, can, but only after you take on a dependency: download the LOINC release (several hundred megabytes), run Aidbox's terminology import tool, and wait for the codes to load. You do that once. It also commits you to making terminology loading part of your deployment workflow.
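On the servers that do answer, $lookup returns a Parameters resource, and the display string is the parameter named display. A minimal sketch for building the request and pulling that value out, assuming a hypothetical base URL. A failed call typically comes back as an OperationOutcome instead, which this helper treats as "no display".

```python
from typing import Optional
from urllib.parse import urlencode


def lookup_url(base: str, system: str, code: str) -> str:
    """Build a CodeSystem/$lookup GET URL for one (system, code) pair."""
    query = urlencode({"system": system, "code": code})
    return f"{base.rstrip('/')}/CodeSystem/$lookup?{query}"


def lookup_display(parameters: dict) -> Optional[str]:
    """Return the 'display' output from a $lookup Parameters response, if any."""
    if parameters.get("resourceType") != "Parameters":
        # e.g. an OperationOutcome from a server that rejected the call
        return None
    for param in parameters.get("parameter", []):
        if param.get("name") == "display":
            return param.get("valueString")
    return None
```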

Why this is worse than it looks

If the answer were "HAPI does terminology, the others do not," you could shrug, pick HAPI, and move on. But the picture is more complicated.

Medplum's 400s are not fixable by loading a value set. On $lookup, Medplum returns Not implemented because the operation is not wired into the server at all: the operation suffix is not recognized. You cannot "add LOINC" to Medplum and get these operations working.

Microsoft FHIR Server returns 404. The parser does not match $expand as an operation name. Same as Medplum — not a data problem, a missing feature. This is consistent with Azure Health Data Services' documented stance: terminology operations are not in the open-source server. You get them by paying for the Azure-managed FHIR service, which bundles a separate terminology layer.

Blaze's 422 is honest. The server does not pretend to implement terminology; it returns a 422 whose message says plainly that the terminology service is not configured. You can add a terminology service as a separate process, but that is a real secondary deployment.

Spark's 501 is also honest. Not implemented is the correct response for an operation the server simply does not support.
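In practice, a client has to triage these failure modes differently. A rough heuristic sketch, keyed on the status codes and messages in the tables above; the bucket names are hypothetical, and because FHIR does not standardize these error strings, the substring checks would need adjusting per server. A 404 is ambiguous on its own (it can also mean a missing resource), and Medplum's `ValueSet not found` deliberately falls through to "unknown", since the message alone does not say whether content or the feature is missing.

```python
def classify(status: int, message: str = "") -> str:
    """Triage a terminology-operation response into a coarse, actionable bucket.

    Heuristic only: it matches on the status code plus substrings of the
    error text, which varies from server to server.
    """
    msg = message.lower()
    if status == 200:
        return "ok"
    if "not configured" in msg:
        # The server expects a separately deployed terminology service.
        return "needs_terminology_service"
    if status in (404, 501) or "not implemented" in msg or "not supported" in msg:
        # The operation itself is missing; loading more content will not help.
        return "missing_feature"
    if "not loaded" in msg:
        # The operation exists but the code system has not been imported yet.
        return "missing_content"
    return "unknown"
```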

The pattern: if you want portable terminology operations across FHIR servers, you do not use the server's built-in operations at all. You run a separate terminology service and point at it from your app. The moment you depend on the built-in $expand endpoint, you are coupled to one of three servers in the matrix, with very different operational profiles.

The three portability options

If your app needs terminology (if the display string, the value set expansion, or the code-system validation is on the critical path of a screen your users will look at), you have three choices.

1. Pick a server that implements terminology and pin to it

HAPI and HFS are the honest picks here. Both implement $expand and $lookup natively (HAPI needs LOINC and SNOMED loaded with hapi-fhir-cli upload-terminology; HFS ships with a pre-populated terminology store). You commit to running one of those two servers as your FHIR data layer, and terminology comes with it.

The catch: you have now coupled two architectural decisions. Your FHIR server choice is also your terminology server choice. If you later want to move your FHIR data to Aidbox, for example for its CRUD performance, you have to replan terminology.

2. Run a separate terminology server

The common pattern. Run HAPI as a dedicated terminology server (with LOINC, SNOMED, RxNorm, and ICD-10 loaded), and point your application's terminology calls at it regardless of which server holds your clinical data.

This decouples decisions cleanly. You can run Aidbox or Medplum for the FHIR resource store and HAPI as the terminology layer. The app hits https://terminology.example.com/ValueSet/$expand for all code work and https://fhir.example.com/ for all patient data.
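One way to keep that split in a single place is to route requests by resource type. A minimal sketch using the two example hosts from the paragraph above; treating every ValueSet, CodeSystem, and ConceptMap request as terminology traffic is an assumption you may want to narrow or widen for your app.

```python
# Example hosts from the text; substitute your own deployments.
FHIR_BASE = "https://fhir.example.com"
TERMINOLOGY_BASE = "https://terminology.example.com"

# Resource types whose requests (reads and operations) go to the terminology server.
TERMINOLOGY_TYPES = {"ValueSet", "CodeSystem", "ConceptMap"}


def route(path: str) -> str:
    """Return the absolute URL for a relative FHIR path such as 'ValueSet/$expand?url=...'."""
    relative = path.lstrip("/")
    # First path segment, ignoring any query string: 'ValueSet', 'Patient', ...
    resource_type = relative.split("?", 1)[0].split("/", 1)[0]
    base = TERMINOLOGY_BASE if resource_type in TERMINOLOGY_TYPES else FHIR_BASE
    return f"{base}/{relative}"
```

Routing by type rather than by operation name also sends plain ValueSet reads to the terminology server, which is what you want if that server is the single source of truth for value sets.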

The catch: it is a separate service to deploy, operate, back up, and monitor. Loading the LOINC release is a multi-gigabyte import. Keeping SNOMED current requires a paid license in most countries. These are real costs.

3. Skip FHIR terminology entirely

For a surprising number of apps, the terminology layer can be replaced with static JSON lookup tables you ship in the frontend. If your app only needs to render display strings for the 200 most common LOINC codes in your data, that is a 20KB JSON file. It is not FHIR, but it is not wrong either.

For clinical trial apps and NLP pipelines that need the full vocabulary, this is not viable. For most patient-facing apps, it is.
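A sketch of the static-table approach, assuming a table keyed by code system URL and then by code. The two entries reuse display strings quoted in this post; a real table would be generated from the codes that actually appear in your data. Falling back to the raw code keeps the screen rendering on a miss instead of failing.

```python
import json

# Hypothetical static asset shipped with the app: {system: {code: display}}.
DISPLAY_TABLE_JSON = """
{
  "http://loinc.org": {"8480-6": "Systolic blood pressure"},
  "http://snomed.info/sct": {"38341003": "Hypertension"}
}
"""

DISPLAY_TABLE: dict[str, dict[str, str]] = json.loads(DISPLAY_TABLE_JSON)


def display_for(system: str, code: str, table: dict[str, dict[str, str]] = DISPLAY_TABLE) -> str:
    """Look up a display string locally; on a miss, show the raw code."""
    return table.get(system, {}).get(code, code)


print(display_for("http://loinc.org", "8480-6"))  # prints "Systolic blood pressure"
```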

Where this hits hardest

Three app categories where this really matters:

Clinical NLP. If you are extracting concepts from free text and mapping them to SNOMED or LOINC, you need $lookup to confirm each mapped code exists and to fetch its display string, and $expand to verify the mapped code is in the allowed subset for the input form. Without portable terminology, your NLP pipeline is locked to whichever server hosts the vocabulary.
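The membership half of that check can be sketched against an $expand result. This assumes the expansion is complete: large value sets may come back paged or truncated, and a code missing from a partial expansion is not proof it is excluded. FHIR also defines ValueSet/$validate-code for asking the membership question directly, on servers that implement it. The function names here are hypothetical.

```python
def allowed_codes(valueset: dict) -> set[tuple[str, str]]:
    """Collect selectable (system, code) pairs from ValueSet.expansion.contains."""
    pairs: set[tuple[str, str]] = set()
    stack = list(valueset.get("expansion", {}).get("contains", []))
    while stack:
        entry = stack.pop()
        # Abstract entries are grouping headers, not codes a form may accept.
        if "code" in entry and not entry.get("abstract", False):
            pairs.add((entry.get("system", ""), entry["code"]))
        # Walk nested entries of hierarchical expansions too.
        stack.extend(entry.get("contains", []))
    return pairs


def is_allowed(valueset: dict, system: str, code: str) -> bool:
    """True if an NLP-mapped (system, code) appears in the expanded value set."""
    return (system, code) in allowed_codes(valueset)
```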

Clinical trial apps. Every CDISC submission depends on validated code systems (LOINC for labs, MedDRA for adverse events, SNOMED for conditions). The app that collects trial data needs $expand on the allowed ValueSet for every coded field, and $lookup to render the display string on review screens. These apps live and die on the terminology layer.

Any app that displays clinical data. If a patient summary renders "Condition: Hypertension" instead of "Condition: 38341003," and the resource did not already carry a Coding.display, something is doing $lookup on 38341003 to get the display string. That something has to exist: it is the FHIR server, a separate terminology server, or a static frontend table.
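Those sources can be chained, cheapest first. A minimal sketch, assuming a static table shaped like {system: {code: display}} and an optional `lookup` callable that wraps a $lookup call to whichever server you use; both are hypothetical stand-ins, not a library API.

```python
from typing import Callable, Optional

# Hypothetical wrapper around a $lookup call: (system, code) -> display or None.
Lookup = Callable[[str, str], Optional[str]]


def resolve_display(coding: dict, table: dict, lookup: Optional[Lookup] = None) -> str:
    """Resolve a FHIR Coding to a human-readable string, cheapest source first."""
    # 1. The resource may already carry the display string.
    if coding.get("display"):
        return coding["display"]
    system, code = coding.get("system", ""), coding.get("code", "")
    # 2. A static table shipped with the app.
    local = table.get(system, {}).get(code)
    if local:
        return local
    # 3. A terminology server, if one is configured.
    if lookup is not None:
        remote = lookup(system, code)
        if remote:
            return remote
    # 4. Nothing knew the code: show it raw rather than failing the render.
    return code
```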

What to do before you commit

Before you pick a FHIR server, answer this one question: does my app need terminology operations, and if so, how am I going to get them?

If the answer is "the server I picked handles it" — confirm by running GET ValueSet/$expand?url=... against your chosen server with a value set you actually use. If it returns 200 with expanded codes, proceed. If it returns anything else, you need plan B.
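A sketch of that smoke test using only the Python standard library. The pass criterion is the one stated above: a 200 whose expansion actually contains codes. Connection errors are left to propagate, since an unreachable server fails the check anyway.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode


def expand_probe(base: str, valueset_url: str, timeout: float = 10.0) -> tuple[int, dict]:
    """GET ValueSet/$expand and return (status, parsed JSON body or {})."""
    url = f"{base.rstrip('/')}/ValueSet/$expand?{urlencode({'url': valueset_url})}"
    request = urllib.request.Request(url, headers={"Accept": "application/fhir+json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # 4xx/5xx: keep the status and try to parse the OperationOutcome, if any.
        raw = err.read()
        try:
            return err.code, json.loads(raw)
        except ValueError:
            return err.code, {}


def passes(status: int, body: dict) -> bool:
    """The criterion from the text: 200 with expanded codes."""
    return status == 200 and bool(body.get("expansion", {}).get("contains"))
```

Usage, with placeholder URLs for your own server and a value set you actually use: `status, body = expand_probe("http://localhost:8080/fhir", "http://loinc.org/vs/LL715-4")`, then check `passes(status, body)`.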

The conformance matrix at /conformance/{server}/terminology shows the per-server status for both operations across the seven servers in the round. The companion repo reproduces each claim on your box in ten minutes.

For the broader context on what else differs across FHIR servers, see the conformance pillar post. For how these same servers perform under load, see the performance pillar. If you need synthetic patient data with properly coded Observations (LOINC), Conditions (SNOMED), and Medications (RxNorm) to exercise your terminology layer, mock.health is what we built for it.
