HAPI vs GCP Healthcare API: Same FHIR, Different Results

I loaded 1,000 Synthea patients into HAPI and GCP Healthcare API. Same data, same queries — the two stores disagreed on every single one.

mock.health · 11 min read · 2026-04-12


A FHIR consultant named Darren Devitt wrote a book called FHIR Architecture Decisions earlier this year. It is the first impartial guide I have seen on choosing a FHIR server. In the introduction he points out that there is no vendor-neutral comparison of FHIR servers anywhere in the wild — every analysis comes from a vendor selling one of them. Teams pick a server, build on it for two years, then discover the next one over disagrees about half of what they assumed was standard.

I thought about it for a while and then did the experiment.

I loaded the same 1,000 Synthea patient bundles into two FHIR R4 stores: HAPI FHIR running on Cloud Run, and Google Cloud Healthcare API. Both in us-central1, both behind Google's network, no local-Docker advantage for either side. Then I ran the same 14 FHIR queries against both and diffed the responses.

Every single query diverged in at least one measurable field. Status code, Bundle.total presence, top-level Bundle metadata, entry counts, supported operations, error shapes, even how each server handles a typo in a search parameter name — at least one of those differs on every query I sent. Both servers return correct FHIR R4 responses. They just return different correct responses.

If you build your app against one of these and assume the other will behave identically, you will ship bugs. Some of them are loud. Most are quiet.

HAPI accepts five things GCP rejects

The most expensive finding came before I ran a single query. Loading the bundles into HAPI was a one-shot operation. Loading the same bundles into GCP Healthcare API took five strip rules, two retry policies, and an oversized-bundle skip. Each one of those was a different bug HAPI was silently absorbing.

Here are the five resource types I had to remove from every Synthea bundle before GCP would accept the import:

- Claim
- ExplanationOfBenefit
- Questionnaire
- QuestionnaireResponse
- Provenance

If you only test against HAPI, you are not testing against the stricter validation profile. The day you migrate to GCP Healthcare API — or to any other server with first-class FHIR profile validation — your data pipeline is going to start throwing 400s on resources HAPI happily accepted for months.
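The strip step itself is small. Here is a minimal sketch of the kind of strip rule I mean — the function name and structure are illustrative, not lifted from the reproducer repo:

```python
# Hypothetical strip rule: drop resource types a stricter server rejects
# before POSTing a Synthea transaction bundle. The types are the five
# named in this post.
STRIP_TYPES = {"Claim", "ExplanationOfBenefit", "Questionnaire",
               "QuestionnaireResponse", "Provenance"}

def strip_bundle(bundle: dict, types=STRIP_TYPES) -> dict:
    """Return a copy of a FHIR Bundle without entries of the given types."""
    kept = [e for e in bundle.get("entry", [])
            if e.get("resource", {}).get("resourceType") not in types]
    return {**bundle, "entry": kept}
```

Ten lines of pipeline code, but only if you know you need them before the 400s start.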

This is the deeper version of the FHIR sandbox problem. Your sandbox can pass all your tests and still be hiding bugs that only the production server catches. HAPI is a great server. It is also a permissive one.

GCP has a 4,500-entry transaction bundle cap

This one is not in the docs anywhere I can find it. It is in the error message.

Send GCP Healthcare API a transaction bundle with more than 4,500 entries and you get this back:

HTTP 413 invalid_bundle: Cannot execute transaction bundle.
Bundle size 6417 exceeds maximum of 4500.
Use import when creating a large number of resources.

The error helpfully points at the GCS-backed $import operation as the alternative, which is a completely different code path (NDJSON files in a bucket, long-running operation, polling). HAPI has no such limit. Send HAPI a 7,000-entry bundle and it will think for a few seconds and then accept the whole thing.

Here is the part that matters. The patients with bundles over 4,500 entries are the most chronic patients in your dataset. Decades of conditions, hundreds of encounters, thousands of observations from years of A1C and BP monitoring. These are exactly the patients you most want to test chronic-care apps against. In my run, 33 of 1,000 Synthea patients (3.3%) tripped the limit. The largest had 7,167 entries after I had already stripped Claim, EOB, Questionnaire, QuestionnaireResponse, and Provenance.

If you are building anything that touches longitudinal records, you cannot ignore this. You either use the GCS $import path (different code, different debugging surface), split bundles client-side (and re-resolve the cross-chunk urn:uuid references yourself), or accept that your most clinically interesting patients silently never make it into the store.
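The chunking half of the client-side split is easy to sketch; this hypothetical helper handles only the slicing, not the urn:uuid re-resolution, which is the genuinely hard part:

```python
def split_bundle(bundle: dict, max_entries: int = 4500):
    """Yield transaction sub-bundles of at most max_entries entries each.

    Caveat (the hard part, not handled here): urn:uuid references that
    cross a chunk boundary point at a resource in a different upload and
    must be re-resolved before the chunks are POSTed.
    """
    entries = bundle.get("entry", [])
    for i in range(0, len(entries), max_entries):
        yield {"resourceType": "Bundle", "type": "transaction",
               "entry": entries[i:i + max_entries]}
```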

GCP throttles writes. HAPI does not.

After I fixed the strip rules and the bundle cap, the import still failed.

GCP Healthcare API enforces a per-minute quota called "Number of FHIR write operations per minute per region" at the project level. A transaction bundle counts every entry as a separate write. A single Synthea patient with 1,500 entries chews through 1,500 writes against the quota in one POST. Single-threaded import trips HTTP 429 RESOURCE_EXHAUSTED after 25-50 bundles, every time, until you wait out the quota window.

HAPI has no such quota. You can shovel data into HAPI as fast as your network can deliver it.

The fix is exponential backoff in the importer. Honor Retry-After if the response includes one (GCP usually does), exponential delay if it does not, cap the retry count. This is standard practice — you should be doing it anyway — but the gap is still real. A team doing a one-time bulk load of historical patient data into a new store on GCP needs to plan for "this will take hours" and instrument retries from day one. The same team on HAPI just runs the import and goes to lunch.
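A minimal sketch of that retry policy, standard library only (function names and defaults are mine; it handles only the seconds form of Retry-After, not the HTTP-date form):

```python
import random
import time
import urllib.error
import urllib.request

def retry_delay(retry_after, attempt, base_delay=1.0):
    """Server-supplied Retry-After (in seconds) wins; otherwise
    exponential backoff with a little jitter."""
    if retry_after and retry_after.isdigit():
        return float(retry_after)
    return base_delay * (2 ** attempt) + random.uniform(0, 0.5)

def post_with_backoff(url, data, headers, max_retries=6):
    """POST, retrying on HTTP 429 until max_retries is exhausted."""
    for attempt in range(max_retries + 1):
        req = urllib.request.Request(url, data=data, headers=headers,
                                     method="POST")
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            if e.code != 429 or attempt == max_retries:
                raise
            time.sleep(retry_delay(e.headers.get("Retry-After"), attempt))
```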

The silent-ignore that will eat your filter

Out of all the queries I ran across both servers, this is the one I cannot stop thinking about.

I sent GET Patient?this-is-not-a-real-param=garbage&_count=1 to both backends. HAPI returned HTTP 400 with the full list of valid Patient search parameters in the error body. Helpful, loud, easy to find. GCP returned HTTP 200 with the unfiltered patient list — all 967 of them — and silently dropped the unknown filter.

Think about what this means. You write code that filters patients by family-name=Smith. The actual US Core search parameter is family, not family-name. On HAPI, your code throws a 400 the first time you run it and you fix the typo. On GCP, your code returns every patient in the store, and your UI displays it as if it were filtered. You ship the bug. The clinician opens the patient list expecting to see one patient and sees nine hundred. If they trust your filter, they could click into the wrong record.

The fix is "validate your search parameter names against the CapabilityStatement before you trust the result," which is good practice nobody actually does. The simpler fix is "test against both servers." The bug only shows up on the strict one (HAPI), and once you have seen it once you stop making it.
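For anyone who wants to be in the nobody-actually-does-it club: the check is less work than it sounds. A hedged sketch, assuming the standard GET [base]/metadata response shape (function names are mine):

```python
def supported_search_params(capability: dict, resource_type: str) -> set:
    """Collect the search parameter names a server declares for one
    resource type in its CapabilityStatement (GET [base]/metadata)."""
    params = set()
    for rest in capability.get("rest", []):
        for res in rest.get("resource", []):
            if res.get("type") == resource_type:
                params |= {p["name"] for p in res.get("searchParam", [])}
    return params

def undeclared_params(capability: dict, resource_type: str, query: dict) -> set:
    """Query keys the server never declared -- the likely silent ignores.
    Leading-underscore params (_count, _total, ...) are framework-level,
    so they are skipped here."""
    declared = supported_search_params(capability, resource_type)
    return {k for k in query if not k.startswith("_") and k not in declared}
```

Run it once at startup against each backend's metadata endpoint and fail loudly on any non-empty result, and the family-name typo dies in CI instead of in a clinic.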

This is the inverse of the HAPI-leniency story I have been telling all post. On most things HAPI is the lenient one — it will accept invalid Claim resources and duplicate canonical URLs and references with no integrity checks. But on search parameter names HAPI is strict and GCP is the lenient one. The lesson is the same: a server that quietly accepts your input is hiding a bug. Test against the strict one.

Operations support is where the gap is biggest

If your app uses FHIR operations (the $something endpoints), the picture gets uneven fast. I tested four common ones against both backends:

- Patient/$everything — supported on both (GCP caps it at 100 entries per call)
- ValueSet/$expand — HAPI only
- CodeSystem/$lookup — HAPI only
- per-resource history — HAPI only

If your app needs terminology operations, GCP Healthcare API is not a drop-in replacement for HAPI. You either build those services yourself, point at a separate terminology server, or use a different FHIR backend. (Patient/$everything does work on both, but on GCP it is capped at 100 entries per call — same as _revinclude.)

Search behavior diverges in ways you will not notice

This is where the divergence stops being a one-time operational concern and becomes a daily one.

Bundle.total is null on HAPI by default. Pass _total=accurate and you get a count back. Skip the parameter and it is null. GCP includes a count on every search response, no parameter required. (FHIR R4 lets servers pick their own default for _total, so both are spec-compliant. You will not learn that until you read section 3.1.1.4 of the spec.) On Patient?_total=accurate&_count=1, my HAPI store reported 1,012 patients, my GCP store reported 967 — the difference is the 33 patients I had to skip plus a few HAPI seed records. Pagination logic that reads Bundle.total to compute "page X of Y" works differently on each backend.

_revinclude has a 100-entry cap per parameter on GCP. HAPI has no cap. The same query — Patient?_revinclude=Observation:subject&_count=5 — returned 1,005 entries on HAPI (5 patients + 1,000 observations) and exactly 105 on GCP (5 patients + 100 observations). If your patient summary screen fetches a patient's full vital sign history in one shot, it will silently truncate on GCP and you will not notice unless you count.
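One cheap guard is to tally response entries by search.mode before trusting them. A sketch (the exactly-100 red flag is this post's observation, not anything from GCP's documentation):

```python
from collections import Counter

def entries_by_mode(bundle: dict) -> Counter:
    """Tally search-result entries by Bundle.entry.search.mode.
    'match' entries are the primary hits; 'include' entries are the
    extras pulled in by _include/_revinclude. On GCP, an include count
    sitting at exactly 100 per _revinclude parameter is the signature
    of silent truncation described above."""
    return Counter(e.get("search", {}).get("mode", "match")
                   for e in bundle.get("entry", []))
```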

_revinclude=* (the wildcard) is HAPI-only. GCP rejects with HTTP 400 invalid_query: invalid _revinclude query: *; the expected format is [resource_type]:[parameter]. The HAPI version returned 770 entries on the first 5 patients I asked about. There is no clean rewrite for "give me everything that points at this resource" on GCP — you have to enumerate the resource types you care about and ask for each one separately.

_summary=count works on HAPI, returns an OperationOutcome on GCP. To count resources on GCP you have to use _total=accurate&_count=1 and read Bundle.total. Two different ways to ask the same question.
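If you want a single counting idiom that works on both backends, it might look like this (helper names are illustrative):

```python
def count_query(base: str, resource_type: str) -> str:
    """A count query that behaves the same on both servers described
    above: _total=accurate makes HAPI populate Bundle.total (GCP always
    does), and _count=1 keeps the payload tiny. _summary=count would be
    shorter, but GCP answers it with an OperationOutcome."""
    return f"{base}/{resource_type}?_total=accurate&_count=1"

def read_total(bundle: dict):
    """Pull the count out of the search-response Bundle."""
    return bundle.get("total")
```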

code does not search component codes. On either server. This one almost tricked me. I ran Observation?code=http://loinc.org|8480-6 (systolic blood pressure) against both stores and got back 12 results from HAPI and zero from GCP. I assumed I had found a real indexing divergence. I was wrong. Both servers were behaving identically and correctly. The 12 HAPI matches were stale historical Observations from an earlier data load that the matched run did not touch.

Once I checked the source bundles directly, the truth was simpler and more interesting: Synthea encodes blood pressure as a panel (code = 85354-9, "Blood pressure panel with all children optional") with systolic and diastolic in component[0] and component[1]. The default code search parameter on Observation only matches the top-level code field — it does not look inside component. Both servers return zero standalone systolic Observations because zero standalone systolic Observations exist in the source data.

To find them, you need combo-code=http://loinc.org|8480-6 (a base FHIR R4 Observation search parameter that matches code or component.code) or component-code=http://loinc.org|8480-6 (component matches only). Both work on HAPI and GCP. The gotcha is which search parameter you reach for. (If your patient summary is "find all systolic readings for this patient" and you used code=8480-6, it returns nothing. You silently shipped a feature that displays no blood pressure data.)
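To make the three search parameters concrete, here is a toy model of which codes each one inspects — a simplification of the R4 semantics, not real server code:

```python
def obs_codes(obs: dict, param: str) -> list:
    """Simplified model of the R4 Observation search parameters:
    'code' sees Observation.code only, 'component-code' sees
    component[].code only, 'combo-code' sees both."""
    top = [c.get("code") for c in obs.get("code", {}).get("coding", [])]
    comp = [c.get("code")
            for part in obs.get("component", [])
            for c in part.get("code", {}).get("coding", [])]
    return {"code": top, "component-code": comp, "combo-code": top + comp}[param]
```

Run it against a Synthea-shaped blood pressure panel and the trap is obvious: the systolic code 8480-6 is invisible to code and visible to the other two.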

Every HAPI Bundle has id and meta.lastUpdated. GCP omits both. HAPI brands its search responses with a Bundle.id and a lastUpdated timestamp. GCP search responses are anonymous and ephemeral. Clients that pass Bundle IDs around for caching, log correlation, or audit trails see different behavior depending on which server is upstream.

(I am skipping the CapabilityStatement shape diff here because it is a less interesting flavor of the same point: both are valid FHIR CapabilityStatement resources with overlapping but different sets of top-level fields. If your code parses metadata to feature-detect, you have to handle both shapes.)

What this means for you

If you are prototyping, HAPI's leniency gets you to a working demo faster. The same leniency is a liability when you ship.

If you are running production on managed infrastructure with strict validation requirements, the strictness that gets in your way during development is the same strictness that prevents invalid data from poisoning your store. Plan for the bundle cap, the write-ops quota, and the missing terminology operations in your architecture up front. If your app needs $expand, $lookup, or per-resource history, plan to run a separate service for them.

If you are building an app you might deploy against either, test against both early. The bugs you ship will not be the loud ones — the strict-validation rejections are loud and you will fix them on day one. The bugs you ship will be the quiet ones: the search parameter that works on test data and silently returns nothing on production data, the _revinclude that gets capped at 100 entries on one backend and not the other, the Bundle the client tried to cache by id that suddenly has no id.

Verify any of this yourself in 90 seconds

I do not expect you to take any of the above on faith. A blog post that makes a lot of claims about server behavior is exactly the kind of content a developer should be skeptical of. So I built a runnable companion to this post and put it in a public repo: mock-health/samples/fhir-server-compare.

It is intentionally small. One Synthea patient bundle (171 resources). One Python script that runs 10 FHIR queries. One Docker command to spin up a fresh HAPI server. No mock.health credentials, no GCP project, no clone of any private repo. The 10 queries are not arbitrary — they are the smallest set that surfaces every structural finding in this post. Each query in queries.yaml carries the section title it backs up and the expected response shape on each backend, so even if you only run the HAPI half you can still verify the claim.

The whole loop is four commands:

docker run -d --name hapi -p 8080:8080 hapiproject/hapi:latest
pip install -r requirements.txt
export HAPI_BASE_URL=http://localhost:8080/fhir
python load_bundle.py && python compare.py

Adding the GCP column is opt-in via GCP_FHIR_STORE_URL once you have a Healthcare API store. Without that env var the script runs HAPI-only and shows the expected GCP behavior in the Verdict column.

I think the silent-ignore row is the one that justifies the whole reproducer. Read the table carefully. If your code typos a search parameter name on GCP, you get the unfiltered result set back and your filter is dead. That bug does not show up against HAPI because HAPI errors loudly. Run the comparison once, see it on your own screen, and you will never trust an unvalidated search parameter again.

And if you are looking for synthetic patient data that loads into HAPI, GCP Healthcare API, or anything else FHIR R4-compliant, mock.health is what we built for exactly this use case. Free tier, no sales call.

