Building a FHIR API Gateway: What HAPI Won't Do for You

HAPI stores FHIR and runs queries. It doesn't authenticate users, enforce access control, or fix URLs behind a load balancer. Here's the gateway layer.

mock.health · 9 min read · 2026-04-08

Darren Devitt's FHIR Architecture Decisions lists the API gateway as component #1 in Appendix I — the first thing you need beyond the FHIR server itself. He's right. We learned this by building the gateway after the FHIR server and spending months backfilling the things we assumed HAPI would handle.

HAPI is a FHIR storage engine. It validates resources, indexes search parameters, runs queries, and returns Bundles. What it does not do: authenticate users, enforce per-tenant access control, generate correct URLs behind a reverse proxy, stream large responses without buffering them in memory, or protect itself from query parameters that trigger its own bugs.

We run FastAPI in front of HAPI on Google Cloud Run. Every request passes through the gateway before touching HAPI. Here are the five things the gateway does that HAPI can't — and the bugs we hit building each one.

1. Auth Injection: One Hook, Every Request

HAPI doesn't know who's calling. Locally, that's fine — HAPI runs on an open port and accepts whatever you send it. In production, HAPI sits behind Cloud Run's IAM layer, and every request needs a GCP identity token.

The pattern: register an httpx event hook on the shared AsyncClient. The hook fires before every outbound request and injects a Bearer token fetched from Cloud Run's metadata server.

# api.py: registered once at startup
import httpx

async def _inject_hapi_auth(request: httpx.Request):
    """Event hook: inject GCP ID token for all HAPI-bound requests."""
    # _HAPI_HOST_MAP maps each HAPI hostname to its Cloud Run audience URL
    audience_url = _HAPI_HOST_MAP.get(request.url.host)
    if audience_url:
        token = await get_id_token(audience_url)
        if token:
            request.headers["Authorization"] = f"Bearer {token}"

# Shared client: the hook runs before every outbound request
client = httpx.AsyncClient(event_hooks={"request": [_inject_hapi_auth]})

# Token caching: 1-hour lifetime, refresh 5 minutes early
_token_cache: dict[str, tuple[str, float]] = {}

The bug we hit: before the event hook existed, auth injection was per-route. The main FHIR proxy had it. The portal routes didn't. The health check had it. The media renderer didn't. Everything worked in local development where HAPI is open. In production, four endpoints returned 403 and we spent an afternoon tracing which routes were missing the token.

The lesson: auth injection belongs on the HTTP client, not on individual routes. One hook, one place, every request. If your gateway has any path to the FHIR server that bypasses the hook, that path will work locally and break in production.

2. The `--proxy-headers` Trap

Knowing about this bug will save you the most time if you're running FastAPI (or any ASGI app) behind a load balancer. We lost a full day to it.

Cloud Run's load balancer terminates TLS. Your app receives plain HTTP. The load balancer passes the original protocol via X-Forwarded-Proto: https. By default, uvicorn honors this header only when the connection comes from 127.0.0.1, so behind Cloud Run it is ignored. As a result, request.base_url returns http://api.mock.health instead of https://api.mock.health.

Every URL the gateway generates is wrong.

The cascade:

SMART discovery document (/.well-known/smart-configuration) advertises http:// token endpoint
SMART client POSTs authorization code to http://api.mock.health/auth/token
Cloud Run sees an HTTP request and 302 redirects to HTTPS
Most HTTP clients follow a 302 by re-sending the request as a GET (the HTTP spec permits this for POST)
Token endpoint receives GET instead of POST
405 Method Not Allowed

Nine places in our codebase use request.base_url, among them SMART discovery, token exchange, authorization redirects, bulk export polling URLs, capability statement links, and launch URLs. All nine generated http:// URLs, and none of them raised an error. Every URL-generating codepath was silently wrong.

The fix is two flags:

CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000",
     "--proxy-headers", "--forwarded-allow-ips", "*"]

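--proxy-headers tells uvicorn to trust X-Forwarded-Proto and X-Forwarded-For. Recent uvicorn releases enable it by default but only honor the headers from 127.0.0.1, so --forwarded-allow-ips is the flag that changes behavior; passing both makes the intent explicit. --forwarded-allow-ips "*" trusts all upstream proxies. That is safe inside Cloud Run's network, where only the load balancer can reach your container. Anywhere clients can connect directly, list the proxy's addresses instead, because any trusted peer can spoof these headers.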
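To make the trust rule concrete, here is a simplified model of the decision the server makes for each request. It is a sketch for reasoning about the flags, not uvicorn's actual code; the function name effective_scheme and its parameters are hypothetical.

```python
def effective_scheme(
    scope_scheme: str,
    headers: dict[str, str],
    client_ip: str,
    trusted_ips: set[str],
) -> str:
    """Pick the scheme used to build external URLs.

    scope_scheme is what the app saw on the wire (plain "http" behind a
    TLS-terminating load balancer). The forwarded header only wins when
    the connecting peer is in the trusted set, or the set contains "*".
    """
    trusted = "*" in trusted_ips or client_ip in trusted_ips
    if not trusted:
        return scope_scheme
    forwarded = headers.get("x-forwarded-proto", "").strip().lower()
    return forwarded if forwarded in {"http", "https"} else scope_scheme
```

With the default trust set of 127.0.0.1 and a load balancer connecting from some other address, the function returns "http", which is the failure described above. With "*" it returns "https".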

If you're running uvicorn behind any reverse proxy — Cloud Run, AWS ALB, nginx, Caddy — and you generate URLs from request.base_url, you need these flags. You won't catch this locally because there's no load balancer. You won't catch it in integration tests unless your test harness simulates the proxy headers. You'll catch it when your first SMART client reports that token exchange returns 405.

3. Tag-Based Access Control

HAPI has no concept of per-user access control. It serves whatever partition you point it at. Access control lives entirely in the gateway.

Our model: every FHIR resource carries a meta.tag indicating which dataset it belongs to. The gateway computes the user's allowed tags based on their account plan and dataset grants, then appends _tag={allowed_tags} to every search query before forwarding to HAPI.

# Free user searches get tag-scoped automatically.
# FHIR search ORs comma-separated values, so a resource matches if it carries any allowed tag.
extra_params["_tag"] = ",".join(accessible_tags)
url = _build_hapi_url(path, query_string, extra_params, partition="DEFAULT")

Free users see only their assigned sample dataset. Pro users see the full catalog. Shared datasets with expiry dates are supported — the gateway checks grant validity on every request.
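One way to compute the allowed tags from a plan and a set of expiring grants is sketched below. DatasetGrant, accessible_tags, and tag_filter are hypothetical names, and the "system|code" tag strings are an assumption about how tags are represented; the real grant model is not shown in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class DatasetGrant:
    """Hypothetical grant record: one shared dataset, optionally expiring."""
    tag: str                        # assumed "system|code" form
    expires_at: Optional[datetime]  # None means the grant never expires


def accessible_tags(
    plan_tags: Iterable[str],
    grants: Iterable[DatasetGrant],
    now: Optional[datetime] = None,
) -> list[str]:
    """Tags from the account plan plus every grant that has not expired."""
    now = now or datetime.now(timezone.utc)
    tags = set(plan_tags)
    tags.update(
        g.tag for g in grants if g.expires_at is None or g.expires_at > now
    )
    return sorted(tags)


def tag_filter(tags: list[str]) -> str:
    """Comma-joined _tag value; refuses to build an empty, unscoped filter."""
    if not tags:
        raise PermissionError("user has no accessible datasets")
    return ",".join(tags)
```

Refusing the empty case is a deliberate choice in this sketch: a search forwarded with an empty _tag value should never be treated as "no restriction".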

The instance-read problem

HAPI ignores _tag on direct resource reads. GET /fhir/Patient/abc-123 returns the resource regardless of its tags. This means a free user who guesses (or is given) a resource ID from the pro catalog can fetch it directly, bypassing the tag filter.

The fix: post-fetch verification. The gateway fetches the resource from HAPI, inspects its meta.tag array, and returns 404 if none of the tags match the user's allowed set.

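from fastapi.responses import JSONResponse, Response

async def _proxy_read_and_verify_tags(request, path, allowed_tags, partition):
    # url and headers are built from path and partition (elided here)
    resp = await client.get(url, headers=headers)
    if resp.status_code != 200:
        # Pass HAPI's own errors (404, 410, ...) through unchanged
        return Response(content=resp.content, status_code=resp.status_code,
                        media_type=resp.headers.get("content-type"))
    resource = resp.json()
    # Compare tags as "system|code"; .get() tolerates tags missing either field
    resource_tags = {
        f'{t.get("system", "")}|{t.get("code", "")}'
        for t in resource.get("meta", {}).get("tag", [])
    }
    if not resource_tags.intersection(allowed_tags):
        return JSONResponse(status_code=404, content=not_found_outcome)
    return JSONResponse(content=resource, ...)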

Not ideal — the resource is fetched and discarded if the tags don't match. But HAPI doesn't support tag filtering on instance reads, and this is the only way to enforce it without modifying HAPI's source. (If you're using a commercial FHIR server, check whether it supports element-level or tag-level access control natively. Most don't.)
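The tag check itself is easy to pull out into pure functions that can be unit-tested without HAPI. The sketch below also shows one plausible shape for the 404 body, a FHIR OperationOutcome with issue code not-found. The helper names are hypothetical, and the diagnostics wording is an assumption.

```python
from typing import Any, Iterable


def resource_tag_keys(resource: dict[str, Any]) -> set[str]:
    """Collect a resource's meta.tag entries as "system|code" strings."""
    tags = resource.get("meta", {}).get("tag", [])
    return {f'{t.get("system", "")}|{t.get("code", "")}' for t in tags}


def is_visible(resource: dict[str, Any], allowed_tags: Iterable[str]) -> bool:
    """True if the resource carries at least one tag the caller may see."""
    return not resource_tag_keys(resource).isdisjoint(allowed_tags)


def build_not_found_outcome(resource_type: str, resource_id: str) -> dict[str, Any]:
    """OperationOutcome for a 404, identical for 'missing' and 'not allowed'."""
    return {
        "resourceType": "OperationOutcome",
        "issue": [{
            "severity": "error",
            "code": "not-found",
            "diagnostics": f"{resource_type}/{resource_id} is not known",
        }],
    }
```

Returning the same body for a hidden resource and a nonexistent one keeps the response from confirming that a guessed ID exists.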

4. Streaming Large Responses

A FHIR $everything operation can return megabytes — a patient with 5 years of encounters, 200 observations, 15 conditions, and a dozen imaging studies produces a Bundle north of 2MB. Buffering that in the gateway's memory before forwarding it to the client wastes RAM and adds latency.

The pattern: httpx streaming with Starlette's StreamingResponse.

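from starlette.background import BackgroundTask
from starlette.responses import StreamingResponse

req = client.build_request("GET", url, headers=headers)
resp = await client.send(req, stream=True)

return StreamingResponse(
    resp.aiter_bytes(),
    status_code=resp.status_code,
    headers=filtered_response_headers,
    media_type="application/fhir+json",
    # Release the upstream connection once the body has been sent
    background=BackgroundTask(resp.aclose),
)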

stream=True tells httpx not to read the response body into memory. resp.aiter_bytes() yields chunks as they arrive from HAPI. The gateway forwards each chunk to the client as it's received — constant memory usage regardless of response size.

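Headers get filtered on the way through: transfer-encoding, connection, and content-encoding are stripped. The first two belong to the hop between HAPI and the gateway, and resp.aiter_bytes() yields already-decoded bytes, so HAPI's content-encoding no longer describes the body. For the same reason, drop HAPI's content-length whenever the upstream body was compressed. The Authorization header from the original client request is stripped before forwarding; the event hook injects a fresh service-to-service token instead.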
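The filtering described here can be sketched as two small helpers. The constant and function names are hypothetical; the header sets are the ones named in the text, and a real gateway may need to strip more.

```python
from typing import Mapping

# Response headers not copied from HAPI to the client (from the text above)
STRIPPED_RESPONSE_HEADERS = frozenset(
    {"transfer-encoding", "connection", "content-encoding"}
)
# Request headers not forwarded from the client to HAPI
STRIPPED_REQUEST_HEADERS = frozenset({"authorization"})


def _without(headers: Mapping[str, str], blocked: frozenset) -> dict[str, str]:
    """Copy headers, dropping blocked names (header names are case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() not in blocked}


def filter_response_headers(headers: Mapping[str, str]) -> dict[str, str]:
    return _without(headers, STRIPPED_RESPONSE_HEADERS)


def filter_request_headers(headers: Mapping[str, str]) -> dict[str, str]:
    return _without(headers, STRIPPED_REQUEST_HEADERS)
```

Lowercasing before comparison matters because proxies and clients do not agree on header capitalization.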

One gotcha: if HAPI returns a 503 (it's warming up, or you've exhausted its connection pool), you need to detect that before starting the stream. Our gateway checks resp.status_code before constructing the StreamingResponse and returns a structured error with Retry-After: 5 if HAPI is unhealthy. Once you've started streaming a response, you can't change the status code.
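The pre-stream check can be kept as a pure function that decides whether to short-circuit. This is a sketch under assumptions: the function name and the OperationOutcome wording are hypothetical, and only the 503 case and the Retry-After value of 5 come from the text.

```python
from typing import Any, Optional

RETRY_AFTER_SECONDS = 5  # value quoted in the text above


def upstream_unavailable(
    status_code: int,
) -> Optional[tuple[int, dict[str, str], dict[str, Any]]]:
    """Return (status, headers, body) if HAPI answered 503, else None."""
    if status_code != 503:
        return None
    body = {
        "resourceType": "OperationOutcome",
        "issue": [{
            "severity": "error",
            "code": "transient",
            "diagnostics": "FHIR server is temporarily unavailable",
        }],
    }
    return 503, {"Retry-After": str(RETRY_AFTER_SECONDS)}, body
```

In the handler, the call would sit between client.send(..., stream=True) and the StreamingResponse. When it returns a tuple, close the upstream response with await resp.aclose() before returning the error, since the streamed body will never be read.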

5. The Catch-All Route Ordering Trap

FastAPI matches routes in registration order. Our FHIR proxy has a catch-all route:

@router.api_route("/fhir/{path:path}", methods=["GET", "POST", "PUT", "DELETE"])
async def fhir_proxy(path: str, request: Request):
    ...

This matches everything under /fhir/. Including /fhir/Patient/$export, /fhir/DocumentReference/$docref, and /fhir/$export-poll-status — all of which have their own dedicated route handlers with different logic.

If the catch-all proxy is mounted first, those specific routes are unreachable. Every $export request hits the generic proxy, gets forwarded to HAPI (which can't handle partitioned exports), and fails. Every $docref request hits the proxy instead of the custom handler that translates Parameters to a search query.

The fix is explicit mount ordering:

# routes/__init__.py — order matters
app.include_router(bulk_export_router)  # $export operations (most specific)
app.include_router(docref_router)       # $docref operations
app.include_router(fhir_router)         # catch-all proxy (last)

Most specific routes first, catch-all last. This is a well-known FastAPI pattern, but it's easy to forget when you add a new operation router six months after the proxy was built. We lost an afternoon to $docref silently proxying to HAPI instead of hitting our custom handler. The symptom was wrong results, not an error — the hardest kind of bug to notice.
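Since the failure is silent, a startup or CI check can catch it. The sketch below takes route paths in registration order and reports any route registered after a catch-all that would swallow it. The function name is hypothetical, and the check is deliberately crude: it ignores HTTP methods and only recognizes catch-alls written with the {name:path} converter.

```python
def shadowed_routes(paths: list[str]) -> list[tuple[str, str]]:
    """Return (catch_all, shadowed) pairs for routes hidden by an earlier catch-all.

    `paths` must be in registration order, because the first match wins.
    """
    shadowed: list[tuple[str, str]] = []
    catch_alls: list[tuple[str, str]] = []  # (full path, literal prefix)
    for path in paths:
        # Anything under an earlier catch-all's prefix can never be reached
        for catch_all, prefix in catch_alls:
            if path.startswith(prefix):
                shadowed.append((catch_all, path))
        if path.endswith(":path}"):
            catch_alls.append((path, path[: path.rindex("{")]))
    return shadowed
```

Assuming every entry in app.routes exposes a path attribute, a test could assert that shadowed_routes([r.path for r in app.routes]) is empty, which turns a forgotten mount order into a failing build instead of wrong results.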

Build the Gateway First

If you're deploying a FHIR server to production, build the gateway layer before you build features on top of the FHIR data. Auth injection, URL generation, access control, streaming, and route safety are infrastructure concerns that affect every endpoint you'll add later. Retrofitting them onto an existing proxy is harder than building them in from the start.

The FHIR server stores data. The gateway decides who sees it and how.

