Three Ways to Build a Multi-Tenant FHIR Server
Separate databases, partitioned tables, or tag-based filtering — every multi-tenant FHIR deployment picks one. Here's how to choose in HAPI.
mock.health · 9 min read · 2026-04-08
You're building a FHIR platform that serves multiple customers. Each customer needs to see their own data through the same API. Tenant A's patients can't leak into Tenant B's queries. Sounds like a solved problem — every SaaS does this.
In FHIR, it isn't solved. Darren Devitt's FHIR Architecture Decisions flags multi-tenancy as a project "deal breaker" — one of those requirements that can eliminate entire classes of FHIR servers from consideration. His assessment: "There is no best practice for managing multi-tenancy in FHIR servers. Different servers handle it differently or not at all."
Some regulations require physical data separation. Others accept logical isolation. Your customers may have opinions. Your compliance team definitely will.
There are three models, each with different isolation guarantees, operational costs, and failure modes. We've built two of them in production with HAPI FHIR. Here's what each one actually looks like.
Model 1: Separate Database Per Tenant
The strongest isolation guarantee. Each tenant gets their own database — either a separate schema on a shared Postgres instance, or a fully separate database host.
```
Tenant A → API Gateway → HAPI (database: tenant_a) → Postgres host/schema: tenant_a
Tenant B → API Gateway → HAPI (database: tenant_b) → Postgres host/schema: tenant_b
```
What you get
Strongest isolation. At the schema level, tenant data lives in separate tables, and a bug in your query layer can't leak data across tenants because the connection itself is scoped to one schema. At the host level you get true physical isolation: separate machines, separate disks, separate failure domains. Compliance teams care about the difference, because some regulations (especially in Europe) require physical separation, which means separate hosts, not just separate schemas on a shared instance.
Independent scaling. A tenant with 10 million resources doesn't degrade search performance for a tenant with 10,000. You can allocate database resources per tenant based on their usage.
Independent lifecycle. You can backup, restore, migrate, and delete tenant data independently. Deleting a tenant is DROP SCHEMA, not a filtered delete across shared tables.
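That deletion story can be sketched in one helper. The `tenant_<id>` schema naming and the identifier validation are assumptions for illustration, not HAPI conventions:

```python
import re

def drop_tenant_schema_sql(tenant_id: str) -> str:
    """Build the DDL to remove one tenant's schema wholesale.

    Assumes each tenant's HAPI tables live in a schema named
    tenant_<id> (a naming convention for this sketch, not a HAPI default).
    """
    # Schema names can't be parameterized in SQL, so refuse anything
    # that isn't a plain lowercase identifier before interpolating.
    if not re.fullmatch(r"[a-z][a-z0-9_]{0,62}", tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    return f'DROP SCHEMA "tenant_{tenant_id}" CASCADE;'
```

CASCADE drops every table, index, and sequence in the schema in one statement, which is exactly why offboarding is cheap in this model.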
What breaks
Operational cost. You're managing N database connections, N migration runs, N backup schedules. At 5 tenants this is fine. At 50 it's a full-time job. At 500 you need automation that most teams don't have.
Connection pooling. HAPI uses HikariCP with a fixed connection pool (default: 10 connections). If you're routing to different databases per request, you either maintain separate connection pools per tenant (memory-expensive) or you're reconnecting on every request (latency-expensive). We hit this directly — 8 concurrent workers exhausted HikariCP's 10-connection pool and left HAPI in a zombie state, alive but unable to serve any database request.
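One mitigation, sketched below and not part of HAPI itself, is a bounded LRU cache of per-tenant pools at whatever layer owns the routing: keep a handful of tenants' pools warm and close the rest. `make_pool` and `max_pools` are illustrative names, not HikariCP APIs:

```python
from collections import OrderedDict
from typing import Any, Callable

class TenantPoolCache:
    """LRU cache of per-tenant connection pools.

    Caps total pools so N tenants don't mean N x pool_size idle
    connections; evicted pools are closed explicitly.
    """

    def __init__(self, make_pool: Callable[[str], Any], max_pools: int = 8):
        self._make_pool = make_pool      # e.g. builds a Postgres pool for one tenant
        self._max = max_pools
        self._pools: OrderedDict[str, Any] = OrderedDict()

    def get(self, tenant_id: str) -> Any:
        if tenant_id in self._pools:
            self._pools.move_to_end(tenant_id)            # mark as recently used
            return self._pools[tenant_id]
        if len(self._pools) >= self._max:
            _, evicted = self._pools.popitem(last=False)  # drop least-recently-used
            evicted.close()
        pool = self._pools[tenant_id] = self._make_pool(tenant_id)
        return pool
```

The trade-off is explicit: an evicted tenant pays a reconnect cost on their next request, but total idle connections stay bounded instead of growing with tenant count.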
Cold starts per tenant. HAPI's JVM-based architecture means the first query against a tenant's database triggers Hibernate schema validation and search parameter indexing. On Cloud Run, this can add 30-45 seconds to the first request for a tenant that hasn't been accessed recently.
When to use it
Regulatory requirements mandate physical separation. Your tenants are large enough to justify dedicated infrastructure. You have the DevOps capacity to manage N databases. Healthcare organizations in Europe, where data sovereignty regulations often require physical separation by country or region, frequently end up here.
When not to
You have many small tenants. You're a startup and can't afford the operational overhead. Your tenants share common reference data (medications, procedures, terminology) that you'd need to duplicate across every database.
Model 2: Shared Database, Partitioned Tables
One database, but every resource row gets a partition ID. Queries are scoped to a single partition. HAPI supports this natively via database_partition_mode.
```
Tenant A → API Gateway → HAPI → Postgres (WHERE partition_id = 101)
Tenant B → API Gateway → HAPI → Postgres (WHERE partition_id = 102)
```
How to set it up
HAPI's partition mode adds a PARTITION_ID column to every resource table. Requests route to partitions via URL path — /fhir/tenant_acme/Patient hits partition 101, /fhir/tenant_beta/Patient hits partition 102.
```yaml
# services/hapi/application.yaml
partitioning:
  database_partition_mode_enabled: true
  request_tenant_partitioning_mode: true
  allow_references_across_partitions: true
```
Creating a partition is one API call:
```http
POST /fhir/$partition-management-create-partition
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    {"name": "id", "valueInteger": 101},
    {"name": "name", "valueCode": "tenant_acme"}
  ]
}
```
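From a provisioning script, a minimal sketch of the same call using only the Python standard library (the base URL and function names are placeholders, not mock.health code):

```python
import json
import urllib.request

def partition_params(partition_id: int, name: str) -> dict:
    """Build the FHIR Parameters body for partition creation."""
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "id", "valueInteger": partition_id},
            {"name": "name", "valueCode": name},
        ],
    }

def create_partition(hapi_base: str, partition_id: int, name: str) -> dict:
    """POST $partition-management-create-partition to a HAPI server."""
    req = urllib.request.Request(
        f"{hapi_base}/fhir/$partition-management-create-partition",
        data=json.dumps(partition_params(partition_id, name)).encode(),
        headers={"Content-Type": "application/fhir+json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Tenant onboarding in this model is this one call plus whatever bookkeeping your gateway needs to map the tenant's URL path to the partition name.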
What you get
Logical isolation at the storage layer. Every query includes a partition predicate. Tenant A's search can't return Tenant B's resources because the SQL query physically filters by partition ID. Stronger than tag-based filtering (Model 3), weaker than separate databases (Model 1).
Single database to manage. One connection pool, one backup schedule, one migration run. The operational simplicity of a shared database with most of the isolation guarantees of separate ones.
Cross-partition references. With allow_references_across_partitions: true, a resource in Tenant A's partition can reference a resource in a shared partition. Useful when tenants share common data — a shared Practitioner directory, for example, or a shared medication formulary.
What breaks
HAPI bugs surface here. Partition mode is less battle-tested than HAPI's default single-tenant mode. We hit two bugs that only appear in partitioned deployments:
HAPI #1099: search queries like Observation?patient=UUID trigger a "Non-unique ID" error because HAPI's reference resolver doesn't properly scope to the current partition. The workaround: rewrite patient=UUID to a chained search patient:Patient._id=UUID in your API gateway before forwarding to HAPI.
```python
# Rewrite bare reference params to avoid HAPI-1099
if key in ("patient", "subject") and is_bare_id(value):
    rewritten[f"{key}:Patient._id"] = value
```
HAPI #6665: bulk export background jobs lose partition context entirely. The batch job framework creates SystemRequestDetails without a tenant ID, so partition resolution fails. We wrote a 44-line Java interceptor to inject the DEFAULT partition for system requests, then rebuilt the entire Bulk Data Access IG in Python because the interceptor only fixed HAPI's internal jobs — not tenant-scoped exports.
These aren't showstoppers, but they're the kind of bugs you only discover in production with real multi-tenant traffic. They won't surface in a single-tenant dev environment.
No built-in access control. HAPI's partition mode is a storage mechanism, not a security boundary. It puts data where you tell it and retrieves data from where you ask. Deciding which partition a request should hit, based on the authenticated user, is entirely your API gateway's responsibility. HAPI doesn't know or care who's calling.
Shared resource contention. All tenants share the same database indexes, the same connection pool, and the same query planner. A tenant running a heavy $everything operation can degrade performance for other tenants. Postgres-level resource limits (connection limits, statement timeouts) help but add operational complexity.
When to use it
You have many tenants with moderate data volumes. You want logical isolation without the operational cost of separate databases. You need cross-partition references for shared data. You're comfortable building an API gateway to handle auth and routing.
When not to
Regulations require physical separation. You have tenants with wildly different data volumes where resource contention is a concern. Your team doesn't want to deal with HAPI's partition-mode bugs.
Model 3: Shared Database, Tag-Based Filtering
One database, one HAPI instance, no partitions. Every resource gets a meta.tag indicating which tenant it belongs to. Your API gateway appends _tag=tenant_acme to every query.
```
Tenant A → API Gateway → HAPI (+ _tag=tenant_a) → Postgres (all resources, filtered)
Tenant B → API Gateway → HAPI (+ _tag=tenant_b) → Postgres (all resources, filtered)
```
What you get
Simplest to implement. No HAPI configuration changes. No partition management. No Java interceptors. You tag resources on write and filter on read. A single _tag query parameter does all the work.
```python
from urllib.parse import urlencode

# Gateway appends the tenant tag to every search
extra_params["_tag"] = f"https://yourapp.com/tenant|{tenant_id}"
url = f"{hapi_base}/fhir/{path}?{urlencode(extra_params)}"
```
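The write path is the mirror image: before forwarding a create or update, the gateway stamps the tenant tag into `meta.tag` and strips any tenant tag the client tried to set itself. A minimal sketch, assuming the same tag system URL as the read-side filter:

```python
TAG_SYSTEM = "https://yourapp.com/tenant"  # must match the read-side _tag filter

def stamp_tenant_tag(resource: dict, tenant_id: str) -> dict:
    """Ensure the resource carries exactly one tenant tag before it reaches HAPI."""
    meta = resource.setdefault("meta", {})
    tags = meta.setdefault("tag", [])
    # Drop any tenant tag the client sent: clients must never choose their tenant.
    tags[:] = [t for t in tags if t.get("system") != TAG_SYSTEM]
    tags.append({"system": TAG_SYSTEM, "code": tenant_id})
    return resource
```

Stripping client-supplied tenant tags matters as much as adding the right one; otherwise a malicious client can write resources into another tenant's view.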
Zero operational overhead per tenant. Adding a tenant is adding a tag value, not creating a partition or a database. There's nothing to provision, nothing to migrate, nothing to clean up when a tenant leaves.
Works with any FHIR server. Tags are part of the FHIR spec. This pattern works with HAPI, Medplum, Azure Health Data Services, Google Healthcare API — any server that supports _tag search.
What breaks
No real isolation. The isolation boundary is your API gateway's tag-filtering logic. A bug in the gateway — a missing _tag parameter on one endpoint, a code path that forgets to filter — and Tenant A sees Tenant B's data. The database doesn't protect you. Every request passes through the same tables, the same indexes, the same queries.
Instance reads bypass tags. HAPI (and most FHIR servers) ignore _tag on direct resource reads. GET /fhir/Patient/abc-123 returns the resource regardless of its tags. If Tenant A guesses a resource ID from Tenant B, they get it. Your gateway needs post-fetch verification — fetch the resource, check its tag, return 404 if it doesn't match. Every. Single. Instance. Read.
```python
# Post-fetch tag verification for instance reads
resp = await client.get(url)
resource = resp.json()
resource_tags = {
    f"{t.get('system')}|{t.get('code')}"
    for t in resource.get("meta", {}).get("tag", [])
}
if not resource_tags.intersection(allowed_tags):
    # Tag doesn't match — pretend it doesn't exist
    raise HTTPException(status_code=404)  # FastAPI-style; use your framework's 404
```
Performance at scale. All tenants' data lives in the same tables. Search queries scan across all tenants' resources and filter by tag. With 100 tenants and millions of resources, _tag filtering adds measurable overhead compared to partition-scoped queries. Postgres can index tags, but the index size grows with total resource count, not per-tenant count.
When to use it
Prototyping. Internal tools where tenants are trusted. Read-only access where the risk of data leakage is acceptable. Small deployments (under ~10 tenants) where performance isn't a concern. As a starting point before migrating to partitions.
When not to
Anything with real data isolation requirements. External-facing APIs where tenant ID guessing is a risk. Large deployments where cross-tenant query overhead matters. Compliance-sensitive environments.
Choosing Your Model
| Concern | Separate DB | Partitions | Tags |
|---|---|---|---|
| Isolation strength | Strongest (physical if separate hosts) | Logical (storage-level) | Logical (query-level) |
| Compliance teams | Happy (especially separate hosts) | Usually acceptable | Nervous |
| Operational cost per tenant | High | Low | Zero |
| Performance isolation | Full | Shared DB, partitioned queries | Shared everything |
| Cross-tenant data sharing | Hard (duplicate or federate) | Built-in (cross-partition refs) | Built-in (shared tags) |
| HAPI bugs you'll hit | Fewer | More (#1099, #6665) | Fewer |
| Works with any FHIR server | Yes | HAPI-specific | Yes |
| Instance read safety | Inherent | Inherent | Requires post-fetch check |
Most teams start with tags (simplest), discover the isolation gaps, and migrate to partitions. Some start with partitions and wish they'd evaluated the operational cost of separate databases first. Almost nobody starts with separate databases and migrates down.
The direction of migration matters: you can move from tags to partitions to separate databases, but moving in the other direction is a data migration project. Start with the simplest model that meets your compliance requirements and plan for the migration you'll probably need in 12-18 months.
Whichever model you choose, the FHIR server handles storage. Auth, routing, access control, and tenant lifecycle are your API gateway's responsibility. We wrote about what that gateway layer looks like in practice.
mock.health handles multi-tenant isolation so you can focus on your integration. Each team gets isolated FHIR data with SMART auth — no partition management required. Try it free →
Related posts
- Your Clinical AI Agent Needs More Than 5 Patients — Your prior auth agent works in testing. Then it meets a 68-year-old with CKD, hypertension, and a specialist referral — and crashes.
- Building a FHIR API Gateway: What HAPI Won't Do for You — HAPI stores FHIR and runs queries. It doesn't auth users, enforce access, or fix URLs behind a load balancer. Here's the gateway layer.
- The FHIR Sandbox Problem: Why Open Epic Isn't Enough — You opened a Patient resource and found TEST TEST. The sandbox is built for certification, not demos. Here's what's missing and the fix.