Bulkhead Isolation

The Bulkhead pattern, named after the watertight compartments of a ship's hull, partitions system resources into independent compartments so that a failure or overload in one compartment does not spread to the others. In a multi-tenant SaaS context, it keeps a greedy tenant, a slow notification channel, or a failing external service from degrading the entire platform. Granit implements the pattern through several complementary mechanisms: Wolverine queue partitioning, per-pipeline parallelism limits, per-tenant quota isolation, HTTP circuit breakers, and SemaphoreSlim anti-stampede locks.

```mermaid
flowchart TB
    subgraph Bulkheads
        direction LR
        B1[Queue domain-events<br/>Parallelism: local]
        B2[Queue notification-delivery<br/>MaxParallel: 8]
        B3[Queue webhook-delivery<br/>MaxParallel: 20]
        B4[HttpClient Keycloak<br/>Circuit Breaker]
        B5[HttpClient Brevo<br/>Circuit Breaker]
    end

    REQ[Incoming requests] --> B1
    REQ --> B2
    REQ --> B3
    REQ --> B4
    REQ --> B5

    B3 --x|Saturated| B3
    B3 -.->|No impact| B1
    B3 -.->|No impact| B2
```

```mermaid
sequenceDiagram
    participant T1 as Tenant A (high load)
    participant RL as RateLimiter per-tenant
    participant Q as Queue webhook-delivery
    participant T2 as Tenant B (normal load)

    T1->>RL: 500 webhooks/min
    RL-->>T1: 429 Too Many Requests (quota exceeded)
    Note over RL: Tenant A isolated by its quota

    T2->>RL: 10 webhooks/min
    RL-->>Q: Allowed
    Q->>Q: Processing (max 20 parallel)
    Note over T2,Q: Tenant B unaffected
```

Granit implements Bulkhead Isolation through 5 complementary mechanisms, each targeting a different isolation level:

1. Queue partitioning — isolation by message type (Wolverine)

Messages are routed to dedicated queues with controlled parallelism. Each queue operates as an independent compartment.

| Queue | Messages | Isolation |
| --- | --- | --- |
| domain-events | IDomainEvent | Local only, never routed to an external transport |
| notification-delivery | DeliverNotificationCommand | Configurable parallelism (default: 8) |
| webhook-delivery | SendWebhookCommand | Configurable parallelism (default: 20) |
| Error queue (DLQ) | Messages that fail with ValidationException | Deterministic failures, no retry |

```csharp
// Explicit routing of domain events to a local queue (configured in AddGranitWolverine)
opts.PublishMessage<Core.Events.IDomainEvent>()
    .ToLocalQueue("domain-events");
```
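
To make the compartment idea concrete, the sketch below models two queues with plain System.Threading.Channels instead of Wolverine. It is an illustration under stated assumptions, not Granit's code: the StartQueue helper, the string payloads, and the hung webhook endpoint are hypothetical. Each queue gets its own bounded buffer and its own worker pool (20 and 8 workers, matching the defaults above), so blocking every webhook worker leaves notification delivery untouched.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical helper: one queue = one bounded channel + its own pool of workers.
(Channel<string> Queue, Task[] Workers) StartQueue(int capacity, int parallelism, Func<string, Task> handler)
{
    var channel = Channel.CreateBounded<string>(capacity);
    var workers = new Task[parallelism];
    for (int w = 0; w < parallelism; w++)
    {
        workers[w] = Task.Run(async () =>
        {
            await foreach (string message in channel.Reader.ReadAllAsync())
                await handler(message);
        });
    }
    return (channel, workers);
}

// Simulates a webhook endpoint that never answers: every handler call hangs.
var hungEndpoint = new TaskCompletionSource();
int notificationsSent = 0;

var webhooks = StartQueue(capacity: 100, parallelism: 20, _ => hungEndpoint.Task);
var notifications = StartQueue(capacity: 100, parallelism: 8, _ =>
{
    Interlocked.Increment(ref notificationsSent);
    return Task.CompletedTask;
});

// Saturate the webhook compartment: its 20 workers hang, the rest wait in its buffer.
for (int i = 0; i < 50; i++)
    await webhooks.Queue.Writer.WriteAsync($"webhook-{i}");

// The notification compartment has its own workers and drains normally.
for (int i = 0; i < 10; i++)
    await notifications.Queue.Writer.WriteAsync($"notification-{i}");
notifications.Queue.Writer.Complete();
await Task.WhenAll(notifications.Workers);

Console.WriteLine($"Notifications sent: {notificationsSent}"); // prints "Notifications sent: 10"
```

If both message types shared a single worker pool, the 50 hung webhooks would have consumed every worker and no notification would have been sent.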

2. MaxParallelDeliveries — concurrency limits per pipeline

Each delivery pipeline limits the number of simultaneous operations, preventing a saturated channel from consuming all pod resources.

| Module | Parameter | Default | Range |
| --- | --- | --- | --- |
| Granit.Notifications | MaxParallelDeliveries | 8 | 1–100 |
| Granit.Webhooks | MaxParallelDeliveries | 20 | 1–100 |
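
A per-pipeline limit like MaxParallelDeliveries is commonly enforced with a SemaphoreSlim sized to the limit. The sketch below assumes that approach; CreateBulkhead is a hypothetical helper, not a Granit API, and its range check simply mirrors the documented 1–100 bounds. Fifty deliveries arrive at once, yet at most eight run concurrently.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: wraps a SemaphoreSlim so that at most `maxParallel`
// deliveries of one pipeline run at the same time.
Func<Func<Task>, Task> CreateBulkhead(int maxParallel)
{
    // Mirrors the documented 1-100 range of MaxParallelDeliveries.
    if (maxParallel is < 1 or > 100)
        throw new ArgumentOutOfRangeException(nameof(maxParallel));

    var gate = new SemaphoreSlim(maxParallel, maxParallel);
    return async delivery =>
    {
        await gate.WaitAsync();          // wait for a free slot in this compartment
        try { await delivery(); }
        finally { gate.Release(); }      // free the slot even if the delivery fails
    };
}

var sync = new object();
int inFlight = 0, peak = 0;
var notificationBulkhead = CreateBulkhead(maxParallel: 8);

// 50 notifications arrive at once; each simulated delivery takes about 20 ms.
await Task.WhenAll(Enumerable.Range(0, 50).Select(_ => notificationBulkhead(async () =>
{
    lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
    await Task.Delay(20);
    lock (sync) { inFlight--; }
})));

Console.WriteLine($"Peak concurrent deliveries: {peak}"); // never above 8
```

The excess deliveries wait asynchronously on the semaphore instead of opening more outbound connections, so one saturated channel cannot claim every slot on the pod.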

3. Per-tenant rate limiting — quota isolation by tenant

Granit.RateLimiting partitions Redis counters by tenant. Each tenant has its own independent counters, so one tenant can never consume another tenant's quota.

| Element | Detail |
| --- | --- |
| Redis key | {prefix}:{tenantId}:{policyName} |
| Hash tag | {tenantId} acts as a Redis Cluster hash tag, so all of a tenant's counters land in the same hash slot |
| Bypass | Configurable exempt roles |
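
The sketch below illustrates per-tenant counter partitioning with an in-memory dictionary standing in for Redis. The BuildKey helper, the "granit-rl" prefix, the 100-requests-per-minute limit, and the fixed-window algorithm are all assumptions chosen for illustration; the real limiter's algorithm and key prefix may differ. It replays the sequence diagram above: tenant A sends 500 requests, tenant B sends 10, and only tenant A is throttled.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the Redis counters: key -> (fixed-window index, request count).
// Not thread-safe: Redis provides atomicity through INCR, this sketch runs single-threaded.
var counters = new Dictionary<string, (long WindowIndex, int Count)>();

// Hypothetical key builder following the {prefix}:{tenantId}:{policyName} layout;
// the literal braces around the tenant ID act as the Redis Cluster hash tag.
string BuildKey(string prefix, string tenantId, string policyName) =>
    $"{prefix}:{{{tenantId}}}:{policyName}";

// Fixed-window check against the tenant's own counter only.
bool TryAcquire(string tenantId, string policyName, int permitLimit, TimeSpan window, DateTimeOffset now)
{
    string key = BuildKey("granit-rl", tenantId, policyName);
    long windowIndex = now.ToUnixTimeMilliseconds() / (long)window.TotalMilliseconds;

    if (!counters.TryGetValue(key, out var entry) || entry.WindowIndex != windowIndex)
        entry = (windowIndex, 0);               // first request in a new window
    if (entry.Count >= permitLimit)
        return false;                           // quota exhausted: the API answers 429

    counters[key] = (windowIndex, entry.Count + 1);
    return true;
}

// Tenant A bursts, tenant B stays within its quota (same window for both).
var requestTime = DateTimeOffset.UtcNow;
var perMinute = TimeSpan.FromMinutes(1);
int tenantAAllowed = 0, tenantBAllowed = 0;

for (int i = 0; i < 500; i++)
    if (TryAcquire("tenant-a", "api", 100, perMinute, requestTime)) tenantAAllowed++;
for (int i = 0; i < 10; i++)
    if (TryAcquire("tenant-b", "api", 100, perMinute, requestTime)) tenantBAllowed++;

Console.WriteLine($"Tenant A: {tenantAAllowed}/500 allowed, Tenant B: {tenantBAllowed}/10 allowed");
// prints "Tenant A: 100/500 allowed, Tenant B: 10/10 allowed"
```

Because the tenant ID is part of every key, tenant A's 400 rejected requests never touch the counter that tenant B reads.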

4. Circuit breaker — external HTTP service isolation

AddStandardResilienceHandler() from Microsoft.Extensions.Http.Resilience is applied to each HttpClient that targets an external service. When an external service keeps failing, the circuit opens and subsequent requests fail immediately instead of holding connections and pending requests open until they time out.

| Service | Circuit breaker | Retry | Timeout |
| --- | --- | --- | --- |
| Keycloak Admin API | Yes | 3 attempts, exponential backoff | 30s / 2min |
| Brevo API | Yes | 3 attempts, exponential backoff | 30s / 2min |

```csharp
// Each external HttpClient is isolated by its own circuit breaker
services.AddHttpClient("keycloak-admin")
    .AddStandardResilienceHandler();
```
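
The sketch below shows what an open circuit buys: once tripped, calls are rejected without reaching the failing service. It is a hand-rolled consecutive-failure breaker for illustration only; the standard resilience handler relies on Polly's circuit breaker, which trips on a failure ratio over a sampling window rather than a fixed failure count. The threshold of 3, the 30-second break, and the fake clock are assumptions.

```csharp
using System;
using System.Threading.Tasks;

// Illustrative consecutive-failure breaker (not Polly's implementation).
const int FailureThreshold = 3;
TimeSpan breakDuration = TimeSpan.FromSeconds(30);
int consecutiveFailures = 0;
DateTimeOffset? openedAt = null;
DateTimeOffset clock = DateTimeOffset.UnixEpoch;    // fake clock so the demo is deterministic

async Task<string> CallAsync(Func<Task<string>> operation)
{
    if (openedAt is { } opened)
    {
        if (clock - opened < breakDuration)
            throw new InvalidOperationException("Circuit open: failing fast.");
        openedAt = null;                            // half-open: let a trial call through
    }
    try
    {
        string result = await operation();
        consecutiveFailures = 0;                    // a success closes the circuit
        return result;
    }
    catch
    {
        if (++consecutiveFailures >= FailureThreshold)
            openedAt = clock;                       // trip: stop calling the service
        throw;
    }
}

// Simulated Keycloak outage: every real call times out.
int downstreamCalls = 0, rejectedFast = 0;
Func<Task<string>> keycloakDown = () =>
{
    downstreamCalls++;
    return Task.FromException<string>(new TimeoutException("Keycloak unreachable"));
};

for (int i = 0; i < 10; i++)
{
    try { await CallAsync(keycloakDown); }
    catch (InvalidOperationException) { rejectedFast++; }   // circuit open, no downstream call
    catch (TimeoutException) { }                            // real call that failed
}
Console.WriteLine($"Downstream calls: {downstreamCalls}, rejected fast: {rejectedFast}");
// prints "Downstream calls: 3, rejected fast: 7"

// After the break duration, a trial call goes through and closes the circuit again.
clock += TimeSpan.FromSeconds(31);
string reply = await CallAsync(() => Task.FromResult("ok"));
```

Only the first three calls reach the service; the remaining seven fail immediately, which is what keeps one outage from draining resources shared with healthy clients.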

5. SemaphoreSlim — anti-stampede per resource

Shared resources (token cache, distributed cache) use SemaphoreSlim(1, 1) to serialize concurrent access during a cache miss. This mechanism prevents a request spike from generating N parallel calls to the same external service.

| Component | Protected resource | Pattern |
| --- | --- | --- |
| KeycloakAdminTokenService | Keycloak token | Double-check locking |
| DistributedCacheService | Distributed cache | Double-check locking |
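
A minimal sketch of double-check locking around a cached token, assuming a cache without expiry (expiry handling is omitted for brevity). FetchTokenFromProviderAsync is a hypothetical stand-in for the provider call. A hundred concurrent callers hit an empty cache, and only one of them reaches the provider.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(1, 1);   // one caller at a time may refresh the token
string? cachedToken = null;
int providerCalls = 0;

// Hypothetical provider call with a simulated network round trip.
async Task<string> FetchTokenFromProviderAsync()
{
    Interlocked.Increment(ref providerCalls);
    await Task.Delay(50);
    return "token-" + Guid.NewGuid().ToString("N");
}

async Task<string> GetTokenAsync()
{
    // First check: fast path, no lock taken when the token is already cached.
    string? token = Volatile.Read(ref cachedToken);
    if (token is not null) return token;

    await gate.WaitAsync();
    try
    {
        // Second check: another caller may have filled the cache while we waited.
        token = Volatile.Read(ref cachedToken);
        if (token is null)
        {
            token = await FetchTokenFromProviderAsync();
            Volatile.Write(ref cachedToken, token);
        }
        return token;
    }
    finally { gate.Release(); }
}

// 100 concurrent requests arrive while the cache is empty.
string[] tokens = await Task.WhenAll(Enumerable.Range(0, 100).Select(_ => GetTokenAsync()));
Console.WriteLine($"Provider calls: {providerCalls}, distinct tokens: {tokens.Distinct().Count()}");
// prints "Provider calls: 1, distinct tokens: 1"
```

The second check inside the lock is what prevents the stampede: callers queued on the semaphore find the token already cached and return without contacting the provider.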

The WebhookDispatchWorker uses two separate System.Threading.Channels.Channel<T> instances to isolate the fan-out phase (trigger to commands) from the delivery phase (command to HTTP POST). Both phases run concurrently via Task.WhenAll() without interfering with each other.
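
The two-phase design can be sketched as follows: the fan-out phase reads triggers and writes one delivery command per subscription, and the delivery phase consumes those commands. This is not the actual WebhookDispatchWorker; the subscription URLs, event names, and channel capacity are hypothetical, and the HTTP POST is replaced by a short delay.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical subscriptions: each trigger fans out to all of them.
string[] subscriptions = { "https://a.example/hook", "https://b.example/hook", "https://c.example/hook" };

var triggers = Channel.CreateUnbounded<string>();                        // phase 1 input
var commands = Channel.CreateBounded<(string Event, string Url)>(100);   // phase 2 input
int delivered = 0;

// Phase 1: fan-out. Turns each trigger into one delivery command per subscription.
async Task FanOutAsync()
{
    await foreach (string evt in triggers.Reader.ReadAllAsync())
        foreach (string url in subscriptions)
            await commands.Writer.WriteAsync((evt, url));
    commands.Writer.Complete();   // no more commands once all triggers are drained
}

// Phase 2: delivery. A real worker would POST the payload here; the sketch just counts.
async Task DeliverAsync()
{
    await foreach (var command in commands.Reader.ReadAllAsync())
    {
        await Task.Delay(1);      // stands in for the HTTP POST to command.Url
        Interlocked.Increment(ref delivered);
    }
}

// Both phases run concurrently, each reading from its own channel.
Task pipeline = Task.WhenAll(FanOutAsync(), DeliverAsync());

foreach (string evt in new[] { "order.created", "order.paid" })
    await triggers.Writer.WriteAsync(evt);
triggers.Writer.Complete();

await pipeline;
Console.WriteLine($"Delivered: {delivered}"); // prints "Delivered: 6"
```

Because the command channel is bounded, a slow delivery phase eventually applies back-pressure to the fan-out phase instead of letting commands accumulate in memory without limit.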

| File | Role |
| --- | --- |
| src/Granit.Wolverine/Extensions/WolverineHostApplicationBuilderExtensions.cs | Queue routing for domain-events |
| src/Granit.RateLimiting/Internal/TenantPartitionedRateLimiter.cs | Per-tenant isolation |
| src/Granit.Identity.Keycloak/Internal/KeycloakAdminTokenService.cs | SemaphoreSlim anti-stampede |
| src/Granit.Caching/DistributedCacheService.cs | SemaphoreSlim on cache miss |
| src/Granit.Webhooks/Internal/WebhookDispatchWorker.cs | Separate trigger/delivery channels |


| Problem | Solution |
| --- | --- |
| A webhook to a slow service blocks notifications | Separate queues with independent parallelism |
| A greedy tenant saturates the API for everyone | Quota counters partitioned by tenant |
| An unavailable external service ties up connections | Circuit breaker cuts calls after a failure threshold |
| Simultaneous cache misses trigger N calls to the provider | SemaphoreSlim serializes access, so only one call goes through |
| Webhook fan-out blocks the delivery phase | Separate channels for trigger and delivery |

```csharp
// --- Isolation by Wolverine queue ---
// IDomainEvent -> dedicated local queue, no interference with integration events
opts.PublishMessage<Core.Events.IDomainEvent>()
    .ToLocalQueue("domain-events");

// --- Configurable parallelism per module ---
// appsettings.json
// {
//   "Webhooks": { "MaxParallelDeliveries": 20 },
//   "Notifications": { "MaxParallelDeliveries": 8 }
// }

// --- Circuit breaker per external service ---
services.AddHttpClient("keycloak-admin", client =>
        client.BaseAddress = new Uri(keycloakUrl))
    .AddStandardResilienceHandler();

// --- Per-tenant rate limiting (automatic isolation) ---
app.MapGet("/api/v1/patients", GetPatientsAsync)
    .RequireGranitRateLimiting("api");
// Each tenant has its own Redis counters, hence an independent quota
```