Circuit Breaker and Retry

A circuit breaker stops sending calls to a failing service so that failures do not cascade and saturate the callers. Retry automatically replays failed requests with exponential backoff. Granit combines both: Microsoft.Extensions.Http.Resilience for outgoing HTTP calls and Wolverine's RetryWithCooldown for asynchronous messages.

stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure rate > threshold
    Open --> HalfOpen : Timeout expired
    HalfOpen --> Closed : Test request succeeded
    HalfOpen --> Open : Test request failed

    state Closed {
        [*] --> Normal
        Normal --> Retry : Transient failure
        Retry --> Normal : Success
        Retry --> Retry : Exponential backoff
    }

sequenceDiagram
    participant S as Granit Service
    participant R as Resilience Handler
    participant E as External Service

    S->>R: POST /api/send-email
    R->>E: Attempt 1
    E-->>R: 503 Service Unavailable
    R->>R: Backoff 1s
    R->>E: Attempt 2
    E-->>R: 503 Service Unavailable
    R->>R: Backoff 2s
    R->>E: Attempt 3
    E-->>R: 200 OK
    R-->>S: 200 OK

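To make the state diagram concrete, here is a minimal, single-threaded sketch of the Closed / Open / HalfOpen transitions. It is not how Polly or Microsoft.Extensions.Http.Resilience implement the breaker: the names SimpleCircuitBreaker and CircuitState are hypothetical, and the sketch opens after a number of consecutive failures rather than on a failure rate over a sampling window.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Illustrative only: not thread-safe, counts consecutive failures
// instead of measuring a failure rate over a time window.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTimeOffset> _clock;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public SimpleCircuitBreaker(
        int failureThreshold,
        TimeSpan breakDuration,
        Func<DateTimeOffset> clock)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
        _clock = clock;
    }

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public T Execute<T>(Func<T> action)
    {
        if (State == CircuitState.Open)
        {
            // Open: reject immediately until the break duration has elapsed.
            if (_clock() - _openedAt < _breakDuration)
            {
                throw new InvalidOperationException("Circuit is open; call rejected.");
            }

            // Timeout expired: let a single test request through.
            State = CircuitState.HalfOpen;
        }

        try
        {
            T result = action();

            // Success (including a successful test request) closes the circuit.
            _consecutiveFailures = 0;
            State = CircuitState.Closed;
            return result;
        }
        catch
        {
            _consecutiveFailures++;

            // A failed test request, or too many failures in a row, opens the circuit.
            if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
            {
                State = CircuitState.Open;
                _openedAt = _clock();
            }

            throw;
        }
    }
}
```

The clock is injected so the transitions can be exercised in a test without waiting for the break duration; in production code `() => DateTimeOffset.UtcNow` would be passed.
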
Each HttpClient targeting an external service is configured with the .NET standard resilience handler:

| Service | Registration file |
| --- | --- |
| Keycloak Admin API | src/Granit.Identity.Keycloak/Extensions/IdentityKeycloakServiceCollectionExtensions.cs |
| Microsoft Graph (Entra ID) | src/Granit.Identity.EntraId/Extensions/IdentityEntraIdServiceCollectionExtensions.cs |
| Brevo (email/SMS/WhatsApp) | src/Granit.Notifications.Brevo/Extensions/BrevoNotificationsServiceCollectionExtensions.cs |
| Zulip (chat) | src/Granit.Notifications.Zulip/Extensions/ZulipNotificationsServiceCollectionExtensions.cs |
| Firebase FCM (push) | src/Granit.Notifications.MobilePush.GoogleFcm/Extensions/GoogleFcmMobilePushServiceCollectionExtensions.cs |

services.AddHttpClient("KeycloakAdmin", client =>
{
    client.BaseAddress = new Uri(options.BaseUrl);
    client.Timeout = TimeSpan.FromSeconds(options.TimeoutSeconds);
})
.AddStandardResilienceHandler();

AddStandardResilienceHandler() automatically adds:

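  • Retry: up to 3 retries with exponential backoff and jitter on transient errors (408, 429, 5xx)
  • Circuit Breaker: opens when the failure ratio exceeds the threshold (10% by default) over a 30s sampling window
  • Timeout: per attempt (10s by default) and for the whole request including retries (30s by default)
  • Rate Limiter: caps the number of concurrent requests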

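When a service needs different settings, the same handler accepts a configuration callback. The following is a sketch rather than Granit's actual registration: the base URL is a placeholder, and the values are illustrative and should be tuned per service.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Polly;

var services = new ServiceCollection();

services.AddHttpClient("KeycloakAdmin", client =>
    {
        // Placeholder URL, for illustration only.
        client.BaseAddress = new Uri("https://keycloak.example.com");
    })
    .AddStandardResilienceHandler(options =>
    {
        // Retry: exponential backoff with jitter on transient failures.
        options.Retry.MaxRetryAttempts = 3;
        options.Retry.BackoffType = DelayBackoffType.Exponential;
        options.Retry.UseJitter = true;
        options.Retry.Delay = TimeSpan.FromSeconds(2);

        // Circuit breaker: failure ratio measured over a sampling window.
        options.CircuitBreaker.FailureRatio = 0.1;
        options.CircuitBreaker.MinimumThroughput = 100;
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
        options.CircuitBreaker.BreakDuration = TimeSpan.FromSeconds(5);

        // Timeouts: one per attempt, one for the whole request including retries.
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(10);
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(30);
    });
```

The sketch assumes the Microsoft.Extensions.Http.Resilience package is referenced; `DelayBackoffType` comes from Polly, which that package depends on.
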
For asynchronous messages, Wolverine provides retry with progressive cooldown. Example with webhooks (6 levels, 30s to 12h):

opts.OnException<WebhookDeliveryException>()
    .RetryWithCooldown(
        TimeSpan.FromSeconds(30),  // Level 1
        TimeSpan.FromMinutes(2),   // Level 2
        TimeSpan.FromMinutes(10),  // Level 3
        TimeSpan.FromMinutes(30),  // Level 4
        TimeSpan.FromHours(2),     // Level 5
        TimeSpan.FromHours(12));   // Level 6 -> Dead-Letter Queue

| Service | HTTP Resilience | Messaging retry | Special behavior |
| --- | --- | --- | --- |
| Keycloak Admin | Standard handler | | Graceful degradation on reads |
| Brevo | Standard handler | Wolverine retry | |
| SMTP | Configurable timeout | Wolverine retry | |
| Web Push | Standard handler | Wolverine retry | Auto-cleanup on HTTP 410 |
| Webhooks | Timeout 5-120s | 6 levels (30s to 12h) | Auto-suspend on 401/403/410 |
| Vault | Lease renewal | | Auto-refresh credentials |
| S3 | AWS SDK built-in retry | | Native SDK backoff |
| OTLP | Buffer batch export | | |

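As a quick sanity check on the six webhook cooldown levels, the sketch below adds them up. It assumes each level is one delay before the next attempt and ignores the time spent in the handler itself; the class name WebhookCooldownSchedule is hypothetical and exists only for this illustration.

```csharp
using System;
using System.Linq;

public static class WebhookCooldownSchedule
{
    // Same six delays as in the RetryWithCooldown example.
    public static readonly TimeSpan[] Levels =
    {
        TimeSpan.FromSeconds(30),
        TimeSpan.FromMinutes(2),
        TimeSpan.FromMinutes(10),
        TimeSpan.FromMinutes(30),
        TimeSpan.FromHours(2),
        TimeSpan.FromHours(12),
    };

    // Sum of all cooldowns: the minimum time between the first
    // failure and the message reaching the dead-letter queue.
    public static TimeSpan TotalWait() =>
        Levels.Aggregate(TimeSpan.Zero, (sum, delay) => sum + delay);
}
```

`WebhookCooldownSchedule.TotalWait()` evaluates to 14:42:30, so under these assumptions a webhook endpoint has a little under 15 hours to recover before the message is dead-lettered.
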
Each external service exposes a timeout via the Options pattern:

| Options | Property | Default (s) | Range (s) |
| --- | --- | --- | --- |
| KeycloakAdminOptions | TimeoutSeconds | 30 | |
| BrevoOptions | TimeoutSeconds | 30 | 1-300 |
| SmtpOptions | TimeoutSeconds | 30 | |
| WebhooksOptions | HttpTimeoutSeconds | 10 | 5-120 |

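One way such a range could be enforced is with data annotations on the options class. This is a sketch under assumptions: the BrevoOptions shape below is hypothetical (only the property name, default, and range come from the table), and the real Granit class may validate differently.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

var services = new ServiceCollection();

services.AddOptions<BrevoOptions>()
    // In an application this value would normally come from configuration.
    .Configure(o => o.TimeoutSeconds = 45)
    // Accessing the options throws OptionsValidationException when the
    // value falls outside the [Range] declared on the property.
    .ValidateDataAnnotations();

using var provider = services.BuildServiceProvider();
BrevoOptions options = provider.GetRequiredService<IOptions<BrevoOptions>>().Value;
Console.WriteLine(options.TimeoutSeconds);

// Hypothetical shape; the real Granit options class may differ.
public sealed class BrevoOptions
{
    [Range(1, 300)]
    public int TimeoutSeconds { get; set; } = 30;
}
```
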
| File | Role |
| --- | --- |
| src/Granit.Identity.Keycloak/Extensions/IdentityKeycloakServiceCollectionExtensions.cs | Standard resilience on Keycloak |
| src/Granit.Notifications.Brevo/Extensions/BrevoNotificationsServiceCollectionExtensions.cs | Standard resilience on Brevo |
| src/Granit.Webhooks/Extensions/WebhooksHostApplicationBuilderExtensions.cs | RetryWithCooldown 6 levels |

| Problem | Solution |
| --- | --- |
| Temporarily down external service = cascade of 500s | Circuit Breaker cuts calls, prevents saturation |
| Transient network error = data loss | Retry with backoff replays automatically |
| Webhook endpoint down for hours | 6 progressive levels (30s to 12h) before dead-letter |
| Expired token on an external service | Auto-refresh via Vault lease renewal |
| new HttpClient() without resilience | IHttpClientFactory + systematic AddStandardResilienceHandler() |

using System.Net.Http.Json; // provides the GetFromJsonAsync extension

// --- Registration with standard resilience ---
services.AddHttpClient<GeoService>(client =>
{
    client.BaseAddress = new Uri("https://api.geo.example.com");
})
.AddStandardResilienceHandler();

// --- The service has no awareness of the resilience ---
public sealed class GeoService(HttpClient httpClient)
{
    public async Task<GeoResult?> GeocodeAsync(
        string address,
        CancellationToken cancellationToken = default)
    {
        // Retry + Circuit Breaker + Timeout are transparent
        return await httpClient
            .GetFromJsonAsync<GeoResult>(
                $"/geocode?q={Uri.EscapeDataString(address)}",
                cancellationToken)
            .ConfigureAwait(false);
    }
}
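
Transparent does not mean failures disappear: once retries are exhausted, a timeout fires, or the circuit is open, the call still throws. For read paths where a missing result is acceptable, a caller could degrade gracefully along these lines. This is a sketch, not Granit code: GeoLookupWithFallback and the GeoResult shape are hypothetical, and whether returning null is appropriate depends on the feature.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;
using Polly.CircuitBreaker;
using Polly.Timeout;

// Hypothetical result shape, for illustration only.
public sealed record GeoResult(double Latitude, double Longitude);

public sealed class GeoLookupWithFallback(HttpClient httpClient)
{
    // Returns null instead of throwing when the geocoding service is unavailable.
    public async Task<GeoResult?> TryGeocodeAsync(
        string address,
        CancellationToken cancellationToken = default)
    {
        try
        {
            return await httpClient
                .GetFromJsonAsync<GeoResult>(
                    $"/geocode?q={Uri.EscapeDataString(address)}",
                    cancellationToken)
                .ConfigureAwait(false);
        }
        catch (BrokenCircuitException)
        {
            // Circuit is open: the call was rejected without reaching the service.
            return null;
        }
        catch (TimeoutRejectedException)
        {
            // An attempt timeout or the total request timeout elapsed.
            return null;
        }
        catch (HttpRequestException)
        {
            // Retries were exhausted and the last response was still an error.
            return null;
        }
    }
}
```

Cancellation requested by the caller is deliberately not swallowed here, so an `OperationCanceledException` still propagates.
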