Observability

This guide covers the production observability setup for Granit applications: structured logging with Serilog, distributed tracing and metrics with OpenTelemetry, and visualization with the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir).

Granit exports all three observability signals via OTLP to a self-hosted Grafana stack:

| Signal | Library | Collector destination | Storage |
| --- | --- | --- | --- |
| Logs | Serilog 9+ (OTLP sink) | OpenTelemetry Collector | Loki |
| Traces | OpenTelemetry .NET 1.11+ | OpenTelemetry Collector | Tempo |
| Metrics | OpenTelemetry .NET 1.11+ | OpenTelemetry Collector | Mimir |

All telemetry flows through the OpenTelemetry Collector, which routes signals to the appropriate backend. The entire stack is self-hosted on European infrastructure — no telemetry leaves the EU.

A single call to AddGranitObservability() configures Serilog and OpenTelemetry:

```csharp
// Program.cs
builder.AddGranitObservability();
```

This registers:

  • Serilog with structured logging, console output, and OTLP export
  • OpenTelemetry tracing with ASP.NET Core, HttpClient, and EF Core instrumentation
  • OpenTelemetry metrics with ASP.NET Core and HttpClient instrumentation
  • Automatic enrichment of all log entries with service metadata
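To show the moving parts, here is a rough sketch of comparable wiring written directly against the public Serilog and OpenTelemetry .NET APIs. It is not Granit's implementation. The package list, the way the `Observability` section is read, and the option choices are all assumptions made for illustration.

```csharp
// Program.cs: comparable wiring WITHOUT Granit (illustrative sketch only).
// Assumed packages: Serilog.AspNetCore, Serilog.Sinks.OpenTelemetry,
// OpenTelemetry.Extensions.Hosting, OpenTelemetry.Exporter.OpenTelemetryProtocol,
// OpenTelemetry.Instrumentation.AspNetCore, OpenTelemetry.Instrumentation.Http,
// OpenTelemetry.Instrumentation.EntityFrameworkCore. Assumes the Web SDK's implicit usings.
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using Serilog;
using Serilog.Sinks.OpenTelemetry;

var builder = WebApplication.CreateBuilder(args);

// Read the "Observability" configuration section, falling back to the documented defaults.
var section = builder.Configuration.GetSection("Observability");
var serviceName = section["ServiceName"] ?? "unknown-service";
var serviceVersion = section["ServiceVersion"] ?? "0.0.0";
var environment = section["Environment"] ?? "development";
var otlpEndpoint = section["OtlpEndpoint"] ?? "http://localhost:4317";

// Logs: Serilog with console output, OTLP export over gRPC, and static enrichment.
builder.Host.UseSerilog((context, logger) => logger
    .Enrich.FromLogContext()
    .Enrich.WithProperty("ServiceName", serviceName)
    .Enrich.WithProperty("ServiceVersion", serviceVersion)
    .Enrich.WithProperty("Environment", environment)
    .WriteTo.Console()
    .WriteTo.OpenTelemetry(options =>
    {
        options.Endpoint = otlpEndpoint;
        options.Protocol = OtlpProtocol.Grpc;
        options.ResourceAttributes = new Dictionary<string, object>
        {
            ["service.name"] = serviceName,
        };
    }));

// Traces and metrics: the instrumentation listed above, exported via OTLP (gRPC by default).
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(serviceName, serviceVersion: serviceVersion)
        .AddAttributes(new Dictionary<string, object> { ["deployment.environment"] = environment }))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation(o => o.RecordException = true)
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation()
        .AddOtlpExporter(o => o.Endpoint = new Uri(otlpEndpoint)))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(o => o.Endpoint = new Uri(otlpEndpoint)));

var app = builder.Build();
app.Run();
```

This sketch ignores EnableTracing and EnableMetrics. A real setup would presumably skip the corresponding WithTracing or WithMetrics call when a flag is false.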
The options are read from the Observability configuration section, for example in appsettings.json:

```json
{
  "Observability": {
    "ServiceName": "my-backend",
    "ServiceVersion": "1.2.0",
    "ServiceNamespace": "my-company",
    "Environment": "production",
    "OtlpEndpoint": "http://otel-collector.monitoring:4317",
    "EnableTracing": true,
    "EnableMetrics": true
  }
}
```
| Property | Description | Default |
| --- | --- | --- |
| ServiceName | Service identifier in all telemetry signals | "unknown-service" |
| ServiceVersion | Service version | "0.0.0" |
| ServiceNamespace | Logical grouping of services | "my-company" |
| Environment | Deployment environment (production, staging, development) | "development" |
| OtlpEndpoint | OpenTelemetry Collector gRPC endpoint | "http://localhost:4317" |
| EnableTracing | Enable trace export via OTLP | true |
| EnableMetrics | Enable metrics export via OTLP | true |
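Below is a sketch of an options class that mirrors the table. The type name `ObservabilityOptions` comes from the enrichment table in this guide. The class body, the `SectionName` constant, and the `HasValidEndpoint` guard are assumptions, not Granit's source.

```csharp
using System;

// Illustrative options type: property names and defaults are taken from the table above;
// the class shape and the endpoint check are assumptions.
public sealed class ObservabilityOptions
{
    public const string SectionName = "Observability";

    public string ServiceName { get; set; } = "unknown-service";
    public string ServiceVersion { get; set; } = "0.0.0";
    public string ServiceNamespace { get; set; } = "my-company";
    public string Environment { get; set; } = "development";
    public string OtlpEndpoint { get; set; } = "http://localhost:4317";
    public bool EnableTracing { get; set; } = true;
    public bool EnableMetrics { get; set; } = true;

    // Hypothetical guard: the OTLP gRPC endpoint should be an absolute http(s) URI,
    // e.g. "http://otel-collector.monitoring:4317", not a bare "host:port".
    public bool HasValidEndpoint() =>
        Uri.TryCreate(OtlpEndpoint, UriKind.Absolute, out var uri)
        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}
```

In an ASP.NET Core host, a class like this would typically be bound with `builder.Services.Configure<ObservabilityOptions>(builder.Configuration.GetSection(ObservabilityOptions.SectionName))`.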

Every log entry is automatically enriched with the following properties:

| Property | Source | Description |
| --- | --- | --- |
| ServiceName | ObservabilityOptions | Service identifier |
| ServiceVersion | ObservabilityOptions | Deployed version |
| Environment | ObservabilityOptions | Deployment environment |
| TenantId | ICurrentTenant | Active tenant (if multi-tenancy is configured) |
| UserId | ICurrentUserService | Authenticated user |
| TraceId | Activity.Current | OpenTelemetry trace identifier |
| SpanId | Activity.Current | OpenTelemetry span identifier |
| MachineName | System | Host name (the pod name when running in Kubernetes) |
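The trace-related properties come from the ambient `Activity`. This is a minimal, BCL-only sketch of how TraceId, SpanId and MachineName can be captured, using a hypothetical `LogCorrelation` helper. In a Serilog pipeline, this logic would typically live in an `ILogEventEnricher`. Granit's own enricher is not shown in this guide.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: shows where TraceId, SpanId and MachineName values can come from.
public static class LogCorrelation
{
    public static IReadOnlyDictionary<string, string> Capture()
    {
        var properties = new Dictionary<string, string>
        {
            // Inside a Kubernetes pod the host name, and so MachineName, defaults to the pod name.
            ["MachineName"] = Environment.MachineName,
        };

        // Activity.Current is the span active on this async flow; it is null outside a trace.
        if (Activity.Current is { } activity)
        {
            properties["TraceId"] = activity.TraceId.ToHexString(); // 32 hex characters (W3C)
            properties["SpanId"] = activity.SpanId.ToHexString();   // 16 hex characters
        }

        return properties;
    }
}
```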

Useful LogQL queries for production troubleshooting in Grafana (Explore, Loki data source):

```logql
# Errors for a service (set the Grafana time range to "Last 24 hours")
{service_name="my-backend"} | json | Level = "Error"

# Slow requests (> 500 ms)
{service_name="my-backend"} | json | RequestDuration > 500

# Activity for a specific tenant
{service_name="my-backend"} | json | TenantId = "tenant-123"

# Audit trail: all log entries for a user (ISO 27001 access log)
{service_name="my-backend"} | json | UserId = "john.doe"

# Correlate logs with a specific trace
{service_name="my-backend"} | json | TraceId = "abc123def456"
```

Granit requires [LoggerMessage] source-generated logging throughout. Never use string interpolation in log calls:

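```csharp
using System;
using Microsoft.Extensions.Logging;

public static partial class LogMessages
{
    // {OrderId} and {TenantId} bind to the orderId and tenantId parameters by name.
    [LoggerMessage(Level = LogLevel.Information, Message = "Processing order {OrderId} for tenant {TenantId}")]
    public static partial void ProcessingOrder(this ILogger logger, Guid orderId, Guid tenantId);
}
```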

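The source generator has several effects. It avoids boxing value-type arguments and allocating a params array on every call, and it skips message formatting entirely when the log level is disabled. It also checks at compile time that the message template matches the method parameters. Placeholders such as {OrderId} remain structured properties that can be queried in Loki.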
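The same pattern applies on a failure path. This sketch uses a hypothetical `OrderFailed` message and a hypothetical `OrderProcessor` service. When a method parameter is of type `Exception`, the generator attaches it as the entry's exception rather than as a template value. The call site passes raw values, never a pre-formatted string.

```csharp
using System;
using Microsoft.Extensions.Logging;

public static partial class OrderLog
{
    // Hypothetical message: {OrderId} becomes a queryable property; the Exception
    // parameter is recorded as the log entry's exception, not as a template value.
    [LoggerMessage(Level = LogLevel.Error, Message = "Order {OrderId} failed")]
    public static partial void OrderFailed(this ILogger logger, Guid orderId, Exception exception);
}

// Hypothetical service showing the call site.
public sealed class OrderProcessor(ILogger<OrderProcessor> logger)
{
    public bool TryProcess(Guid orderId, Action<Guid> process)
    {
        try
        {
            process(orderId);
            return true;
        }
        catch (Exception ex)
        {
            // Structured call. Avoid logger.LogError($"Order {orderId} failed"):
            // it formats eagerly and the OrderId property is lost.
            logger.OrderFailed(orderId, ex);
            return false;
        }
    }
}
```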

AddGranitObservability() instruments the following automatically:

  • ASP.NET Core: incoming HTTP requests (with exception recording)
  • HttpClient: outgoing HTTP requests
  • EF Core: SQL queries
  • Wolverine: message handlers (via TraceContextBehavior, which propagates trace context across asynchronous message boundaries)
  • Redis: cache operations (when StackExchange.Redis instrumentation is added)

Health check endpoints (/health/*) are excluded from tracing to reduce noise.
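The exclusion amounts to a path predicate that matches whole segments. Such a check drops `/health` and `/health/ready` but still traces a route like `/healthcare`. The `TraceFilter` class below is a hypothetical sketch, not Granit's code. With the OpenTelemetry ASP.NET Core instrumentation, it would plug into the `Filter` option, for example `AddAspNetCoreInstrumentation(o => o.Filter = ctx => TraceFilter.ShouldTrace(ctx.Request.Path.Value))`. ASP.NET Core's `PathString.StartsWithSegments("/health")` performs the same segment-aware check.

```csharp
using System;

// Hypothetical predicate mirroring the "/health/*" exclusion: returns false for requests
// that should not produce a trace.
public static class TraceFilter
{
    private const string HealthPrefix = "/health";

    public static bool ShouldTrace(string? path)
    {
        if (string.IsNullOrEmpty(path)) return true;
        if (!path.StartsWith(HealthPrefix, StringComparison.OrdinalIgnoreCase)) return true;

        // "/health" itself and anything below it ("/health/live") are excluded;
        // a longer first segment such as "/healthcare" is still traced.
        return path.Length > HealthPrefix.Length && path[HealthPrefix.Length] != '/';
    }
}
```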

Granit modules register their own ActivitySource instances via GranitActivitySourceRegistry. The OpenTelemetry tracer configuration picks these up automatically, so Granit modules need no manual AddSource() calls.
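Application code outside Granit modules can create spans in the same way. This sketch uses a hypothetical `MyCompany.Orders` source. It assumes that such a source is not covered by GranitActivitySourceRegistry, so the tracer would need an explicit OpenTelemetry `AddSource` call.

```csharp
using System;
using System.Diagnostics;

// Hypothetical application-level source (not a Granit module), named after its component.
public static class OrderTelemetry
{
    public const string SourceName = "MyCompany.Orders";

    // One static ActivitySource per component, created once and reused.
    public static readonly ActivitySource Source = new(SourceName, "1.0.0");

    public static Activity? StartProcessing(Guid orderId)
    {
        // StartActivity returns null when nothing (e.g. the OpenTelemetry tracer) listens
        // to this source, hence the null-conditional SetTag.
        var activity = Source.StartActivity("ProcessOrder", ActivityKind.Internal);
        activity?.SetTag("order.id", orderId.ToString());
        return activity;
    }
}
```

Under that assumption, registration would be `tracing.AddSource(OrderTelemetry.SourceName)` inside the `WithTracing(...)` configuration.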

In Grafana, configure a derived field on the Loki data source that turns the TraceId value into a link to the Tempo data source. Clicking a TraceId in a log entry then opens the corresponding trace in Tempo, giving end-to-end visibility of a request across services.

Key metrics (Prometheus-style names as stored in Mimir):

| Metric | Type | Description |
| --- | --- | --- |
| http_server_request_duration_seconds | Histogram | HTTP request duration |
| http_server_active_requests | UpDownCounter | In-flight HTTP requests |
| db_client_operation_duration_seconds | Histogram | Database operation duration |
| dotnet_gc_collections_total | Counter | .NET GC collection count |
| dotnet_process_memory_bytes | Gauge | Process memory usage |

Build these dashboards for comprehensive production visibility:

  1. Service overview: request rate, error rate (4xx/5xx), p50/p95/p99 latency
  2. Database: SQL query duration, connection pool utilization, slow queries
  3. Cache: hit ratio, Redis latency, evictions
  4. Wolverine: messages processed/s, error rate, queue depth, dead letter count
  5. Infrastructure: CPU, memory, GC pressure, thread count
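The p50/p95/p99 latencies in the service overview dashboard are estimated from the buckets of `http_server_request_duration_seconds`. A typical PromQL query is `histogram_quantile(0.99, sum by (le) (rate(http_server_request_duration_seconds_bucket[5m])))`. The C# below is a simplified sketch of the linear interpolation that Prometheus-compatible backends such as Mimir apply. It omits details such as native histograms and the handling of non-monotonic buckets.

```csharp
using System;
using System.Collections.Generic;

// Simplified histogram_quantile. Buckets are (upper bound "le", cumulative count),
// sorted by le, with +Inf last, as Prometheus-style histograms are exposed.
public static class HistogramQuantile
{
    public static double Estimate(double q, IReadOnlyList<(double Le, double Count)> buckets)
    {
        if (q <= 0 || q > 1) throw new ArgumentOutOfRangeException(nameof(q));
        if (buckets.Count == 0 || !double.IsPositiveInfinity(buckets[buckets.Count - 1].Le))
            throw new ArgumentException("The last bucket must have le = +Inf.", nameof(buckets));

        double total = buckets[buckets.Count - 1].Count;
        if (total == 0) return double.NaN; // no observations in the window

        double rank = q * total;
        for (int i = 0; i < buckets.Count; i++)
        {
            if (buckets[i].Count < rank) continue;

            // Rank falls in the +Inf bucket: the best estimate is the highest finite bound.
            if (double.IsPositiveInfinity(buckets[i].Le))
                return i > 0 ? buckets[i - 1].Le : double.NaN;

            // Interpolate linearly between the previous bound (or 0) and this bucket's bound.
            double lowerBound = i > 0 ? buckets[i - 1].Le : 0.0;
            double countBelow = i > 0 ? buckets[i - 1].Count : 0.0;
            double countInBucket = buckets[i].Count - countBelow; // > 0 because countBelow < rank
            return lowerBound + (buckets[i].Le - lowerBound) * (rank - countBelow) / countInBucket;
        }

        return double.NaN; // not reached for well-formed cumulative buckets
    }
}
```

The estimate can only be as precise as the bucket layout. A p99 that falls in a wide bucket, such as 1 to 2 seconds, is an interpolation, not a measured value.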
Recommended alert rules:

| Alert | Condition | Severity |
| --- | --- | --- |
| High HTTP error rate | rate(http_5xx) / rate(http_total) > 0.05 over 5 min | Critical |
| High P99 latency | p99(http_duration) > 2s over 5 min | Warning |
| Vault lease renewal failure | Log message "Vault lease renewal failed" | Critical |
| DB connection pool saturated | db_pool_active / db_pool_max > 0.9 | Warning |
| Wolverine dead letter queue | wolverine_dead_letter_count > 0 | Warning |
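Both rates in the error-rate condition share the same 5-minute window. The ratio therefore reduces to the increase in 5xx responses divided by the increase in all responses over that window. This minimal sketch uses hypothetical counter snapshots. It handles a counter reset, which happens when a restarted pod starts again from zero and which PromQL's `rate()` also compensates for. It omits the extrapolation that `rate()` performs.

```csharp
// Hypothetical snapshot of two cumulative counters scraped at the same instant.
public readonly record struct RequestCounters(double Errors5xx, double Total);

public static class ErrorRateAlert
{
    public const double Threshold = 0.05; // 5 %, as in the alert table

    // Counter increase between two scrapes. A drop means the process restarted,
    // so the current value is the increase since the reset.
    private static double Increase(double previous, double current) =>
        current >= previous ? current - previous : current;

    public static double ErrorRatio(RequestCounters windowStart, RequestCounters windowEnd)
    {
        double errors = Increase(windowStart.Errors5xx, windowEnd.Errors5xx);
        double total = Increase(windowStart.Total, windowEnd.Total);
        return total > 0 ? errors / total : 0.0; // no traffic means no error rate
    }

    public static bool IsFiring(RequestCounters windowStart, RequestCounters windowEnd) =>
        ErrorRatio(windowStart, windowEnd) > Threshold;
}
```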
Alerts are routed by severity:

| Severity | Channel | Response |
| --- | --- | --- |
| Critical | PagerDuty / OpsGenie | On-call SRE paged immediately |
| Warning | Slack #ops-alerts | Investigated within business hours |
| Info | Weekly email digest | Reviewed in operations meeting |
Retention per signal:

| Signal | Hot retention | Cold retention | ISO 27001 requirement |
| --- | --- | --- | --- |
| Logs | 90 days | 3 years | 3 years minimum (audit trail) |
| Traces | 30 days | | Not required |
| Metrics | 1 year | | Not required |

A minimal Collector configuration for routing Granit signals:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  otlp/tempo:
    endpoint: http://tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [loki]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```