Prometheus Alerts Runbooks

Manager Rules

ReconcileErrors

MeaningThe OpenTelemetry Operator cannot succeed in the reconciliation step, probably because of a misconfigured OpenTelemetryCollector.
ImpactNo impact on already running deployments or new correct ones.
DiagnosisCheck manager logs for reasons why this might happen.
MitigationFind out which OpenTelemetryCollector is causing the errors and fix the config.

WorkqueueDepth

MeaningThe working queue for the operator is larger than 0.
ImpactNo impact if the queue depth reverts to 0 quickly. More investigation is needed if the problem persists.
DiagnosisCheck manager logs for reasons why this might happen.
MitigationThis could be caused by many errors. Act based on what the logs are showing.

最終更新 September 3, 2024: Add operator runbooks (#5131) (09f39d1a)