Internal telemetry

You can monitor the health of any OpenTelemetry Collector instance by checking its own internal telemetry. Read on to learn about this telemetry and how to configure it to help you troubleshoot Collector issues.

Activate internal telemetry in the Collector

By default, the Collector exposes its own telemetry in two ways:

  • Internal metrics are exposed through a Prometheus-compatible interface, which defaults to port 8888.
  • Logs are emitted to stderr by default.

Configure internal metrics

You can configure how internal metrics are generated and exposed by the Collector. By default, the Collector generates basic metrics about itself and exposes them for scraping at http://127.0.0.1:8888/metrics. When needed, you can expose the endpoint on one specific network interface or on all of them. For containerized environments, you might want to expose this port on a public interface.

Set the address in the config service::telemetry::metrics:

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
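You can verify the endpoint with curl or any HTTP client; the response uses the Prometheus text exposition format. As an illustration only (not part of the Collector), here is a minimal Python sketch that parses such a scrape result into name/value pairs. The sample payload below is hypothetical:

```python
# Minimal parser for the Prometheus text exposition format served by the
# Collector's metrics endpoint. SAMPLE_SCRAPE is a hypothetical payload,
# shown only for illustration.
SAMPLE_SCRAPE = """\
# HELP otelcol_process_uptime Uptime of the process
# TYPE otelcol_process_uptime counter
otelcol_process_uptime 3600.5
# HELP otelcol_exporter_sent_spans Number of spans successfully sent to destination.
# TYPE otelcol_exporter_sent_spans counter
otelcol_exporter_sent_spans{exporter="otlp"} 12345
"""

def parse_metrics(text: str) -> dict:
    """Map each sample line (metric name plus optional labels) to its value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip comments and blank lines
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

metrics = parse_metrics(SAMPLE_SCRAPE)
print(metrics["otelcol_process_uptime"])  # 3600.5
```

In practice you would point a real Prometheus scraper, not a hand-rolled parser, at this endpoint; the sketch only shows the shape of the data the Collector exposes.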

You can adjust the verbosity of the Collector metrics output by setting the level field to one of the following values:

  • none: no telemetry is collected.
  • basic: essential service telemetry.
  • normal: the default level, adds standard indicators on top of basic.
  • detailed: the most verbose level, includes dimensions and views.

Each verbosity level represents a threshold at which certain metrics are emitted. For the complete list of metrics, with a breakdown by level, see Lists of internal metrics.

The default level for metrics output is normal. To use another level, set service::telemetry::metrics::level:

service:
  telemetry:
    metrics:
      level: detailed

The Collector can also be configured to scrape its own metrics and send them through configured pipelines. For example:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otelcol'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: '.*grpc_io.*'
              action: drop
exporters:
  debug:
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [debug]
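In the pipeline above, the `metric_relabel_configs` rule drops every metric whose name matches `.*grpc_io.*`. Prometheus anchors relabel regexes at both ends, so the pattern behaves as a full-string match. A quick sketch of that match logic, using hypothetical metric names:

```python
import re

# Prometheus anchors relabel regexes at both ends, so '.*grpc_io.*'
# acts as a full-string match against the metric name.
drop_pattern = re.compile(r".*grpc_io.*")

def is_dropped(metric_name: str) -> bool:
    """True if the relabel rule above would drop this metric."""
    return drop_pattern.fullmatch(metric_name) is not None

# Hypothetical metric names, for illustration only:
print(is_dropped("grpc_io_client_completed_rpcs"))  # True
print(is_dropped("otelcol_process_uptime"))         # False
```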

Configure internal logs

You can find log output in stderr. The verbosity level for logs defaults to INFO, but you can adjust it in the config service::telemetry::logs:

service:
  telemetry:
    logs:
      level: 'debug'

You can also see logs for the Collector on a Linux systemd system using journalctl:

journalctl | grep otelcol
journalctl | grep otelcol | grep Error

Types of internal observability

The OpenTelemetry Collector aims to be a model of an observable service by clearly exposing its own operational metrics. It also collects host resource metrics that can help you determine whether a problem is caused by a different process on the same host. Specific Collector components can emit their own custom telemetry as well. In this section, you will learn about the different types of observability emitted by the Collector itself.

Values observable with internal metrics

The Collector emits internal metrics for the following current values:

  • Resource consumption, including CPU, memory, and I/O.
  • Data reception rate, broken down by receiver.
  • Data export rate, broken down by exporter.
  • Data drop rate due to throttling, broken down by data type.
  • Data drop rate due to invalid data received, broken down by data type.
  • Throttling state, including Not Throttled, Throttled by Downstream, and Internally Saturated.
  • Incoming connection count, broken down by receiver.
  • Incoming connection rate showing new connections per second, broken down by receiver.
  • In-memory queue size in bytes and in units.
  • Persistent queue size.
  • End-to-end latency from receiver input to exporter output.
  • Latency broken down by pipeline elements, including exporter network roundtrip latency for request/response protocols.

Rate values are averages over 10-second periods, measured in bytes/sec or units/sec (for example, spans/sec).
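As a concrete illustration of how such a rate is derived from two cumulative counter readings taken 10 seconds apart (the numbers below are made up):

```python
def rate(previous: float, current: float, interval_s: float = 10.0) -> float:
    """Average rate over the sampling window, e.g. spans/sec."""
    return (current - previous) / interval_s

# Hypothetical cumulative span counts sampled 10 seconds apart:
print(rate(12000, 12500))  # 50.0 spans/sec
```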

The Collector also emits internal metrics for these cumulative values:

  • Total received data, broken down by receiver.
  • Total exported data, broken down by exporter.
  • Total dropped data due to throttling, broken down by data type.
  • Total dropped data due to invalid data received, broken down by data type.
  • Total incoming connection count, broken down by receiver.
  • Uptime since start.
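Cumulative counters like these make derived health indicators easy to compute. For example, a rough drop ratio from the received and dropped totals (all numbers hypothetical):

```python
def drop_ratio(received_total: int, dropped_total: int) -> float:
    """Fraction of received data that was dropped, from cumulative totals."""
    if received_total == 0:
        return 0.0
    return dropped_total / received_total

# Hypothetical totals: 1,000,000 spans received, 2,500 dropped.
print(drop_ratio(1_000_000, 2_500))  # 0.0025
```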

Lists of internal metrics

The following tables group each internal metric by level of verbosity: basic, normal, and detailed. Each metric is identified by name and description and categorized by instrumentation type.

basic-level metrics

| Metric name | Description | Type |
|---|---|---|
| otelcol_exporter_enqueue_failed_log_records | Number of log records that exporter(s) failed to enqueue. | Counter |
| otelcol_exporter_enqueue_failed_metric_points | Number of metric points that exporter(s) failed to enqueue. | Counter |
| otelcol_exporter_enqueue_failed_spans | Number of spans that exporter(s) failed to enqueue. | Counter |
| otelcol_exporter_queue_capacity | Fixed capacity of the retry queue, in batches. | Gauge |
| otelcol_exporter_queue_size | Current size of the retry queue, in batches. | Gauge |
| otelcol_exporter_send_failed_log_records | Number of logs that exporter(s) failed to send to destination. | Counter |
| otelcol_exporter_send_failed_metric_points | Number of metric points that exporter(s) failed to send to destination. | Counter |
| otelcol_exporter_send_failed_spans | Number of spans that exporter(s) failed to send to destination. | Counter |
| otelcol_exporter_sent_log_records | Number of logs successfully sent to destination. | Counter |
| otelcol_exporter_sent_metric_points | Number of metric points successfully sent to destination. | Counter |
| otelcol_exporter_sent_spans | Number of spans successfully sent to destination. | Counter |
| otelcol_process_cpu_seconds | Total CPU user and system time in seconds. | Counter |
| otelcol_process_memory_rss | Total physical memory (resident set size). | Gauge |
| otelcol_process_runtime_heap_alloc_bytes | Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc'). | Gauge |
| otelcol_process_runtime_total_alloc_bytes | Cumulative bytes allocated for heap objects (see 'go doc runtime.MemStats.TotalAlloc'). | Counter |
| otelcol_process_runtime_total_sys_memory_bytes | Total bytes of memory obtained from the OS (see 'go doc runtime.MemStats.Sys'). | Gauge |
| otelcol_process_uptime | Uptime of the process. | Counter |
| otelcol_processor_accepted_log_records | Number of logs successfully pushed into the next component in the pipeline. | Counter |
| otelcol_processor_accepted_metric_points | Number of metric points successfully pushed into the next component in the pipeline. | Counter |
| otelcol_processor_accepted_spans | Number of spans successfully pushed into the next component in the pipeline. | Counter |
| otelcol_processor_batch_batch_send_size_bytes | Number of bytes in the batch that was sent. | Histogram |
| otelcol_processor_dropped_log_records | Number of logs dropped by the processor. | Counter |
| otelcol_processor_dropped_metric_points | Number of metric points dropped by the processor. | Counter |
| otelcol_processor_dropped_spans | Number of spans dropped by the processor. | Counter |
| otelcol_receiver_accepted_log_records | Number of logs successfully ingested and pushed into the pipeline. | Counter |
| otelcol_receiver_accepted_metric_points | Number of metric points successfully ingested and pushed into the pipeline. | Counter |
| otelcol_receiver_accepted_spans | Number of spans successfully ingested and pushed into the pipeline. | Counter |
| otelcol_receiver_refused_log_records | Number of logs that could not be pushed into the pipeline. | Counter |
| otelcol_receiver_refused_metric_points | Number of metric points that could not be pushed into the pipeline. | Counter |
| otelcol_receiver_refused_spans | Number of spans that could not be pushed into the pipeline. | Counter |
| otelcol_scraper_errored_metric_points | Number of metric points the Collector failed to scrape. | Counter |
| otelcol_scraper_scraped_metric_points | Number of metric points scraped by the Collector. | Counter |

Additional normal-level metrics

| Metric name | Description | Type |
|---|---|---|
| otelcol_processor_batch_batch_send_size | Number of units in the batch. | Histogram |
| otelcol_processor_batch_batch_size_trigger_send | Number of times the batch was sent due to a size trigger. | Counter |
| otelcol_processor_batch_metadata_cardinality | Number of distinct metadata value combinations being processed. | Counter |
| otelcol_processor_batch_timeout_trigger_send | Number of times the batch was sent due to a timeout trigger. | Counter |

Additional detailed-level metrics

| Metric name | Description | Type |
|---|---|---|
| http_client_active_requests | Number of active HTTP client requests. | Counter |
| http_client_connection_duration | Measures the duration of successfully established outbound HTTP connections. | Histogram |
| http_client_open_connections | Number of outbound HTTP connections that are active or idle on the client. | Counter |
| http_client_request_body_size | Measures the size of HTTP client request bodies. | Histogram |
| http_client_request_duration | Measures the duration of HTTP client requests. | Histogram |
| http_client_response_body_size | Measures the size of HTTP client response bodies. | Histogram |
| http_server_active_requests | Number of active HTTP server requests. | Counter |
| http_server_request_body_size | Measures the size of HTTP server request bodies. | Histogram |
| http_server_request_duration | Measures the duration of HTTP server requests. | Histogram |
| http_server_response_body_size | Measures the size of HTTP server response bodies. | Histogram |
| rpc_client_duration | Measures the duration of outbound RPC. | Histogram |
| rpc_client_request_size | Measures the size of RPC request messages (uncompressed). | Histogram |
| rpc_client_requests_per_rpc | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| rpc_client_response_size | Measures the size of RPC response messages (uncompressed). | Histogram |
| rpc_client_responses_per_rpc | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| rpc_server_duration | Measures the duration of inbound RPC. | Histogram |
| rpc_server_request_size | Measures the size of RPC request messages (uncompressed). | Histogram |
| rpc_server_requests_per_rpc | Measures the number of messages received per RPC. Should be 1 for all non-streaming RPCs. | Histogram |
| rpc_server_response_size | Measures the size of RPC response messages (uncompressed). | Histogram |
| rpc_server_responses_per_rpc | Measures the number of messages sent per RPC. Should be 1 for all non-streaming RPCs. | Histogram |

Events observable with internal logs

The Collector logs the following internal events:

  • A Collector instance starts or stops.
  • Data dropping begins due to throttling for a specified reason, such as local saturation, downstream saturation, downstream unavailable, etc.
  • Data dropping due to throttling stops.
  • Data dropping begins due to invalid data. A sample of the invalid data is included.
  • Data dropping due to invalid data stops.
  • A crash is detected, differentiated from a clean stop. Crash data is included if available.