Internal telemetry


You can inspect the health of any OpenTelemetry Collector instance by checking its own internal telemetry. Read on to learn about this telemetry and how to configure it to help you monitor and troubleshoot the Collector.

Warning

The Collector uses the OpenTelemetry SDK declarative configuration schema for configuring how to export its internal telemetry. This schema is still under development and may undergo breaking changes in future releases. We intend to keep supporting older schemas until a 1.0 schema release is available, and offer a transition period for users to update their configurations before dropping pre-1.0 schemas. For details and to track progress see issue #10808.

Activate internal telemetry in the Collector

By default, the Collector exposes its own telemetry in two ways:

Internal metrics are exposed using a Prometheus interface which defaults to port 8888.
Logs are emitted to stderr by default.

Configure resource attributes

The Collector automatically attaches the service.name, service.version, and service.instance.id (randomly generated) resource attributes to its internal telemetry signals. You can disable any of them by setting the attribute value to null (for example, service.name: null).

If you’d like to add additional resource attributes to the Collector’s internal telemetry signals (traces, metrics, and logs) you can set them under service::telemetry::resource:

service:
  telemetry:
    resource:
      attribute_key: 'attribute_value'
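
As a minimal sketch that combines both behaviors, the following fragment adds one custom attribute and disables the generated service.instance.id. The deployment.environment key and its value are illustrative assumptions, not defaults:

```yaml
service:
  telemetry:
    resource:
      # Hypothetical custom attribute added to all internal telemetry signals.
      deployment.environment: 'staging'
      # Setting a default attribute to null removes it.
      service.instance.id: null
```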

Configure internal metrics

OTLP exporter for internal metrics

You can configure how internal metrics are generated and exposed by the Collector. By default, the Collector generates basic metrics about itself and exposes them using the OpenTelemetry Go Prometheus exporter for scraping at http://127.0.0.1:8888/metrics.

The Collector can push its internal metrics to an OTLP backend via the following configuration:

service:
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://backend:4318

For all available options, see OTLP exporter options.

Prometheus endpoint for internal metrics

Alternatively, you can expose the Prometheus endpoint to one specific or all network interfaces when needed. For containerized environments, you might want to expose this port on a public interface.

Set the Prometheus config under service::telemetry::metrics:

service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: '0.0.0.0'
                port: 8888
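
Once the endpoint listens on a reachable interface, a Prometheus server can scrape it. The following scrape job is a sketch for the Prometheus server, not for the Collector. The job name, the scrape interval, and the otel-collector host name are assumptions to adapt to your environment:

```yaml
# prometheus.yml (Prometheus server configuration, not Collector configuration)
scrape_configs:
  - job_name: 'otel-collector-internal' # hypothetical job name
    scrape_interval: 30s # assumed interval
    metrics_path: /metrics
    static_configs:
      # Hypothetical host name; the port matches the reader configured above.
      - targets: ['otel-collector:8888']
```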

If you want to add additional labels to the Prometheus metrics, you can add them with prometheus::with_resource_constant_labels:

prometheus:
  host: '0.0.0.0'
  port: 8888
  with_resource_constant_labels:
    included:
      - label_key

And then reference the labels in service::telemetry::resource:

resource:
  label_key: label_value
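
Put together, the two fragments might look like the following complete service::telemetry section. This is a sketch that merges the snippets above; label_key and label_value are placeholders:

```yaml
service:
  telemetry:
    resource:
      # Resource attribute that supplies the label value.
      label_key: label_value
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: '0.0.0.0'
                port: 8888
                with_resource_constant_labels:
                  included:
                    # Must match the resource attribute key above.
                    - label_key
```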

Service address

Internal telemetry configuration changes

As of Collector v0.123.0, the service::telemetry::metrics::address setting is ignored. In earlier versions, it could be configured with:

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
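
If you are migrating away from the address setting, a configuration along the following lines should expose the same endpoint through readers. This is a sketch assembled from the reader options documented on this page. The without_type_suffix and without_units flags are included on the assumption that you want to keep the metric names produced by the default exporter:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                # Replaces the former `address: 0.0.0.0:8888` setting.
                host: '0.0.0.0'
                port: 8888
                # Assumed: keep names such as otelcol_process_uptime unchanged.
                without_type_suffix: true
                without_units: true
```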

Metric verbosity

You can adjust the verbosity of the Collector metrics output by setting the level field to one of the following values:

none: no telemetry is collected.
basic: essential service telemetry.
normal: the default level, adds standard indicators on top of basic.
detailed: the most verbose level, includes dimensions and views.

Each verbosity level represents a threshold at which certain metrics are emitted. For the complete list of metrics, with a breakdown by level, see Lists of internal metrics.

The default level for metrics output is normal. To use another level, set service::telemetry::metrics::level:

service:
  telemetry:
    metrics:
      level: detailed

Metric views

You can further configure how metrics from the Collector are emitted by using views. For example, the following configuration updates the metric named otelcol_process_uptime to emit a new name process_uptime and description:

Note

When configuring the Prometheus exporter for internal metrics manually (using readers), otelcol_process_uptime may be exported as otelcol_process_uptime_seconds_total unless without_type_suffix and without_units are set to true. Use the instrument_name value otelcol_process_uptime (the OTLP name) in views regardless. To control Prometheus-specific suffixes, see Unit suffixes.

service:
  telemetry:
    metrics:
      views:
        - selector:
            instrument_name: otelcol_process_uptime
            instrument_type:
          stream:
            name: process_uptime
            description: The amount of time the Collector has been up

You can also use views to update the resulting aggregation, attributes, and cardinality limits. For the full list of options, see the examples in the OpenTelemetry Configuration schema repository.
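
As one possible illustration of changing the aggregation, the sketch below drops a single metric stream entirely. It assumes that the schema version used by your Collector accepts the drop aggregation in this form. Verify the exact syntax against the OpenTelemetry Configuration schema examples before relying on it:

```yaml
service:
  telemetry:
    metrics:
      views:
        - selector:
            # Metric chosen only for illustration.
            instrument_name: otelcol_processor_batch_metadata_cardinality
          stream:
            # Assumed schema form: the drop aggregation discards the stream.
            aggregation:
              drop: {}
```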

Configure internal logs

By default, log output is written to stderr. You can configure logs under service::telemetry::logs. The configuration options are:

Field name	Default value	Description
`level`	`INFO`	Sets the minimum enabled logging level. Other possible values are `DEBUG`, `WARN`, and `ERROR`.
`development`	`false`	Puts the logger in development mode.
`encoding`	`console`	Sets the logger’s encoding. The other possible value is `json`.
`disable_caller`	`false`	Stops annotating logs with the calling function’s file name and line number. By default, all logs are annotated.
`disable_stacktrace`	`false`	Disables automatic stacktrace capturing. Stacktraces are captured for logs at `WARN` level and above in development and at `ERROR` level and above in production.
`sampling::enabled`	`true`	Sets a sampling policy.
`sampling::tick`	`10s`	The interval in seconds that the logger applies to each sampling.
`sampling::initial`	`10`	The number of messages logged at the start of each `sampling::tick`.
`sampling::thereafter`	`100`	Sets the sampling policy for subsequent messages after `sampling::initial` messages are logged. When `sampling::thereafter` is set to `N`, every `Nth` message is logged and all others are dropped. If `N` is zero, the logger drops all messages after `sampling::initial` messages are logged.
`output_paths`	`["stderr"]`	A list of URLs or file paths to write logging output to.
`error_output_paths`	`["stderr"]`	A list of URLs or file paths to write logger errors to.
`initial_fields`		A collection of static key-value pairs added to all log entries to enrich logging context. By default, there is no initial field.
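
The options in the table can be combined as in the following sketch, which switches to JSON-encoded debug logs and writes them to an additional file. The file path and the environment field are hypothetical values chosen for illustration. The sampling values repeat the defaults listed above:

```yaml
service:
  telemetry:
    logs:
      level: DEBUG
      encoding: json
      disable_stacktrace: true
      sampling:
        enabled: true
        tick: 10s
        initial: 10
        thereafter: 100
      # Hypothetical file path in addition to the default stderr.
      output_paths: ['stderr', '/var/log/otelcol/collector.log']
      initial_fields:
        # Hypothetical static field added to every log entry.
        environment: 'staging'
```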

You can also see logs for the Collector on a Linux systemd system using journalctl:

journalctl | grep otelcol

journalctl | grep otelcol | grep Error

The following configuration can be used to emit internal logs from the Collector to an OTLP/HTTP backend:

service:
  telemetry:
    logs:
      processors:
        - batch:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://backend:4318

For all available options, see OTLP exporter options.

Configure internal traces

The Collector does not expose traces by default, but it can be configured to.

Caution

Internal tracing is an experimental feature, and no guarantees are made as to the stability of the emitted span names and attributes.

The following configuration can be used to emit internal traces from the Collector to an OTLP backend:

service:
  telemetry:
    traces:
      processors:
        - batch:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://backend:4318

See the example configuration for additional options; note that the tracer_provider section there corresponds to traces here. For details about the OTLP exporter options specifically, see below.
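
For reference, the three per-signal snippets on this page can live side by side in one configuration. The sketch below simply merges them and reuses the placeholder https://backend:4318 endpoint. Because it sets readers explicitly, the default Prometheus exporter that the Collector configures when no readers are specified is not part of this setup:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://backend:4318
    logs:
      processors:
        - batch:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://backend:4318
    traces:
      processors:
        - batch:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://backend:4318
```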

OTLP exporter options

The following options are available for the OTLP exporter for all three signals. Some additional options are available for metrics.

metrics::readers[*]::periodic::exporter::otlp
logs::processors[*]::batch::exporter::otlp
traces::processors[*]::batch::exporter::otlp

Field name	Default value	Description
`endpoint`	`localhost:4317` (gRPC), `localhost:4318` (http/protobuf)	Target URL to send telemetry to, for example `https://backend:4318`. For `http/protobuf`, any path in the URL is forwarded to the exporter; if no path is specified, the default signal-specific path is used (`/v1/traces`, `/v1/metrics`, or `/v1/logs`).
`protocol`	(required)	Transport protocol. Supported values: `grpc`, `http/protobuf`.
`compression`		Compression algorithm applied before sending. Supported values: `gzip`, `none`.
`timeout`	`10000`	Timeout in milliseconds for each export attempt.
`headers`		List of key-value pairs sent as request headers. Each entry requires a `name` field and a `value` field.
`headers_list`		Headers in W3C Baggage format (for example, `key1=value1,key2=value2`). When both `headers` and `headers_list` are set, `headers` takes precedence on an individual header basis.
`certificate`		Path to a PEM-encoded CA certificate file used to verify the server’s certificate.
`client_certificate`		Path to a PEM-encoded client certificate file for mTLS. Required when `client_key` is set.
`client_key`		Path to a PEM-encoded private key file for the client certificate. Required when `client_certificate` is set.
`insecure`	`false`	Only applies to the `grpc` protocol. When `true`, disables TLS for gRPC connections where the endpoint scheme is not `http` or `https`. For `http/protobuf`, TLS will be enabled unless the endpoint uses the `http` scheme, independently of this option.
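
To illustrate the TLS-related fields, the following sketch sends internal traces over gRPC with mutual TLS. The endpoint and all three file paths are hypothetical. Only the field names come from the table above:

```yaml
service:
  telemetry:
    traces:
      processors:
        - batch:
            exporter:
              otlp:
                protocol: grpc
                endpoint: https://backend:4317
                # Hypothetical CA bundle used to verify the server certificate.
                certificate: /etc/otelcol/tls/ca.pem
                # client_certificate and client_key must be set together.
                client_certificate: /etc/otelcol/tls/client.pem
                client_key: /etc/otelcol/tls/client-key.pem
```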

Note

The internal OTLP exporter is implemented in the Go SDK used by the Collector. While the Go SDK supports environment variable-based configuration, programmatic configuration by the Collector takes precedence, so use the Collector’s YAML configuration to avoid unexpected behavior.

Additional options for metrics

The following options apply only to the OTLP metric exporter (metrics::readers[*]::periodic::exporter::otlp).

Field name	Default value	Description
`temporality_preference`	`cumulative`	Aggregation temporality for metric instruments. Supported values: `cumulative` (all instruments), `delta` (delta for counters, histograms, and observable counters; cumulative for all others), `lowmemory` (delta for counters and histograms; cumulative for all others).
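
The following sketch shows how several of the options above might be combined for the metric exporter. The header name and value are placeholders, and the timeout is an arbitrary example rather than a recommendation:

```yaml
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: https://backend:4318
                compression: gzip
                # Timeout per export attempt, in milliseconds.
                timeout: 5000
                headers:
                  # Hypothetical authentication header.
                  - name: api-key
                    value: 'REPLACE_ME'
                # Metrics-only option: delta for counters and histograms.
                temporality_preference: delta
```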

Types of internal telemetry

The OpenTelemetry Collector aims to be a model of an observable service by clearly exposing its own operational metrics. Additionally, it collects host resource metrics that can help you understand if problems are caused by a different process on the same host. Specific components of the Collector can also emit their own custom telemetry. In this section, you will learn about the different types of observability emitted by the Collector itself.

Summary of values observable with internal metrics

The Collector emits internal metrics for at least the following values:

Process uptime and CPU time since start.
Process memory and heap usage.
For receivers: Items accepted and refused, per data type.
For processors: Incoming and outgoing items.
For exporters: Items the exporter sent, failed to enqueue, and failed to send, per data type.
For exporters: Queue size and capacity.
Count, duration, and size of HTTP/gRPC requests and responses.

A more detailed list is available in the following sections.

Metric names

This section explains special naming conventions applied to some internal metrics.

`otelcol_` prefix

As of Collector v0.106.1, internal metric names are handled differently based on their source:

Metrics generated from Collector components are prefixed with otelcol_.
Metrics generated from instrumentation libraries do not use the otelcol_ prefix by default, unless their metric names are explicitly prefixed.

For Collector versions prior to v0.106.1, all internal metrics emitted using the Prometheus exporter, regardless of their origin, are prefixed with otelcol_. This includes metrics from both Collector components and instrumentation libraries.

`_total` suffix

By default and unique to Prometheus, the Prometheus exporter adds a _total suffix to summation metrics to follow Prometheus naming conventions, such as otelcol_exporter_send_failed_spans_total. This behavior can be disabled by setting without_type_suffix: true in the Prometheus exporter’s configuration.

If you leave out service::telemetry::metrics::readers in the Collector configuration, the default Prometheus exporter set up by the Collector already has without_type_suffix set to true. However, if you customize the readers and add a Prometheus exporter manually, you must set that option yourself to return to the “raw” metric name. For more information, see the Collector v1.25.0/v0.119.0 release notes.

Internal metrics exported through OTLP do not have this behavior. The internal metrics on this page are listed in OTLP format, such as otelcol_exporter_send_failed_spans.

`_seconds` and other unit suffixes

The Prometheus exporter appends a unit suffix to metrics that carry a unit. For example, otelcol_process_uptime (unit: seconds) can be exported as otelcol_process_uptime_seconds_total — the _seconds unit suffix is added first, then the _total counter suffix.

The default Prometheus exporter configured by the Collector (when no readers are specified) already sets without_type_suffix and without_units to true for backwards compatibility, so otelcol_process_uptime is used as-is.

However, when you manually configure the Prometheus exporter under service::telemetry::metrics::readers, those options are not set by default. To keep the original, shorter metric names, explicitly set both options to true:

service:
  telemetry:
    metrics:
      readers:
        - pull:
            exporter:
              prometheus:
                host: '0.0.0.0'
                port: 8888
                without_type_suffix: true
                without_units: true

With this configuration, otelcol_process_uptime_seconds_total is exported as otelcol_process_uptime.

Dots (`.`) v. underscores (`_`)

http* and rpc* metrics come from instrumentation libraries. Their original names used dots (.). Prior to Collector v0.120.0, internal metrics exposed with Prometheus changed dots (.) to underscores (_) to match Prometheus naming conventions, resulting in metric names that looked like rpc_server_duration.

Versions 0.120.0 and later of the Collector use Prometheus 3.0 scrapers, so the original http* and rpc* metric names with dots are preserved. The internal metrics on this page are listed in their original form, such as rpc.server.call.duration. For more information, see the Collector v0.120.0 release notes.

Lists of internal metrics

The following tables group each internal metric by level of verbosity: basic, normal, and detailed. Each metric is identified by name and description and categorized by instrumentation type.

`basic`-level metrics

Metric name	Description	Type
`otelcol_exporter_enqueue_failed_log_records`	Number of logs that exporter(s) failed to enqueue.	Counter
`otelcol_exporter_enqueue_failed_metric_points`	Number of metric points that exporter(s) failed to enqueue.	Counter
`otelcol_exporter_enqueue_failed_spans`	Number of spans that exporter(s) failed to enqueue.	Counter
`otelcol_exporter_in_flight_requests`	Number of export requests currently in flight, including retry backoff.	UpDownCounter
`otelcol_exporter_queue_capacity`	Fixed capacity of the sending queue, in batches.	Gauge
`otelcol_exporter_queue_size`	Current size of the sending queue, in batches.	Gauge
`otelcol_exporter_send_failed_log_records`	Number of logs that exporter(s) failed to send to destination.	Counter
`otelcol_exporter_send_failed_metric_points`	Number of metric points that exporter(s) failed to send to destination.	Counter
`otelcol_exporter_send_failed_spans`	Number of spans that exporter(s) failed to send to destination.	Counter
`otelcol_exporter_sent_log_records`	Number of logs successfully sent to destination.	Counter
`otelcol_exporter_sent_metric_points`	Number of metric points successfully sent to destination.	Counter
`otelcol_exporter_sent_spans`	Number of spans successfully sent to destination.	Counter
`otelcol_process_cpu_seconds`	Total CPU user and system time in seconds.	Counter
`otelcol_process_memory_rss`	Total physical memory (resident set size) in bytes.	Gauge
`otelcol_process_runtime_heap_alloc_bytes`	Bytes of allocated heap objects (see ‘go doc runtime.MemStats.HeapAlloc’).	Gauge
`otelcol_process_runtime_total_alloc_bytes`	Cumulative bytes allocated for heap objects (see ‘go doc runtime.MemStats.TotalAlloc’).	Counter
`otelcol_process_runtime_total_sys_memory_bytes`	Total bytes of memory obtained from the OS (see ‘go doc runtime.MemStats.Sys’).	Gauge
`otelcol_process_uptime`	Uptime of the process in seconds.	Counter
`otelcol_processor_incoming_items`	Number of items passed to the processor.	Counter
`otelcol_processor_outgoing_items`	Number of items emitted from the processor.	Counter
`otelcol_receiver_accepted_log_records`	Number of logs successfully ingested and pushed into the pipeline.	Counter
`otelcol_receiver_accepted_metric_points`	Number of metric points successfully ingested and pushed into the pipeline.	Counter
`otelcol_receiver_accepted_spans`	Number of spans successfully ingested and pushed into the pipeline.	Counter
`otelcol_receiver_refused_log_records`	Number of logs that could not be pushed into the pipeline.	Counter
`otelcol_receiver_refused_metric_points`	Number of metric points that could not be pushed into the pipeline.	Counter
`otelcol_receiver_refused_spans`	Number of spans that could not be pushed into the pipeline.	Counter
`otelcol_scraper_errored_metric_points`	Number of metric points the Collector failed to scrape.	Counter
`otelcol_scraper_scraped_metric_points`	Number of metric points scraped by the Collector.	Counter

Additional `normal`-level metrics

Metric name	Description	Type
`otelcol_processor_batch_batch_send_size`	Number of units in the batch that was sent.	Histogram
`otelcol_processor_batch_batch_size_trigger_send`	Number of times the batch was sent due to a size trigger.	Counter
`otelcol_processor_batch_metadata_cardinality`	Number of distinct metadata value combinations being processed.	Counter
`otelcol_processor_batch_timeout_trigger_send`	Number of times the batch was sent due to a timeout trigger.	Counter

Batch processor metrics level changes

In Collector v0.99.0, all batch processor metrics were upgraded from basic to normal (current level), except for otelcol_processor_batch_batch_send_size_bytes, which has been detailed since its introduction. Note however that these metrics were inadvertently reverted to basic from v0.109.0 to v0.121.0.

Additional `detailed`-level metrics

Metric name	Description	Type
`http.client.request.body.size`	Measures the size of HTTP client request bodies.	Counter
`http.client.request.duration`	Measures the duration of HTTP client requests.	Histogram
`http.server.request.body.size`	Measures the size of HTTP server request bodies.	Counter
`http.server.request.duration`	Measures the duration of HTTP server requests.	Histogram
`http.server.response.body.size`	Measures the size of HTTP server response bodies.	Counter
`otelcol_processor_batch_batch_send_size_bytes`	Number of bytes in the batch that was sent.	Histogram
`rpc.client.call.duration`	Measures the duration of outbound remote procedure calls (RPC).	Histogram
`rpc.server.call.duration`	Measures the duration of inbound remote procedure calls (RPC).	Histogram

Ownership of emitted metrics

Some metrics are not owned by the Collector SIG and some are limited to certain components.

http* and rpc* metrics

These metrics are not under the Collector SIG’s control, and as such, are not covered by the maturity levels below.

rpc metrics

The Collector’s internal RPC metrics come from the upstream otelgrpc instrumentation, which tracks the OpenTelemetry RPC semantic conventions. The set of RPC metrics emitted by the Collector has changed across releases:

Collector version	Emitted RPC metrics
v0.146.x and earlier	`rpc.client.duration`, `rpc.server.duration`, `rpc.{client,server}.request.size`, `rpc.{client,server}.response.size`, `rpc.{client,server}.requests_per_rpc`, `rpc.{client,server}.responses_per_rpc`
v0.147.0	`rpc.client.call.duration`, `rpc.server.call.duration`, `rpc.{client,server}.request.size`, `rpc.{client,server}.response.size` (the `*_per_rpc` metrics are deprecated and no longer emitted)
v0.148.0 and later	`rpc.client.call.duration`, `rpc.server.call.duration` only

RPC size metrics are not emitted by Collector v0.148.0 or later. The RPC semantic conventions v1.40.0 deprecated them due to ambiguous definitions and inconsistent implementation.

otelcol_processor_batch_* metrics

These metrics are unique to the batchprocessor.

helper package metrics

The otelcol_receiver_, otelcol_scraper_, otelcol_processor_, and otelcol_exporter_ metrics come from their respective helper packages. As such, some components not using those packages might not emit them.

Events observable with internal logs

The Collector logs the following internal events:

A Collector instance starts or stops.
Data dropping begins due to throttling for a specified reason, such as local saturation, downstream saturation, downstream unavailable, etc.
Data dropping due to throttling stops.
Data dropping begins due to invalid data. A sample of the invalid data is included.
Data dropping due to invalid data stops.
A crash is detected, differentiated from a clean stop. Crash data is included if available.

Telemetry maturity levels

The Collector telemetry levels apply to all first-party telemetry produced by the Collector. Third-party libraries, including those of OpenTelemetry Go, are not covered by these maturity levels.

Traces

Tracing instrumentation is still under active development, and changes might be made to span names, attached attributes, instrumented endpoints, or other aspects of the telemetry. Until this feature graduates to stable, there are no guarantees of backwards compatibility for tracing instrumentation.

Metrics

The Collector’s first-party metrics follow this lifecycle:

stateDiagram-v2
    state StabilityLevels {
    InDevelopment --> Alpha
    Alpha --> Beta
    Beta --> Stable
    }

    InDevelopment: In Development

    StabilityLevels --> Deprecated
    Deprecated --> Removed

The stability levels follow Semantic Conventions guidance, derived from OTEP-0232. Collector metrics skip the release_candidate level.

Note that the deprecated and deleted stages are lifecycle states, not stability levels.

Third-party metrics, including those generated by OpenTelemetry Go instrumentation libraries, are not covered by these maturity levels.

Development

Development metrics are still under active development and may change in any release.

Alpha

Alpha metrics have no stability guarantees. These metrics can be modified or deleted at any time.

Beta

Beta metrics may still change between releases, but component owners should try to minimize breaking changes. This stage encourages broader usage and is the final step before stable.

Stable

Stable metrics are guaranteed to not change. This means:

A stable metric without a deprecated signature will not be deleted or renamed.
A stable metric’s type and attributes will not be modified.

Deprecated

Deprecated metrics are slated for deletion but are still available for use. The description of each of these metrics includes an annotation about the version in which it became deprecated. For example:

Before deprecation:

# HELP otelcol_exporter_queue_size this counts things
# TYPE otelcol_exporter_queue_size counter
otelcol_exporter_queue_size 0

After deprecation:

# HELP otelcol_exporter_queue_size (Deprecated since 1.15.0) this counts things
# TYPE otelcol_exporter_queue_size counter
otelcol_exporter_queue_size 0

Deleted

Deleted metrics are no longer published and cannot be used.

Logs

Individual log entries and their formatting might change from one release to the next. There are no stability guarantees at this time.

Use internal telemetry to monitor the Collector

This section recommends best practices for monitoring the Collector using its own telemetry.

Monitoring

Queue length

Most exporters provide a queue and/or retry mechanism that is recommended for use in any production deployment of the Collector.

The otelcol_exporter_queue_capacity metric indicates the capacity, in batches, of the sending queue. The otelcol_exporter_queue_size metric indicates the current size of the sending queue. Use these two metrics to check if the queue capacity can support your workload.

Using the following three metrics, you can identify the number of spans, metric points, and log records that failed to reach the sending queue:

otelcol_exporter_enqueue_failed_spans
otelcol_exporter_enqueue_failed_metric_points
otelcol_exporter_enqueue_failed_log_records

These failures could be caused by a queue filled with unsettled elements. You might need to decrease your sending rate or horizontally scale Collectors.

The queue or retry mechanism also supports logging for monitoring. Check the logs for messages such as Dropping data because sending_queue is full.
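
One way to act on these metrics is with Prometheus alerting rules. The rule file below is a sketch, not an official recommendation. It assumes that the default internal Prometheus exporter is in use (so metric names carry no _total suffix), and the 80% threshold, durations, alert names, and severity labels are all arbitrary assumptions:

```yaml
# Prometheus rule file (loaded by the Prometheus server, not by the Collector)
groups:
  - name: otel-collector-queue
    rules:
      - alert: OtelcolExporterQueueNearlyFull # hypothetical alert name
        # Ratio of current queue size to fixed queue capacity.
        expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Exporter sending queue is above 80% of its capacity
      - alert: OtelcolExporterEnqueueFailures # hypothetical alert name
        # Any spans rejected by the sending queue in the last 5 minutes.
        expr: increase(otelcol_exporter_enqueue_failed_spans[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: Spans failed to reach the exporter sending queue
```

Similar rules can be written for otelcol_exporter_enqueue_failed_metric_points and otelcol_exporter_enqueue_failed_log_records.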

Receive failures

Sustained rates of otelcol_receiver_refused_log_records, otelcol_receiver_refused_spans, and otelcol_receiver_refused_metric_points indicate that too many errors were returned to clients. Depending on the deployment and the clients’ resilience, this might indicate data loss on the client side.

Sustained rates of otelcol_exporter_send_failed_log_records, otelcol_exporter_send_failed_spans, and otelcol_exporter_send_failed_metric_points indicate that the Collector is not able to export data as expected. These metrics do not inherently imply data loss since there could be retries. But a high rate of failures could indicate issues with the network or backend receiving the data.
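
A sustained rate can be detected with Prometheus alerting rules such as the following sketch. It again assumes the default internal Prometheus exporter naming (no _total suffix). The 10-minute duration, alert names, and severities are illustrative assumptions, and only the span metrics are shown:

```yaml
# Prometheus rule file (loaded by the Prometheus server, not by the Collector)
groups:
  - name: otel-collector-failures
    rules:
      - alert: OtelcolReceiverRefusedSpans # hypothetical alert name
        # Fires only when refusals persist for the whole `for` window.
        expr: rate(otelcol_receiver_refused_spans[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Receivers are refusing spans, clients may be losing data
      - alert: OtelcolExporterSendFailedSpans # hypothetical alert name
        # Send failures may still be retried, so treat this as a warning.
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Exporters are failing to send spans to the destination
```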

Data flow

You can monitor data ingress with the otelcol_receiver_accepted_log_records, otelcol_receiver_accepted_spans, and otelcol_receiver_accepted_metric_points metrics and data egress with the otelcol_exporter_sent_log_records, otelcol_exporter_sent_spans, and otelcol_exporter_sent_metric_points metrics.
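
To compare ingress and egress over time, you could precompute rates with Prometheus recording rules. The following is a sketch under the same naming assumption as the default internal Prometheus exporter. The rule names are hypothetical, and only spans are shown:

```yaml
# Prometheus rule file (loaded by the Prometheus server, not by the Collector)
groups:
  - name: otel-collector-data-flow
    rules:
      # Spans per second accepted by all receivers.
      - record: otelcol:receiver_accepted_spans:rate5m
        expr: sum(rate(otelcol_receiver_accepted_spans[5m]))
      # Spans per second successfully sent by all exporters.
      - record: otelcol:exporter_sent_spans:rate5m
        expr: sum(rate(otelcol_exporter_sent_spans[5m]))
```

A persistent gap between the two series can be a starting point for investigating where data is dropped, filtered, or sampled inside the pipeline.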

Last modified June 30, 2026: Kbauer/internal telemetry config (#10075) (21ddc821)