On this page, you can learn how to troubleshoot the health and performance of the OpenTelemetry Collector.
The Collector provides a variety of metrics, logs, and extensions for debugging issues.
You can configure and use the Collector’s own internal telemetry to monitor its performance.
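As a minimal sketch (exact field support varies by Collector version), you can raise the internal log level and the detail of internal metrics in the service::telemetry section of the configuration:
service:
  telemetry:
    logs:
      level: debug      # default is info
    metrics:
      level: detailed   # emit more granular internal metrics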
For certain types of issues, such as configuration verification and network debugging, you can send a small amount of test data to a Collector configured to output to local logs. Using a local exporter, you can inspect the data being processed by the Collector.
For live troubleshooting, consider using the debug exporter, which can confirm that the Collector is receiving, processing, and exporting data. For example:
receivers:
  zipkin:
exporters:
  debug:
service:
  pipelines:
    traces:
      receivers: [zipkin]
      processors: []
      exporters: [debug]
To begin testing, generate a Zipkin payload. For example, you can create a file called trace.json that contains:
[
  {
    "traceId": "5982fe77008310cc80f1da5e10147519",
    "parentId": "90394f6bcffb5d13",
    "id": "67fae42571535f60",
    "kind": "SERVER",
    "name": "/m/n/2.6.1",
    "timestamp": 1516781775726000,
    "duration": 26000,
    "localEndpoint": {
      "serviceName": "api"
    },
    "remoteEndpoint": {
      "serviceName": "apip"
    },
    "tags": {
      "data.http_response_code": "201"
    }
  }
]
With the Collector running, send this payload to the Collector:
curl -X POST localhost:9411/api/v2/spans -H 'Content-Type: application/json' -d @trace.json
You should see a log entry like the following:
2023-09-07T09:57:43.468-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}
You can also configure the debug exporter so the entire payload is printed:
exporters:
  debug:
    verbosity: detailed
If you re-run the previous test with the modified configuration, the log output looks like this:
2023-09-07T09:57:12.820-0700 info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 2}
2023-09-07T09:57:12.821-0700 info ResourceSpans #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.4.0
Resource attributes:
-> service.name: Str(telemetrygen)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope telemetrygen
Span #0
Trace ID : 0c636f29e29816ea76e6a5b8cd6601cf
Parent ID : 1a08eba9395c5243
ID : 10cebe4b63d47cae
Name : okey-dokey
Kind : Internal
Start time : 2023-09-07 16:57:12.045933 +0000 UTC
End time : 2023-09-07 16:57:12.046058 +0000 UTC
Status code : Unset
Status message :
Attributes:
-> span.kind: Str(server)
-> net.peer.ip: Str(1.2.3.4)
-> peer.service: Str(telemetrygen)
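For reference, the detailed output above shows spans produced by the telemetrygen utility rather than the Zipkin payload from earlier. Assuming an otlp receiver is also enabled in the pipeline, you can generate similar test traffic with:
telemetrygen traces --otlp-insecure --traces 1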
Use the components subcommand to list the components available in a Collector distribution, including their stability levels. Note that the output format might change across versions.
otelcol components
Sample output:
buildinfo:
  command: otelcol
  description: OpenTelemetry Collector
  version: 0.96.0
receivers:
  - name: opencensus
    stability:
      logs: Undefined
      metrics: Beta
      traces: Beta
  - name: prometheus
    stability:
      logs: Undefined
      metrics: Beta
      traces: Undefined
  - name: zipkin
    stability:
      logs: Undefined
      metrics: Undefined
      traces: Beta
  - name: otlp
    stability:
      logs: Beta
      metrics: Stable
      traces: Stable
processors:
  - name: resource
    stability:
      logs: Beta
      metrics: Beta
      traces: Beta
  - name: span
    stability:
      logs: Undefined
      metrics: Undefined
      traces: Alpha
  - name: probabilistic_sampler
    stability:
      logs: Alpha
      metrics: Undefined
      traces: Beta
exporters:
  - name: otlp
    stability:
      logs: Beta
      metrics: Stable
      traces: Stable
  - name: otlphttp
    stability:
      logs: Beta
      metrics: Stable
      traces: Stable
  - name: debug
    stability:
      logs: Development
      metrics: Development
      traces: Development
  - name: prometheus
    stability:
      logs: Undefined
      metrics: Beta
      traces: Undefined
connectors:
  - name: forward
    stability:
      logs-to-logs: Beta
      logs-to-metrics: Undefined
      logs-to-traces: Undefined
      metrics-to-logs: Undefined
      metrics-to-metrics: Beta
      traces-to-traces: Beta
extensions:
  - name: zpages
    stability:
      extension: Beta
  - name: health_check
    stability:
      extension: Beta
  - name: pprof
    stability:
      extension: Beta
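Because the output is YAML, it can be post-processed with standard tools. For example, assuming yq (v4) is installed, the following lists only the receiver names:
otelcol components | yq '.receivers[].name'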
The following extensions can be enabled for debugging the Collector.
The pprof extension, which is available locally on port 1777, allows you to profile the Collector as it runs. This is an advanced use case that should not be needed in most circumstances.
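As a sketch, enabling the extension only requires declaring it and adding it to the service section:
extensions:
  pprof:                  # defaults to localhost:1777
service:
  extensions: [pprof]
A CPU profile can then be captured with the standard Go tooling against the default endpoint, for example go tool pprof http://localhost:1777/debug/pprof/profile.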
The zPages extension, which is exposed locally on port 55679, can be used to inspect live data from the Collector’s receivers and exporters. The TraceZ page, exposed at /debug/tracez, is useful for debugging trace operations such as latency issues, deadlocks, instrumentation problems, and errors.
Note that zpages might contain error logs that the Collector does not emit itself.
For containerized environments, you might want to expose this port on a public interface instead of just locally. The endpoint can be configured using the extensions configuration section:
extensions:
  zpages:
    endpoint: 0.0.0.0:55679
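With the extension enabled, you can quickly verify it from the command line; for example, the TraceZ page can be fetched with:
curl http://localhost:55679/debug/tracez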
It can be difficult to isolate problems when telemetry flows through multiple Collectors and networks. For each “hop” of telemetry through a Collector or other component in your pipeline, it’s important to verify the following:
- How is the telemetry being received by this component?
- How is the telemetry being modified (for example, sampled or redacted) by this component?
- How is the telemetry being exported by this component?
This section covers how to resolve common Collector issues.
The Collector and its components might experience data issues.
The Collector might drop data for a variety of reasons, but the most common are:
- The Collector is improperly sized, so it cannot process and export the data as fast as it is received.
- The exporter destination is unavailable or accepting the data too slowly.
To mitigate drops, configure the batch processor. In addition, it might be necessary to configure the queued retry options on enabled exporters.
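As a sketch of both mitigations (the otlp exporter endpoint is a placeholder; sending_queue and retry_on_failure are the standard exporter queue and retry options):
processors:
  batch:
exporters:
  otlp:
    endpoint: my-backend:4317   # placeholder backend address
    sending_queue:
      enabled: true
      queue_size: 5000          # maximum number of queued batches before data is dropped
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s    # stop retrying a batch after five minutes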
The Collector might not receive data for the following reasons:
- There is a network configuration issue.
- The receiver configuration is incorrect.
- The receiver is defined in the receivers section but not enabled in any pipelines.
- The client configuration is incorrect.
Check the Collector’s logs as well as zPages for potential issues.
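For example, a receiver that is defined but never referenced in a pipeline is never started. In the following sketch, the otlp receiver only receives data because it is also listed under service::pipelines:
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  debug:
service:
  pipelines:
    traces:
      receivers: [otlp]   # without this entry, the receiver is never started
      exporters: [debug]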
Most processing issues result from a misunderstanding of how the processor works or from a misconfiguration of the processor. For example:
- The attributes processor works only for “tags” on spans. The span name is handled by the span processor.
- Processors for trace data (except the tail sampling processor) work only on individual spans.
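For instance, here is a sketch of that division of labor (the attribute key and values are illustrative): the attributes processor edits span attributes, while the span processor is what renames a span:
processors:
  attributes:
    actions:
      - key: environment    # modifies a span attribute, not the span name
        value: production
        action: insert
  span:
    name:
      from_attributes: [http.method, http.route]   # rebuilds the span name from existing attributes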
The Collector might not export data for the following reasons:
- There is a network configuration issue.
- The exporter configuration is incorrect.
- The destination is unavailable.
Check the Collector’s logs as well as zPages for potential issues.
Exporting data often does not work because of a network configuration issue, such as a firewall, DNS, or proxy issue. Note that the Collector does have proxy support.
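Proxy settings are typically picked up from the standard environment variables. For example (the proxy address is hypothetical):
HTTPS_PROXY=http://proxy.example.com:3128 NO_PROXY=localhost,127.0.0.1 otelcol --config=config.yaml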
The Collector might experience failed startups or unexpected exits or restarts.
The Collector might exit or restart due to:
- Memory pressure caused by a missing or misconfigured memory_limiter processor.
- Improper sizing for the load.
- Improper configuration, such as a queue size set higher than the available memory.
- Infrastructure resource limits, such as Kubernetes memory limits.
With v0.90.1 and earlier, the Collector might fail to start in a Windows Docker container, producing the error message The service process could not connect to the service controller. In this case, the NO_WINDOWS_SERVICE=1 environment variable must be set to force the Collector to start as if it were running in an interactive terminal, without attempting to run as a Windows service.
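For example, the variable can be passed when starting the container (the image name and tag are illustrative and may differ in your environment):
docker run -e NO_WINDOWS_SERVICE=1 otel/opentelemetry-collector:0.90.1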
The Collector might experience problems due to configuration issues.
During configuration resolution of multiple configs, values in earlier configs are removed in favor of later configs, even if the later value is null. You can fix this issue by:
- Using {} to represent an empty map, such as processors: {} instead of processors:.
- Omitting empty configurations such as processors: from the configuration.
See confmap troubleshooting for more information.
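For example, given two hypothetical files where base.yaml defines a batch processor, writing processors: {} in the later file merges cleanly, whereas a bare processors: (a null value) would remove the earlier entries:
# base.yaml
processors:
  batch:

# override.yaml -- use an empty map, not a bare key
processors: {}
Both files can then be passed to the Collector, which merges them in order:
otelcol --config=base.yaml --config=override.yaml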