Troubleshooting

Troubleshoot common OBI issues and errors

On this page, you can learn how to diagnose and resolve common OBI errors and issues.

Troubleshooting tools

OBI provides a variety of tools and configuration options to help diagnose and troubleshoot issues.

Detailed logging

You can increase the logging verbosity of OBI by setting the log_level configuration or the OTEL_EBPF_LOG_LEVEL environment variable to debug. This provides more detailed logs that may help in diagnosing issues.

To enable logging from the BPF programs, set the ebpf.bpf_debug configuration or the OTEL_EBPF_BPF_DEBUG environment variable to true. Use this only for debugging, as it can generate a significant number of logs.
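Both options can also be set together in the configuration file. A minimal sketch (keep bpf_debug off outside of debugging sessions):

```yaml
# Equivalent to OTEL_EBPF_LOG_LEVEL=debug
log_level: debug

# Equivalent to OTEL_EBPF_BPF_DEBUG=true; very verbose, debugging only
ebpf:
  bpf_debug: true
```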

Configuration logging

By default, OBI merges its configuration from three sources, listed from lowest to highest priority:

  • Built-in default configuration
  • Configuration file, provided using the --config flag or the OTEL_EBPF_CONFIG_PATH environment variable
  • Environment variables, usually starting with OTEL_EBPF_

It is often helpful to view the final merged configuration. Using the log_config configuration value (or OTEL_EBPF_LOG_CONFIG environment variable), you can instruct OBI to log the final configuration at startup.

log_config supports the following values:

  • yaml — logs the final configuration in YAML format; best for human readability since it matches the config file structure
  • json — logs the final configuration in JSON format; best for log shippers since it is a single structured line
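For example, to log the merged configuration in YAML format at startup, a minimal configuration-file sketch:

```yaml
# Equivalent to OTEL_EBPF_LOG_CONFIG=yaml; use "json" for log shippers
log_config: yaml
```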

Internal metrics

You can configure and use OBI internal metrics to monitor performance and internal state.

To turn on internal metrics, configure internal_metrics.exporter with one of the following values:

  • none (default): disables internal metrics
  • prometheus: exports internal metrics in Prometheus format via an HTTP server
  • otlp: exports internal metrics via an OTLP exporter
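For example, to expose internal metrics in Prometheus format, a minimal sketch (the Prometheus exporter may accept further settings not shown here):

```yaml
internal_metrics:
  exporter: prometheus  # or "otlp"; "none" (the default) disables internal metrics
```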

Debug traces exporter

To debug the raw trace spans generated by OBI, set the otel_traces_exporter.protocol configuration value or the OTEL_EXPORTER_OTLP_TRACES_PROTOCOL environment variable to debug. OBI then logs the raw trace spans to the console in a human-readable format, matching the OTel Collector debug exporter with verbosity: detailed. Example output looks like this:

Traces	{"resource spans": 1, "spans": 1}
ResourceSpans #0
Resource SchemaURL:
Resource attributes:
     -> service.name: Str(flagd)
     -> telemetry.sdk.language: Str(go)
     -> telemetry.sdk.name: Str(opentelemetry-ebpf-instrumentation)
     -> telemetry.sdk.version: Str(main)
     -> host.name: Str(flagd-5cccb4c4f5-sfkcm)
     -> os.type: Str(linux)
     -> service.namespace: Str(opentelemetry-demo)
     -> k8s.owner.name: Str(flagd)
     -> k8s.kind: Str(Deployment)
     -> k8s.replicaset.name: Str(flagd-5cccb4c4f5)
     -> k8s.pod.name: Str(flagd-5cccb4c4f5-sfkcm)
     -> k8s.container.name: Str(flagd)
     -> k8s.deployment.name: Str(flagd)
     -> service.version: Str(2.0.2)
     -> k8s.namespace.name: Str(default)
     -> otel.library.name: Str(go.opentelemetry.io/obi)
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope
Span #0
    Trace ID       : 63a2723a58e0033170e58b1ff27ef03d
    Parent ID      :
    ID             : fab47609b60cc4e0
    Name           : /opentelemetry.proto.collector.metrics.v1.MetricsService/Export
    Kind           : Client
    Start time     : 2025-11-28 16:10:35.4241749 +0000 UTC
    End time       : 2025-11-28 16:10:35.42555658 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> rpc.method: Str(/opentelemetry.proto.collector.metrics.v1.MetricsService/Export)
     -> rpc.system: Str(grpc)
     -> rpc.grpc.status_code: Int(0)
     -> server.address: Str(otel-collector.default)
     -> peer.service: Str(otel-collector.default)
     -> server.port: Int(4317)
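The debug exporter shown above can also be enabled through the configuration file. A sketch, equivalent to the environment variable mentioned earlier:

```yaml
# Equivalent to OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=debug
otel_traces_exporter:
  protocol: debug
```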

Performance profiler (pprof)

OBI can expose a pprof port to allow performance profiling. To enable it, set the profile_port configuration value or the OTEL_EBPF_PROFILE_PORT environment variable to the desired port.

This is an advanced use case and typically not required.
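If you do need it, a minimal configuration sketch (the port number is an example):

```yaml
# Equivalent to OTEL_EBPF_PROFILE_PORT=6060
profile_port: 6060
```

Assuming the endpoint follows the standard Go net/http/pprof layout, you can then inspect profiles with tools such as go tool pprof http://localhost:6060/debug/pprof/heap.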

Common OBI issues

This section covers how to resolve common OBI issues.

Node.js services crash or become unresponsive when OBI is running

To enable better context propagation in Node.js applications, OBI injects custom code to track the current execution context. It does so using the Node.js inspector protocol and sends the SIGUSR1 signal to the Node process to open the inspector.

However, if the application defines its own SIGUSR1 signal handler, that handler intercepts OBI’s signal, which may cause the targeted application to crash or become unresponsive. For example:

process.on('SIGUSR1', () => {
  process.exit(0);
});

The same can happen with Node.js flags that register their own SIGUSR1 handling, such as:

node --heapsnapshot-signal=SIGUSR1

Solutions:

  • Use the discovery configuration to exclude specific Node.js applications from OBI tracking, preventing OBI from sending SIGUSR1.
  • Disable Node.js context propagation entirely by setting nodejs.enabled: false in the configuration file or the environment variable OTEL_EBPF_NODEJS_ENABLED=false.
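As a configuration fragment, disabling Node.js context propagation looks like this (equivalent to OTEL_EBPF_NODEJS_ENABLED=false):

```yaml
nodejs:
  enabled: false  # disables OBI's Node.js context propagation, so OBI does not send SIGUSR1
```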

ClickHouse instances crash when OBI is running

If you’re running ClickHouse on the same node as OBI, you might see ClickHouse crashing with logs such as:

Application: Code: 246. DB::Exception: Calculated checksum of the executable (...) does not correspond to the reference checksum ...

The issue is likely caused by OBI attaching eBPF uprobes to the ClickHouse binary. A relevant GitHub issue explains this behavior:

When attaching a uprobe, the kernel will modify the target process memory to insert a trap instruction at the attachment address. This causes the ClickHouse binary checksum validation to fail during startup.

Solution:

Start ClickHouse with the skip_binary_checksum_checks flag.
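A sketch of the setting as a ClickHouse server configuration override (the file path is an example; ClickHouse also accepts the equivalent XML form, so check the ClickHouse server settings reference for your version):

```yaml
# e.g. /etc/clickhouse-server/config.d/skip-checksum.yaml
skip_binary_checksum_checks: 1
```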

Missing telemetry data for Go applications or TLS requests

If you are missing telemetry from Go applications or TLS requests (such as HTTPS communication), OBI may lack the privileges required to attach uprobes. Recent kernel security changes, backported to many older kernel versions, mean uprobes now require the CAP_SYS_ADMIN capability. OBI uses uprobes to instrument Go applications and TLS requests, along with other runtime- and language-specific instrumentations. If your OBI deployment doesn’t run as privileged (for example, privileged: true in Docker or Kubernetes) and doesn’t grant CAP_SYS_ADMIN as a security capability, you might not see some or all of your telemetry.

To troubleshoot this issue, enable detailed OBI logging with OTEL_EBPF_LOG_LEVEL=debug. If all the uprobe injections fail with errors like “setting uprobe (offset)…”, you are likely experiencing this issue.

Solutions:

  • Run OBI as privileged.
  • Add CAP_SYS_ADMIN to the list of capabilities in your deployment security configuration.
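For a Kubernetes deployment, the capability-based option might look like this (a sketch of a container security context; note that Kubernetes capability names drop the CAP_ prefix):

```yaml
securityContext:
  capabilities:
    add:
      - SYS_ADMIN  # CAP_SYS_ADMIN, required to attach uprobes on recent kernels
# Alternatively, run OBI as privileged:
# securityContext:
#   privileged: true
```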

Last modified December 9, 2025: add obi troubleshooting docs (#8559) (bc9ab59a)