Tail-Based Sampling with service.criticality


This example shows how to use the service.criticality resource attribute to drive tail-based sampling decisions in the OpenTelemetry Collector, so that traces from more important services are retained at higher rates.

The demo application assigns a service.criticality value to each service, classifying them by operational importance:

Criticality  Sampling Rate  Services
critical     100%           payment, checkout, frontend, frontend-proxy
high         50%            cart, product-catalog, currency, shipping
medium       10%            recommendation, ad, product-reviews, email
low          1%             accounting, fraud-detection, image-provider, load-generator, quote, flagd, flagd-ui, Kafka
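
To get a feel for what these rates mean for data volume, the arithmetic can be sketched as below. This is only an illustration: the traffic mix is hypothetical, not measured from the demo, and it assumes each trace is counted under a single criticality tier.

```python
# Sampling percentages from the table above, keyed by service.criticality.
SAMPLING_PERCENT = {"critical": 100, "high": 50, "medium": 10, "low": 1}


def expected_kept(trace_counts):
    """Estimate how many traces per tier survive sampling.

    trace_counts maps a criticality tier to a number of traces
    (hypothetical input, not taken from the demo).
    """
    return {
        tier: count * SAMPLING_PERCENT[tier] / 100
        for tier, count in trace_counts.items()
    }


# Hypothetical traffic mix: traces per minute in each tier.
traffic = {"critical": 2000, "high": 3000, "medium": 4000, "low": 10000}

kept = expected_kept(traffic)
total_kept = sum(kept.values())
overall_rate = total_kept / sum(traffic.values())

print(kept)
print(f"kept {total_kept:.0f} of {sum(traffic.values())} traces ({overall_rate:.1%})")
```

With this made-up mix, roughly a fifth of all traces would be retained, and half of the retained traces come from the critical tier.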

Collector Configuration

To enable tail-based sampling, add the following to your otelcol-config-extras.yml:

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      # Policy 1: Always sample critical services (100%)
      - name: critical-services-always-sample
        type: string_attribute
        string_attribute:
          key: service.criticality
          values:
            - critical
          enabled_regex_matching: false
          invert_match: false

      # Policy 2: Sample 50% of high-criticality services
      - name: high-criticality-probabilistic
        type: and
        and:
          and_sub_policy:
            - name: is-high-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - high
            - name: probabilistic-50
              type: probabilistic
              probabilistic:
                sampling_percentage: 50

      # Policy 3: Sample 10% of medium-criticality services
      - name: medium-criticality-probabilistic
        type: and
        and:
          and_sub_policy:
            - name: is-medium-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - medium
            - name: probabilistic-10
              type: probabilistic
              probabilistic:
                sampling_percentage: 10

      # Policy 4: Sample 1% of low-criticality services
      - name: low-criticality-probabilistic
        type: and
        and:
          and_sub_policy:
            - name: is-low-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - low
            - name: probabilistic-1
              type: probabilistic
              probabilistic:
                sampling_percentage: 1

      # Policy 5: Always sample error traces regardless of criticality
      - name: errors-always-sample
        type: status_code
        status_code:
          status_codes:
            - ERROR

      # Policy 6: Always sample slow traces from critical/high services
      - name: slow-critical-traces
        type: and
        and:
          and_sub_policy:
            - name: is-critical-or-high
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - critical
                  - high
            - name: is-slow
              type: latency
              latency:
                threshold_ms: 5000

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, memory_limiter, transform, tail_sampling]
      exporters: [otlp, debug, spanmetrics]
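
As a rough sanity check on the buffer settings in this configuration, the sketch below estimates how many traces are waiting for a decision at any moment. It assumes that num_traces bounds the traces held in memory and that traces really do arrive at about expected_new_traces_per_sec; the helper names are illustrative and not part of the Collector.

```python
# Values copied from the tail_sampling configuration above.
DECISION_WAIT_S = 10
NUM_TRACES = 100_000
EXPECTED_NEW_TRACES_PER_SEC = 1_000


def traces_in_flight(rate_per_sec, wait_s=DECISION_WAIT_S):
    """Traces buffered at once if each one waits wait_s before a decision."""
    return rate_per_sec * wait_s


def headroom(rate_per_sec):
    """How many times larger num_traces is than the in-flight estimate."""
    return NUM_TRACES / traces_in_flight(rate_per_sec)


# Highest arrival rate the buffer can absorb under these assumptions.
max_rate = NUM_TRACES / DECISION_WAIT_S

print(traces_in_flight(EXPECTED_NEW_TRACES_PER_SEC))
print(headroom(EXPECTED_NEW_TRACES_PER_SEC))
print(max_rate)
```

Under these assumptions the buffer holds about ten times the expected in-flight traces, which leaves room for traffic bursts before traces would be evicted ahead of their sampling decision.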

How It Works

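The tail_sampling processor buffers the spans of each trace for decision_wait (10s in this example) and then evaluates the trace against every configured policy. A trace is kept if at least one policy matches:

  • Critical services are always sampled, giving full visibility into payment flows, checkout, and user-facing services.
  • High-criticality services are sampled at 50%, balancing observability with data volume.
  • Medium- and low-criticality services are sampled at 10% and 1% respectively, reducing volume from less critical paths.
  • Traces with an ERROR status are always kept regardless of service criticality, so failures in low-criticality services are not sampled away.
  • Slow traces (>5s) from critical and high-criticality services are always sampled to help identify performance bottlenecks.

Sampling decisions apply to whole traces, not to individual services. A trace that contains a span from any critical service matches the first policy and is kept in full, so lower-criticality services are retained at more than their nominal rate whenever they take part in such a trace.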
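
The "any policy matches" rule can be sketched as a small model. This is a simplified illustration, not the Collector's implementation: the Span and Trace classes and the helper functions are hypothetical, the percentage check uses a SHA-256 bucket as a stand-in for the processor's own hashing, and an attribute policy is assumed to match when any span in the trace carries the value.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Span:
    criticality: str        # service.criticality of the emitting service
    is_error: bool = False  # True if the span status is ERROR


@dataclass
class Trace:
    trace_id: str
    spans: list = field(default_factory=list)
    duration_ms: int = 0


# Percentages for policies 2-4 (policy 1 keeps critical traces unconditionally).
RATES = {"high": 50, "medium": 10, "low": 1}
SLOW_THRESHOLD_MS = 5000


def in_percentage(trace_id, percent):
    """Deterministic stand-in for a probabilistic policy (not the real hash)."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 100
    return bucket < percent


def keep(trace):
    """Return True if at least one of the six policies matches the trace."""
    tiers = {span.criticality for span in trace.spans}
    if "critical" in tiers:                                  # policy 1
        return True
    for tier, percent in RATES.items():                      # policies 2-4
        if tier in tiers and in_percentage(trace.trace_id, percent):
            return True
    if any(span.is_error for span in trace.spans):           # policy 5
        return True
    # Policy 6: mirrors the ">5s" wording above; boundary handling is assumed.
    if tiers & {"critical", "high"} and trace.duration_ms > SLOW_THRESHOLD_MS:
        return True
    return False


checkout = Trace("a1", [Span("critical"), Span("low")], duration_ms=120)
failed_quote = Trace("b2", [Span("low", is_error=True)], duration_ms=80)
slow_cart = Trace("c3", [Span("high")], duration_ms=6000)

print(keep(checkout), keep(failed_quote), keep(slow_cart))
```

All three example traces are kept: the first because it touches a critical service, the second because it contains an error, and the third because it is a slow trace from a high-criticality service. A fast, error-free trace from a low-criticality service is kept only when it falls into the 1% bucket.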

Last modified April 13, 2026: docs(demo): add tail-based sampling example using service.criticality (#9468) (d503571b)