Tail-based sampling
You are viewing the English version of this page because it has not yet been fully translated. Interested in helping out? See Contributing.
This guide describes one possible way to achieve a form of tail-based sampling to include all failed activities (spans) in addition to head-based sampling in OpenTelemetry .NET.
What is tail-based sampling?
Tail-based sampling makes sampling decisions after the trace has been completed, allowing for more informed decisions based on the full context of the trace. This is in contrast to head-based sampling, which makes decisions at the beginning of the trace.
This implementation uses a combination of a custom sampler and an
ActivityProcessor
(span processor) to achieve a hybrid approach:
- Head-based sampling (probabilistic/unbiased sampling)
- Tail-based sampling (a non-probabilistic/biased sampling)
Implementation approach
The SDK uses a hybrid approach where we do head-based sampling to get a probabilistic subset of all activities which includes both successful activities and failure activities. In addition, it captures all failure activities.
To accomplish this:
- If the parent-based sampler’s decision is to drop an activity, the SDK returns a “Record-Only” sampling result. This ensures that the activity processor receives that activity.
- In the activity processor, at the end of an activity, the SDK checks if it is a failure activity. If so, the SDK changes the decision from “Record-Only” to set the sampled flag so that the exporter receives the activity.
In this example, each activity is filtered individually without consideration to any other activities.
When to use tail-based sampling
This is a good option if you want to get all failed activities in addition to head-based sampling. With this approach, you get basic activity-level tail-based sampling at the SDK level without having to install any additional components.
Tradeoffs
Tail-sampling this way involves several tradeoffs:
Additional performance cost: Unlike head-based sampling where the sampling decision is made at activity creation time, in tail sampling the decision is made only at the end, so there is additional memory/processing cost.
Partial traces: Since this sampling is at an activity level, the generated trace will be partial. For example, if another part of the call tree is successful, those activities may not be exported leading to an incomplete trace.
Multiple exporters: If multiple exporters are used, this decision will impact all of them.
Example code
The implementation consists of two main components:
1. A custom parent-based sampler that allows “Record-Only” decisions:
public class ParentBasedElseAlwaysRecordSampler : Sampler
{
private readonly Sampler _rootSampler;
public ParentBasedElseAlwaysRecordSampler(Sampler rootSampler)
: base()
{
_rootSampler = rootSampler ?? throw new ArgumentNullException(nameof(rootSampler));
}
public override SamplingResult ShouldSample(in SamplingParameters samplingParameters)
{
// If there's a parent, use its sampling decision
if (samplingParameters.ParentContext.TraceId != default)
{
if (samplingParameters.ParentContext.TraceFlags.HasFlag(ActivityTraceFlags.Recorded))
{
return new SamplingResult(SamplingDecision.RecordAndSample);
}
else
{
// Instead of dropping, we record this activity so that we can process it in our
// processor
return new SamplingResult(SamplingDecision.RecordOnly);
}
}
// This is a root activity. Use the root sampler to make the decision.
return _rootSampler.ShouldSample(samplingParameters);
}
public override string Description => $"ParentBasedElseAlwaysRecordSampler({_rootSampler.Description})";
}
2. A tail sampling processor that selectively samples failed activities:
public class TailSamplingProcessor : BaseProcessor<Activity>
{
private readonly string _statusTagName;
public TailSamplingProcessor(string statusTagName = "otel.status_code")
{
_statusTagName = statusTagName;
}
public override void OnEnd(Activity activity)
{
// If the activity is already sampled, we don't need to do anything
if (activity.ActivityTraceFlags.HasFlag(ActivityTraceFlags.Recorded))
{
return;
}
// Check if this is an error activity
bool isError = false;
if (activity.Status == ActivityStatusCode.Error)
{
isError = true;
}
else if (activity.TagObjects != null)
{
foreach (var tag in activity.TagObjects)
{
if (tag.Key == _statusTagName)
{
if (tag.Value?.ToString() == "ERROR")
{
isError = true;
break;
}
}
}
}
if (isError)
{
Console.WriteLine($"Including error activity with id {activity.Id} and status {activity.Status}");
activity.ActivityTraceFlags |= ActivityTraceFlags.Recorded;
}
else
{
Console.WriteLine($"Dropping activity with id {activity.Id} and status {activity.Status}");
}
}
}
Example output
When you run an application using this sampler and processor, you should see output similar to the following:
Including error activity with id
00-404ddff248b8f9a9b21e347d68d2640e-035858bc3c168885-01 and status Error
Activity.TraceId: 404ddff248b8f9a9b21e347d68d2640e
Activity.SpanId: 035858bc3c168885
Activity.TraceFlags: Recorded
Activity.ActivitySourceName: SDK.TailSampling.POC
Activity.DisplayName: SayHello
Activity.Kind: Internal
Activity.StartTime: 2023-02-09T19:05:32.5563112Z
Activity.Duration: 00:00:00.0028144
Activity.Tags:
foo: bar
StatusCode: Error
Resource associated with Activity:
service.name: unknown_service:Examples.TailBasedSamplingAtSpanLevel
Dropping activity with id 00-ea861bda268c58d328ab7cbe49851499-daba29055de80a53-00
and status Ok
Including head-sampled activity with id
00-f3c88010615e285c8f3cb3e2bcd70c7f-f9316215f12437c3-01 and status Ok
Activity.TraceId: f3c88010615e285c8f3cb3e2bcd70c7f
Activity.SpanId: f9316215f12437c3
Activity.TraceFlags: Recorded
Activity.ActivitySourceName: SDK.TailSampling.POC
Activity.DisplayName: SayHello
Activity.Kind: Internal
Activity.StartTime: 2023-02-09T19:05:32.8519346Z
Activity.Duration: 00:00:00.0000034
Activity.Tags:
foo: bar
StatusCode: Ok
Resource associated with Activity:
service.name: unknown_service:Examples.TailBasedSamplingAtSpanLevel
This shows that:
- Error activities are always included (through tail-based sampling)
- Some OK activities are dropped (if not selected by head-based sampling)
- Some OK activities are included (through head-based sampling)
Complete example
For the complete example including a working application, see the OpenTelemetry .NET repository.
Feedback
Was this page helpful?
Thank you. Your feedback is appreciated!
Please let us know how we can improve this page. Your feedback is appreciated!