Observing Lambdas using the OpenTelemetry Collector Extension Layer
Getting telemetry data out of modern applications is very straightforward (or at least it should be). You set up a collector which either receives data from your application or asks it to provide an up-to-date state of various counters. This happens every minute or so, and if it’s a second late or early, no one really bats an eye. But what if the application isn’t around for long? What if every second waiting for the data to be collected is billed? Then you’re most likely thinking of Function-as-a-Service (FaaS) environments, the most well-known being AWS Lambda.
In this execution model, functions are called directly, and the environment is frozen afterward. You’re only billed for actual execution time and no longer need a server to wait for incoming requests. This is also where the term serverless comes from. Keeping the function alive until metrics can be collected isn’t really an option, and even if you were willing to pay for that, different invocations have completely separate contexts and won’t necessarily know about the other executions happening simultaneously. You might now be saying: “I’ll just push all the data at the end of my execution, no issues here!”, but that doesn’t solve the problem. You’ll still have to pay for the time it takes to send the data, and with many invocations, this adds up.
But there is another way! Lambda extension layers allow you to run any process alongside your code, sharing the execution runtime and providing additional services. With the opentelemetry-lambda extension layer, you get a local endpoint to send data to while it keeps track of the Lambda lifecycle and ensures your telemetry gets to the storage layer.
How does it work?
When your function is called for the first time, the extension layer starts an instance of the OpenTelemetry Collector. The Collector build is a stripped-down version, providing only the components necessary in the context of Lambda. It registers with the Lambda Extensions API and Telemetry API. By doing this, it receives notifications whenever your function is executed, emits a log line, or the execution context is about to be shut down.
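To illustrate the registration step, the call against the Extensions API has roughly the following shape. This is a sketch, not the Collector extension’s actual internal request, and it only works inside a Lambda execution environment, where `AWS_LAMBDA_RUNTIME_API` is set by the runtime:

```shell
# Register as an extension and subscribe to lifecycle events.
# AWS_LAMBDA_RUNTIME_API is provided by the Lambda runtime; this call
# only succeeds inside a Lambda execution environment.
curl -s -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension/register" \
  -H "Lambda-Extension-Name: collector" \
  -d '{"events": ["INVOKE", "SHUTDOWN"]}'
```

The response includes an extension identifier that the extension then uses to poll for `INVOKE` and `SHUTDOWN` events, which is how it tracks the Lambda lifecycle described above.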
This is where the magic happens
Up until now, this just seems like extra work for nothing. You’ll still have to wait for the Collector to export the data, right? This is where the special `decouple` processor comes in. It separates the receiving and exporting components while interfacing with the Lambda lifecycle. This allows the Lambda to return, even if not all data has been sent. At the next invocation (or on shutdown), the Collector continues exporting the data while your function does its thing.
How can I use it?
As of November 2024, the opentelemetry-lambda project publishes releases of the Collector extension layer. It can be configured through a configuration file hosted either in an S3 bucket or on an arbitrary HTTP server. It is also possible to bundle the configuration file with your Lambda code. Both approaches come with tradeoffs: remote configuration files add to the cold-start duration, since an additional request needs to be made, while bundling the configuration increases the management overhead when controlling the configuration of multiple Lambdas.
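As a sketch, attaching a published layer to an existing function with the AWS CLI could look like the following. The layer ARN below is a placeholder; look up the current release ARN for your region, architecture, and layer version in the opentelemetry-lambda release notes:

```shell
# Attach the Collector extension layer to an existing function.
# The ARN is a placeholder -- substitute the published release ARN
# for your region, architecture, and layer version.
aws lambda update-function-configuration \
  --function-name my-function \
  --layers arn:aws:lambda:<region>:<account>:layer:<layer-name>:<version>
```

Note that `--layers` replaces the function’s full layer list, so include any layers the function already uses.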
The simplest way to get started is with an embedded configuration. For this, add a file called `collector.yaml` to your function. This is a regular Collector configuration file. To take advantage of the Lambda-specific extensions, they need to be configured. As an example, the configuration shown next receives traces and logs from the Telemetry API and sends them to another endpoint.
```yaml
receivers:
  telemetryapi:
exporters:
  otlphttp/external:
    endpoint: 'external-collector:4318'
processors:
  batch:
  decouple:
service:
  pipelines:
    traces:
      receivers: [telemetryapi]
      processors: [batch, decouple]
      exporters: [otlphttp/external]
    logs:
      receivers: [telemetryapi]
      processors: [batch, decouple]
      exporters: [otlphttp/external]
```
The `decouple` processor is configured by default if omitted. It is explicitly added in this example to illustrate the entire pipeline. For more information, see Autoconfiguration.
Afterward, set the `OPENTELEMETRY_COLLECTOR_CONFIG_URI` environment variable to `/var/task/collector.yaml`. Once the function is redeployed, you’ll see your function logs appear! You can see this in action in the video below.
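If you manage the function with the AWS CLI, setting the variable could look like this sketch (the function name is a placeholder):

```shell
# Point the extension at the bundled configuration file.
# --environment overwrites the function's entire variable map,
# so merge in any environment variables it already uses.
aws lambda update-function-configuration \
  --function-name my-function \
  --environment 'Variables={OPENTELEMETRY_COLLECTOR_CONFIG_URI=/var/task/collector.yaml}'
```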
Every log line your Lambda produces will be sent to the `external-collector` endpoint specified. You don’t need to modify the code at all! From there, telemetry data flows to your backend as usual. Since the transmission of telemetry data might be frozen while the Lambda is not active, logs can arrive delayed. They’ll arrive either during the next invocation or during the shutdown phase.
If you want further insight into your applications, also see the language-specific auto-instrumentation layers.