# An Introduction to Observability for LLM-based applications using OpenTelemetry

---

Large Language Models (LLMs) are really popular right now, with applications
ranging from simple chatbots to copilots that help software engineers write
code. As LLMs see growing use in production, it's important to understand and
monitor how these models behave.

In the following example, we'll use [Prometheus](https://prometheus.io/) and
[Jaeger](https://www.jaegertracing.io/) as the target backends for the metrics
and traces generated by [OpenLIT](https://github.com/openlit/openlit), an
auto-instrumentation library for LLM monitoring. We will use
[Grafana](https://grafana.com/oss/grafana/) to visualize the LLM monitoring
data. You can choose any backend of your choice to store OTel metrics and
traces.

## Why Observability Matters for LLM Applications

Monitoring LLM applications is crucial for several reasons:

1. Keeping track of how often LLMs are used is essential for usage and cost
   tracking.
2. Latency is important to track since the response time from the model can vary
   based on the inputs passed to the LLM.
3. Rate limiting is a common challenge, particularly for external LLMs, as
   applications depend more on these external API calls. When rate limits are
   hit, it can hinder these applications from performing their essential
   functions using these LLMs.

By keeping a close eye on these aspects, you can not only save costs but also
avoid hitting request limits, ensuring your LLM applications perform optimally.

## What are the signals that you should be looking at?

Using Large Language Models (LLMs) in applications differs from using
traditional machine learning (ML) models: LLMs are often accessed through
external API calls instead of being run locally or in-house. It is crucial to
capture the sequence of events (using traces), especially in a RAG-based
application where there can be events before and after LLM usage. It is also
important to analyze aggregated data (through metrics), such as request counts,
token usage, and cost, to get a quick overview for optimizing performance and
managing costs. Here are the key signals to monitor (a short instrumentation
sketch follows each list):

### Traces

- **Request Metadata**: This is important in the context of LLMs, given the
  variety of parameters (like `temperature` and `top_p`) that can drastically
  affect both the response quality and the cost. Specific aspects to monitor
  are:
  - **Temperature**: Indicates the level of creativity or randomness desired
    from the model's outputs. Varying this parameter can significantly impact
    the nature of the generated content.

  - **top_p**: Controls nucleus sampling; the model chooses from the smallest
    set of most likely words whose cumulative probability reaches `top_p`. A
    high `top_p` value means the model considers a wider range of words, making
    the text more varied.

  - **Model Name or Version**: Essential for tracking over time, as updates to
    the LLM might affect performance or response characteristics.

  - **Prompt Details**: The exact inputs sent to the LLM, which, unlike in-house
    ML models where inputs might be more controlled and homogeneous, can vary
    wildly and affect output complexity and cost implications.

- **Response Metadata**: Given the API-based interaction with LLMs, tracking the
  specifics of the response is key for cost management and quality assessment:
  - **Tokens**: Directly impacts cost and is a measure of response length and
    complexity.

  - **Cost**: Critical for budgeting, as API-based costs can scale with the
    number of requests and the complexity of each request.

  - **Response Details**: Similar to the prompt details but from the response
    perspective, providing insights into the model's output characteristics and
    potential areas of inefficiency or unexpected cost.
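
To make this concrete, here is a minimal sketch of recording this metadata by
hand with the OpenTelemetry Python API. The attribute names follow the (still
evolving) GenAI semantic conventions, and the values are illustrative; an
instrumentation library like OpenLIT emits these for you automatically:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-app")

# Wrap the LLM call in a span and record request and response metadata.
with tracer.start_as_current_span("chat gpt-4") as span:
    # Request metadata: the knobs that affect quality and cost.
    span.set_attribute("gen_ai.request.model", "gpt-4")
    span.set_attribute("gen_ai.request.temperature", 0.7)
    span.set_attribute("gen_ai.request.top_p", 0.9)

    # ... call the LLM here ...

    # Response metadata: token counts drive cost.
    span.set_attribute("gen_ai.usage.input_tokens", 52)
    span.set_attribute("gen_ai.usage.output_tokens", 218)
```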

> [!NOTE]
>
> The LLM Working Group recommends capturing these details as events rather
> than span attributes, because many backend systems struggle with those often
> large payloads.
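
Continuing the sketch above, prompt and completion content can be attached as
span events instead of attributes. The event and attribute names here are
illustrative, so check the current GenAI semantic conventions:

```python
# Inside the same span as above: record large payloads as events,
# not attributes (names are illustrative).
span.add_event(
    "gen_ai.content.prompt",
    attributes={"gen_ai.prompt": "What is OpenTelemetry?"},
)
span.add_event(
    "gen_ai.content.completion",
    attributes={"gen_ai.completion": "OpenTelemetry is an observability framework..."},
)
```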

### Metrics

- **Request Volume**: The total number of requests made to the LLM service. This
  helps in understanding demand patterns and identifying anomalies in usage,
  such as sudden spikes or drops.
- **Request Duration**: The time it takes for a request to be processed and a
  response to be received from the LLM. This includes network latency and the
  time the LLM takes to generate a response, providing insights into the
  performance and reliability of the LLM service.
- **Cost and Token Counters**: Keeping track of the total cost accrued and
  tokens consumed over time is essential for budgeting and cost optimization
  strategies. Monitoring these metrics can alert you to unexpected increases
  that may indicate inefficient use of the LLM or the need for optimization.
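
As a minimal sketch, these signals map naturally onto OpenTelemetry metric
instruments: counters for request volume, tokens, and cost, plus a histogram
for duration. The instrument names and values below are illustrative; OpenLIT
records equivalent metrics automatically:

```python
from opentelemetry import metrics

meter = metrics.get_meter("llm-app")

# Counters for request volume, token consumption, and accrued cost.
requests = meter.create_counter("llm.requests", unit="{request}")
tokens = meter.create_counter("llm.usage.tokens", unit="{token}")
cost = meter.create_counter("llm.usage.cost", unit="USD")
# Histogram for end-to-end duration, including network latency.
duration = meter.create_histogram("llm.request.duration", unit="s")

# Record one hypothetical request.
attrs = {"gen_ai.request.model": "gpt-4"}
requests.add(1, attrs)
tokens.add(270, attrs)
cost.add(0.0081, attrs)
duration.record(1.42, attrs)
```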

## An example setup

### Prerequisites

Before we begin, make sure you have the following running in your environment:

- Prometheus
- Jaeger
- Grafana

### Setting Up the OpenTelemetry Collector

First, [install the OpenTelemetry Collector](/docs/collector/install/).

### Configuring the Collector

Next, you need to tell the Collector where to send the data. Here's a simple
configuration for sending metrics to **Prometheus** and traces to **Jaeger**:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
  memory_limiter:
    # 80% of maximum memory up to 2G
    limit_mib: 1500
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s

exporters:
  prometheusremotewrite:
    # Prometheus remote write endpoint, for example
    # http://<your_prometheus_host>:9090/api/v1/write
    endpoint: 'YOUR_PROMETHEUS_REMOTE_WRITE_URL'
    add_metric_suffixes: false
  otlp:
    # Jaeger accepts OTLP directly (since v1.35), for example
    # <your_jaeger_host>:4317
    endpoint: 'YOUR_JAEGER_URL'

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
```
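
Save this configuration to a file, for example `collector-config.yaml`, and
start the Collector with it. The binary name depends on how you installed it
(for example `otelcol` or `otelcol-contrib`):

```shell
otelcol --config collector-config.yaml
```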

### Instrument your LLM Application with OpenLIT

OpenLIT is an OpenTelemetry-based library designed to streamline the monitoring
of LLM-based applications by offering auto-instrumentation for a variety of
Large Language Models and VectorDBs.

It aligns with the GenAI semantic conventions established by the OpenTelemetry
community, and it doesn't rely on vendor-specific span or event attributes or
environment variables for OTLP endpoint configuration, which makes integration
smooth and standard.

#### Install the library

To install the OpenLIT Python Library, run this command:

```shell
pip install openlit
```

Then, add these lines to your LLM application:

```python
import openlit

# Point OpenLIT at your OpenTelemetry Collector's OTLP/HTTP endpoint.
openlit.init(
  otlp_endpoint="YOUR_OTELCOL_URL:4318",
)
```

Alternatively, you can pass the OpenTelemetry Collector URL through the
`OTEL_EXPORTER_OTLP_ENDPOINT` environment variable:

```shell
export OTEL_EXPORTER_OTLP_ENDPOINT="YOUR_OTELCOL_URL:4318"
```

```python
import openlit

openlit.init()
```
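
Putting it together, here is a minimal sketch of an instrumented application,
assuming the OpenAI Python SDK (OpenLIT also auto-instruments other LLM
providers and vector databases); the model name is illustrative:

```python
import openlit
from openai import OpenAI

# Initialize OpenLIT before making LLM calls so they are auto-instrumented.
openlit.init(otlp_endpoint="YOUR_OTELCOL_URL:4318")

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is OpenTelemetry?"}],
)
print(response.choices[0].message.content)
```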

### Visualize the metrics and traces

After your OpenTelemetry Collectors start sending metrics to Prometheus and
traces to Jaeger, follow these steps to visualize them in Grafana. You can use
any tool of your choice to visualize this data:

#### Add Prometheus as a data source

1. In Grafana, navigate to **Connections** > **Data Sources**.
2. Click **Add data source** and select **Prometheus**.
3. In the settings, enter your Prometheus URL, for example,
   `http://<your_prometheus_host>`, along with any other necessary details.
4. Select **Save & Test**.

#### Add Jaeger as a data source

1. In Grafana, navigate to **Connections** > **Data Sources**.
2. Click **Add data source** and select **Jaeger**.
3. In the settings, enter your Jaeger URL, for example,
   `http://<your_jaeger_host>`, along with any other necessary details.
4. Select **Save & Test**.

#### Add the dashboard

To make things easy, you can use
[OpenLIT's dashboard](https://docs.openlit.io/latest/sdk/destinations/prometheus-jaeger#prometheus-jaeger).

This guide showed you how to use OpenTelemetry, Prometheus, Jaeger, and Grafana
to monitor your LLM Applications.

If you have any questions, reach out to me on GitHub
[@ishanjainn](https://github.com/ishanjainn) or Twitter
[@ishan_jainn](https://twitter.com/ishan_jainn).
