Handling sensitive data

Best practices and guidance for handling sensitive data in OpenTelemetry

When implementing OpenTelemetry, it’s crucial to be mindful of sensitive data handling. The collection of telemetry data always carries the risk of inadvertently capturing sensitive or personal information that may be subject to various privacy regulations and compliance requirements.

Your responsibility

OpenTelemetry collects telemetry data, but it can’t determine what data is sensitive in your specific context on its own. As the implementer, you are responsible for:

  • Ensuring compliance with applicable privacy laws and regulations.
  • Protecting sensitive information in your telemetry data.
  • Obtaining necessary consents for data collection.
  • Implementing appropriate data handling and storage practices.

Additionally, you are responsible for understanding and reviewing the telemetry data emitted by any instrumentation libraries you use, as these libraries may collect and expose sensitive information as well.

Sensitive data considerations

What data is sensitive varies from situation to situation. Examples include:

  • Personally identifiable information (PII)
  • Authentication credentials
  • Session tokens
  • Financial information
  • Health-related data
  • User behavior data

Data minimization

When collecting potentially sensitive data through telemetry, follow the principle of data minimization. This means:

  • Only collect data that serves an observability purpose.
  • Avoid collecting personal information unless absolutely necessary.
  • Consider whether aggregated or anonymized data could serve the same purpose.
  • Regularly review collected attributes to ensure they remain necessary.
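One way to apply data minimization in the Collector is to keep an explicit allow-list of attributes and drop everything else. The following is a minimal sketch using the transform processor's keep_keys function. The processor name transform/minimize and the attribute names are assumptions chosen for illustration, so replace them with the attributes you have reviewed and actually need.

```yaml
processors:
  transform/minimize:
    trace_statements:
      - context: span
        statements:
          # Keep only the listed span attributes; every other span attribute is removed.
          # The keys below are illustrative, not a recommended set.
          - keep_keys(attributes, ["http.request.method", "http.route", "http.response.status_code"])
```

Note that this statement only affects span attributes. Resource attributes and span events would need their own statements.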

Protecting sensitive data

As outlined in the previous section, the best way to avoid exposing sensitive data is not to collect data that might be sensitive in the first place. However, you might need to collect such data under certain circumstances, or you might not have full control over the data being collected, and therefore need ways to scrub it in post-processing. The following suggestions can help you with that.

The OpenTelemetry Collector provides several processors that can help manage sensitive data, including the attributes, transform, and redaction processors. The following examples show how to use them.

Deleting and hashing user information

The following configuration for the attributes processor hashes the user.email attribute and deletes the user.full_name attribute:

processors:
  attributes/example:
    actions:
      - key: user.email
        action: hash
      - key: user.full_name
        action: delete
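A processor only takes effect once it is listed in a pipeline under the service section. The following sketch shows how the attributes processor could be wired into a traces pipeline. The otlp receiver and debug exporter are placeholders assumed for illustration, so substitute the components you actually run.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  attributes/example:
    actions:
      - key: user.email
        action: hash
      - key: user.full_name
        action: delete

exporters:
  debug:

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Processors run in the order listed here.
      processors: [attributes/example]
      exporters: [debug]
```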

Replacing user.id with user.hash

The following configuration for the transform processor can be used to remove the user.id and replace it with a user.hash:

transform:
  trace_statements:
    - context: span
      statements:
        - set(attributes["user.hash"], SHA256(attributes["user.id"])) where attributes["user.id"] != nil
        - delete_key(attributes, "user.id")
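Sensitive attributes can appear in any signal, not only in traces. As a sketch, assuming your log records carry the same user.id attribute, the equivalent replacement for logs could look as follows. The processor name transform/user_hash is hypothetical. The where clause skips records that have no user.id, so the hash function is never called on a missing value.

```yaml
processors:
  transform/user_hash:
    log_statements:
      - context: log
        statements:
          # Only hash when the attribute is present.
          - set(attributes["user.hash"], SHA256(attributes["user.id"])) where attributes["user.id"] != nil
          - delete_key(attributes, "user.id")
```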

Truncating IP addresses

As an alternative to hashing, you can truncate data or group it by a common prefix or suffix. For example, this applies to:

  • Dates, where you keep only the year, or the year and the month, but drop the day.
  • Email addresses, where you drop the local part and keep only the domain.
  • IP addresses, where you drop the last octet of IPv4 or the last 80 bits of IPv6.

The following configuration for the transform processor drops the last octet of a client.address attribute:

transform:
  trace_statements:
    - context: span
      statements:
        - replace_pattern(attributes["client.address"], "\\.\\d+$", ".0")
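The same approach can be sketched for email addresses and dates. In the following example, the processor name transform/truncate and the attributes user.email and user.signup_date are assumptions made for illustration, and the date is assumed to be an ISO 8601 string such as 2024-05-17.

```yaml
processors:
  transform/truncate:
    trace_statements:
      - context: span
        statements:
          # Drop the local part of the email address and keep only the domain.
          - replace_pattern(attributes["user.email"], "^[^@]+@", "")
          # Keep only the year and month (the first 7 characters) of the date.
          - set(attributes["user.signup_month"], Substring(attributes["user.signup_date"], 0, 7)) where IsMatch(attributes["user.signup_date"], "^\\d{4}-\\d{2}-\\d{2}")
          # Remove the full date once the truncated value has been stored.
          - delete_key(attributes, "user.signup_date")
```

The where clause restricts the Substring call to values that look like a full date, so shorter or missing values are left out of the truncation step and are simply deleted.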

Deleting attributes with the redaction processor

Finally, the section “Scrub sensitive data” of the security best practices page for Collector configurations includes an example of using the redaction processor to delete certain attributes.
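For orientation, a redaction processor configuration might look like the following sketch. The allowed keys and the blocked value pattern are assumptions chosen for illustration. Treat the referenced page and the processor's own documentation as the authoritative source for the available options.

```yaml
processors:
  redaction:
    # Delete every span attribute whose key is not listed in allowed_keys.
    allow_all_keys: false
    allowed_keys:
      - http.request.method
      - http.route
      - http.response.status_code
    # Mask attribute values matching these patterns (here: numbers that look like Visa cards).
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?"
    # Add diagnostic attributes describing what was redacted; helpful while tuning the lists.
    summary: debug
```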