# Logs Data Model

**Status**: [Stable](../document-status.md)


This is a data model and a set of semantic conventions for representing logs
from various sources: application log files, machine generated events, system
logs, etc. Existing log formats can be unambiguously mapped to this data model.
Reverse mapping from this data model is also possible to the extent that the
target log format has equivalent capabilities.

The purpose of the data model is to have a common understanding of what a log
record is, what data needs to be recorded, transferred, stored and interpreted
by a logging system.

This proposal defines a data model for
[Standalone Logs](../glossary.md#standalone-log).

## Design Notes

### Requirements

The Data Model was designed to satisfy the following requirements:

- It should be possible to unambiguously map existing log formats to this Data
  Model. Translating log data from an arbitrary log format to this Data Model
  and back should ideally result in identical data.

- Mappings of other log formats to this Data Model should be semantically
  meaningful. The Data Model must preserve the semantics of particular elements
  of existing log formats.

- Translating log data from an arbitrary log format A to this Data Model and
  then translating from the Data Model to another log format B ideally must
  result in a meaningful translation of log data that is no worse than a
  reasonable direct translation from log format A to log format B.

- It should be possible to efficiently represent the Data Model in concrete
  implementations that require the data to be stored or transmitted. We
  primarily care about 2 aspects of efficiency: CPU usage for
  serialization/deserialization and space requirements in serialized form. This
  is an indirect requirement that is affected by the specific representation of
  the Data Model rather than the Data Model itself, but is still useful to keep
  in mind.

The Data Model aims to successfully represent 3 sorts of logs and events:

- System Formats. These are logs and events generated by the operating system
  and over which we have no control - we cannot change the format or affect what
  information is included (unless the data is generated by an application which
  we can modify). An example of a system format is Syslog.

- Third-party Applications. These are generated by third-party applications. We
  may have some control over what information is included, e.g. by customizing
  the format. An example is the Apache log file.

- First-party Applications. These are applications that we develop and we have
  some control over how the logs and events are generated and what information
  we include in the logs. We can likely modify the source code of the
  application if needed.

### Events

Events are OpenTelemetry's standardized format for LogRecords. All semantic
conventions defined for logs SHOULD be formatted as Events. Requirements and details for the Event format can be found in the [semantic conventions](/docs/specs/semconv/general/events.md).

Events are intended to be used by OpenTelemetry instrumentation. It is not a
requirement that all LogRecords are formatted as Events.

### Field Kinds

This Data Model defines a logical model for a log record (irrespective of the
physical format and encoding of the record). Each record contains 2 kinds of
fields:

- Named top-level fields of specific type and meaning.

- Fields stored as [Attribute Collections](../common/README.md#attribute-collections),
  whose values are [AnyValue](../common/README.md#anyvalue). The keys and values
  for well-known fields follow semantic conventions for key names and possible
  values that allow all parties that work with the field to have the same
  interpretation of the data. See references to semantic conventions for
  `Resource` and `Attributes` fields and examples in
  [Appendix A](./data-model-appendix.md#appendix-a-example-mappings).

The reasons for having these 2 kinds of fields are:

- Ability to efficiently represent named top-level fields, which are almost
  always present (e.g. when using encodings like Protocol Buffers where fields
  are enumerated but not named on the wire).

- Ability to enforce types of named fields, which is very useful for compiled
  languages with type checks.

- Flexibility to represent less frequent data in Attribute Collections. This
  includes well-known data that has standardized semantics as well as arbitrary
  custom data that the application may want to include in the logs.

When designing this data model we used the following reasoning to decide when
to use a top-level named field:

- The field needs to be either mandatory for all records, or frequently
  present in well-known log and event formats (such as `Timestamp`), or
  expected to be often present in log records in upcoming logging systems (such
  as `TraceId`).

- The field’s semantics must be the same for all known log and event formats and
  must map directly and unambiguously to this data model.

Both of the above conditions were required to give the field a place in the
top-level structure of the record.

## Log and Event Record Definition

[Appendix A](./data-model-appendix.md#appendix-a-example-mappings) contains many examples that show how
existing log formats map to the fields defined below. If there are questions
about the meaning of a field, reviewing the examples may be helpful.

Here is the list of fields in a log record:

| Field Name | Description |
| ---------- | ----------- |
| Timestamp | Time when the event occurred. |
| ObservedTimestamp | Time when the event was observed. |
| TraceId | Request trace id. |
| SpanId | Request span id. |
| TraceFlags | W3C trace flag. |
| SeverityText | The severity text (also known as log level). |
| SeverityNumber | Numerical value of the severity. |
| Body | The body of the log record. |
| Resource | Describes the source of the log. |
| InstrumentationScope | Describes the scope that emitted the log. |
| Attributes | Additional information about the event. |
| EventName | Name that identifies the class / type of event. |

Below is the detailed description of each field.

### Field: `Timestamp`

Type: Timestamp, uint64 nanoseconds since Unix epoch.

Description: Time when the event occurred measured by the origin clock, i.e. the
time at the source. This field is optional; it may be missing if the source
timestamp is unknown.

### Field: `ObservedTimestamp`

Type: Timestamp, uint64 nanoseconds since Unix epoch.

Description: Time when the event was observed by the collection system. For
events that originate in OpenTelemetry (e.g. using OpenTelemetry Logging SDK)
this timestamp is typically set at the generation time and is equal to
Timestamp. For events originating externally and collected by OpenTelemetry
(e.g. using Collector) this is the time when OpenTelemetry's code observed the
event measured by the clock of the OpenTelemetry code. This field SHOULD be set
once the event is observed by OpenTelemetry.

For converting OpenTelemetry log data to formats that support only one timestamp
or when receiving OpenTelemetry log data by recipients that support only one
timestamp internally the following logic is recommended:

- Use `Timestamp` if it is present, otherwise use `ObservedTimestamp`.
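
The recommended fallback can be sketched in a few lines of Python. The function name `effective_timestamp` is hypothetical, and the sketch assumes that a missing field is represented as `None`; a concrete encoding may use a different sentinel.

```python
from typing import Optional


def effective_timestamp(
    timestamp: Optional[int], observed_timestamp: Optional[int]
) -> Optional[int]:
    """Pick the single timestamp to hand to a one-timestamp recipient.

    Both arguments are nanoseconds since the Unix epoch, or None when the
    field is absent (an assumption of this sketch).
    """
    # Prefer the time at the source when it is known.
    if timestamp is not None:
        return timestamp
    # Otherwise fall back to the time the record was observed.
    return observed_timestamp
```

For a record collected from an external source with no source timestamp, `effective_timestamp(None, observed)` returns the observed time unchanged.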

### Trace Context Fields

#### Field: `TraceId`

Type: byte sequence.

Description: Request trace id as defined in
[W3C Trace Context](https://www.w3.org/TR/trace-context/#trace-id). Can be set
for logs that are part of request processing and have an assigned trace id. This
field is optional.

#### Field: `SpanId`

Type: byte sequence.

Description: Span id. Can be set for logs that are part of a particular
processing span. If `SpanId` is present, `TraceId` SHOULD also be present. This field
is optional.

#### Field: `TraceFlags`

Type: byte.

Description: Trace flag as defined in
[W3C Trace Context](https://www.w3.org/TR/trace-context/#trace-flags)
specification. At the time of writing the specification defines one flag - the
SAMPLED flag. This field is optional.
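
As an illustration of the trace context fields, the following Python sketch tests the SAMPLED flag and checks identifiers against the sizes given in W3C Trace Context (16 bytes for a trace id, 8 bytes for a span id, all-zero values invalid). The function names are hypothetical and are not part of any OpenTelemetry API.

```python
SAMPLED_FLAG = 0x01  # least significant bit of the W3C trace-flags byte


def is_sampled(trace_flags: int) -> bool:
    """Return True if the SAMPLED bit is set in the TraceFlags byte."""
    return (trace_flags & SAMPLED_FLAG) != 0


def is_valid_trace_id(trace_id: bytes) -> bool:
    """A W3C trace id is 16 bytes and must not be all zeros."""
    return len(trace_id) == 16 and any(trace_id)


def is_valid_span_id(span_id: bytes) -> bool:
    """A W3C span id (parent-id) is 8 bytes and must not be all zeros."""
    return len(span_id) == 8 and any(span_id)
```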

### Severity Fields

#### Field: `SeverityText`

Type: string.

Description: severity text (also known as log level). This is the original
string representation of the severity as it is known at the source. If this
field is missing and `SeverityNumber` is present then the short name that
corresponds to the `SeverityNumber` may be used as a substitution. This field is
optional.

#### Field: `SeverityNumber`

Type: number.

Description: numerical value of the severity, normalized to values described in
this document. This field is optional.

`SeverityNumber` is an integer number. Smaller numerical values correspond to
less severe events (such as debug events), larger numerical values correspond to
more severe events (such as errors and critical events).

For example `SeverityNumber=17` describes an error that is less
critical than an error with `SeverityNumber=20`.

The following table defines the meaning of `SeverityNumber` value:

| SeverityNumber range | Range name | Meaning                                                                                 |
| -------------------- | ---------- | --------------------------------------------------------------------------------------- |
| 1-4                  | TRACE      | A fine-grained debugging event. Typically disabled in default configurations.           |
| 5-8                  | DEBUG      | A debugging event.                                                                      |
| 9-12                 | INFO       | An informational event. Indicates that an event happened.                               |
| 13-16                | WARN       | A warning event. Not an error but is likely more important than an informational event. |
| 17-20                | ERROR      | An error event. Something went wrong.                                                   |
| 21-24                | FATAL      | A fatal error such as application or system crash.                                      |

`SeverityNumber=0` MAY be used to represent an unspecified value.
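
Because every range in the table spans four consecutive values, the range name can be derived arithmetically. A minimal Python sketch, where the helper name `severity_range_name` is hypothetical and values outside 1-24 (including the unspecified value 0) are assumed to yield `None`:

```python
from typing import Optional

# Range names in ascending order of severity; each covers 4 numbers.
RANGE_NAMES = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL")


def severity_range_name(severity_number: int) -> Optional[str]:
    """Return the range name for a SeverityNumber, or None if out of range."""
    if 1 <= severity_number <= 24:
        return RANGE_NAMES[(severity_number - 1) // 4]
    # 0 (unspecified) and any other value have no range name in this sketch.
    return None
```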

#### Mapping of `SeverityNumber`

Mappings from existing logging systems and formats (or **source format** for
short) must define how severity (or log level) of that particular format
corresponds to `SeverityNumber` of this data model based on the meaning given
for each range in the above table.

If the source format has more than one severity that matches a single range in
this table then the severities of the source format must be assigned numerical
values from that range according to how severe (important) the source severity
is.

For example if the source format defines "Error" and "Critical" as error events
and "Critical" is a more important and more severe situation then we can choose
the following `SeverityNumber` values for the mapping: "Error"->17,
"Critical"->18.

If the source format has only a single severity that matches the meaning of the
range then it is recommended to assign that severity the smallest value of the
range.

For example if the source format has an "Informational" log level and no other
log levels with similar meaning then it is recommended to use
`SeverityNumber=9` for "Informational".

Source formats that do not define a concept of severity or log level MAY omit
`SeverityNumber` and `SeverityText` fields. Backend and UI may represent log
records with missing severity information distinctly or may interpret log
records with missing `SeverityNumber` and `SeverityText` fields as if the
`SeverityNumber` was set equal to INFO (numeric value of 9).
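
One way to build such a mapping is sketched below in Python. The sketch assumes the author of the mapping has already grouped the source levels by range and ordered them from least to most severe; it then assigns consecutive values starting at the smallest value of each range, which is one choice consistent with the examples above. The function name and the input shape are hypothetical.

```python
from typing import Dict, List

# Smallest SeverityNumber of each range.
RANGE_START = {
    "TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21,
}


def build_severity_mapping(levels_by_range: Dict[str, List[str]]) -> Dict[str, int]:
    """Map source severity names to SeverityNumber values.

    levels_by_range maps a range name to the source levels that match it,
    ordered from least to most severe.
    """
    mapping: Dict[str, int] = {}
    for range_name, levels in levels_by_range.items():
        if len(levels) > 4:
            # A range only has 4 values; more levels need a manual decision.
            raise ValueError(f"too many source levels for range {range_name}")
        start = RANGE_START[range_name]
        for offset, level in enumerate(levels):
            mapping[level] = start + offset
    return mapping
```

Applied to the examples in this section, `build_severity_mapping({"INFO": ["Informational"], "ERROR": ["Error", "Critical"]})` assigns 9 to "Informational", 17 to "Error" and 18 to "Critical".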

#### Reverse Mapping

When performing a reverse mapping from `SeverityNumber` to a specific format
and the `SeverityNumber` has no corresponding mapping entry for that format
then it is recommended to choose the target severity that is in the same
severity range and is closest numerically.

For example Zap has only one severity in the INFO range, called "Info". When
doing reverse mapping all `SeverityNumber` values in INFO range (numeric 9-12)
will be mapped to Zap’s "Info" level.
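
The reverse mapping rule can be sketched as follows in Python. The function name is hypothetical, and two behaviours are assumptions of this sketch rather than part of the data model: ties are broken in favour of the lower number, and `None` is returned when the target format has no level in the same range.

```python
from typing import Dict, Optional


def reverse_map_severity(
    severity_number: int, target_levels: Dict[int, str]
) -> Optional[str]:
    """Pick the target level for a SeverityNumber.

    target_levels maps the SeverityNumber values the target format defines
    to the names of its levels.
    """
    # An exact mapping entry always wins.
    if severity_number in target_levels:
        return target_levels[severity_number]
    if not 1 <= severity_number <= 24:
        return None
    range_index = (severity_number - 1) // 4
    # Only consider target levels that fall in the same severity range.
    candidates = [
        n for n in target_levels
        if 1 <= n <= 24 and (n - 1) // 4 == range_index
    ]
    if not candidates:
        return None  # assumption: no fallback across ranges
    # Closest numerically; ties go to the lower number (assumption).
    closest = min(candidates, key=lambda n: (abs(n - severity_number), n))
    return target_levels[closest]
```

With an illustrative, abbreviated target table such as `{5: "Debug", 9: "Info", 13: "Warn", 17: "Error"}`, every value from 9 to 12 resolves to "Info".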

#### Error Semantics

If `SeverityNumber` is present and has a value of ERROR (numeric 17) or higher
then it is an indication that the log record represents an erroneous situation.
It is up to the reader of this value to make a decision on how to use this fact
(e.g. UIs may display such errors in a different color or have a feature to find
all erroneous log records).

If the log record represents an erroneous event and the source format does not
define a severity or log level concept then it is recommended to set
`SeverityNumber` to ERROR (numeric 17) during the mapping process. If the log
record represents a non-erroneous event the `SeverityNumber` field may be
omitted or may be set to any numeric value less than ERROR (numeric 17). The
recommended value in this case is INFO (numeric 9). See
[Appendix B](./data-model-appendix.md#appendix-b-severitynumber-example-mappings) for more mapping
examples.
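
The error semantics above reduce to a threshold check and a default for sources without a severity concept. A minimal Python sketch, in which the function names are hypothetical and a missing `SeverityNumber` is assumed to be represented as `None`:

```python
from typing import Optional

SEVERITY_NUMBER_INFO = 9
SEVERITY_NUMBER_ERROR = 17


def is_error(severity_number: Optional[int]) -> bool:
    """True if the record represents an erroneous situation."""
    return severity_number is not None and severity_number >= SEVERITY_NUMBER_ERROR


def severity_without_source_level(erroneous: bool) -> int:
    """Recommended SeverityNumber when the source format has no severity.

    Erroneous events get ERROR (17); for other events this sketch uses the
    recommended INFO (9), although omitting the field is also allowed.
    """
    return SEVERITY_NUMBER_ERROR if erroneous else SEVERITY_NUMBER_INFO
```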

#### Displaying Severity

The following table defines the recommended short name for each
`SeverityNumber` value. The short name can be used for example for representing
the `SeverityNumber` in the UI:

| SeverityNumber | Short Name |
| -------------- | ---------- |
| 1              | TRACE      |
| 2              | TRACE2     |
| 3              | TRACE3     |
| 4              | TRACE4     |
| 5              | DEBUG      |
| 6              | DEBUG2     |
| 7              | DEBUG3     |
| 8              | DEBUG4     |
| 9              | INFO       |
| 10             | INFO2      |
| 11             | INFO3      |
| 12             | INFO4      |
| 13             | WARN       |
| 14             | WARN2      |
| 15             | WARN3      |
| 16             | WARN4      |
| 17             | ERROR      |
| 18             | ERROR2     |
| 19             | ERROR3     |
| 20             | ERROR4     |
| 21             | FATAL      |
| 22             | FATAL2     |
| 23             | FATAL3     |
| 24             | FATAL4     |

When an individual log record is displayed it is recommended to show both
`SeverityText` and `SeverityNumber` values. A recommended combined string in
this case begins with the short name followed by `SeverityText` in parentheses.

For example, an "Informational" Syslog record will be displayed as **INFO
(Informational)**. When for a particular log record the `SeverityNumber` is
defined but the `SeverityText` is missing it is recommended to only show the
short name, e.g. **INFO**.
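
The short names in the table follow a regular pattern, so they can be computed rather than looked up. The Python sketch below derives the short name and builds the combined display string; the function names are hypothetical, and showing only `SeverityText` when `SeverityNumber` is missing is an assumption of this sketch.

```python
from typing import Optional

RANGE_NAMES = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL")


def short_name(severity_number: int) -> Optional[str]:
    """Return the short name for a SeverityNumber in 1-24, else None."""
    if not 1 <= severity_number <= 24:
        return None
    base = RANGE_NAMES[(severity_number - 1) // 4]
    position = (severity_number - 1) % 4 + 1  # 1..4 within the range
    # The first value of a range has no numeric suffix.
    return base if position == 1 else f"{base}{position}"


def display_severity(
    severity_number: Optional[int], severity_text: Optional[str]
) -> str:
    """Build the string shown for a single log record."""
    name = short_name(severity_number) if severity_number is not None else None
    if name and severity_text:
        return f"{name} ({severity_text})"
    if name:
        return name
    # Assumption: without a usable number, fall back to the original text.
    return severity_text or ""
```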

When dropdown lists (or other UI elements that are intended to represent the
possible set of values) are used for representing the severity it is preferable
to display the short name in such UI elements.

For example, a dropdown list of severities that allows filtering log records by
severity is likely to be more usable if it contains the short names of
`SeverityNumber` (and thus has a limited upper bound of elements) than a
dropdown list that lists all distinct `SeverityText` values known to the
system (which can be a large number of elements, often differing only in
capitalization or abbreviation, e.g. "Info" vs "Information").

#### Comparing Severity

In contexts where severity participates in less-than / greater-than
comparisons, the `SeverityNumber` field should be used.
Special handling MAY be given to `SeverityNumber=0`
when it is used to represent an unspecified severity.
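
A threshold filter is a typical comparison. The Python sketch below compares on `SeverityNumber` and handles the unspecified value separately; the function name and the `include_unspecified` parameter are hypothetical, and excluding unspecified records by default is only one possible form of special handling.

```python
def meets_minimum_severity(
    severity_number: int, minimum: int, include_unspecified: bool = False
) -> bool:
    """Return True if a record passes a minimum-severity filter."""
    if severity_number == 0:
        # Unspecified severity is not ordered against real values here;
        # the caller decides whether such records pass.
        return include_unspecified
    return severity_number >= minimum
```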

### Field: `Body`

Type: [AnyValue](../common/README.md#anyvalue).

Description: A value containing the body of the log record. Can be for example
a human-readable string message (including multi-line) describing the event in
a free form or it can be structured data composed of arrays and maps of other
values. Body MUST support [AnyValue](../common/README.md#anyvalue)
to preserve the semantics of structured logs emitted by the applications.
Can vary for each occurrence of the event coming from the same source.
This field is optional.
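
To make the shape of a structured `Body` concrete, the following Python sketch checks whether a value fits one reading of the linked AnyValue definition: an empty value, a string, boolean, floating point number, signed 64-bit integer or byte array, an array of such values, or a string-keyed map of such values. The function name is hypothetical, and the mapping of AnyValue kinds onto Python types is an assumption of this sketch.

```python
def is_any_value(value) -> bool:
    """Return True if value can be represented as an AnyValue (sketch)."""
    # Empty value and scalar kinds. bool is checked before int because
    # bool is a subclass of int in Python.
    if value is None or isinstance(value, (str, bool, float, bytes)):
        return True
    if isinstance(value, int):
        # Integers must fit in a signed 64-bit range.
        return -(2 ** 63) <= value < 2 ** 63
    if isinstance(value, (list, tuple)):
        return all(is_any_value(item) for item in value)
    if isinstance(value, dict):
        # Map keys must be strings; values are checked recursively.
        return all(
            isinstance(key, str) and is_any_value(item)
            for key, item in value.items()
        )
    return False
```

Both a plain message such as `"connection reset"` and a nested structure such as `{"user": {"id": 42}, "tags": ["a", "b"]}` pass this check.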

### Field: `Resource`

Type: [Resource](../resource/sdk.md).

Description: Describes the source of the log, aka
[resource](../overview.md#resources). Multiple occurrences of events coming from
the same event source can happen across time and they all have the same value of
`Resource`. Can contain for example information about the application that emits
the record or about the infrastructure where the application runs. Data formats
that represent this data model may be designed in a manner that allows the
`Resource` field to be recorded only once per batch of log records that come
from the same source. SHOULD follow OpenTelemetry
[semantic conventions for Resources](/docs/specs/semconv/resource/README.md).
This field is optional.

### Field: `InstrumentationScope`

Type: [Instrumentation Scope](../common/instrumentation-scope.md).

Description: the [instrumentation scope](../common/instrumentation-scope.md).
Multiple occurrences of events coming from the same scope can happen across time and
they all have the same value of `InstrumentationScope`. This field is optional.

### Field: `Attributes`

Type: [Attribute Collection](../common/README.md#attribute-collections).

Description: Additional information about the specific event occurrence. Unlike
the `Resource` field, which is fixed for a particular source, `Attributes` can
vary for each occurrence of the event coming from the same source. Can contain
information about the request context (other than [Trace Context Fields](#trace-context-fields)).
This field is optional.

#### Errors and Exceptions

Additional information about errors and/or exceptions that are associated with
a log record MAY be included in the structured data in the `Attributes` section
of the record.
If included, they MUST follow the OpenTelemetry
[semantic conventions for exception-related attributes](/docs/specs/semconv/exceptions/exceptions-logs.md).

### Field: `EventName`

Type: string.

Description: Name that identifies the class / type of the [Event](#events).
This name SHOULD uniquely identify the event structure (both attributes and body).
A log record with a non-empty event name is an [Event](#events).
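
Putting the fields together, a log record could be sketched in Python as below. This is an illustration of the logical model only, not an SDK type: the class and attribute names are hypothetical, `Resource` and `InstrumentationScope` are simplified to a plain dictionary and a name string, and the event name in the usage note is made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LogRecord:
    # Named top-level fields; all are optional in the data model.
    timestamp: Optional[int] = None            # ns since Unix epoch
    observed_timestamp: Optional[int] = None   # ns since Unix epoch
    trace_id: Optional[bytes] = None
    span_id: Optional[bytes] = None
    trace_flags: Optional[int] = None
    severity_text: Optional[str] = None
    severity_number: Optional[int] = None
    body: Any = None                           # any AnyValue
    event_name: Optional[str] = None
    # Simplified stand-ins for Resource and InstrumentationScope.
    resource: Dict[str, Any] = field(default_factory=dict)
    instrumentation_scope: Optional[str] = None
    # Attribute collection that varies per occurrence.
    attributes: Dict[str, Any] = field(default_factory=dict)

    @property
    def is_event(self) -> bool:
        """A log record with a non-empty event name is an Event."""
        return bool(self.event_name)
```

For instance, `LogRecord(body="started").is_event` is `False`, while `LogRecord(event_name="user.login").is_event` is `True`.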

## Example Log Records

For example log records see
[JSON File serialization](../protocol/file-exporter.md#examples).

## Example Mappings

For example log format mappings, see the
[Data Model Appendix](./data-model-appendix.md).

## References

- Log Data Model [OTEP 0097](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.55.0/oteps/logs/0097-log-data-model.md)

- [Draft discussion of Data Model](https://docs.google.com/document/d/1ix9_4TQO3o-qyeyNhcOmqAc1MTyr-wnXxxsdWgCMn9c/edit#)

- [Discussion of Severity field](https://docs.google.com/document/d/1WQDz1jF0yKBXe3OibXWfy3g6lor9SvjZ4xT-8uuDCiA/edit#)
