Semantic Conventions for System Metrics

Status: Experimental

This document describes instruments and attributes for common system level metrics in OpenTelemetry. Consider the general metric semantic conventions when creating instruments not explicitly defined in the specification.

Metric Instruments

system.cpu. - Processor metrics

Description: System level processor metrics.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values |
|---|---|---|---|---|---|---|
| system.cpu.time | | s | Asynchronous Counter | Double | state | idle, user, system, interrupt, etc. |
| | | | | | cpu | CPU number [0..n-1] |
| system.cpu.utilization | Difference in system.cpu.time since the last measurement, divided by the elapsed time and number of CPUs | 1 | Asynchronous Gauge | Double | state | idle, user, system, interrupt, etc. |
| | | | | | cpu | CPU number [0..n-1] |
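The utilization description above can be illustrated with a minimal sketch. It assumes two cumulative system.cpu.time readings per state, each already summed over all CPUs; the function name and dictionary layout are hypothetical and not part of these conventions.

```python
from typing import Dict


def cpu_utilization(
    prev_times: Dict[str, float],
    curr_times: Dict[str, float],
    elapsed_s: float,
    num_cpus: int,
) -> Dict[str, float]:
    """Per-state utilization from two cumulative system.cpu.time readings.

    prev_times / curr_times map a state (idle, user, system, ...) to the
    cumulative CPU seconds for that state, summed over all CPUs.
    """
    if elapsed_s <= 0 or num_cpus <= 0:
        raise ValueError("elapsed_s and num_cpus must be positive")
    # CPU-seconds available across all CPUs during the measurement window.
    capacity = elapsed_s * num_cpus
    return {
        state: (curr_times[state] - prev_times.get(state, 0.0)) / capacity
        for state in curr_times
    }


if __name__ == "__main__":
    prev = {"user": 100.0, "system": 40.0, "idle": 860.0}
    curr = {"user": 103.0, "system": 41.0, "idle": 876.0}
    print(cpu_utilization(prev, curr, elapsed_s=10.0, num_cpus=2))
```

When the cpu attribute is present (one series per CPU), the same formula applies with num_cpus set to 1 for each series.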

system.memory. - Memory metrics

Description: System level memory metrics. This does not include paging/swap memory.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.memory.usage | | By | Asynchronous UpDownCounter | Int64 | state | used, free, cached, etc. |
| system.memory.utilization | | 1 | Asynchronous Gauge | Double | state | used, free, cached, etc. |
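A minimal sketch of how the byte-valued system.memory.usage states might be turned into the unitless system.memory.utilization ratios. It assumes, unless an explicit total is passed, that the reported states partition physical memory so their sum is the total; the function name is hypothetical.

```python
from typing import Dict, Optional


def memory_utilization(
    usage_bytes: Dict[str, int], total_bytes: Optional[int] = None
) -> Dict[str, float]:
    """Convert per-state byte counts into 0..1 ratios.

    If total_bytes is omitted, the states are assumed to partition physical
    memory, so their sum is used as the total (an assumption of this sketch).
    """
    total = sum(usage_bytes.values()) if total_bytes is None else total_bytes
    if total <= 0:
        raise ValueError("total memory must be positive")
    return {state: value / total for state, value in usage_bytes.items()}


if __name__ == "__main__":
    gib = 1024 ** 3
    print(memory_utilization({"used": 6 * gib, "free": 1 * gib, "cached": 1 * gib}))
    # {'used': 0.75, 'free': 0.125, 'cached': 0.125}
```

Usage stays an Int64 byte count while utilization is a Double ratio, matching the value types in the table.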

system.paging. - Paging/swap metrics

Description: System level paging/swap memory metrics.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.paging.usage | Unix swap or windows pagefile usage | By | Asynchronous UpDownCounter | Int64 | state | used, free |
| system.paging.utilization | | 1 | Asynchronous Gauge | Double | state | used, free |
| system.paging.faults | | {faults} | Asynchronous Counter | Int64 | type | major, minor |
| system.paging.operations | | {operations} | Asynchronous Counter | Int64 | type | major, minor |
| | | | | | direction | in, out |
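On Linux, swap usage can be sketched from /proc/meminfo, which reports SwapTotal and SwapFree in kB (kibibytes). Treating "used" as SwapTotal minus SwapFree is an assumption of this sketch, and both function names are hypothetical.

```python
from typing import Dict


def swap_usage_from_meminfo(meminfo_text: str) -> Dict[str, int]:
    """Derive used/free swap bytes from Linux /proc/meminfo text.

    Assumption: used = SwapTotal - SwapFree; values are converted from kB to bytes.
    """
    fields: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total_kib = fields["SwapTotal"]
    free_kib = fields["SwapFree"]
    return {"used": (total_kib - free_kib) * 1024, "free": free_kib * 1024}


def swap_utilization(usage: Dict[str, int]) -> Dict[str, float]:
    """Ratios for the used/free states; all zeros when no swap is configured."""
    total = usage["used"] + usage["free"]
    if total == 0:
        return {"used": 0.0, "free": 0.0}
    return {state: value / total for state, value in usage.items()}


if __name__ == "__main__":
    sample = "MemTotal:       16318464 kB\nSwapTotal:       2097152 kB\nSwapFree:        1572864 kB\n"
    usage = swap_usage_from_meminfo(sample)
    print(usage, swap_utilization(usage))
```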

system.disk. - Disk controller metrics

Description: System level disk performance metrics.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.disk.io | | By | Asynchronous Counter | Int64 | device | (identifier) |
| | | | | | direction | read, write |
| system.disk.operations | | {operations} | Asynchronous Counter | Int64 | device | (identifier) |
| | | | | | direction | read, write |
| system.disk.io_time [1] | Time disk spent activated | s | Asynchronous Counter | Double | device | (identifier) |
| system.disk.operation_time [2] | Sum of the time each operation took to complete | s | Asynchronous Counter | Double | device | (identifier) |
| | | | | | direction | read, write |
| system.disk.merged | | {operations} | Asynchronous Counter | Int64 | device | (identifier) |
| | | | | | direction | read, write |

1 The real elapsed time (“wall clock”) used in the I/O path (time from operations running in parallel is not counted). Measured as:

2 Because this is the sum of the time each request took, requests issued in parallel each add their own duration, so the counter can grow faster than wall-clock time. Measured as:

  • Linux: Fields 7 & 11 from procfs-diskstats
  • Windows: “Avg. Disk sec/Read” perf counter multiplied by “Disk Reads/sec” perf counter (similar for Writes)
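The difference between footnotes 1 and 2 can be illustrated with a sketch over hypothetical (start, end) request intervals: io_time counts overlapping wall-clock time once, while operation_time sums every request's duration. The second function reads the Linux fields 7 and 11 named above, assuming fields are counted 1-indexed across the whitespace-separated /proc/diskstats line (major, minor, device name first) and hold milliseconds. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def io_and_operation_time(requests: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (io_time, operation_time) in seconds for (start, end) request intervals.

    io_time counts wall-clock time with at least one request in flight, so
    overlapping requests are counted once; operation_time sums every
    request's own duration, so overlaps are counted once per request.
    """
    operation_time = sum(end - start for start, end in requests)
    io_time = 0.0
    span_start: Optional[float] = None
    span_end: Optional[float] = None
    for start, end in sorted(requests):
        if span_end is None or start > span_end:
            # Disjoint from the current busy span: close it and open a new one.
            if span_start is not None and span_end is not None:
                io_time += span_end - span_start
            span_start, span_end = start, end
        else:
            # Overlaps the current busy span: extend it.
            span_end = max(span_end, end)
    if span_start is not None and span_end is not None:
        io_time += span_end - span_start
    return io_time, float(operation_time)


def operation_time_from_diskstats(line: str) -> Tuple[str, Dict[str, float]]:
    """Linux: fields 7 and 11 of a /proc/diskstats line are milliseconds spent
    on reads and writes; convert them to seconds, keyed by direction."""
    fields = line.split()  # field N (1-indexed) is fields[N - 1]; field 3 is the device
    return fields[2], {
        "read": int(fields[6]) / 1000.0,
        "write": int(fields[10]) / 1000.0,
    }


if __name__ == "__main__":
    # Two overlapping requests plus one separate request, each lasting 1 s.
    print(io_and_operation_time([(0.0, 1.0), (0.5, 1.5), (3.0, 4.0)]))
    print(operation_time_from_diskstats(
        "   8       0 sda 1200 30 98000 4500 800 20 64000 7250 0 9000 11750"))
```

With the three requests above, io_time is 2.5 s while operation_time is 3.0 s, because the 0.5 s overlap is counted once in the first and twice in the second.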

system.filesystem. - Filesystem metrics

Description: System level filesystem metrics.
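As an illustration only, the used, free and reserved states in the table below could be derived on Unix-like systems from statvfs(3), exposed in Python as os.statvfs. The field-to-state mapping is an assumption of this sketch, not part of these conventions: free is space available to unprivileged users (f_bavail), reserved is free space held back for privileged users (f_bfree - f_bavail), and used is the remainder (f_blocks - f_bfree). The function names are hypothetical.

```python
import os
from typing import Dict


def filesystem_usage(path: str) -> Dict[str, int]:
    """Split a filesystem's capacity into used/free/reserved bytes (Unix-only)."""
    st = os.statvfs(path)
    block = st.f_frsize  # fundamental block size; the f_blocks/f_bfree/f_bavail unit
    return {
        "used": (st.f_blocks - st.f_bfree) * block,
        "free": st.f_bavail * block,
        "reserved": (st.f_bfree - st.f_bavail) * block,
    }


def filesystem_utilization(usage: Dict[str, int]) -> Dict[str, float]:
    """Ratios of each state to the total (used + free + reserved = f_blocks)."""
    total = sum(usage.values())
    return {state: (value / total if total else 0.0) for state, value in usage.items()}


if __name__ == "__main__":
    usage = filesystem_usage("/")
    print(usage, filesystem_utilization(usage))
```

Because the three states add up to f_blocks, their utilization ratios sum to 1 for any filesystem with non-zero capacity.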

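| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.filesystem.usage | | By | Asynchronous UpDownCounter | Int64 | device | (identifier) |
| | | | | | state | used, free, reserved |
| | | | | | type | ext4, tmpfs, etc. |
| | | | | | mode | rw, ro, etc. |
| system.filesystem.utilization | | 1 | Asynchronous Gauge | Double | device | (identifier) |
| | | | | | state | used, free, reserved |
| | | | | | type | ext4, tmpfs, etc. |
| | | | | | mode | rw, ro, etc. |
| | | | | | mountpoint | (path) |

Network metrics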

Description: System level network metrics.
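As an illustration only, per-interface counters with transmit and receive directions, like those in the table below, are exposed on Linux in /proc/net/dev: each interface line carries 8 receive columns followed by 8 transmit columns. Which column feeds which row of the table is an assumption of this sketch, and the function and key names are hypothetical.

```python
from typing import Dict, Tuple

# Column order on each interface line of Linux /proc/net/dev (after "iface:").
_RX = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
_TX = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_proc_net_dev(text: str) -> Dict[Tuple[str, str, str], int]:
    """Return {(device, direction, column): value} from /proc/net/dev text."""
    out: Dict[Tuple[str, str, str], int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        device, _, counters = line.partition(":")
        device = device.strip()
        values = [int(v) for v in counters.split()]
        for name, value in zip(_RX, values[:8]):
            out[(device, "receive", name)] = value
        for name, value in zip(_TX, values[8:16]):
            out[(device, "transmit", name)] = value
    return out


if __name__ == "__main__":
    sample = (
        "Inter-|   Receive                                                |  Transmit\n"
        " face |bytes    packets errs drop fifo frame compressed multicast|"
        "bytes    packets errs drop fifo colls carrier compressed\n"
        "  eth0: 1000 10 1 2 0 0 0 0 2000 20 3 4 0 0 0 0\n"
    )
    counters = parse_proc_net_dev(sample)
    print(counters[("eth0", "receive", "packets")], counters[("eth0", "transmit", "drop")])
```

The device and direction elements of each key correspond to the device and direction attributes in the table.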

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| [1] | Count of packets that are dropped or discarded even though there was no error | {packets} | Asynchronous Counter | Int64 | device | (identifier) |
| | | | | | direction | transmit, receive |
| | | {packets} | Asynchronous Counter | Int64 | device | (identifier) |
| | | | | | direction | transmit, receive |
| [2] | Count of network errors detected | {errors} | Asynchronous Counter | Int64 | device | (identifier) |
| | | | | | direction | transmit, receive |
| | | | Counter | Int64 | device | (identifier) |
| | | | | | direction | transmit, receive |
| | | {connections} | Asynchronous UpDownCounter | Int64 | device | (identifier) |
| | | | | | protocol | tcp, udp, etc. |
| | | | | | state | e.g. for tcp |

1 Measured as:

2 Measured as:

system.processes. - Aggregate system process metrics

Description: System level aggregate process metrics. For metrics at the individual process level, see process metrics.

| Name | Description | Units | Instrument Type | Value Type | Attribute Key | Attribute Values |
|---|---|---|---|---|---|---|
| system.processes.count | Total number of processes in each state | {processes} | Asynchronous UpDownCounter | Int64 | status | running, sleeping, etc. |
| system.processes.created | Total number of processes created over uptime of the host | {processes} | Asynchronous Counter | Int64 | - | - |
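A minimal Linux-oriented sketch of both instruments. On Linux, the "processes" line of /proc/stat holds the number of forks since boot, which fits system.processes.created. For system.processes.count, the mapping from single-letter kernel process states to status values is hypothetical; only running and sleeping appear in the table, and the other labels are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def processes_created(proc_stat_text: str) -> int:
    """Read the 'processes' line of Linux /proc/stat (forks since boot)."""
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "processes":
            return int(parts[1])
    raise ValueError("no 'processes' line found")


# Hypothetical mapping from Linux process state letters to status values.
_STATUS = {"R": "running", "S": "sleeping", "D": "blocked", "Z": "zombie", "T": "stopped"}


def processes_by_status(state_letters: Iterable[str]) -> Dict[str, int]:
    """Count processes per status; unmapped letters are counted as 'unknown'."""
    return dict(Counter(_STATUS.get(letter, "unknown") for letter in state_letters))


if __name__ == "__main__":
    print(processes_created("cpu  1 2 3 4\nprocesses 86031\nprocs_running 2\n"))
    print(processes_by_status("RSSDZ"))
```

created only ever grows over the host's uptime, which is why it is a Counter, while count can go up and down and is an UpDownCounter.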

system.{os}. - OS Specific System Metrics

Instrument names for system level metrics that have different and conflicting meaning across multiple OSes should be prefixed with system.{os}. and follow the hierarchies listed above for different entities like CPU, memory, and network.

For example, the UNIX load average over a given interval is not well standardized, and its value may vary across different Unix-like OSes even under similar load:

Without getting into the vagaries of every Unix-like operating system in existence, the load average more or less represents the average number of processes that are in the running (using the CPU) or runnable (waiting for the CPU) states. One notable exception exists: Linux includes processes in uninterruptible sleep states, typically waiting for some I/O activity to complete. This can markedly increase the load average on Linux systems.

(source of quote, linux source code)

An instrument for the load average over 1 minute on Linux could be named system.linux.cpu.load_1m, reusing the cpu entity name defined above and adding the {os} prefix to keep this metric distinct across OSes.
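A small sketch of the naming rule and of reading the Linux value. The name builder is a hypothetical helper that only concatenates the system.{os}.<entity>.<metric> parts; on Linux, /proc/loadavg begins with the 1-, 5- and 15-minute averages, and Python's os.getloadavg (Unix-only) returns the same three values.

```python
import os


def os_metric_name(os_name: str, entity: str, metric: str) -> str:
    """Build a system.{os}.<entity>.<metric> instrument name (hypothetical helper)."""
    return f"system.{os_name}.{entity}.{metric}"


def load_1m_from_proc_loadavg(text: str) -> float:
    """Linux /proc/loadavg starts with the 1-, 5- and 15-minute load averages."""
    return float(text.split()[0])


if __name__ == "__main__":
    name = os_metric_name("linux", "cpu", "load_1m")
    try:
        print(name, os.getloadavg()[0])  # Unix-only; raises OSError if unavailable
    except (AttributeError, OSError):
        print(name, "load average not available on this platform")
```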