Semantic conventions for GPU metrics

Status: Development

GPU metrics hw.gpu.*

Graphics Processing Unit (discrete).

hw.type MUST be set to "gpu".

All GPU metrics may include the attributes below:

| Attribute | Type | Description | Examples | Requirement Level | Stability |
| --- | --- | --- | --- | --- | --- |
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |
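
As an illustration of the attribute rules above, the following is a minimal sketch of a helper that assembles the attribute set for a GPU data point. The function name `gpu_attributes` and the constant `RECOMMENDED_KEYS` are hypothetical and not part of the convention; the sketch only assumes that hw.id is required, that hw.type is "gpu", and that the remaining attributes are optional.

```python
# Hypothetical helper, not part of the semantic convention.
# Recommended (optional) attributes taken from the table above.
RECOMMENDED_KEYS = frozenset({
    "hw.driver_version",
    "hw.firmware_version",
    "hw.model",
    "hw.name",
    "hw.parent",
    "hw.serial_number",
    "hw.vendor",
})


def gpu_attributes(hw_id, recommended=None):
    """Return an attribute dict for a GPU metric data point."""
    if not hw_id:
        # hw.id has requirement level "Required".
        raise ValueError("hw.id is required")
    attrs = {"hw.id": hw_id, "hw.type": "gpu"}
    for key, value in (recommended or {}).items():
        if key not in RECOMMENDED_KEYS:
            raise KeyError(f"unexpected attribute: {key}")
        attrs[key] = value
    return attrs
```

For example, `gpu_attributes("gpu_0", {"hw.vendor": "AMD"})` yields a dict with hw.id, hw.type, and hw.vendor set. Rejecting unknown keys is a design choice of this sketch, not a requirement of the convention.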

Metric: hw.errors (GPU)

This metric is recommended.

Number of errors encountered by the GPU.

When using this metric, the following attributes MUST be set:

  • hw.type MUST be set to "gpu" to indicate that the errors are from a GPU.
  • error.type SHOULD be set to one of the following values to indicate the type of error:
    • "corrected": Errors that were detected and corrected by the GPU.
    • "uncorrected": Errors that were detected but could not be corrected by the GPU.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.errors | Counter | {error} | Number of errors encountered by the component. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
| --- | --- | --- | --- | --- | --- |
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.type | string | Type of the component [1] | battery; cpu; disk_controller | Required | Development |
| error.type | string | The type of error encountered by the component. [2] | uncorrected; zero_buffer_credit; crc; bad_sector | Conditionally Required: if and only if an error has occurred | Stable |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| network.io.direction | string | Direction of network traffic for network errors. [3] | receive; transmit | Recommended | Development |

[1] hw.type: Describes the category of the hardware component for which hw.state is being reported. For example, hw.type=temperature along with hw.state=degraded would indicate that the temperature of the hardware component has been reported as degraded.

[2] error.type: The error.type SHOULD match the error code reported by the component, the canonical name of the error, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report.

[3] network.io.direction: This attribute SHOULD only be used when hw.type is set to "network" to indicate the direction of the error.
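
To make the hw.errors attribute rules for GPUs concrete, here is a minimal sketch of an in-memory counter that keeps one cumulative series per hw.id and error.type. The class `GpuErrorCounter` is hypothetical and does not use any OpenTelemetry SDK; it only illustrates which attributes identify a series.

```python
from collections import Counter


class GpuErrorCounter:
    """Hypothetical in-memory stand-in for the hw.errors Counter (GPU only)."""

    def __init__(self):
        self._counts = Counter()

    def add(self, hw_id, error_type, count=1):
        # A Counter instrument is monotonic, so negative increments are rejected.
        if count < 0:
            raise ValueError("count must be non-negative")
        # Each distinct attribute set identifies one time series.
        key = (
            ("error.type", error_type),
            ("hw.id", hw_id),
            ("hw.type", "gpu"),
        )
        self._counts[key] += count

    def points(self):
        """Return (attributes, cumulative value) pairs in a stable order."""
        return [(dict(key), value) for key, value in sorted(self._counts.items())]
```

Calling `add("gpu_0", "corrected", 3)` followed by `add("gpu_0", "uncorrected")` produces two series for the same GPU, one per error.type value.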


error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn’t define a custom value. | Stable |

hw.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| battery | Battery | Development |
| cpu | CPU | Development |
| disk_controller | Disk controller | Development |
| enclosure | Enclosure | Development |
| fan | Fan | Development |
| gpu | GPU | Development |
| logical_disk | Logical disk | Development |
| memory | Memory | Development |
| network | Network | Development |
| physical_disk | Physical disk | Development |
| power_supply | Power supply | Development |
| tape_drive | Tape drive | Development |
| temperature | Temperature | Development |
| voltage | Voltage | Development |

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| receive | receive | Development |
| transmit | transmit | Development |

Metric: hw.gpu.io

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.io | Counter | By | Received and transmitted bytes by the GPU. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
| --- | --- | --- | --- | --- | --- |
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| network.io.direction | string | The network IO operation direction. | receive; transmit | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| receive | receive | Development |
| transmit | transmit | Development |

Metric: hw.gpu.memory.limit

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.memory.limit | UpDownCounter | By | Size of the GPU memory. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
| --- | --- | --- | --- | --- | --- |
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |

Metric: hw.gpu.memory.utilization

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.memory.utilization | Gauge | 1 | Fraction of GPU memory used. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
| --- | --- | --- | --- | --- | --- |
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |
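
The convention describes hw.gpu.memory.utilization only as the fraction of GPU memory used, with unit 1. One plausible way to obtain it, assuming the values reported as hw.gpu.memory.usage and hw.gpu.memory.limit are both available in bytes, is to divide one by the other. The function name below is hypothetical and the derivation is an assumption, not something the convention mandates.

```python
def gpu_memory_utilization(usage_bytes, limit_bytes):
    """Return used GPU memory as a fraction in the range [0, 1].

    Assumption: utilization = usage / limit. The convention itself only
    says "Fraction of GPU memory used."
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if not 0 <= usage_bytes <= limit_bytes:
        raise ValueError("usage_bytes must be between 0 and limit_bytes")
    return usage_bytes / limit_bytes
```

With 6 GiB used out of 24 GiB, `gpu_memory_utilization(6 * 1024**3, 24 * 1024**3)` returns 0.25, which is the value a gauge with unit 1 would report (a ratio, not a percentage).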

Metric: hw.gpu.memory.usage

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.memory.usage | UpDownCounter | By | GPU memory used. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
| --- | --- | --- | --- | --- | --- |
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |

Metric: hw.gpu.utilization

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.utilization | Gauge | 1 | Fraction of time spent in a specific task. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
| --- | --- | --- | --- | --- | --- |
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.gpu.task | string | Type of task the GPU is performing | decoder; encoder; general | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |

hw.gpu.task has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| decoder | Decoder | Development |
| encoder | Encoder | Development |
| general | General | Development |
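
Because hw.gpu.utilization is reported per task, a single GPU can yield several gauge points that differ only in hw.gpu.task. The sketch below shows one way that could look; `utilization_points` is a hypothetical helper, and the input is assumed to be a mapping from task name to a fraction between 0 and 1.

```python
def utilization_points(hw_id, fractions):
    """Return one (attributes, value) gauge point per GPU task.

    `fractions` maps a hw.gpu.task value (e.g. "decoder") to the fraction
    of time spent on that task, expressed as a ratio in [0, 1].
    """
    points = []
    for task, value in sorted(fractions.items()):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"utilization for {task!r} must be in [0, 1]")
        attrs = {"hw.id": hw_id, "hw.type": "gpu", "hw.gpu.task": task}
        points.append((attrs, value))
    return points
```

For instance, `utilization_points("gpu_0", {"general": 0.8, "encoder": 0.1})` returns two points, one with hw.gpu.task set to "encoder" and one set to "general".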

Metric: hw.status (GPU)

This metric is recommended.

Operational status: 1 (true) or 0 (false) for each of the possible states.

When using this metric for GPU status, the following attributes MUST be set:

  • hw.type MUST be set to "gpu" to indicate that the status is for a GPU.
  • hw.state MUST be set to one of the following values to indicate the GPU state:
    • "ok": The GPU is operating normally.
    • "degraded": The GPU is operating with reduced functionality or performance.
    • "failed": The GPU has failed and is not operational.
    • "predicted_failure": The GPU is currently operational but is predicted to fail soon.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.status | UpDownCounter | 1 | Operational status: 1 (true) or 0 (false) for each of the possible states. [1] | Development | |

[1]: hw.status is currently specified as an UpDownCounter but would ideally be represented using a StateSet as defined in OpenMetrics. This semantic convention will be updated once StateSet is specified in OpenTelemetry. This planned change is not expected to have any consequence on the way users query their timeseries backend to retrieve the values of hw.status over time.
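
The "1 (true) or 0 (false) for each of the possible states" encoding can be sketched as follows: every state gets its own series, and exactly one of them carries the value 1 at any time. The helper `status_points` and the tuple `GPU_STATES` are hypothetical names; the state list is the one given for GPUs above.

```python
# GPU states listed in this section for hw.status.
GPU_STATES = ("ok", "degraded", "failed", "predicted_failure")


def status_points(hw_id, current_state):
    """Return one (attributes, value) point per state: 1 if active, else 0."""
    if current_state not in GPU_STATES:
        raise ValueError(f"unknown GPU state: {current_state!r}")
    return [
        (
            {"hw.id": hw_id, "hw.type": "gpu", "hw.state": state},
            1 if state == current_state else 0,
        )
        for state in GPU_STATES
    ]
```

For a degraded GPU, `status_points("gpu_0", "degraded")` yields four points whose values are 0, 1, 0, 0 in the order of `GPU_STATES`. Emitting the zero-valued series explicitly is what lets a backend query a specific state over time.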

| Attribute | Type | Description | Examples | Requirement Level | Stability |
| --- | --- | --- | --- | --- | --- |
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.state | string | The current state of the component | degraded; failed; needs_cleaning | Required | Development |
| hw.type | string | Type of the component [1] | battery; cpu; disk_controller | Required | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |

[1] hw.type: Describes the category of the hardware component for which hw.state is being reported. For example, hw.type=temperature along with hw.state=degraded would indicate that the temperature of the hardware component has been reported as degraded.


hw.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| degraded | Degraded | Development |
| failed | Failed | Development |
| needs_cleaning | Needs Cleaning | Development |
| ok | OK | Development |
| predicted_failure | Predicted Failure | Development |

hw.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| battery | Battery | Development |
| cpu | CPU | Development |
| disk_controller | Disk controller | Development |
| enclosure | Enclosure | Development |
| fan | Fan | Development |
| gpu | GPU | Development |
| logical_disk | Logical disk | Development |
| memory | Memory | Development |
| network | Network | Development |
| physical_disk | Physical disk | Development |
| power_supply | Power supply | Development |
| tape_drive | Tape drive | Development |
| temperature | Temperature | Development |
| voltage | Voltage | Development |