Semantic conventions for GPU metrics
Status: Development
GPU metrics hw.gpu.*
Graphics Processing Unit (discrete).
hw.type MUST be set to "gpu".
All GPU metrics may include the following attributes:
Attribute | Type | Description | Examples | Requirement Level | Stability |
---|---|---|---|---|---|
hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | |
hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | |
hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | |
hw.model | string | Descriptive model name of the hardware component | PERC H740P ; Intel(R) Core(TM) i7-10700K ; Dell XPS 15 Battery | Recommended | |
hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | |
hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | |
hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | |
hw.vendor | string | Vendor name of the hardware component | Dell ; HP ; Intel ; AMD ; LSI ; Lenovo | Recommended |
Metric: hw.errors (GPU)
This metric is recommended.
Number of errors encountered by the GPU.
When using this metric, the following attributes MUST be set:
- hw.type MUST be set to "gpu" to indicate that the errors are from a GPU.
- error.type SHOULD be set to one of the following values to indicate the type of error:
  - "corrected": Errors that were detected and corrected by the GPU.
  - "uncorrected": Errors that were detected but could not be corrected by the GPU.
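As an illustration of the attribute rules above, the following minimal sketch builds the attribute set for one hw.errors data point from a GPU and accumulates counts per attribute set. It uses only the Python standard library and no OpenTelemetry SDK; the names gpu_error_attributes and record_gpu_errors are hypothetical helpers invented for this example.

```python
from collections import Counter

# The two error types suggested for GPUs in the text above.
SUGGESTED_GPU_ERROR_TYPES = ("corrected", "uncorrected")


def gpu_error_attributes(hw_id, error_type=None):
    """Return the attributes for one hw.errors data point from a GPU.

    hw.id and hw.type are always present; error.type is added only when
    an error type is known.
    """
    attrs = {"hw.id": hw_id, "hw.type": "gpu"}
    if error_type is not None:
        attrs["error.type"] = error_type
    return attrs


def record_gpu_errors(counts, hw_id, error_type, increment=1):
    """Add `increment` errors to the series identified by its attributes.

    `counts` is a collections.Counter keyed by a sorted tuple of attribute
    pairs (a hypothetical stand-in for a real metrics backend).
    """
    if increment < 0:
        raise ValueError("a Counter only accepts non-negative increments")
    key = tuple(sorted(gpu_error_attributes(hw_id, error_type).items()))
    counts[key] += increment
    return key


counts = Counter()
key = record_gpu_errors(counts, "gpu0", "corrected", 3)
print(dict(key), counts[key])
```

Keying the counter by the full attribute set keeps corrected and uncorrected errors in separate series for the same hw.id.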
Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
---|---|---|---|---|---|
hw.errors | Counter | {error} | Number of errors encountered by the component. |
Attribute | Type | Description | Examples | Requirement Level | Stability |
---|---|---|---|---|---|
hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | |
hw.type | string | Type of the component [1] | battery ; cpu ; disk_controller | Required | |
error.type | string | The type of error encountered by the component. [2] | uncorrected ; zero_buffer_credit ; crc ; bad_sector | Conditionally Required: if and only if an error has occurred | |
hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | |
hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | |
network.io.direction | string | Direction of network traffic for network errors. [3] | receive ; transmit | Recommended |
[1] hw.type: Describes the category of the hardware component for which hw.state is being reported. For example, hw.type=temperature along with hw.state=degraded would indicate that the temperature of the hardware component has been reported as degraded.

[2] error.type: The error.type SHOULD match the error code reported by the component, the canonical name of the error, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report.

[3] network.io.direction: This attribute SHOULD only be used when hw.type is set to "network" to indicate the direction of the error.
error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Value | Description | Stability |
---|---|---|
_OTHER | A fallback error value to be used when the instrumentation doesn’t define a custom value. |
hw.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Value | Description | Stability |
---|---|---|
battery | Battery | |
cpu | CPU | |
disk_controller | Disk controller | |
enclosure | Enclosure | |
fan | Fan | |
gpu | GPU | |
logical_disk | Logical disk | |
memory | Memory | |
network | Network | |
physical_disk | Physical disk | |
power_supply | Power supply | |
tape_drive | Tape drive | |
temperature | Temperature | |
voltage | Voltage |
network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Value | Description | Stability |
---|---|---|
receive | receive | |
transmit | transmit |
Metric: hw.gpu.io
This metric is recommended.
Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
---|---|---|---|---|---|
hw.gpu.io | Counter | By | Received and transmitted bytes by the GPU. |
Attribute | Type | Description | Examples | Requirement Level | Stability |
---|---|---|---|---|---|
hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | |
network.io.direction | string | The network IO operation direction. | receive ; transmit | Required | |
hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | |
hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | |
hw.model | string | Descriptive model name of the hardware component | PERC H740P ; Intel(R) Core(TM) i7-10700K ; Dell XPS 15 Battery | Recommended | |
hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | |
hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | |
hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | |
hw.vendor | string | Vendor name of the hardware component | Dell ; HP ; Intel ; AMD ; LSI ; Lenovo | Recommended |
network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Value | Description | Stability |
---|---|---|
receive | receive | |
transmit | transmit |
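Because hw.gpu.io is a Counter with a required network.io.direction attribute, received and transmitted bytes form separate, monotonically increasing series per GPU. The sketch below shows one way to model that in plain Python; the class GpuIoCounter is hypothetical and is not an OpenTelemetry API.

```python
class GpuIoCounter:
    """Hypothetical in-memory model of the hw.gpu.io counter."""

    def __init__(self):
        # Cumulative byte totals keyed by (hw.id, network.io.direction).
        self._totals = {}

    def add(self, hw_id, direction, num_bytes):
        """Add transferred bytes to the series for this GPU and direction."""
        if not hw_id or not direction:
            raise ValueError("hw.id and network.io.direction are required")
        if num_bytes < 0:
            raise ValueError("a Counter only accepts non-negative increments")
        key = (hw_id, direction)
        self._totals[key] = self._totals.get(key, 0) + num_bytes

    def total(self, hw_id, direction):
        """Return the cumulative bytes recorded so far (0 if none)."""
        return self._totals.get((hw_id, direction), 0)


io = GpuIoCounter()
io.add("gpu0", "receive", 4096)
io.add("gpu0", "receive", 1024)
io.add("gpu0", "transmit", 512)
print(io.total("gpu0", "receive"), io.total("gpu0", "transmit"))
```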
Metric: hw.gpu.memory.limit
This metric is recommended.
Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
---|---|---|---|---|---|
hw.gpu.memory.limit | UpDownCounter | By | Size of the GPU memory. |
Attribute | Type | Description | Examples | Requirement Level | Stability |
---|---|---|---|---|---|
hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | |
hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | |
hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | |
hw.model | string | Descriptive model name of the hardware component | PERC H740P ; Intel(R) Core(TM) i7-10700K ; Dell XPS 15 Battery | Recommended | |
hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | |
hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | |
hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | |
hw.vendor | string | Vendor name of the hardware component | Dell ; HP ; Intel ; AMD ; LSI ; Lenovo | Recommended |
Metric: hw.gpu.memory.utilization
This metric is recommended.
Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
---|---|---|---|---|---|
hw.gpu.memory.utilization | Gauge | 1 | Fraction of GPU memory used. |
Attribute | Type | Description | Examples | Requirement Level | Stability |
---|---|---|---|---|---|
hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | |
hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | |
hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | |
hw.model | string | Descriptive model name of the hardware component | PERC H740P ; Intel(R) Core(TM) i7-10700K ; Dell XPS 15 Battery | Recommended | |
hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | |
hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | |
hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | |
hw.vendor | string | Vendor name of the hardware component | Dell ; HP ; Intel ; AMD ; LSI ; Lenovo | Recommended |
Metric: hw.gpu.memory.usage
This metric is recommended.
Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
---|---|---|---|---|---|
hw.gpu.memory.usage | UpDownCounter | By | GPU memory used. |
Attribute | Type | Description | Examples | Requirement Level | Stability |
---|---|---|---|---|---|
hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | |
hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | |
hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | |
hw.model | string | Descriptive model name of the hardware component | PERC H740P ; Intel(R) Core(TM) i7-10700K ; Dell XPS 15 Battery | Recommended | |
hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | |
hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | |
hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | |
hw.vendor | string | Vendor name of the hardware component | Dell ; HP ; Intel ; AMD ; LSI ; Lenovo | Recommended |
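The three memory metrics above are related: hw.gpu.memory.usage and hw.gpu.memory.limit are both byte counts, while hw.gpu.memory.utilization is a unitless fraction. The following sketch assumes that utilization is the ratio of usage to limit; the page does not state this formula explicitly, and the function name gpu_memory_utilization is hypothetical.

```python
def gpu_memory_utilization(usage_bytes, limit_bytes):
    """Return the fraction of GPU memory in use, in the range [0, 1].

    Assumes utilization = usage / limit (an assumption of this sketch).
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if usage_bytes < 0:
        raise ValueError("usage_bytes must not be negative")
    # Clamp in case a driver briefly reports usage above the limit.
    return min(usage_bytes / limit_bytes, 1.0)


GIB = 1024 ** 3
print(gpu_memory_utilization(4 * GIB, 16 * GIB))
```

Reporting the value as a fraction rather than a percentage matches the unit 1 given for hw.gpu.memory.utilization.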
Metric: hw.gpu.utilization
This metric is recommended.
Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
---|---|---|---|---|---|
hw.gpu.utilization | Gauge | 1 | Fraction of time spent in a specific task. |
Attribute | Type | Description | Examples | Requirement Level | Stability |
---|---|---|---|---|---|
hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | |
hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | |
hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | |
hw.gpu.task | string | Type of task the GPU is performing | decoder ; encoder ; general | Recommended | |
hw.model | string | Descriptive model name of the hardware component | PERC H740P ; Intel(R) Core(TM) i7-10700K ; Dell XPS 15 Battery | Recommended | |
hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | |
hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | |
hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | |
hw.vendor | string | Vendor name of the hardware component | Dell ; HP ; Intel ; AMD ; LSI ; Lenovo | Recommended |
hw.gpu.task has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Value | Description | Stability |
---|---|---|
decoder | Decoder | |
encoder | Encoder | |
general | General |
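Since hw.gpu.utilization is a fraction of time per task, one gauge value can be reported for each hw.gpu.task value. The sketch below assumes the collector knows how many seconds the GPU was busy with each task during a sampling interval; that input format and the function name gpu_task_utilization are assumptions made for illustration only.

```python
def gpu_task_utilization(hw_id, busy_seconds_by_task, interval_seconds):
    """Return (attributes, fraction) pairs, one per GPU task.

    busy_seconds_by_task maps a hw.gpu.task value such as "encoder" to the
    seconds spent on that task within the interval (hypothetical input).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    points = []
    for task, busy in sorted(busy_seconds_by_task.items()):
        # Keep the gauge value within [0, 1].
        fraction = min(max(busy / interval_seconds, 0.0), 1.0)
        points.append(({"hw.id": hw_id, "hw.gpu.task": task}, fraction))
    return points


for attrs, value in gpu_task_utilization(
    "gpu0", {"encoder": 2.5, "general": 5.0}, interval_seconds=10.0
):
    print(attrs["hw.gpu.task"], value)
```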
Metric: hw.status (GPU)
This metric is recommended.
Operational status: 1 (true) or 0 (false) for each of the possible states.
When using this metric for GPU status, the following attributes MUST be set:
- hw.type MUST be set to "gpu" to indicate that the status is for a GPU.
- hw.state MUST be set to one of the following values to indicate the GPU state:
  - "ok": The GPU is operating normally.
  - "degraded": The GPU is operating with reduced functionality or performance.
  - "failed": The GPU has failed and is not operational.
  - "predicted_failure": The GPU is currently operational but is predicted to fail soon.
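The "1 or 0 for each of the possible states" encoding means that one data point is reported per state, with the value 1 on the current state and 0 on all others. A minimal sketch of that encoding follows; gpu_status_points is a hypothetical helper and not part of any OpenTelemetry SDK.

```python
# The four GPU states listed above.
GPU_STATES = ("ok", "degraded", "failed", "predicted_failure")


def gpu_status_points(hw_id, current_state):
    """Return one (attributes, value) pair per GPU state.

    The pair for current_state carries the value 1; all others carry 0.
    """
    if current_state not in GPU_STATES:
        raise ValueError(f"unknown GPU state: {current_state!r}")
    return [
        (
            {"hw.id": hw_id, "hw.type": "gpu", "hw.state": state},
            1 if state == current_state else 0,
        )
        for state in GPU_STATES
    ]


for attrs, value in gpu_status_points("gpu0", "degraded"):
    print(attrs["hw.state"], value)
```

Emitting every state on each collection, rather than only the active one, lets a query such as "hw.state=failed equals 1" work without having to detect missing series.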
Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
---|---|---|---|---|---|
hw.status | UpDownCounter | 1 | Operational status: 1 (true) or 0 (false) for each of the possible states. [1] |
[1]: hw.status is currently specified as an UpDownCounter but would ideally be represented using a StateSet as defined in OpenMetrics. This semantic convention will be updated once StateSet is specified in OpenTelemetry. This planned change is not expected to have any consequence on the way users query their timeseries backend to retrieve the values of hw.status over time.
Attribute | Type | Description | Examples | Requirement Level | Stability |
---|---|---|---|---|---|
hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | |
hw.state | string | The current state of the component | degraded ; failed ; needs_cleaning | Required | |
hw.type | string | Type of the component [1] | battery ; cpu ; disk_controller | Required | |
hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | |
hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended |
[1] hw.type: Describes the category of the hardware component for which hw.state is being reported. For example, hw.type=temperature along with hw.state=degraded would indicate that the temperature of the hardware component has been reported as degraded.
hw.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Value | Description | Stability |
---|---|---|
degraded | Degraded | |
failed | Failed | |
needs_cleaning | Needs Cleaning | |
ok | OK | |
predicted_failure | Predicted Failure |
hw.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
Value | Description | Stability |
---|---|---|
battery | Battery | |
cpu | CPU | |
disk_controller | Disk controller | |
enclosure | Enclosure | |
fan | Fan | |
gpu | GPU | |
logical_disk | Logical disk | |
memory | Memory | |
network | Network | |
physical_disk | Physical disk | |
power_supply | Power supply | |
tape_drive | Tape drive | |
temperature | Temperature | |
voltage | Voltage |