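Metrics and alerts
MinIO publishes metrics using the Prometheus Data Model. You can use any scraping tool to pull metrics data from MinIO for further analysis and alerting.
Starting with MinIO Server RELEASE.2024-07-15T19-02-30Z and MinIO Client RELEASE.2024-07-11T18-01-28Z, metrics version 3 provides additional endpoints. MinIO recommends version 3 for new deployments.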
Metrics version 2
Existing deployments can continue to use version 2 metrics and Grafana dashboards.
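Version 3 endpoints
For metrics version 3, all metrics are available under the base endpoint /minio/metrics/v3.
You can scrape the base endpoint to collect all metrics at once, or append an optional path to return only a specific category of metrics.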
Important
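The V3 metric descriptions on this page may be incomplete, inaccurate, or contain errors. For the most accurate metric definitions, refer to the minio/minio repository and review the source code.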
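For example, the following endpoint returns audit metrics:
http://HOSTNAME:PORT/minio/metrics/v3/audit
Replace HOSTNAME:PORT with the FQDN and port of the MinIO deployment.
For deployments that use a load balancer to manage connections between MinIO nodes, specify the address of the load balancer.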
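The URL pattern above can be sketched as a small helper. This is only an illustration: the function name `metrics_url` and the host `minio.example.net:9000` are made up for the example, and only the base path `/minio/metrics/v3` and the `/audit` suffix come from this page.

```python
from urllib.parse import urlunsplit

# Base endpoint for metrics version 3, as stated on this page.
BASE_PATH = "/minio/metrics/v3"


def metrics_url(host_port: str, category: str = "", scheme: str = "http") -> str:
    """Build a scrape URL; `category` is an optional path such as "/audit"."""
    suffix = category.strip("/")
    path = BASE_PATH if not suffix else f"{BASE_PATH}/{suffix}"
    return urlunsplit((scheme, host_port, path, "", ""))


# Hypothetical host; replace it with the deployment or load balancer address.
print(metrics_url("minio.example.net:9000"))
# → http://minio.example.net:9000/minio/metrics/v3
print(metrics_url("minio.example.net:9000", "/audit"))
# → http://minio.example.net:9000/minio/metrics/v3/audit
```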
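By default, MinIO requires authentication to scrape the metrics endpoints.
To generate the required bearer token, use mc admin prometheus generate.
You can also disable authentication for the metrics endpoints by setting MINIO_PROMETHEUS_AUTH_TYPE to public.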
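As a rough sketch of an authenticated scrape, the snippet below sends the token in a standard `Authorization: Bearer` header using only the Python standard library. The helper names, the timeout, and the example host are assumptions for illustration; the token itself would come from the `mc admin prometheus generate` output.

```python
import urllib.request
from typing import Optional


def build_scrape_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a GET request, attaching the bearer token when one is given."""
    request = urllib.request.Request(url, method="GET")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def scrape(url: str, token: Optional[str] = None, timeout: float = 10.0) -> str:
    """Fetch the metrics text from the endpoint (performs a network call)."""
    with urllib.request.urlopen(build_scrape_request(url, token), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (hypothetical host and token, not executed here):
# text = scrape("http://minio.example.net:9000/minio/metrics/v3/audit", token="TOKEN")
```

When MINIO_PROMETHEUS_AUTH_TYPE is set to public, the same `scrape` call can be made without a token.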
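MinIO provides the following scraping endpoints, relative to the base URL:
| Category | Path |
|---|---|
| API | |
| Audit | |
| Cluster | |
| Debug | |
| ILM | |
| Logger webhook | |
| Notification | |
| Replication | |
| Scanner | |
| System | |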
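For the complete list of metrics available at each endpoint, see Available version 3 metrics.
To enable historical data visualization in the MinIO Console, set the following environment variables on each node in the MinIO deployment:
- Set MINIO_PROMETHEUS_URL to the URL of the Prometheus service.
- Set MINIO_PROMETHEUS_JOB_ID to the unique job ID assigned to the collected metrics.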
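A pre-flight check along the following lines can catch a node where one of the two variables is missing. It is a minimal sketch: the function name and the example values are hypothetical, and only the two variable names are taken from this page.

```python
import os
from typing import List, Mapping, Optional

# The two environment variables named on this page.
REQUIRED_VARS = ("MINIO_PROMETHEUS_URL", "MINIO_PROMETHEUS_JOB_ID")


def missing_console_metrics_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Hypothetical values, for illustration only.
example_env = {
    "MINIO_PROMETHEUS_URL": "http://prometheus.example.net:9090",
    "MINIO_PROMETHEUS_JOB_ID": "minio-job",
}
print(missing_console_metrics_vars(example_env))
```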
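Available version 3 metrics
MinIO publishes several categories of metrics covering the cluster, API requests, buckets, and other aspects of the MinIO service.
Many metrics include labels that identify the resource that generated the metric and other relevant details.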
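To illustrate how labels appear in scraped output, here is a simplified parser for a single line of the Prometheus text exposition format. The metric name and label values in the sample are invented for the example and are not real MinIO metric names. The parser also skips details such as unescaping label values and optional timestamps.

```python
import re
from typing import Dict, Optional, Tuple

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> Optional[Tuple[str, Dict[str, str], float]]:
    """Split one sample line into (name, labels, value).

    Returns None for blank lines and for "# HELP" / "# TYPE" comment lines.
    """
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


# Made-up sample line; the name and labels are placeholders.
print(parse_sample('example_requests_total{bucket="photos",server="node1:9000"} 42'))
```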
API metrics
Metrics about requests served by the current node.
| Path | Description |
|---|---|
| | Metrics over all requests. |
| | Metrics over all requests for a given bucket. |
/api/requests
| Name | Description | Labels |
|---|---|---|
| | Total number of requests rejected for auth failure. | |
| | Total number of requests rejected for invalid header. | |
| | Total number of requests rejected for invalid timestamp. | |
| | Total number of invalid requests. | |
| | Total number of requests in the waiting queue. | |
| | Total number of incoming requests. | |
| | Total number of requests currently in flight. | |
| | Total number of requests. | |
| | Total number of requests with 4xx or 5xx errors. | |
| | Total number of requests with 5xx errors. | |
| | Total number of requests with 4xx errors. | |
| | Total number of requests canceled by the client. | |
| | Distribution of time to first byte across API calls. | |
| | Total number of bytes sent. | |
| | Total number of bytes received. | |
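Several of the values above are running totals, so dashboards and alerts usually work with their rate of change rather than the raw number. The sketch below assumes those totals behave like cumulative counters that restart from zero when the server restarts; that behavior, the function name, and the numbers are assumptions for illustration.

```python
def counter_rate(previous: float, current: float, seconds: float) -> float:
    """Per-second rate between two scrapes of a cumulative counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # A lower current value is treated as a restart: the counter began again
    # at zero, so the whole current value counts as the increase.
    increase = current - previous if current >= previous else current
    return increase / seconds


# Two hypothetical scrapes taken 60 seconds apart.
print(counter_rate(previous=100.0, current=160.0, seconds=60.0))  # → 1.0
```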
/bucket/api
| Name | Description | Labels |
|---|---|---|
| | Total number of bytes sent for a bucket. | |
| | Total number of bytes received for a bucket. | |
| | Total number of requests currently in flight for a bucket. | |
| | Total number of requests for a bucket. | |
| | Total number of requests canceled by the client for a bucket. | |
| | Total number of requests with 4xx errors for a bucket. | |
| | Total number of requests with 5xx errors for a bucket. | |
| | Distribution of time to first byte across API calls for a bucket. | |
Audit metrics
Metrics about the MinIO audit functionality.
| Path | Description |
|---|---|
| | Metrics related to audit functionality. |
/audit
| Name | Description | Labels |
|---|---|---|
| | Total number of messages that failed to send since start. | |
| | Number of unsent messages in queue for target. | |
| | Total number of messages sent since start. | |
Cluster metrics
Metrics about an entire MinIO cluster.
| Path | Description |
|---|---|
| | Cluster configuration metrics. |
| | Erasure set metrics. |
| | Cluster health metrics. |
| | Cluster IAM metrics. |
| | Object statistics by bucket. |
| | Object statistics. |
/cluster/config
| Name | Description | Labels |
|---|---|---|
| | Reduced redundancy storage class parity. | |
| | Standard storage class parity. | |
/cluster/erasure-set
| Name | Description | Labels |
|---|---|---|
| | Overall write quorum across pools and sets. | |
| | Overall health across pools and sets (1=healthy, 0=unhealthy). | |
| | Read quorum for the erasure set in a pool. | |
| | Write quorum for the erasure set in a pool. | |
| | Count of online drives in the erasure set in a pool. | |
| | Count of healing drives in the erasure set in a pool. | |
| | Health of the erasure set in a pool (1=healthy, 0=unhealthy). | |
| | Number of drive failures that can be tolerated without disrupting read operations. | |
| | Number of drive failures that can be tolerated without disrupting write operations. | |
| | Health of the erasure set in a pool for read operations (1=healthy, 0=unhealthy). | |
| | Health of the erasure set in a pool for write operations (1=healthy, 0=unhealthy). | |
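The quorum, online drive, and tolerance values above are related. One plausible reading, which is an assumption here and should be checked against the MinIO source, is that the tolerance equals the online drive count minus the corresponding quorum, and that a negative tolerance means the set is unhealthy for that operation. The class name and the sample numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureSetSample:
    """Values scraped for one erasure set (field names are illustrative)."""
    online_drives: int
    read_quorum: int
    write_quorum: int

    @property
    def read_tolerance(self) -> int:
        # Assumed relation: drives that can still fail before reads break.
        return self.online_drives - self.read_quorum

    @property
    def write_tolerance(self) -> int:
        # Assumed relation: drives that can still fail before writes break.
        return self.online_drives - self.write_quorum

    @property
    def read_healthy(self) -> bool:
        return self.read_tolerance >= 0

    @property
    def write_healthy(self) -> bool:
        return self.write_tolerance >= 0


# Made-up numbers for a single erasure set.
sample = ErasureSetSample(online_drives=14, read_quorum=12, write_quorum=13)
print(sample.read_tolerance, sample.write_tolerance)  # → 2 1
```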
/cluster/health
| Name | Description | Labels |
|---|---|---|
| | Count of offline drives in the cluster. | |
| | Count of online drives in the cluster. | |
| | Count of all drives in the cluster. | |
| | Count of offline nodes in the cluster. | |
| | Count of online nodes in the cluster. | |
| | Total cluster raw storage capacity in bytes. | |
| | Total cluster raw storage free in bytes. | |
| | Total cluster usable storage capacity in bytes. | |
| | Total cluster usable storage free in bytes. | |
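A common use of the capacity and free-space values is a utilization alert. The sketch below derives the used fraction from a capacity value and a free value. The function names, the 80% threshold, and the byte counts are arbitrary choices for the example, not MinIO recommendations.

```python
def used_fraction(capacity_bytes: float, free_bytes: float) -> float:
    """Fraction of capacity in use, between 0.0 and 1.0."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return (capacity_bytes - free_bytes) / capacity_bytes


def should_alert(capacity_bytes: float, free_bytes: float, threshold: float = 0.8) -> bool:
    """True when utilization reaches the (hypothetical) alert threshold."""
    return used_fraction(capacity_bytes, free_bytes) >= threshold


# Made-up usable capacity of 1000 bytes with 250 bytes free.
print(used_fraction(1000, 250))  # → 0.75
print(should_alert(1000, 250))   # → False
```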
/cluster/iam
| Name | Description | Labels |
|---|---|---|
| | Last successful IAM data sync duration in milliseconds. | |
| | When plugin authentication is configured, returns failed requests count in the last full minute. | |
| | When plugin authentication is configured, returns time (in seconds) since the last failed request to the service. | |
| | When plugin authentication is configured, returns time (in seconds) since the last successful request to the service. | |
| | When plugin authentication is configured, returns average round-trip time of successful requests in the last full minute. | |
| | When plugin authentication is configured, returns maximum round-trip time of successful requests in the last full minute. | |
| | When plugin authentication is configured, returns total requests count in the last full minute. | |
| | Time (in milliseconds) since last successful IAM data sync. | |
| | Number of failed IAM data syncs since server start. | |
| | Number of successful IAM data syncs since server start. | |
/cluster/usage/buckets
| Name | Description | Labels |
|---|---|---|
| | Time since last update of usage metrics in seconds. | |
| | Total bucket size in bytes. | |
| | Total object count in bucket. | |
| | Total object versions count in bucket, including delete markers. | |
| | Total delete markers count in bucket. | |
| | Total bucket quota in bytes. | |
| | Bucket object size distribution. | |
| | Bucket object version count distribution. | |
/cluster/usage/objects
| Name | Description | Labels |
|---|---|---|
| | Time since last update of usage metrics in seconds. | |
| | Total cluster usage in bytes. | |
| | Total cluster objects count. | |
| | Total cluster object versions count, including delete markers. | |
| | Total cluster delete markers count. | |
| | Total cluster buckets count. | |
| | Cluster object size distribution. | |
| | Cluster object version count distribution. | |
Debug metrics
Standard Go runtime metrics from the Prometheus Go Client base collector.
| Path | Description |
|---|---|
| | Go runtime metrics. |
ILM metrics
Metrics about the MinIO ILM functionality.
| Path | Description |
|---|---|
| | Metrics related to ILM functionality. |
/ilm
| Name | Description | Labels |
|---|---|---|
| | Number of pending ILM expiry tasks in the queue. | |
| | Number of active ILM transition tasks. | |
| | Number of pending ILM transition tasks in the queue. | |
| | Number of missed immediate ILM transition tasks. | |
| | Total number of object versions checked for ILM actions since server start. | |
Logger webhook metrics
Metrics about MinIO logger webhooks.
| Path | Description |
|---|---|
| | Metrics related to logger webhooks. |
/logger/webhook
| Name | Description | Labels |
|---|---|---|
| | Number of messages that failed to send. | |
| | Webhook queue length. | |
| | Total number of messages sent to this target. | |
Notification metrics
Metrics about the MinIO notification functionality.
| Path | Description |
|---|---|
| | Metrics related to notification functionality. |
/notification
| Name | Description | Labels |
|---|---|---|
| | Number of concurrent async Send calls active to all targets. | |
| | Total number of events that failed to send to the targets. | |
| | Total number of events sent to the targets. | |
| | Number of events not sent to the targets due to the in-memory queue being full. | |
Replication metrics
Metrics about MinIO site and bucket replication.
| Path | Description |
|---|---|
| | Metrics related to bucket replication. |
| | Metrics related to site replication. |
/replication
| Name | Description | Labels |
|---|---|---|
| | Average number of active replication workers. | |
| | Average number of bytes queued for replication since server start. | |
| | Average number of objects queued for replication since server start. | |
| | Average replication data transfer rate in bytes/sec. | |
| | Total number of active replication workers. | |
| | Current replication data transfer rate in bytes/sec. | |
| | Number of bytes queued for replication in the last full minute. | |
| | Number of objects queued for replication in the last full minute. | |
| | Maximum number of active replication workers seen since server start. | |
| | Maximum number of bytes queued for replication since server start. | |
| | Maximum number of objects queued for replication since server start. | |
| | Maximum replication data transfer rate in bytes/sec since server start. | |
| | Total number of objects seen in replication backlog in the last 5 minutes. | |
/bucket/replication
| Name | Description | Labels |
|---|---|---|
| | Total number of bytes on a bucket which failed to replicate at least once in the last hour. | |
| | Total number of objects on a bucket which failed to replicate in the last hour. | |
| | Total number of bytes on a bucket which failed to replicate at least once in the last full minute. | |
| | Total number of objects on a bucket which failed to replicate in the last full minute. | |
| | Replication latency on a bucket in milliseconds. | |
| | Number of DELETE tagging requests proxied to replication target. | |
| | Number of failures in GET requests proxied to replication target. | |
| | Number of GET requests proxied to replication target. | |
| | Number of failures in GET tagging requests proxied to replication target. | |
| | Number of GET tagging requests proxied to replication target. | |
| | Number of failures in HEAD requests proxied to replication target. | |
| | Number of HEAD requests proxied to replication target. | |
| | Number of failures in PUT tagging requests proxied to replication target. | |
| | Number of PUT tagging requests proxied to replication target. | |
| | Total number of bytes replicated to the target. | |
| | Total number of objects replicated to the target. | |
| | Total number of bytes that failed to replicate at least once since server start. | |
| | Total number of objects that failed to replicate since server start. | |
| | Number of failures in DELETE tagging requests proxied to replication target. | |
Scanner metrics
Metrics about the MinIO scanner.
| Path | Description |
|---|---|
| | Metrics related to the MinIO scanner. |
/scanner
| Name | Description | Labels |
|---|---|---|
| | Total number of bucket scans completed since server start. | |
| | Total number of bucket scans started since server start. | |
| | Total number of directories scanned since server start. | |
| | Time elapsed (in seconds) since last scan activity. | |
| | Total number of unique objects scanned since server start. | |
| | Total number of object versions scanned since server start. | |
System metrics
Metrics about the MinIO process and the node.
| Path | Description |
|---|---|
| | Metrics about CPUs on the system. |
| | Metrics about drives on the system. |
| | Metrics about internode requests made by the node. |
| | Metrics about memory on the system. |
| | Standard process metrics. |
/system/drive
| Name | Description | Labels |
|---|---|---|
| | Total storage used on a drive in bytes. | |
| | Total storage free on a drive in bytes. | |
| | Total storage available on a drive in bytes. | |
| | Total used inodes on a drive. | |
| | Total free inodes on a drive. | |
| | Total inodes available on a drive. | |
| | Total timeout errors on a drive. | |
| | Total I/O errors on a drive. | |
| | Total availability errors (I/O errors, timeouts) on a drive. | |
| | Total waiting I/O operations on a drive. | |
| | Average last minute latency in µs for drive API storage operations. | |
| | Count of offline drives. | |
| | Count of online drives. | |
| | Count of all drives. | |
| | Drive health (0 = offline, 1 = healthy, 2 = healing). | |
| | Reads per second on a drive. | |
| | Kilobytes read per second on a drive. | |
| | Average time for read requests served on a drive. | |
| | Writes per second on a drive. | |
| | Kilobytes written per second on a drive. | |
| | Average time for write requests served on a drive. | |
| | Percentage of time the disk was busy. | |
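The drive health value in the table above is a numeric code. A small enum, sketched below, makes the three documented states readable in alerting or reporting code. The class and function names are illustrative; only the code-to-state mapping comes from the table.

```python
from enum import IntEnum


class DriveHealth(IntEnum):
    """States documented for the drive health metric."""
    OFFLINE = 0
    HEALTHY = 1
    HEALING = 2


def decode_drive_health(value: float) -> DriveHealth:
    """Convert a scraped sample value into a DriveHealth member.

    Raises ValueError for codes other than 0, 1, and 2.
    """
    return DriveHealth(int(value))


print(decode_drive_health(2.0).name)  # → HEALING
```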
/system/memory
| Name | Description | Labels |
|---|---|---|
| | Used memory on the node. | |
| | Used memory percentage on the node. | |
| | Free memory on the node. | |
| | Total memory on the node. | |
| | Buffers memory on the node. | |
| | Cache memory on the node. | |
| | Shared memory on the node. | |
| | Available memory on the node. | |
/system/cpu
| Name | Description | Labels |
|---|---|---|
| | Average CPU idle time. | |
| | Average CPU IOWait time. | |
| | CPU load average 1min. | |
| | CPU load average 1min (percentage). | |
| | CPU nice time. | |
| | CPU steal time. | |
| | CPU system time. | |
| | CPU user time. | |
/system/network/internode
| Name | Description | Labels |
|---|---|---|
| | Total number of failed internode calls. | |
| | Total number of internode TCP dial timeouts and errors. | |
| | Average dial time of internode TCP calls in nanoseconds. | |
| | Total number of bytes sent to other peer nodes. | |
| | Total number of bytes received from other peer nodes. | |
/system/process
| Name | Description | Labels |
|---|---|---|
| | Number of current READ locks on this peer. | |
| | Number of current WRITE locks on this peer. | |
| | Total user and system CPU time spent in seconds. | |
| | Total number of goroutines running. | |
| | Total bytes read by the process from the underlying storage system including cache, /proc/[pid]/io rchar. | |
| | Total bytes read by the process from the underlying storage system, /proc/[pid]/io read_bytes. | |
| | Total bytes written by the process to the underlying storage system including page cache, /proc/[pid]/io wchar. | |
| | Total bytes written by the process to the underlying storage system, /proc/[pid]/io write_bytes. | |
| | Start time for MinIO process in seconds since Unix epoch. | |
| | Uptime for MinIO process in seconds. | |
| | Limit on total number of open file descriptors for the MinIO Server process. | |
| | Total number of open file descriptors by the MinIO Server process. | |
| | Total read SysCalls to the kernel. /proc/[pid]/io syscr. | |
| | Total write SysCalls to the kernel. /proc/[pid]/io syscw. | |
| | Resident memory size in bytes. | |
| | Virtual memory size in bytes. | |
| | Maximum virtual memory size in bytes. | |
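Two simple derived values can be computed from the process metrics above: how close the server is to its file descriptor limit, and how long it has been running based on its start time. The sketch below uses hypothetical function names and made-up sample values.

```python
import time
from typing import Optional


def fd_usage(open_fds: float, fd_limit: float) -> float:
    """Fraction of the file descriptor limit currently in use."""
    if fd_limit <= 0:
        raise ValueError("fd_limit must be positive")
    return open_fds / fd_limit


def uptime_from_start(start_time_epoch: float, now: Optional[float] = None) -> float:
    """Seconds elapsed since the process start time (Unix epoch seconds)."""
    now = time.time() if now is None else now
    return now - start_time_epoch


# Made-up sample values.
print(fd_usage(512, 2048))                     # → 0.25
print(uptime_from_start(1000.0, now=1600.0))   # → 600.0
```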