HBase Cluster Monitoring Metrics
Description
Monitoring is critical to ensuring the reliability, availability, and performance of CloudTable. You can monitor the running status of CloudTable servers.
This section describes the metrics that can be monitored by Cloud Eye as well as their namespaces and dimensions. You can use the management console or APIs provided by Cloud Eye to query the metrics of the monitored objects and alarms generated for CloudTable. For details, see the user guide and API reference of Cloud Eye.
Namespace
SYS.CloudTable
CloudTable HBase HMaster Instance Monitoring Metrics
Table 1 lists the monitoring metrics supported by CloudTable HBase HMaster instances.
Metric ID | Name | Meaning | Value Range | Unit | Conversion Rule | Monitored Object (Dimension) | Monitoring Interval (Raw Data) |
---|---|---|---|---|---|---|---|
disk_throughput_read_rate | Disks Read Rate | Volume of data read from the monitored object per second | >= 0 | Byte/s | 1024(IEC) | CloudTable instance node | 1 min |
disk_throughput_write_rate | Disks Write Rate | Volume of data written to the monitored object per second | >= 0 | Byte/s | 1024(IEC) | CloudTable instance node | 1 min |
cmdForTotalMemory | Total Memory | Total memory size of the monitored object | > 0 | Byte | 1024(IEC) | CloudTable instance node | 1 min |
cmdProcessCPU | CPU Usage | CPU usage of the monitored object | 0~100 | % | N/A | CloudTable instance node | 1 min |
cmdProcessMem | Memory Usage | Memory usage of the monitored object | 0~100 | % | N/A | CloudTable instance node | 1 min |
hm_deadregionservernum | Faulty RegionServers | Number of faulty RegionServers in the cluster | ≥ 0 | Count | N/A | CloudTable instance node | 1 min |
hm_regionservernum | Normal RegionServers | Number of normal RegionServers in the cluster | ≥ 0 | Count | N/A | CloudTable instance node | 1 min |
hm_ritCount | RIT Count | Number of regions in transition (RIT) in the cluster where the monitored object is located | ≥ 0 | Count | N/A | CloudTable instance node | 1 min |
hm_ritCountOverThreshold | RIT Count Over Threshold | Number of regions that have remained in transition (RIT) longer than the threshold in the cluster where the monitored object is located | ≥ 0 | Count | N/A | CloudTable instance node | 1 min |
rs_queuecalltime_max | RPC Queue Call Time (Max) | Maximum time an RPC call waits in the RPC queue before it is processed | >= 0 | ms | N/A | CloudTable instance node | 1 min |
rs_queuecalltime_mean | RPC Queue Call Time (Mean) | Mean time an RPC call waits in the RPC queue before it is processed | >= 0 | ms | N/A | CloudTable instance node | 1 min |
nn_percentallused | Disk Utilization Rate | Disk space usage of the cluster | 0~100 | % | N/A | CloudTable instance node | 1 min |
nn_capacityremaining | Disk capacity remaining of cluster | Remaining disk space of the cluster | Depends on the cluster disk capacity. | GB | N/A | CloudTable instance node | 1 min |
nn_capacityused | Disk capacity used of cluster | Disk space used in the cluster | Depends on the cluster disk capacity. | GB | N/A | CloudTable instance node | 1 min |
cmdForUsedStorageRate | Ratio of Used Storage Space | Ratio of the used storage space to the total storage space in the cluster | 0~100 | % | N/A | CloudTable instance node | 1 min |
network_throughput_inbound_rate | Inbound Throughput | Volume of data received over the network by each node per second | >= 0 | KB/s | N/A | CloudTable instance node | 1 min |
network_throughput_outgoing_rate | Outbound Throughput | Volume of data sent over the network by each node per second | >= 0 | KB/s | N/A | CloudTable instance node | 1 min |
disk_throughput_read_rate | Disk Read Throughput | Disk read throughput | >= 0 | Byte/s | 1024(IEC) | CloudTable instance node | 1 min |
disk_throughput_write_rate | Disk Write Throughput | Disk write throughput | >= 0 | Byte/s | 1024(IEC) | CloudTable instance node | 1 min |
HMaster instances include hmaster-active (active) and hmaster-standby (standby). If hmaster-active becomes faulty, hmaster-standby becomes active and takes over service.
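As a minimal sketch of how the HMaster metrics above might feed a simple health check, the Python function below takes the latest value of each metric, keyed by metric ID (however you retrieved it), and reports faulty RegionServers, regions stuck in transition past the threshold, and high cluster disk usage. The function name and the 80% disk-usage threshold are hypothetical choices for illustration, not CloudTable or Cloud Eye defaults; Cloud Eye alarm rules can serve the same purpose without custom code.

```python
from typing import Dict, List

# Hypothetical alert threshold for illustration; not a CloudTable or Cloud Eye default.
DISK_USAGE_WARN_PERCENT = 80.0


def check_hmaster_metrics(latest: Dict[str, float]) -> List[str]:
    """Return human-readable warnings for one sample of HMaster metrics.

    `latest` maps metric IDs (for example, "hm_deadregionservernum") to the
    most recent value. Metrics missing from the sample are skipped.
    """
    warnings: List[str] = []

    # Any faulty RegionServer is worth reporting.
    dead = latest.get("hm_deadregionservernum")
    if dead is not None and dead > 0:
        warnings.append(f"{int(dead)} faulty RegionServer(s)")

    # Regions that stay in transition longer than the threshold may be stuck.
    stuck = latest.get("hm_ritCountOverThreshold")
    if stuck is not None and stuck > 0:
        warnings.append(f"{int(stuck)} region(s) in RIT longer than the threshold")

    # Cluster-wide disk usage, reported as a percentage (0~100).
    disk = latest.get("nn_percentallused")
    if disk is not None and disk >= DISK_USAGE_WARN_PERCENT:
        warnings.append(f"cluster disk usage at {disk:.1f}%")

    return warnings


if __name__ == "__main__":
    # Hypothetical sample values.
    sample = {
        "hm_regionservernum": 3,
        "hm_deadregionservernum": 1,
        "hm_ritCount": 2,
        "hm_ritCountOverThreshold": 0,
        "nn_percentallused": 85.5,
    }
    for warning in check_hmaster_metrics(sample):
        print("WARN:", warning)
```

In this sample, the function reports one faulty RegionServer and 85.5% disk usage, while the two regions in transition are not flagged because none has exceeded the threshold.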
CloudTable HBase RegionServer Instance Monitoring Metrics
Table 2 lists the monitoring metrics supported by CloudTable HBase RegionServer instances.
Metric ID | Metric | Meaning | Value Range | Unit | Conversion Rule | Monitored Object (Dimension) | Monitoring Period (Raw Data) |
---|---|---|---|---|---|---|---|
cmdProcessCPU | CPU Usage | CPU usage of the monitored object | 0~100 | % | N/A | CloudTable instance node | 1 minute |
cmdForTotalMemory | Total Memory | Total memory size of the monitored object | > 0 | Byte | 1024(IEC) | CloudTable instance node | 1 minute |
cmdProcessMem | Memory Usage | Memory usage of the monitored object | 0~100 | % | N/A | CloudTable instance node | 1 minute |
disk_throughput_write_rate | Disks Write Rate | Volume of data written to the monitored object per second | >= 0 | Byte/s | 1024(IEC) | CloudTable instance node | 1 minute |
disk_throughput_read_rate | Disks Read Rate | Volume of data read from the monitored object per second | >= 0 | Byte/s | 1024(IEC) | CloudTable instance node | 1 minute |
hm_regionservernum | Normal RegionServers | Number of normal RegionServers | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
hm_deadregionservernum | Faulty RegionServers | Number of faulty RegionServers | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
hm_ritCountOverThreshold | RIT Count Over Threshold | Number of regions that have remained in transition (RIT) longer than the threshold | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
hm_ritCount | RIT Count | Number of regions in transition (RIT) | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_requests | Requests Per Second | Number of requests of a RegionServer per second | >= 0 | requests/s | N/A | CloudTable instance node | 1 minute |
rs_regions | Regions | Number of regions of a RegionServer | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_writerequestscount | Write Requests | Number of write requests of a RegionServer | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_readrequestscount | Read Requests | Number of read requests of a RegionServer | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_blockcachehitcachingratio | Hit Cache Block Caching Ratio | Block cache hit ratio for requests that have block caching enabled | 0~100 | % | N/A | CloudTable instance node | 1 minute |
rs_blockCacheCountHitPercent | Hit Cache Block Ratio | Percentage of block cache lookups that hit the cache | 0~100 | % | N/A | CloudTable instance node | 1 minute |
rs_getavgtime | Get Delay (Avg) | Average Get operation delay of the RegionServer per unit time | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_putavgtime | Put Delay (Avg) | Average Put operation delay of the RegionServer per unit time | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_deleteavgtime | Delete Delay (Avg) | Average Delete operation delay of the RegionServer per unit time | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_getnumops | Get Operations | Number of Get operations of the RegionServer per unit time | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_putnumops | Put Operations | Number of Put operations of the RegionServer per unit time | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_deletenumops | Delete Operations | Number of Delete operations of the RegionServer per unit time | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_queuecalltime_max | RPC Queue Call Time (Max) | Maximum time an RPC call waits in the RPC queue before it is processed | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_queuecalltime_mean | RPC Queue Call Time (Mean) | Mean time an RPC call waits in the RPC queue before it is processed | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_flushtime_mean | Flush Time (Mean) | Mean time taken to flush a MemStore | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_compactionqueuesize | Compaction Queue Size | Current length of the compaction queue, that is, the number of Stores in the RegionServer waiting to be compacted | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_flushqueuesize | Flush Queue Size | Current length of the MemStore flush queue | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_compactionscompletedcount | Compaction Count | Number of completed compactions | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_flushtimeops_num | Flush Operation Count | Number of flush operations | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_blockcacheevictedcount | Discarded Cache Blocks | Number of blocks evicted from the block cache | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_syncTime_max | Sync WAL Time (Max) | Maximum time taken to sync the WAL | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_syncTime_mean | Sync WAL Time (Mean) | Mean time taken to sync the WAL | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
dn_byteswritten_speed | Bytes written per second | Number of bytes written by the node per second | >= 0 | Byte/s | 1024(IEC) | CloudTable instance node | 1 minute |
dn_bytesread_speed | Bytes read per second | Number of bytes read by the node per second | >= 0 | Byte/s | 1024(IEC) | CloudTable instance node | 1 minute |
rs_numActiveHandler | Number of RegionServer Active Handlers | Number of active RegionServer handlers (total number of handlers for processing user table requests, meta table requests, and replication requests) | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_numActiveGeneralHandler | Number of RegionServer Active Handlers for Processing User Table Requests | Number of active RegionServer handlers for processing user table requests | ≥ 0 | Count | N/A | CloudTable instance node | 1 minute |
rs_scanTime_p999 | 99.9th Percentile of the Scan Operation Delay | 99.9th percentile of the RegionServer Scan operation delay | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_syncTime_p999 | 99.9th Percentile of the WAL Sync Operation Delay | 99.9th percentile of the RegionServer WAL Sync operation delay | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_Get_99th_percentile | 99th Percentile of the Get Operation Delay | 99th percentile of the RegionServer Get operation delay | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_Put_99th_percentile | 99th Percentile of the Put Operation Delay | 99th percentile of the RegionServer Put operation delay | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_Delete_99th_percentile | 99th Percentile of the Delete Operation Delay | 99th percentile of the RegionServer Delete operation delay | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_Get_999th_percentile | 99.9th Percentile of the Get Operation Delay | 99.9th percentile of the RegionServer Get operation delay | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_Put_999th_percentile | 99.9th Percentile of the Put Operation Delay | 99.9th percentile of the RegionServer Put operation delay | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
rs_Delete_999th_percentile | 99.9th Percentile of the Delete Operation Delay | 99.9th percentile of the RegionServer Delete operation delay | >= 0 | ms | N/A | CloudTable instance node | 1 minute |
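Metrics such as disk_throughput_read_rate and disk_throughput_write_rate report raw Byte/s values and list 1024(IEC) as their conversion rule, which indicates binary (IEC) prefixes: each step up (KiB, MiB, and so on) is a factor of 1024. If you post-process these values for your own reports, a small helper might look like the following sketch. The function name and the TiB upper bound are illustrative choices, not part of Cloud Eye.

```python
from typing import Tuple

# Binary (IEC) units, each 1024 times the previous one.
_IEC_UNITS = ("B", "KiB", "MiB", "GiB", "TiB")


def to_iec(value_bytes: float) -> Tuple[float, str]:
    """Scale a byte (or Byte/s) value down by powers of 1024 until it is
    below 1024 or the largest listed unit (TiB) is reached."""
    if value_bytes < 0:
        raise ValueError("byte-based metric values are expected to be >= 0")
    value = float(value_bytes)
    unit_index = 0
    while value >= 1024 and unit_index < len(_IEC_UNITS) - 1:
        value /= 1024
        unit_index += 1
    return value, _IEC_UNITS[unit_index]


if __name__ == "__main__":
    # Hypothetical disk_throughput_write_rate sample of 5 MiB/s, reported in Byte/s.
    value, unit = to_iec(5 * 1024 * 1024)
    print(f"{value:.2f} {unit}/s")  # 5.00 MiB/s
```

Because the rule is base 1024, a value of 1,000,000 Byte/s scales to about 976.56 KiB/s rather than 1 MB/s.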
Dimension
Key | Value |
---|---|
cluster_id | CloudTable cluster ID. |
instance_name | Name of a CloudTable cluster node. |
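To query one of these metrics through the Cloud Eye API, a request combines the SYS.CloudTable namespace, a metric ID from the tables above, and one or more dimensions (cluster_id, and instance_name for node-level data). The sketch below only builds the request path and query string. It assumes the Cloud Eye V1.0 metric-data query shape, with from and to as Unix timestamps in milliseconds, period in seconds, and filter as the aggregation method; verify these parameters against the Cloud Eye API reference before relying on them. The function name, the default 300-second period, and the average filter are illustrative choices, and sending the request (regional endpoint, authentication) is outside this sketch.

```python
from typing import Dict
from urllib.parse import urlencode

# Namespace for CloudTable metrics in Cloud Eye.
NAMESPACE = "SYS.CloudTable"


def build_metric_data_path(
    project_id: str,
    metric_name: str,
    dimensions: Dict[str, str],
    start_ms: int,
    end_ms: int,
    period_s: int = 300,
    aggregation: str = "average",
) -> str:
    """Return the path and query string for a Cloud Eye metric-data query.

    Assumed request shape (verify against the Cloud Eye API reference):
    GET /V1.0/{project_id}/metric-data?namespace=...&metric_name=...
        &dim.0=key,value&from=...&to=...&period=...&filter=...
    """
    if start_ms >= end_ms:
        raise ValueError("start_ms must be earlier than end_ms")
    if not dimensions:
        raise ValueError("at least one dimension (e.g. cluster_id) is required")

    params = [("namespace", NAMESPACE), ("metric_name", metric_name)]
    # Each dimension becomes dim.N=key,value, numbered in insertion order.
    for i, (key, value) in enumerate(dimensions.items()):
        params.append((f"dim.{i}", f"{key},{value}"))
    params += [
        ("from", str(start_ms)),    # start of the window, Unix time in ms
        ("to", str(end_ms)),        # end of the window, Unix time in ms
        ("period", str(period_s)),  # aggregation period in seconds
        ("filter", aggregation),    # aggregation method, e.g. average or max
    ]
    # safe="," keeps the key,value separator in dim.N unencoded.
    return f"/V1.0/{project_id}/metric-data?" + urlencode(params, safe=",")


if __name__ == "__main__":
    # Hypothetical project and cluster IDs.
    print(build_metric_data_path(
        "example-project-id",
        "hm_deadregionservernum",
        {"cluster_id": "example-cluster-id"},
        start_ms=1700000000000,
        end_ms=1700003600000,
    ))
```

Keeping path construction separate from the HTTP call makes it easy to check the generated query against the API reference before wiring in an HTTP client.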