Introduction to Server Monitoring

Server monitoring includes basic monitoring, process monitoring, and OS monitoring for servers.

  • Basic monitoring covers metrics automatically reported by ECSs. The data is collected every 5 minutes. For details, see Services Interconnected with Cloud Eye.
  • OS monitoring provides proactive, fine-grained OS monitoring for ECSs and BMSs. It requires the Agent to be installed on every server to be monitored. The data is collected every minute. OS monitoring supports metrics such as CPU usage and memory usage (Linux). For details, see Services Interconnected with Cloud Eye.
  • Process monitoring tracks active processes on hosts. By default, Cloud Eye collects the CPU usage, memory usage, and number of open files of active processes.

Scenarios

Whether you are using ECSs or BMSs, you can use server monitoring to track various OS metrics, monitor server resource usage, and query monitoring data when faults occur.

Monitoring Capabilities

Server monitoring provides multiple metrics, such as metrics for CPU, memory, disk, and network usage, meeting the basic monitoring and O&M requirements for servers. For details about metrics, see Services Interconnected with Cloud Eye.

Resource Usage

The Agent uses very few system resources. Its usage is capped at no more than 10% of a single-core CPU and no more than 200 MB of memory. Once installed on a server, it typically uses less than 5% of a single-core CPU and less than 100 MB of memory.

In some scenarios, the CPU and memory usage of the Agent may increase sharply due to server operations. If the resource usage exceeds the threshold, circuit-breaking will be activated. The following table describes some common scenarios and typical solutions.
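The threshold check behind circuit-breaking can be pictured with a short sketch. It assumes the limits are the 10% single-core CPU and 200 MB memory figures quoted in this section and that exceeding either one counts as over the threshold. The names `AgentLimits`, `exceeds_limits`, and `should_break`, as well as the consecutive-sample rule, are hypothetical and are not taken from the Agent's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentLimits:
    """Hypothetical limits; defaults mirror the figures quoted in this section."""
    cpu_percent: float = 10.0  # share of a single CPU core
    memory_mb: float = 200.0


def exceeds_limits(cpu_percent, memory_mb, limits=AgentLimits()):
    """Return True if either sampled value is above its limit."""
    return cpu_percent > limits.cpu_percent or memory_mb > limits.memory_mb


def should_break(samples, limits=AgentLimits(), consecutive=3):
    """Trip the breaker after `consecutive` over-limit samples in a row.

    `samples` is an iterable of (cpu_percent, memory_mb) tuples. The
    consecutive-sample rule is an assumption made for this sketch.
    """
    streak = 0
    for cpu, mem in samples:
        # Reset the streak as soon as one sample is back within limits.
        streak = streak + 1 if exceeds_limits(cpu, mem, limits) else 0
        if streak >= consecutive:
            return True
    return False
```

In this sketch, passing a different `AgentLimits` value loosely corresponds to changing the Agent resource usage threshold in the configuration file.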

Table 1 High Agent resource usage scenarios

Cause

Scenario

Solution

Too many TCP connections

By default, the Agent collects only two basic TCP metrics, TCP TOTAL and TCP ESTABLISHED, which consumes few CPU resources. If you enable any detailed TCP metric in the configuration file, the Agent starts collecting all TCP metrics, which consumes a lot of CPU resources.

Basic TCP metrics: TCP TOTAL and TCP ESTABLISHED

Detailed TCP metrics: TCP SYS_SENT, TCP SYS_RECV, TCP FIN_WAIT1, TCP FIN_WAIT2, TCP TIME_WAIT, TCP CLOSE, TCP CLOSE_WAIT, TCP LAST_ACK, TCP LISTEN, and TCP CLOSING

Method 1: Modify the configuration file to disable detailed TCP metric collection, which reduces CPU usage. For details, see How Do I Enable or Disable Metric Collection by Modifying the Configuration File?.

Method 2: Adjust the Agent resource usage threshold by referring to How Do I Change the Agent Resource Consumption Threshold by Modifying the Configuration File?.

Too many file handles

While the Agent is running, it monitors all files opened by processes on the host to track and sum the number of file handles. If there are too many file handles, the Agent task will be re-executed, resulting in high CPU usage.

Method 1: Modify the configuration file to lower the frequency at which the Agent collects process metrics (that is, lengthen the update interval), which reduces CPU usage. For details, see How Do I Change the Process Collection Frequency by Modifying the Configuration File?.

Method 2: Change the Agent resource usage threshold. For details, see How Do I Change the Agent Resource Consumption Threshold by Modifying the Configuration File?.

Too many processes

When the Agent is running, it scans all processes on the current server and collects process-level metrics by reviewing process information. When there are too many processes, the Agent task is re-executed, leading to high CPU usage.