Introduction to Server Monitoring
Server monitoring comprises basic monitoring, OS monitoring, and process monitoring.
- Basic monitoring covers metrics automatically reported by ECSs. The data is collected every 5 minutes. For details, see Services Interconnected with Cloud Eye.
- OS monitoring provides proactive, fine-grained OS monitoring for ECSs and BMSs. It requires the Agent to be installed on every server to be monitored. The data is collected every minute. Supported metrics include CPU usage and memory usage (Linux). For details, see Services Interconnected with Cloud Eye.
- Process monitoring tracks active processes on hosts. By default, Cloud Eye collects the CPU usage, memory usage, and number of open files of active processes.
- Windows and Linux OSs are supported. For details, see What OSs Does the Agent Support?
- ECS specifications: use 2 vCPUs and 4 GB of memory for a Linux ECS, and 4 vCPUs and 8 GB of memory or higher for a Windows ECS.
- The Agent uses system ports. For details, see the descriptions of ClientPort and PortNum in (Optional) Manually Configuring the Agent (Linux). If the Agent port conflicts with a service port, see What Should I Do If the Service Port Is Used by the Agent?
- To install the Agent on a Linux server, you must have root permissions. To install it on a Windows server, you must have administrator permissions.
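To make the process-level data above more concrete, the following is a minimal sketch of how the number of open files per process could be counted on Linux by listing the `fd` directory of each process under `/proc`. It is an illustration only and not the Agent's implementation; the function names and the `proc_root` parameter are hypothetical. On Linux, reading the `fd` directory of a process owned by another user generally requires root, which is consistent with the root requirement listed above.

```python
import os


def count_open_files(pid, proc_root="/proc"):
    """Return the number of open file descriptors of one process.

    Each entry in <proc_root>/<pid>/fd represents one open descriptor.
    Returns None if the process has exited or access is denied.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        return len(os.listdir(fd_dir))
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        return None


def scan_processes(proc_root="/proc"):
    """Map every readable PID under proc_root to its open-file count."""
    result = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():  # only numeric entries are processes
            continue
        count = count_open_files(int(name), proc_root)
        if count is not None:
            result[int(name)] = count
    return result


if __name__ == "__main__":
    # Only meaningful on systems that expose a Linux-style /proc.
    if os.path.isdir("/proc"):
        print("open files in this process:", count_open_files(os.getpid()))
```

Because a scan like this visits every process and every descriptor, its cost grows with the number of processes and open files on the server.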
Scenarios
Whether you are using ECSs or BMSs, you can use server monitoring to track various OS metrics, monitor server resource usage, and query monitoring data when faults occur.
Monitoring Capabilities
Server monitoring provides multiple metrics, such as metrics for CPU, memory, disk, and network usage, meeting the basic monitoring and O&M requirements for servers. For details about metrics, see Services Interconnected with Cloud Eye.
Resource Usage
The Agent uses very few system resources: no more than 10% of a single CPU core and no more than 200 MB of memory. When installed on a server, it generally uses less than 5% of a single CPU core and less than 100 MB of memory.
In some scenarios, the CPU and memory usage of the Agent may increase sharply due to server operations. If the resource usage exceeds the threshold, circuit-breaking will be activated. The following table describes some common scenarios and typical solutions.
Cause | Scenario | Solution |
---|---|---|
Too many TCP connections | By default, the Agent collects only two basic TCP metrics, TCP TOTAL and TCP ESTABLISHED, which use few CPU resources. If you enable any detailed TCP metric in the configuration file, the Agent starts collecting all TCP metrics, which consumes a lot of CPU resources. Basic TCP metrics: TCP TOTAL and TCP ESTABLISHED. Detailed TCP metrics: TCP SYS_SENT, TCP SYS_RECV, TCP FIN_WAIT1, TCP FIN_WAIT2, TCP TIME_WAIT, TCP CLOSE, TCP CLOSE_WAIT, TCP LAST_ACK, TCP LISTEN, and TCP CLOSING. | Method 1: Modify the configuration file to disable detailed TCP metric collection and reduce CPU usage. For details, see How Do I Enable or Disable Metric Collection by Modifying the Configuration File? Method 2: Adjust the Agent resource usage threshold. For details, see How Do I Change the Agent Resource Consumption Threshold by Modifying the Configuration File? |
Too many file handles | While the Agent is running, it monitors all files opened by processes on the host to track and sum the number of file handles. If there are too many file handles, the Agent task is re-executed, resulting in high CPU usage. | Method 1: Modify the configuration file to reduce how often the Agent collects process metrics (that is, lengthen the metric update interval) and lower CPU usage. For details, see How Do I Change the Process Collection Frequency by Modifying the Configuration File? Method 2: Change the Agent resource usage threshold. For details, see How Do I Change the Agent Resource Consumption Threshold by Modifying the Configuration File? |
Too many processes | While the Agent is running, it scans all processes on the current server and collects process-level metrics by reviewing process information. If there are too many processes, the Agent task is re-executed, resulting in high CPU usage. | Same as the solutions for too many file handles. |
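The TCP row of the table can be illustrated with a short sketch. Assuming a Linux host, per-state connection counts can be derived from `/proc/net/tcp`, where the fourth column holds the connection state as a hexadecimal code. The sketch below parses text in that format; the function names and the sample data are hypothetical, the state names follow the Linux kernel (which may differ slightly from the metric names listed in the table), and this is not how the Agent is necessarily implemented.

```python
from collections import Counter

# Linux kernel TCP state codes as they appear (in hex) in /proc/net/tcp.
TCP_STATES = {
    0x01: "ESTABLISHED",
    0x02: "SYN_SENT",
    0x03: "SYN_RECV",
    0x04: "FIN_WAIT1",
    0x05: "FIN_WAIT2",
    0x06: "TIME_WAIT",
    0x07: "CLOSE",
    0x08: "CLOSE_WAIT",
    0x09: "LAST_ACK",
    0x0A: "LISTEN",
    0x0B: "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count connections per TCP state from /proc/net/tcp-style text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        counts[TCP_STATES.get(int(fields[3], 16), "UNKNOWN")] += 1
    return counts


def basic_tcp_metrics(counts):
    """Return the two basic aggregates: all connections and established ones."""
    return {"TOTAL": sum(counts.values()), "ESTABLISHED": counts["ESTABLISHED"]}


# Hypothetical sample, truncated to the columns the parser needs.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:0035 00000000:0000 0A 00000000:00000000
   1: 0100007F:1F90 0100007F:C350 01 00000000:00000000
   2: 0100007F:1F90 0100007F:C351 06 00000000:00000000
   3: 0100007F:1F90 0100007F:C352 01 00000000:00000000
"""

if __name__ == "__main__":
    counts = count_tcp_states(SAMPLE)
    print(basic_tcp_metrics(counts))
    print(dict(counts))
```

In this approach every connection row is read and classified, so the work grows with the number of TCP connections on the server.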