CPU Usage of DataNodes Is Close to 100% Occasionally, Causing Node Loss
Symptom
The CPU usage of DataNodes occasionally approaches 100%. When this happens, the affected nodes may be lost (SSH connections to them are slow or fail).
Figure 1 DataNode CPU usage close to 100%

Cause Analysis
- The DataNode logs contain a large number of write failure entries.
Figure 2 DataNode write failure log
- A large number of files are written in a short time, which exhausts the memory available to the DataNode.
Figure 3 Insufficient DataNode memory
Solution
- Check the DataNode memory configuration and confirm that the server has enough free memory to support an increase.
- Increase the memory allocated to the DataNode, then restart the DataNode for the change to take effect.
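The first check can be roughly scripted. The following is a minimal sketch, not part of the product tooling: it assumes a Linux server whose `/proc/meminfo` exposes a `MemAvailable` field, and the function names, the heap sizes, and the 2048 MB reserve are hypothetical values chosen for illustration. It only estimates whether a planned heap increase would still leave some memory for the operating system and other services.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def heap_increase_fits(meminfo_kb, current_heap_mb, new_heap_mb, reserve_mb=2048):
    """Return True if raising the heap still leaves at least reserve_mb available.

    reserve_mb is a hypothetical safety margin for the OS and other processes.
    """
    available_mb = meminfo_kb["MemAvailable"] // 1024
    extra_mb = new_heap_mb - current_heap_mb
    return available_mb - extra_mb >= reserve_mb
```

On a DataNode host, one might read `/proc/meminfo`, pass its contents to `parse_meminfo`, and then call `heap_increase_fits` with the current and planned heap sizes taken from the DataNode memory configuration. The result is only an estimate, because a JVM does not necessarily commit its full maximum heap at startup.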