HDFS NameNode Failed to Start Due to Insufficient Memory
Symptom
Scenario 1: After the HDFS service is restarted, HDFS is in the Bad state, the NameNode instance status is abnormal, and the NameNode stays in safe mode for a long time.
Scenario 2: NameNode startup times out and fails, and the native web UI cannot be opened.
Cause Analysis
- In the NameNode run log (/var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log), search for WARN. The log shows a GC pause of about 63 seconds:
  2017-01-22 14:52:32,641 | WARN | org.apache.hadoop.util.JvmPauseMonitor$Monitor@1b39fd82 | Detected pause in JVM or host machine (eg GC): pause of approximately 63750ms
  GC pool 'ParNew' had collection(s): count=1 time=0ms
  GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=63924ms | JvmPauseMonitor.java:189
- Analyze the NameNode log (/var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log). The NameNode is still waiting for block reports, and the total number of blocks is very large. In the following example, the total number of blocks is about 36.29 million:
  2017-01-22 14:52:32,641 | INFO | IPC Server handler 8 on 25000 | STATE* Safe mode ON.
  The reported blocks 29715437 needs additional 6542184 blocks to reach the threshold 0.9990 of total blocks 36293915.
- On Manager, check the GC_OPTS parameter of the NameNode:
Figure 1 Checking the GC_OPTS parameter of the NameNode
- For details about the mapping between the NameNode memory configuration and data volume, see Table 1.
Table 1 Mapping between NameNode memory configuration and data volume

| Number of File Objects | Reference Value |
| --- | --- |
| 10,000,000 | -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M |
| 20,000,000 | -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G |
| 50,000,000 | -Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G |
| 100,000,000 | -Xms64G -Xmx64G -XX:NewSize=4G -XX:MaxNewSize=6G |
| 200,000,000 | -Xms96G -Xmx96G -XX:NewSize=8G -XX:MaxNewSize=9G |
| 300,000,000 | -Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G |
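The two log checks in the cause analysis can be scripted. The following is a minimal Python sketch, not part of the product tooling: the function names (`parse_pause_ms`, `parse_safe_mode`, `blocks_still_needed`) are hypothetical, and the regular expressions assume the message wording shown in the log excerpts above. It extracts the JVM pause duration and the safe mode block counts, and reproduces the "needs additional" figure, assuming it is the truncated threshold times the total blocks minus the reported blocks.

```python
import re

# Patterns assume the message wording shown in the log excerpts above.
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")
SAFEMODE_RE = re.compile(
    r"The reported blocks (\d+) needs additional (\d+) blocks? "
    r"to reach the threshold ([\d.]+) of total blocks (\d+)"
)


def parse_pause_ms(line):
    """Return the JVM pause in milliseconds, or None if the line has no pause message."""
    match = PAUSE_RE.search(line)
    return int(match.group(1)) if match else None


def parse_safe_mode(line):
    """Return the safe mode block counters as a dict, or None if the line does not match."""
    match = SAFEMODE_RE.search(line)
    if match is None:
        return None
    reported, additional, threshold, total = match.groups()
    return {
        "reported": int(reported),
        "additional": int(additional),
        "threshold": float(threshold),
        "total": int(total),
    }


def blocks_still_needed(reported, total, threshold):
    """Blocks still to be reported before the safe mode threshold is reached.

    Assumption: the target is the truncated product of total blocks and threshold.
    """
    return max(0, int(total * threshold) - reported)


if __name__ == "__main__":
    pause_line = ("Detected pause in JVM or host machine (eg GC): "
                  "pause of approximately 63750ms")
    safe_line = ("STATE* Safe mode ON. The reported blocks 29715437 needs additional "
                 "6542184 blocks to reach the threshold 0.9990 of total blocks 36293915.")
    print(parse_pause_ms(pause_line))
    info = parse_safe_mode(safe_line)
    print(blocks_still_needed(info["reported"], info["total"], info["threshold"]))
```

With the example values, 36,293,915 × 0.999 truncates to 36,257,621, and 36,257,621 − 29,715,437 = 6,542,184, which matches the "needs additional" value in the log.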
Solution
- Modify the NameNode memory parameters in GC_OPTS according to Table 1. For example, if the number of blocks is about 36 million, use the reference value for 50,000,000 file objects: -Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G.
- Restart one NameNode and check that it starts normally.
- Restart the other NameNode and check that the status on the page returns to normal.
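The lookup in the first step can be expressed as a small helper. This is a minimal Python sketch under stated assumptions: the name `recommend_gc_opts` is hypothetical, the values are copied from Table 1, and rounding up to the next larger row is an assumption inferred from the 36 million example above. It returns only the heap-related options; any other flags already present in GC_OPTS are outside its scope.

```python
# Rows copied from Table 1: (number of file objects, reference value).
TABLE_1 = [
    (10_000_000, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
    (20_000_000, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
    (50_000_000, "-Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G"),
    (100_000_000, "-Xms64G -Xmx64G -XX:NewSize=4G -XX:MaxNewSize=6G"),
    (200_000_000, "-Xms96G -Xmx96G -XX:NewSize=8G -XX:MaxNewSize=9G"),
    (300_000_000, "-Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G"),
]


def recommend_gc_opts(object_count):
    """Return the reference value of the smallest Table 1 row that covers object_count.

    Assumption: counts between two rows round up to the larger row.
    Returns None when the count exceeds the largest row in the table.
    """
    for limit, opts in TABLE_1:
        if object_count <= limit:
            return opts
    return None


if __name__ == "__main__":
    # Total block count taken from the safe mode log message in the cause analysis.
    print(recommend_gc_opts(36_293_915))
```

For counts above 300,000,000 the helper returns None instead of extrapolating, because Table 1 gives no reference value beyond that row.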