Insufficient Number of Replicas Is Reported During High-Concurrency HDFS Writes
Symptom
File writes to HDFS fail occasionally.
The operation log is as follows:
105 | INFO | IPC Server handler 23 on 25000 | IPC Server handler 23 on 25000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.1.96:47728 Call#1461167 Retry#0 | Server.java:2278
java.io.IOException: File /hive/warehouse/000000_0.835bf64f-4103 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
Cause Analysis
- HDFS reserves disk space for files that are being written: a full block (128 MB by default) is reserved for every block being written, no matter whether the file is 10 MB or 1 GB. If a 10 MB file is written, it occupies 10 MB of the first block and the remaining space (about 118 MB) is released once the write completes. If a 1 GB file is written, HDFS writes it block by block and releases any unused reserved space after the file is written. (The sketch after this list shows how to query the configured block size.)
- When a large number of files are written concurrently, the disk space reserved for the blocks being written can be exhausted. As a result, file writes fail.
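The reserved amount equals the configured block size, which can be queried through the HDFS client API. The following is a minimal sketch, assuming a Hadoop client configuration (core-site.xml and hdfs-site.xml) is on the classpath; the class name and the path passed to getDefaultBlockSize are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml/hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            // dfs.blocksize: the amount of space reserved for every block
            // that is being written, 128 MB by default.
            long blockSize = fs.getDefaultBlockSize(new Path("/"));
            System.out.println("Configured block size: "
                    + blockSize / (1024 * 1024) + " MB");
        }
    }
}
```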
Solution
- Log in to the HDFS WebUI and go to the JMX page of the DataNode:
  - On the native HDFS page, choose Datanodes.
  - Locate the target DataNode and click its HTTP address to go to the DataNode details page.
  - In the URL, change datanode.html to jmx.
- Search for the XceiverCount indicator. If its value multiplied by the block size exceeds the disk capacity of the DataNode, the disk space reserved for blocks being written is insufficient. (A sketch that automates this check is provided after this list.)
- You can use either of the following methods to solve the problem:
Method 1: Reduce the service concurrency.
Method 2: Combine multiple files into one file to reduce the number of files to be written. (A sketch of this approach is provided at the end of this section.)
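The JMX check described above can also be scripted. The following is a minimal sketch that fetches the DataNode's /jmx page over HTTP, extracts the XceiverCount value with a regular expression, and compares XceiverCount multiplied by the block size with the disk capacity. The host name, port, block size, and disk capacity are placeholder values; replace them with those of your cluster.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XceiverCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder DataNode HTTP address; use the address shown on the
        // DataNode details page of your cluster.
        String jmxUrl = "http://datanode-host:9864/jmx";
        long blockSize = 128L * 1024 * 1024;                     // dfs.blocksize (default 128 MB)
        long diskCapacityBytes = 4L * 1024 * 1024 * 1024 * 1024; // example: 4 TB of DataNode disk

        // Read the JMX JSON output.
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(jmxUrl).openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }

        // Extract the XceiverCount value and compare the reserved space
        // with the disk capacity.
        Matcher m = Pattern.compile("\"XceiverCount\"\\s*:\\s*(\\d+)").matcher(body);
        if (m.find()) {
            long xceiverCount = Long.parseLong(m.group(1));
            long reservedBytes = xceiverCount * blockSize;
            System.out.println("XceiverCount = " + xceiverCount
                    + ", reserved space = " + reservedBytes + " bytes");
            if (reservedBytes > diskCapacityBytes) {
                System.out.println("Reserved space for blocks being written exceeds the disk capacity.");
            }
        } else {
            System.out.println("XceiverCount not found in the JMX output.");
        }
    }
}
```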
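For Method 2, one way to combine output is to have each writer append many small records to a single HDFS file instead of creating one file per record, so that only one block reservation is outstanding per writer. The following is a minimal sketch under that assumption; the output path and record contents are illustrative.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CombinedWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             // A single output file means only one block reservation is
             // outstanding for this writer at any time.
             FSDataOutputStream out = fs.create(new Path("/tmp/combined_output"))) {
            for (int i = 0; i < 1000; i++) {
                out.write(("record-" + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```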