
Unbalanced DataNode Disk Usages of a Node

Symptom

Disk usage is uneven across the data disks of a single DataNode host.

Example:

189-39-235-71:~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda       360G   92G  250G  28% /
/dev/xvdb       900G  700G  200G  78% /srv/BigData/hadoop/data1
/dev/xvdc       900G  700G  200G  78% /srv/BigData/hadoop/data2
/dev/xvdd       900G  700G  200G  78% /srv/BigData/hadoop/data3
/dev/xvde       900G  700G  200G  78% /srv/BigData/hadoop/data4
/dev/xvdf       900G   10G  890G   2% /srv/BigData/hadoop/data5
189-39-235-71:~ #

Possible Causes

  • Some faulty disks were replaced with new ones, so the new disks have low usage.
  • Disks were added. For example, the original four data disks were expanded to five.

Cause Analysis

A DataNode chooses which disk (volume) receives each new block according to one of two policies: round robin (the default), or preferentially writing to the disk with the most available space.

Description of the dfs.datanode.fsdataset.volume.choosing.policy parameter

Possible values:

  • Round robin: org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy
  • Preferentially writing data to the disk with the most available space: org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy
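The behavioral difference between the two policies can be illustrated with a small simulation (a simplified sketch, not Hadoop's actual implementation; in particular, the real AvailableSpaceVolumeChoosingPolicy only biases writes toward emptier disks within a configurable threshold rather than always picking the single emptiest one):

```python
from itertools import cycle

def simulate(free_gb, n_blocks, policy, block_gb=1):
    """Write n_blocks of block_gb each onto disks with free_gb GB available.

    Returns the remaining free space per disk.
    """
    free = list(free_gb)
    if policy == "round_robin":
        # Default policy: cycle through the disks in order.
        order = cycle(range(len(free)))
        for _ in range(n_blocks):
            free[next(order)] -= block_gb
    else:
        # Simplified available-space policy: always pick the emptiest disk.
        for _ in range(n_blocks):
            i = max(range(len(free)), key=lambda j: free[j])
            free[i] -= block_gb
    return free

# Five disks mirroring the df output above: four nearly full (200 GB free)
# and one newly added, almost empty disk (890 GB free).
disks = [200, 200, 200, 200, 890]
print(simulate(disks, 100, "round_robin"))      # [180, 180, 180, 180, 870]
print(simulate(disks, 100, "available_space"))  # [200, 200, 200, 200, 790]
```

Under round robin, the nearly full disks keep receiving one fifth of all writes and will fill up while the new disk stays mostly empty; the available-space policy directs new blocks to the new disk until usage evens out.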

Solution

Change the value of dfs.datanode.fsdataset.volume.choosing.policy to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy, save the settings, and restart the affected services or instances.
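If the setting is managed through hdfs-site.xml rather than a management console, the change can be expressed as the fragment below. The first property comes from this procedure; the two threshold properties are standard Hadoop tuning knobs for this policy (defaults shown) and should be verified against your Hadoop version:

```xml
<!-- hdfs-site.xml on the DataNodes -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<!-- Disks whose free space differs by less than this (bytes) are treated as balanced. -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
<!-- Fraction of writes directed to the disks with more free space when unbalanced. -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```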

In this way, the DataNode preferentially selects the disk with the most available space to store new block replicas.

Note
  • Data written to the DataNode is preferentially placed on the disks with more available space.
  • The high usage of some disks is relieved gradually as aging data is deleted from HDFS.