If an Elasticsearch cluster has excess capacity due to off-peak traffic or reduced data volumes, you can remove some of its nodes to optimize costs.
| Type | Scenario | Change Process |
|---|---|---|
| Removing nodes randomly | Randomly removes cluster nodes to optimize costs. | Nodes are removed one at a time, so as to avoid interrupting services. |
| Removing specified nodes | Removes specified cluster nodes to optimize costs. | |
For a pay-per-use cluster, you can see its new price when confirming the scale-in on the console. After the scale-in is complete, the new price will apply.
- The number of data nodes and cold data nodes removed in a single operation must be less than half of their current total. For example, if a cluster has three data nodes, three client nodes, and three cold data nodes, a maximum of two nodes can be removed at a time: (3 + 3)/2 = 3, and the number of nodes removed must be less than 3.
- The number of remaining data nodes plus cold data nodes must be greater than the maximum number of replicas configured for any index. For example, if each index can have a maximum of two replicas, at least three data nodes plus cold data nodes must remain.
- The number of master nodes removed in a single operation must be less than half of the current number of master nodes. For example, if a cluster has two data nodes and four master nodes, only one master node can be removed in the current scale-in operation: 4/2 = 2, and the number of nodes removed must be less than 2.
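Before planning a scale-in against these constraints, you can check the current node roles and index replica settings. The following requests are a minimal sketch using standard Elasticsearch APIs; `my-index` is a placeholder for one of your own index names.

```
# List each node with its roles, to count data and master nodes.
GET /_cat/nodes?v&h=name,node.role

# Show the replica count configured for an index
# ("my-index" is a placeholder; substitute your own index name).
GET /my-index/_settings?filter_path=*.settings.index.number_of_replicas
```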
| Node Type | Value Range |
|---|---|
| Data nodes | |
| Master nodes | 3, 5, 7, or 9 (an odd number from 3 to 9) |
| Client nodes | 1–32 |
| Cold data nodes | 1–32 |
Before making the change, review the possible impacts and operation suggestions below, and develop a plan to minimize these impacts.
During a scale-in, shards on the to-be-removed nodes are migrated to the remaining nodes. This process consumes disk and network I/O, which is why you are advised to perform the operation during off-peak hours.
To minimize this impact, it is advisable to adjust the data migration rate based on the cluster's traffic cycle: increase it during off-peak hours to shorten the task duration, and decrease it before peak hours arrive to preserve cluster performance. The data migration rate is controlled by the indices.recovery.max_bytes_per_sec parameter. Its default value is the number of vCPUs multiplied by 32 MB/s; for example, with four vCPUs, the default rate is 128 MB/s. Set this parameter to a value between 40 MB/s and 1000 MB/s based on your service requirements.
```
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "1000MB"
  }
}
```
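You can also check the currently effective rate before changing it, and restore the default by clearing the transient setting. Both requests below use the standard cluster settings API:

```
# Check the currently effective recovery rate (including default values).
GET /_cluster/settings?include_defaults=true&filter_path=*.indices.recovery.max_bytes_per_sec

# Restore the default rate by clearing the transient setting.
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": null
  }
}
```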
After a scale-in, the remaining nodes must handle the entire cluster load. This may lead to higher CPU, memory, and disk I/O usage, impacting query and write performance, and if shards end up unevenly allocated, performance bottlenecks may occur. Before a scale-in, therefore, evaluate whether the remaining nodes have the capacity to handle the current cluster load.
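As a starting point for this evaluation, the following sketch uses the standard _cat APIs to show per-node resource usage and shard distribution:

```
# Per-node CPU, heap, and disk usage.
GET /_cat/nodes?v&h=name,cpu,heap.percent,disk.used_percent

# Shard count and disk usage per node, to spot uneven shard allocation.
GET /_cat/allocation?v
```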
Once started, a scaling task cannot be stopped until it succeeds or fails.
The following formula can be used to estimate how long a scale-in operation will take:
Scale-in duration (min) = 5 (min) x Number of nodes to be removed + Data migration duration (min)
where 5 minutes is the typical per-node duration of non-migration operations (for example, initialization); it is an empirical value.
Data migration duration (min) = Total data size of the nodes to be removed (MB) ÷ [Total number of vCPUs of the data nodes x 32 (MB/s) x 60 (s/min)]
where the denominator is the cluster's default data migration rate (vCPUs x 32 MB/s, as described above) expressed in MB per minute.
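For illustration, assume the two nodes to be removed hold 600 GB (614,400 MB) of data in total and the cluster's data nodes have 16 vCPUs in total (both figures are hypothetical). The migration rate is 16 x 32 = 512 MB/s, or 30,720 MB/min, so data migration takes 614,400 ÷ 30,720 = 20 minutes, and the estimated scale-in duration is 5 x 2 + 20 = 30 minutes.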
The following formula can be used to estimate how long a node storage reduction operation will take:
Node storage reduction duration (min) = 15 (min) x Number of nodes to be changed + Data migration duration (min)
where 15 minutes is the typical per-node duration of non-migration operations; like the 5-minute figure above, it is an empirical value.
Data migration duration (min) = Total data size (MB) ÷ [Total number of vCPUs of the data nodes x 32 (MB/s) x 60 (s/min)]
where the denominator is again the default data migration rate expressed in MB per minute.
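As a similar hypothetical example, reducing the storage of three nodes holding 900 GB (921,600 MB) of data in total, with 24 vCPUs across the data nodes, gives a migration rate of 24 x 32 x 60 = 46,080 MB/min, a data migration duration of 921,600 ÷ 46,080 = 20 minutes, and an estimated total of 15 x 3 + 20 = 65 minutes.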
| Parameter | Description |
|---|---|
| Action | Select Scale in. |
| Resources | The quantities of resources to be reduced. |
| Nodes | Reduce the number of nodes in the Nodes column. You can change multiple node types at the same time. For the range of node quantities supported by each node type, see Constraints. |
| Parameter | Description |
|---|---|
| Node Type | Expand the node type that needs to be changed to show all nodes under it. Select the nodes you want to remove. |
During data migration, the system migrates all data from the to-be-removed nodes to the remaining nodes, and removes these nodes upon completion of the data migration. If the data on the to-be-removed nodes has replicas on other nodes, data migration can be skipped and the cluster change can be completed faster.
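To check whether your indices carry replicas (which lets the system skip data migration), you can list the primary and replica shard counts per index with a standard _cat request:

```
# List each index with its primary (pri) and replica (rep) shard counts.
GET /_cat/indices?v&h=index,pri,rep
```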