Upgrading the Version of an OpenSearch Cluster
OpenSearch clusters support both same-version and cross-version upgrades.
- During a same-version upgrade, kernel patches are updated for a cluster. The cluster is upgraded to the latest image of the current version to fix known issues or optimize performance. For example, if the cluster version is 1.3.6(1.3.6_24.3.3_0102), upon a same-version upgrade, the cluster will be upgraded to the latest image 1.3.6(1.3.6_24.3.4_0109) of version 1.3.6. (The version numbers used here are examples only.)
- During a cross-version upgrade, a cluster is upgraded to the latest image of the target version to gain new functionality or consolidate versions. For example, if the cluster version is 1.3.6(1.3.6_24.3.3_1224), upon a cross-version upgrade, the cluster will be upgraded to the latest image 2.17.1(2.17.1_24.3.4_0109) of version 2.17.1. (The version numbers used here are examples only.)
The nodes in a cluster are upgraded one at a time to prevent service interruption. The upgrade process is as follows: bring a node offline, migrate its data to another node, create a new node of the target version, mount the NIC ports of the offline node to the new node to reuse the node IP address, and then add the new node to the cluster. The remaining nodes are upgraded one at a time in the same way. If a cluster holds a large amount of data, the upgrade duration depends mainly on how long data migration takes.
Upgrade Impact
Before upgrading a cluster, you need to understand the potential impacts and operation suggestions to be able to properly schedule the upgrade time based on service requirements and cluster status. This helps to ensure a smooth upgrade process and minimize impact on services.
- Upgrade process and performance impact
The nodes of a cluster are upgraded one at a time to ensure service continuity. However, data migration during the upgrade consumes I/O performance, and taking individual nodes offline still has some impact on the overall cluster performance. To minimize this impact, it is advisable to adjust the data migration speed based on the cluster's traffic cycle: increase the data migration speed during off-peak hours to accelerate the upgrade, and decrease it during peak hours to ensure optimal cluster performance.
The data migration speed is determined by the indices.recovery.max_bytes_per_sec parameter. The default value is the number of CPU cores x 32, in MB per second. You can change this parameter as needed within the range of 40 MB to 1000 MB.
PUT /_cluster/settings
{"transient": {"indices.recovery.max_bytes_per_sec": "1000mb"}}
- Node replacement and request handling
During the upgrade, replacing a node may lead to request failures. To avoid this problem, you are advised to use the VPC Endpoint Service or a dedicated load balancer to handle cluster access requests, and to add request retry logic to the client code. You are also advised to perform the upgrade during off-peak hours to further minimize potential impact.
For details about how to configure the VPC Endpoint Service or a dedicated load balancer, see Configuring VPC Endpoint Service for an OpenSearch Cluster and Configuring a Dedicated Load Balancer for an OpenSearch Cluster.
- Rebuilding OpenSearch Dashboards and Cerebro
OpenSearch Dashboards and Cerebro will be rebuilt during the upgrade, making them temporarily unavailable. Additionally, due to cross-version compatibility issues, OpenSearch Dashboards may become unavailable during the upgrade. These problems will go away once the upgrade is completed.
- Upgrade task management
Once started, an upgrade task cannot be stopped until it succeeds or fails. An upgrade failure affects only the individual node being upgraded. It will not affect services as long as the data on that node has replicas.
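The migration speed guidance given earlier in this list can be sketched in a few lines of Python. This is only an illustration of the documented default (number of CPU cores x 32 MB per second) and the documented range (40 MB to 1000 MB). The function names are hypothetical and are not part of any CSS or OpenSearch SDK. The sketch only builds the request body for PUT /_cluster/settings and does not send it.

```python
import json

# Documented adjustable range for indices.recovery.max_bytes_per_sec, in MB.
MIN_RATE_MB = 40
MAX_RATE_MB = 1000


def default_recovery_rate_mb(cpu_cores: int) -> int:
    """Documented default: number of CPU cores x 32, in MB per second."""
    return cpu_cores * 32


def clamp_recovery_rate_mb(requested_mb: int) -> int:
    """Keep a requested rate inside the documented 40 MB to 1000 MB range."""
    return max(MIN_RATE_MB, min(MAX_RATE_MB, requested_mb))


def recovery_settings_body(requested_mb: int) -> dict:
    """Build the body for PUT /_cluster/settings (hypothetical helper)."""
    rate = clamp_recovery_rate_mb(requested_mb)
    return {"transient": {"indices.recovery.max_bytes_per_sec": f"{rate}mb"}}


# Example: start from the default of an 8-core node.
off_peak_body = recovery_settings_body(default_recovery_rate_mb(8))
print(json.dumps(off_peak_body))
```

A scheduler could call recovery_settings_body with a higher value during off-peak hours and a lower one during peak hours, in line with the advice above.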
Constraints
- A maximum of 20 clusters can be upgraded at the same time. You are advised to perform the upgrade during off-peak hours.
- Clusters that have ongoing tasks cannot be upgraded.
Pre-Upgrade Check
To ensure a successful upgrade, you must check the items listed in the following table before performing an upgrade.
Check Item | Check Method | Description | Normal Status |
---|---|---|---|
Cluster status | System check | After an upgrade task is started, the system automatically checks the cluster status. Clusters whose status is green or yellow can work properly and have no unallocated primary shards. | The cluster status is Available. |
Node quantity | System check | During a cluster upgrade, the system automatically checks the number of nodes. To ensure service continuity, the total number of data nodes and cold data nodes in a cluster must be greater than or equal to 3. | The total number of data nodes and cold data nodes in a cluster must be greater than or equal to 3. |
Disk capacity | System check | After an upgrade task is started, the system automatically checks the disk capacity. During the upgrade, nodes are brought offline one by one and then new nodes are created. Ensure that the disk capacity of all the remaining nodes can hold all data of the node that has been brought offline. | After a node is brought offline, the remaining nodes can hold all data of the cluster. |
Data replicas | System check | Check whether the maximum number of primary and replica shards of indexes in a cluster can be allocated to the remaining data nodes and cold data nodes. This prevents replica allocation failures after a node is brought offline during the upgrade. | The maximum number of primary and replica shards plus 1 must be less than or equal to the total number of data nodes and cold data nodes before the upgrade. |
Data backup | System check | Before the upgrade, back up data to prevent data loss caused by upgrade faults. When submitting an upgrade task, you can determine whether to enable the system to check for the backup of all indexes. | Check whether data has been backed up. |
Resources | System check | After an upgrade task is started, the system automatically checks resources. Resources will be created during the upgrade. Ensure that resources are available. | Resources are available and sufficient. |
Custom plugins | System and manual check | Perform this check only when custom plugins are installed in the source cluster. If a cluster has a custom plugin, upload all plugin packages of the target version on the plugin management page before the upgrade. During the upgrade, install the custom plugin in the new nodes. Otherwise, the custom plugins will be lost after the cluster is successfully upgraded. After an upgrade task is started, the system automatically checks whether the custom plugin package has been uploaded, but you need to check whether the uploaded plugin package is correct. NOTE: If the uploaded plugin package is incorrect or incompatible, the plugin package cannot be automatically installed during the upgrade. As a result, the upgrade task fails. To restore a cluster, you can terminate the upgrade task and restore the node that fails to be upgraded by performing Replacing Specified Nodes for an OpenSearch Cluster. After the upgrade is complete, the status of the custom plugin is reset to Uploaded. | The plugin package of the cluster to be upgraded has been uploaded to the plugin list. |
Custom configurations | System check | During the upgrade, the system automatically synchronizes the content of the cluster configuration file opensearch.yml. | Clusters' custom configurations are not lost after the upgrade. |
Non-standard operations | Manual check | Check whether non-standard operations have been performed in the cluster. Non-standard operations refer to manual operations that are not recorded. These operations cannot be automatically passed on during the upgrade, for example, modification of the opensearch_dashboards.yml configuration file, system settings, and return routes. | Some non-standard operations are compatible. For example, the modification of a security plugin can be retained through metadata, and the modification of system configuration can be retained using images. Some non-standard operations, such as the modification of the opensearch_dashboards.yml file, cannot be retained, and you must back up the file in advance. |
Compatibility check | System and manual check | After a cross-version upgrade task is started, the system automatically checks whether the source and target versions have incompatible configurations. If a custom plugin is installed for a cluster, the version compatibility of the custom plugin needs to be manually checked. | Configurations before and after the cross-version upgrade are compatible. |
Cluster load | System and manual check | If the cluster is heavily loaded, there is a high probability that the upgrade will get stuck or fail. You are advised to check the cluster load before the upgrade and perform the upgrade only during off-peak hours. You can also choose to check the cluster load while configuring upgrade information. | |
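Three of the checks in the table above are simple arithmetic, and the following Python sketch shows one way to reason about them before submitting a task. It is not the system's actual check logic. The Node type and the function names are hypothetical. The sketch assumes that "maximum number of primary and replica shards" means the largest number of copies of any one shard (1 primary plus its replicas), and it ignores disk watermarks and shard balancing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical view of one data node or cold data node."""
    name: str
    disk_total_gb: float
    disk_used_gb: float


def enough_nodes(nodes: List[Node]) -> bool:
    """Node quantity: at least 3 data nodes and cold data nodes in total."""
    return len(nodes) >= 3


def replicas_fit(max_shard_copies: int, node_count: int) -> bool:
    """Replica rule from the table: copies plus 1 must not exceed the node count.

    max_shard_copies is assumed to be 1 primary + number_of_replicas for the
    index with the most replicas.
    """
    return max_shard_copies + 1 <= node_count


def disk_survives_one_offline(nodes: List[Node]) -> bool:
    """Disk capacity: free space on the other nodes must hold each node's data."""
    for offline in nodes:
        free_elsewhere = sum(
            n.disk_total_gb - n.disk_used_gb for n in nodes if n is not offline
        )
        if offline.disk_used_gb > free_elsewhere:
            return False
    return True


# Made-up example cluster: three nodes, each 40% full.
cluster = [Node(f"node-{i}", disk_total_gb=100, disk_used_gb=40) for i in range(3)]
print(enough_nodes(cluster), replicas_fit(2, len(cluster)), disk_survives_one_offline(cluster))
```

The system still performs its own checks after the task is started, so a sketch like this is useful only as an early warning.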
Creating an Upgrade Task
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters to display the cluster list.
- Click the target cluster name. The cluster information page is displayed.
- In the navigation pane on the left, choose Cluster Snapshots, and create snapshots to back up all index data. For details, see Manually Creating a Snapshot.
When creating an upgrade task, you can choose to check whether the full index data has been backed up using snapshots. This helps to prevent data loss in case of an upgrade failure.
- In the navigation pane on the left, choose Version Upgrade.
- On the displayed page, set upgrade parameters.
Table 2 Upgrade parameters
Parameter | Description |
---|---|
Upgrade Type | Same-version upgrade: upgrade kernel patches to the latest image within the current cluster version. Cross-version upgrade: upgrade a cluster to the latest image of the target version. |
Target Image | Image of the target version. After you select an image, the image name and target version details are displayed below. The supported target versions are displayed in the drop-down list of Target Image. If no target image is available, possible causes are as follows: the current cluster is of the latest version; the current cluster was created before 2023 and has vector indexes; the new version images have not been added in the current region; the current cluster does not support the upgrade type you have selected. |
- After setting the parameters, click Submit. Decide whether to enable Check full index snapshot and Perform cluster load detection, and then click OK.
If a cluster is overloaded, the upgrade task may hang or fail. Enabling Cluster load detection helps to avoid such failures.
If any of the following checks fails during the detection, wait for the load to drop or reduce the load. If you urgently need to upgrade the version and you understand the risk of upgrade failure, you can disable the Cluster load detection function. The cluster load check items are as follows:
- nodes.thread_pool.search.queue < 1000: Check whether the maximum number of search queues is less than 1000.
- nodes.thread_pool.write.queue < 200: Check whether the maximum number of write queues is less than 200.
- nodes.process.cpu.percent < 90: Check whether the maximum CPU usage is less than 90%.
- nodes.os.cpu.load_average/Number of CPU cores < 80%: Check whether the ratio of the maximum load to the number of CPU cores is less than 80%.
- View the upgrade task in the task list. If the task status is Running, you can expand the task list and click View Progress to view the upgrade progress.
If the task status is Failed, you can retry or terminate the task.
- Retry the task: Click Retry in the Operation column.
- Terminate the task: Click Terminate in the Operation column.
Notice
- Same-version upgrade: If the upgrade task status is Failed, you can terminate the upgrade task.
- Cross-version upgrade: You can terminate an upgrade task only when the task status is Failed and no node has been upgraded.
After an upgrade task is terminated, the Task Status of the cluster is rolled back to the status before the upgrade, and other tasks in the cluster are not affected.
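The cluster load check items listed under load detection above can be approximated on the client side before you submit a task. The following Python sketch evaluates the four documented thresholds against a dictionary shaped like the response of the OpenSearch nodes stats API (GET _nodes/stats). The function name and the cpu_cores mapping are hypothetical, and using the 1-minute load average is an assumption, because the source does not say which load average the system uses.

```python
def load_check_failures(node_stats: dict, cpu_cores: dict) -> list:
    """Return a list of failed checks; an empty list means all checks passed.

    node_stats: dict shaped like a GET _nodes/stats response.
    cpu_cores:  hypothetical mapping of node ID to number of CPU cores.
    """
    failures = []
    for node_id, stats in node_stats["nodes"].items():
        search_queue = stats["thread_pool"]["search"]["queue"]
        write_queue = stats["thread_pool"]["write"]["queue"]
        cpu_percent = stats["process"]["cpu"]["percent"]
        # Assumption: the 1-minute load average is the one being compared.
        load_ratio = stats["os"]["cpu"]["load_average"]["1m"] / cpu_cores[node_id]

        if not search_queue < 1000:
            failures.append(f"{node_id}: search queue {search_queue} is not below 1000")
        if not write_queue < 200:
            failures.append(f"{node_id}: write queue {write_queue} is not below 200")
        if not cpu_percent < 90:
            failures.append(f"{node_id}: CPU usage {cpu_percent}% is not below 90%")
        if not load_ratio < 0.8:
            failures.append(f"{node_id}: load per core {load_ratio:.2f} is not below 0.80")
    return failures


def node(search_q, write_q, cpu, load_1m):
    """Build a made-up stats entry containing only the fields checked above."""
    return {
        "thread_pool": {"search": {"queue": search_q}, "write": {"queue": write_q}},
        "process": {"cpu": {"percent": cpu}},
        "os": {"cpu": {"load_average": {"1m": load_1m}}},
    }


sample_stats = {"nodes": {"node-a": node(12, 3, 35, 1.6), "node-b": node(1500, 10, 95, 7.0)}}
sample_cores = {"node-a": 4, "node-b": 8}

for failure in load_check_failures(sample_stats, sample_cores):
    print(failure)
```

Checking every node against each threshold is equivalent to checking the maximum value across nodes, which is how the check items above are phrased.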