You can isolate query requests that consume a large amount of memory or take a long time, which keeps the service available for other requests. If the heap memory usage of a node is too high, an interrupt control program is triggered to terminate a large query based on the policies you configured. You can also configure a global query timeout duration. Queries that run longer than this duration are canceled through the native Elasticsearch cancel API.
Large query isolation can effectively solve the following problems and improve the search performance of clusters:
Only Elasticsearch 7.6.2 and Elasticsearch 7.10.2 clusters support large query isolation.
Log in to Kibana and go to the command execution page. Elasticsearch clusters support multiple access methods. This topic uses Kibana as an example to describe the procedure.
The left part of the console is the command input box, and the triangle icon in its upper-right corner is the execution button. The right part shows the execution result.
Large query isolation is enabled by default, while global query timeout is disabled by default. If you enable them, the configuration will take effect immediately.
Run the following commands to enable large query isolation and global query timeout:
```
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.enabled": true,
    "search.isolator.time.enabled": true
  }
}
```
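If you prefer to send the same request from a script instead of Kibana, the following Python sketch builds the `persistent` request body and issues `PUT _cluster/settings` using only the standard library. The helper names are illustrative, and the sketch assumes a cluster reachable over plain HTTP without authentication; a security-mode cluster would additionally need HTTPS and credentials.

```python
import json
import urllib.request

# Switch values taken from the Kibana command above.
ENABLE_SWITCHES = {
    "search.isolator.enabled": True,
    "search.isolator.time.enabled": True,
}


def build_settings_body(settings: dict) -> bytes:
    """Wrap the given settings in a "persistent" block, as in the Kibana command."""
    return json.dumps({"persistent": settings}).encode("utf-8")


def put_cluster_settings(base_url: str, settings: dict) -> dict:
    """Send PUT _cluster/settings and return the parsed JSON response.

    Assumes an unauthenticated HTTP endpoint; add TLS and credentials
    for a security-mode cluster.
    """
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=build_settings_body(settings),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `put_cluster_settings("http://<node-address>:9200", ENABLE_SWITCHES)` sends the request; the same helper accepts any of the other settings bodies in this topic.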
Each of the two features has its own switch. Table 1 describes their parameters.
| Switch | Parameter | Description |
|---|---|---|
| search.isolator.enabled | search.isolator.memory.task.limit, search.isolator.time.management | Thresholds for identifying a single shard query task as a large query. |
| search.isolator.enabled | search.isolator.memory.pool.limit, search.isolator.memory.heap.limit, search.isolator.count.limit | Resource usage thresholds for isolation. If resource usage exceeds any of these thresholds, the interrupt control program is triggered. NOTE: search.isolator.memory.heap.limit defines the limit on the heap memory consumed by write, query, and other operations on a node. If this limit is exceeded, large query tasks in the isolation pool are interrupted. |
| search.isolator.enabled | search.isolator.strategy, search.isolator.strategy.ratio | Policy for selecting the query tasks in the isolation pool to interrupt. |
| search.isolator.time.enabled | search.isolator.time.limit | Global timeout for query tasks. |
Run the following command to set the thresholds for identifying large queries:
```
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.memory.task.limit": "50MB",
    "search.isolator.time.management": "10s"
  }
}
```
| Parameter | Type | Description |
|---|---|---|
| search.isolator.memory.task.limit | String | Threshold of the memory requested by a query task to perform aggregation or other operations. If the requested memory exceeds the threshold, the task is isolated and observed. |
| search.isolator.time.management | String | Threshold of the query duration, measured from when the query starts using cluster resources. If the duration of a query exceeds the threshold, the query is isolated and observed. |
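The two thresholds above can be read as an either-or test on each shard query task. The Python sketch below illustrates that reading with a hypothetical `is_large_query` helper. It assumes the values follow Elasticsearch unit conventions (1024-based byte sizes, case-insensitive suffixes such as `MB` and `s`); it is not the plugin's actual implementation.

```python
import re

# Assumption: Elasticsearch-style units; byte sizes are 1024-based
# and unit suffixes are case-insensitive.
_BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}
_TIME_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_VALUE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def _parse(value: str, units: dict) -> float:
    """Split a value such as "50MB" into number and unit and scale it."""
    match = _VALUE_RE.fullmatch(value)
    if match is None or match.group(2).lower() not in units:
        raise ValueError(f"unsupported value: {value!r}")
    return float(match.group(1)) * units[match.group(2).lower()]


def parse_bytes(value: str) -> int:
    return int(_parse(value, _BYTE_UNITS))


def parse_seconds(value: str) -> float:
    return _parse(value, _TIME_UNITS)


def is_large_query(mem_bytes: int, elapsed_s: float,
                   task_limit: str = "50MB",
                   time_management: str = "10s") -> bool:
    """Hypothetical check: a shard query task is treated as a large query
    if it exceeds either the memory threshold or the duration threshold."""
    return (mem_bytes > parse_bytes(task_limit)
            or elapsed_s > parse_seconds(time_management))
```

With the values from the command above, a task that has requested 60 MB, or one that has run for 11 seconds, would be isolated and observed.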
Run the following command to set the resource usage thresholds for isolation:
```
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.memory.pool.limit": "50%",
    "search.isolator.memory.heap.limit": "90%",
    "search.isolator.count.limit": 1000
  }
}
```
| Parameter | Type | Description |
|---|---|---|
| search.isolator.memory.pool.limit | String | Threshold for the memory used by the isolation pool, expressed as a percentage of the heap memory of the current node. If the total memory requested by large query tasks in the isolation pool exceeds the threshold, the interrupt control program is triggered to cancel one of the tasks. |
| search.isolator.memory.heap.limit | String | Heap memory threshold of the current node. If the heap memory usage of the node exceeds the threshold, the interrupt control program is triggered to cancel a large query task in the isolation pool. |
| search.isolator.count.limit | Integer | Threshold of the number of large query tasks in the isolation pool of the current node. If the number of observed query tasks exceeds the threshold, the interrupt control program is triggered to stop accepting new large queries, and new large query requests are canceled directly. |
In addition to the search.isolator.memory.pool.limit and search.isolator.count.limit parameters, you can use search.isolator.memory.task.limit and search.isolator.time.management to control how many query tasks enter the isolation pool.
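The three pool-level thresholds can be pictured as two checks: one that decides whether the interrupt control program should cancel a task already in the pool, and one that decides whether a new large query is admitted. The Python sketch below models them with hypothetical names (`NodeState`, `should_interrupt`, `admits_new_large_query`). It assumes percentages are relative to the node's maximum heap size and treats "exceeds" as strictly greater than; both are assumptions, not confirmed plugin behavior.

```python
from dataclasses import dataclass, field
from typing import List


def percent(value: str) -> float:
    """Convert a percentage string such as "50%" to a fraction (0.5)."""
    return float(value.strip().rstrip("%")) / 100.0


@dataclass
class NodeState:
    heap_max_bytes: int
    heap_used_bytes: int
    # Memory requested by each large query task currently in the isolation pool.
    pool_task_mem_bytes: List[int] = field(default_factory=list)


def should_interrupt(state: NodeState, pool_limit: str = "50%",
                     heap_limit: str = "90%") -> bool:
    """True if the pool memory or node heap threshold is exceeded, i.e. the
    interrupt control program would cancel a task from the pool."""
    pool_total = sum(state.pool_task_mem_bytes)
    return (pool_total > percent(pool_limit) * state.heap_max_bytes
            or state.heap_used_bytes > percent(heap_limit) * state.heap_max_bytes)


def admits_new_large_query(state: NodeState, count_limit: int = 1000) -> bool:
    """Assumed boundary: a new large query is admitted only if the pool would
    not exceed count_limit after adding it; otherwise it is canceled."""
    return len(state.pool_task_mem_bytes) + 1 <= count_limit
```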
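Run the following command to set the policy for selecting large queries to interrupt:
```
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.strategy": "fair",
    "search.isolator.strategy.ratio": "0.5%"
  }
}
```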
| Parameter | Type | Description |
|---|---|---|
| search.isolator.strategy | String | Policy for selecting the large queries to interrupt when the interrupt control program is triggered. Values: fair, mem-first, or time-first. Default value: fair. NOTE: The large query isolation pool is checked every second until the heap memory usage is back within the safe range. |
| search.isolator.strategy.ratio | String | Threshold of the fair policy. This parameter takes effect only if search.isolator.strategy is set to fair. If the difference in memory usage between large query tasks does not exceed the threshold, the query that has run the longest is interrupted. If the difference exceeds the threshold, the query that uses the most memory is interrupted. |
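The selection policies can be sketched as a function that picks one task from the isolation pool. In the Python sketch below, the `fair` branch follows the description above; the behavior of `mem-first` (largest memory user) and `time-first` (longest-running task) is inferred from their names. How the plugin measures the "difference" is also not stated, so the sketch assumes it is the spread between the largest and smallest memory usage in the pool, compared with `ratio` multiplied by the node's heap size. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoolTask:
    task_id: str
    mem_bytes: int
    elapsed_s: float


def pick_task_to_interrupt(tasks: List[PoolTask], heap_max_bytes: int,
                           strategy: str = "fair",
                           ratio: str = "0.5%") -> Optional[PoolTask]:
    """Return the pool task to interrupt under the given policy (a sketch)."""
    if not tasks:
        return None
    most_memory = max(tasks, key=lambda t: t.mem_bytes)
    longest_running = max(tasks, key=lambda t: t.elapsed_s)
    if strategy == "mem-first":
        return most_memory
    if strategy == "time-first":
        return longest_running
    if strategy != "fair":
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Assumption: "difference" = max minus min memory usage in the pool,
    # compared against ratio * heap size.
    threshold = float(ratio.strip().rstrip("%")) / 100.0 * heap_max_bytes
    spread = most_memory.mem_bytes - min(t.mem_bytes for t in tasks)
    return longest_running if spread <= threshold else most_memory
```

Under the fair policy, tasks with similar memory usage are ranked by run time, so a single long-running query is interrupted first; a task with clearly higher memory usage is interrupted first regardless of how long it has run.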
Run the following command to set the global timeout of query tasks:
```
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.time.limit": "120s"
  }
}
```
| Parameter | Type | Description |
|---|---|---|
| search.isolator.time.limit | String | Global query timeout duration. Any query task that exceeds this duration is canceled. |
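The global timeout is enforced inside the cluster, but the same idea can be approximated from the client side with the standard Elasticsearch task management API. The Python sketch below scans the JSON returned by `GET _tasks?actions=*search&detailed` and lists the IDs of cancellable top-level search tasks that have run longer than the limit; each ID could then be passed to `POST _tasks/<task_id>/_cancel`. The function name is hypothetical, and the sketch deliberately ignores shard-level child tasks, since canceling the parent search task also cancels them.

```python
from typing import List


def search_tasks_over_timeout(tasks_response: dict, limit_s: float) -> List[str]:
    """Return IDs of cancellable top-level search tasks that have been running
    longer than limit_s, given a GET _tasks?actions=*search&detailed response."""
    limit_ns = int(limit_s * 1_000_000_000)
    over = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            # Only the coordinating search task; shard-level phases such as
            # "indices:data/read/search[phase/query]" are skipped.
            if (task.get("action") == "indices:data/read/search"
                    and task.get("cancellable", False)
                    and task.get("running_time_in_nanos", 0) > limit_ns):
                over.append(task_id)
    return sorted(over)
```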
Run the following command to set the maximum number of log records kept for canceled query requests:
```
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.log.count": "100"
  }
}
```
| Parameter | Type | Description |
|---|---|---|
| search.isolator.log.count | Integer | Maximum number of canceled query request records that can be kept in memory. NOTE: You can use the following APIs to query canceled requests: In the commands above, nodeId indicates the node ID. |
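A capped in-memory record list like the one `search.isolator.log.count` configures can be modeled with a bounded deque. The Python sketch below uses hypothetical class names and assumes that, once the cap is reached, the oldest record is discarded to make room for the newest. The source does not say how the plugin handles a full log, so treat this as an illustration of the cap only.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class CanceledQueryRecord:
    task_id: str
    reason: str


class CanceledQueryLog:
    """Hypothetical in-memory log that keeps at most `log_count` records.

    Assumption: when the log is full, the oldest record is dropped."""

    def __init__(self, log_count: int = 100):
        # deque with maxlen silently evicts from the left when full.
        self._records = deque(maxlen=log_count)

    def record(self, task_id: str, reason: str) -> None:
        self._records.append(CanceledQueryRecord(task_id, reason))

    def records(self) -> List[CanceledQueryRecord]:
        return list(self._records)
```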