CCE Node Problem Detector

Add-on Overview

CCE Node Problem Detector (node-problem-detector, NPD) is an add-on that monitors abnormal events on cluster nodes and connects to a third-party monitoring platform. It runs as a daemon on each node, collects node problems from different daemons, and reports them to the API server. It can run as a DaemonSet or as a standalone daemon.

Add-on Parameters

Table 1 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| basic | No | object | Basic configuration parameters, which do not need to be specified |
| flavor | Yes | object (Table 2) | Flavor parameters |
| custom | Yes | object (Table 3) | Custom parameters |

Table 2 Configuration of flavor

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| description | No | String | Add-on description |
| name | Yes | String | Add-on specification name. The value is fixed at Single-instance. |
| replicas | Yes | String | Number of pods. The default value is 1. |
| resources | Yes | resources object (Table 4) | Container resource (CPU and memory) quotas |

Table 3 Configuration of custom

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| feature_gate | No | String | Feature gate used to enable beta features |
| multiAZBalance | No | Bool | Multi-AZ deployment |
| multiAZEnabled | No | Bool | Whether to deploy the add-on pods in multiple AZs. The default value is false. If set to true, cross-AZ deployment is forced. If set to false, cross-AZ deployment is preferred. |
| npc | Yes | object (Table 5) | node-problem-controller configuration |
| tolerations | No | List<Object> (Table 6) | Tolerations of the add-on |
| node_match_expressions | No | List<Object> (Table 7) | Node affinity configuration of the add-on |

Table 4 Data structure of the resources field

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| limitsCpu | Yes | String | CPU size limit (unit: m) |
| limitsMem | Yes | String | Memory size limit (unit: Mi) |
| name | Yes | String | Add-on name. The value is fixed at custom-resources. |
| requestsCpu | Yes | String | Requested CPU size (unit: m) |
| requestsMem | Yes | String | Requested memory size (unit: Mi) |
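The unit conventions in Table 4 can be checked on the client side before a request is sent. The following is a minimal sketch and not part of the CCE API: the helper names `parse_cpu_milli`, `parse_mem_mi`, and `check_resources` are hypothetical, and the sketch assumes values are written with the `m` and `Mi` suffixes as in the example request. The "request must not exceed limit" check reflects the general Kubernetes rule for container resources.

```python
import re

# Fields marked mandatory in Table 4.
REQUIRED_FIELDS = ("limitsCpu", "limitsMem", "name", "requestsCpu", "requestsMem")


def parse_cpu_milli(value: str) -> int:
    """Parse a CPU quantity such as "100m" into millicores (assumed format)."""
    match = re.fullmatch(r"(\d+)m", value)
    if match is None:
        raise ValueError(f"expected a CPU value like '100m', got {value!r}")
    return int(match.group(1))


def parse_mem_mi(value: str) -> int:
    """Parse a memory quantity such as "300Mi" into mebibytes (assumed format)."""
    match = re.fullmatch(r"(\d+)Mi", value)
    if match is None:
        raise ValueError(f"expected a memory value like '300Mi', got {value!r}")
    return int(match.group(1))


def check_resources(entry: dict) -> None:
    """Raise ValueError if a resources entry is incomplete or inconsistent."""
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    if parse_cpu_milli(entry["requestsCpu"]) > parse_cpu_milli(entry["limitsCpu"]):
        raise ValueError("requestsCpu exceeds limitsCpu")
    if parse_mem_mi(entry["requestsMem"]) > parse_mem_mi(entry["limitsMem"]):
        raise ValueError("requestsMem exceeds limitsMem")
```

Calling `check_resources` on each element of `flavor.resources` catches a missing field or a swapped request and limit before the API rejects the request.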

Table 5 Data structure of the npc field

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| maxTaintedNode | Yes | String or Int | Maximum number of nodes that NPC can taint when the same fault occurs on multiple nodes. This limits the impact of the fault. The value can be an integer or a percentage. |
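To illustrate the two accepted formats of maxTaintedNode, the sketch below converts either form into a node count. It is an illustration only: the function name `resolve_max_tainted_nodes` is hypothetical, and the assumptions that a percentage applies to the total node count and rounds down are not stated in this document, so the actual NPC behavior may differ.

```python
def resolve_max_tainted_nodes(max_tainted_node, total_nodes: int) -> int:
    """Turn an int or "N%" value into a node count.

    Assumption (not confirmed by the add-on documentation): a percentage
    is taken of total_nodes and rounded down.
    """
    if isinstance(max_tainted_node, bool):
        raise TypeError("maxTaintedNode must be an int or a string")
    if isinstance(max_tainted_node, int):
        count = max_tainted_node
    else:
        text = str(max_tainted_node).strip()
        if text.endswith("%"):
            percent = int(text[:-1])
            if not 0 <= percent <= 100:
                raise ValueError(f"percentage out of range: {text}")
            count = total_nodes * percent // 100
        else:
            count = int(text)
    if count < 0:
        raise ValueError("maxTaintedNode must not be negative")
    # Never report more nodes than the cluster has.
    return min(count, total_nodes)
```

Under these assumptions, `"10%"` in a 50-node cluster allows at most 5 nodes to be tainted, while the integer `3` allows 3 regardless of cluster size.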

Table 6 Taints and tolerations

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| key | No | String | Taint key |
| effect | No | String | Taint policy |
| operator | No | String | Operator |
| tolerationSeconds | No | Int | Toleration time window |
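Because every field in Table 6 is optional, a toleration entry only needs the fields that are actually set. A minimal sketch of a builder that follows this rule is shown below. The function name `make_toleration` is hypothetical, and the sample values are taken from the example request.

```python
from typing import Optional


def make_toleration(
    key: Optional[str] = None,
    operator: Optional[str] = None,
    effect: Optional[str] = None,
    toleration_seconds: Optional[int] = None,
) -> dict:
    """Build one tolerations entry, leaving out fields that were not given."""
    fields = {
        "key": key,
        "operator": operator,
        "effect": effect,
        "tolerationSeconds": toleration_seconds,
    }
    return {name: value for name, value in fields.items() if value is not None}


# Same entry as the first toleration in the example request.
not_ready = make_toleration(
    key="node.kubernetes.io/not-ready",
    operator="Exists",
    effect="NoExecute",
    toleration_seconds=60,
)
```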

Table 7 nodeMatchExpression node affinity

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| key | No | String | Node label key |
| values | No | List<String> | Node label values |
| operator | No | String | Operator |

Example Request

{
    "kind": "Addon",
    "apiVersion": "v3",
    "metadata": {
        "annotations": {
            "addon.install/type": "install"
        }
    },
    "spec": {
        "clusterID": "b78fb690-b82c-11ee-83cf-0255ac100b0f",
        "version": "1.18.48",
        "addonTemplateName": "npd",
        "values": {
            "basic": {
                "image_version": "1.18.48",
                "swr_addr": "***",
                "swr_user": "***",
                "rbac_enabled": true,
                "cluster_version": "v1.23"
            },
            "flavor": {
                "description": "custom resources",
                "name": "custom-resources",
                "replicas": 2,
                "resources": [
                    {
                        "limitsCpu": "100m",
                        "limitsMem": "300Mi",
                        "name": "node-problem-controller",
                        "requestsCpu": "30m",
                        "requestsMem": "100Mi"
                    },
                    {
                        "limitsCpu": "100m",
                        "limitsMem": "300Mi",
                        "name": "node-problem-detector",
                        "requestsCpu": "30m",
                        "requestsMem": "100Mi"
                    }
                ],
                "category": [
                    "CCE",
                    "Turbo"
                ]
            },
            "custom": {
                "annotations": {},
                "common": {},
                "feature_gates": "",
                "multiAZBalance": false,
                "multiAZEnabled": false,
                "node_match_expressions": [],
                "npc": {
                    "maxTaintedNode": "10%"
                },
                "tolerations": [
                    {
                        "key": "node.kubernetes.io/not-ready",
                        "operator": "Exists",
                        "effect": "NoExecute",
                        "tolerationSeconds": 60
                    },
                    {
                        "key": "node.kubernetes.io/unreachable",
                        "operator": "Exists",
                        "effect": "NoExecute",
                        "tolerationSeconds": 60
                    }
                ]
            }
        }
    }
}
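A request body like the one above can also be assembled programmatically. The sketch below only builds and serializes the JSON. It does not send anything, and the endpoint and authentication are outside its scope. The function name `build_npd_install_request` and its parameters are hypothetical, the `basic` block is left out on the basis of Table 1 (which says it does not need to be specified), and the flavor name and replica count simply mirror the example request.

```python
import json
from typing import Optional


def build_npd_install_request(
    cluster_id: str,
    version: str,
    resources: list,
    max_tainted_node="10%",
    tolerations: Optional[list] = None,
    node_match_expressions: Optional[list] = None,
    replicas: int = 2,
) -> dict:
    """Assemble an NPD add-on installation body shaped like the example request."""
    return {
        "kind": "Addon",
        "apiVersion": "v3",
        "metadata": {"annotations": {"addon.install/type": "install"}},
        "spec": {
            "clusterID": cluster_id,
            "version": version,
            "addonTemplateName": "npd",
            "values": {
                # Mandatory per Table 1; field values mirror the example request.
                "flavor": {
                    "description": "custom resources",
                    "name": "custom-resources",
                    "replicas": replicas,
                    "resources": resources,
                },
                # Mandatory per Table 1; npc is the only mandatory field in Table 3.
                "custom": {
                    "multiAZBalance": False,
                    "multiAZEnabled": False,
                    "node_match_expressions": node_match_expressions or [],
                    "npc": {"maxTaintedNode": max_tainted_node},
                    "tolerations": tolerations or [],
                },
            },
        },
    }


body = build_npd_install_request(
    cluster_id="b78fb690-b82c-11ee-83cf-0255ac100b0f",
    version="1.18.48",
    resources=[
        {
            "limitsCpu": "100m",
            "limitsMem": "300Mi",
            "name": "node-problem-detector",
            "requestsCpu": "30m",
            "requestsMem": "100Mi",
        }
    ],
)
payload = json.dumps(body, indent=4)
```

Keeping the body as a plain dictionary until the last step makes it easy to adjust individual fields, such as `maxTaintedNode` or the tolerations list, before serializing.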