Volcano Scheduler

Add-on Overview

Volcano is a batch scheduling platform based on Kubernetes. It provides a series of features required by machine learning, deep learning, bioinformatics, genomics, and other big data applications, as a powerful supplement to Kubernetes capabilities.

Add-on Parameters

Table 1 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| basic | No | Table 2 object | Basic configuration parameters, which do not need to be specified |
| flavor | Yes | Table 3 object | Flavor parameters |
| custom | Yes | Table 4 object | Custom parameters |

Table 2 Configuration of basic

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| swr_addr | Yes | String | Add-on download address, which does not need to be specified |
| swr_user | Yes | String | User who can download the add-on, which does not need to be specified |
| platform | Yes | String | Add-on platform, which does not need to be specified |
| ecsEndpoint | Yes | String | ECS address, which does not need to be specified |
| xccsEndpoint | Yes for add-on versions ≥ 1.16.11; No for add-on versions < 1.16.11 | String | XCCS service address, which does not need to be specified |

Table 3 Configuration of flavor

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| description | No | String | Add-on description |
| name | Yes | String | Add-on specification name. Add-on versions ≥ 1.14.7: Node50, Node200, Node1000, and custom-resources. Add-on versions earlier than 1.14.7: HA, Single, and custom-resources. |
| replicas | Yes | String | Number of pods. The default value is 2. |
| resources | Yes | resources object | Container resource (CPU and memory) quotas |
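The version-dependent values of name can be checked before a request is sent. The following is a minimal sketch, not part of the add-on API: the helper names parse_version, allowed_flavor_names, and check_flavor_name are hypothetical, and the version thresholds are taken from the table above.

```python
def parse_version(version: str) -> tuple:
    """Convert a dotted version such as "1.16.8" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def allowed_flavor_names(addon_version: str) -> list:
    """Return the specification names listed for this add-on version."""
    if parse_version(addon_version) >= (1, 14, 7):
        return ["Node50", "Node200", "Node1000", "custom-resources"]
    return ["HA", "Single", "custom-resources"]


def check_flavor_name(addon_version: str, name: str) -> None:
    """Raise ValueError if `name` is not listed for the given add-on version."""
    allowed = allowed_flavor_names(addon_version)
    if name not in allowed:
        raise ValueError(
            f"{name!r} is not listed for add-on {addon_version}; expected one of {allowed}"
        )


check_flavor_name("1.16.8", "Node50")  # passes silently
```

Versions are compared as integer tuples rather than strings, because a plain string comparison would order "1.9.0" after "1.14.7".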

Table 4 Configuration of custom

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| multiAZEnabled | No | Bool | Whether to enable multi-AZ deployment for the add-on. The default value is false. true: Volcano pods are deployed in different AZs based on the hard anti-affinity policy. false: Volcano pods are deployed in multiple AZs based on the soft anti-affinity policy. |
| controller_kube_api_qps | No | Int | API server QPS of the controller component. The default value is 200. |
| scheduler_kube_api_qps | No | Int | API server QPS of the scheduler component. The default value is 200. |
| admission_kube_api_qps | No | Int | API server QPS of the admission component. The default value is 200. |
| update_pod_status_qps | No | Int | QPS for updating the pod status. The default value is 200. |
| admissions | No | String | Webhooks supported by Volcano |
| colocation_enable | No | String | Whether hybrid deployment is supported |
| oversubscription_ratio | No | Int | Dynamic oversubscription ratio, that is, the node resource overcommitment ratio in the Volcano scheduling environment. The default value is 60. |
| oversubscription_method | No | String | Method of calculating oversubscribed resources. The options are nodeResource (default), which is based on node resource usage, and podProfile, which is based on pod profiling. |
| oversubscription_profile_period | No | Int | Interval for pod profiling, in seconds |
| workload_balancer_third_party_types | No | String | String consisting of the group, version, and kind of a third-party workload |
| workload_balancer_score_annotation_key | No | String | Score annotation key of a pod |
| node_match_expressions | No | Array of Table 7 objects | Expressions for matching the Volcano Scheduler pods to nodes |
| tolerations | No | Array of Table 6 objects | Tolerations added to Volcano Scheduler pods so that the pods can be scheduled onto nodes with matching taints. The format is the same as that of Kubernetes tolerations. |
| descheduler_enable | No | Bool | Whether rescheduling is supported |
| enable_workload_balancer | No | Bool | Whether the workload balancer is enabled |
| default_scheduler_conf | Yes | YAML | Scheduler configuration. The format is the same as that of the YAML for Volcano. |
| deschedulerPolicy | No | YAML | Descheduling policy. The format is the same as that of the YAML for Volcano descheduling configuration. |
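The defaults documented for custom can be collected in one place and overridden per request. The following is a minimal sketch under stated assumptions: DOCUMENTED_DEFAULTS and build_custom are hypothetical names, only the parameters whose defaults are stated in Table 4 are included, and the mandatory default_scheduler_conf must still be supplied by the caller.

```python
# Defaults as stated in Table 4; parameters without a documented default are omitted.
DOCUMENTED_DEFAULTS = {
    "multiAZEnabled": False,
    "controller_kube_api_qps": 200,
    "scheduler_kube_api_qps": 200,
    "admission_kube_api_qps": 200,
    "update_pod_status_qps": 200,
    "oversubscription_ratio": 60,
    "oversubscription_method": "nodeResource",
}

# The two calculation methods named in Table 4.
OVERSUBSCRIPTION_METHODS = {"nodeResource", "podProfile"}


def build_custom(default_scheduler_conf: dict, overrides: dict = None) -> dict:
    """Merge caller overrides into the documented defaults and validate the result."""
    custom = {**DOCUMENTED_DEFAULTS, **(overrides or {})}
    if custom["oversubscription_method"] not in OVERSUBSCRIPTION_METHODS:
        raise ValueError(
            f"oversubscription_method must be one of {sorted(OVERSUBSCRIPTION_METHODS)}"
        )
    # default_scheduler_conf is the only mandatory field in Table 4.
    custom["default_scheduler_conf"] = default_scheduler_conf
    return custom


custom = build_custom(
    {"actions": "allocate, backfill, preempt"},
    overrides={"update_pod_status_qps": 50},
)
```

Here the override lowers update_pod_status_qps to 50 while every other listed parameter keeps its documented default.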

Table 5 Data structure of the resources field

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| limitsCpu | Yes | String | CPU size limit (unit: m). The default values vary by component. |
| limitsMem | Yes | String | Memory size limit (unit: Mi). The default values vary by component. |
| name | Yes | String | Add-on component name, for example, volcano-scheduler |
| requestsCpu | Yes | String | Requested CPU size (unit: m). The default values vary by component. |
| requestsMem | Yes | String | Requested memory size (unit: Mi). The default values vary by component. |
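Because the four quota fields are strings that carry their unit, it can help to build them from plain numbers. The following is a minimal sketch: resource_entry is a hypothetical helper, and the check that requests do not exceed limits follows the general Kubernetes rule rather than anything stated in this table.

```python
def resource_entry(name: str, requests_cpu_m: int, limits_cpu_m: int,
                   requests_mem_mi: int, limits_mem_mi: int) -> dict:
    """Build one element of the resources list with the units from Table 5."""
    # Assumption: as in Kubernetes, a request larger than its limit is invalid.
    if requests_cpu_m > limits_cpu_m:
        raise ValueError("requestsCpu must not exceed limitsCpu")
    if requests_mem_mi > limits_mem_mi:
        raise ValueError("requestsMem must not exceed limitsMem")
    return {
        "name": name,
        "requestsCpu": f"{requests_cpu_m}m",    # millicores
        "limitsCpu": f"{limits_cpu_m}m",
        "requestsMem": f"{requests_mem_mi}Mi",  # mebibytes
        "limitsMem": f"{limits_mem_mi}Mi",
    }


scheduler = resource_entry("volcano-scheduler", 500, 2000, 500, 2000)
```

The resulting dictionary has the same field values as the volcano-scheduler entry in the example request, apart from replicas.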

Table 6 Taints and tolerations

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| key | No | String | Taint key |
| effect | No | String | Taint effect |
| operator | No | String | Operator |
| tolerationSeconds | No | Int | Toleration time window |
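A toleration entry can be assembled from the four fields above. The following is a minimal sketch: toleration is a hypothetical helper, and the list of effects and the rule that tolerationSeconds applies only to NoExecute come from Kubernetes toleration semantics, not from this table.

```python
# Taint effects defined by Kubernetes.
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def toleration(key: str, effect: str, operator: str = "Exists",
               toleration_seconds: int = None) -> dict:
    """Build one tolerations entry using the field names from Table 6."""
    if effect not in VALID_EFFECTS:
        raise ValueError(f"effect must be one of {sorted(VALID_EFFECTS)}")
    entry = {"key": key, "effect": effect, "operator": operator}
    if toleration_seconds is not None:
        # In Kubernetes, tolerationSeconds is only meaningful for NoExecute taints.
        if effect != "NoExecute":
            raise ValueError("tolerationSeconds requires effect NoExecute")
        entry["tolerationSeconds"] = toleration_seconds
    return entry


tolerations = [
    toleration("node.kubernetes.io/not-ready", "NoExecute", toleration_seconds=60),
    toleration("node.kubernetes.io/unreachable", "NoExecute", toleration_seconds=60),
    toleration("node.cilium.io/agent-not-ready", "NoSchedule"),
]
```

The three entries carry the same keys and values as the tolerations list in the example request.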

Table 7 nodeMatchExpression node affinity

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| key | No | String | Node label key |
| values | No | List<String> | Node label values to match |
| operator | No | String | Operator |
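How a match expression selects nodes can be illustrated with a small evaluator. The following is a sketch only: it assumes the operators behave like the Kubernetes node affinity operators In, NotIn, Exists, and DoesNotExist, which this page does not list, and it treats an empty expression list as no constraint, as the empty node_match_expressions in the example request suggests. The function names are hypothetical.

```python
def expression_matches(expression: dict, node_labels: dict) -> bool:
    """Evaluate one key/operator/values expression against a node's labels."""
    key = expression["key"]
    operator = expression["operator"]
    values = expression.get("values", [])
    if operator == "In":
        return key in node_labels and node_labels[key] in values
    if operator == "NotIn":
        # A node without the label also satisfies NotIn.
        return key not in node_labels or node_labels[key] not in values
    if operator == "Exists":
        return key in node_labels
    if operator == "DoesNotExist":
        return key not in node_labels
    raise ValueError(f"unsupported operator: {operator!r}")


def node_matches(expressions: list, node_labels: dict) -> bool:
    """All expressions must match; an empty list is treated as no constraint."""
    return all(expression_matches(e, node_labels) for e in expressions)


labels = {"kubernetes.io/os": "linux", "zone": "az1"}
linux_only = [{"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]}]
```

With these inputs, node_matches(linux_only, labels) evaluates the single In expression against the kubernetes.io/os label of the node.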

Example Request

{
  "kind": "Addon",
  "apiVersion": "v3",
  "metadata": {
    "annotations": {
      "addon.install/type": "install"
    }
  },
  "spec": {
    "clusterID": "ad24dc34-******-0255ac100030",
    "version": "1.16.8",
    "addonTemplateName": "volcano",
    "values": {
      "basic": {
        "ecsEndpoint": "x.x.x.x",
        "platform": "linux-amd64",
        "swr_addr": "swr.*******.com",
        "swr_user": "hwofficial"
      },
      "flavor": {
        "description": "For 50 nodes, 5000 pods in cluster",
        "name": "Node50",
        "resources": [
          {
            "name": "volcano-scheduler",
            "limitsCpu": "2000m",
            "requestsCpu": "500m",
            "replicas": 2,
            "limitsMem": "2000Mi",
            "requestsMem": "500Mi"
          },
          {
            "name": "volcano-controller",
            "limitsCpu": "2000m",
            "requestsCpu": "500m",
            "replicas": 2,
            "limitsMem": "2000Mi",
            "requestsMem": "500Mi"
          },
          {
            "name": "volcano-admission",
            "limitsCpu": "500m",
            "requestsCpu": "200m",
            "replicas": 2,
            "limitsMem": "500Mi",
            "requestsMem": "500Mi"
          },
          {
            "limitsCpu": "200m",
            "limitsMem": "200Mi",
            "name": "volcano-agent",
            "requestsCpu": "100m",
            "requestsMem": "150Mi"
          },
          {
            "limitsCpu": "100m",
            "limitsMem": "100Mi",
            "name": "resource-exporter",
            "requestsCpu": "50m",
            "requestsMem": "50Mi"
          },
          {
            "limitsCpu": "1000m",
            "limitsMem": "512Mi",
            "name": "volcano-descheduler",
            "replicas": 2,
            "requestsCpu": "500m",
            "requestsMem": "256Mi"
          },
          {
            "limitsCpu": "500m",
            "limitsMem": "1000Mi",
            "name": "volcano-recommender",
            "replicas": 2,
            "requestsCpu": "300m",
            "requestsMem": "500Mi"
          },
          {
            "limitsCpu": "300m",
            "limitsMem": "300Mi",
            "name": "volcano-recommender-prometheus-adapter",
            "replicas": 2,
            "requestsCpu": "200m",
            "requestsMem": "200Mi"
          }
        ],
        "size": "small",
        "category": [
          "CCE",
          "Turbo"
        ]
      },
      "custom": {
        "admission_kube_api_qps": 200,
        "admissions": "/jobs/mutate,/jobs/validate,/podgroups/mutate,/pods/validate,/pods/mutate,/queues/mutate,/queues/validate,/eas/pods/mutate,/eas/pods/validate,/npu/jobs/validate,/resource/validate,/resource/mutate,/workloadbalancer/balancer/validate,/workloadbalancer/balancerpolicytemplate/validate",
        "colocation_enable": "false",
        "controller_kube_api_qps": 200,
        "default_scheduler_conf": {
          "actions": "allocate, backfill, preempt",
          "metrics": {
            "interval": "30s",
            "type": ""
          },
          "tiers": [
            {
              "plugins": [
                {
                  "name": "priority"
                },
                {
                  "enableJobStarving": false,
                  "enablePreemptable": false,
                  "name": "gang"
                },
                {
                  "name": "conformance"
                }
              ]
            },
            {
              "plugins": [
                {
                  "enablePreemptable": false,
                  "name": "drf"
                },
                {
                  "name": "predicates"
                },
                {
                  "name": "nodeorder"
                }
              ]
            },
            {
              "plugins": [
                {
                  "name": "cce-gpu-topology-predicate"
                },
                {
                  "name": "cce-gpu-topology-priority"
                },
                {
                  "name": "xgpu"
                }
              ]
            },
            {
              "plugins": [
                {
                  "name": "nodelocalvolume"
                },
                {
                  "name": "nodeemptydirvolume"
                },
                {
                  "name": "nodeCSIscheduling"
                },
                {
                  "name": "networkresource"
                }
              ]
            }
          ]
        },
        "deschedulerPolicy": {
          "profiles": [
            {
              "name": "ProfileName",
              "pluginConfig": [
                {
                  "args": {
                    "nodeFit": true
                  },
                  "name": "DefaultEvictor"
                },
                {
                  "args": {
                    "evictableNamespaces": {
                      "exclude": [
                        "kube-system"
                      ]
                    },
                    "thresholds": {
                      "cpu": 20,
                      "memory": 20
                    }
                  },
                  "name": "HighNodeUtilization"
                },
                {
                  "args": {
                    "evictableNamespaces": {
                      "exclude": [
                        "kube-system"
                      ]
                    },
                    "metrics": {
                      "type": "prometheus_adaptor"
                    },
                    "nodeFit": true,
                    "targetThresholds": {
                      "cpu": 80,
                      "memory": 85
                    },
                    "thresholds": {
                      "cpu": 30,
                      "memory": 30
                    }
                  },
                  "name": "LoadAware"
                }
              ],
              "plugins": {
                "balance": {
                  "enabled": null
                }
              }
            }
          ]
        },
        "descheduler_enable": "false",
        "deschedulingInterval": "10m",
        "enable_workload_balancer": false,
        "multiAZEnabled": false,
        "node_match_expressions": [],
        "oversubscription_method": "nodeResource",
        "oversubscription_profile_period": 300,
        "oversubscription_ratio": 60,
        "scheduler_kube_api_qps": 200,
        "tolerations": [
          {
            "effect": "NoExecute",
            "key": "node.kubernetes.io/not-ready",
            "operator": "Exists",
            "tolerationSeconds": 60
          },
          {
            "effect": "NoExecute",
            "key": "node.kubernetes.io/unreachable",
            "operator": "Exists",
            "tolerationSeconds": 60
          },
          {
            "effect": "NoSchedule",
            "key": "node.cilium.io/agent-not-ready",
            "operator": "Exists"
          }
        ],
        "update_pod_status_qps": 50,
        "workload_balancer_score_annotation_key": "",
        "workload_balancer_third_party_types": "",
        "multiAZBalance": false
      }
    }
  }
}
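The envelope of the request above can also be assembled programmatically. The following is a minimal sketch: build_install_request is a hypothetical helper, the cluster ID is a placeholder, and the basic, flavor, and custom values are abbreviated, so the mandatory fields from Tables 3 and 4 would have to be completed for a real request. Sending the body is out of scope because this page does not give the endpoint.

```python
import json


def build_install_request(cluster_id: str, version: str,
                          basic: dict, flavor: dict, custom: dict) -> dict:
    """Wrap the three parameter groups in the Addon envelope shown in the example."""
    return {
        "kind": "Addon",
        "apiVersion": "v3",
        "metadata": {"annotations": {"addon.install/type": "install"}},
        "spec": {
            "clusterID": cluster_id,
            "version": version,
            "addonTemplateName": "volcano",
            "values": {"basic": basic, "flavor": flavor, "custom": custom},
        },
    }


body = build_install_request(
    cluster_id="<cluster-id>",  # placeholder, not a real cluster ID
    version="1.16.8",
    basic={},                   # basic parameters do not need to be specified
    flavor={"name": "Node50"},  # abbreviated
    custom={"default_scheduler_conf": {"actions": "allocate, backfill, preempt"}},
)
payload = json.dumps(body, indent=2)  # JSON text for the request body
```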