Volcano Scheduler
Add-on Overview
Volcano is a batch scheduling platform based on Kubernetes. It provides a series of features required by machine learning, deep learning, bioinformatics, genomics, and other big data applications, as a powerful supplement to Kubernetes capabilities.
Add-on Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
swr_addr | Yes | String | Add-on download address, which does not need to be specified |
swr_user | Yes | String | User who can download the add-on. This parameter does not need to be specified. |
platform | Yes | String | Add-on platform, which does not need to be specified |
ecsEndpoint | Yes | String | ECS address, which does not need to be specified |
xccsEndpoint | | String | XCCS service address, which does not need to be specified |
Parameter | Mandatory | Type | Description |
---|---|---|---|
description | No | String | Add-on description |
name | Yes | String | Add-on specification name |
replicas | Yes | String | Number of pods. The default value is 2. |
resources | Yes | resources object | Container resource (CPU and memory) quotas |
Parameter | Mandatory | Type | Description |
---|---|---|---|
multiAZEnabled | No | Bool | Whether to enable multi-AZ deployment for the add-on. The default value is false. |
controller_kube_api_qps | No | int | API server QPS of the controller component. The default value is 200. |
scheduler_kube_api_qps | No | int | API server QPS of the scheduler component. The default value is 200. |
admission_kube_api_qps | No | int | API server QPS of the admission component. The default value is 200. |
update_pod_status_qps | No | int | QPS for updating the pod status. The default value is 200. |
admissions | No | string | Webhooks supported by Volcano |
colocation_enable | No | string | Whether hybrid deployment is supported |
oversubscription_ratio | No | int | Dynamic oversubscription ratio. The default value is 60. |
oversubscription_method | No | string | Method of calculating oversubscribed resources. The options are nodeResource and podProfile. nodeResource is the default algorithm based on node resource usage, and podProfile is the algorithm based on pod profiling. By default, nodeResource is used. |
oversubscription_profile_period | No | int | Interval for pod profiling, in seconds |
workload_balancer_third_party_types | No | string | Character string consisting of group, version, and kind of a third-party workload |
workload_balancer_score_annotation_key | No | string | Used to specify the score annotation key of a pod |
node_match_expressions | No | | Expressions for matching the Volcano Scheduler pods to nodes |
tolerations | No | | The format is the same as that of Kubernetes tolerations. It specifies the node taints that Volcano Scheduler pods tolerate. |
oversubscription_ratio | No | int | Node resource overcommitment ratio in the Volcano scheduling environment |
descheduler_enable | No | Bool | Whether rescheduling is supported |
enable_workload_balancer | No | Bool | Whether workload balancing is enabled |
default_scheduler_conf | Yes | yaml | The format is the same as that of the YAML for Volcano. |
deschedulerPolicy | No | yaml | The format is the same as that of the YAML for Volcano descheduling configuration. |
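The defaults listed in the table above can be collected into a small helper. The following is a minimal Python sketch, not part of the add-on API: the name `build_custom_values` is hypothetical, and it assumes these parameters sit under the `custom` object, as they do in the example request. Only defaults stated in the table are included.

```python
# Defaults copied from the parameter table; parameters without a stated
# default are left out and must be supplied by the caller if needed.
CUSTOM_DEFAULTS = {
    "multiAZEnabled": False,
    "controller_kube_api_qps": 200,
    "scheduler_kube_api_qps": 200,
    "admission_kube_api_qps": 200,
    "update_pod_status_qps": 200,
    "oversubscription_ratio": 60,
    "oversubscription_method": "nodeResource",
}

# The two calculation methods named in the table.
OVERSUBSCRIPTION_METHODS = {"nodeResource", "podProfile"}

QPS_KEYS = (
    "controller_kube_api_qps",
    "scheduler_kube_api_qps",
    "admission_kube_api_qps",
    "update_pod_status_qps",
)


def build_custom_values(**overrides):
    """Hypothetical helper: merge caller overrides onto the documented defaults."""
    values = {**CUSTOM_DEFAULTS, **overrides}
    method = values["oversubscription_method"]
    if method not in OVERSUBSCRIPTION_METHODS:
        raise ValueError(f"unsupported oversubscription_method: {method!r}")
    for key in QPS_KEYS:
        qps = values[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(qps, bool) or not isinstance(qps, int) or qps <= 0:
            raise ValueError(f"{key} must be a positive integer")
    return values
```

For example, `build_custom_values(oversubscription_method="podProfile", oversubscription_profile_period=300)` keeps every QPS default at 200 and switches the calculation method to pod profiling.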
Parameter | Mandatory | Type | Description |
---|---|---|---|
limitsCpu | Yes | String | CPU limit (unit: m). The default values vary by component. |
limitsMem | Yes | String | Memory limit (unit: Mi). The default values vary by component. |
name | Yes | String | Add-on name |
requestsCpu | Yes | String | Requested CPU (unit: m). The default values vary by component. |
requestsMem | Yes | String | Requested memory (unit: Mi). The default values vary by component. |
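To illustrate the units in the table above, here is a minimal Python sketch that builds one entry of the `resources` list and checks that CPU values end in `m` and memory values end in `Mi`. The helper name `resource_entry` is hypothetical. The check that a request does not exceed its limit follows the general Kubernetes rule for container resources, and placing `replicas` inside the entry follows the example request rather than the table.

```python
import re

_CPU = re.compile(r"^(\d+)m$")   # millicores, for example "500m"
_MEM = re.compile(r"^(\d+)Mi$")  # mebibytes, for example "500Mi"


def _parse(pattern, text, field):
    """Return the numeric part of a quantity, or raise if the unit is wrong."""
    match = pattern.match(text)
    if match is None:
        raise ValueError(f"{field}={text!r} does not use the expected unit")
    return int(match.group(1))


def resource_entry(name, requests_cpu, limits_cpu, requests_mem, limits_mem,
                   replicas=None):
    """Hypothetical helper: build one item of the resources list."""
    if _parse(_CPU, requests_cpu, "requestsCpu") > _parse(_CPU, limits_cpu, "limitsCpu"):
        raise ValueError("requestsCpu exceeds limitsCpu")
    if _parse(_MEM, requests_mem, "requestsMem") > _parse(_MEM, limits_mem, "limitsMem"):
        raise ValueError("requestsMem exceeds limitsMem")
    entry = {
        "name": name,
        "requestsCpu": requests_cpu,
        "limitsCpu": limits_cpu,
        "requestsMem": requests_mem,
        "limitsMem": limits_mem,
    }
    if replicas is not None:
        entry["replicas"] = replicas
    return entry
```

Calling `resource_entry("volcano-scheduler", "500m", "2000m", "500Mi", "2000Mi", replicas=2)` reproduces the first `resources` item of the example request.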
Parameter | Mandatory | Type | Description |
---|---|---|---|
key | No | String | Taint key |
effect | No | String | Taint effect |
operator | No | String | Operator |
tolerationSeconds | No | Int | Toleration time window |
Parameter | Mandatory | Type | Description |
---|---|---|---|
key | No | String | Taint key |
values | No | List<String> | Node affinity name |
operator | No | String | Operator |
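The two tables above describe the items of `tolerations` and `node_match_expressions`. A minimal Python sketch of both shapes follows. The helper names are hypothetical, and because the example request leaves `node_match_expressions` empty, the shape of a match expression is inferred from the table fields only. The `In` operator and the `kubernetes.io/os` label in the usage note are standard Kubernetes values and are assumed, not confirmed, to be accepted here.

```python
def toleration(key, effect, operator="Exists", toleration_seconds=None):
    """Hypothetical helper: one tolerations item, in Kubernetes toleration format."""
    entry = {"effect": effect, "key": key, "operator": operator}
    # tolerationSeconds is optional; the example request sets it only
    # for NoExecute taints.
    if toleration_seconds is not None:
        entry["tolerationSeconds"] = toleration_seconds
    return entry


def match_expression(key, operator, values=()):
    """Hypothetical helper: one node_match_expressions item (shape assumed)."""
    return {"key": key, "operator": operator, "values": list(values)}
```

For instance, `toleration("node.kubernetes.io/not-ready", "NoExecute", toleration_seconds=60)` yields the first toleration of the example request, and `match_expression("kubernetes.io/os", "In", ["linux"])` sketches an expression that would restrict the pods to Linux nodes.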
Example Request
{
  "kind": "Addon",
  "apiVersion": "v3",
  "metadata": {
    "annotations": {"addon.install/type": "install"}
  },
  "spec": {
    "clusterID": "ad24dc34-******-0255ac100030",
    "version": "1.16.8",
    "addonTemplateName": "volcano",
    "values": {
      "basic": {
        "ecsEndpoint": "x.x.x.x",
        "platform": "linux-amd64",
        "swr_addr": "swr.*******.com",
        "swr_user": "hwofficial"
      },
      "flavor": {
        "description": "For 50 nodes, 5000 pods in cluster",
        "name": "Node50",
        "resources": [
          {"name": "volcano-scheduler", "limitsCpu": "2000m", "requestsCpu": "500m", "replicas": 2, "limitsMem": "2000Mi", "requestsMem": "500Mi"},
          {"name": "volcano-controller", "limitsCpu": "2000m", "requestsCpu": "500m", "replicas": 2, "limitsMem": "2000Mi", "requestsMem": "500Mi"},
          {"name": "volcano-admission", "limitsCpu": "500m", "requestsCpu": "200m", "replicas": 2, "limitsMem": "500Mi", "requestsMem": "500Mi"},
          {"limitsCpu": "200m", "limitsMem": "200Mi", "name": "volcano-agent", "requestsCpu": "100m", "requestsMem": "150Mi"},
          {"limitsCpu": "100m", "limitsMem": "100Mi", "name": "resource-exporter", "requestsCpu": "50m", "requestsMem": "50Mi"},
          {"limitsCpu": "1000m", "limitsMem": "512Mi", "name": "volcano-descheduler", "replicas": 2, "requestsCpu": "500m", "requestsMem": "256Mi"},
          {"limitsCpu": "500m", "limitsMem": "1000Mi", "name": "volcano-recommender", "replicas": 2, "requestsCpu": "300m", "requestsMem": "500Mi"},
          {"limitsCpu": "300m", "limitsMem": "300Mi", "name": "volcano-recommender-prometheus-adapter", "replicas": 2, "requestsCpu": "200m", "requestsMem": "200Mi"}
        ],
        "size": "small",
        "category": ["CCE", "Turbo"]
      },
      "custom": {
        "admission_kube_api_qps": 200,
        "admissions": "/jobs/mutate,/jobs/validate,/podgroups/mutate,/pods/validate,/pods/mutate,/queues/mutate,/queues/validate,/eas/pods/mutate,/eas/pods/validate,/npu/jobs/validate,/resource/validate,/resource/mutate,/workloadbalancer/balancer/validate,/workloadbalancer/balancerpolicytemplate/validate",
        "colocation_enable": "false",
        "controller_kube_api_qps": 200,
        "default_scheduler_conf": {
          "actions": "allocate, backfill, preempt",
          "metrics": {"interval": "30s", "type": ""},
          "tiers": [
            {"plugins": [
              {"name": "priority"},
              {"enableJobStarving": false, "enablePreemptable": false, "name": "gang"},
              {"name": "conformance"}
            ]},
            {"plugins": [
              {"enablePreemptable": false, "name": "drf"},
              {"name": "predicates"},
              {"name": "nodeorder"}
            ]},
            {"plugins": [
              {"name": "cce-gpu-topology-predicate"},
              {"name": "cce-gpu-topology-priority"},
              {"name": "xgpu"}
            ]},
            {"plugins": [
              {"name": "nodelocalvolume"},
              {"name": "nodeemptydirvolume"},
              {"name": "nodeCSIscheduling"},
              {"name": "networkresource"}
            ]}
          ]
        },
        "deschedulerPolicy": {
          "profiles": [
            {
              "name": "ProfileName",
              "pluginConfig": [
                {"args": {"nodeFit": true}, "name": "DefaultEvictor"},
                {
                  "args": {
                    "evictableNamespaces": {"exclude": ["kube-system"]},
                    "thresholds": {"cpu": 20, "memory": 20}
                  },
                  "name": "HighNodeUtilization"
                },
                {
                  "args": {
                    "evictableNamespaces": {"exclude": ["kube-system"]},
                    "metrics": {"type": "prometheus_adaptor"},
                    "nodeFit": true,
                    "targetThresholds": {"cpu": 80, "memory": 85},
                    "thresholds": {"cpu": 30, "memory": 30}
                  },
                  "name": "LoadAware"
                }
              ],
              "plugins": {"balance": {"enabled": null}}
            }
          ]
        },
        "descheduler_enable": "false",
        "deschedulingInterval": "10m",
        "enable_workload_balancer": false,
        "multiAZEnabled": false,
        "node_match_expressions": [],
        "oversubscription_method": "nodeResource",
        "oversubscription_profile_period": 300,
        "oversubscription_ratio": 60,
        "scheduler_kube_api_qps": 200,
        "tolerations": [
          {"effect": "NoExecute", "key": "node.kubernetes.io/not-ready", "operator": "Exists", "tolerationSeconds": 60},
          {"effect": "NoExecute", "key": "node.kubernetes.io/unreachable", "operator": "Exists", "tolerationSeconds": 60},
          {"effect": "NoSchedule", "key": "node.cilium.io/agent-not-ready", "operator": "Exists"}
        ],
        "update_pod_status_qps": 50,
        "workload_balancer_score_annotation_key": "",
        "workload_balancer_third_party_types": "",
        "multiAZBalance": false
      }
    }
  }
}
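The outer structure of the request above can be assembled programmatically. The sketch below is a minimal Python illustration: the function name `build_addon_request` is hypothetical, the fixed fields are copied from the example request, and the endpoint and authentication needed to actually send the body are outside the scope of this section and are not shown.

```python
import json


def build_addon_request(cluster_id, version, basic, flavor, custom):
    """Hypothetical helper: wrap the three value groups in the Addon envelope."""
    return {
        "kind": "Addon",
        "apiVersion": "v3",
        "metadata": {"annotations": {"addon.install/type": "install"}},
        "spec": {
            "clusterID": cluster_id,
            "version": version,
            "addonTemplateName": "volcano",
            "values": {"basic": basic, "flavor": flavor, "custom": custom},
        },
    }


if __name__ == "__main__":
    # Placeholder arguments; real values come from the tables above.
    body = build_addon_request("<cluster-id>", "1.16.8", {}, {}, {})
    print(json.dumps(body, indent=2))
```

Keeping `basic`, `flavor`, and `custom` as separate arguments mirrors the three groups under `values` in the example request.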