Commit ea90848

Feature: Add support for DRS in a Cluster (#7723)
This pull request (PR) implements a Distributed Resource Scheduler (DRS) for a CloudStack cluster. The primary objective of this feature is to enable automatic resource optimization and workload balancing within the cluster by live migrating VMs according to the configuration. Administrators can also execute DRS manually for a cluster, using the UI or the API.

Adds support for two algorithms - condensed & balanced. Algorithms are pluggable, allowing ACS administrators to have customized control over scheduling.

Implementation

There are three top-level components:

Scheduler
A timer task which:
- Generates DRS plans for clusters
- Processes DRS plans
- Removes old DRS plan records

DRS Execution
We go through each VM in the cluster and use the specified algorithm to check whether DRS is required and to calculate the cost, benefit & improvement of migrating that VM to another host in the cluster. On the basis of cost, benefit & improvement, the best migration is selected for the current iteration and the VM is migrated. The maximum number of iterations (live migrations) possible on the cluster is defined by drs.iterations, which is a percentage (as a value between 0 and 1) of the total number of workloads.

Algorithm
Every algorithm implements two methods:
- needsDrs - checks whether DRS is required for the cluster
- getMetrics - calculates the cost, benefit & improvement of migrating a VM to another host

Algorithms
- Condensed - Packs all the VMs on the minimum number of hosts in the cluster.
- Balanced - Distributes the VMs evenly across hosts in the cluster.
Algorithms use drs.level to decide the amount of imbalance to allow in the cluster.

APIs Added

listClusterDrsPlan
- id - ID of the DRS plan to list
- clusterid - to list plans for a cluster id

generateClusterDrsPlan
- id - cluster id
- iterations - The maximum number of iterations in a DRS job, defined as a percentage (as a value between 0 and 1) of the total number of workloads. Defaults to the value of the cluster's drs.iterations setting.

executeClusterDrsPlan
- id - ID of the cluster for which the DRS plan is to be executed.
- migrateto - This parameter specifies the mapping between a VM and a host to migrate that VM to. Format of this parameter: migrateto[vm-index].vm=<uuid>&migrateto[vm-index].host=<uuid>.

Config Keys Added

ClusterDrsPlanExpireInterval
  Key: drs.plan.expire.interval
  Scope: Global
  Default Value: 30 days
  Description: The interval in days after which old DRS records will be cleaned up.

ClusterDrsEnabled
  Key: drs.automatic.enable
  Scope: Cluster
  Default Value: false
  Description: Enable/disable automatic DRS on a cluster.

ClusterDrsInterval
  Key: drs.automatic.interval
  Scope: Cluster
  Default Value: 60 minutes
  Description: The interval in minutes after which a periodic background thread will schedule DRS for a cluster.

ClusterDrsIterations
  Key: drs.max.migrations
  Scope: Cluster
  Default Value: 50
  Description: Maximum number of live migrations in a DRS execution.

ClusterDrsAlgorithm
  Key: drs.algorithm
  Scope: Cluster
  Default Value: condensed
  Description: DRS algorithm to execute on the cluster. This PR implements two algorithms - balanced & condensed.

ClusterDrsLevel
  Key: drs.imbalance
  Scope: Cluster
  Default Value: 0.5
  Description: Percentage (as a value between 0.0 and 1.0) of imbalance allowed in the cluster. 1.0 means no imbalance is allowed and 0.0 means imbalance is allowed.

ClusterDrsMetric
  Key: drs.imbalance.metric
  Scope: Cluster
  Default Value: memory
  Description: The cluster imbalance metric to use when checking the drs.imbalance.threshold. Possible values are memory and cpu.
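The DRS execution loop described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the class `DrsLoopSketch`, the `Metrics` holder, the `Algorithm` signatures, the `spread` imbalance measure and the `TOY_BALANCED` algorithm are all hypothetical, and VMs and hosts are reduced to plain strings. Only the overall shape (check `needsDrs`, score each candidate with `getMetrics`, apply the best migration, stop at a cap derived from a 0..1 fraction of the workloads) follows the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are CloudStack classes.
final class DrsLoopSketch {

    /** Cost, benefit and improvement of one candidate migration. */
    static final class Metrics {
        final double cost;
        final double benefit;
        final double improvement;

        Metrics(double cost, double benefit, double improvement) {
            this.cost = cost;
            this.benefit = benefit;
            this.improvement = improvement;
        }
    }

    /** Assumed shape of the two-method algorithm contract (signatures are invented). */
    interface Algorithm {
        boolean needsDrs(Map<String, String> vmToHost, List<String> hosts);

        Metrics getMetrics(Map<String, String> vmToHost, List<String> hosts, String vm, String destHost);
    }

    /** Iteration cap: a fraction (0..1) of the number of workloads, rounded down. */
    static int maxMigrations(int vmCount, double fraction) {
        return (int) Math.floor(vmCount * fraction);
    }

    /** Toy imbalance measure: VM count on the fullest host minus the emptiest host. */
    static int spread(Map<String, String> vmToHost, List<String> hosts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String host : hosts) {
            counts.put(host, 0);
        }
        for (String host : vmToHost.values()) {
            counts.merge(host, 1, Integer::sum);
        }
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (int count : counts.values()) {
            max = Math.max(max, count);
            min = Math.min(min, count);
        }
        return max - min;
    }

    /** Toy "balanced"-style algorithm: improvement is the reduction in spread. */
    static final Algorithm TOY_BALANCED = new Algorithm() {
        @Override
        public boolean needsDrs(Map<String, String> vmToHost, List<String> hosts) {
            return spread(vmToHost, hosts) > 1;
        }

        @Override
        public Metrics getMetrics(Map<String, String> vmToHost, List<String> hosts, String vm, String destHost) {
            Map<String, String> after = new LinkedHashMap<>(vmToHost);
            after.put(vm, destHost);
            double improvement = spread(vmToHost, hosts) - spread(after, hosts);
            // Cost and benefit are placeholders here; the toy selection below only uses improvement.
            return new Metrics(1.0, improvement, improvement);
        }
    };

    /** Greedy plan: one best migration per iteration, applied to a simulated placement. */
    static List<String[]> plan(Map<String, String> placement, List<String> hosts, Algorithm algo, double fraction) {
        Map<String, String> state = new LinkedHashMap<>(placement);
        List<String[]> migrations = new ArrayList<>();
        int limit = maxMigrations(state.size(), fraction);
        for (int i = 0; i < limit && algo.needsDrs(state, hosts); i++) {
            String bestVm = null;
            String bestHost = null;
            double bestImprovement = 0.0;
            for (Map.Entry<String, String> entry : state.entrySet()) {
                for (String host : hosts) {
                    if (host.equals(entry.getValue())) {
                        continue; // already on this host
                    }
                    Metrics m = algo.getMetrics(state, hosts, entry.getKey(), host);
                    if (m.improvement > bestImprovement) {
                        bestImprovement = m.improvement;
                        bestVm = entry.getKey();
                        bestHost = host;
                    }
                }
            }
            if (bestVm == null) {
                break; // no candidate improves the cluster
            }
            state.put(bestVm, bestHost);
            migrations.add(new String[] {bestVm, bestHost});
        }
        return migrations;
    }

    public static void main(String[] args) {
        Map<String, String> placement = new LinkedHashMap<>();
        for (int i = 1; i <= 4; i++) {
            placement.put("vm" + i, "h1");
        }
        for (String[] m : plan(placement, List.of("h1", "h2"), TOY_BALANCED, 0.5)) {
            System.out.println(m[0] + " -> " + m[1]);
        }
    }
}
```

Planning against a copied placement map rather than live state mirrors the idea of generating a plan first and executing it separately; how the actual implementation weighs cost against benefit is not shown here.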
1 parent 68e1d46 commit ea90848

60 files changed: +4227 -30 lines changed

.github/workflows/ci.yml

Lines changed: 2 additions & 1 deletion
@@ -59,7 +59,8 @@ jobs:
 smoke/test_domain_network_offerings
 smoke/test_domain_service_offerings
 smoke/test_domain_vpc_offerings",
-"smoke/test_dynamicroles
+"smoke/test_cluster_drs
+smoke/test_dynamicroles
 smoke/test_enable_account_settings_for_domain
 smoke/test_enable_role_based_users_in_projects
 smoke/test_events_resource

api/pom.xml

Lines changed: 5 additions & 0 deletions
@@ -36,6 +36,11 @@
             <groupId>com.google.code.gson</groupId>
             <artifactId>gson</artifactId>
         </dependency>
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-math3</artifactId>
+            <version>${cs.commons-math3.version}</version>
+        </dependency>
         <dependency>
             <groupId>javax.servlet</groupId>
             <artifactId>javax.servlet-api</artifactId>

api/src/main/java/com/cloud/event/EventTypes.java

Lines changed: 5 additions & 0 deletions
@@ -678,6 +678,11 @@ public class EventTypes {
     //Usage related events
     public static final String EVENT_USAGE_REMOVE_USAGE_RECORDS = "USAGE.REMOVE.USAGE.RECORDS";
 
+    // DRS Events
+    public static final String EVENT_CLUSTER_DRS = "CLUSTER.DRS";
+    public static final String EVENT_CLUSTER_DRS_GENERATE = "CLUSTER.DRS.GENERATE";
+
+
     // Netscaler Service Package events
     public static final String EVENT_NETSCALER_SERVICEPACKAGE_ADD = "NETSCALER.SERVICEPACKAGE.ADD";
     public static final String EVENT_NETSCALER_SERVICEPACKAGE_DELETE = "NETSCALER.SERVICEPACKAGE.DELETE";

api/src/main/java/com/cloud/server/ManagementService.java

Lines changed: 2 additions & 0 deletions
@@ -441,6 +441,8 @@ public interface ManagementService {
      */
     Ternary<Pair<List<? extends Host>, Integer>, List<? extends Host>, Map<Host, Boolean>> listHostsForMigrationOfVM(Long vmId, Long startIndex, Long pageSize, String keyword);
 
+    Ternary<Pair<List<? extends Host>, Integer>, List<? extends Host>, Map<Host, Boolean>> listHostsForMigrationOfVM(VirtualMachine vm, Long startIndex, Long pageSize, String keyword, List<VirtualMachine> vmList);
+
     /**
      * List storage pools for live migrating of a volume. The API returns list of all pools in the cluster to which the
      * volume can be migrated. Current pool is not included in the list. In case of vSphere datastore cluster storage pools,

api/src/main/java/com/cloud/vm/VirtualMachine.java

Lines changed: 3 additions & 0 deletions
@@ -315,6 +315,9 @@ public boolean isUsedBySystem() {
     @Override
     Long getHostId();
 
+
+    void setHostId(Long hostId);
+
     /**
      * @return should HA be enabled for this machine?
      */

api/src/main/java/com/cloud/vm/VmDetailConstants.java

Lines changed: 2 additions & 0 deletions
@@ -73,6 +73,8 @@ public interface VmDetailConstants {
 
     String CONFIG_DRIVE_LOCATION = "configDriveLocation";
 
+    String SKIP_DRS = "skipFromDRS";
+
     // VM import with nic, disk and custom params for custom compute offering
     String NIC = "nic";
     String NETWORK = "network";

api/src/main/java/org/apache/cloudstack/affinity/AffinityGroupProcessor.java

Lines changed: 8 additions & 0 deletions
@@ -21,8 +21,11 @@
 import com.cloud.deploy.DeploymentPlanner.ExcludeList;
 import com.cloud.exception.AffinityConflictException;
 import com.cloud.utils.component.Adapter;
+import com.cloud.vm.VirtualMachine;
 import com.cloud.vm.VirtualMachineProfile;
 
+import java.util.List;
+
 public interface AffinityGroupProcessor extends Adapter {
 
     /**
@@ -35,7 +38,12 @@ public interface AffinityGroupProcessor extends Adapter {
      *            deployment plan that tells you where it's being deployed to.
      * @param avoid
      *            avoid these data centers, pods, clusters, or hosts.
+     * @param vmList
+     *            list of virtual machines objects according to which the affinity group should be processed.
+     *            This can be used to process a theoretical state in some cases like generating DRS plans
      */
+    void process(VirtualMachineProfile vm, DeploymentPlan plan, ExcludeList avoid, List<VirtualMachine> vmList) throws AffinityConflictException;
+
     void process(VirtualMachineProfile vm, DeploymentPlan plan, ExcludeList avoid) throws AffinityConflictException;
 
     /**

api/src/main/java/org/apache/cloudstack/affinity/AffinityProcessorBase.java

Lines changed: 9 additions & 0 deletions
@@ -21,14 +21,23 @@
 import com.cloud.deploy.DeploymentPlanner.ExcludeList;
 import com.cloud.exception.AffinityConflictException;
 import com.cloud.utils.component.AdapterBase;
+import com.cloud.vm.VirtualMachine;
 import com.cloud.vm.VirtualMachineProfile;
 
+import java.util.Collections;
+import java.util.List;
+
 public class AffinityProcessorBase extends AdapterBase implements AffinityGroupProcessor {
 
     protected String _type;
 
     @Override
     public void process(VirtualMachineProfile vm, DeploymentPlan plan, ExcludeList avoid) throws AffinityConflictException {
+        process(vm, plan, avoid, Collections.emptyList());
+    }
+
+    @Override
+    public void process(VirtualMachineProfile vm, DeploymentPlan plan, ExcludeList avoid, List<VirtualMachine> vmList) throws AffinityConflictException {
 
     }
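The hunk above keeps the existing three-argument `process` as an entry point and routes it to the new four-argument overload with an empty list. A minimal sketch of that delegation pattern, using invented stand-in types (`Vm`, `ProcessorBase`, `AntiAffinitySketch` and a plain `Set<String>` in place of `ExcludeList`; none of these are CloudStack classes), might look like this. What the subclass does with `vmList` is made up purely for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins; the real interfaces take VirtualMachineProfile, DeploymentPlan and ExcludeList.
final class AffinityOverloadSketch {

    /** A VM reduced to a name and the host it is (or would be) placed on. */
    static final class Vm {
        final String name;
        final String host;

        Vm(String name, String host) {
            this.name = name;
            this.host = host;
        }
    }

    static class ProcessorBase {
        /** Existing entry point: delegates with an empty list, as in the hunk above. */
        void process(Vm vm, Set<String> avoid) {
            process(vm, avoid, Collections.emptyList());
        }

        /** New overload: a no-op by default, so subclasses override only this one. */
        void process(Vm vm, Set<String> avoid, List<Vm> vmList) {
        }
    }

    /** Invented subclass: avoids every host that another VM in vmList occupies. */
    static class AntiAffinitySketch extends ProcessorBase {
        @Override
        void process(Vm vm, Set<String> avoid, List<Vm> vmList) {
            for (Vm other : vmList) {
                if (!other.name.equals(vm.name) && other.host != null) {
                    avoid.add(other.host);
                }
            }
        }
    }

    public static void main(String[] args) {
        Vm target = new Vm("vm1", "h1");
        Set<String> avoid = new HashSet<>();
        // Caller supplies a theoretical placement instead of the live one.
        new AntiAffinitySketch().process(target, avoid, List.of(target, new Vm("vm2", "h2")));
        System.out.println(avoid);
    }
}
```

With this arrangement, callers of the old signature reach the same override through the base class, while a plan generator can pass a simulated VM list.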

api/src/main/java/org/apache/cloudstack/api/ApiConstants.java

Lines changed: 2 additions & 0 deletions
@@ -292,6 +292,7 @@ public class ApiConstants {
     public static final String MIN_CPU_NUMBER = "mincpunumber";
     public static final String MIN_MEMORY = "minmemory";
     public static final String MIGRATION_TYPE = "migrationtype";
+    public static final String MIGRATIONS = "migrations";
     public static final String MEMORY = "memory";
     public static final String MODE = "mode";
     public static final String NAME = "name";
@@ -661,6 +662,7 @@ public class ApiConstants {
     public static final String SPECIFY_IP_RANGES = "specifyipranges";
     public static final String IS_SOURCE_NAT = "issourcenat";
     public static final String IS_STATIC_NAT = "isstaticnat";
+    public static final String ITERATIONS = "iterations";
     public static final String SORT_BY = "sortby";
     public static final String CHANGE_CIDR = "changecidr";
     public static final String PURPOSE = "purpose";
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.cloudstack.api.command.admin.cluster;
+
+import com.cloud.event.EventTypes;
+import com.cloud.exception.InvalidParameterValueException;
+import com.cloud.host.Host;
+import com.cloud.user.Account;
+import com.cloud.utils.UuidUtils;
+import com.cloud.vm.VirtualMachine;
+import org.apache.cloudstack.api.APICommand;
+import org.apache.cloudstack.api.ApiCommandResourceType;
+import org.apache.cloudstack.api.ApiConstants;
+import org.apache.cloudstack.api.BaseAsyncCmd;
+import org.apache.cloudstack.api.Parameter;
+import org.apache.cloudstack.api.response.ClusterDrsPlanResponse;
+import org.apache.cloudstack.api.response.ClusterResponse;
+import org.apache.cloudstack.cluster.ClusterDrsService;
+import org.apache.commons.collections.MapUtils;
+
+import javax.inject.Inject;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.Map;
+
+@APICommand(name = "executeClusterDrsPlan",
+            description = "Execute DRS for a cluster. If there is another plan in progress for the same cluster, " +
+                    "this command will fail.",
+            responseObject = ClusterDrsPlanResponse.class, since = "4.19.0", requestHasSensitiveInfo = false,
+            responseHasSensitiveInfo = false)
+public class ExecuteClusterDrsPlanCmd extends BaseAsyncCmd {
+
+    @Parameter(name = ApiConstants.ID, type = CommandType.UUID, entityType = ClusterResponse.class, required = true,
+               description = "ID of cluster")
+    private Long id;
+
+    @Parameter(
+            name = ApiConstants.MIGRATE_TO,
+            type = CommandType.MAP,
+            description = "Virtual Machine to destination host mapping. This parameter specifies the mapping between " +
+                    "a vm and a host to migrate that VM. clusterid is required if this parameter is set." +
+                    "Format of this parameter: migrateto[vm-index].vm=<uuid>&migrateto[vm-index].host=<uuid> " +
+                    "Where, [vm-index] indicates the index to identify the vm that you want to migrate, " +
+                    "vm=<uuid> indicates the UUID of the vm that you want to migrate, and " +
+                    "host=<uuid> indicates the UUID of the host where you want to migrate the vm. " +
+                    "Example: migrateto[0].vm=<71f43cd6-69b0-4d3b-9fbc-67f50963d60b>" +
+                    "&migrateto[0].host=<a382f181-3d2b-4413-b92d-b8931befa7e1>" +
+                    "&migrateto[1].vm=<88de0173-55c0-4c1c-a269-83d0279eeedf>" +
+                    "&migrateto[1].host=<95d6e97c-6766-4d67-9a30-c449c15011d1>" +
+                    "&migrateto[2].vm=<1b331390-59f2-4796-9993-bf11c6e76225>" +
+                    "&migrateto[2].host=<41fdb564-9d3b-447d-88ed-7628f7640cbc>")
+    private Map<String, String> migrateVmTo;
+
+    @Inject
+    private ClusterDrsService clusterDrsService;
+
+    public Map<VirtualMachine, Host> getVmToHostMap() {
+        Map<VirtualMachine, Host> vmToHostMap = new HashMap<>();
+        if (MapUtils.isNotEmpty(migrateVmTo)) {
+            Collection<?> allValues = migrateVmTo.values();
+            Iterator<?> iter = allValues.iterator();
+            while (iter.hasNext()) {
+                HashMap<String, String> vmToHost = (HashMap<String, String>) iter.next();
+
+                String vmId = vmToHost.get("vm");
+                String hostId = vmToHost.get("host");
+
+                VirtualMachine vm;
+                Host host;
+                if (UuidUtils.isUuid(vmId)) {
+                    vm = _entityMgr.findByUuid(VirtualMachine.class, vmId);
+                } else {
+                    vm = _entityMgr.findById(VirtualMachine.class, Long.parseLong(vmId));
+                }
+
+                if (UuidUtils.isUuid(hostId)) {
+                    host = _entityMgr.findByUuid(Host.class, hostId);
+                } else {
+                    host = _entityMgr.findById(Host.class, Long.parseLong(hostId));
+                }
+
+                if (vm == null || host == null) {
+                    throw new InvalidParameterValueException(
+                            String.format("Unable to find the vm/host for vmId=%s, destHostId=%s", vmId, hostId));
+                }
+
+                vmToHostMap.put(vm, host);
+            }
+        }
+        return vmToHostMap;
+    }
+
+    @Override
+    public void execute() {
+        ClusterDrsPlanResponse response = clusterDrsService.executeDrsPlan(this);
+        response.setResponseName(getCommandName());
+        this.setResponseObject(response);
+    }
+
+    @Override
+    public long getEntityOwnerId() {
+        return Account.ACCOUNT_ID_SYSTEM;
+    }
+
+    @Override
+    public Long getApiResourceId() {
+        return getId();
+    }
+
+    public Long getId() {
+        return id;
+    }
+
+    @Override
+    public ApiCommandResourceType getApiResourceType() {
+        return ApiCommandResourceType.Cluster;
+    }
+
+
+    @Override
+    public String getEventType() {
+        return EventTypes.EVENT_CLUSTER_DRS;
+    }
+
+    @Override
+    public String getEventDescription() {
+        return String.format("Executing DRS plan for cluster: %d", getId());
+    }
+}
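
On the client side, the `migrateto` map documented in the parameter description above has to be flattened into indexed `vm`/`host` pairs. The following is a small sketch of that flattening, assuming the angle brackets in the description's example are placeholder notation and are not sent literally; `MigrateToParamSketch` and `toQuery` are hypothetical names, and URL encoding and request signing are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; not part of this commit or of any CloudStack client library.
final class MigrateToParamSketch {

    /**
     * Flattens a VM-to-host mapping into migrateto[i].vm=...&migrateto[i].host=... pairs,
     * using the iteration order of the map for the index. No URL encoding is applied.
     */
    static String toQuery(Map<String, String> vmToHost) {
        StringJoiner joiner = new StringJoiner("&");
        int index = 0;
        for (Map.Entry<String, String> entry : vmToHost.entrySet()) {
            joiner.add("migrateto[" + index + "].vm=" + entry.getKey());
            joiner.add("migrateto[" + index + "].host=" + entry.getValue());
            index++;
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vmToHost = new LinkedHashMap<>();
        // UUIDs taken from the example in the parameter description.
        vmToHost.put("71f43cd6-69b0-4d3b-9fbc-67f50963d60b", "a382f181-3d2b-4413-b92d-b8931befa7e1");
        vmToHost.put("88de0173-55c0-4c1c-a269-83d0279eeedf", "95d6e97c-6766-4d67-9a30-c449c15011d1");
        System.out.println(toQuery(vmToHost));
    }
}
```

On the server side, `getVmToHostMap()` above then reads the `vm` and `host` entries of each indexed group and resolves them to entities, failing the request if either lookup returns nothing.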
