[PATCH 0/3] qemu: add support for the SCHED_DEADLINE scheduling policy

Hello everyone,

This patchset aims at adding support for the SCHED_DEADLINE Linux scheduling policy for vcpus, io-threads and emulator processes.

In fact, libvirt currently supports SCHED_OTHER, SCHED_BATCH, SCHED_IDLE, SCHED_FIFO and SCHED_RR, but not SCHED_DEADLINE. SCHED_DEADLINE is a policy implementing an algorithm originating from the real-time scheduling community, but it can be useful outside of the real-time computing field as well.

It allows one to set a specific amount of CPU time that a task should receive with a given periodicity, and within a certain deadline. E.g., task t should be scheduled for at least 50 ms every 100 ms. To achieve this, it needs 3 parameters: runtime, deadline and period (although period can just be equal to deadline, which is what happens automatically if one sets period=0). It must always hold that: runtime <= deadline <= period (this is enforced by the kernel, but checks are included in the patches so that meaningful and easy-to-interpret error messages can be printed to the user).

More info on SCHED_DEADLINE is available here:
https://docs.kernel.org/scheduler/sched-deadline.html

The interface will look like this, e.g., for setting SCHED_DEADLINE as a policy for 3 (0-2) vcpus, with runtime = 10000000, deadline = 15000000 and period = 20000000:

<cputune>
  ...
  <vcpusched vcpus="0-2" scheduler="deadline" runtime="10000000" deadline="15000000" period="20000000"/>
  ...
</cputune>

This is a link to a branch containing the patches:
https://gitlab.com/Algisi-00/libvirt/-/tree/sched-deadline

And this is the link to the results of running the CI on that branch:
https://gitlab.com/Algisi-00/libvirt/-/pipelines/601795712

Note that the jobs that are failing are also failing in the exact same way without these patches applied.

Feedback is welcome and very much appreciated.

Thanks and regards.
Sasha Algisi (3):
  virprocess: define sched_attr and sched_setattr
  virprocess: add the SCHED_DEADLINE scheduling policy
  domain_conf: add SCHED_DEADLINE support in the XML configuration

 NEWS.rst                          |   5 ++
 docs/formatdomain.rst             |  16 +++-
 src/ch/ch_process.c               |   3 +-
 src/conf/domain_conf.c            |  52 +++++++++++--
 src/conf/domain_conf.h            |   3 +
 src/conf/schemas/domaincommon.rng |  16 ++++
 src/qemu/qemu_process.c           |   8 +-
 src/util/virprocess.c             | 123 +++++++++++++++++++++++++++++-
 src/util/virprocess.h             |   6 +-
 9 files changed, 216 insertions(+), 16 deletions(-)

--
2.37.1

--
------------------------
Indirizzo istituzionale di posta elettronica degli studenti e dei laureati dell'Università di Torino
Official University of Turin email address for students and graduates
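As a rough illustration of the "50 ms every 100 ms" example from the cover letter, the sketch below converts milliseconds into the nanosecond values SCHED_DEADLINE expects and computes the share of one CPU that such a reservation represents (runtime divided by period). The names `demo_dl_params`, `demo_ms_to_ns` and `demo_dl_bandwidth_percent` are hypothetical and are not part of libvirt or of the kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three SCHED_DEADLINE parameters,
 * all expressed in nanoseconds. */
struct demo_dl_params {
    uint64_t runtime;
    uint64_t deadline;
    uint64_t period;   /* 0 means "same as deadline" */
};

/* Convert milliseconds to nanoseconds. */
static uint64_t
demo_ms_to_ns(uint64_t ms)
{
    return ms * 1000000ULL;
}

/* Percentage of one CPU reserved by the task: runtime / period,
 * where a period of 0 falls back to the deadline. */
static unsigned int
demo_dl_bandwidth_percent(const struct demo_dl_params *p)
{
    uint64_t period = p->period ? p->period : p->deadline;

    assert(period > 0);
    return (unsigned int)((p->runtime * 100) / period);
}
```

With runtime set to 50 ms and deadline set to 100 ms (period left at 0), the helper reports a 50% reservation, which matches the example in the text.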

In order to use SCHED_DEADLINE we need sched_setattr(), as sched_setscheduler() does not support all the necessary parameters.

Signed-off-by: Sasha Algisi <sasha.algisi@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
 src/util/virprocess.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index 013afd91b4..a8f86784e1 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -51,6 +51,10 @@
 # include <sys/cpuset.h>
 #endif

+#if WITH_SYS_SYSCALL_H
+# include <sys/syscall.h>
+#endif
+
 #ifdef WIN32
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
@@ -67,6 +71,10 @@

 #define VIR_FROM_THIS VIR_FROM_NONE

+#if defined(__linux__) && !defined(SCHED_FLAG_RESET_ON_FORK)
+# define SCHED_FLAG_RESET_ON_FORK 0x01
+#endif
+
 VIR_LOG_INIT("util.process");

 VIR_ENUM_IMPL(virProcessSchedPolicy,
@@ -79,6 +87,37 @@ VIR_ENUM_IMPL(virProcessSchedPolicy,
 );


+#if defined(__linux__) && defined(SCHED_DEADLINE)
+
+struct sched_attr {
+    uint32_t size;
+    uint32_t sched_policy;
+    uint64_t sched_flags;
+
+    /*SCHED_OTHER, SCHED_BATCH*/
+    int32_t sched_nice;
+
+    /*SCHED_FIFO, SCHED_RR*/
+    uint32_t sched_priority;
+
+    /*SCHED_DEADLINE*/
+    uint64_t sched_runtime;
+    uint64_t sched_deadline;
+    uint64_t sched_period;
+};
+
+
+static
+int sched_setattr(pid_t pid,
+                  struct sched_attr *attr,
+                  unsigned int flags)
+{
+    return syscall(SYS_sched_setattr, pid, attr, flags);
+}
+
+#endif
+
+
 #ifndef WIN32
 /**
  * virProcessTranslateStatus:
--
2.37.1
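For readers who want to try the raw system call outside libvirt, here is a minimal standalone sketch of the same idea, assuming a Linux host whose headers provide `SYS_sched_setattr`. The structure and helpers are deliberately renamed (`demo_sched_attr`, `demo_fill_deadline_attr`, `demo_sched_setattr`, all hypothetical) so they cannot clash with C library headers that may already declare `struct sched_attr`. The fallback value 6 for `SCHED_DEADLINE` is taken from the Linux UAPI headers.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
# define SCHED_DEADLINE 6   /* value from the Linux UAPI headers */
#endif

/* Same field layout as the structure in the patch (48 bytes);
 * renamed to avoid clashing with libc headers that may define it. */
struct demo_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
};

/* Fill the attribute block for a SCHED_DEADLINE request. */
static void
demo_fill_deadline_attr(struct demo_sched_attr *attr,
                        uint64_t runtime, uint64_t deadline, uint64_t period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->sched_policy = SCHED_DEADLINE;
    attr->sched_runtime = runtime;
    attr->sched_deadline = deadline;
    attr->sched_period = period;
}

/* Thin wrapper over the raw system call; returns 0, or -1 with errno set. */
static int
demo_sched_setattr(pid_t pid, struct demo_sched_attr *attr, unsigned int flags)
{
    return syscall(SYS_sched_setattr, pid, attr, flags);
}
```

Actually switching a task to SCHED_DEADLINE normally needs elevated privileges, so a caller should expect `demo_sched_setattr()` to fail with -1 when run as an ordinary user.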

On 8/1/22 19:11, Sasha Algisi wrote:
In order to use SCHED_DEADLINE we need sched_setattr(), as sched_setscheduler() does not support all the necessary parameters.
Signed-off-by: Sasha Algisi <sasha.algisi@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
 src/util/virprocess.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index 013afd91b4..a8f86784e1 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -51,6 +51,10 @@
 # include <sys/cpuset.h>
 #endif

+#if WITH_SYS_SYSCALL_H
+# include <sys/syscall.h>
+#endif
+
 #ifdef WIN32
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
@@ -67,6 +71,10 @@

 #define VIR_FROM_THIS VIR_FROM_NONE

+#if defined(__linux__) && !defined(SCHED_FLAG_RESET_ON_FORK)
+# define SCHED_FLAG_RESET_ON_FORK 0x01
+#endif
+
 VIR_LOG_INIT("util.process");

 VIR_ENUM_IMPL(virProcessSchedPolicy,
@@ -79,6 +87,37 @@ VIR_ENUM_IMPL(virProcessSchedPolicy,
 );

+#if defined(__linux__) && defined(SCHED_DEADLINE)
+
+struct sched_attr {
+    uint32_t size;
+    uint32_t sched_policy;
+    uint64_t sched_flags;
+
+    /*SCHED_OTHER, SCHED_BATCH*/
+    int32_t sched_nice;
+
+    /*SCHED_FIFO, SCHED_RR*/
+    uint32_t sched_priority;
+
+    /*SCHED_DEADLINE*/
+    uint64_t sched_runtime;
+    uint64_t sched_deadline;
+    uint64_t sched_period;
+};
Darn, I wish we could just include <linux/sched/types.h> but we can't. Kernel headers (at least the version I'm using: 5.15) are broken as the header file redefines sched_param struct.
+
+
+static
+int sched_setattr(pid_t pid,
We format it differently:

static int
function(int arg, ..)
+                  struct sched_attr *attr,
+                  unsigned int flags)
+{
+    return syscall(SYS_sched_setattr, pid, attr, flags);
+}
+
+#endif
Now, this function is not used and is static, which makes the compiler sad. Maybe it can be marked as G_GNUC_UNUSED for the time being, until it is used (in the following patch). Or just squash the patches together.

Michal
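A minimal sketch of the first option, assuming GCC or Clang. libvirt itself would use GLib's `G_GNUC_UNUSED`; the `DEMO_UNUSED` macro and the `demo_clamp_percent` helper below are hypothetical stand-ins so that the snippet builds without GLib.

```c
#include <assert.h>

/* Stand-in for GLib's G_GNUC_UNUSED on GCC-compatible compilers. */
#if defined(__GNUC__)
# define DEMO_UNUSED __attribute__((__unused__))
#else
# define DEMO_UNUSED
#endif

/* A static helper that has no callers yet. Without the attribute,
 * -Wunused-function would flag it (and -Werror would fail the build). */
static DEMO_UNUSED int
demo_clamp_percent(int value)
{
    assert(value >= 0);
    return value > 100 ? 100 : value;
}
```

Once a later patch adds a real caller, the annotation can be dropped again.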

Tasks associated with virtual CPUs, IO Threads and Emulator processes can be created with the SCHED_DEADLINE policy. The policy is described in detail here:
https://docs.kernel.org/scheduler/sched-deadline.html

It requires the following parameters (all in nanoseconds):
1) runtime
2) deadline
3) period

It must always hold that: runtime <= deadline <= period.

The kernel enforces that the values stay within [1024, 2^63-1]. Note, however, that a smaller range could be set (or be already set by default) via sysctl (see kernel.sched_deadline_period_max_us and kernel.sched_deadline_period_min_us).

All three parameters are mandatory, but period can be set to 0, in which case it will be set to the same value as deadline.

Signed-off-by: Sasha Algisi <sasha.algisi@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
 src/ch/ch_process.c     |  3 +-
 src/conf/domain_conf.h  |  3 ++
 src/qemu/qemu_process.c |  8 +++-
 src/util/virprocess.c   | 84 +++++++++++++++++++++++++++++++++++++++--
 src/util/virprocess.h   |  6 ++-
 5 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/src/ch/ch_process.c b/src/ch/ch_process.c
index 77f55e777b..a40d188aac 100644
--- a/src/ch/ch_process.c
+++ b/src/ch/ch_process.c
@@ -293,7 +293,8 @@ virCHProcessSetupPid(virDomainObj *vm,
     /* Set scheduler type and priority, but not for the main thread. */
     if (sched &&
         nameval != VIR_CGROUP_THREAD_EMULATOR &&
-        virProcessSetScheduler(pid, sched->policy, sched->priority) < 0)
+        virProcessSetScheduler(pid, sched->policy, sched->priority,
+                               sched->runtime, sched->deadline, sched->period) < 0)
         goto cleanup;

     ret = 0;
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 060c395943..c3d1a1b65d 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2454,6 +2454,9 @@ typedef enum {
 struct _virDomainThreadSchedParam {
     virProcessSchedPolicy policy;
     int priority;
+    uint64_t runtime;
+    uint64_t deadline;
+    uint64_t period;
 };

 struct _virDomainTimerCatchupDef {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 137dcf5cf4..7586e0538a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2728,7 +2728,8 @@ qemuProcessSetupPid(virDomainObj *vm,
     /* Set scheduler type and priority, but not for the main thread. */
     if (sched &&
         nameval != VIR_CGROUP_THREAD_EMULATOR &&
-        virProcessSetScheduler(pid, sched->policy, sched->priority) < 0)
+        virProcessSetScheduler(pid, sched->policy, sched->priority,
+                               sched->runtime, sched->deadline, sched->period) < 0)
         goto cleanup;

     ret = 0;
@@ -7813,7 +7814,10 @@ qemuProcessLaunch(virConnectPtr conn,
     if (vm->def->cputune.emulatorsched &&
         virProcessSetScheduler(vm->pid,
                                vm->def->cputune.emulatorsched->policy,
-                               vm->def->cputune.emulatorsched->priority) < 0)
+                               vm->def->cputune.emulatorsched->priority,
+                               vm->def->cputune.emulatorsched->runtime,
+                               vm->def->cputune.emulatorsched->deadline,
+                               vm->def->cputune.emulatorsched->period) < 0)
         goto cleanup;

     VIR_DEBUG("Setting any required VM passwords");
diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index a8f86784e1..c96bfc45fd 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -84,6 +84,7 @@ VIR_ENUM_IMPL(virProcessSchedPolicy,
               "idle",
               "fifo",
               "rr",
+              "deadline",
 );


@@ -1610,6 +1611,13 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy policy)
     case VIR_PROC_POLICY_RR:
         return SCHED_RR;

+    case VIR_PROC_POLICY_DEADLINE:
+# ifdef SCHED_DEADLINE
+        return SCHED_DEADLINE;
+# else
+        return -1;
+# endif
+
     case VIR_PROC_POLICY_LAST:
         /* nada */
         break;
@@ -1621,13 +1629,20 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy policy)
 int
 virProcessSetScheduler(pid_t pid,
                        virProcessSchedPolicy policy,
-                       int priority)
+                       int priority,
+                       uint64_t runtime G_GNUC_UNUSED,
+                       uint64_t deadline G_GNUC_UNUSED,
+                       uint64_t period G_GNUC_UNUSED)
 {
     struct sched_param param = {0};
     int pol = virProcessSchedTranslatePolicy(policy);

-    VIR_DEBUG("pid=%lld, policy=%d, priority=%u",
-              (long long) pid, policy, priority);
+    VIR_DEBUG("pid=%lld, policy=%d, priority=%u, "
+              "runtime=%llu, deadline=%llu, period=%llu",
+              (long long) pid, policy, priority,
+              (unsigned long long) runtime,
+              (unsigned long long) deadline,
+              (unsigned long long) period);

     if (!policy)
         return 0;
@@ -1667,6 +1682,69 @@ virProcessSetScheduler(pid_t pid,
         param.sched_priority = priority;
     }

+# ifdef SCHED_DEADLINE
+    if (pol == SCHED_DEADLINE) {
+        struct sched_attr attr = {0};
+        /*
+         * The range is enforced in the kernel.
+         * See: https://man7.org/linux/man-pages/man7/sched.7.html
+         */
+        uint64_t min_value = 1024;
+        uint64_t max_value = (1ULL << 63) - 1;
+
+        if (runtime < min_value || runtime > max_value) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Scheduler runtime %llu out of range "
+                             "[%llu, %llu]"),
+                           (unsigned long long) runtime,
+                           (unsigned long long) min_value,
+                           (unsigned long long) max_value);
+            return -1;
+        }
+
+        if (deadline < runtime || deadline > max_value) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Scheduler deadline %llu out of range "
+                             "[%llu, %llu]"),
+                           (unsigned long long) deadline,
+                           (unsigned long long) runtime,
+                           (unsigned long long) max_value);
+            return -1;
+        }
+
+        if ((period < deadline || period > max_value) && period != 0) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Invalid scheduler period %llu, "
+                             "possible correct values are 0 "
+                             "or [%llu, %llu]"),
+                           (unsigned long long) period,
+                           (unsigned long long) deadline,
+                           (unsigned long long) max_value);
+            return -1;
+        }
+
+        attr.size = sizeof(attr);
+        attr.sched_policy = pol;
+        /*
+         * Setting reset-on-fork is necessary as SCHED_DEADLINE
+         * tasks cannot fork. See:
+         * https://docs.kernel.org/scheduler/sched-deadline.html#default-behavior
+         */
+        attr.sched_flags = SCHED_FLAG_RESET_ON_FORK;
+        attr.sched_runtime = runtime;
+        attr.sched_deadline = deadline;
+        attr.sched_period = period;
+
+        if (sched_setattr(pid, &attr, 0) == 0) {
+            return 0;
+        } else {
+            virReportSystemError(errno,
+                                 _("Cannot set scheduler parameters for pid %lld"),
+                                 (long long) pid);
+            return -1;
+        }
+    }
+# endif
     if (sched_setscheduler(pid, pol, &param) < 0) {
         virReportSystemError(errno,
                              _("Cannot set scheduler parameters for pid %lld"),
diff --git a/src/util/virprocess.h b/src/util/virprocess.h
index 30b6981c73..84fdf00fdf 100644
--- a/src/util/virprocess.h
+++ b/src/util/virprocess.h
@@ -33,6 +33,7 @@ typedef enum {
     VIR_PROC_POLICY_IDLE,
     VIR_PROC_POLICY_FIFO,
     VIR_PROC_POLICY_RR,
+    VIR_PROC_POLICY_DEADLINE,

     VIR_PROC_POLICY_LAST
 } virProcessSchedPolicy;
@@ -116,7 +117,10 @@ int virProcessSetupPrivateMountNS(void);

 int virProcessSetScheduler(pid_t pid,
                            virProcessSchedPolicy policy,
-                           int priority);
+                           int priority,
+                           uint64_t runtime,
+                           uint64_t deadline,
+                           uint64_t period);

 GStrv virProcessGetStat(pid_t pid, pid_t tid);

--
2.37.1
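The three range checks in this patch can be read in isolation as a small pure function. The sketch below mirrors them outside of libvirt, assuming the same [1024, 2^63-1] limits quoted in the commit message; `demo_dl_validate` and the `demo_dl_error` values are hypothetical names, and error reporting is reduced to a return code.

```c
#include <assert.h>
#include <stdint.h>

/* Limits quoted in the commit message: [1024, 2^63 - 1] nanoseconds. */
#define DEMO_DL_MIN 1024ULL
#define DEMO_DL_MAX ((1ULL << 63) - 1)

enum demo_dl_error {
    DEMO_DL_OK = 0,
    DEMO_DL_BAD_RUNTIME,
    DEMO_DL_BAD_DEADLINE,
    DEMO_DL_BAD_PERIOD,
};

/* Mirror the three checks from the patch:
 *   min <= runtime <= max
 *   runtime <= deadline <= max
 *   period == 0, or deadline <= period <= max */
static enum demo_dl_error
demo_dl_validate(uint64_t runtime, uint64_t deadline, uint64_t period)
{
    if (runtime < DEMO_DL_MIN || runtime > DEMO_DL_MAX)
        return DEMO_DL_BAD_RUNTIME;

    if (deadline < runtime || deadline > DEMO_DL_MAX)
        return DEMO_DL_BAD_DEADLINE;

    if (period != 0 && (period < deadline || period > DEMO_DL_MAX))
        return DEMO_DL_BAD_PERIOD;

    return DEMO_DL_OK;
}
```

The values from the cover letter (10000000, 15000000, 20000000) pass, as does the same pair with period 0, while a runtime of 1000 is rejected for being below the minimum.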

On 8/1/22 19:11, Sasha Algisi wrote:
Tasks associated with virtual CPUs, IO Threads and Emulator processes can be created with the SCHED_DEADLINE policy. The policy is described in detail here: https://docs.kernel.org/scheduler/sched-deadline.html
It requires the following parameters (all in nanoseconds): 1) runtime 2) deadline 3) period
It must always hold that: runtime <= deadline <= period.
The kernel enforces that the values stay within [1024, 2^63-1]. Note, however, that a smaller range could be set (or be already set by default) via sysctl (see kernel.sched_deadline_period_max_us and kernel.sched_deadline_period_min_us).
All three parameters are mandatory, but period can be set to 0, in which case it will be set to the same value as deadline.
Signed-off-by: Sasha Algisi <sasha.algisi@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
 src/ch/ch_process.c     |  3 +-
 src/conf/domain_conf.h  |  3 ++
 src/qemu/qemu_process.c |  8 +++-
 src/util/virprocess.c   | 84 +++++++++++++++++++++++++++++++++++++++--
 src/util/virprocess.h   |  6 ++-
 5 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/src/ch/ch_process.c b/src/ch/ch_process.c
index 77f55e777b..a40d188aac 100644
--- a/src/ch/ch_process.c
+++ b/src/ch/ch_process.c
@@ -293,7 +293,8 @@ virCHProcessSetupPid(virDomainObj *vm,
     /* Set scheduler type and priority, but not for the main thread. */
     if (sched &&
         nameval != VIR_CGROUP_THREAD_EMULATOR &&
-        virProcessSetScheduler(pid, sched->policy, sched->priority) < 0)
+        virProcessSetScheduler(pid, sched->policy, sched->priority,
+                               sched->runtime, sched->deadline, sched->period) < 0)
         goto cleanup;

     ret = 0;
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 060c395943..c3d1a1b65d 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2454,6 +2454,9 @@ typedef enum {
 struct _virDomainThreadSchedParam {
     virProcessSchedPolicy policy;
     int priority;
+    uint64_t runtime;
+    uint64_t deadline;
+    uint64_t period;
Or just unsigned long long. Here we don't face kernel just yet, and can use 'more generic' types which also allows you to drop plenty of typecasts when using internal helpers.
 };

 struct _virDomainTimerCatchupDef {
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 137dcf5cf4..7586e0538a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2728,7 +2728,8 @@ qemuProcessSetupPid(virDomainObj *vm,
     /* Set scheduler type and priority, but not for the main thread. */
     if (sched &&
         nameval != VIR_CGROUP_THREAD_EMULATOR &&
-        virProcessSetScheduler(pid, sched->policy, sched->priority) < 0)
+        virProcessSetScheduler(pid, sched->policy, sched->priority,
+                               sched->runtime, sched->deadline, sched->period) < 0)
         goto cleanup;

     ret = 0;
@@ -7813,7 +7814,10 @@ qemuProcessLaunch(virConnectPtr conn,
     if (vm->def->cputune.emulatorsched &&
         virProcessSetScheduler(vm->pid,
                                vm->def->cputune.emulatorsched->policy,
-                               vm->def->cputune.emulatorsched->priority) < 0)
+                               vm->def->cputune.emulatorsched->priority,
+                               vm->def->cputune.emulatorsched->runtime,
+                               vm->def->cputune.emulatorsched->deadline,
+                               vm->def->cputune.emulatorsched->period) < 0)
         goto cleanup;

     VIR_DEBUG("Setting any required VM passwords");
diff --git a/src/util/virprocess.c b/src/util/virprocess.c
index a8f86784e1..c96bfc45fd 100644
--- a/src/util/virprocess.c
+++ b/src/util/virprocess.c
@@ -84,6 +84,7 @@ VIR_ENUM_IMPL(virProcessSchedPolicy,
               "idle",
               "fifo",
               "rr",
+              "deadline",
 );

@@ -1610,6 +1611,13 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy policy)
     case VIR_PROC_POLICY_RR:
         return SCHED_RR;

+    case VIR_PROC_POLICY_DEADLINE:
+# ifdef SCHED_DEADLINE
+        return SCHED_DEADLINE;
+# else
+        return -1;
+# endif
+
     case VIR_PROC_POLICY_LAST:
         /* nada */
         break;
@@ -1621,13 +1629,20 @@ virProcessSchedTranslatePolicy(virProcessSchedPolicy policy)
 int
 virProcessSetScheduler(pid_t pid,
                        virProcessSchedPolicy policy,
-                       int priority)
+                       int priority,
+                       uint64_t runtime G_GNUC_UNUSED,
+                       uint64_t deadline G_GNUC_UNUSED,
+                       uint64_t period G_GNUC_UNUSED)
The !WITH_SCHED_SETSCHEDULER case stub is not updated correspondingly. And these new arguments don't need the unused annotation ...
 {
     struct sched_param param = {0};
     int pol = virProcessSchedTranslatePolicy(policy);

-    VIR_DEBUG("pid=%lld, policy=%d, priority=%u",
-              (long long) pid, policy, priority);
+    VIR_DEBUG("pid=%lld, policy=%d, priority=%u, "
+              "runtime=%llu, deadline=%llu, period=%llu",
+              (long long) pid, policy, priority,
+              (unsigned long long) runtime,
+              (unsigned long long) deadline,
+              (unsigned long long) period);
.. as they are used here. Also, arguments can be ull type because it's only later in this function that we need to convert them to uint64_t.
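To make the suggestion concrete, here is a small sketch (not libvirt code) of a function that accepts `unsigned long long` arguments, prints them with `%llu` without any casts, and converts to fixed-width `uint64_t` fields only when filling the kernel-facing structure. `demo_kernel_params` and `demo_set_params` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical kernel-facing block using fixed-width fields. */
struct demo_kernel_params {
    uint64_t runtime;
    uint64_t deadline;
    uint64_t period;
};

/* Takes unsigned long long, so %llu needs no casts, and converts to the
 * fixed-width kernel types only at the point where they are required.
 * Returns the number of characters snprintf() would have written. */
static int
demo_set_params(struct demo_kernel_params *out,
                char *dbg, size_t dbglen,
                unsigned long long runtime,
                unsigned long long deadline,
                unsigned long long period)
{
    int n = snprintf(dbg, dbglen,
                     "runtime=%llu, deadline=%llu, period=%llu",
                     runtime, deadline, period);

    out->runtime = runtime;
    out->deadline = deadline;
    out->period = period;
    return n;
}
```

The design choice is simply to keep the internal API in the type that the formatting and parsing helpers already use, and to confine the `uint64_t` conversion to one place.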

     if (!policy)
         return 0;
@@ -1667,6 +1682,69 @@ virProcessSetScheduler(pid_t pid,
         param.sched_priority = priority;
     }

+# ifdef SCHED_DEADLINE
+    if (pol == SCHED_DEADLINE) {
+        struct sched_attr attr = {0};
+        /*
+         * The range is enforced in the kernel.
+         * See: https://man7.org/linux/man-pages/man7/sched.7.html
+         */
+        uint64_t min_value = 1024;
+        uint64_t max_value = (1ULL << 63) - 1;
+
+        if (runtime < min_value || runtime > max_value) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Scheduler runtime %llu out of range "
+                             "[%llu, %llu]"),
+                           (unsigned long long) runtime,
+                           (unsigned long long) min_value,
+                           (unsigned long long) max_value);
+            return -1;
+        }
+
+        if (deadline < runtime || deadline > max_value) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Scheduler deadline %llu out of range "
+                             "[%llu, %llu]"),
+                           (unsigned long long) deadline,
+                           (unsigned long long) runtime,
+                           (unsigned long long) max_value);
+            return -1;
+        }
+
+        if ((period < deadline || period > max_value) && period != 0) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Invalid scheduler period %llu, "
+                             "possible correct values are 0 "
+                             "or [%llu, %llu]"),
+                           (unsigned long long) period,
+                           (unsigned long long) deadline,
+                           (unsigned long long) max_value);
+            return -1;
+        }
+
+        attr.size = sizeof(attr);
+        attr.sched_policy = pol;
+        /*
+         * Setting reset-on-fork is necessary as SCHED_DEADLINE
+         * tasks cannot fork. See:
+         * https://docs.kernel.org/scheduler/sched-deadline.html#default-behavior
+         */
+        attr.sched_flags = SCHED_FLAG_RESET_ON_FORK;
+        attr.sched_runtime = runtime;
+        attr.sched_deadline = deadline;
+        attr.sched_period = period;
+
+        if (sched_setattr(pid, &attr, 0) == 0) {
+            return 0;
+        } else {
+            virReportSystemError(errno,
+                                 _("Cannot set scheduler parameters for pid %lld"),
+                                 (long long) pid);
+            return -1;
+        }
+    }
+# endif
     if (sched_setscheduler(pid, pol, &param) < 0) {
So what happens when the deadline policy is requested (e.g. virProcessSetScheduler(policy=VIR_PROC_POLICY_DEADLINE)) but SCHED_DEADLINE is not available? Does this sched_setscheduler() here error out?
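One way to probe this question empirically is sketched below, under the assumption that the -1 returned by the policy translation reaches sched_setscheduler() unchanged (whether libvirt rejects it earlier is not visible in the quoted hunk). On Linux the call is expected to be refused, typically with EINVAL. `demo_try_invalid_policy` is a hypothetical helper, not a libvirt function.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Ask the kernel to apply an invalid policy number (-1) to the calling
 * process, the way an untranslated -1 would be passed through.
 * Returns 0 on success, or the errno value reported on failure. */
static int
demo_try_invalid_policy(void)
{
    struct sched_param param = {0};

    errno = 0;
    if (sched_setscheduler(0, -1, &param) < 0)
        return errno;

    return 0;
}
```

If the call fails as expected, the user would only see the generic "Cannot set scheduler parameters" message, which says nothing about SCHED_DEADLINE being unavailable on the build host.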
         virReportSystemError(errno,
                              _("Cannot set scheduler parameters for pid %lld"),
diff --git a/src/util/virprocess.h b/src/util/virprocess.h
index 30b6981c73..84fdf00fdf 100644
--- a/src/util/virprocess.h
+++ b/src/util/virprocess.h
@@ -33,6 +33,7 @@ typedef enum {
     VIR_PROC_POLICY_IDLE,
     VIR_PROC_POLICY_FIFO,
     VIR_PROC_POLICY_RR,
+    VIR_PROC_POLICY_DEADLINE,
This fails to compile, because of a switch() inside virDomainSchedulerFormat() that does not handle this new case. I think you can work around this by reordering things a bit. For instance, the first patch can introduce XML parsing/formatting and this new enum member. Then, follow up patch(-es) can extend virProcessSetScheduler.

     VIR_PROC_POLICY_LAST
 } virProcessSchedPolicy;
@@ -116,7 +117,10 @@ int virProcessSetupPrivateMountNS(void);

 int virProcessSetScheduler(pid_t pid,
                            virProcessSchedPolicy policy,
-                           int priority);
+                           int priority,
+                           uint64_t runtime,
+                           uint64_t deadline,
+                           uint64_t period);

 GStrv virProcessGetStat(pid_t pid, pid_t tid);
Michal

Users can set SCHED_DEADLINE as a scheduling policy.

For example, for setting runtime = 10000000, deadline = 15000000 and period = 20000000 for vcpus 0-2:

<cputune>
  ...
  <vcpusched vcpus="0-2" scheduler="deadline" runtime="10000000" deadline="15000000" period="20000000"/>
  ...
</cputune>

Update release notes accordingly.

Signed-off-by: Sasha Algisi <sasha.algisi@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
 NEWS.rst                          |  5 +++
 docs/formatdomain.rst             | 16 +++++++---
 src/conf/domain_conf.c            | 52 ++++++++++++++++++++++++++++---
 src/conf/schemas/domaincommon.rng | 16 ++++++++++
 4 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/NEWS.rst b/NEWS.rst
index ef298da539..23484afdc2 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -17,6 +17,11 @@ v8.7.0 (unreleased)

 * **New features**

+  * qemu: support for SCHED_DEADLINE scheduling
+
+    Users can now use the SCHED_DEADLINE scheduling policy for tasks
+    associated to virtual CPUs, IO Threads and Emulator processes.
+
 * **Improvements**

 * **Bug fixes**
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 1ed969ac3e..216262b79d 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -910,10 +910,11 @@ CPU Tuning
    support since 2.1.0`
 ``vcpusched``, ``iothreadsched`` and ``emulatorsched``
    The optional ``vcpusched``, ``iothreadsched`` and ``emulatorsched`` elements
-   specify the scheduler type (values ``batch``, ``idle``, ``fifo``, ``rr``) for
-   particular vCPU, IOThread and emulator threads respectively. For ``vcpusched``
-   and ``iothreadsched`` the attributes ``vcpus`` and ``iothreads`` select which
-   vCPUs/IOThreads this setting applies to, leaving them out sets the default.
+   specify the scheduler type (values ``batch``, ``idle``, ``fifo``, ``rr``,
+   ``deadline`` :since:`Since 8.7.0`) for particular vCPU, IOThread and emulator
+   threads respectively. For ``vcpusched`` and ``iothreadsched`` the attributes
+   ``vcpus`` and ``iothreads`` select which vCPUs/IOThreads this setting applies
+   to, leaving them out sets the default.
    The element ``emulatorsched`` does not have that attribute. Valid ``vcpus``
    values start at 0 through one less than the number of vCPU's defined for the
    domain. Valid ``iothreads`` values are described in the `IOThreads Allocation`_
@@ -923,6 +924,13 @@ CPU Tuning
    priority must be specified as well (and is ignored for non-real-time ones).
    The value range for the priority depends on the host kernel (usually 1-99).
    :since:`Since 1.2.13` ``emulatorsched`` :since:`since 5.3.0`
+   For SCHED_DEADLINE (``deadline``), runtime , deadline and period must also
+   be specified (they are ignored in other schedulers). It must always be true
+   that: runtime <= deadline <= period.
+   The values are specified in nanoseconds. The valid range for the parameters
+   is [1024, 2^63-1] (but a smaller one can be put in place via sysctl). The
+   period can be set to 0, in which case, a period equal to the deadline is
+   used.
 ``cachetune`` :since:`Since 4.1.0`
    Optional ``cachetune`` element can control allocations for CPU caches using
    the resctrl on the host. Whether or not is this supported can be gathered
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index e85cc1f809..86ada8f147 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -16693,7 +16693,10 @@ virDomainLoaderDefParseXML(virDomainLoaderDef *loader,
 static int
 virDomainSchedulerParseCommonAttrs(xmlNodePtr node,
                                    virProcessSchedPolicy *policy,
-                                   int *priority)
+                                   int *priority,
+                                   uint64_t *runtime,
+                                   uint64_t *deadline,
+                                   uint64_t *period)
 {
     if (virXMLPropEnum(node, "scheduler", virProcessSchedPolicyTypeFromString,
                        VIR_XML_PROP_REQUIRED | VIR_XML_PROP_NONZERO,
@@ -16706,6 +16709,20 @@ virDomainSchedulerParseCommonAttrs(xmlNodePtr node,
             return -1;
     }

+    if (*policy == VIR_PROC_POLICY_DEADLINE) {
+        if (virXMLPropULongLong(node, "runtime", 10, VIR_XML_PROP_REQUIRED,
+                                (unsigned long long *) runtime) < 0)
+            return -1;
+
+        if (virXMLPropULongLong(node, "deadline", 10, VIR_XML_PROP_REQUIRED,
+                                (unsigned long long *) deadline) < 0)
+            return -1;
+
+        if (virXMLPropULongLong(node, "period", 10, VIR_XML_PROP_REQUIRED,
+                                (unsigned long long *) period) < 0)
+            return -1;
+    }
+
     return 0;
 }

@@ -16720,7 +16737,10 @@ virDomainEmulatorSchedParse(xmlNodePtr node,

     if (virDomainSchedulerParseCommonAttrs(node,
                                            &sched->policy,
-                                           &sched->priority) < 0)
+                                           &sched->priority,
+                                           &sched->runtime,
+                                           &sched->deadline,
+                                           &sched->period) < 0)
         return -1;

     def->cputune.emulatorsched = g_steal_pointer(&sched);
@@ -16733,7 +16753,10 @@ virDomainSchedulerParse(xmlNodePtr node,
                         const char *elementName,
                         const char *attributeName,
                         virProcessSchedPolicy *policy,
-                        int *priority)
+                        int *priority,
+                        uint64_t *runtime,
+                        uint64_t *deadline,
+                        uint64_t *period)
 {
     g_autoptr(virBitmap) ret = NULL;
     g_autofree char *tmp = NULL;
@@ -16755,7 +16778,8 @@ virDomainSchedulerParse(xmlNodePtr node,
         return NULL;
     }

-    if (virDomainSchedulerParseCommonAttrs(node, policy, priority) < 0)
+    if (virDomainSchedulerParseCommonAttrs(node, policy, priority,
+                                           runtime, deadline, period) < 0)
         return NULL;

     return g_steal_pointer(&ret);
@@ -16773,10 +16797,14 @@ virDomainThreadSchedParseHelper(xmlNodePtr node,
     virDomainThreadSchedParam *sched = NULL;
     virProcessSchedPolicy policy = 0;
     int priority = 0;
+    uint64_t runtime = 0;
+    uint64_t deadline = 0;
+    uint64_t period = 0;
     g_autoptr(virBitmap) map = NULL;

     if (!(map = virDomainSchedulerParse(node, elementName, attributeName,
-                                        &policy, &priority)))
+                                        &policy, &priority, &runtime,
+                                        &deadline, &period)))
         return -1;

     while ((next = virBitmapNextSetBit(map, next)) > -1) {
@@ -16792,6 +16820,9 @@ virDomainThreadSchedParseHelper(xmlNodePtr node,

         sched->policy = policy;
         sched->priority = priority;
+        sched->runtime = runtime;
+        sched->deadline = deadline;
+        sched->period = period;
     }

     return 0;
@@ -26029,6 +26060,17 @@ virDomainSchedulerFormat(virBuffer *buf,
                               sched->priority);
             break;

+        case VIR_PROC_POLICY_DEADLINE:
+            virBufferAsprintf(buf, "<%ssched", name);
+            if (multiple_threads)
+                virBufferAsprintf(buf, " %ss='%zu'", name, id);
+            virBufferAsprintf(buf, " scheduler='%s' runtime='%llu' deadline='%llu' period='%llu'/>\n",
+                              virProcessSchedPolicyTypeToString(sched->policy),
+                              (unsigned long long) sched->runtime,
+                              (unsigned long long) sched->deadline,
+                              (unsigned long long) sched->period);
+            break;
+
         case VIR_PROC_POLICY_NONE:
         case VIR_PROC_POLICY_LAST:
             break;
diff --git a/src/conf/schemas/domaincommon.rng b/src/conf/schemas/domaincommon.rng
index c4f293a4c3..86daffab8c 100644
--- a/src/conf/schemas/domaincommon.rng
+++ b/src/conf/schemas/domaincommon.rng
@@ -1168,6 +1168,22 @@
           <ref name="unsignedShort"/>
         </attribute>
       </group>
+      <group>
+        <attribute name="scheduler">
+          <choice>
+            <value>deadline</value>
+          </choice>
+        </attribute>
+        <attribute name="runtime">
+          <ref name="unsignedLong"/>
+        </attribute>
+        <attribute name="deadline">
+          <ref name="unsignedLong"/>
+        </attribute>
+        <attribute name="period">
+          <ref name="unsignedLong"/>
+        </attribute>
+      </group>
     </choice>
   </define>

--
2.37.1
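To see what the new formatter branch in this patch would emit, the sketch below reproduces its format strings with plain `snprintf()` instead of libvirt's `virBuffer` API. `demo_format_deadline_sched` is a hypothetical function, and the policy name is hard-coded to "deadline" rather than looked up through `virProcessSchedPolicyTypeToString()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format one <NAMEsched .../> element for the deadline policy.
 * With multiple_threads set, a NAMEs='ID' attribute is emitted first.
 * Returns the snprintf() result. */
static int
demo_format_deadline_sched(char *buf, size_t buflen,
                           const char *name, int multiple_threads, size_t id,
                           unsigned long long runtime,
                           unsigned long long deadline,
                           unsigned long long period)
{
    char ids[64] = "";

    if (multiple_threads)
        snprintf(ids, sizeof(ids), " %ss='%zu'", name, id);

    return snprintf(buf, buflen,
                    "<%ssched%s scheduler='deadline' runtime='%llu' "
                    "deadline='%llu' period='%llu'/>\n",
                    name, ids, runtime, deadline, period);
}
```

For name "vcpu" with multiple threads and id 0, this yields a `<vcpusched vcpus='0' .../>` element; for "emulator" without multiple threads, the id attribute is omitted.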

On 8/1/22 19:11, Sasha Algisi wrote:
Users can set SCHED_DEADLINE as a scheduling policy.
For example, for setting runtime = 10000000, deadline = 15000000 and period = 20000000 for vcpus 0-2:

<cputune>
  ...
  <vcpusched vcpus="0-2" scheduler="deadline" runtime="10000000" deadline="15000000" period="20000000"/>
  ...
</cputune>

Update release notes accordingly.

Signed-off-by: Sasha Algisi <sasha.algisi@edu.unito.it>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
 NEWS.rst                          |  5 +++
 docs/formatdomain.rst             | 16 +++++++---
 src/conf/domain_conf.c            | 52 ++++++++++++++++++++++++++++---
 src/conf/schemas/domaincommon.rng | 16 ++++++++++
 4 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/NEWS.rst b/NEWS.rst
index ef298da539..23484afdc2 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -17,6 +17,11 @@ v8.7.0 (unreleased)

 * **New features**

+  * qemu: support for SCHED_DEADLINE scheduling
+
+    Users can now use the SCHED_DEADLINE scheduling policy for tasks
+    associated to virtual CPUs, IO Threads and Emulator processes.
+
 * **Improvements**
Bonus points for remembering to update NEWS.rst, but we tend to do that in a separate patch. The reason being: easier backports. I mean, when a downstream maintainer decides to backport these patches, they would get a conflict in the NEWS file instantly.

 * **Bug fixes**
diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 1ed969ac3e..216262b79d 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -910,10 +910,11 @@ CPU Tuning
    support since 2.1.0`
 ``vcpusched``, ``iothreadsched`` and ``emulatorsched``
    The optional ``vcpusched``, ``iothreadsched`` and ``emulatorsched`` elements
-   specify the scheduler type (values ``batch``, ``idle``, ``fifo``, ``rr``) for
-   particular vCPU, IOThread and emulator threads respectively. For ``vcpusched``
-   and ``iothreadsched`` the attributes ``vcpus`` and ``iothreads`` select which
-   vCPUs/IOThreads this setting applies to, leaving them out sets the default.
+   specify the scheduler type (values ``batch``, ``idle``, ``fifo``, ``rr``,
+   ``deadline`` :since:`Since 8.7.0`) for particular vCPU, IOThread and emulator
+   threads respectively. For ``vcpusched`` and ``iothreadsched`` the attributes
+   ``vcpus`` and ``iothreads`` select which vCPUs/IOThreads this setting applies
+   to, leaving them out sets the default.
    The element ``emulatorsched`` does not have that attribute. Valid ``vcpus``
    values start at 0 through one less than the number of vCPU's defined for the
    domain. Valid ``iothreads`` values are described in the `IOThreads Allocation`_
@@ -923,6 +924,13 @@ CPU Tuning
    priority must be specified as well (and is ignored for non-real-time ones).
    The value range for the priority depends on the host kernel (usually 1-99).
    :since:`Since 1.2.13` ``emulatorsched`` :since:`since 5.3.0`
+   For SCHED_DEADLINE (``deadline``), runtime , deadline and period must also
+   be specified (they are ignored in other schedulers). It must always be true
+   that: runtime <= deadline <= period.
+   The values are specified in nanoseconds. The valid range for the parameters
+   is [1024, 2^63-1] (but a smaller one can be put in place via sysctl). The
+   period can be set to 0, in which case, a period equal to the deadline is
+   used.
I wonder whether we should make the @period attribute optional then and if not provided in the XML then fill in the value provided to @deadline. Just a suggestion though.
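A tiny sketch of how that suggestion could look once the attribute has been parsed, assuming the parser records whether `period` was present at all. `demo_resolve_period` is a hypothetical helper and does not correspond to an existing libvirt function.

```c
#include <assert.h>
#include <stdbool.h>

/* If the period attribute was omitted, or given explicitly as 0,
 * fall back to the deadline; otherwise keep the supplied period. */
static unsigned long long
demo_resolve_period(bool have_period,
                    unsigned long long period,
                    unsigned long long deadline)
{
    if (!have_period || period == 0)
        return deadline;

    return period;
}
```

Resolving the default in the parser would also let the formatter always print a concrete period, at the cost of no longer round-tripping an explicit `period="0"`.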
 ``cachetune`` :since:`Since 4.1.0`
    Optional ``cachetune`` element can control allocations for CPU caches using
    the resctrl on the host. Whether or not is this supported can be gathered
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index e85cc1f809..86ada8f147 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -16693,7 +16693,10 @@ virDomainLoaderDefParseXML(virDomainLoaderDef *loader,
 static int
 virDomainSchedulerParseCommonAttrs(xmlNodePtr node,
                                    virProcessSchedPolicy *policy,
-                                   int *priority)
+                                   int *priority,
+                                   uint64_t *runtime,
+                                   uint64_t *deadline,
+                                   uint64_t *period)
Again, why not use ull instead? I'm just going to stop picking on this. I'm sure you get the idea.

The rest of the patch looks correct. What I'm missing in this patch is a test case. We tend to either introduce a new one or extend an existing one whenever we touch XML parser/formatter (tests/qemuxml2xmltest.c). Quick git grep scheduler= -- tests/ shows some test cases that could be extended, or where inspiration for a new one can be taken from.

Michal

On 8/1/22 19:11, Sasha Algisi wrote:
Hello everyone,
This patchset aims at adding support for the SCHED_DEADLINE Linux scheduling policy for vcpus, io-threads and emulator processes.
In fact, libvirt currently supports SCHED_OTHER, SCHED_BATCH, SCHED_IDLE, SCHED_FIFO and SCHED_RR, but not SCHED_DEADLINE. SCHED_DEADLINE is a policy implementing an algorithm originating from the real-time scheduling community, but it can be useful outside of the real-time computing field as well.
It allows one to set a specific amount of CPU time that a task should receive with a given periodicity, and within a certain deadline. E.g., task t should be scheduled for at least 50 ms every 100 ms. To achieve this, it needs 3 parameters: runtime, deadline and period (although period can just be equal to deadline, which is what happens automatically if one sets period=0). It must always hold that: runtime <= deadline <= period (this is enforced by the kernel, but checks are included in the patches so that meaningful and easy-to-interpret error messages can be printed to the user).
More info on SCHED_DEADLINE is available here:
https://docs.kernel.org/scheduler/sched-deadline.html
The interface will look like this, e.g., for setting SCHED_DEADLINE as a policy for 3 (0-2) vcpus, with runtime = 10000000, deadline = 15000000 and period = 20000000:
<cputune>
  ...
  <vcpusched vcpus="0-2" scheduler="deadline" runtime="10000000" deadline="15000000" period="20000000"/>
  ...
</cputune>
This is a link to a branch containing the patches:
https://gitlab.com/Algisi-00/libvirt/-/tree/sched-deadline
And this is the link to results of running the CI on such branch:
https://gitlab.com/Algisi-00/libvirt/-/pipelines/601795712
Note that the jobs that are failing are also failing in the exact same way without these patches applied.
Feedback is welcome and very much appreciated.
Thanks and regards.
Sasha Algisi (3):
  virprocess: define sched_attr and sched_setattr
  virprocess: add the SCHED_DEADLINE scheduling policy
  domain_conf: add SCHED_DEADLINE support in the XML configuration

 NEWS.rst                          |   5 ++
 docs/formatdomain.rst             |  16 +++-
 src/ch/ch_process.c               |   3 +-
 src/conf/domain_conf.c            |  52 +++++++++++--
 src/conf/domain_conf.h            |   3 +
 src/conf/schemas/domaincommon.rng |  16 ++++
 src/qemu/qemu_process.c           |   8 +-
 src/util/virprocess.c             | 123 +++++++++++++++++++++++++++++-
 src/util/virprocess.h             |   6 +-
 9 files changed, 216 insertions(+), 16 deletions(-)
Hey, the code looks good. However, we require that the code compiles after each patch, which is not the case with your series. The reason for our requirement is simple: easy git bisect. Therefore, it's okay if a feature does not work until the very last commit. We often have patches/commits that work gradually towards the grand finale. Can you please fix that in v2?

Michal
participants (2): Michal Prívozník, Sasha Algisi