On Fri, Sep 04, 2020 at 01:32:24PM +0100, Daniel P. Berrangé wrote:
On Fri, Sep 04, 2020 at 02:22:14PM +0200, Martin Kletzander wrote:
> At least in a particular scenario described in the code. Basically when
> libvirtd is running without CAP_SYS_NICE (e.g. in a container) and it is trying
> to set QEMU affinity to all CPUs (because there is no setting requested in the
> XML) it fails. But if we ignore the failure in this particular case than you
> can limit the CPUs used by controlling the affinity for libvirtd itself.
>
> In any other case (anything requested in the XML, pinning a live domain, etc.)
> the call is still considered fatal and the action errors out.
>
> Resolves:
https://bugzilla.redhat.com/1819801
I'd prefer if this commit message outlined the reason why this change is
ok, instead of just pointing to the BZ
eg add the following text:
Consider a host with 8 CPUs. There are the following possible scenarios
1. Bare metal; libvirtd has affinity of 8 CPUs; QEMU should get 8 CPUs
2. Bare metal; libvirtd has affinity of 2 CPUs; QEMU should get 8 CPUs
3. Container has affinity of 8 CPUs; libvirtd has affinity of 8 CPus;
QEMU should get 8 CPUs
4. Container has affinity of 8 CPUs; libvirtd has affinity of 2 CPus;
QEMU should get 8 CPUs
5. Container has affinity of 4 CPUs; libvirtd has affinity of 4 CPus;
QEMU should get 4 CPUs
6. Container has affinity of 4 CPUs; libvirtd has affinity of 2 CPus;
QEMU should get 4 CPUs
Scenarios 1 & 2 always work unless systemd restricted libvirtd privs.
Scenario 3 works because libvirt checks current affinity first and
skips the sched_setaffinity call, avoiding the SYS_NICE issue
Scenario 4 works only if CAP_SYS_NICE is availalbe
Scenarios 5 & 6 works only if CAP_SYS_NICE is present *AND* the cgroups
cpuset is not set on the container.
If libvirt blindly ignores the sched_setaffinity failure, then scenarios
4, 5 and 6 should all work, but with caveat in case 4 and 6, that
QEMU will only get 2 CPUs instead of the possible 8 and 4 respectively.
This is still better than failing.
Therefore libvirt can blindly ignore the setaffinity failure, but *ONLY*
ignore it when there was no affinity specified in the XML config.
If user specified affinity explicitly, libvirt must report an error if
it can't be honoured.
I replaced my commit message with yours.
>
> Suggested-by: Daniel P. Berrangé <berrange(a)redhat.com>
> Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
> ---
> src/qemu/qemu_process.c | 41 ++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index cfe09d632633..270bb37d3682 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -2528,6 +2528,7 @@ qemuProcessGetAllCpuAffinity(virBitmapPtr *cpumapRet)
> static int
> qemuProcessInitCpuAffinity(virDomainObjPtr vm)
> {
> + bool settingAll = false;
> g_autoptr(virBitmap) cpumapToSet = NULL;
> virDomainNumatuneMemMode mem_mode;
> qemuDomainObjPrivatePtr priv = vm->privateData;
> @@ -2566,13 +2567,30 @@ qemuProcessInitCpuAffinity(virDomainObjPtr vm)
> if (!(cpumapToSet = virBitmapNewCopy(vm->def->cputune.emulatorpin)))
> return -1;
> } else {
> + settingAll = true;
> if (qemuProcessGetAllCpuAffinity(&cpumapToSet) < 0)
> return -1;
> }
>
> if (cpumapToSet &&
> virProcessSetAffinity(vm->pid, cpumapToSet) < 0) {
> - return -1;
> + /*
> + * We only want to error out if we failed to set the affinity to
> + * user-requested mapping. If we are just trying to reset the affinity
> + * to all CPUs and this fails it can only be an issue if:
> + * 1) libvirtd does not have CAP_SYS_NICE
> + * 2) libvirtd does not run on all CPUs
> + *
> + * However since this scenario is very improbable, we rather skip
> + * reporting the error because it helps running libvirtd in a a scenario
> + * where pinning is handled by someone else.
I wouldn't call this scenario "improbably" - it is entirely expected
by some of our users. Replace these three lines with
"This scenario can easily occurr when libvirtd is run
inside a container with restrictive permissions and CPU
pinning"
Yeah, I meant "improbable on bare metal". I replaced them with your
explanation.
With the text changes
Reviewed-by: Daniel P. Berrangé <berrange(a)redhat.com>
Thanks, pushed.