[libvirt] running Libvirt from source code, IPC_LOCK and VFIO

Hi,

I'm seeing strange behavior when running Libvirt from source code (latest upstream) on an Ubuntu 18.04.1 LTS Power 9 server. My QEMU guest, which uses VFIO and GPU passthrough, breaks on boot when trying to allocate a DMA window inside KVM.

Debugging the code, I've found that the problem is related to the process not having CAP_IPC_LOCK, at least from the host kernel's perspective.

This is strange because:
- the same VM works when run directly from the QEMU command line
- the same VM also works in the system Libvirt (v4.0.0, Ubuntu version)

What am I missing? My understanding of Linux processes is that a process running as root should inherit the same capabilities as the user, which include CAP_IPC_LOCK. Running Libvirt from source code should grant IPC_LOCK to it... right?

Any help is appreciated. I can provide more details (the VM XML, for example) if necessary.

Thanks!
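To see which capabilities a process really holds, the kernel prints them as hex masks (CapInh, CapPrm, CapEff, CapBnd, CapAmb) in /proc/<pid>/status. Below is a minimal C sketch, assuming that standard /proc layout; the helper names are hypothetical, and CAP_IPC_LOCK is bit 14 as defined in <linux/capability.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <linux/capability.h>   /* CAP_IPC_LOCK (14), CAP_DAC_OVERRIDE (1), ... */

/* Hypothetical helper: find a line such as "CapEff:\t0000003fffffffff" in the
 * text of /proc/<pid>/status and parse its hexadecimal mask into *out. */
static bool status_cap_mask(const char *status, const char *field, uint64_t *out)
{
    size_t len = strlen(field);
    const char *line = status;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            char *end;
            unsigned long long v = strtoull(line + len + 1, &end, 16);

            if (end == line + len + 1)
                return false;            /* no hex digits after the colon */
            *out = (uint64_t)v;
            return true;
        }
        line = strchr(line, '\n');       /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return false;
}

/* True if capability number `cap` is set in `mask`. */
static bool mask_has_cap(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return ((mask >> cap) & 1u) != 0;
}

/* Read /proc/<pid>/status into buf; pid 0 means the calling process. */
static bool read_proc_status(int pid, char *buf, size_t size)
{
    char path[64];
    FILE *f;
    size_t n;

    if (pid == 0)
        snprintf(path, sizeof(path), "/proc/self/status");
    else
        snprintf(path, sizeof(path), "/proc/%d/status", pid);

    f = fopen(path, "r");
    if (f == NULL)
        return false;
    n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return true;
}
```

Pointed at the PID of the QEMU process started by libvirtd instead of 0, a clear bit 14 in CapEff would match the symptom described above.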

Update: I've figured it out.

The bug here was that, even running as root, I was getting errors like:

error : virQEMUCapsNewForBinaryInternal:4687 : internal error: Failed to probe QEMU binary with QMP: libvirt: error : prctl failed to enable 'dac_override' in the AMBIENT set: Operation not permitted

The reason is that the host has libcap-ng installed. ./configure uses it if available, setting WITH_CAPNG in the code. I am unsure whether this has something to do with the libcap-ng configuration on the system I'm using or whether something is missing in the Libvirt code, but the spawned QEMU process isn't inheriting the capabilities it should have.

Disabling support for this library with "--with-capng=no" in autogen.sh and rebuilding Libvirt fixed the problem. I was even able to see more NUMA nodes than I could before with the system Libvirt (which is the original bug I am/was investigating).

Thanks!

Hey Daniel,

I had the same problem with a Mellanox RDMA device as the backend for a pvrdma device, where the mlx5 driver also enforces CAP_IPC_LOCK. At first I used the following as a workaround, but I knew it was not a final approach, since it uses memory in the wrong way:

<memoryBacking>
  <locked/>
</memoryBacking>

Anyway, your tip helped, so I wanted to say thanks!

Yuval
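The <locked/> workaround is about locked (non-swappable) guest memory. Per mlock(2), a process holding CAP_IPC_LOCK is not bounded by RLIMIT_MEMLOCK, while any other process may only lock up to its soft RLIMIT_MEMLOCK. A minimal C sketch of that rule; the helper name is hypothetical, and it deliberately ignores memory the process has already locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical helper: could a process lock `bytes` more bytes of memory?
 * Per mlock(2), a process holding CAP_IPC_LOCK is not bounded by
 * RLIMIT_MEMLOCK; any other process is bounded by the soft limit.
 * Simplification: memory the process has already locked is ignored. */
static bool memlock_would_fit(uint64_t bytes, bool has_cap_ipc_lock)
{
    struct rlimit rl;

    if (has_cap_ipc_lock)
        return true;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false;
    if (rl.rlim_cur == RLIM_INFINITY)
        return true;
    return bytes <= (uint64_t)rl.rlim_cur;
}
```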
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 2/4/19 6:47 AM, Yuval Shaia wrote:
Hey Daniel, I had the same problem with a Mellanox RDMA device as the backend for a pvrdma device, where the mlx5 driver also enforces CAP_IPC_LOCK.
So at first I used this as a workaround, but I knew it was not a final approach, since it uses memory in the wrong way:
<memoryBacking>
  <locked/>
</memoryBacking>
Anyway, your tip helped, so I wanted to say thanks!
No problem! Thanks for sharing your memoryBacking workaround. DHB

On Fri, Feb 01, 2019 at 07:40:36PM -0200, Daniel Henrique Barboza wrote:
Update: I've figured it out.
The bug here was that, even running as root, I was getting errors like:
error : virQEMUCapsNewForBinaryInternal:4687 : internal error: Failed to probe QEMU binary with QMP: libvirt: error : prctl failed to enable 'dac_override' in the AMBIENT set: Operation not permitted
As the one responsible for the latest changes regarding capabilities, I find this error itself very strange, because the prctl man page says the following about the EPERM errno:

"option is PR_CAP_AMBIENT and arg2 is PR_CAP_AMBIENT_RAISE, but either the capability specified in arg3 is not present in the process's permitted and inheritable capability sets, or the PR_CAP_AMBIENT_LOWER securebit has been set."

So I'm wondering how that can be, since the prctl call happens after we applied the capabilities we want with capng_apply. Just out of curiosity, what happens if you move the whole PR_CAP_AMBIENT block to the very end of the virSetUIDGIDWithCaps function? Does it change anything?

Thanks,
Erik
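The EPERM rule quoted from the man page can be written as a small predicate over the capability masks. A minimal C sketch; the function name is hypothetical, and the securebit is modelled as SECBIT_NO_CAP_AMBIENT_RAISE (bit 6 of the securebits), which is the name capabilities(7) uses for this case, as far as this sketch assumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/capability.h>   /* CAP_IPC_LOCK, CAP_LAST_CAP */

/* Assumption: SECBIT_NO_CAP_AMBIENT_RAISE is bit 6 of the securebits
 * (SECURE_NO_CAP_AMBIENT_RAISE in <linux/securebits.h>). */
#define NO_CAP_AMBIENT_RAISE_BIT (1u << 6)

/* Hypothetical model of when
 *     prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)
 * succeeds: `cap` must already be in BOTH the permitted and the
 * inheritable set, and the "no ambient raise" securebit must be clear.
 * This is why the raise has to come after the new sets are applied. */
static bool ambient_raise_allowed(uint64_t permitted, uint64_t inheritable,
                                  unsigned int securebits, int cap)
{
    uint64_t bit;

    assert(cap >= 0 && cap <= CAP_LAST_CAP);
    bit = UINT64_C(1) << cap;

    if (securebits & NO_CAP_AMBIENT_RAISE_BIT)
        return false;
    return (permitted & bit) != 0 && (inheritable & bit) != 0;
}
```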

Hey Erik,

On 2/4/19 8:11 AM, Erik Skultety wrote:
On Fri, Feb 01, 2019 at 07:40:36PM -0200, Daniel Henrique Barboza wrote:
Update: I've figured it out.
The bug here was that, even running as root, I was getting errors like:
error : virQEMUCapsNewForBinaryInternal:4687 : internal error: Failed to probe QEMU binary with QMP: libvirt: error : prctl failed to enable 'dac_override' in the AMBIENT set: Operation not permitted
As the one responsible for the latest changes regarding capabilities, I find this error itself very strange, because the prctl man page says the following about the EPERM errno:
"option is PR_CAP_AMBIENT and arg2 is PR_CAP_AMBIENT_RAISE, but either the capability specified in arg3 is not present in the process's permitted and inheritable capability sets, or the PR_CAP_AMBIENT_LOWER securebit has been set."
So I'm wondering how that can be, since the prctl call happens after we applied the capabilities we want with capng_apply. Just out of curiosity, what happens if you move the whole PR_CAP_AMBIENT block to the very end of the virSetUIDGIDWithCaps function? Does it change anything?
Moving the code as you suggested got rid of the internal error:

--- a/src/util/virutil.c
+++ b/src/util/virutil.c
@@ -1587,27 +1587,6 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, gid_t *groups, int ngroups,
         goto cleanup;
     }
 
-# ifdef PR_CAP_AMBIENT
-    /* we couldn't do this in the loop earlier above, because the capabilities
-     * were not applied yet, since in order to add a capability into the AMBIENT
-     * set, it has to be present in both the PERMITTED and INHERITABLE sets
-     * (capabilities(7))
-     */
-    for (i = 0; i <= CAP_LAST_CAP; i++) {
-        capstr = capng_capability_to_name(i);
-
-        if (capBits & (1ULL << i)) {
-            if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, i, 0, 0) < 0) {
-                virReportSystemError(errno,
-                                     _("prctl failed to enable '%s' in the "
-                                       "AMBIENT set"),
-                                     capstr);
-                goto cleanup;
-            }
-        }
-    }
-# endif
-
     /* Set bounding set while we have CAP_SETPCAP.  Unfortunately we cannot
      * do this if we failed to get the capability above, so ignore the
      * return value.
@@ -1630,6 +1609,27 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, gid_t *groups, int ngroups,
         goto cleanup;
     }
 
+# ifdef PR_CAP_AMBIENT
+    /* we couldn't do this in the loop earlier above, because the capabilities
+     * were not applied yet, since in order to add a capability into the AMBIENT
+     * set, it has to be present in both the PERMITTED and INHERITABLE sets
+     * (capabilities(7))
+     */
+    for (i = 0; i <= CAP_LAST_CAP; i++) {
+        capstr = capng_capability_to_name(i);
+
+        if (capBits & (1ULL << i)) {
+            if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, i, 0, 0) < 0) {
+                virReportSystemError(errno,
+                                     _("prctl failed to enable '%s' in the "
+                                       "AMBIENT set"),
+                                     capstr);
+                goto cleanup;
+            }
+        }
+    }
+# endif
+

However, this code still doesn't add IPC_LOCK as a capability:

index 0d58f1ee57..f4b46abc08 100644
--- a/src/util/virutil.c
+++ b/src/util/virutil.c
+++ b/src/qemu/qemu_capabilities.c
@@ -4525,6 +4525,9 @@ virQEMUCapsInitQMPCommandRun(virQEMUCapsInitQMPCommandPtr cmd,
     /* QEMU might run into permission issues, e.g. /dev/sev (0600), override
      * them just for the purpose of probing */
     virCommandAllowCap(cmd->cmd, CAP_DAC_OVERRIDE);
+    virCommandAllowCap(cmd->cmd, CAP_IPC_LOCK);
+    virCommandAllowCap(cmd->cmd, CAP_IPC_OWNER);
+
 #endif

So I am not sure whether my modification above is wrong, or whether moving the PR_CAP_AMBIENT code made the error go away without setting the capabilities at all. I'll investigate it more.

DHB
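The moved hunk relies on the ordering described in its own comment: the ambient raise can only succeed once the capability is already in the permitted and inheritable sets. A minimal C sketch of that ordering, using the raw capget/capset syscalls and prctl() in place of libcap-ng's capng_apply() (a deliberate swap to avoid the extra dependency); the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/capability.h>

#ifndef PR_CAP_AMBIENT
# define PR_CAP_AMBIENT       47
# define PR_CAP_AMBIENT_RAISE 2
#endif

/* Hypothetical helper: first make `cap` inheritable via capset(2), then
 * raise it into the ambient set. Returns 0 on success, -errno on failure. */
static int apply_then_raise_ambient(int cap)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    assert(cap >= 0 && cap <= CAP_LAST_CAP);
    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;                                   /* 0 = calling thread */

    if (syscall(SYS_capget, &hdr, data) < 0)
        return -errno;

    /* Step 1: apply the new inheritable set. Without CAP_SETPCAP the
     * kernel only allows this if `cap` is already permitted. */
    data[cap / 32].inheritable |= 1u << (cap % 32);
    if (syscall(SYS_capset, &hdr, data) < 0)
        return -errno;

    /* Step 2: only now can the ambient raise succeed (it still fails with
     * EPERM if `cap` is not in the permitted set). */
    if (prctl(PR_CAP_AMBIENT, (unsigned long)PR_CAP_AMBIENT_RAISE,
              (unsigned long)cap, 0UL, 0UL) < 0)
        return -errno;
    return 0;
}
```

Run unprivileged, or in a container whose bounding set lacks CAP_IPC_LOCK, this typically returns -EPERM at the capset step; with a full capability set it typically returns 0.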
Thanks, Erik

Hi Erik,

Just to let you know that the error I reported in one of my replies was caused by a change I forgot to undo. This error:

error : virQEMUCapsNewForBinaryInternal:4687 : internal error: Failed to probe QEMU binary with QMP: libvirt: error : prctl failed to enable 'dac_override' in the AMBIENT set: Operation not permitted

was happening because I had commented out this line in qemu_capabilities.c:

--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -4519,7 +4519,7 @@ virQEMUCapsInitQMPCommandRun(virQEMUCapsInitQMPCommandPtr cmd,
                                     "-daemonize", NULL);
     virCommandAddEnvPassCommon(cmd->cmd);
-    virCommandClearCaps(cmd->cmd);
+    // virCommandClearCaps(cmd->cmd);
 
 #if WITH_CAPNG
     /* QEMU might run into permission issues, e.g. /dev/sev (0600), override

Thus there is no need to move the PR_CAP_AMBIENT block around to prevent the error message. Sorry for any alarm I might have raised there.

I'm still experiencing the issue with IPC_LOCK inside the guest, though. I'll update here when I have concrete findings about it.

Thanks,
DHB

On Mon, Feb 04, 2019 at 08:44:21PM -0200, Daniel Henrique Barboza wrote:
Hi Erik,
Just to let you know that the error I reported in one of my replies was caused by a change I forgot to undo. This error:
error : virQEMUCapsNewForBinaryInternal:4687 : internal error: Failed to probe QEMU binary with QMP: libvirt: error : prctl failed to enable 'dac_override' in the AMBIENT set: Operation not permitted
was happening because I had commented out this line in qemu_capabilities.c:
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -4519,7 +4519,7 @@ virQEMUCapsInitQMPCommandRun(virQEMUCapsInitQMPCommandPtr cmd,
                                     "-daemonize", NULL);
     virCommandAddEnvPassCommon(cmd->cmd);
-    virCommandClearCaps(cmd->cmd);
+    // virCommandClearCaps(cmd->cmd);
 
 #if WITH_CAPNG
     /* QEMU might run into permission issues, e.g. /dev/sev (0600), override
Thus there is no need to move the PR_CAP_AMBIENT block around to prevent the error message. Sorry for any alarm I might have raised there.
Well, at least it doesn't create even more confusion :). Still, you said you were running libvirtd as root and yet you hit the error, which I still don't understand. It's as if capng_apply failed on Power, so you don't have DAC_OVERRIDE in any of the sets prctl relies on, OR prctl's constraints differ from x86 and having a cap in the PERMITTED && INHERITABLE sets is indeed not enough. I'm sorry I can't help more here, as I don't have a Power machine.
I'm still experiencing the issue with IPC_LOCK inside the guest though. I'll update here when I have concrete findings about it.
This is tricky. Ideally, we wouldn't want every QEMU to have the IPC_LOCK capability (IMHO none of them should), but IIUC we don't know where the device's memory will be mapped, and therefore from DMA's POV we don't know which regions to lock (we can only lock the whole memory + 1GB for any future VFIO devices). Am I right?

Erik
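Erik's "whole memory + 1GB" remark amounts to sizing RLIMIT_MEMLOCK as guest memory plus 1 GiB. A hedged C sketch of that arithmetic and of applying it with setrlimit(); the helper names are hypothetical, this is only the rule of thumb from the message (libvirt's real calculation is not reproduced here), and raising the hard limit itself would additionally need CAP_SYS_RESOURCE.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

#define GIB (UINT64_C(1) << 30)

/* Hypothetical helper following the rule of thumb above: with VFIO the
 * whole guest RAM may need to be pinned for DMA, plus 1 GiB of headroom. */
static uint64_t vfio_memlock_limit_bytes(uint64_t guest_mem_kib)
{
    return guest_mem_kib * 1024 + GIB;
}

/* Apply `bytes` as the soft RLIMIT_MEMLOCK of the calling process. The
 * value is clamped to the current hard limit, since raising the hard
 * limit needs CAP_SYS_RESOURCE. Returns 0 on success, -1 on error. */
static int set_memlock_soft_limit(uint64_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && bytes > (uint64_t)rl.rlim_max)
        bytes = (uint64_t)rl.rlim_max;
    rl.rlim_cur = (rlim_t)bytes;
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

For a 4 GiB guest (4194304 KiB) this yields 5 GiB, which illustrates why locking everything up front is the heavy-handed option Erik is describing.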

On Mon, Feb 04, 2019 at 08:44:21PM -0200, Daniel Henrique Barboza wrote:
I'm still experiencing the issue with IPC_LOCK inside the guest though. I'll update here when I have concrete findings about it.
Any use of capabilities "inside the guest" is not libvirt's responsibility. It only cares about capabilities on the *host* OS used by QEMU.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 2/5/19 8:32 AM, Daniel P. Berrangé wrote:
On Mon, Feb 04, 2019 at 08:44:21PM -0200, Daniel Henrique Barboza wrote:
I'm still experiencing the issue with IPC_LOCK inside the guest though. I'll update here when I have concrete findings about it.
Any use of capabilities "inside the guest" is not libvirt's responsibility. It only cares about capabilities on the *host* OS used by QEMU.
I used poor wording there. The issue I was referring to is the QEMU process lacking IPC_LOCK, which makes the host KVM fail to allocate extra memory pages for the guest process.

Thanks,
DHB
Regards, Daniel

On 2/1/19 7:04 PM, Daniel Henrique Barboza wrote:
What am I missing? My understanding of Linux processes is that a process running as root should inherit the same capabilities as the user, which include CAP_IPC_LOCK. Running Libvirt from source code should grant IPC_LOCK to it... right?
No. Ideally, you trust libvirt and want it to manage devices on your system, so it needs all the capabilities. But qemu spawned by libvirt should have no capabilities, as libvirt sets up everything that's needed for qemu to run. This is hard to get right, though: qemu changes, and so do the capabilities it may require (these depend on the domain configuration anyway). Therefore, it is possible to configure libvirt so that it does not drop capabilities for the qemu process (see clear_emulator_capabilities in qemu.conf), but then libvirt can't guarantee that a compromised qemu does no harm.

This corresponds with your finding about ./configure: if libcap-ng is not found, there's no way for libvirt to drop capabilities, so it doesn't.

Michal
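Michal's explanation boils down to a small decision table. The C sketch below is a hypothetical model of it, not libvirt's code: `allowed` stands for capabilities libvirt explicitly grants (as virCommandAllowCap does in the patches earlier in this thread), and `clear_caps` mirrors the clear_emulator_capabilities option in qemu.conf, which is enabled by default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/capability.h>   /* CAP_IPC_LOCK, CAP_LAST_CAP */

/* Hypothetical model of the behaviour described above:
 * - built without libcap-ng: libvirt cannot drop anything, so qemu keeps
 *   whatever the daemon had;
 * - clear_emulator_capabilities = 0: nothing is dropped either;
 * - otherwise: qemu keeps only the explicitly allowed capabilities. */
static uint64_t emulator_caps(uint64_t daemon_caps, uint64_t allowed,
                              bool built_with_capng, bool clear_caps)
{
    if (!built_with_capng || !clear_caps)
        return daemon_caps;
    return daemon_caps & allowed;
}
```

The first branch is the situation Daniel hit with "--with-capng=no": everything is inherited, which is why the VM worked, at the security cost Michal describes.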

On Mon, Feb 4, 2019 at 9:59 AM Michal Privoznik <mprivozn@redhat.com> wrote:
On 2/1/19 7:04 PM, Daniel Henrique Barboza wrote:
- the same VM running directly from QEMU command line works
- the same VM running in the system Libvirt (v4.0.0, Ubuntu version) also works
FYI, Ubuntu's version is built with libcap-ng for the capability-dropping feature discussed in this thread. libcap-ng-dev is installed as a build dependency and "--with-capng" is set on all Linux builds.
-- Christian Ehrhardt Software Engineer, Ubuntu Server Canonical Ltd

On 2/4/19 7:48 AM, Christian Ehrhardt wrote:
FYI, Ubuntu's version is built with libcap-ng for the capability-dropping feature discussed in this thread. libcap-ng-dev is installed as a build dependency and "--with-capng" is set on all Linux builds.
Good to know. "Just disable libcap-ng support" wasn't a solution I was going to propose, but with this info I can explain to other people why it isn't viable at all.
What am I missing? My understanding of Linux processes is that a process running as root should inherit the same capabilities as the user, which include CAP_IPC_LOCK. Running Libvirt from source code should grant IPC_LOCK to it... right?
No. Ideally, you trust libvirt and want it to manage devices on your system, so it needs all the capabilities. But qemu spawned by libvirt should have no capabilities, as libvirt sets up everything that's needed for qemu to run. This is hard to get right, though: qemu changes, and so do the capabilities it may require (these depend on the domain configuration anyway). Therefore, it is possible to configure libvirt so that it does not drop capabilities for the qemu process (see clear_emulator_capabilities in qemu.conf), but then libvirt can't guarantee that a compromised qemu does no harm.
Thanks for the explanation. What I need to do, then, is find out exactly which capabilities I am missing in QEMU (aside from IPC_LOCK) and see if we can get Libvirt to set them.

Thanks,
DHB
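Finding "exactly the capabilities I am missing" is a mask difference between what the guest configuration needs and the CapEff mask of the running QEMU process (from /proc/<qemu-pid>/status). A minimal C sketch; the `required` mask is something that would have to be determined for the specific VM, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/capability.h>   /* CAP_IPC_LOCK, CAP_DAC_OVERRIDE, CAP_LAST_CAP */

/* Hypothetical helper: capabilities in `required` absent from `effective`. */
static uint64_t missing_caps(uint64_t required, uint64_t effective)
{
    return required & ~effective;
}

/* Write the capability numbers set in `mask` into out[] in ascending
 * order, stopping at `max` entries; returns how many were written. */
static size_t list_caps(uint64_t mask, int *out, size_t max)
{
    size_t n = 0;

    for (int cap = 0; cap <= CAP_LAST_CAP && n < max; cap++)
        if (mask & (UINT64_C(1) << cap))
            out[n++] = cap;
    return n;
}
```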

On Mon, Feb 04, 2019 at 09:58:47AM +0100, Michal Privoznik wrote:
On 2/1/19 7:04 PM, Daniel Henrique Barboza wrote:
What am I missing? My understanding of Linux processes is that a process running as root should inherit the same capabilities as the user, which include CAP_IPC_LOCK. Running Libvirt from source code should grant ipc_lock to it ... right?
No. Ideally, you trust libvirt and want it to manage devices on your system, thus it needs all the capabilities. But qemu spawned by libvirt should have no capabilities, as libvirt sets up everything that's needed for qemu to run. But this is hard to get right: qemu changes, and so do the capabilities it may require (these depend on the domain configuration anyway). Therefore, it is possible to configure libvirt so that it does not drop capabilities for the qemu process (see clear_emulator_capabilities in qemu.conf), but then libvirt can't guarantee that a compromised qemu does no harm.
In my case it is not a matter of the risk of a malicious guest; it is that the device cannot utilize the host device *unless* it has the 'lock' capability.
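A simplified model of why the 'lock' capability matters here. This is an assumption about the mechanism, not something established in this thread: when a VFIO/KVM DMA window pins guest memory, the kernel charges the pinned pages to the process's locked-memory count and rejects the request if it would exceed RLIMIT_MEMLOCK, unless the process holds CAP_IPC_LOCK. A QEMU stripped of CAP_IPC_LOCK therefore depends entirely on the memlock limit it was given. `pin_allowed` and `current_memlock_limit` are hypothetical helpers; only `getrlimit(RLIMIT_MEMLOCK, ...)` is a real API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/resource.h>

/* Simplified model (an assumption, not kernel source): pinning `npages` more
 * pages is allowed if the caller has CAP_IPC_LOCK, or if the total stays
 * within RLIMIT_MEMLOCK expressed in pages. */
static bool pin_allowed(uint64_t locked_pages, uint64_t npages,
                        uint64_t memlock_limit_bytes, uint64_t page_size,
                        bool has_cap_ipc_lock)
{
    uint64_t limit_pages;

    if (has_cap_ipc_lock)
        return true;                    /* the capability bypasses the limit */
    if (memlock_limit_bytes == (uint64_t)RLIM_INFINITY)
        return true;                    /* "unlimited" memlock */
    limit_pages = memlock_limit_bytes / page_size;
    /* written this way to avoid overflow in locked_pages + npages */
    return npages <= limit_pages && locked_pages <= limit_pages - npages;
}

/* Soft RLIMIT_MEMLOCK of the calling process, in bytes. Returns 0 on success. */
static int current_memlock_limit(uint64_t *bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *bytes = (uint64_t)rl.rlim_cur;
    return 0;
}
```

For example, `pin_allowed(0, 16, 64 * 1024, 4096, false)` is true (16 pages exactly fill a 64 KiB limit), while one extra already-locked page makes `pin_allowed(1, 16, 64 * 1024, 4096, false)` false; with `has_cap_ipc_lock` set, both succeed.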
This corresponds with your finding about ./configure: if no libcap-ng is found, there's no way for libvirt to drop capabilities, and thus it doesn't do that.
Michal

+ Kamal and Marcel

On Mon, Feb 04, 2019 at 09:58:47AM +0100, Michal Privoznik wrote:
On 2/1/19 7:04 PM, Daniel Henrique Barboza wrote:
Hi,
I'm seeing strange behavior when running Libvirt from source code, latest upstream, on an Ubuntu 18.04.1 LTS Power 9 server. My QEMU guest, which uses VFIO and GPU passthrough, breaks on boot when trying to allocate a DMA window inside KVM.
Debugging the code, I've found out that the problem is related to the process not having CAP_IPC_LOCK - at least from the host kernel perspective.
This is strange because:
- the same VM running directly from the QEMU command line works
- the same VM running in the system Libvirt (v4.0.0, Ubuntu version) also works
What am I missing? My understanding of Linux processes is that a process running as root should inherit the same capabilities as the user, which include CAP_IPC_LOCK. Running Libvirt from source code should grant ipc_lock to it ... right?
No. Ideally, you trust libvirt and want it to manage devices on your system, thus it needs all the capabilities. But qemu spawned by libvirt should have no capabilities, as libvirt sets up everything that's needed for qemu to run. But this is hard to get right: qemu changes, and so do the capabilities it may require (these depend on the domain configuration anyway). Therefore, it is possible to configure libvirt so that it does not drop capabilities for the qemu process (see clear_emulator_capabilities in qemu.conf), but then libvirt can't guarantee that a compromised qemu does no harm.
In my case it is not a matter of the risk of a malicious guest; it is that the device cannot utilize the host device *unless* it has the 'lock' capability.
This corresponds with your finding about ./configure: if no libcap-ng is found, there's no way for libvirt to drop capabilities, and thus it doesn't do that.
Michal

Hi,
After investigating further, I've fixed the problem in a way that I believe is worth a patch. I'll be sending it to the ML shortly.
Thanks everyone for the input and insights,
DHB

On 2/4/19 10:59 AM, Yuval Shaia wrote:
+ Kamal and Marcel
On Mon, Feb 04, 2019 at 09:58:47AM +0100, Michal Privoznik wrote:
On 2/1/19 7:04 PM, Daniel Henrique Barboza wrote:
Hi,
I'm seeing strange behavior when running Libvirt from source code, latest upstream, on an Ubuntu 18.04.1 LTS Power 9 server. My QEMU guest, which uses VFIO and GPU passthrough, breaks on boot when trying to allocate a DMA window inside KVM.
Debugging the code, I've found out that the problem is related to the process not having CAP_IPC_LOCK - at least from the host kernel perspective.
This is strange because:
- the same VM running directly from the QEMU command line works
- the same VM running in the system Libvirt (v4.0.0, Ubuntu version) also works
What am I missing? My understanding of Linux processes is that a process running as root should inherit the same capabilities as the user, which include CAP_IPC_LOCK. Running Libvirt from source code should grant ipc_lock to it ... right?
No. Ideally, you trust libvirt and want it to manage devices on your system, thus it needs all the capabilities. But qemu spawned by libvirt should have no capabilities, as libvirt sets up everything that's needed for qemu to run. But this is hard to get right: qemu changes, and so do the capabilities it may require (these depend on the domain configuration anyway). Therefore, it is possible to configure libvirt so that it does not drop capabilities for the qemu process (see clear_emulator_capabilities in qemu.conf), but then libvirt can't guarantee that a compromised qemu does no harm.
In my case it is not a matter of the risk of a malicious guest; it is that the device cannot utilize the host device *unless* it has the 'lock' capability.
This corresponds with your finding about ./configure: if no libcap-ng is found, there's no way for libvirt to drop capabilities, and thus it doesn't do that.
Michal
participants (6)
- Christian Ehrhardt
- Daniel Henrique Barboza
- Daniel P. Berrangé
- Erik Skultety
- Michal Privoznik
- Yuval Shaia