On Thu, Nov 01, 2018 at 09:31 AM +0100, Martin Kletzander <mkletzan(a)redhat.com>
wrote:
On Tue, Oct 30, 2018 at 03:07:59PM +0000, Daniel P. Berrangé wrote:
>On Tue, Oct 30, 2018 at 03:45:36PM +0100, Michal Privoznik wrote:
>> On 10/30/2018 02:46 PM, Michal Privoznik wrote:
>> > On 10/30/2018 01:55 PM, Daniel P. Berrangé wrote:
>> >> On Tue, Oct 30, 2018 at 10:32:08AM +0000, Daniel P. Berrangé wrote:
>> >>> On Tue, Oct 30, 2018 at 11:08:45AM +0100, Michal Privoznik wrote:
>> >>>> On 10/30/2018 10:35 AM, Daniel P. Berrangé wrote:
>> >>>>> On Tue, Oct 30, 2018 at 09:13:50AM +0100, Michal Privoznik
wrote:
>> >>>>>> On 10/29/2018 06:34 PM, Marc Hartmayer wrote:
>> >>>>>>> Introduce caching whether /dev/kvm is usable as the
QEMU user:QEMU
>> >>>>>>> group. This reduces the overhead of the QEMU
capabilities cache
>> >>>>>>> lookup. Before this patch there were many fork()
calls used for
>> >>>>>>> checking whether /dev/kvm is accessible. Now we
store the result
>> >>>>>>> whether /dev/kvm is accessible or not and we only
need to re-run the
>> >>>>>>> virFileAccessibleAs check if the ctime of /dev/kvm
has changed.
>> >>>>>>>
>> >>>>>>> Suggested-by: Daniel P. Berrangé
<berrange(a)redhat.com>
>> >>>>>>> Signed-off-by: Marc Hartmayer
<mhartmay(a)linux.ibm.com>
>> >>>>>>> ---
>> >>>>>>> src/qemu/qemu_capabilities.c | 54
++++++++++++++++++++++++++++++++++--
>> >>>>>>> 1 file changed, 52 insertions(+), 2 deletions(-)
>> >>>>>>>
>> >>>>>>> diff --git a/src/qemu/qemu_capabilities.c
b/src/qemu/qemu_capabilities.c
>> >>>>>>> index e228f52ec0bb..85516954149b 100644
>> >>>>>>> --- a/src/qemu/qemu_capabilities.c
>> >>>>>>> +++ b/src/qemu/qemu_capabilities.c
>> >>>>>>> @@ -3238,6 +3238,10 @@ struct _virQEMUCapsCachePriv
{
>> >>>>>>> virArch hostArch;
>> >>>>>>> unsigned int microcodeVersion;
>> >>>>>>> char *kernelVersion;
>> >>>>>>> +
>> >>>>>>> + /* cache whether /dev/kvm is usable as
runUid:runGuid */
>> >>>>>>> + virTristateBool kvmUsable;
>> >>>>>>> + time_t kvmCtime;
>> >>>>>>> };
>> >>>>>>> typedef struct _virQEMUCapsCachePriv
virQEMUCapsCachePriv;
>> >>>>>>> typedef virQEMUCapsCachePriv
*virQEMUCapsCachePrivPtr;
>> >>>>>>> @@ -3824,6 +3828,52 @@ virQEMUCapsSaveFile(void
*data,
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> +/* Determine whether '/dev/kvm' is usable
as QEMU user:QEMU group. */
>> >>>>>>> +static bool
>> >>>>>>> +virQEMUCapsKVMUsable(virQEMUCapsCachePrivPtr priv)
>> >>>>>>> +{
>> >>>>>>> + struct stat sb;
>> >>>>>>> + static const char *kvm_device =
"/dev/kvm";
>> >>>>>>> + virTristateBool value;
>> >>>>>>> + virTristateBool cached_value =
priv->kvmUsable;
>> >>>>>>> + time_t kvm_ctime;
>> >>>>>>> + time_t cached_kvm_ctime = priv->kvmCtime;
>> >>>>>>> +
>> >>>>>>> + if (stat(kvm_device, &sb) < 0) {
>> >>>>>>> + virReportSystemError(errno,
>> >>>>>>> + _("Failed to stat
%s"), kvm_device);
>> >>>>>>> + return false;
>> >>>>>>> + }
>> >>>>>>> + kvm_ctime = sb.st_ctime;
>> >>>>>>
>> >>>>>> This doesn't feel right. /dev/kvm ctime is changed
every time qemu is
>> >>>>>> started or powered off (try running stat over it before
and after a
>> >>>>>> domain is started/shut off). So effectively we will fork
more often than
>> >>>>>> we would think. Should we cache inode number instead?
Because for all
>> >>>>>> that we care is simply if the file is there.
>> >>>>>
>> >>>>> Urgh, that is a bit strange and not the usual semantics for
timestamps :-(
>> >>>>
>> >>>> Indeed.
>> >>>>
>> >>>>>
>> >>>>> We can't stat the inode - the code was explicitly trying
to cope with the
>> >>>>> way /dev/kvm can change permissions when udev rules get
applied. We would
>> >>>>> have to compare the user, group, permissions mask and even
ACL, or a hash
>> >>>>> of those.
>> >>>>
>> >>>> Well, we can use ctime as suggested and post a patch for kernel
to fix
>> >>>> ctime behaviour. Until the patch is merged our behaviour would
be
>> >>>> suboptimal, but still better than it is now.
>> >>>
>> >>> I guess lets talk to KVM team for their input on this and then
decide
>> >>> what todo.
>> >>
>> >> Hmm, I wonder if it is not actually a kernel problem, but rather
something
>> >> in userspace genuinely touching the device in a way that caues these
>> >> timestamps to be updated.
>> >>
>> >
>> > It is kernel problem. In my testing, the moment I call:
>> >
>> > ioctl(kvm, KVM_CREATE_VM, 0);
>>
>> Okay, I have to retract this claim. 'udevadm monitor' shows some events:
>>
>> KERNEL[3631.129645] change /devices/virtual/misc/kvm (misc)
>> UDEV [3631.130816] change /devices/virtual/misc/kvm (misc)
>>
>> and stopping udevd leaves all three times untouched. So it is udev after
>> all. I just don't know how to find the rule that is causing the issue.
>> Anyway, as for this patch, I think we can merge it in the end, can't we?
>
>Not really. Udev is in use everywhere, so this behaviour makes the
>patch useless in practice, even though it is technically right in
>theory :-(
>
Does it? With this behaviour we still do the "expensive" work after any
machine
has started. But for one machine starting it still has the effect of running it
only once. And we *need* to run it for each machine unless we also cache the
result per (at least) user:group of that machine as every machine can run under
different user:group.
How can you run a machine/QEMU VM under a different user:group other
than changing the user:group in qemu.conf and restart/reload libvirtd?
As soon as a VM is running we have not to verify /dev/kvm access, no?
(so there should be no problem when libvirtd tries to “reconnect” to
already running VMs).
I'll go through the patch again (just skimmed it the first time), but I think
this actually still makes it better (although not perfect). I wonder why the
udev rule does not fire only on change (why doesn't it say
ACTION=="change").
>Regards,
>Daniel
>--
>|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
>|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
>|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|
>
>--
>libvir-list mailing list
>libvir-list(a)redhat.com
>https://www.redhat.com/mailman/listinfo/libvir-list
--
Kind regards / Beste Grüße
Marc Hartmayer
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294