[libvirt] [PATCH] qemu: turn on virtlockd by default

We have had virtlockd available for a long time now but have always
defaulted to the 'nop' lock driver, which does no locking. This gives
users an unsafe deployment by default unless they know to turn on
lockd. virtlockd will auto-activate via systemd when guests launch, so
setting it on by default will only cause upgrade pain for people on
non-systemd distros.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu.conf     | 3 ++-
 src/qemu/qemu_driver.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 4fa5e8a..73f4a0a 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -421,7 +421,8 @@
 # share one writable disk, libvirt offers two approaches for
 # locking files. The first one is sanlock, the other one,
 # virtlockd, is then our own implementation. Accepted values
-# are "sanlock" and "lockd".
+# are "sanlock", "lockd" and "nop", with "lockd" being the
+# default.
 #
 #lock_manager = "lockd"

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index e46404b..36fb047 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -728,7 +728,7 @@ qemuStateInitialize(bool privileged,
     if (!(qemu_driver->lockManager =
           virLockManagerPluginNew(cfg->lockManagerName ?
-                                  cfg->lockManagerName : "nop",
+                                  cfg->lockManagerName : "lockd",
                                   "qemu",
                                   cfg->configBaseDir,
                                   0)))
-- 
2.5.0
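For readers checking what their own deployment will use after this change, the fallback logic amounts to a one-line lookup: use the uncommented `lock_manager` setting if present, else fall back to "lockd". A minimal sketch (the helper function name is mine, not libvirt's; it assumes the stock qemu.conf syntax):

```shell
# Print the lock_manager configured in a qemu.conf, falling back to the
# new built-in default "lockd" when the setting is absent or commented out.
effective_lock_manager() {
    conf="$1"
    # Match only an uncommented assignment like: lock_manager = "lockd"
    val=$(sed -n 's/^lock_manager *= *"\([^"]*\)".*/\1/p' "$conf" 2>/dev/null)
    echo "${val:-lockd}"
}
```

For example, `effective_lock_manager /etc/libvirt/qemu.conf` would print `lockd` on an untouched install once this patch is applied.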

On Fri, Jan 22, 2016 at 03:56:08PM +0000, Daniel P. Berrange wrote:
We have had virtlockd available for a long time now but have always defaulted to the 'nop' lock driver which does no locking. This gives users an unsafe deployment by default unless they know to turn on lockd. virtlockd will auto-activate via systemd when guests launch, so setting it on by default will only cause upgrade pain for people on non-systemd distros.
I tried to test this patch in the below way after applying it to my
local Git master, and my test seems to fail.

Compile and `virtlockd` setup
-----------------------------

After applying the patch, I'm here:

    $ git describe
    v1.3.1-34-gd3a39cd

Compile:

    $ cd ~/build/libvirt
    $ ~/src/libvirt/autogen.sh --system && make -j8

Stop the system libvirtd and virtlockd:

    $ systemctl stop libvirtd
    $ systemctl stop virtlockd

Enable `virtlockd` for the QEMU driver:

    $ augtool -s set /files/etc/libvirt/qemu.conf/lock_manager lockd

Invoke the just-compiled virtlockd and libvirtd (in that order):

    $ sudo ./run src/virtlockd &
    $ sudo ./run daemon/libvirtd &

Test
----

Purposefully make two libvirt VMs point to the same disk image:

    $ sudo ./run tools/virsh dumpxml cvm1 | grep 'source file'
      <source file='/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img'/>

    $ sudo ./run tools/virsh dumpxml cvm2 | grep 'source file'
      <source file='/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img'/>

Start the first VM (with `virsh` from the build dir); it fails:

    $ sudo ./run tools/virsh start cvm1
    error: Failed to start domain cvm1
    error: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

However, the virtlogd-sock file exists in the said path:

    $ ls -lsrt /var/run/libvirt/virtlockd-sock
    0 srwx------. 1 root root 0 Jan 22 19:49 /var/run/libvirt/virtlockd-sock

What else am I missing?

-- 
/kashyap

On 01/22/2016 02:35 PM, Kashyap Chamarthy wrote:
On Fri, Jan 22, 2016 at 03:56:08PM +0000, Daniel P. Berrange wrote:
We have had virtlockd available for a long time now but have always defaulted to the 'nop' lock driver which does no locking. This gives users an unsafe deployment by default unless they know to turn on lockd. virtlockd will auto-activate via systemd when guests launch, so setting it on by default will only cause upgrade pain for people on non-systemd distros.
I tried to test this patch in the below way after applying to my local Git master, and my test seems to fail.
Compile and `virtlockd` setup
-----------------------------

After applying the patch, I'm here:

    $ git describe
    v1.3.1-34-gd3a39cd

Compile:

    $ cd ~/build/libvirt
    $ ~/src/libvirt/autogen.sh --system && make -j8

Stop the system libvirtd and virtlockd:

    $ systemctl stop libvirtd
    $ systemctl stop virtlockd

Enable `virtlockd` for the QEMU driver:

    $ augtool -s set /files/etc/libvirt/qemu.conf/lock_manager lockd

Invoke the just-compiled virtlockd and libvirtd (in that order):

    $ sudo ./run src/virtlockd &
    $ sudo ./run daemon/libvirtd &

Test
----

Purposefully make two libvirt VMs point to the same disk image:

    $ sudo ./run tools/virsh dumpxml cvm1 | grep 'source file'
      <source file='/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img'/>

    $ sudo ./run tools/virsh dumpxml cvm2 | grep 'source file'
      <source file='/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img'/>

Start the first VM (with `virsh` from the build dir); it fails:

    $ sudo ./run tools/virsh start cvm1
    error: Failed to start domain cvm1
    error: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

However, the virtlogd-sock file exists in the said path:

    $ ls -lsrt /var/run/libvirt/virtlockd-sock
    0 srwx------. 1 root root 0 Jan 22 19:49 /var/run/libvirt/virtlockd-sock
You're mixing up virtlogd and virtlockd :) virtlogd was added recently
and should be started. But maybe there's another issue if it wasn't
started automatically.

- Cole

On Fri, Jan 22, 2016 at 03:35:33PM -0500, Cole Robinson wrote:
On 01/22/2016 02:35 PM, Kashyap Chamarthy wrote:
On Fri, Jan 22, 2016 at 03:56:08PM +0000, Daniel P. Berrange wrote:
We have had virtlockd available for a long time now but have always defaulted to the 'nop' lock driver which does no locking. This gives users an unsafe deployment by default unless they know to turn on lockd. virtlockd will auto-activate via systemd when guests launch, so setting it on by default will only cause upgrade pain for people on non-systemd distros.
I tried to test this patch in the below way after applying to my local Git master, and my test seems to fail.
My mistake - the test was failing because I missed starting the
'virtlogd' daemon _before_ libvirtd. (Thanks, Cole, for noticing that I
mixed up 'virtlogd' and 'virtlockd'.) With that fixed, my same test
succeeds[*].

ACK & Tested-By: Kashyap Chamarthy <kchamart@redhat.com>

[*] Test

(1) Purposefully make two libvirt VMs point to the same disk image:

    $ sudo ./run tools/virsh dumpxml cvm1 | grep 'source file'
      <source file='/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img'/>

    $ sudo ./run tools/virsh dumpxml cvm2 | grep 'source file'
      <source file='/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img'/>

(2) Start the first guest (cvm1):

    $ sudo ./run tools/virsh start cvm1
    Domain cvm1 started

(3) Start the second guest (cvm2); the disk image is locked, and
    libvirt, correctly, refuses to start it:

    $ sudo ./run tools/virsh start cvm2
    error: Failed to start domain cvm2
    error: resource busy: Lockspace resource '/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img' is locked

And an appropriate error is thrown in the libvirt debug log:

    ...
    2016-01-22 21:56:19.095+0000: 3008: error : virLockSpaceAcquireResource:640 : resource busy: Lockspace resource '/var/lib/libvirt/images/cirros-0.3.3-x86_64-disk.img' is locked
    ...

[...]
Enable `virtlockd` for the QEMU driver:
$ augtool -s set /files/etc/libvirt/qemu.conf/lock_manager lockd
Invoke just compiled virtlockd and libvirtd (in that order):
$ sudo ./run src/virtlockd &
Noting for completeness, I missed a step here: starting the virtlogd
daemon _before_ libvirtd:

    $ sudo ./run src/virtlogd &

(Just remembered that a patch (a8bb330) was recently merged that made it
mandatory to have virtlogd start _before_ libvirtd, because: "In the
non-systemd case, without socket activation, we need proper ordering.")
$ sudo ./run daemon/libvirtd &
[...]
You're mixing up virtlogd and virtlockd :) virtlogd was added recently and should be started. But maybe there's another issue if it wasn't started automatically.
Duh, that's embarrassing - you're of course right. Thanks for catching
that. After starting the 'virtlogd' daemon, all is good.

-- 
/kashyap
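Since the failure here came down to start ordering, a non-systemd startup script can guard against it by waiting for the virtlogd socket before launching libvirtd. A minimal sketch (function name and timeout are mine; it tests with `-e` rather than `-S` purely so the sketch is easy to try with a plain file):

```shell
# Block until the given socket path appears, so libvirtd is only started
# once virtlogd (and virtlockd) are accepting connections.
wait_for_sock() {
    sock="$1"; tries=0
    while [ ! -e "$sock" ]; do
        tries=$((tries + 1))
        [ "$tries" -ge 20 ] && return 1   # give up after ~2 seconds
        sleep 0.1
    done
    return 0
}
```

Usage in the test above would look like: `sudo ./run src/virtlogd & wait_for_sock /var/run/libvirt/virtlogd-sock && sudo ./run daemon/libvirtd &`.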

On Fri, Jan 22, 2016 at 03:56:08PM +0000, Daniel P. Berrange wrote:
We have had virtlockd available for a long time now but have always defaulted to the 'nop' lock driver which does no locking. This gives users an unsafe deployment by default unless they know to turn on lockd.
Does the default setup of virtlockd offer any safety?

After looking at:

    https://bugzilla.redhat.com/show_bug.cgi?id=1191802

it seems that with dynamic_ownership enabled we first apply the
security labels, then check the disk locks by calling
virDomainLockProcessResume right before starting the CPUs. After that
fails, resetting the uid:gid back to root:root cuts off access to the
disk for the already running domain.

Is there a reason why the locks are checked so late?

I assume this only ever worked for the case of migration with shared
storage (and a shared lockspace).

Jan
virtlockd will auto-activate via systemd when guests launch, so setting it on by default will only cause upgrade pain for people on non-systemd distros.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu.conf     | 3 ++-
 src/qemu/qemu_driver.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
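The ordering problem Jan describes can be sketched abstractly. All function names below are illustrative stand-ins, not libvirt's actual call chain; the point is only that the unsafe order rolls back labels a running guest still depends on, while the safe order fails on the lock before touching any labels:

```shell
# Problematic order (as in the bug report): labels are changed first,
# and the rollback after a failed lock clobbers the running guest's access.
start_guest_unsafe() {
    apply_security_labels "$1"      # e.g. chown qemu:qemu on the image
    acquire_disk_lock "$1" || {
        restore_labels "$1"         # chown root:root -- breaks guest 1!
        return 1
    }
    resume_cpus "$1"
}

# Safer order: check the lock before any label changes are made.
start_guest_safe() {
    acquire_disk_lock "$1" || return 1
    apply_security_labels "$1"
    resume_cpus "$1"
}
```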

On Tue, Jan 26, 2016 at 02:28:56PM +0100, Ján Tomko wrote:
On Fri, Jan 22, 2016 at 03:56:08PM +0000, Daniel P. Berrange wrote:
We have had virtlockd available for a long time now but have always defaulted to the 'nop' lock driver which does no locking. This gives users an unsafe deployment by default unless they know to turn on lockd.
Does the default setup of virtlockd offer any safety?
After looking at: https://bugzilla.redhat.com/show_bug.cgi?id=1191802
It seems that with dynamic_ownership enabled we first apply the
security labels, then check the disk locks by calling
virDomainLockProcessResume right before starting the CPUs. After that
fails, resetting the uid:gid back to root:root cuts off access to the
disk for the already running domain.
Is there a reason why the locks are checked so late?
I assume this only ever worked for the case of migration with shared storage (and shared lockspace).
NB, the virtlockd integration is intended to protect the *content* of
the disk image from being written to by two processes at once.

Protecting the image metadata is not a goal, since that is something
that should be dealt with by the security drivers. ie the security
driver should be acquiring suitable locks of its own so that it does
not blow away in-use labels for other VMs. We've had patches floating
around for the security drivers for a while, but never got as far as
merging them yet.

Regards,
Daniel

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
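The content-level exclusion Daniel describes can be seen in miniature with flock(1), which takes a whole-file lock much as the lockd driver takes one per disk image. This is only an analogy sketch, not virtlockd's actual mechanism:

```shell
img=$(mktemp)    # stand-in for a shared disk image
# Hold an exclusive lock on the image while a second, non-blocking
# attempt is made from a child shell -- the second writer is refused,
# just as the second guest in this thread failed with "resource busy".
flock "$img" -c "
    flock -n '$img' -c 'echo second writer ran' || echo 'resource busy'
"
rm -f "$img"
```

Running this prints `resource busy`: the inner `flock -n` cannot take the lock while the outer invocation holds it.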

On Tue, Jan 26, 2016 at 01:47:02PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 26, 2016 at 02:28:56PM +0100, Ján Tomko wrote:
On Fri, Jan 22, 2016 at 03:56:08PM +0000, Daniel P. Berrange wrote:
We have had virtlockd available for a long time now but have always defaulted to the 'nop' lock driver which does no locking. This gives users an unsafe deployment by default unless they know to turn on lockd.
Does the default setup of virtlockd offer any safety?
After looking at: https://bugzilla.redhat.com/show_bug.cgi?id=1191802
It seems that with dynamic_ownership enabled we first apply the
security labels, then check the disk locks by calling
virDomainLockProcessResume right before starting the CPUs. After that
fails, resetting the uid:gid back to root:root cuts off access to the
disk for the already running domain.
Is there a reason why the locks are checked so late?
I assume this only ever worked for the case of migration with shared storage (and shared lockspace).
NB, the virtlockd integration is intended to protect the *content* of the disk image from being written to by two processes at once.
Protecting the image metadata is not a goal, since that is something
that should be dealt with by the security drivers. ie the security
driver should be acquiring suitable locks of its own so that it does
not blow away in-use labels for other VMs. We've had patches floating
around for the security drivers for a while, but never got as far as
merging them yet.
Having said all that, I wonder if we should actually *not* turn on
virtlockd until we have addressed the problem in the security driver.
Without the latter fix, it seems we'll get a never-ending stream of
bug reports about this issue.

Regards,
Daniel

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

On Fri, Jan 22, 2016 at 03:56:08PM +0000, Daniel P. Berrange wrote:
We have had virtlockd available for a long time now but have always defaulted to the 'nop' lock driver which does no locking. This gives users an unsafe deployment by default unless they know to turn on lockd. virtlockd will auto-activate via systemd when guests launch, so setting it on by default will only cause upgrade pain for people on non-systemd distros.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu.conf     | 3 ++-
 src/qemu/qemu_driver.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
ACK

Jan
participants (4)
- Cole Robinson
- Daniel P. Berrange
- Ján Tomko
- Kashyap Chamarthy