[libvirt] [PATCH] migration: do not restore labels on failed migration

https://bugzilla.redhat.com/show_bug.cgi?id=822052 When doing a live migration, if the destination fails for any reason after the point in which files should be labeled, then the cleanup of the destination would restore the labels to their defaults, even though the source is still trying to continue running with the image open. Bug 822052 mentioned one source of live migration failure - a mismatch in SELinux virt_use_nfs settings (on for source, off for destination); but I found other situations that would also trigger it (for example, having a graphics device tied to port 5999 on the source, and a different domain on the destination already using that port, so that the destination cannot reuse the port). In short, just as cleanup of the source on a successful migration must not relabel files (because the destination would be crippled by the relabel), cleanup of the destination on a failed migraion must not relabel files (because the source would be crippled). * src/qemu/qemu_process.c (qemuProcessStart): Set flag to avoid label restoration when cleaning up on failed migration. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_process.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 31de759..d727fc4 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3466,6 +3466,10 @@ int qemuProcessStart(virConnectPtr conn, * restore any security label as we would overwrite labels * we did not set. */ stop_flags = VIR_QEMU_PROCESS_STOP_NO_RELABEL; + /* If we fail while doing incoming migration, then we must not + * relabel, as the source is still using the files. */ + if (migrateFrom) + stop_flags |= VIR_QEMU_PROCESS_STOP_MIGRATED; hookData.conn = conn; hookData.vm = vm; -- 1.8.3.1

On 08/20/2013 04:46 PM, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=822052
When doing a live migration, if the destination fails for any reason after the point in which files should be labeled, then the cleanup of the destination would restore the labels to their defaults, even though the source is still trying to continue running with the image open. Bug 822052 mentioned one source of live migration failure - a mismatch in SELinux virt_use_nfs settings (on for source, off for destination); but I found other situations that would also trigger it (for example, having a graphics device tied to port 5999 on the source, and a different domain on the destination already using that port, so that the destination cannot reuse the port).
In short, just as cleanup of the source on a successful migration must not relabel files (because the destination would be crippled by the relabel), cleanup of the destination on a failed migraion must not relabel files (because the source would be crippled).
Note that while debugging this, I ran across a bigger design problem. This patch works as long as source and destination are configured equally, or where the difference (such as having selinux virt_use_nfs settings distinct) can be detected immediately upon the attempt to start qemu on the destination. A much more insidious case occurs if you have a shared disk image on NFS, but where you have a configuration mismatch between source and destination. In my case, the mismatch was caused by running Fedora's pre-packaged libvirtd (which is built using qemu:qemu as the default user/group for qemu processes) on the source, but a self-built libvirtd (built with ./autogen.sh --system) on the destination, and with no edits to /etc/libvirt/qemu.conf on either side. Note that the default user is a configure option, and that if /etc/libvirt/qemu.conf does NOT call out a user/group setting, then libvirtd uses the configure defaults; and since './autogen.sh --system' currently does not change the configure defaults, it ends up using root:root to run qemu guests. This is probably a shortcoming of './autogen.sh --system' for not matching what the rpm does, but fixing that is a separate patch. The problem at hand is that when migrating between different configurations, the destination first attempts to label files before even starting qemu - and does a chown() to root:root. The failure case described above for matched configurations occurred when the destination did a chown() to root:root as part of the cleanup after failure to migrate, but with my mismatched configuration, the chown() happened much earlier. One of the (numerous) annoyances of NFS is that a chown() can kill access of existing open fds (on a POSIX file system, once you have an fd via open() or other means, you cannot lose rights to the inode that the fd is visiting, no matter whether chown(), unlink(), or any other number of things occur on the file that you originally opened in the meantime; not so for NFS). But since the forced loss of access isn't instantaneous, you now have a race - if live migration is fast enough, the source can still get the guest migrated to the destination; but if there is lots of I/O and memory churn in the guest, so that the live migration takes long enough for the forced loss of access to kick in, then the source will start seeing I/O errors, possibly causing severe corruption to the guest. We should really consider whether we need more framework for diagnosing incompatible configurations between source and destination, and outright rejecting migration where two machines do not share the same user/group setup. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Tue, Aug 20, 2013 at 05:10:38PM -0600, Eric Blake wrote:
The problem at hand is that when migrating between different configurations, the destination first attempts to label files before even starting qemu - and does a chown() to root:root.
IMHO making libvirt work in the face of such admin mis-configurations is out of scope.
We should really consider whether we need more framework for diagnosing incompatible configurations between source and destination, and outright rejecting migration where two machines do not share the same user/group setup.
I tend to view this as a general OS management problem. Tools like puppet/chef/etc should be used to manage your virt hosts to avoid inconsistent configs between hosts. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Aug 20, 2013 at 04:46:47PM -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=822052
When doing a live migration, if the destination fails for any reason after the point in which files should be labeled, then the cleanup of the destination would restore the labels to their defaults, even though the source is still trying to continue running with the image open. Bug 822052 mentioned one source of live migration failure - a mismatch in SELinux virt_use_nfs settings (on for source, off for destination); but I found other situations that would also trigger it (for example, having a graphics device tied to port 5999 on the source, and a different domain on the destination already using that port, so that the destination cannot reuse the port).
In short, just as cleanup of the source on a successful migration must not relabel files (because the destination would be crippled by the relabel), cleanup of the destination on a failed migraion must not relabel files (because the source would be crippled).
* src/qemu/qemu_process.c (qemuProcessStart): Set flag to avoid label restoration when cleaning up on failed migration.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_process.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 31de759..d727fc4 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3466,6 +3466,10 @@ int qemuProcessStart(virConnectPtr conn, * restore any security label as we would overwrite labels * we did not set. */ stop_flags = VIR_QEMU_PROCESS_STOP_NO_RELABEL; + /* If we fail while doing incoming migration, then we must not + * relabel, as the source is still using the files. */ + if (migrateFrom) + stop_flags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
hookData.conn = conn; hookData.vm = vm;
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/21/2013 06:54 AM, Daniel P. Berrange wrote:
On Tue, Aug 20, 2013 at 04:46:47PM -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=822052
When doing a live migration, if the destination fails for any reason after the point in which files should be labeled, then the cleanup of the destination would restore the labels to their defaults, even though the source is still trying to continue running with the image open. Bug 822052 mentioned one source of live migration failure - a mismatch in SELinux virt_use_nfs settings (on for source, off for destination); but I found other situations that would also trigger it (for example, having a graphics device tied to port 5999 on the source, and a different domain on the destination already using that port, so that the destination cannot reuse the port).
In short, just as cleanup of the source on a successful migration must not relabel files (because the destination would be crippled by the relabel), cleanup of the destination on a failed migraion must not relabel files (because the source would be crippled).
* src/qemu/qemu_process.c (qemuProcessStart): Set flag to avoid label restoration when cleaning up on failed migration.
Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_process.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 31de759..d727fc4 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3466,6 +3466,10 @@ int qemuProcessStart(virConnectPtr conn, * restore any security label as we would overwrite labels * we did not set. */ stop_flags = VIR_QEMU_PROCESS_STOP_NO_RELABEL; + /* If we fail while doing incoming migration, then we must not + * relabel, as the source is still using the files. */ + if (migrateFrom) + stop_flags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
hookData.conn = conn; hookData.vm = vm;
ACK
Thanks; pushed. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (2)
-
Daniel P. Berrange
-
Eric Blake