[libvirt] UID/GID during kvm/qemu migrate

Stephan von Krawczynski

29 Jul 2019

4:04 a.m.

Hello,

is there some built-in code in libvirt that forces the UID/GID of the libvirt standard user to be the same on two boxes migrating qemu VMs between each other? The migration itself obviously uses root (a password is requested). But if a VM's XML does not contain any definition regarding UID/GID, what else could prevent this from working?

I believe I ran into such a problem: a migration attempt ended in an error, with the VM still running on the original host but its fs (netfs pool (nfs/raw)) switched to read-only...

--
Regards,
Stephan

Daniel P. Berrangé

30 Jul 2019

2:35 a.m.

When migrating a VM whose image is hosted on NFS, you have 2 QEMU processes which both need to be able to open the same image file at the same time. QEMU runs as an unprivileged user normally, and so the disk images get chowned to this unprivileged user by libvirt when QEMU is started. If the QEMU on the target host is given a UID/GID that's different from the QEMU on the source host, then the target QEMU will likely have problems opening the image.

Basically when using shared FS storage, the rule is to have all your hosts configured in the same way from libvirt's POV.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
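The "same configuration on all hosts" rule can be checked before attempting a migration. The following is a minimal sketch, not part of libvirt: the helper names, the host names, and the numeric IDs are all hypothetical, and the account name `qemu` is an assumption (it varies by distribution and can be changed in libvirt's qemu.conf).

```python
import grp
import pwd


def local_ids(user: str, group: str) -> tuple[int, int]:
    """Look up the numeric UID and GID for the given names on this host."""
    return pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid


def find_mismatches(ids_by_host: dict[str, tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Return the hosts whose (uid, gid) differs from the first host listed."""
    if not ids_by_host:
        return {}
    reference = next(iter(ids_by_host.values()))
    return {host: ids for host, ids in ids_by_host.items() if ids != reference}


# Hypothetical values, as if local_ids("qemu", "qemu") had been run on each host.
collected = {"hostA": (107, 107), "hostB": (107, 107), "hostC": (64055, 109)}
print(find_mismatches(collected))  # {'hostC': (64055, 109)}
```

Any host reported here would run QEMU under different numeric IDs than the others, which is the situation described above as likely to cause problems opening a shared image.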

Stephan von Krawczynski

4:49 a.m.

Hello Daniel,

thank you for the short explanation. The key words are "both need to be able to open the same image file at the same time". I would not have expected that. I thought qemu 1 would close and exit, and then qemu 2 would open the image, which means it could change the UID/GID right before, just as in normal operation.

Is this the reason why my failed attempt leaves me with a read-only fs on the guest? I would see that as a bug, no? Turning it read-only is possibly the only way to avoid corrupting the fs image if two qemus have it open simultaneously.

--
Regards,
Stephan

Daniel P. Berrangé

5 a.m.

On Tue, Jul 30, 2019 at 02:49:02PM +0200, Stephan von Krawczynski wrote:

> Hello Daniel,
>
> thank you for the short explanation. The key words are "both need to be able to open the same image file at the same time". I would not have expected that. I thought qemu 1 will close and exit, and then qemu 2 will open the image, which means he can change the uid/gid right before

This is supposed to be a safety thing. If anything goes badly wrong on the target host, you need to be able to roll back and continue running the VM on the source host. So the source VM doesn't want to close the disk images until the target VM has confirmed it is running successfully. This implies there is a period of time when both have to have the disk image open. Crucially though, even when 2 QEMUs have the disks open, only *1* QEMU has the guest CPUs running and thus permits disk writes.
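The handover described above can be pictured with a toy state model. This is only a sketch of the reasoning, not libvirt's or QEMU's actual implementation; the `Qemu` class, the `migrate` function, and the `dst_ok` flag are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Qemu:
    host: str
    image_open: bool = False
    cpus_running: bool = False


def writers(*procs: Qemu) -> int:
    """Count processes that could currently issue guest disk writes."""
    return sum(p.image_open and p.cpus_running for p in procs)


def migrate(src: Qemu, dst: Qemu, dst_ok: bool) -> Qemu:
    """Toy handover: returns the process that ends up running the guest."""
    dst.image_open = True            # overlap: both now hold the image open
    assert writers(src, dst) == 1    # still only the source runs the guest
    src.cpus_running = False         # source pauses for the switchover
    if dst_ok:
        dst.cpus_running = True      # target confirmed: it takes over
        src.image_open = False       # only now does the source let go
        winner = dst
    else:
        dst.image_open = False       # target failed: discard it
        src.cpus_running = True      # roll back and resume on the source
        winner = src
    assert writers(src, dst) == 1    # never more than one writer at the end
    return winner


src = Qemu("source", image_open=True, cpus_running=True)
print(migrate(src, Qemu("target"), dst_ok=False).host)  # source
```

The rollback branch only works if the source still has the image open, which is why both processes need access during the overlap.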

> - just as in normal operation. Is this the reason why my failing try leaves me with a read-only fs on the guest? Which I would see as a bug, not? Turning it read-only is possibly the only way to not corrupt the fs image if two qemus have it open simultaneously.

The guest sets its FS read-only when it gets an I/O error reported by the virtual disk driver. QEMU reports an I/O error to the guest when it, in turn, gets an I/O error on the host storage. This happens because QEMU is losing its privileges to access the disks.

Regards,
Daniel
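How a running QEMU can lose access to its own disk can be illustrated with the classic owner/group/other permission check. The sketch below assumes, as one plausible reading of this thread, that the image is re-chowned to the target host's QEMU account when the target QEMU is started; the function name, the file mode, and all numeric IDs are hypothetical, and ACLs and root privileges are ignored.

```python
import stat


def can_read_write(file_uid: int, file_gid: int, mode: int,
                   proc_uid: int, proc_gid: int) -> bool:
    """Owner/group/other check for read+write access (no ACLs, no root)."""
    if proc_uid == file_uid:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif proc_gid == file_gid:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (mode & need) == need


MODE = 0o600                 # hypothetical image mode: owner read/write only
SOURCE_QEMU = (107, 107)     # hypothetical UID/GID of QEMU on the source host
TARGET_QEMU = (64055, 109)   # hypothetical UID/GID of QEMU on the target host

# Image owned by the source's QEMU account: the source can read and write.
print(can_read_write(*SOURCE_QEMU, MODE, *SOURCE_QEMU))  # True

# Image re-owned to the target's account: the still-running source is locked out.
print(can_read_write(*TARGET_QEMU, MODE, *SOURCE_QEMU))  # False
```

In the second case the source QEMU's next access to the image would fail, which would surface in the guest as the I/O error described above.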

Stephan von Krawczynski

5:21 a.m.

OK, the source VM does not want to close the image until it is confirmed that the destination VM is up and running. But if the destination is not up and running and the source VM still cannot go on, because its fs is read-only, then what is the use of keeping the image open in the first place?

The simple fact is: a reproducibly failing migration leaves you with a non-working guest. I cannot think of any argument supporting the idea that this is not a bug. I mean, the whole point of migration is to keep the guest up... Otherwise you could just as well copy the config with the guest offline.

And: the source guest VM has no I/O error. The destination guest VM cannot touch the fs image for permission reasons, so the fs is still safe in the state the frozen source guest left it.

--
Regards,
Stephan
