[libvirt] QEMU migration with non-shared storage

Hello,

I am trying to understand libvirt's logic for checking whether migration of a VM is safe, and how it determines which disks should be mirrored by QEMU. My particular use case involves VMs whose disks may be backed by LVM, by Ceph RBD, or both.

As far as I can tell, the qemuMigrationIsSafe check is there to ensure that all disks are readonly, or have cache=none, or use backends that can guarantee cache coherence. However, QEMU appears to flush *all* block devices when it pauses a VM's CPUs (just before the final part of migration, for instance), so I'm wondering why this check is needed. Is there any situation in which the source VM is paused and its block devices are flushed, yet the destination VM still can't see all completed writes?

Why is RBD handled specially in this function? The current logic is that an RBD-backed disk is safe to migrate even if it has caching enabled, but I'm not sure how RBD differs from other backends in this regard.

If VIR_MIGRATE_NON_SHARED_DISK or _INC is specified, should these safety checks be relaxed? It seems to me that if a non-shared disk is going to be *explicitly* copied from the source to the destination VM, then cache coherence in the backend is irrelevant.

At the moment, the set of non-shared block devices copied by VIR_MIGRATE_NON_SHARED_* differs depending on whether NBD is being used in the migration:

- If NBD can't be used (e.g. with a tunnelled migration), then QEMU will copy *all* non-readonly block devices;
- If NBD is being used, then QEMU will only mirror disks that are not "shareable", "readonly" or "sourceless".

A problem arises with RBD disks that have caching enabled. According to qemuMigrationIsSafe, these disks are "safe" to migrate. However, in both the NBD and the non-NBD case, the RBD disk will be copied. This is clearly not desirable. If RBD is a special case in qemuMigrationIsSafe, does it also need to be a special case when configuring the NBD server? Or, if an NBD server is not going to be used, should the migration be considered "unsafe" if an RBD disk is present?

I'd very much appreciate some help in understanding all of this. At the moment, I think my only option is to run RBD without caching at all. However, not only does that result in very poor performance, it also doesn't seem to match the qemuMigrationIsSafe check.

Regards,
Michael

Hi Michael!

On 11 September 2014 14:13, Michael Chapman <mike@very.puzzling.org> wrote:
Why is RBD handled specially in this function? The current logic is that an RBD-backed disk is safe to migrate even if it has caching enabled, but I'm not sure how RBD differs from other backends in this regard.
I recall this was discussed before, but I'm having trouble finding the thread. I think the gist of it was that the rbd integration was just lucky: it wasn't using the appropriate interfaces defined by libvirt for flushing. And I think Debian patches away the qemuMigrationIsSafe pass for RBD, so it'll only pass for cache=none.
A problem arises with RBD disks that have caching enabled. According to qemuMigrationIsSafe, these disks are "safe" to migrate. However, in both the NBD and the non-NBD case, the RBD disk will be copied. This is clearly not desirable. If RBD is a special case in qemuMigrationIsSafe, does it also need to be a special case when configuring the NBD server? Or, if an NBD server is not going to be used, should the migration be considered "unsafe" if an RBD disk is present?
A related problem arises when mixing RBD and local disks while drive-mirroring / block-migrating: the RBDs get migrated too, so each volume round-trips out of Ceph -> into the source hypervisor -> into the dest hypervisor -> back into Ceph. Not great!

This patch fixes that issue (though thus far I've only tested against 1.1.1 on Precise - will do with 1.2 on Trusty next week):

=====
@@ qemuMigrationDriveMirror @@
     virDomainBlockJobInfo info;

     /* skip shared, RO and source-less disks */
-    if (disk->shared || disk->readonly || !disk->src)
+    if (disk->shared || disk->readonly || !disk->src ||
+        (disk->type == VIR_DOMAIN_DISK_TYPE_NETWORK &&
+         disk->protocol == VIR_DOMAIN_DISK_PROTOCOL_RBD))
         continue;

     VIR_FREE(diskAlias);
=====

So interested in feedback on this and whether it should be pushed up...

--
Cheers,
~Blairo

On Thu, Sep 11, 2014 at 02:45:41PM +1000, Blair Bethwaite wrote:
Hi Michael!
On 11 September 2014 14:13, Michael Chapman <mike@very.puzzling.org> wrote:
Why is RBD handled specially in this function? The current logic is that an RBD-backed disk is safe to migrate even if it has caching enabled, but I'm not sure how RBD differs from other backends in this regard.
I recall this was discussed before, but I'm having trouble finding the thread. I think the gist of it was that the rbd integration was just lucky: it wasn't using the appropriate interfaces defined by libvirt for flushing.
And I think Debian patches away the qemuMigrationIsSafe pass for RBD, so it'll only pass for cache=none.
Debian doesn't ship such a patch.

Cheers,
 -- Guido

On Thu, 11 Sep 2014, Blair Bethwaite wrote:
A related problem arises when mixing RBD and local disks while drive-mirroring / block-migrating: the RBDs get migrated too, so each volume round-trips out of Ceph -> into the source hypervisor -> into the dest hypervisor -> back into Ceph. Not great!
This patch fixes that issue (though thus far I've only tested against 1.1.1 on Precise - will do with 1.2 on Trusty next week):

=====
@@ qemuMigrationDriveMirror @@
     virDomainBlockJobInfo info;

     /* skip shared, RO and source-less disks */
-    if (disk->shared || disk->readonly || !disk->src)
+    if (disk->shared || disk->readonly || !disk->src ||
+        (disk->type == VIR_DOMAIN_DISK_TYPE_NETWORK &&
+         disk->protocol == VIR_DOMAIN_DISK_PROTOCOL_RBD))
         continue;

     VIR_FREE(diskAlias);
=====
So interested in feedback on this and whether it should be pushed up...
I think you would need a corresponding change in qemuMigrationStartNBDServer as well.

This change would largely solve the problems I'm encountering, but it's still annoying having to use VIR_MIGRATE_UNSAFE just so that libvirt doesn't complain about the non-shared disks, even though they're going to be explicitly copied from the source to the destination VM.

I'm going to see if I can find out why RBD is being treated differently from other disk backends. As I said before, I can't see anything in QEMU indicating that it behaves differently from the others; indeed, it appears that *all* backends should be safe, since (as far as I can tell) they're explicitly flushed when the VM is paused. Maybe at the time RBD support was added to QEMU and libvirt, QEMU only did this explicit flushing for RBD.

It seems to me that libvirt is also probably going to need more detailed logic for deciding when a disk device should be copied (with the understanding that when NBD is *not* being used, that decision is ultimately up to QEMU alone). If we have exceptional cases like RBD, the decision can't be made purely according to the shared flag.

- Michael
participants (4)
- Blair Bethwaite
- Guido Günther
- Michael Chapman
- Michael Chapman