[libvirt] [PATCH] qemu: check for vm after starting a job

https://bugzilla.redhat.com/show_bug.cgi?id=638285 - when migrating a
guest, it was very easy to provoke a race where an application could
query block information on a VM that had just been migrated away. Any
time qemu code obtains a job lock, it must also check that the VM was
not taken down while it was waiting for the lock.

* src/qemu/qemu_driver.c (qemudDomainSetMemory)
(qemudDomainGetInfo, qemuDomainGetBlockInfo): Check that vm still
exists after obtaining job lock, before starting monitor action.
---
 src/qemu/qemu_driver.c |   25 ++++++++++++++++++++++---
 1 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index ae1d833..af5126d 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -463,7 +463,8 @@ static int ATTRIBUTE_RETURN_CHECK qemuDomainObjEndJob(virDomainObjPtr obj)
  * obj must be locked before calling, qemud_driver must be unlocked
  *
  * To be called immediately before any QEMU monitor API call
- * Must have alrady called qemuDomainObjBeginJob().
+ * Must have already called qemuDomainObjBeginJob(), and checked
+ * that the VM is still active.
  *
  * To be followed with qemuDomainObjExitMonitor() once complete
  */
@@ -504,7 +505,7 @@ static void qemuDomainObjExitMonitor(virDomainObjPtr obj)
  * obj must be locked before calling, qemud_driver must be locked
  *
  * To be called immediately before any QEMU monitor API call
- * Must have alrady called qemuDomainObjBeginJob().
+ * Must have already called qemuDomainObjBeginJob().
  *
  * To be followed with qemuDomainObjExitMonitorWithDriver() once complete
  */
@@ -522,7 +523,7 @@ static void qemuDomainObjEnterMonitorWithDriver(struct qemud_driver *driver, vir
 /* obj must NOT be locked before calling, qemud_driver must be unlocked,
  * and will be locked after returning
  *
- * Should be paired with an earlier qemuDomainObjEnterMonitor() call
+ * Should be paired with an earlier qemuDomainObjEnterMonitorWithDriver() call
  */
 static void qemuDomainObjExitMonitorWithDriver(struct qemud_driver *driver, virDomainObjPtr obj)
 {
@@ -4948,6 +4949,12 @@ static int qemudDomainSetMemory(virDomainPtr dom, unsigned long newmem) {
     if (qemuDomainObjBeginJob(vm) < 0)
         goto cleanup;

+    if (!virDomainObjIsActive(vm)) {
+        qemuReportError(VIR_ERR_OPERATION_INVALID,
+                        "%s", _("domain is not running"));
+        goto endjob;
+    }
+
     priv = vm->privateData;
     qemuDomainObjEnterMonitor(vm);
     r = qemuMonitorSetBalloon(priv->mon, newmem);
@@ -5014,11 +5021,17 @@ static int qemudDomainGetInfo(virDomainPtr dom,
     } else if (!priv->jobActive) {
         if (qemuDomainObjBeginJob(vm) < 0)
             goto cleanup;
+        if (!virDomainObjIsActive(vm)) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            "%s", _("domain is not running"));
+            goto endjob;
+        }

         qemuDomainObjEnterMonitor(vm);
         err = qemuMonitorGetBalloonInfo(priv->mon, &balloon);
         qemuDomainObjExitMonitor(vm);
         if (err < 0) {
+        endjob:
             if (qemuDomainObjEndJob(vm) == 0)
                 vm = NULL;
             goto cleanup;
@@ -10343,6 +10356,11 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom,
         qemuDomainObjPrivatePtr priv = vm->privateData;
         if (qemuDomainObjBeginJob(vm) < 0)
             goto cleanup;
+        if (!virDomainObjIsActive(vm)) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            "%s", _("domain is not running"));
+            goto endjob;
+        }

         qemuDomainObjEnterMonitor(vm);
         ret = qemuMonitorGetBlockExtent(priv->mon,
@@ -10350,6 +10368,7 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom,
                                         &info->allocation);
         qemuDomainObjExitMonitor(vm);

+    endjob:
         if (qemuDomainObjEndJob(vm) == 0)
             vm = NULL;
     } else {
-- 
1.7.2.3
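For readers skimming the diff, the locking rule in the commit message reduces
to the shape below. This is a condensed sketch rather than a verbatim excerpt:
the helper calls are the real ones appearing in the diff, but the wrapper
function, its name, and the cleanup path are simplified assumptions about how
these qemu driver methods are laid out.

/* Sketch only: someQueryOperation is a hypothetical stand-in for
 * methods like qemuDomainGetBlockInfo. */
static int someQueryOperation(virDomainObjPtr vm)
{
    int ret = -1;

    /* BeginJob can block while another job (e.g. an outgoing
     * migration) completes; the domain may be torn down during
     * that wait. */
    if (qemuDomainObjBeginJob(vm) < 0)
        goto cleanup;

    /* So the liveness check must come after BeginJob; checking
     * before taking the job lock proves nothing. */
    if (!virDomainObjIsActive(vm)) {
        qemuReportError(VIR_ERR_OPERATION_INVALID,
                        "%s", _("domain is not running"));
        goto endjob;
    }

    qemuDomainObjEnterMonitor(vm);
    /* ... qemuMonitor*() call here ... */
    qemuDomainObjExitMonitor(vm);
    ret = 0;

endjob:
    if (qemuDomainObjEndJob(vm) == 0)
        vm = NULL;   /* job teardown disposed of the object */

cleanup:
    if (vm)
        virDomainObjUnlock(vm);
    return ret;
}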

On Tue, Oct 26, 2010 at 05:32:09PM -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=638285 - when migrating a guest, it was very easy to provoke a race where an application could query block information on a VM that had just been migrated away. Any time qemu code obtains a job lock, it must also check that the VM was not taken down while it was waiting for the lock.
* src/qemu/qemu_driver.c (qemudDomainSetMemory)
(qemudDomainGetInfo, qemuDomainGetBlockInfo): Check that vm still
exists after obtaining job lock, before starting monitor action.
[...]
         if (err < 0) {
+        endjob:
             if (qemuDomainObjEndJob(vm) == 0)
[...]

+        endjob:
             if (qemuDomainObjEndJob(vm) == 0)
Looks fine to me, but it seems there are some indentation issues; just
check those before commit. ACK

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

On Tue, Oct 26, 2010 at 05:32:09PM -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=638285 - when migrating a guest, it was very easy to provoke a race where an application could query block information on a VM that had just been migrated away. Any time qemu code obtains a job lock, it must also check that the VM was not taken down while it was waiting for the lock.
* src/qemu/qemu_driver.c (qemudDomainSetMemory)
(qemudDomainGetInfo, qemuDomainGetBlockInfo): Check that vm still
exists after obtaining job lock, before starting monitor action.
---
 src/qemu/qemu_driver.c |   25 ++++++++++++++++++++++---
 1 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index ae1d833..af5126d 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -463,7 +463,8 @@ static int ATTRIBUTE_RETURN_CHECK qemuDomainObjEndJob(virDomainObjPtr obj)
  * obj must be locked before calling, qemud_driver must be unlocked
  *
  * To be called immediately before any QEMU monitor API call
- * Must have alrady called qemuDomainObjBeginJob().
+ * Must have already called qemuDomainObjBeginJob(), and checked
+ * that the VM is still active.
  *
  * To be followed with qemuDomainObjExitMonitor() once complete
  */
@@ -504,7 +505,7 @@ static void qemuDomainObjExitMonitor(virDomainObjPtr obj)
  * obj must be locked before calling, qemud_driver must be locked
  *
  * To be called immediately before any QEMU monitor API call
- * Must have alrady called qemuDomainObjBeginJob().
+ * Must have already called qemuDomainObjBeginJob().
  *
  * To be followed with qemuDomainObjExitMonitorWithDriver() once complete
  */
@@ -522,7 +523,7 @@ static void qemuDomainObjEnterMonitorWithDriver(struct qemud_driver *driver, vir
 /* obj must NOT be locked before calling, qemud_driver must be unlocked,
  * and will be locked after returning
  *
- * Should be paired with an earlier qemuDomainObjEnterMonitor() call
+ * Should be paired with an earlier qemuDomainObjEnterMonitorWithDriver() call
  */
 static void qemuDomainObjExitMonitorWithDriver(struct qemud_driver *driver, virDomainObjPtr obj)
 {
@@ -4948,6 +4949,12 @@ static int qemudDomainSetMemory(virDomainPtr dom, unsigned long newmem) {
     if (qemuDomainObjBeginJob(vm) < 0)
         goto cleanup;
+    if (!virDomainObjIsActive(vm)) {
+        qemuReportError(VIR_ERR_OPERATION_INVALID,
+                        "%s", _("domain is not running"));
+        goto endjob;
+    }
+
There is already an IsActive check in this method; it is simply in the
wrong place with respect to the BeginJob call, so it is better to just
move the existing check rather than duplicate it, I reckon.
@@ -5014,11 +5021,17 @@ static int qemudDomainGetInfo(virDomainPtr dom,
     } else if (!priv->jobActive) {
         if (qemuDomainObjBeginJob(vm) < 0)
             goto cleanup;
+        if (!virDomainObjIsActive(vm)) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            "%s", _("domain is not running"));
+            goto endjob;
+        }
         qemuDomainObjEnterMonitor(vm);
         err = qemuMonitorGetBalloonInfo(priv->mon, &balloon);
         qemuDomainObjExitMonitor(vm);
         if (err < 0) {
+        endjob:
             if (qemuDomainObjEndJob(vm) == 0)
                 vm = NULL;
             goto cleanup;
@@ -10343,6 +10356,11 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom,
         qemuDomainObjPrivatePtr priv = vm->privateData;
         if (qemuDomainObjBeginJob(vm) < 0)
             goto cleanup;
+        if (!virDomainObjIsActive(vm)) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            "%s", _("domain is not running"));
+            goto endjob;
+        }
         qemuDomainObjEnterMonitor(vm);
         ret = qemuMonitorGetBlockExtent(priv->mon,
@@ -10350,6 +10368,7 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom,
                                         &info->allocation);
         qemuDomainObjExitMonitor(vm);
+    endjob:
         if (qemuDomainObjEndJob(vm) == 0)
             vm = NULL;
     } else {
I'm not much of a fan of adding goto jump targets in the middle of long
methods. Could we just invert the IsActive checks in these two methods
& thus avoid the 'endjob' label in the middle of the method?

Regards,
Daniel

-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
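Concretely, the inversion being requested here (and adopted in v2 below)
wraps the monitor call in an if/else instead of jumping to a label, so a
single EndJob call at the end of the block serves both paths. A condensed
sketch, using the qemuDomainGetBlockInfo hunk as the example:

if (qemuDomainObjBeginJob(vm) < 0)
    goto cleanup;
if (!virDomainObjIsActive(vm))
    ret = 0;    /* domain vanished while we waited for the job lock */
else {
    qemuDomainObjEnterMonitor(vm);
    ret = qemuMonitorGetBlockExtent(priv->mon,
                                    disk->info.alias,
                                    &info->allocation);
    qemuDomainObjExitMonitor(vm);
}
if (qemuDomainObjEndJob(vm) == 0)
    vm = NULL;

One consequence of this shape is that a domain which disappears mid-wait
is treated as having nothing further to query rather than as an error,
which is the behavior v2 settles on for these read-only paths.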

On 10/27/2010 08:08 AM, Daniel P. Berrange wrote:
On Tue, Oct 26, 2010 at 05:32:09PM -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=638285 - when migrating a guest, it was very easy to provoke a race where an application could query block information on a VM that had just been migrated away. Any time qemu code obtains a job lock, it must also check that the VM was not taken down while it was waiting for the lock.
@@ -4948,6 +4949,12 @@ static int qemudDomainSetMemory(virDomainPtr dom, unsigned long newmem) {
     if (qemuDomainObjBeginJob(vm) < 0)
         goto cleanup;
+    if (!virDomainObjIsActive(vm)) {
+        qemuReportError(VIR_ERR_OPERATION_INVALID,
+                        "%s", _("domain is not running"));
+        goto endjob;
+    }
+
There is already an IsActive check in this method; it is simply in the
wrong place with respect to the BeginJob call, so it is better to just
move the existing check rather than duplicate it, I reckon.
Okay, I'll see about swapping things around.
@@ -10350,6 +10368,7 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom,
                                         &info->allocation);
         qemuDomainObjExitMonitor(vm);
+    endjob:
         if (qemuDomainObjEndJob(vm) == 0)
             vm = NULL;
     } else {
I'm not much of a fan of adding goto jump targets in the middle of long
methods. Could we just invert the IsActive checks in these two methods
& thus avoid the 'endjob' label in the middle of the method?
Yes, I'll submit a v2 that avoids extra goto labels (or at least sinks
the endjob label to the end of the function rather than the middle).

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

https://bugzilla.redhat.com/show_bug.cgi?id=638285 - when migrating a
guest, it was very easy to provoke a race where an application could
query block information on a VM that had just been migrated away. Any
time qemu code obtains a job lock, it must also check that the VM was
not taken down while it was waiting for the lock.

* src/qemu/qemu_driver.c (qemudDomainSetMemory)
(qemudDomainGetInfo, qemuDomainGetBlockInfo): Check that vm still
exists after obtaining job lock, before starting monitor action.
---

v2: incorporate danpb's suggestions, to minimize virDomainObjIsActive
calls and avoid use of goto for mid-function labels.

 src/qemu/qemu_driver.c |   60 ++++++++++++++++++++++++-----------------------
 1 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b119ca1..f1f8cdf 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -466,7 +466,8 @@ static int ATTRIBUTE_RETURN_CHECK qemuDomainObjEndJob(virDomainObjPtr obj)
  * obj must be locked before calling, qemud_driver must be unlocked
  *
  * To be called immediately before any QEMU monitor API call
- * Must have alrady called qemuDomainObjBeginJob().
+ * Must have already called qemuDomainObjBeginJob(), and checked
+ * that the VM is still active.
  *
  * To be followed with qemuDomainObjExitMonitor() once complete
  */
@@ -507,7 +508,7 @@ static void qemuDomainObjExitMonitor(virDomainObjPtr obj)
  * obj must be locked before calling, qemud_driver must be locked
  *
  * To be called immediately before any QEMU monitor API call
- * Must have alrady called qemuDomainObjBeginJob().
+ * Must have already called qemuDomainObjBeginJob().
  *
  * To be followed with qemuDomainObjExitMonitorWithDriver() once complete
  */
@@ -525,7 +526,7 @@ static void qemuDomainObjEnterMonitorWithDriver(struct qemud_driver *driver, vir
 /* obj must NOT be locked before calling, qemud_driver must be unlocked,
  * and will be locked after returning
  *
- * Should be paired with an earlier qemuDomainObjEnterMonitor() call
+ * Should be paired with an earlier qemuDomainObjEnterMonitorWithDriver() call
  */
 static void qemuDomainObjExitMonitorWithDriver(struct qemud_driver *driver, virDomainObjPtr obj)
 {
@@ -5077,12 +5078,6 @@ static int qemudDomainSetMemory(virDomainPtr dom, unsigned long newmem) {
         goto cleanup;
     }

-    if (!virDomainObjIsActive(vm)) {
-        qemuReportError(VIR_ERR_OPERATION_INVALID,
-                        "%s", _("domain is not running"));
-        goto cleanup;
-    }
-
     if (newmem > vm->def->mem.max_balloon) {
         qemuReportError(VIR_ERR_INVALID_ARG,
                         "%s", _("cannot set memory higher than max memory"));
@@ -5092,6 +5087,12 @@ static int qemudDomainSetMemory(virDomainPtr dom, unsigned long newmem) {
     if (qemuDomainObjBeginJob(vm) < 0)
         goto cleanup;

+    if (!virDomainObjIsActive(vm)) {
+        qemuReportError(VIR_ERR_OPERATION_INVALID,
+                        "%s", _("domain is not running"));
+        goto endjob;
+    }
+
     priv = vm->privateData;
     qemuDomainObjEnterMonitor(vm);
     r = qemuMonitorSetBalloon(priv->mon, newmem);
@@ -5158,26 +5159,25 @@ static int qemudDomainGetInfo(virDomainPtr dom,
     } else if (!priv->jobActive) {
         if (qemuDomainObjBeginJob(vm) < 0)
             goto cleanup;
-
-        qemuDomainObjEnterMonitor(vm);
-        err = qemuMonitorGetBalloonInfo(priv->mon, &balloon);
-        qemuDomainObjExitMonitor(vm);
-        if (err < 0) {
-            if (qemuDomainObjEndJob(vm) == 0)
-                vm = NULL;
+        if (!virDomainObjIsActive(vm))
+            err = 0;
+        else {
+            qemuDomainObjEnterMonitor(vm);
+            err = qemuMonitorGetBalloonInfo(priv->mon, &balloon);
+            qemuDomainObjExitMonitor(vm);
+        }
+        if (qemuDomainObjEndJob(vm) == 0) {
+            vm = NULL;
             goto cleanup;
         }

+        if (err < 0)
+            goto cleanup;
         if (err == 0)
             /* Balloon not supported, so maxmem is always the allocation */
             info->memory = vm->def->mem.max_balloon;
         else
             info->memory = balloon;
-
-        if (qemuDomainObjEndJob(vm) == 0) {
-            vm = NULL;
-            goto cleanup;
-        }
     } else {
         info->memory = vm->def->mem.cur_balloon;
     }
@@ -10510,19 +10510,21 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom,
     /* ..but if guest is running & not using raw
        disk format and on a block device, then query
        highest allocated extent from QEMU */
-    if (virDomainObjIsActive(vm) &&
-        disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK &&
+    if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK &&
         format != VIR_STORAGE_FILE_RAW &&
         S_ISBLK(sb.st_mode)) {
         qemuDomainObjPrivatePtr priv = vm->privateData;
         if (qemuDomainObjBeginJob(vm) < 0)
             goto cleanup;
-
-        qemuDomainObjEnterMonitor(vm);
-        ret = qemuMonitorGetBlockExtent(priv->mon,
-                                        disk->info.alias,
-                                        &info->allocation);
-        qemuDomainObjExitMonitor(vm);
+        if (!virDomainObjIsActive(vm))
+            ret = 0;
+        else {
+            qemuDomainObjEnterMonitor(vm);
+            ret = qemuMonitorGetBlockExtent(priv->mon,
+                                            disk->info.alias,
+                                            &info->allocation);
+            qemuDomainObjExitMonitor(vm);
+        }

         if (qemuDomainObjEndJob(vm) == 0)
             vm = NULL;
-- 
1.7.2.3

On Wed, Oct 27, 2010 at 02:28:51PM -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=638285 - when migrating a guest, it was very easy to provoke a race where an application could query block information on a VM that had just been migrated away. Any time qemu code obtains a job lock, it must also check that the VM was not taken down while it was waiting for the lock.
* src/qemu/qemu_driver.c (qemudDomainSetMemory)
(qemudDomainGetInfo, qemuDomainGetBlockInfo): Check that vm still
exists after obtaining job lock, before starting monitor action.
---
v2: incorporate danpb's suggestions, to minimize virDomainObjIsActive calls and avoid use of goto for mid-function labels.
 src/qemu/qemu_driver.c |   60 ++++++++++++++++++++++++-----------------------
 1 files changed, 31 insertions(+), 29 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b119ca1..f1f8cdf 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -466,7 +466,8 @@ static int ATTRIBUTE_RETURN_CHECK qemuDomainObjEndJob(virDomainObjPtr obj)
  * obj must be locked before calling, qemud_driver must be unlocked
  *
  * To be called immediately before any QEMU monitor API call
- * Must have alrady called qemuDomainObjBeginJob().
+ * Must have already called qemuDomainObjBeginJob(), and checked
+ * that the VM is still active.
  *
  * To be followed with qemuDomainObjExitMonitor() once complete
  */
@@ -507,7 +508,7 @@ static void qemuDomainObjExitMonitor(virDomainObjPtr obj)
  * obj must be locked before calling, qemud_driver must be locked
  *
  * To be called immediately before any QEMU monitor API call
- * Must have alrady called qemuDomainObjBeginJob().
+ * Must have already called qemuDomainObjBeginJob().
  *
  * To be followed with qemuDomainObjExitMonitorWithDriver() once complete
  */
@@ -525,7 +526,7 @@ static void qemuDomainObjEnterMonitorWithDriver(struct qemud_driver *driver, vir
 /* obj must NOT be locked before calling, qemud_driver must be unlocked,
  * and will be locked after returning
  *
- * Should be paired with an earlier qemuDomainObjEnterMonitor() call
+ * Should be paired with an earlier qemuDomainObjEnterMonitorWithDriver() call
  */
 static void qemuDomainObjExitMonitorWithDriver(struct qemud_driver *driver, virDomainObjPtr obj)
 {
@@ -5077,12 +5078,6 @@ static int qemudDomainSetMemory(virDomainPtr dom, unsigned long newmem) {
         goto cleanup;
     }
-    if (!virDomainObjIsActive(vm)) {
-        qemuReportError(VIR_ERR_OPERATION_INVALID,
-                        "%s", _("domain is not running"));
-        goto cleanup;
-    }
-
     if (newmem > vm->def->mem.max_balloon) {
         qemuReportError(VIR_ERR_INVALID_ARG,
                         "%s", _("cannot set memory higher than max memory"));
@@ -5092,6 +5087,12 @@ static int qemudDomainSetMemory(virDomainPtr dom, unsigned long newmem) {
     if (qemuDomainObjBeginJob(vm) < 0)
         goto cleanup;
+    if (!virDomainObjIsActive(vm)) {
+        qemuReportError(VIR_ERR_OPERATION_INVALID,
+                        "%s", _("domain is not running"));
+        goto endjob;
+    }
+
     priv = vm->privateData;
     qemuDomainObjEnterMonitor(vm);
     r = qemuMonitorSetBalloon(priv->mon, newmem);
@@ -5158,26 +5159,25 @@ static int qemudDomainGetInfo(virDomainPtr dom,
     } else if (!priv->jobActive) {
         if (qemuDomainObjBeginJob(vm) < 0)
             goto cleanup;
-
-        qemuDomainObjEnterMonitor(vm);
-        err = qemuMonitorGetBalloonInfo(priv->mon, &balloon);
-        qemuDomainObjExitMonitor(vm);
-        if (err < 0) {
-            if (qemuDomainObjEndJob(vm) == 0)
-                vm = NULL;
+        if (!virDomainObjIsActive(vm))
+            err = 0;
+        else {
+            qemuDomainObjEnterMonitor(vm);
+            err = qemuMonitorGetBalloonInfo(priv->mon, &balloon);
+            qemuDomainObjExitMonitor(vm);
+        }
+        if (qemuDomainObjEndJob(vm) == 0) {
+            vm = NULL;
             goto cleanup;
         }
+        if (err < 0)
+            goto cleanup;
         if (err == 0)
             /* Balloon not supported, so maxmem is always the allocation */
             info->memory = vm->def->mem.max_balloon;
         else
             info->memory = balloon;
-
-        if (qemuDomainObjEndJob(vm) == 0) {
-            vm = NULL;
-            goto cleanup;
-        }
     } else {
         info->memory = vm->def->mem.cur_balloon;
     }
@@ -10510,19 +10510,21 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom,
     /* ..but if guest is running & not using raw
        disk format and on a block device, then query
        highest allocated extent from QEMU */
-    if (virDomainObjIsActive(vm) &&
-        disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK &&
+    if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK &&
         format != VIR_STORAGE_FILE_RAW &&
         S_ISBLK(sb.st_mode)) {
         qemuDomainObjPrivatePtr priv = vm->privateData;
         if (qemuDomainObjBeginJob(vm) < 0)
             goto cleanup;
-
-        qemuDomainObjEnterMonitor(vm);
-        ret = qemuMonitorGetBlockExtent(priv->mon,
-                                        disk->info.alias,
-                                        &info->allocation);
-        qemuDomainObjExitMonitor(vm);
+        if (!virDomainObjIsActive(vm))
+            ret = 0;
+        else {
+            qemuDomainObjEnterMonitor(vm);
+            ret = qemuMonitorGetBlockExtent(priv->mon,
+                                            disk->info.alias,
+                                            &info->allocation);
+            qemuDomainObjExitMonitor(vm);
+        }
         if (qemuDomainObjEndJob(vm) == 0)
             vm = NULL;
ACK, it's true it makes for a nicer patch than v1.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

On 10/28/2010 08:39 AM, Daniel Veillard wrote:
On Wed, Oct 27, 2010 at 02:28:51PM -0600, Eric Blake wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=638285 - when migrating a guest, it was very easy to provoke a race where an application could query block information on a VM that had just been migrated away. Any time qemu code obtains a job lock, it must also check that the VM was not taken down while it was waiting for the lock.
* src/qemu/qemu_driver.c (qemudDomainSetMemory)
(qemudDomainGetInfo, qemuDomainGetBlockInfo): Check that vm still
exists after obtaining job lock, before starting monitor action.
---
v2: incorporate danpb's suggestions, to minimize virDomainObjIsActive calls and avoid use of goto for mid-function labels.
ACK, it's true it makes for a nicer patch than v1
Thanks; pushed.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org