[libvirt] [PATCH] blockjob: fix block-stream bandwidth race

With RHEL 6.2, virDomainBlockPull(dom, dev, bandwidth, 0) has a race with non-zero bandwidth: there is a window between the block_stream and block_job_set_speed monitor commands where an unlimited amount of data was let through, defeating the point of a throttle.

This race was first identified in commit a9d3495e, and libvirt was able to reduce the size of the window for that race. In the meantime, the qemu developers decided to fix things properly; per this message: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg03793.html the fix will be in qemu 1.1, and changes block-job-set-speed to use a different parameter name, as well as adding a new optional parameter to block-stream, which eliminates the race altogether.

Since our documentation already mentioned that we can refuse a non-zero bandwidth for some hypervisors, I think the best solution is to do just that for RHEL 6.2 qemu, so that the race is obvious to the user. Meanwhile the code must be fixed to honor actual qemu 1.1 naming.

* src/qemu/qemu_monitor.h (enum BLOCK_JOB_CMD): Drop unused value.
* src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Reject bandwidth during pull with too-old qemu.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Support difference between RHEL 6.2 and qemu 1.1 block pull.
* src/libvirt.c (virDomainBlockPull, virDomainBlockRebase): Document this.
---
 src/libvirt.c                |    6 +++-
 src/qemu/qemu_driver.c       |    8 ++++--
 src/qemu/qemu_monitor.h      |    3 +-
 src/qemu/qemu_monitor_json.c |   53 +++++++++++++++++++++++------------------
 4 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/src/libvirt.c b/src/libvirt.c
index af42d3b..67f324a 100644
--- a/src/libvirt.c
+++ b/src/libvirt.c
@@ -18145,7 +18145,8 @@ error:
  * The maximum bandwidth (in Mbps) that will be used to do the copy can be
  * specified with the bandwidth parameter. If set to 0, libvirt will choose a
  * suitable default. Some hypervisors do not support this feature and will
- * return an error if bandwidth is not 0.
+ * return an error if bandwidth is not 0; in this case, it might still be
+ * possible for a later call to virDomainBlockJobSetSpeed() to succeed.
  *
  * This is shorthand for virDomainBlockRebase() with a NULL base.
  *
@@ -18263,7 +18264,8 @@ error:
  * The maximum bandwidth (in Mbps) that will be used to do the copy can be
  * specified with the bandwidth parameter. If set to 0, libvirt will choose a
  * suitable default. Some hypervisors do not support this feature and will
- * return an error if bandwidth is not 0.
+ * return an error if bandwidth is not 0; in this case, it might still be
+ * possible for a later call to virDomainBlockJobSetSpeed() to succeed.
  *
  * When @base is NULL and @flags is 0, this is identical to
  * virDomainBlockPull().
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c3555ca..d3aa34d 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -11654,6 +11654,11 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base,
                         _("partial block pull not supported with this "
                           "QEMU binary"));
         goto cleanup;
+    } else if (mode == BLOCK_JOB_PULL && bandwidth) {
+        qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                        _("setting bandwidth at start of block pull not "
+                          "supported with this QEMU binary"));
+        goto cleanup;
     }

     device = qemuDiskPathToAlias(vm, path, &idx);
@@ -11676,9 +11681,6 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base,
      * relying on qemu to do this.  */
     ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info,
                               mode, async);
-    if (ret == 0 && mode == BLOCK_JOB_PULL && bandwidth)
-        ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL,
-                                  BLOCK_JOB_SPEED_INTERNAL, async);
     qemuDomainObjExitMonitorWithDriver(driver, vm);
     if (ret < 0)
         goto endjob;
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index f3cdcdd..dc02964 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -530,8 +530,7 @@ typedef enum {
     BLOCK_JOB_ABORT = 0,
     BLOCK_JOB_INFO = 1,
     BLOCK_JOB_SPEED = 2,
-    BLOCK_JOB_SPEED_INTERNAL = 3,
-    BLOCK_JOB_PULL = 4,
+    BLOCK_JOB_PULL = 3,
 } BLOCK_JOB_CMD;

 int qemuMonitorBlockJob(qemuMonitorPtr mon,
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index eb58f13..de76609 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -3430,22 +3430,29 @@ qemuMonitorJSONBlockJob(qemuMonitorPtr mon,
                         unsigned long bandwidth,
                         virDomainBlockJobInfoPtr info,
                         int mode,
-                        bool async)
+                        bool modern)
 {
     int ret = -1;
     virJSONValuePtr cmd = NULL;
     virJSONValuePtr reply = NULL;
     const char *cmd_name = NULL;

-    if (base && (mode != BLOCK_JOB_PULL || !async)) {
+    if (base && (mode != BLOCK_JOB_PULL || !modern)) {
         qemuReportError(VIR_ERR_INTERNAL_ERROR,
                         _("only modern block pull supports base: %s"), base);
         return -1;
     }
+    if (bandwidth && mode == BLOCK_JOB_PULL && !modern) {
+        qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                        _("only modern block pull supports bandwidth: %lu"),
+                        bandwidth);
+        return -1;
+    }
+    bandwidth *= 1024ULL * 1024ULL; /* Scale MiB to bytes */

     switch ((BLOCK_JOB_CMD) mode) {
     case BLOCK_JOB_ABORT:
-        cmd_name = async ? "block-job-cancel" : "block_job_cancel";
+        cmd_name = modern ? "block-job-cancel" : "block_job_cancel";
         cmd = qemuMonitorJSONMakeCommand(cmd_name, "s:device", device, NULL);
         break;
     case BLOCK_JOB_INFO:
@@ -3453,19 +3460,24 @@ qemuMonitorJSONBlockJob(qemuMonitorPtr mon,
         cmd = qemuMonitorJSONMakeCommand(cmd_name, NULL);
         break;
     case BLOCK_JOB_SPEED:
-    case BLOCK_JOB_SPEED_INTERNAL:
-        cmd_name = async ? "block-job-set-speed" : "block_job_set_speed";
-        cmd = qemuMonitorJSONMakeCommand(cmd_name, "s:device",
-                                         device, "U:value",
-                                         bandwidth * 1024ULL * 1024ULL,
-                                         NULL);
+        cmd_name = modern ? "block-job-set-speed" : "block_job_set_speed";
+        cmd = qemuMonitorJSONMakeCommand(cmd_name, "s:device", device,
+                                         modern ? "U:speed" : "U:value",
+                                         bandwidth, NULL);
         break;
     case BLOCK_JOB_PULL:
-        cmd_name = async ? "block-stream" : "block_stream";
-        cmd = qemuMonitorJSONMakeCommand(cmd_name,
-                                         "s:device", device,
-                                         base ? "s:base" : NULL, base,
-                                         NULL);
+        cmd_name = modern ? "block-stream" : "block_stream";
+        if (bandwidth)
+            cmd = qemuMonitorJSONMakeCommand(cmd_name,
+                                             "s:device", device,
+                                             "U:speed", bandwidth,
+                                             base ? "s:base" : NULL, base,
+                                             NULL);
+        else
+            cmd = qemuMonitorJSONMakeCommand(cmd_name,
+                                             "s:device", device,
+                                             base ? "s:base" : NULL, base,
+                                             NULL);
         break;
     }

@@ -3477,14 +3489,9 @@ qemuMonitorJSONBlockJob(qemuMonitorPtr mon,
     if (ret == 0 && virJSONValueObjectHasKey(reply, "error")) {
         ret = -1;
         if (qemuMonitorJSONHasError(reply, "DeviceNotActive")) {
-            /* If a job completes before we get a chance to set the
-             * speed, we don't want to fail the original command.  */
-            if (mode == BLOCK_JOB_SPEED_INTERNAL)
-                ret = 0;
-            else
-                qemuReportError(VIR_ERR_OPERATION_INVALID,
-                                _("No active operation on device: %s"),
-                                device);
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            _("No active operation on device: %s"),
+                            device);
         } else if (qemuMonitorJSONHasError(reply, "DeviceInUse")){
             qemuReportError(VIR_ERR_OPERATION_FAILED,
                             _("Device %s in use"), device);
@@ -3497,7 +3504,7 @@ qemuMonitorJSONBlockJob(qemuMonitorPtr mon,
                             _("Command '%s' is not found"), cmd_name);
         } else {
             qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                            _("Unexpected error"));
+                            _("Unexpected error"));
         }
     }
--
1.7.7.6
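For callers, the practical upshot is: with a modern qemu the throttle is applied atomically (the speed is carried on the block-stream and block-job-set-speed commands under their qemu 1.1 parameter names), while a RHEL 6.2 qemu now rejects a non-zero bandwidth at job start, and the updated documentation points at virDomainBlockJobSetSpeed() as the fallback. A minimal caller-side sketch of that fallback, using only the public libvirt API; the function name, the disk name "vda" in the usage note, and the error handling are illustrative, not taken from the patch:

#include <stdio.h>
#include <libvirt/libvirt.h>

/* Start a block pull capped at mbps MiB/s.  If the hypervisor refuses a
 * non-zero bandwidth at job start (e.g. RHEL 6.2 qemu after this patch),
 * fall back to an unthrottled pull plus a follow-up speed change, and
 * accept the small unthrottled window that implies. */
static int
pull_with_throttle(virDomainPtr dom, const char *disk, unsigned long mbps)
{
    if (virDomainBlockPull(dom, disk, mbps, 0) == 0)
        return 0;                        /* bandwidth applied at job start */

    if (virDomainBlockPull(dom, disk, 0, 0) < 0)
        return -1;                       /* the pull itself failed */

    if (virDomainBlockJobSetSpeed(dom, disk, mbps, 0) < 0)
        fprintf(stderr, "pull running on %s, but throttling failed\n", disk);
    return 0;
}

A caller would invoke this as pull_with_throttle(dom, "vda", 10) after connecting and looking up the domain.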

On Thu, Apr 26, 2012 at 12:04 AM, Eric Blake <eblake@redhat.com> wrote:
> With RHEL 6.2, virDomainBlockPull(dom, dev, bandwidth, 0) has a race with non-zero bandwidth: there is a window between the block_stream and block_job_set_speed monitor commands where an unlimited amount of data was let through, defeating the point of a throttle.
>
> This race was first identified in commit a9d3495e, and libvirt was able to reduce the size of the window for that race. In the meantime, the qemu developers decided to fix things properly; per this message: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg03793.html the fix will be in qemu 1.1, and changes block-job-set-speed to use a different parameter name, as well as adding a new optional parameter to block-stream, which eliminates the race altogether.
>
> Since our documentation already mentioned that we can refuse a non-zero bandwidth for some hypervisors, I think the best solution is to do just that for RHEL 6.2 qemu, so that the race is obvious to the user. Meanwhile the code must be fixed to honor actual qemu 1.1 naming.
>
> * src/qemu/qemu_monitor.h (enum BLOCK_JOB_CMD): Drop unused value.
> * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Reject bandwidth during pull with too-old qemu.
> * src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Support difference between RHEL 6.2 and qemu 1.1 block pull.
> * src/libvirt.c (virDomainBlockPull, virDomainBlockRebase): Document this.
> ---
>  src/libvirt.c                |    6 +++-
>  src/qemu/qemu_driver.c       |    8 ++++--
>  src/qemu/qemu_monitor.h      |    3 +-
>  src/qemu/qemu_monitor_json.c |   53 +++++++++++++++++++++++------------------
>  4 files changed, 40 insertions(+), 30 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

On 04/26/2012 03:05 AM, Stefan Hajnoczi wrote:
> On Thu, Apr 26, 2012 at 12:04 AM, Eric Blake <eblake@redhat.com> wrote:
>> With RHEL 6.2, virDomainBlockPull(dom, dev, bandwidth, 0) has a race with non-zero bandwidth: there is a window between the block_stream and block_job_set_speed monitor commands where an unlimited amount of data was let through, defeating the point of a throttle.
>>
>> This race was first identified in commit a9d3495e, and libvirt was able to reduce the size of the window for that race. In the meantime, the qemu developers decided to fix things properly; per this message: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg03793.html the fix will be in qemu 1.1, and changes block-job-set-speed to use a different parameter name, as well as adding a new optional parameter to block-stream, which eliminates the race altogether.
>>
>> Since our documentation already mentioned that we can refuse a non-zero bandwidth for some hypervisors, I think the best solution is to do just that for RHEL 6.2 qemu, so that the race is obvious to the user. Meanwhile the code must be fixed to honor actual qemu 1.1 naming.
>>
>> * src/qemu/qemu_monitor.h (enum BLOCK_JOB_CMD): Drop unused value.
>> * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Reject bandwidth during pull with too-old qemu.
>> * src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Support difference between RHEL 6.2 and qemu 1.1 block pull.
>> * src/libvirt.c (virDomainBlockPull, virDomainBlockRebase): Document this.
>> ---
>>  src/libvirt.c                |    6 +++-
>>  src/qemu/qemu_driver.c       |    8 ++++--
>>  src/qemu/qemu_monitor.h      |    3 +-
>>  src/qemu/qemu_monitor_json.c |   53 +++++++++++++++++++++++------------------
>>  4 files changed, 40 insertions(+), 30 deletions(-)
>
> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Thanks. However, I found a bug:
unsigned long bandwidth,
...
+ bandwidth *= 1024ULL * 1024ULL; /* Scale MiB to bytes */
This can overflow, and silently wrap around. v2 coming up.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
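To spell out the bug for anyone following along: bandwidth is an unsigned long, which is 32 bits on many hosts, so the in-place scaling silently truncates the 64-bit product back to 32 bits for any value of 4096 MiB/s and up. A sketch of one way to scale without wrapping; this is illustrative only and not necessarily the approach v2 takes:

#include <limits.h>

/* Convert a caller-supplied rate in MiB/s to bytes/s without the silent
 * wraparound of "bandwidth *= 1024ULL * 1024ULL" on a 32-bit unsigned
 * long: widen before multiplying, and reject values whose product would
 * not even fit in 64 bits. */
static int
scale_mib_to_bytes(unsigned long bandwidth_mib, unsigned long long *bytes)
{
    if (bandwidth_mib > ULLONG_MAX / (1024ULL * 1024ULL))
        return -1;                  /* too large even for a 64-bit value */
    *bytes = (unsigned long long) bandwidth_mib * 1024ULL * 1024ULL;
    return 0;
}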

On Wed, Apr 25, 2012 at 17:04:25 -0600, Eric Blake wrote:
> With RHEL 6.2, virDomainBlockPull(dom, dev, bandwidth, 0) has a race with non-zero bandwidth: there is a window between the block_stream and block_job_set_speed monitor commands where an unlimited amount of data was let through, defeating the point of a throttle.
>
> This race was first identified in commit a9d3495e, and libvirt was able to reduce the size of the window for that race. In the meantime, the qemu developers decided to fix things properly; per this message: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg03793.html the fix will be in qemu 1.1, and changes block-job-set-speed to use a different parameter name, as well as adding a new optional parameter to block-stream, which eliminates the race altogether.
>
> Since our documentation already mentioned that we can refuse a non-zero bandwidth for some hypervisors, I think the best solution is to do just that for RHEL 6.2 qemu, so that the race is obvious to the user. Meanwhile the code must be fixed to honor actual qemu 1.1 naming.
OK, this seems reasonable, but I'm concerned about backward compatibility. What if someone is happily using this now with a non-zero bandwidth even though it's racy? With this change, such usage will fail.

...
> diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
> index f3cdcdd..dc02964 100644
> --- a/src/qemu/qemu_monitor.h
> +++ b/src/qemu/qemu_monitor.h
> @@ -530,8 +530,7 @@ typedef enum {
>      BLOCK_JOB_ABORT = 0,
>      BLOCK_JOB_INFO = 1,
>      BLOCK_JOB_SPEED = 2,
> -    BLOCK_JOB_SPEED_INTERNAL = 3,
> -    BLOCK_JOB_PULL = 4,
> +    BLOCK_JOB_PULL = 3,
>  } BLOCK_JOB_CMD;

...
Why do we use explicit numbering when this is only used internally?

Jirka

On 04/26/2012 09:00 AM, Jiri Denemark wrote:
> On Wed, Apr 25, 2012 at 17:04:25 -0600, Eric Blake wrote:
>> With RHEL 6.2, virDomainBlockPull(dom, dev, bandwidth, 0) has a race with non-zero bandwidth: there is a window between the block_stream and block_job_set_speed monitor commands where an unlimited amount of data was let through, defeating the point of a throttle.
>>
>> This race was first identified in commit a9d3495e, and libvirt was able to reduce the size of the window for that race. In the meantime, the qemu developers decided to fix things properly; per this message: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg03793.html the fix will be in qemu 1.1, and changes block-job-set-speed to use a different parameter name, as well as adding a new optional parameter to block-stream, which eliminates the race altogether.
>>
>> Since our documentation already mentioned that we can refuse a non-zero bandwidth for some hypervisors, I think the best solution is to do just that for RHEL 6.2 qemu, so that the race is obvious to the user. Meanwhile the code must be fixed to honor actual qemu 1.1 naming.
>
> OK, this seems reasonable, but I'm concerned about backward compatibility. What if someone is happily using this now with a non-zero bandwidth even though it's racy? With this change, such usage will fail.
Since no upstream qemu release has the race, the _only_ users affected by this change in semantics are those running qemu on RHEL 6.2; furthermore, that version of qemu only supported block pull for the QED file format, which is not very popular yet. And since only RHEL 6.2 qemu is affected, anyone who uses the RHEL 6.2 libvirt will see no regression as long as this patch is not backported to that version of libvirt. I think that is a small enough set of people to be worth the tightened semantics.
...
>> diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
>> index f3cdcdd..dc02964 100644
>> --- a/src/qemu/qemu_monitor.h
>> +++ b/src/qemu/qemu_monitor.h
>> @@ -530,8 +530,7 @@ typedef enum {
>>      BLOCK_JOB_ABORT = 0,
>>      BLOCK_JOB_INFO = 1,
>>      BLOCK_JOB_SPEED = 2,
>> -    BLOCK_JOB_SPEED_INTERNAL = 3,
>> -    BLOCK_JOB_PULL = 4,
>> +    BLOCK_JOB_PULL = 3,
>>  } BLOCK_JOB_CMD;

...
> Why do we use explicit numbering when this is only used internally?
No particular reason; I can trim that when pushing, if you'd like.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
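For context, trimming the explicit initializers would leave the enum relying on the compiler's default sequential numbering, along the lines of the sketch below (the exact form that gets pushed may differ):

/* Same values as before (0..3), just without spelling them out, since the
 * enum is only used internally. */
typedef enum {
    BLOCK_JOB_ABORT,
    BLOCK_JOB_INFO,
    BLOCK_JOB_SPEED,
    BLOCK_JOB_PULL,
} BLOCK_JOB_CMD;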
participants (3):
- Eric Blake
- Jiri Denemark
- Stefan Hajnoczi