Re: [libvirt] [PATCH] qemu: call drive_unplug in DetachPciDiskDevice

21 Oct 2010

      * Ryan Harper <ryanh@us.ibm.com> [2010-10-21 12:43]:
...
* Daniel P. Berrange <berrange@redhat.com> [2010-10-21 12:10]:
...
On Thu, Oct 21, 2010 at 11:53:29AM -0500, Ryan Harper wrote:
...
* Daniel P. Berrange <berrange@redhat.com> [2010-10-21 11:46]:
...
On Thu, Oct 21, 2010 at 08:50:35AM -0500, Ryan Harper wrote:
...
Currently libvirt doesn't confirm whether the guest has responded to the
disk removal request.  In some cases this can leave the guest with
continued access to the device while the mgmt layer believes that it has
been removed.  With a recent qemu monitor command[1] we can
deterministically revoke a guests access to the disk (on the QEMU side)
to ensure no futher access is permitted.
This patch adds support for the drive_unplug() command and introduces it
in the disk removal paths.  There is some discussion to be had about how
to handle the case where the guest is running in a QEMU without this
command (and the fact that we currently don't have a way of detecting
what monitor commands are available).
Basically we try to run the command and then catch the failure.
For QMP, we can check for a error with a class of 'CommandNotFound',
For HMP, QEMU will print 'unknown command' in the reply.
Neither is ideal, since neither is a guarenteed part of the monitor
interface, but it is all we have to go on, and ensure other critical
errors will still be treated as fatal by libvirt.
Yeah, this sorta already works without explictly catching the error
(unless you want to display a different message at caller level than
command not found).
This would cause the hot-unplug operation to completely terminate
though, whereas we want it to carry on, if the command is not
present. If we get a command-not-found, then we should syslog a 
warning, and then return '0' (ie success) to the caller of
qemuMonitorDriveUnplug
Alternatively, return '+1' to the caller, and make the callers
check for '+1', syslog it and treat it as non-fatal. This is
closer to the approach we use for handling 'balloon device not
present' error when code runs the 'info balloon' method.
...
...
...
My current implementation assumes that if you don't have a QEMU with
this capability that we should fail the device removal.  This is a
strong statement around hotplug that isn't consistent with previous
releases so I'm open to other approachs, but given the potential data
leakage problem hot-remove can lead to without drive_unplug, I think
it's the right thing to do.
I don't think we can do this, since it obviously breaks every single
existing deployment out there. Users who have sVirt enabled will
have a level of protection from the data leakage, so I don't think
it is a severe problem.
Yeah, I was thinking we could re-work the logic to only completely fail
in the case where the command wasn't found/successful *and* we don't
have sVirt enabled.
I'm still wary of doing that because of the significant negative
impact on existing deployments out there, even in the case where
hotunplug would complete just fine. I think we'll just need to
syslog the potential issue and people can upgrade their QMEU if
they need the extra security
Thinking on this again; if we don't have a confirmation that the guest
has removed the pci device (independent of whether we've revoked access
to the underlying block device); then I really think we should fail the
device removal; at the very least we cannot update libvirt state to
indicate that the slot is free for re-assignment.
This is why I wanted to do a failure of the PciDiskRemove() as it
address this though it does block users in the case where the guest may
respond and running on a qemu without unplug support.

I wonder if we've unplugged the disk, the guest hasn't responded, but we
update the XML that it isn't present; we should fail subsequent adds to
the same slot in the guest in which case, we'd never find the case where
we'd be out-of-sync with the guest w.r.t different disks in the same
slot.  

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com