On Tue, Sep 02, 2008 at 07:38:59AM +0200, Chris Lalancette wrote:
> Richard W.M. Jones wrote:
> > On Fri, Aug 29, 2008 at 10:44:50AM +0200, Chris Lalancette wrote:
> >> Things that I've missed?
> >
> > Maybe a good place for this list is on the wiki? On the actual
> > feature/todo page.
> >
> > I'd like to see where KVM is going to go with this, since it seems
> > they are going to implement migration checking.
>
> Yes, OK, I've put that there now. I also want to see what KVM does here;
> however, I don't think that prevents us from implementing our own, since we
> would still need similar things for other hypervisors (Xen, etc.).

Following on from this, there are some things it is simply not practical
for the underlying hypervisor to check for itself, specifically things
that require knowledge outside the scope of the hypervisor. For example,
how would the hypervisor ever know whether /dev/sda1 on the source box
is the same disk as /dev/sda1 on the destination box? This information is
only available at a higher level. Indeed, for some of this even libvirt
cannot answer the question, and oVirt would have to make decisions directly.

Does mandating the use of labels or UUIDs solve the disk naming problem?

Looking at Chris' list of things to check for, I think one thing is very
clear: a simple boolean test is not a useful API model at the libvirt
level, let alone the hypervisor level. There is a series of items that
need to be checked. Some may appear to have a straight yes/no answer, but
in fact the eventual decision is a matter of application policy.

For example, it may seem 'obvious' that you cannot migrate an i386 guest
from an x86_64 host onto a PPC64 host. This would be a bad assumption,
though, because you may be quite happy to run it on the destination host
under QEMU's x86_64 or i686 emulator. Whether such a migration is
acceptable depends entirely on the SLA requirements of the application
running inside the guest. So what looks like a simple yes/no question
actually has multiple values of 'yes', some better than others.
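
To make the 'multiple values of yes' point concrete, here is a minimal
Python sketch in which the check returns a rank rather than a bool, and the
application's policy picks the minimum rank it will accept. `ArchMatch`,
`arch_match`, `acceptable` and the two compatibility tables are hypothetical
and deliberately simplified; a real checker would derive them from each
host's capabilities rather than hard-coding them.

```python
from enum import IntEnum


class ArchMatch(IntEnum):
    """Hypothetical ranking of how well a guest arch runs on a host.
    Higher is better; NO means it cannot run there at all."""
    NO = 0
    EMULATED = 1  # full software emulation (e.g. QEMU without KVM)
    NATIVE = 2    # hardware-assisted, same or compatible ISA


# Assumed table: guest arches each host arch can run natively.
NATIVE_COMPAT = {
    "x86_64": {"x86_64", "i686", "i386"},
    "i686": {"i686", "i386"},
    "ppc64": {"ppc64", "ppc"},
}

# Assumed set of guest arches for which an emulator is installed.
EMULATORS = {"x86_64", "i686", "i386", "ppc", "ppc64"}


def arch_match(guest_arch: str, host_arch: str) -> ArchMatch:
    """Rank the guest/host combination instead of returning a bool."""
    if guest_arch in NATIVE_COMPAT.get(host_arch, {host_arch}):
        return ArchMatch.NATIVE
    if guest_arch in EMULATORS:
        return ArchMatch.EMULATED
    return ArchMatch.NO


def acceptable(guest_arch: str, host_arch: str, minimum: ArchMatch) -> bool:
    """Application policy: the SLA decides the minimum acceptable rank."""
    return arch_match(guest_arch, host_arch) >= minimum
```

A batch job might be happy with `ArchMatch.EMULATED`, while a
latency-sensitive service would insist on `ArchMatch.NATIVE`; the checker
is the same in both cases, only the policy differs.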

In other cases you may not be able to produce a yes/no answer at all, and
have to apply heuristics instead. A heuristic may give a firm negative and
a probable positive; a probable negative and a firm positive; or a probable
negative and a probable positive. Take CPU flag compatibility: if the
source has SSE3 and the destination only has SSE2, you may or may not be
able to migrate safely, depending on whether any app in the guest has
probed for and is using SSE3 instructions. Most management tools will just
be conservative in this scenario and refuse to migrate, or they will mask
the guest's CPU flags down to the lowest common denominator across hosts.
Another example is a guest whose disks are on Firewire/USB storage. You
can check this, and if /dev/sda on the source and destination has a
different model/vendor/serial you can say they're different disks. If they
have the same model/vendor/serial they may or may not be the same physical
disk; it is possible to get multi-homed Firewire disks, even if they are
not common.
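
A minimal sketch of the two heuristics above, assuming the five answer
values proposed at the end of this mail. `check_cpu_flags`, `masked_flags`
and `check_same_disk` are hypothetical helpers; treating a missing CPU flag
as 'probably-no' rather than a firm 'no' is one possible reading of the
SSE3 example, not a settled rule.

```python
from enum import Enum


class Answer(Enum):
    # The five answer values proposed at the end of this mail.
    NO = "no"
    PROBABLY_NO = "probably-no"
    NO_IDEA = "no-idea"
    PROBABLY_YES = "probably-yes"
    YES = "yes"


def check_cpu_flags(src_flags: set, dst_flags: set) -> Answer:
    """Firm positive if the destination has every source flag; otherwise
    only a probable negative, since the guest may never use the
    missing flags."""
    if src_flags <= dst_flags:
        return Answer.YES
    return Answer.PROBABLY_NO


def masked_flags(*hosts: set) -> set:
    """Lowest-common-denominator flags to expose to a guest that may
    migrate between any of these hosts (needs at least one host)."""
    return set.intersection(*hosts)


def check_same_disk(src: dict, dst: dict) -> Answer:
    """Compare vendor/model/serial of a disk as seen on both hosts.
    A mismatch is a firm 'no'; a match is only 'probably-yes'."""
    keys = ("vendor", "model", "serial")
    if any(src.get(k) is None or dst.get(k) is None for k in keys):
        return Answer.NO_IDEA
    if all(src[k] == dst[k] for k in keys):
        return Answer.PROBABLY_YES
    return Answer.NO
```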

There is also the problem of race conditions between the check and the
action. Suppose a guest is using 1 GB of RAM and needs to be on a dedicated
NUMA node with 1 GB of RAM free. Between performing the check and the
guest being migrated, the situation may have changed: other guests may
have auto-ballooned up or down, the kernel itself may have consumed memory
on the desired NUMA node for its own purposes (disk/IO caches), or other
user apps may have used or released memory. So we can say there's probably
enough free memory for the guest to migrate and have all its allocations
on a single node, but we can't easily guarantee it. Do we apply some
safety margin in our checks, e.g. check for 1.2 GB free if the guest
requires 1 GB? Do we check, then pre-reserve the memory, then check
again before migrating? Or do we just accept that some migrations will
fail and make damn sure the VM is guaranteed to keep running safely on its
original host? Or all of the above?
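
One way to combine those options (safety margin, pre-reservation, re-check,
and tolerating failure) is sketched below. `NumaNode`, `try_migrate` and the
two callbacks are hypothetical; the reservation is only bookkeeping in the
management layer, not a kernel-level reservation, so it narrows the race
window but does not close it.

```python
def required_with_margin(guest_mb: int, margin_pct: int = 20) -> int:
    """Memory to demand on the destination node: guest size plus a safety
    margin, rounded up (20% turns a 1024 MB guest into 1229 MB)."""
    return (guest_mb * (100 + margin_pct) + 99) // 100


class NumaNode:
    """Hypothetical bookkeeping for one NUMA node on the destination."""

    def __init__(self, free_mb: int):
        self.free_mb = free_mb  # last value reported by the host
        self.reserved_mb = 0    # held back for in-flight migrations

    def reserve(self, mb: int) -> bool:
        """Check and pre-reserve in one step; refuse if not enough free."""
        if self.free_mb - self.reserved_mb < mb:
            return False
        self.reserved_mb += mb
        return True

    def release(self, mb: int) -> None:
        self.reserved_mb -= mb


def try_migrate(node: NumaNode, guest_mb: int, refresh_free_mb, migrate) -> bool:
    """Check + pre-reserve, re-check just before migrating, then migrate.
    refresh_free_mb() and migrate() are caller-supplied hooks.
    A False result means the guest is still running on the source host."""
    need = required_with_margin(guest_mb)
    if not node.reserve(need):
        return False
    try:
        # Free memory may have moved since the reservation (ballooning,
        # page cache, other processes), so look again.
        node.free_mb = refresh_free_mb()
        if node.free_mb < node.reserved_mb:
            return False
        # Even now migrate() may fail; the source copy must stay intact.
        return migrate()
    finally:
        node.release(need)
```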

Finally, some compatibility factors require a certain amount of host
'setup'. If a guest is using iSCSI as its storage, then there is a step
where the host has to log in to the iSCSI target and create device nodes
for the LUNs before the guest can be run. You don't want every single host
to be logged into all your iSCSI targets all the time. So what do you do
for your migration check wrt iSCSI? Do you just check that both hosts can
access the same iSCSI server + target? That might not detect LUN
masking/zoning well enough. So you probably need to actually do all the
iSCSI setup on the destination host before doing the migration
compatibility check. And if you decide not to migrate after that, you'll
want to tear the iSCSI stuff down again.
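
The 'set up, check, tear down unless migrating' sequence maps naturally onto
a context manager. This sketch assumes caller-supplied `login`, `logout` and
`migrate` callables (hypothetical, e.g. wrappers around whatever iSCSI
tooling the host uses) and that `login` returns the set of LUNs visible
once logged in.

```python
class ISCSIPrep:
    """Hypothetical helper: log the destination host into an iSCSI target
    for the compatibility check, and log out again on exit unless the
    caller commits to the migration."""

    def __init__(self, login, logout):
        self._login = login    # () -> set of LUN ids visible after login
        self._logout = logout  # () -> None
        self.keep = False      # set True once the guest depends on it
        self.luns = set()

    def __enter__(self):
        self.luns = self._login()
        return self

    def __exit__(self, exc_type, exc, tb):
        if not self.keep:
            self._logout()
        return False  # do not swallow exceptions


def check_and_migrate(needed_luns: set, login, logout, migrate) -> bool:
    with ISCSIPrep(login, logout) as prep:
        # Reaching the server and target proves little; LUN masking or
        # zoning only becomes visible after a real login.
        if not needed_luns <= prep.luns:
            return False
        if migrate():
            prep.keep = True  # guest now runs here, keep the session
            return True
        return False
```

The teardown runs on every exit path (rejected check, failed migration, or
an exception), which is the part that is easy to get wrong by hand.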

This all makes it very hard to think of an API for 'checking' migration
compatibility between two hosts. The best option I can think of is
something along the lines of having the application provide a list of
'facts' it wants checked, and getting back a list of answers, one per
fact, each taking one of the values 'no', 'yes', 'probably-yes',
'probably-no' or 'no-idea'. I'm really not sure that would even be useful,
though. Maybe libvirt should stick to just providing as rich a set of
metadata about all aspects of a host & VM as possible, and let
applications do all the comparisons. Then again, I hate the idea of having
to duplicate comparisons across all apps using libvirt.
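
A minimal sketch of the 'list of facts in, list of answers out' idea, with
the accept/reject decision left to the caller. The fact names, the
`CHECKERS` table, `check_facts` and the two policy functions are all
hypothetical and purely illustrative; they are not an existing libvirt call.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Answer(Enum):
    NO = "no"
    PROBABLY_NO = "probably-no"
    NO_IDEA = "no-idea"
    PROBABLY_YES = "probably-yes"
    YES = "yes"


# A checker compares one aspect of source and destination metadata.
Checker = Callable[[dict, dict], Answer]

# Hypothetical shared checkers, keyed by illustrative fact names.
CHECKERS: Dict[str, Checker] = {
    "cpu-flags": lambda src, dst: (
        Answer.YES if src["flags"] <= dst["flags"] else Answer.PROBABLY_NO),
    # Only "probably": free memory can change before migration starts.
    "memory": lambda src, dst: (
        Answer.PROBABLY_YES if dst["free_mb"] >= src["guest_mb"] else Answer.NO),
}


def check_facts(facts: List[str], src: dict, dst: dict) -> List[Tuple[str, Answer]]:
    """One answer per requested fact; facts nobody can check get 'no-idea'."""
    results = []
    for fact in facts:
        checker = CHECKERS.get(fact)
        results.append((fact, checker(src, dst) if checker else Answer.NO_IDEA))
    return results


# Policy stays with the application.
def strict(results) -> bool:
    return all(a is Answer.YES for _, a in results)


def lenient(results) -> bool:
    return all(a in (Answer.YES, Answer.PROBABLY_YES) for _, a in results)
```

In this shape the comparisons would live in one shared table while each
application only writes its policy functions, which is one possible middle
ground between the two options above.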
Daniel