[libvirt] libvirt + xen 3.2.1 oddities

Hi,
I just ran across these oddities when using a bit more libvirt+xen:

1.) virsh setmaxmem:
On a running domain:

# virsh setmaxmem domain 256000

completes, but virsh dumpxml as well as the config.sxp still show the old amount of memory. Looks as if the set_maxmem hypercall simply gets ignored. xm mem-max works as expected. Smells like a bug in the ioctl?

2.) virsh list:
Sometimes (didn't find a pattern yet), when shutting down a running domain and restarting it, I'm seeing:

Id Name                 State
----------------------------------
 0 Domain-0             running
 2 foo                  idle
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
 7 bar                  idle

Note that the number of errors corresponds to the number of shutdowns. virXen_getdomaininfolist returns 7 in the above case. virDomainLookupByID later on fails for these "additional" domains.

3.) virsh list: Duplicate domains:

Id Name                 State
----------------------------------
 0 Domain-0             running
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
14 bar                  no state
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
16 bar                  idle

Domain 14 can't be shut down (xm list only lists domain 16). Could be a similar problem to the one above.

This is all libvirt 0.4.6 (but the code looks very similar in current CVS) and xen-3.2.1 on Debian. The detected ABI is hypervisor call v2, sys ver6 dom ver5. And from a quick glance at the libxen-dev package the structs seem to match.
Cheers,
 -- Guido

On Fri, Nov 21, 2008 at 11:13:04PM +0100, Guido Günther wrote:
Hi, I just ran across these oddities when using a bit more libvirt+xen:
1.) virsh setmaxmem:
On a running domain: # virsh setmaxmem domain 256000 completes, but virsh dumpxml as well as the config.sxp still show the old amount of memory. Looks as if the set_maxmem hypercall simply gets ignored. xm mem-max works as expected. Smells like a bug in the ioctl?
The setmaxmem API is not performance critical, so it sounds like we should first try setting it via XenD, and use Hypervisor as the fallback instead.
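For illustration, a compilable sketch of that ordering; the helper names here are placeholders, not the real libvirt entry points (internally the XenD path issues a maxmem_set operation over XenD's HTTP interface, while the hypervisor path boils down to a max_mem domctl hypercall):

    /*
     * Sketch: prefer XenD for SetMaxMemory, fall back to the hypervisor.
     * The two helpers are stubs with hypothetical names standing in for
     * the real XenD HTTP call and the direct hypercall.
     */
    #include <stdio.h>

    static int xendSetMaxMemory(int domid, unsigned long kb)
    {
        /* real code: send op=maxmem_set to XenD, which also updates
         * the persistent domain config */
        printf("xend: dom %d maxmem -> %lu kB\n", domid, kb);
        return 0; /* 0 on success, -1 on failure */
    }

    static int hypervisorSetMaxMemory(int domid, unsigned long kb)
    {
        /* real code: direct hypercall; fast, but XenD never hears
         * about it, so config.sxp keeps the old value */
        printf("hv: dom %d maxmem -> %lu kB\n", domid, kb);
        return 0;
    }

    static int setMaxMemory(int domid, unsigned long kb)
    {
        if (xendSetMaxMemory(domid, kb) == 0)      /* not perf critical */
            return 0;
        return hypervisorSetMaxMemory(domid, kb);  /* fallback only */
    }

    int main(void)
    {
        return setMaxMemory(2, 256000) == 0 ? 0 : 1;
    }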
2.) virsh list:
Sometimes (didn't find a pattern yet), when shutting down a running domain and restarting it, I'm seeing:
Id Name                 State
----------------------------------
 0 Domain-0             running
 2 foo                  idle
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
 7 bar                  idle
Note that the number of errors corresponds to the number of shutdowns. virXen_getdomaininfolist returns 7 in the above case. virDomainLookupByID later on fails for these "additional" domains.
This is basically a XenD bug. What's happening is that the domain has been shut down, and got most of the way through cleanup, as far as the hypervisor is concerned. But something is still hanging around keeping the domain from being completely terminated. In this case XenD takes the dubious approach of just pretending the domain does not exist. So libvirt sees it exists in the hypervisor, but when asking XenD for more data, it gets that error. This really really sucks.

There's not really much we can do about it when XenD is just plain lying about what exists. We explicitly don't ask XenD for the list of domain IDs because it is incredibly slow, hence we use the HV.

The only idea I can think of is to ask XenStore for the list of domain IDs. This is still dramatically faster than asking XenD, but not quite as fast as the Hypervisor.
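For illustration, a minimal standalone sketch of pulling the domain ID list straight from XenStore with the public libxenstore API (xs.h; build with -lxenstore, run as root in dom0). This is roughly what the XenStore driver does internally, though the real code differs in detail:

    /*
     * Each live domain has a node /local/domain/<id>, and XenStore
     * drops it as soon as the domain is torn down, so half-dead
     * domains don't show up here.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <xs.h>

    int main(void)
    {
        struct xs_handle *xsh = xs_daemon_open();
        if (!xsh) {
            perror("xs_daemon_open");
            return 1;
        }

        unsigned int num = 0;
        char **ids = xs_directory(xsh, XBT_NULL, "/local/domain", &num);
        if (ids) {
            for (unsigned int i = 0; i < num; i++)
                printf("domain id: %s\n", ids[i]);
            free(ids); /* xs_directory returns a single malloc'd block */
        }

        xs_daemon_close(xsh);
        return 0;
    }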
3.) virsh list: Duplicate domains:
Id Name                 State
----------------------------------
 0 Domain-0             running
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
14 bar                  no state
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
16 bar                  idle
Domain 14 can't be shut down (xm list only lists domain 16).
Could be a similar problem to the one above.
Yeah, this is almost certainly just another example of XenD not properly cleaning up / destroying domains. If you still have a machine which shows this behaviour, then I'd recommend trying this change to our Xen impl:

In xen_unified.c, find the method xenUnifiedListDomains and make it first call xenStoreListDomains(), and then fall back to trying the HV & XenD drivers. If we're lucky this will help....

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
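To make that concrete, a compilable sketch of the suggested ordering with the three drivers stubbed out; the real xenUnifiedListDomains in xen_unified.c takes a virConnectPtr and walks a driver table, so treat this as the shape of the change rather than the actual patch:

    #include <stddef.h>
    #include <stdio.h>

    static int xenStoreListDomains(int *ids, int maxids)
    {
        /* stub: pretend XenStore reports dom0 and domain 16 */
        if (maxids < 2) return -1;
        ids[0] = 0;
        ids[1] = 16;
        return 2;
    }

    static int xenHypervisorListDomains(int *ids, int maxids)
    {
        (void)ids; (void)maxids;
        return -1; /* stub: simulate failure to exercise the fallback */
    }

    static int xenDaemonListDomains(int *ids, int maxids)
    {
        (void)ids; (void)maxids;
        return -1; /* stub: slow last resort */
    }

    static int xenUnifiedListDomains(int *ids, int maxids)
    {
        int (*drivers[])(int *, int) = {
            xenStoreListDomains,      /* first: forgets dead domains promptly */
            xenHypervisorListDomains, /* fast fallback */
            xenDaemonListDomains,     /* slow last resort */
        };

        for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
            int ret = drivers[i](ids, maxids);
            if (ret >= 0)
                return ret; /* number of IDs filled in */
        }
        return -1;
    }

    int main(void)
    {
        int ids[8];
        int n = xenUnifiedListDomains(ids, 8);
        for (int i = 0; i < n; i++)
            printf("id %d\n", ids[i]);
        return n >= 0 ? 0 : 1;
    }

Putting XenStore first helps because it drops a domain's /local/domain/<id> node as soon as the domain is torn down, so the half-dead domains the hypervisor still counts never make it into the list.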

On Tue, Nov 25, 2008 at 11:39:57AM +0000, Daniel P. Berrange wrote:
On Fri, Nov 21, 2008 at 11:13:04PM +0100, Guido Günther wrote:
Hi, I just ran across these oddities when using a bit more libvirt+xen:
1.) virsh setmaxmem:
On a running domain: # virsh setmaxmem domain 256000 completes, but virsh dumpxml as well as the config.sxp still show the old amount of memory. Looks as if the set_maxmem hypercall simply gets ignored. xm mem-max works as expected. Smells like a bug in the ioctl?
The setmaxmem API is not performance critical, so it sounds like we should first try setting it via XenD, and use Hypervisor as the fallback instead.

I tried that and it worked as you suggested. However, checking the "old" method of using HV, it all of a sudden works too now; no idea why it reliably failed the last time and doesn't do so now (the machine has been rebooted in the meantime, though). I'll keep the patched package around in case this pops up again.
2.) virsh list:
Sometimes (didn't find a pattern yet), when shutting down a running domain and restarting it, I'm seeing:
Id Name                 State
----------------------------------
 0 Domain-0             running
 2 foo                  idle
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
 7 bar                  idle
Note that the number of errors corresponds to the number of shutdowns. virXen_getdomaininfolist returns 7 in the above case. virDomainLookupByID later on fails for these "additional" domains.
This is basically a XenD bug. What's happening is that the domain has been shut down, and got most of the way through cleanup, as far as the hypervisor is concerned. But something is still hanging around keeping the domain from being completely terminated. In this case XenD takes the dubious approach of just pretending the domain does not exist. So libvirt sees it exists in the hypervisor, but when asking XenD for more data, it gets that error. This really really sucks.
There's not really much we can do about it when XenD is just plain lying about what exists. We explicitly don't ask XenD for the list of domain IDs because it is incredibly slow, hence we use the HV.
The only idea I can think of is to ask XenStore for the list of domain IDs. This is still dramatically faster than asking XenD, but not quite as fast as the Hypervisor.
3.) virsh list: Duplicate domains:
Id Name                 State
----------------------------------
 0 Domain-0             running
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
14 bar                  no state
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
16 bar                  idle
Domain 14 can't be shut down (xm list only lists domain 16).
Could be a similar problem to the one above.
Yeah, this is almost certainly just another example of XenD not properly cleaning up / destroying domains. If you still have a machine which shows this behaviour, then I'd recommend trying this change to our Xen impl:
In xen_unified.c, find the method xenUnifiedListDomains and make it first call xenStoreListDomains(), and then fall back to trying the HV & XenD drivers. If we're lucky this will help....
Yes, this helps indeed, thanks a lot. Possible patch attached.
 -- Guido

Daniel P. Berrange wrote:
On Fri, Nov 21, 2008 at 11:13:04PM +0100, Guido Günther wrote:
Hi, I just ran across these oddities when using a bit more libvirt+xen:
1.) virsh setmaxmem:
On a running domain: # virsh setmaxmem domain 256000 completes, but virsh dumpxml as well as the config.sxp still show the old amount of memory. Looks as if the set_maxmem hypercall simply gets ignored. xm mem-max works as expected. Smells like a bug in the ioctl?
The setmaxmem API is not performance critical, so it sounds like we should first try setting it via XenD, and use Hypervisor as the fallback instead.
I have a patch for 0.4.6 in the SUSE packages to do just this. Using xend, you also get the value changed in the dom config.
2.) virsh list:
Sometimes (didn't find a pattern yet), when shutting down a running domain and restarting it, I'm seeing:
Id Name                 State
----------------------------------
 0 Domain-0             running
 2 foo                  idle
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
libvir: Xen Daemon error : GET operation failed: xend_get: error from xen daemon:
 7 bar                  idle
Note that the number of errors corresponds to the number of shutdowns. virXen_getdomaininfolist returns 7 in the above case. virDomainLookupByID later on fails for these "additional" domains.
This is basically a XenD bug. What's happening is that the domain has been shut down, and got most of the way through cleanup, as far as the hypervisor is concerned. But something is still hanging around keeping the domain from being completely terminated. In this case XenD takes the dubious approach of just pretending the domain does not exist. So libvirt sees it exists in the hypervisor, but when asking XenD for more data, it gets that error. This really really sucks.
I spent some time looking into this bug as well. I found that we ask the HV for the number of domains and get back more than actually exist. We subsequently query xend about such domains and get the error message noted. It turned out to be a 'dead domain' memory leak in xen itself. Jan Beulich plugged the hole and sent the patch upstream, but I can't seem to find the relevant c/s now :-(. Anyhow, with Jan's fix I no longer see these error messages.
Cheers,
Jim

On Wed, Nov 26, 2008 at 11:14:51AM -0700, Jim Fehlig wrote:
Daniel P. Berrange wrote:
On Fri, Nov 21, 2008 at 11:13:04PM +0100, Guido Günther wrote:
Hi, I just ran across these oddities when using a bit more libvirt+xen:
1.) virsh setmaxmem:
On a running domain: # virsh setmaxmem domain 256000 completes, but virsh dumpxml as well as the config.sxp still show the old amount of memory. Looks as if the set_maxmem hypercall simply gets ignored. xm mem-max works as expected. Smells like a bug in the ioctl?
The setmaxmem API is not performance critical, so it sounds like we should first try setting it via XenD, and use Hypervisor as the fallback instead.
I have a patch for 0.4.6 in the SUSE packages to do just this. Using xend, you also get the value changed in the dom config.
Do send the patch to the list & we'll see about applying it...
This is basically a XenD bug. What's happening is that the domain has been shut down, and got most of the way through cleanup, as far as the hypervisor is concerned. But something is still hanging around keeping the domain from being completely terminated. In this case XenD takes the dubious approach of just pretending the domain does not exist. So libvirt sees it exists in the hypervisor, but when asking XenD for more data, it gets that error. This really really sucks.
I spent some time looking into this bug as well. I found that we ask the HV for the number of domains and get back more than actually exist. We subsequently query xend about such domains and get the error message noted. It turned out to be a 'dead domain' memory leak in xen itself. Jan Beulich plugged the hole and sent the patch upstream, but I can't seem to find the relevant c/s now :-(. Anyhow, with Jan's fix I no longer see these error messages.
This seems to be quite a common problem for a number of users. It's good that Xen has a fix now, but if switching to querying XenStore for domain IDs makes it work, we should do that as a preventative measure in libvirt.

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Tue, Nov 25, 2008 at 11:39:57AM +0000, Daniel P. Berrange wrote:
Yeah, this is almost certainly just another example of XenD not properly cleaning up / destroying domains. If you still have a machine which shows this behaviour, then I'd recommend trying this change to our Xen impl:
In xen_unified.c, find the method xenUnifiedListDomains and make it first call xenStoreListDomains(), and then fall back to trying the HV & XenD drivers. If we're lucky this will help....

Yes, this helps indeed, thanks a lot. Possible patch attached.
 -- Guido

On Thu, Nov 27, 2008 at 05:04:05PM +0100, Guido Günther wrote:
On Tue, Nov 25, 2008 at 11:39:57AM +0000, Daniel P. Berrange wrote:
Yeah, this is almost certainly just another example of XenD not properly cleaning up / destroying domains. If you still have a machine which shows this behaviour, then I'd recommend trying this change to our Xen impl:
In xen_unified.c, find the method xenUnifiedListDomains and make it first call xenStoreListDomains(), and then fall back to trying the HV & XenD drivers. If we're lucky this will help....

Yes, this helps indeed, thanks a lot. Possible patch attached.
Thanks for confirming. ACK to this patch - it looks reasonable to me, and shouldn't have any serious performance implication.

Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Thu, Nov 27, 2008 at 04:07:48PM +0000, Daniel P. Berrange wrote:
On Thu, Nov 27, 2008 at 05:04:05PM +0100, Guido Günther wrote:
On Tue, Nov 25, 2008 at 11:39:57AM +0000, Daniel P. Berrange wrote:
Yeah, this is almost certainly just another example of XenD not properly cleaning up / destroying domains. If you still have a machine which shows this behaviour, then I'd recommend trying this change to our Xen impl:
In xen_unified.c, find the method xenUnifiedListDomains and make it first call xenStoreListDomains(), and then fall back to trying the HV & XenD drivers. If we're lucky this will help....

Yes, this helps indeed, thanks a lot. Possible patch attached.
Thanks for confirming. ACK to this patch - it looks reasonable to me, and shouldn't have any serious performance implication.

O.k., committed now.
 -- Guido