
On Tue, Jul 28, 2009 at 10:15:10AM +0200 Daniel Veillard wrote:
On Tue, Jul 28, 2009 at 08:30:12AM +0200, Jonas Eriksson wrote:
Hi,
I have been examining a bug where libvirtd (and virsh) do not show all virtual machines on a Xen host. This proved to be because of this program flow:
1. virConnectNumOfDomains -> .. -> xenUnifiedNumOfDomains -> xenHypervisorNumOfDomains => 3
2. virConnectListDomains(max=3) -> .. -> xenUnifiedListDomains(max=3) -> xenStoreNumOfDomains(max=3) => { 0, 2, 7 }
Actually, the problem I see is that ListDomains should really go through the Hypervisor API, i.e. xenHypervisorListDomains(), which is *way* faster and guaranteed to be accurate. We should try the hypervisor first, IMHO. The function code was modified at the end of last year to avoid Xend not properly cleaning up:
http://www.mail-archive.com/libvir-list@redhat.com/msg09855.html
Ah, yes. This seems very related. I remember seeing this earlier but never had the time to look into it back then.
The problem is that we have put the xenstore driver call first, while it's clearly slower and has a higher chance of getting things wrong than the hypervisor itself (if the HV gets it wrong, I guess there is no cure :-)
Could you try changing xenUnifiedListDomains() so that it tries xenHypervisorListDomains first, then check whether it still works with 3.3? If yes, that's even better. Now the patch looks reasonable to me, but I think the reorder should be done too if this works, and it will lead to really good results ... as long as the hypercall works.
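
To make the proposed ordering concrete, here is a minimal, self-contained sketch of a "try the hypervisor first, fall back to the next driver" loop. It is not the libvirt code: the names list_domains_fn, mock_hypervisor_list, mock_xenstore_list and list_domains_in_order are hypothetical, the domain IDs are made up, and the convention of returning -1 on failure is an assumption made for this illustration.

```c
#include <stddef.h>

/* Hypothetical backend signature: fill ids[0..maxids) and return the
 * number of IDs written, or -1 if the backend cannot answer. */
typedef int (*list_domains_fn)(int *ids, int maxids);

/* Toggle used to simulate a failing hypercall (illustration only). */
static int hv_available = 1;

/* Copy at most maxids entries from src into ids; return the count. */
static int copy_ids(const int *src, int nsrc, int *ids, int maxids)
{
    int n = 0;
    for (; n < nsrc && n < maxids; n++)
        ids[n] = src[n];
    return n;
}

/* Stand-in for the hypervisor backend; the IDs are made up. */
static int mock_hypervisor_list(int *ids, int maxids)
{
    static const int live[] = { 0, 2, 7 };
    if (!hv_available)
        return -1;
    return copy_ids(live, 3, ids, maxids);
}

/* Stand-in for the xenstore backend; it reports one extra, made-up
 * entry to mimic a backend whose view differs from the hypervisor's. */
static int mock_xenstore_list(int *ids, int maxids)
{
    static const int seen[] = { 0, 2, 5, 7 };
    return copy_ids(seen, 4, ids, maxids);
}

/* Try each backend in priority order and return the first usable
 * answer, or -1 if every backend failed. */
static int list_domains_in_order(const list_domains_fn *backends,
                                 size_t nbackends, int *ids, int maxids)
{
    for (size_t i = 0; i < nbackends; i++) {
        int ret = backends[i](ids, maxids);
        if (ret >= 0)
            return ret;
    }
    return -1;
}
```

With the order { mock_hypervisor_list, mock_xenstore_list }, the xenstore stand-in is only consulted when the hypervisor stand-in fails, which is the behaviour the reordering aims for. Counting and listing domains through the same backend order should also keep the two results consistent with each other.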
Yes, this works. I tried with the reordering in xenUnifiedListDomains, both with and without my patch, to make sure that libvirt used the Hypervisor interface instead of XenStore. Do you want a revised patch for this? In that case, I think that XenStore should be last in the prio-list, both because of the performance issues, which my patch does not help, and because it behaves... differently.

/Jonas

-- 
Jonas Eriksson
Consultant at AS/EAB/FLJ/IL
Phone: +46 8 58086281
Combitech AB
Älvsjö, Sweden