[Libvir] virsh start problem + patch

Hello!

I've been trying to use virsh to manage my Xen domains on Fedora 8. I found that I am unable to start any Xen managed domains from virsh, because virsh complains that 'Domain is already running'. Actually, I am able to start them once, but once they are stopped, they are impossible to restart.

Steps to reproduce:

1. use virt-install to install an HVM domain
2. virsh start domain
3. shut down the domain from within the guest (i.e. Start -> Shutdown)
4. virsh start domain -> error

The problem appears with stock F8 (with current updates), but I've also tried the same with libvirt 0.4.1, with the same results.

I've looked into the virsh code, and it seems that it was written with only old-style Xen in mind, and Xen 3.1's managed domains break the logic. I've attached a simple patch against 0.4.1 that checks the actual domain state, instead of simply checking whether libvirt knows about the domain.

I have experienced a similar problem with virt-manager 0.5.3, where virt-manager is stuck showing 'running' status even after a domain has been shut down, and hence is not able to start the domain.

regards
István Tóth

On Sat, Mar 08, 2008 at 09:47:43AM +0100, Toth Istvan wrote:
I've looked into the virsh code, and it seems that it was written with only old-style xen in mind, and xen 3.1's managed domains break the logic.
I don't understand this statement. The current 'cmdStart' code checks if the domain ID is -1 (i.e. a managed domain, but inactive), and that seems correct.

What's the output of 'xm list --long' for a domain in this unstartable state?

Rich.

--
Richard Jones, Emerging Technologies, Red Hat http://et.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

On Mon, Mar 10, 2008 at 11:37:19AM +0000, Richard W.M. Jones wrote:
On Sat, Mar 08, 2008 at 09:47:43AM +0100, Toth Istvan wrote:
I've looked into the virsh code, and it seems that it was written with only old-style xen in mind, and xen 3.1's managed domains break the logic.
I don't understand this statement. The current 'cmdStart' code checks if the domain ID is -1 (ie. a managed domain, but inactive), and that seems correct.
This sounds very much like a bug in XenD, to be honest. If libvirt has got an ID of -1 then the domain is definitely dead - libvirt talks directly to the hypervisor. So if XenD meanwhile thinks it's not dead, then it's a XenD bug.

Dan.

--
|: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Actually, there are two different problems here.

----------------------------------------------------------------------

One is that a freshly installed Xen HVM Windows domain is stuck in the ------ state until I reboot Dom0. The way to reproduce the problematic ------ state in Xen is to do a fresh install of Windows XP with virt-install.

    # virsh list --all
     Id Name                 State
    ----------------------------------
      0 Domain-0             running
      - windows-xen          shut off
      - windows-xen2         no state

    # xm list
    Name             ID   Mem VCPUs   State    Time(s)
    Domain-0          0  3489     4   r-----     853.6
    windows-xen       1   512     0   -b-s-d      39.4
    windows-xen2      4   512     0   ------      22.5

The windows-xen and windows-xen2 domains were installed the very same way, except that I've had a Dom0 reboot since I installed windows-xen, so Xen has had the opportunity to sort its state out, while the windows-xen2 domain is a fresh install. Starting and stopping (with xm) a freshly installed Windows HVM domain does not sort out the state; only a Dom0 reboot (or a xend restart) does. I have attached the output of the xm list --long command.

This problem does indeed look like a Xend bug, but it turns out that it does not actually affect virsh. (It does affect virt-manager, as I've written to the other list.)

----------------------------------------------------------------------

The other problem is, I am pretty sure, a virsh logic bug, and is independent of the first one. Virsh cannot start ANY managed Xen domain, whether it's stuck in ------ or in a completely legal state.

I've added a debug statement to (unpatched virsh) cmdStart here:

    if (!(dom = vshCommandOptDomainBy(ctl, cmd, "name", NULL, VSH_BYNAME)))
        return FALSE;

    printf("virDomainGetID(dom) returns %d", virDomainGetID(dom));

    if (virDomainGetID(dom) != (unsigned int)-1) {
        vshError(ctl, FALSE, "%s", _("Domain is already active"));
        virDomainFree(dom);
        return FALSE;
    }

and I get this:

    ./virsh start windows-xen
    <warnings>
    error: Domain is already active
    virDomainGetID(dom) returns 1

    ./virsh start windows-xen2
    <warnings>
    error: Domain is already active
    virDomainGetID(dom) returns 3

So virDomainGetID() does not return -1, but returns the actual Xen domain ID of the managed but inactive Xen domain, and I believe this is what it should do, as its job is not to tell us about the state of the domain, but to tell us the ID of the domain, regardless of its state. Xend knows about the domain, so libvirt knows about the domain; it does have an ID despite being inactive, and virDomainGetID tells it to us. That's the way a managed Xen domain should work, in my opinion.

Back in the days of unmanaged Xen domains, this code did the right thing, as Xend knew only about running domains, but managed domains changed the semantics, and now defined but inactive domains have domain IDs.

So I still think that we do need to check the domain state in cmdStart, as my patch does. Actually, virsh start should work without any state check, as the lower layers would throw an error if we tried to start a running domain anyway, but it would be sloppy.

regards
István Tóth

Daniel P. Berrange wrote:
On Mon, Mar 10, 2008 at 11:37:19AM +0000, Richard W.M. Jones wrote:
On Sat, Mar 08, 2008 at 09:47:43AM +0100, Toth Istvan wrote:
I've looked into the virsh code, and it seems that it was written with only old-style xen in mind, and xen 3.1's managed domains break the logic.
I don't understand this statement. The current 'cmdStart' code checks if the domain ID is -1 (ie. a managed domain, but inactive), and that seems correct.
This sounds very much like a bug in XenD, to be honest. If libvirt has got an ID of -1 then the domain is definitely dead - libvirt talks directly to the hypervisor. So if XenD meanwhile thinks it's not dead, then it's a XenD bug.
Dan.
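
To make the difference between the two checks concrete, here is a minimal sketch in C. It is not the attached patch. The enum only mirrors the numeric values of libvirt's virDomainState so that the snippet compiles without libvirt headers, and the struct and both helper functions are hypothetical names invented for illustration. In real code the state would come from virDomainGetInfo(). The sketch assumes that only a 'shut off' domain counts as startable; how 'no state' should be treated is left open here.

```c
#include <stdbool.h>

/* Stand-ins mirroring the numeric values of libvirt's virDomainState.
 * They are defined locally only so the sketch builds without libvirt. */
enum domain_state {
    DOMAIN_NOSTATE  = 0,
    DOMAIN_RUNNING  = 1,
    DOMAIN_BLOCKED  = 2,
    DOMAIN_PAUSED   = 3,
    DOMAIN_SHUTDOWN = 4,
    DOMAIN_SHUTOFF  = 5,
    DOMAIN_CRASHED  = 6
};

/* Hypothetical snapshot of what virDomainGetID() and virDomainGetInfo()
 * would report for one domain. */
struct domain_snapshot {
    unsigned int id;          /* (unsigned int)-1 means "no ID assigned" */
    enum domain_state state;
};

/* The check used by the unpatched cmdStart: a domain is considered
 * startable only if it has no ID at all. */
static bool startable_by_id(const struct domain_snapshot *d)
{
    return d->id == (unsigned int)-1;
}

/* The state-based alternative argued for above: ignore the ID and look at
 * the reported state. Assumption: only "shut off" is treated as startable. */
static bool startable_by_state(const struct domain_snapshot *d)
{
    return d->state == DOMAIN_SHUTOFF;
}
```

With the values reported for windows-xen above (ID 1, state 'shut off'), the ID-based check refuses to start the domain while the state-based check allows it. For an old-style domain without an ID, both checks agree.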
(domain
    (domid 0)
    (on_crash restart)
    (uuid 00000000-0000-0000-0000-000000000000)
    (bootloader_args )
    (vcpus 4)
    (name Domain-0)
    (on_poweroff destroy)
    (on_reboot restart)
    (bootloader )
    (maxmem 16777215)
    (memory 3489)
    (shadow_memory 0)
    (cpu_weight 256)
    (cpu_cap 0)
    (features )
    (on_xend_start ignore)
    (on_xend_stop ignore)
    (cpu_time 495.244784973)
    (online_vcpus 4)
    (image (linux (kernel )))
    (status 2)
    (state r-----)
)
(domain
    (domid 1)
    (on_crash destroy)
    (uuid 12951667-5f44-a4a4-79f2-614a4c536722)
    (bootloader_args )
    (vcpus 1)
    (name windows-xen)
    (on_poweroff destroy)
    (on_reboot destroy)
    (bootloader )
    (maxmem 512)
    (memory 512)
    (shadow_memory 5)
    (cpu_weight 256)
    (cpu_cap 0)
    (features )
    (on_xend_start ignore)
    (on_xend_stop ignore)
    (start_time 1205208695.79)
    (cpu_time 39.441136677)
    (online_vcpus 0)
    (image
        (hvm
            (kernel /usr/lib/xen/boot/hvmloader)
            (boot c)
            (device_model /usr/lib64/xen/bin/qemu-dm)
            (localtime 1)
            (pae 1)
            (rtc_timeoffset 4294934920)
            (serial pty)
            (usb 1)
            (usbdevice tablet)
            (notes (SUSPEND_CANCEL 1))
        )
    )
    (status 0)
    (state -b-s-d)
    (store_mfn 131070)
)
(domain
    (domid 3)
    (on_crash destroy)
    (uuid 18dae3d6-a63b-7cf3-3d85-1301963a4a73)
    (bootloader_args )
    (vcpus 1)
    (name windows-xen2)
    (on_poweroff destroy)
    (on_reboot destroy)
    (bootloader )
    (maxmem 512)
    (memory 512)
    (shadow_memory 5)
    (cpu_weight 256)
    (cpu_cap 0)
    (features )
    (on_xend_start ignore)
    (on_xend_stop ignore)
    (start_time 1205210040.08)
    (cpu_time 240.302828584)
    (online_vcpus 1)
    (image
        (hvm
            (kernel /usr/lib/xen/boot/hvmloader)
            (boot c)
            (device_model /usr/lib64/xen/bin/qemu-dm)
            (localtime 1)
            (pae 1)
            (rtc_timeoffset 4294967295)
            (serial pty)
            (usb 1)
            (usbdevice tablet)
            (notes (SUSPEND_CANCEL 1))
        )
    )
    (status 0)
    (state ------)
    (store_mfn 131070)
)