Actually, there are two different problems here.
----------------------------------------------------------------------------------------------------------------------------
One is that a freshly installed Xen HVM Windows domain is stuck in the
------ state until I reboot Dom0.
The way to reproduce the problematic ------ state in Xen is to do a
fresh install of Windows XP with virt-install.
# virsh list --all
 Id Name                 State
----------------------------------
  0 Domain-0             running
  - windows-xen          shut off
  - windows-xen2         no state
# xm list
Name              ID   Mem VCPUs   State    Time(s)
Domain-0           0  3489     4   r-----     853.6
windows-xen        1   512     0   -b-s-d      39.4
windows-xen2       4   512     0   ------      22.5
The windows-xen and windows-xen2 domains were installed the very same
way, except I've had a Dom0 reboot since I installed windows-xen, so
xen has had the opportunity to sort its state out, while the
windows-xen2 domain is a fresh install. Starting and stopping (by xm) a
freshly installed windows hvm domain does not sort out the state; only a
Dom0 reboot (or a xend restart) does.
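For readers less familiar with the State column of xm list: each of the
six positions is a flag (r running, b blocked, p paused, s shutdown,
c crashed, d dying), and "------" means no flag is set at all. A minimal
sketch of decoding such a string follows; describe_xm_state is a
hypothetical helper written for this mail, not part of xm or libvirt.

```c
#include <stdio.h>
#include <string.h>

/* Flag letters of the "xm list" State column, in column order. */
static const char xm_flag_letters[] = "rbpscd";
static const char *const xm_flag_names[] = {
    "running", "blocked", "paused", "shutdown", "crashed", "dying"
};

/*
 * Hypothetical helper: write a comma-separated description of a
 * six-character xm state string into buf.
 * Returns 0 on success, -1 on malformed input or if buf is too small.
 */
static int describe_xm_state(const char *state, char *buf, size_t len)
{
    size_t i, used = 0;
    int n;

    if (state == NULL || buf == NULL || len == 0 || strlen(state) != 6)
        return -1;
    buf[0] = '\0';

    for (i = 0; i < 6; i++) {
        if (state[i] == '-')
            continue;               /* flag not set */
        if (state[i] != xm_flag_letters[i])
            return -1;              /* unexpected letter in this column */
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "," : "", xm_flag_names[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;              /* output would be truncated */
        used += (size_t)n;
    }
    if (used == 0) {
        /* "------": the domain is known but carries no state flag. */
        n = snprintf(buf, len, "no flags set");
        if (n < 0 || (size_t)n >= len)
            return -1;
    }
    return 0;
}
```

Fed the three strings from the listing above, this distinguishes the
running Dom0, the shut-down windows-xen, and the flagless windows-xen2.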
I have attached the output of the xm list --long command.
This problem does indeed look like a Xend bug, but it turns out that it
does not actually affect virsh. (It does affect virt-manager, as I've
written to the other list.)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The other problem is, I am pretty sure, a virsh logic bug, and is
independent of the first one.
Virsh cannot start ANY managed xen domain, whether it's stuck in ------
or in a completely legal state.
I've added a debug statement to (unpatched virsh) cmdStart here:
    if (!(dom = vshCommandOptDomainBy(ctl, cmd, "name", NULL, VSH_BYNAME)))
        return FALSE;
>>  printf("virDomainGetID(dom) returns %d",
>>         virDomainGetID(dom));
    if (virDomainGetID(dom) != (unsigned int)-1) {
        vshError(ctl, FALSE, "%s", _("Domain is already active"));
        virDomainFree(dom);
        return FALSE;
    }
and I get this:
./virsh start windows-xen
<warnings>
error: Domain is already active
virDomainGetID(dom) returns 1
./virsh start windows-xen2
<warnings>
error: Domain is already active
virDomainGetID(dom) returns 3
So virDomainGetID() does not return -1, but returns the actual xen
domain id of the managed but inactive xen domain, and I believe this is
what it should do, as its job is not to tell us about the state of the
domain, but to tell us the id of the domain, regardless of its state.
Xend knows about the domain, so libvirt knows about the domain; it does
have an id despite being inactive, and virDomainGetID tells it to us.
That's the way a managed xen domain should work in my opinion.
Back in the day of unmanaged xen domains, this code did the right thing,
as Xend knew only about running domains, but managed domains changed the
semantics, and now defined but inactive domains have domain ids.
So I still think that we do need to check the domain state in cmdStart,
as my patch does.
Actually virsh start should work without any state check, as the lower
layers would throw an error if we tried to start a running domain
anyway, but it would be sloppy.
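To make the difference between the two tests concrete, here is a
minimal, self-contained model. The struct, the enum and both function
names are stand-ins invented for this sketch, not libvirt types, and
which states count as "active" is an assumption of the sketch.

```c
#include <stdbool.h>

/* Simplified stand-ins for this sketch; these are NOT libvirt types. */
enum model_state {
    MODEL_NOSTATE,
    MODEL_RUNNING,
    MODEL_BLOCKED,
    MODEL_PAUSED,
    MODEL_SHUTOFF
};

struct model_domain {
    int id;                  /* -1 only when the domain has no id */
    enum model_state state;
};

/* The existing cmdStart test: any id other than -1 means "active". */
static bool active_by_id(const struct model_domain *dom)
{
    return dom->id != -1;
}

/*
 * State-based test. Assumption of this sketch: only running, blocked
 * and paused domains count as active; shut off and no state do not.
 */
static bool active_by_state(const struct model_domain *dom)
{
    switch (dom->state) {
    case MODEL_RUNNING:
    case MODEL_BLOCKED:
    case MODEL_PAUSED:
        return true;
    default:
        return false;
    }
}
```

With the values reported above (id 1, shut off, and id 3, no state), the
id test calls both domains active while the state test does not. In
virsh the state would presumably be read with virDomainGetInfo(); the
patch itself is not reproduced here.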
regards
István Tóth
Daniel P. Berrange wrote:
On Mon, Mar 10, 2008 at 11:37:19AM +0000, Richard W.M. Jones wrote:
> On Sat, Mar 08, 2008 at 09:47:43AM +0100, Toth Istvan wrote:
>
>> I've looked into the virsh code, and it seems that it was written with
>> only old-style xen in mind, and xen 3.1's managed domains break the
>> logic.
>>
> I don't understand this statement. The current 'cmdStart' code checks
> if the domain ID is -1 (ie. a managed domain, but inactive), and that
> seems correct.
>
This sounds very much like a bug in XenD to be honest. If libvirt has got
an ID of -1 then the domain is definitely dead - libvirt talks directly to
the hypervisor. So if XenD meanwhile thinks it's not dead, then it's a XenD
bug.
Dan.
(domain
(domid 0)
(on_crash restart)
(uuid 00000000-0000-0000-0000-000000000000)
(bootloader_args )
(vcpus 4)
(name Domain-0)
(on_poweroff destroy)
(on_reboot restart)
(bootloader )
(maxmem 16777215)
(memory 3489)
(shadow_memory 0)
(cpu_weight 256)
(cpu_cap 0)
(features )
(on_xend_start ignore)
(on_xend_stop ignore)
(cpu_time 495.244784973)
(online_vcpus 4)
(image (linux (kernel )))
(status 2)
(state r-----)
)
(domain
(domid 1)
(on_crash destroy)
(uuid 12951667-5f44-a4a4-79f2-614a4c536722)
(bootloader_args )
(vcpus 1)
(name windows-xen)
(on_poweroff destroy)
(on_reboot destroy)
(bootloader )
(maxmem 512)
(memory 512)
(shadow_memory 5)
(cpu_weight 256)
(cpu_cap 0)
(features )
(on_xend_start ignore)
(on_xend_stop ignore)
(start_time 1205208695.79)
(cpu_time 39.441136677)
(online_vcpus 0)
(image
(hvm
(kernel /usr/lib/xen/boot/hvmloader)
(boot c)
(device_model /usr/lib64/xen/bin/qemu-dm)
(localtime 1)
(pae 1)
(rtc_timeoffset 4294934920)
(serial pty)
(usb 1)
(usbdevice tablet)
(notes (SUSPEND_CANCEL 1))
)
)
(status 0)
(state -b-s-d)
(store_mfn 131070)
)
(domain
(domid 3)
(on_crash destroy)
(uuid 18dae3d6-a63b-7cf3-3d85-1301963a4a73)
(bootloader_args )
(vcpus 1)
(name windows-xen2)
(on_poweroff destroy)
(on_reboot destroy)
(bootloader )
(maxmem 512)
(memory 512)
(shadow_memory 5)
(cpu_weight 256)
(cpu_cap 0)
(features )
(on_xend_start ignore)
(on_xend_stop ignore)
(start_time 1205210040.08)
(cpu_time 240.302828584)
(online_vcpus 1)
(image
(hvm
(kernel /usr/lib/xen/boot/hvmloader)
(boot c)
(device_model /usr/lib64/xen/bin/qemu-dm)
(localtime 1)
(pae 1)
(rtc_timeoffset 4294967295)
(serial pty)
(usb 1)
(usbdevice tablet)
(notes (SUSPEND_CANCEL 1))
)
)
(status 0)
(state ------)
(store_mfn 131070)
)