[Libvir] VIR_DOMAIN_NOSTATE

On Solaris-based xen 3.1.2 I am seeing virDomainGetInfo() often returning VIR_DOMAIN_NOSTATE in dominfo.state for domUs that are powered on and running. What exactly does VIR_DOMAIN_NOSTATE mean? Why does it exist? I suspect it is a catch-all in the API for hosts that don't entirely have their act together, that do not always return a valid guest state. Is that the case? Is anyone else seeing this on Xen? -- ----------------------------------------------------- Russ Blaine | Solaris Kernel | russell.blaine@sun.com

Russ Blaine wrote:
On Solaris-based xen 3.1.2 I am seeing virDomainGetInfo() often returning VIR_DOMAIN_NOSTATE in dominfo.state for domUs that are powered on and running. What exactly does VIR_DOMAIN_NOSTATE mean? Why does it exist? I suspect it is a catch-all in the API for hosts that don't entirely have their act together, that do not always return a valid guest state. Is that the case?
Is anyone else seeing this on Xen?
It means that we query xcnd and there is no domain/state part of the sexpr for the domain. What does 'xm list --long' say? Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

Richard W.M. Jones wrote:
It means that we query xcnd
                       ^^^
Not sure what happened there ... xend
Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

Richard W.M. Jones wrote:
Russ Blaine wrote:
On Solaris-based xen 3.1.2 I am seeing virDomainGetInfo() often returning VIR_DOMAIN_NOSTATE in dominfo.state for domUs that are powered on and running. What exactly does VIR_DOMAIN_NOSTATE mean? Why does it exist? I suspect it is a catch-all in the API for hosts that don't entirely have their act together, that do not always return a valid guest state. Is that the case?
Is anyone else seeing this on Xen?
It means that we query xcnd and there is no domain/state part of the sexpr for the domain.
What does 'xm list --long' say?
[ libvirt 0.4.0; I just subscribed to the list so it might take a bit for traffic to flow to me. ]
When the problem occurs, the state line in xm list --long is:
    (state ------)
And in the 'xm list' it looks like this:
    Name    ID   Mem  VCPUs  State    Time(s)
    abc2    27   512      1  ------       9.9
It's an XP domain, and this happens while it's booting up.
- Russ
-----------------------------------------------------
Russ Blaine | Solaris Kernel | russell.blaine@sun.com

On Fri, Jan 25, 2008 at 06:39:00PM -0800, Russ Blaine wrote:
Richard W.M. Jones wrote:
Russ Blaine wrote:
On Solaris-based xen 3.1.2 I am seeing virDomainGetInfo() often returning VIR_DOMAIN_NOSTATE in dominfo.state for domUs that are powered on and running. What exactly does VIR_DOMAIN_NOSTATE mean? Why does it exist? I suspect it is a catch-all in the API for hosts that don't entirely have their act together, that do not always return a valid guest state. Is that the case?
Is anyone else seeing this on Xen?
It means that we query xcnd and there is no domain/state part of the sexpr for the domain.
Actually we will most likely be getting this data direct from the hypervisor rather than XenD.
[ libvirt 0.4.0; I just subscribed to the list so it might take a bit for traffic to flow to me. ]
When the problem occurs, the state line in xm list --long is: (state ------)
And in the 'xm list' it looks like this:
    Name    ID   Mem  VCPUs  State    Time(s)
    abc2    27   512      1  ------       9.9
Can you edit the src/xen_internal.c file and, in the xenHypervisorGetDomInfo() method, add a printf() for the 'domain_flags' variable?
IMHO the way we deal with this isn't quite correct. We merely mask out the high bits and then switch on the resulting value. This isn't the way the Xen hypervisor uses this field though. Xen more or less uses the whole thing as a bitmask, allowing near arbitrary combinations of bits to be set. So I think what is happening is there's a combo of bits set which causes the switch() statement to fail all cases, resulting in NOSTATE.
We'll probably need to replace the switch(domain_flags) with something which explicitly tests for the bits we're interested in, rather than looking at the value as a whole.
Regards,
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
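
The idea of testing individual bits rather than switching on the whole value could be sketched roughly as follows. This is only an illustration, not the libvirt code: the SK_* names are hypothetical, the bit values are assumptions modeled on the XEN_DOMINF_* flags (check xen/include/public/domctl.h for the real ones), and both the priority order and the mapping of bits to states are guesses.

```c
#include <assert.h>

/* Illustrative bit values (assumptions); the real definitions live in
 * xen/include/public/domctl.h. */
#define SK_DOMINF_DYING    (1u << 0)
#define SK_DOMINF_SHUTDOWN (1u << 2)
#define SK_DOMINF_PAUSED   (1u << 3)
#define SK_DOMINF_BLOCKED  (1u << 4)
#define SK_DOMINF_RUNNING  (1u << 5)

/* Hypothetical stand-in for libvirt's virDomainState values. */
enum sk_state {
    SK_NOSTATE = 0, SK_RUNNING, SK_BLOCKED, SK_PAUSED, SK_SHUTDOWN, SK_SHUTOFF
};

/* Test each bit of interest in a fixed priority order (an assumption),
 * so that extra bits set alongside it cannot cause a fall-through. */
enum sk_state sk_state_from_flags(unsigned int flags)
{
    if (flags & SK_DOMINF_DYING)    return SK_SHUTDOWN;
    if (flags & SK_DOMINF_SHUTDOWN) return SK_SHUTOFF;
    if (flags & SK_DOMINF_PAUSED)   return SK_PAUSED;
    if (flags & SK_DOMINF_BLOCKED)  return SK_BLOCKED;
    if (flags & SK_DOMINF_RUNNING)  return SK_RUNNING;
    return SK_NOSTATE;              /* none of the known bits is set */
}

/* Usage example: a combination that an exact-value switch would miss. */
void sk_demo(void)
{
    assert(sk_state_from_flags(SK_DOMINF_PAUSED | SK_DOMINF_BLOCKED) == SK_PAUSED);
}
```

With this shape, a combination of bits still resolves to some state; only a word with none of the known bits set ends up as NOSTATE.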

Daniel P. Berrange wrote:
Can you edit the src/xen_internal.c file and, in the xenHypervisorGetDomInfo() method, add a printf() for the 'domain_flags' variable? IMHO the way we deal with this isn't quite correct. We merely mask out the high bits and then switch on the resulting value. This isn't the way the Xen hypervisor uses this field though. Xen more or less uses the whole thing as a bitmask, allowing near arbitrary combinations of bits to be set. So I think what is happening is there's a combo of bits set which causes the switch() statement to fail all cases, resulting in NOSTATE.
We'll probably need to replace the switch(domain_flags) with something which explicitly tests for the bits we're interested in, rather than looking at the value as a whole.
When xenHypervisorGetDomInfo() returns VIR_DOMAIN_NONE, domain_flags is 0 (sampled after the HVM bit has been masked out). So is there some other state implied by domain_flags being 0? In all cases I have seen, the domain is actually running -- accumulating CPU time, etc.
-----------------------------------------------------
Russ Blaine | Solaris Kernel | russell.blaine@sun.com

Russ Blaine wrote:
Daniel P. Berrange wrote:
Can you edit the src/xen_internal.c file and, in the xenHypervisorGetDomInfo() method, add a printf() for the 'domain_flags' variable? IMHO the way we deal with this isn't quite correct. We merely mask out the high bits and then switch on the resulting value. This isn't the way the Xen hypervisor uses this field though. Xen more or less uses the whole thing as a bitmask, allowing near arbitrary combinations of bits to be set. So I think what is happening is there's a combo of bits set which causes the switch() statement to fail all cases, resulting in NOSTATE.
We'll probably need to replace the switch(domain_flags) with something which explicitly tests for the bits we're interested in, rather than looking at the value as a whole.
When xenHypervisorGetDomInfo() returns VIR_DOMAIN_NONE, domain_flags is 0 (sampled after the HVM bit has been masked out). So is there some other state implied by domain_flags being 0? In all cases I have seen, the domain is actually running -- accumulating CPU time, etc.
----------------------------------------------------- Russ Blaine | Solaris Kernel | russell.blaine@sun.com
This begs a question actually ... Is the host (or guest) Solaris? Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

Host is Solaris, guest is WinXP. Richard W.M. Jones wrote:
Russ Blaine wrote:
Daniel P. Berrange wrote:
Can you edit the src/xen_internal.c file and, in the xenHypervisorGetDomInfo() method, add a printf() for the 'domain_flags' variable? IMHO the way we deal with this isn't quite correct. We merely mask out the high bits and then switch on the resulting value. This isn't the way the Xen hypervisor uses this field though. Xen more or less uses the whole thing as a bitmask, allowing near arbitrary combinations of bits to be set. So I think what is happening is there's a combo of bits set which causes the switch() statement to fail all cases, resulting in NOSTATE.
We'll probably need to replace the switch(domain_flags) with something which explicitly tests for the bits we're interested in, rather than looking at the value as a whole.
When xenHypervisorGetDomInfo() returns VIR_DOMAIN_NONE, domain_flags is 0 (sampled after the HVM bit has been masked out). So is there some other state implied by domain_flags being 0? In all cases I have seen, the domain is actually running -- accumulating CPU time, etc.
----------------------------------------------------- Russ Blaine | Solaris Kernel | russell.blaine@sun.com
This begs a question actually ... Is the host (or guest) Solaris?
Rich.
-- ----------------------------------------------------- Russ Blaine | Solaris Kernel | russell.blaine@sun.com

On Tue, Jan 29, 2008 at 10:01:35AM -0800, Russ Blaine wrote:
Daniel P. Berrange wrote:
Can you edit the src/xen_internal.c file and, in the xenHypervisorGetDomInfo() method, add a printf() for the 'domain_flags' variable? IMHO the way we deal with this isn't quite correct. We merely mask out the high bits and then switch on the resulting value. This isn't the way the Xen hypervisor uses this field though. Xen more or less uses the whole thing as a bitmask, allowing near arbitrary combinations of bits to be set. So I think what is happening is there's a combo of bits set which causes the switch() statement to fail all cases, resulting in NOSTATE.
We'll probably need to replace the switch(domain_flags) with something which explicitly tests for the bits we're interested in, rather than looking at the value as a whole.
When xenHypervisorGetDomInfo() returns VIR_DOMAIN_NONE, domain_flags is 0 (sampled after the HVM bit has been masked out). So is there some other state implied by domain_flags being 0? In all cases I have seen, the domain is actually running -- accumulating CPU time, etc.
What hypervisor version are you running? I'm struggling to see the codepath in the hypervisor 'getdomaininfo' call which could lead to domain_flags being zero. AFAICT, as well as the HVM flags, there must always be at least one other bit set.
    int flags = XEN_DOMINF_blocked;
    for_each_vcpu ( d, v )
    {
        ....
        if ( !test_bit(_VPF_down, &v->pause_flags) )
        {
            if ( !(v->pause_flags & VPF_blocked) )
                flags &= ~XEN_DOMINF_blocked;
            if ( v->is_running )
                flags |= XEN_DOMINF_running;
            info->nr_online_vcpus++;
        }
    }
Guaranteed either XEN_DOMINF_blocked or XEN_DOMINF_running is set now.
And this next block simply sets a few more bits:
    info->flags = flags |
        ((d->is_dying == DOMDYING_dead) ? XEN_DOMINF_dying : 0) |
        (d->is_shut_down ? XEN_DOMINF_shutdown : 0) |
        (d->is_paused_by_controller ? XEN_DOMINF_paused : 0) |
        (d->debugger_attached ? XEN_DOMINF_debugged : 0) |
        d->shutdown_code << XEN_DOMINF_shutdownshift;
    if ( is_hvm_domain(d) )
        info->flags |= XEN_DOMINF_hvm_guest;
This is all in getdomaininfo, from xen/common/domctl.c
Regards,
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote:
What hypervisor version are you running? I'm struggling to see the codepath in the hypervisor 'getdomaininfo' call which could lead to domain_flags being zero. AFAICT, as well as the HVM flags, there must always be at least one other bit set.
This is opensolaris 81 with xen 3.1.2. Not having studied this code extensively, I do see a path that could cause this: flags starts out life as XEN_DOMINF_blocked. That flag then gets cleared in this marked line:
    int flags = XEN_DOMINF_blocked;
    for_each_vcpu ( d, v )
    {
        ....
        if ( !test_bit(_VPF_down, &v->pause_flags) )
        {
            if ( !(v->pause_flags & VPF_blocked) )
                flags &= ~XEN_DOMINF_blocked;    <----- here
            if ( v->is_running )
                flags |= XEN_DOMINF_running;
            info->nr_online_vcpus++;
        }
    }
and then no bits are set in the next block.
Guaranteed either XEN_DOMINF_blocked or XEN_DOMINF_running is set now.
And this next block simply sets a few more bits:
    info->flags = flags |
        ((d->is_dying == DOMDYING_dead) ? XEN_DOMINF_dying : 0) |
        (d->is_shut_down ? XEN_DOMINF_shutdown : 0) |
        (d->is_paused_by_controller ? XEN_DOMINF_paused : 0) |
        (d->debugger_attached ? XEN_DOMINF_debugged : 0) |
        d->shutdown_code << XEN_DOMINF_shutdownshift;
    if ( is_hvm_domain(d) )
        info->flags |= XEN_DOMINF_hvm_guest;
This is all in getdomaininfo, from xen/common/domctl.c
----------------------------------------------------- Russ Blaine | Solaris Kernel | russell.blaine@sun.com
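
The path described above can be checked with a small stand-alone model of the quoted loop. This is a sketch under stated assumptions, not hypervisor code: struct sk_vcpu and the SK_* names are hypothetical, the bit values are illustrative, and the reading that the zero case corresponds to a vcpu which is runnable but not on a physical CPU at the instant of the call is a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_DOMINF_BLOCKED (1u << 4)  /* illustrative value (assumption) */
#define SK_DOMINF_RUNNING (1u << 5)  /* illustrative value (assumption) */

/* Hypothetical reduction of the per-vcpu fields the quoted loop reads. */
struct sk_vcpu {
    bool down;     /* models _VPF_down being set in pause_flags  */
    bool blocked;  /* models VPF_blocked being set in pause_flags */
    bool running;  /* models v->is_running                        */
};

/* Mirrors the structure of the quoted for_each_vcpu loop. */
unsigned int sk_vcpu_flags(const struct sk_vcpu *vcpus, size_t n)
{
    unsigned int flags = SK_DOMINF_BLOCKED;

    assert(vcpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (vcpus[i].down)
            continue;                     /* offline vcpus are skipped */
        if (!vcpus[i].blocked)
            flags &= ~SK_DOMINF_BLOCKED;  /* the marked line */
        if (vcpus[i].running)
            flags |= SK_DOMINF_RUNNING;
    }
    return flags;
}
```

In this model, a single online vcpu with blocked and running both false yields 0, which is consistent with the observation that domain_flags can be 0 for a one-VCPU domain that is otherwise alive.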

[please subscribe to the mailing-list, I can't guarantee proper delivery of mail bounced by the list tool if you are not subscribed, thanks !]
On Thu, Jan 24, 2008 at 12:32:32PM -0800, Russ Blaine wrote:
On Solaris-based xen 3.1.2 I am seeing virDomainGetInfo() often returning VIR_DOMAIN_NOSTATE in dominfo.state for domUs that are powered on and running. What exactly does VIR_DOMAIN_NOSTATE mean? Why does it exist?
It means that we could not extract the state from the hypervisor.
I suspect it is a catch-all in the API for hosts that don't entirely have their act together, that do not always return a valid guest state. Is that the case?
Well, info->state is set to VIR_DOMAIN_NOSTATE only if, in the SEXPR returned by xend when asking about the domain, the (domain ... (state X) ...) is either missing or empty, assuming the call ended up using a Xend RPC. If the call used a hypercall, then the call should go through xenHypervisorGetDomInfo, and the switch on the extracted domain state seems to be failing, setting info->state to VIR_DOMAIN_NOSTATE (actually it sets it to VIR_DOMAIN_NONE, which is the wrong enum value but is also 0, so equivalent in practice; I will fix this in CVS).
Is anyone else seeing this on Xen?
What version of libvirt are you using? Its behaviour might be different from the upstream version I looked at, but in both cases it really should imply that we failed to extract the state information from the hypervisor.
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
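
For the xend path, the mapping from the (state X) string to a domain state might look something like the sketch below. It is a hypothetical helper, not the libvirt implementation: the SK_* enum is a local stand-in for virDomainState, and the letter meanings (r, b, p, s, c, d) and their precedence are assumptions based on the State column of 'xm list'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for libvirt's virDomainState values. */
enum sk_state {
    SK_NOSTATE = 0, SK_RUNNING, SK_BLOCKED, SK_PAUSED,
    SK_SHUTDOWN, SK_SHUTOFF, SK_CRASHED
};

/* Map an xend state string such as "-b----" to a state. A missing or
 * empty string, or one with no recognised letter, gives SK_NOSTATE. */
enum sk_state sk_state_from_xend(const char *state)
{
    if (state == NULL || *state == '\0')
        return SK_NOSTATE;
    if (strchr(state, 'c')) return SK_CRASHED;
    if (strchr(state, 's')) return SK_SHUTOFF;
    if (strchr(state, 'd')) return SK_SHUTDOWN;
    if (strchr(state, 'p')) return SK_PAUSED;
    if (strchr(state, 'b')) return SK_BLOCKED;
    if (strchr(state, 'r')) return SK_RUNNING;
    return SK_NOSTATE;
}

/* Usage example: the all-dash string reported earlier in the thread. */
void sk_demo(void)
{
    assert(sk_state_from_xend("------") == SK_NOSTATE);
}
```

Under these assumptions, the "------" string carries no state letter at all, so it falls through to NOSTATE just like a missing or empty state.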
participants (4)
- Daniel P. Berrange
- Daniel Veillard
- Richard W.M. Jones
- Russ Blaine