
Stumbled across a problem trying to list domains with libvirt-0.1.5. Using virsh I get errors such as

  xen81:/tests/jim # virsh list
   Id Name                 State
  ----------------------------------
    0 Domain-0             running
  libvir: Xen Daemon error : GET operation failed: No such domain 16

xm shows

  xen81:/tests/jim # xm list
  Name                              ID Mem(MiB) VCPUs State  Time(s)
  Domain-0                           0     2048     4 r-----  815825.8
  vm1                                1      512     2 -b----       0.3

I found that the buffer provided for XEN_V1_OP_GETDOMAININFOLIST hypercall differs slightly from the buffer in xen/dom0_ops.h. Attached is a patch against current cvs that works for me, but I'm not familiar with this part of the code so not sure if this is the proper fix.

I'm using xen 3.0.2 that shipped with SLES10.

Regards,
Jim

On Wed, Sep 13, 2006 at 12:55:01PM -0600, Jim Fehlig wrote:
> I found that the buffer provided for XEN_V1_OP_GETDOMAININFOLIST hypercall differs slightly from the buffer in xen/dom0_ops.h.

ohh, that's quite possible I made a mistake in rebuilding the code, yes

> Attached is a patch against current cvs that works for me, but I'm not familiar with this part of the code so not sure if this is the proper fix.

I will try to double check today or tomorrow as I'm on the road, thanks a lot for the patch.

> I'm using xen 3.0.2 that shipped with SLES10.

Thanks !

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Wed, Sep 13, 2006 at 09:13:35PM -0400, Daniel Veillard wrote:
> I will try to double check today or tomorrow as I'm on the road, thanks a lot for the patch.

Okay, I'm finally back, confirmed my error (sorry, I should have tried a 32-bit box too but had none handy with the old code). So applied and committed, thanks a lot!

Daniel

On Tue, Sep 19, 2006 at 12:12:19PM -0400, Daniel Veillard wrote:
> Okay, I'm finally back, confirmed my error (sorry, I should have tried a 32-bit box too but had none handy with the old code). So applied and committed,

Urm, but this has now broken things on 32-bit 3.0.3 based Xen HV.

  # virsh dominfo Domain-0 | grep CPU
  CPU(s):         115

And also

  # virsh vcpuinfo Domain-0
  libvir: Xen error : failed Xen syscall ioctl 3166208

Looks like we need different versions of this struct depending on which Xen we're working against.

Dan.

-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
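
(Editorial sketch.) The thread never says which field changed, so the following is only an illustration of how a struct can agree on a 64-bit build yet differ on a 32-bit one. The struct and field names (`info_a`, `info_b`, `frame`, `nr_vcpus`) are hypothetical and are not the real Xen definitions. The assumption shown is a field declared as `unsigned long` in one version and `uint64_t` in the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration, NOT the real Xen structs. */
struct info_a {
    uint16_t domain;
    uint32_t flags;
    unsigned long frame;   /* 4 bytes on typical 32-bit ABIs, 8 on LP64 */
    uint32_t nr_vcpus;
};

struct info_b {
    uint16_t domain;
    uint32_t flags;
    uint64_t frame;        /* 8 bytes everywhere */
    uint32_t nr_vcpus;
};

/* The leading fields sit at the same offsets in both layouts. */
static void check_common_prefix(void)
{
    assert(offsetof(struct info_a, domain) == offsetof(struct info_b, domain));
    assert(offsetof(struct info_a, flags) == offsetof(struct info_b, flags));
}

/* How far nr_vcpus moves between the two layouts, in bytes. */
static size_t vcpus_offset_shift(void)
{
    return offsetof(struct info_b, nr_vcpus) - offsetof(struct info_a, nr_vcpus);
}
```

Where `unsigned long` is 8 bytes the shift is zero and either declaration works. Where it is 4 bytes, every field after `frame` is read from the wrong place, which would fit a CPU count that comes back as nonsense.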

On Thu, Sep 28, 2006 at 07:20:35PM +0100, Daniel P. Berrange wrote:
> Urm, but this has now broken things on 32-bit 3.0.3 based Xen HV.
>
>   # virsh dominfo Domain-0 | grep CPU
>   CPU(s):         115
>
> And also
>
>   # virsh vcpuinfo Domain-0
>   libvir: Xen error : failed Xen syscall ioctl 3166208
>
> Looks like we need different versions of this struct depending on which Xen we're working against.

This is really quite a nasty problem, because the struct is passed in from numerous locations in the xen_internal.h code & I didn't want to cover the entire source with conditionals.

So what I've done is declared a new xen_v2_domaininfo struct which is the same as xen_v0_domaininfo, but with Jim's patch reverted again. Then provide two unions

  union xen_getdomaininfo {
      struct xen_v0_getdomaininfo v0;
      struct xen_v2_getdomaininfo v2;
  };
  typedef union xen_getdomaininfo xen_getdomaininfo;

  union xen_getdomaininfolist {
      struct xen_v0_getdomaininfo *v0;
      struct xen_v2_getdomaininfo *v2;
  };
  typedef union xen_getdomaininfolist xen_getdomaininfolist;

The caller must populate & read either v0 or v2 as appropriate - to avoid ugly

  if (hypervisor_version < 2)
      ...v0...
  else
      ...v2...

I define a bunch of macros for accessing fields in these two unions, eg

  #define XEN_GETDOMAININFOLIST_DOMAIN(domlist, n)    \
      (hypervisor_version < 2 ?                       \
       domlist.v0[n].domain :                         \
       domlist.v2[n].domain)

Or

  #define XEN_GETDOMAININFOLIST_CLEAR(domlist, size)                   \
      (hypervisor_version < 2 ?                                        \
       memset(domlist.v0, 0, sizeof(xen_v0_getdomaininfo) * size) :    \
       memset(domlist.v2, 0, sizeof(xen_v2_getdomaininfo) * size))

Anyway, I'm attaching a patch which I've tested against a 32-bit HV on both Xen 3.0.2 and 3.0.3, and also a 64-bit HV on 3.0.2 and 3.0.3, and all the operations now work correctly again...

Regards,
Dan.
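
(Editorial sketch.) To make the union-plus-macro approach concrete, here is a minimal self-contained version of the idea. It is not the attached patch. The struct bodies are stand-ins: only the `domain` field and the two macro names come from the message above, while the `filler` arrays and the helpers `domlist_alloc` and `demo_sum_ids` are hypothetical. `hypervisor_version` is assumed to be a global set once after probing the hypervisor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins: only `domain` comes from the thread; the filler
 * arrays are placeholders that make the two layouts differ in size. */
typedef struct xen_v0_getdomaininfo {
    uint16_t domain;
    uint32_t filler[4];
} xen_v0_getdomaininfo;

typedef struct xen_v2_getdomaininfo {
    uint16_t domain;
    uint32_t filler[6];
} xen_v2_getdomaininfo;

typedef union xen_getdomaininfolist {
    xen_v0_getdomaininfo *v0;
    xen_v2_getdomaininfo *v2;
} xen_getdomaininfolist;

/* Assumed to be set once after probing the hypervisor. */
static int hypervisor_version = 2;

#define XEN_GETDOMAININFOLIST_DOMAIN(domlist, n) \
    (hypervisor_version < 2 ? \
     (domlist).v0[n].domain : \
     (domlist).v2[n].domain)

#define XEN_GETDOMAININFOLIST_CLEAR(domlist, size) \
    (hypervisor_version < 2 ? \
     memset((domlist).v0, 0, sizeof(xen_v0_getdomaininfo) * (size)) : \
     memset((domlist).v2, 0, sizeof(xen_v2_getdomaininfo) * (size)))

/* Hypothetical helper: allocate n records of the active layout. */
static int domlist_alloc(xen_getdomaininfolist *dl, size_t n)
{
    if (hypervisor_version < 2) {
        dl->v0 = malloc(n * sizeof(xen_v0_getdomaininfo));
        return dl->v0 != NULL;
    }
    dl->v2 = malloc(n * sizeof(xen_v2_getdomaininfo));
    return dl->v2 != NULL;
}

/* Hypothetical demo: fill n records with ids 0..n-1 under the given
 * version, read them back through the macro, and return their sum. */
static unsigned demo_sum_ids(int version, size_t n)
{
    xen_getdomaininfolist dl;
    unsigned sum = 0;
    size_t i;

    assert(n > 0);
    hypervisor_version = version;
    if (!domlist_alloc(&dl, n))
        return 0;
    (void)XEN_GETDOMAININFOLIST_CLEAR(dl, n);

    /* Stand-in for the hypercall filling the buffer. The accessor macro
     * yields a value, not an lvalue, so writes need an explicit branch. */
    for (i = 0; i < n; i++) {
        if (hypervisor_version < 2)
            dl.v0[i].domain = (uint16_t)i;
        else
            dl.v2[i].domain = (uint16_t)i;
    }
    for (i = 0; i < n; i++)
        sum += (unsigned)XEN_GETDOMAININFOLIST_DOMAIN(dl, i);

    free(hypervisor_version < 2 ? (void *)dl.v0 : (void *)dl.v2);
    return sum;
}
```

Callers only ever name the macro, so the version check lives in one place per field. The trade-off is that each accessor evaluates `hypervisor_version` at run time and cannot be used as an assignment target.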

On Thu, Sep 28, 2006 at 09:43:19PM +0100, Daniel P. Berrange wrote:
> This is really quite a nasty problem, because the struct is passed in from numerous locations in the xen_internal.h code & I didn't want to cover the entire source with conditionals.
>
> So what I've done is declared a new xen_v2_domaininfo struct which is the same as xen_v0_domaininfo, but with Jim's patch reverted again. Then provide two unions

Okay, I really didn't expect them to break that data structure too, sigh... Yeah, go ahead, that seems the least ugly option possible! I wonder why I didn't catch that, maybe my 3.0.3 test was done on an x86_64, thanks!

Daniel

On Thu, Sep 28, 2006 at 05:22:08PM -0400, Daniel Veillard wrote:
> Okay, I really didn't expect them to break that data structure too, sigh... Yeah, go ahead, that seems the least ugly option possible!

Ok, I committed this to CVS now.

> I wonder why I didn't catch that, maybe my 3.0.3 test was done on an x86_64,

Luck. On one of my 32-bit 3.0.3 boxes the 0.1.6 appeared to work fine for the basic operations - it was just giving bogus data (eg, 153 cpus). Depending on how many domains were running & what their data was, things either silently worked (but with bogus data), or completely crashed & burned.

Anyway, I've now tested this on both versions of HV, and run it through valgrind too as an extra sanity check. Let's just hope Xen doesn't break ABI yet again in the future....

Regards,
Dan
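
(Editorial sketch.) The "silently worked but bogus data" behaviour can be reproduced in miniature. The layouts below are hypothetical and fixed-width so the result is the same on any platform; they are not the Xen structs. A producer writes records in a newer layout and a consumer reads the same bytes with an older, smaller one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts for illustration only, not the real Xen structs.
 * The "new" layout gained one field in the middle. */
struct rec_old { uint32_t domain; uint32_t nr_vcpus; };
struct rec_new { uint32_t domain; uint32_t extra; uint32_t nr_vcpus; };

/* Copy record n out of buf, interpreting buf as an array of rec_old. */
static struct rec_old read_as_old(const unsigned char *buf, size_t n)
{
    struct rec_old r;
    memcpy(&r, buf + n * sizeof r, sizeof r);
    return r;
}

/* Fill buf with two rec_new records, as a newer producer would.
 * buf must hold at least 2 * sizeof(struct rec_new) bytes. */
static void fill_as_new(unsigned char *buf)
{
    struct rec_new recs[2] = {
        { .domain = 0, .extra = 0xABCD, .nr_vcpus = 4 },
        { .domain = 1, .extra = 0xABCD, .nr_vcpus = 2 },
    };
    memcpy(buf, recs, sizeof recs);
}
```

Read through the old layout, record 0 still reports domain 0 but its `nr_vcpus` is really the `extra` field, and record 1 starts in the middle of record 0. Nothing crashes here because every read stays inside the buffer, which matches the observation that the outcome depended on how many domains there were and what their data happened to be.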

On Thu, Sep 28, 2006 at 11:37:42PM +0100, Daniel P. Berrange wrote:
> Luck. On one of my 32-bit 3.0.3 boxes the 0.1.6 appeared to work fine for the basic operations - it was just giving bogus data (eg, 153 cpus). Depending on how many domains were running & what their data was, things either silently worked (but with bogus data), or completely crashed & burned.
>
> Anyway, I've now tested this on both versions of HV, and run it through valgrind too as an extra sanity check. Let's just hope Xen doesn't break ABI yet again in the future....

Updating for this ABI breakage is beginning to feel like a never-ending problem :-( It seems I missed one change in my last patch - mlock()'ing the wrong bit of data for getdomainlist. By (bad) luck it just happened to work anyway in my tests because I guess there was sufficient allocated memory that mlock() was still in range.

Running unprivileged via the proxy though, the proxy will always request info on 1020 domains (even if only a couple are running), which meant we mlock() several tens of KB of data. This ran into an area of unallocated memory & caused mlock() to fail, hence my discovering the problem.

Anyway, attached is the patch I've applied to CVS...

Regards,
Dan.
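
(Editorial sketch.) The mlock() problem comes down to locking a range that does not match the buffer actually allocated. The sketch below is not the applied patch. It assumes placeholder record layouts (`rec_v0`, `rec_v2`) and hypothetical helpers (`reclist_bytes`, `with_locked_reclist`), and shows the length being computed once from the active layout and reused for allocation, mlock() and munlock().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Placeholder layouts (hypothetical sizes, not the real Xen structs). */
typedef struct { uint16_t domain; uint32_t filler[4]; } rec_v0;
typedef struct { uint16_t domain; uint32_t filler[6]; } rec_v2;

/* Bytes occupied by n records of the layout selected by version. */
static size_t reclist_bytes(int version, size_t n)
{
    return n * (version < 2 ? sizeof(rec_v0) : sizeof(rec_v2));
}

/* Allocate, lock, (pretend to) use, unlock and free a record list.
 * Returns 0 on success, -1 if allocation or locking failed. */
static int with_locked_reclist(int version, size_t n)
{
    size_t len;
    void *buf;
    int rc;

    assert(n > 0);
    len = reclist_bytes(version, n);
    buf = calloc(1, len);
    if (buf == NULL)
        return -1;

    /* Lock exactly the bytes that were allocated, no more. */
    if (mlock(buf, len) < 0) {
        free(buf);
        return -1;
    }

    /* ... the hypercall would fill buf here ... */

    rc = munlock(buf, len);   /* same pointer and length as mlock() */
    free(buf);
    return rc < 0 ? -1 : 0;
}
```

With these placeholder sizes, 1020 records come to roughly 20 to 28 KB, the same order as the "several tens of KB" mentioned above. mlock() can also fail for unrelated reasons such as the RLIMIT_MEMLOCK limit on an unprivileged process, so a failure alone does not prove the range was wrong.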

On Mon, Oct 02, 2006 at 07:31:41PM +0100, Daniel P. Berrange wrote:
> Updating for this ABI breakage is beginning to feel like a never ending problem :-( It seems I missed one change in my last patch - mlock()'ing the wrong bit of data for getdomainlist. By (bad) luck it just happened to work anyway on my tests because I guess there was sufficient allocated memory that mlock() was still in range. Running unprivileged via the proxy though, the proxy will always request info on 1020 domains (even if only a couple are running), which meant we mlock() several 10's of KB of data. This ran in an area on unallocated mem & caused mlock() to fail, hence my discovering the problem.

urgh !

> Anyway, attached is the patch I've applied to CVS...

Thanks for spotting this! I fixed the munlock -> mlock transition :-) and updated the error messages to report the new sizes,

Daniel
participants (3): Daniel P. Berrange, Daniel Veillard, Jim Fehlig