[libvirt] libvirt/libxl implementation of get_online_cpu / virNodeGetCPUMap?

Hi,
A recent OpenStack nova commit makes use of virNodeGetCPUMap to get the list of online CPUs of a host, but this API is not implemented by the libvirt Xen driver.
The commit: Add handling for offlined CPUs to the nova libvirt driver. https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=0696...
Is there a need to use this under Xen? (Is it possible to have offline CPUs?) Which libxl API provides this information, if one exists?
I found libxl_get_online_cpus(), but that is not enough. They want a bitmap.
Thanks,
-- Anthony PERARD

On Tue, Feb 24, 2015 at 12:41:01PM +0000, Anthony PERARD wrote:
Hi,
A recent OpenStack nova commit makes use of virNodeGetCPUMap to get the list of online CPUs of a host, but this API is not implemented by the libvirt Xen driver.
The commit: Add handling for offlined CPUs to the nova libvirt driver. https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=0696...
FWIW, this should not impact Xen based on my understanding. The code path in question should only be used when Nova is set up to do NUMA pinning, and that is not supported with Xen in OpenStack, only with KVM. Did it actually cause failures for you, or are you simply keeping track of all used APIs in Nova as a sanity check?
Is there a need to use this under Xen? (Is it possible to have offline CPUs?) Which libxl API provides this information, if one exists?
I found libxl_get_online_cpus(), but that is not enough. They want a bitmap.
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Feb 24, 2015 at 12:46:44PM +0000, Daniel P. Berrange wrote:
FWIW, this should not impact Xen based on my understanding. The code path in question should only be used when Nova is set up to do NUMA pinning, and that is not supported with Xen in OpenStack, only with KVM. Did it actually cause failures for you, or are you simply keeping track of all used APIs in Nova as a sanity check?
It prevents nova from starting. I do the setup with DevStack.
The error: libvirtError: this function is not supported by the connection driver: virNodeGetCPUMap
And part of the traceback:
  File "/opt/stack/nova/nova/openstack/common/service.py", line 491, in run_service
    service.start()
  File "/opt/stack/nova/nova/service.py", line 181, in start
    self.manager.pre_start_hook()
  File "/opt/stack/nova/nova/compute/manager.py", line 1188, in pre_start_hook
    self.update_available_resource(nova.context.get_admin_context())
  File "/opt/stack/nova/nova/compute/manager.py", line 6062, in update_available_resource
    rt.update_available_resource(context)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 315, in update_available_resource
    resources = self.driver.get_available_resource(self.nodename)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4896, in get_available_resource
    numa_topology = self._get_host_numa_topology()
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4749, in _get_host_numa_topology
    online_cpus = self._host.get_online_cpus()
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 599, in get_online_cpus
    (cpus, cpu_map, online) = self.get_connection().getCPUMap()
I'll look into why nova is going through NUMA code paths then.
Thanks for the information,
-- Anthony PERARD
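The last frame of the traceback unpacks the (cpus, cpu_map, online) tuple returned by getCPUMap(). A minimal sketch of turning that per-CPU map into a set of online CPU indices, modelled on the traceback rather than copied from Nova; FakeConnection and FakeLibvirtError are hypothetical stand-ins so it runs without libvirt, and it assumes cpu_map can be indexed by CPU number.

```python
class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


class FakeConnection:
    """Hypothetical connection whose getCPUMap() mimics the tuple in the traceback."""

    def __init__(self, cpu_map=None):
        # cpu_map=None models a driver (libxl, in this thread) without the API.
        self._cpu_map = cpu_map

    def getCPUMap(self):
        if self._cpu_map is None:
            raise FakeLibvirtError(
                "this function is not supported by the connection driver: "
                "virNodeGetCPUMap")
        online = sum(1 for is_up in self._cpu_map if is_up)
        return len(self._cpu_map), list(self._cpu_map), online


def get_online_cpus(conn):
    """Return the set of CPU indices flagged online in the host CPU map."""
    cpus, cpu_map, _online = conn.getCPUMap()
    return {cpu for cpu in range(cpus) if cpu_map[cpu]}


# CPU 2 offline: the map says *which* CPUs are usable, not just how many.
print(sorted(get_online_cpus(FakeConnection([True, True, False, True]))))
# prints [0, 1, 3]
```

Calling get_online_cpus(FakeConnection()) raises FakeLibvirtError, which corresponds to the failure reported above: nothing on this path checks whether the driver implements the API before calling it.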

On Tue, Feb 24, 2015 at 01:15:57PM +0000, Anthony PERARD wrote:
It prevents nova from starting. I do the setup with DevStack.
The error: libvirtError: this function is not supported by the connection driver: virNodeGetCPUMap
I'll look into why nova is going through NUMA code paths then.
Oh damn, yes, I understand why now. Please file a bug against Nova for this, as we must fix it as a high priority. It was certainly not my intention to break Xen when I approved this change.
Regards, Daniel

On Tue, Feb 24, 2015 at 01:22:19PM +0000, Daniel P. Berrange wrote:
Oh damn, yes, I understand why now. Please file a bug against Nova for this, as we must fix it as a high priority. It was certainly not my intention to break Xen when I approved this change.
Here is the bug report: https://bugs.launchpad.net/nova/+bug/1425115 Regards, -- Anthony PERARD

On Tue, Feb 24, 2015 at 03:00:16PM +0000, Anthony PERARD wrote:
Here is the bug report: https://bugs.launchpad.net/nova/+bug/1425115
If you are able to test a fix easily then try the patch here: https://review.openstack.org/#/c/159106/ If you're able to add a comment to the review indicating that you've confirmed it fixes Xen, that'd be useful, since there's no automated testing of Xen that reports on reviews. Regards, Daniel
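One way a caller could tolerate drivers that lack virNodeGetCPUMap is to treat the "not supported" error code as "no CPU map available" while re-raising every other error. This is a hypothetical sketch, not necessarily what the review above does; FakeLibvirtError only imitates libvirt.libvirtError.get_error_code(), and NO_SUPPORT is an assumed stand-in for libvirt.VIR_ERR_NO_SUPPORT.

```python
NO_SUPPORT = 3  # assumed to mirror libvirt.VIR_ERR_NO_SUPPORT


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError carrying an error code."""

    def __init__(self, message, code):
        super().__init__(message)
        self._code = code

    def get_error_code(self):
        return self._code


class UnsupportedConnection:
    """Hypothetical connection to a driver without virNodeGetCPUMap."""

    def getCPUMap(self):
        raise FakeLibvirtError(
            "this function is not supported by the connection driver: "
            "virNodeGetCPUMap", NO_SUPPORT)


class WorkingConnection:
    """Hypothetical connection reporting CPUs 0, 1 and 3 online out of 4."""

    def getCPUMap(self):
        return 4, [True, True, False, True], 3


def get_online_cpus_or_none(conn):
    """Return online CPU indices, or None if the driver lacks the API."""
    try:
        cpus, cpu_map, _online = conn.getCPUMap()
    except FakeLibvirtError as exc:
        if exc.get_error_code() == NO_SUPPORT:
            return None  # caller can skip NUMA topology reporting
        raise  # any other libvirt failure is still a real error
    return {cpu for cpu in range(cpus) if cpu_map[cpu]}
```

Catching the error on the client side still lets the call reach libvirtd, which plausibly explains the daemon log entry Manish Jaggi reports later in the thread; not making the call at all for drivers that cannot do NUMA pinning would avoid that log entry too.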

On Tuesday 24 February 2015 08:30 PM, Anthony PERARD wrote:
On Tue, Feb 24, 2015 at 01:22:19PM +0000, Daniel P. Berrange wrote:
Oh damn, yes, I understand why now. Please file a bug against Nova for this, as we must fix it as a high priority. It was certainly not my intention to break Xen when I approved this change.
Here is the bug report: https://bugs.launchpad.net/nova/+bug/1425115
I applied the patch. I am no longer getting this Python libvirtError, but the libvirt daemon is throwing an error:
2015-03-24 08:46:31.169+0000: 1030: error : virNodeGetCPUMap:1342 : this function is not supported by the connection driver: virNodeGetCPUMap
Regards,

On Tue, 2015-02-24 at 12:41 +0000, Anthony PERARD wrote:
Is there a need to use this under Xen? (Is it possible to have offline CPUs?)
Yes, I think so. No idea how you use it though.
Which libxl API provides this information, if one exists?
I found libxl_get_online_cpus(), but that is not enough. They want a bitmap.
I think that is all that currently exists, at least at the libxl level; you may need to add a new interface. It'd be worth looking into the various host numa interfaces -- perhaps one of them indirectly exposes what you want? Ian.

On Tue, 2015-02-24 at 13:10 +0000, Ian Campbell wrote:
On Tue, 2015-02-24 at 12:41 +0000, Anthony PERARD wrote:
Which libxl API provides this information, if one exists?
I found libxl_get_online_cpus(), but that is not enough. They want a bitmap.
I think that is all that currently exists, at least at the libxl level; you may need to add a new interface.
It'd be worth looking into the various host numa interfaces -- perhaps one of them indirectly exposes what you want?
Given Daniel's latest emails, I'm not sure this is useful, but libxl_get_cpu_topology() should put LIBXL_CPUTOPOLOGY_INVALID_ENTRY in all the fields of the i-th element of the array it returns if the i-th pCPU is offline (see the implementation of XEN_SYSCTL_topologyinfo in xen/common/sysctl.c).
So, scanning that array and constructing the bitmap according to whether or not we find that marker on the various elements would be the way to go, I would say.
I've actually never tested this, i.e., I've never tried offlining a pCPU on the host. I'll give it a go as soon as I find 5 minutes, and let you know if it works.
Regards, Dario
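Dario's suggestion can be sketched in Python by modelling the array returned by libxl_get_cpu_topology(): an element whose fields all hold the INVALID marker is treated as an offline pCPU, and the result is packed into a getCPUMap()-style tuple. CpuTopology and INVALID_ENTRY are hypothetical stand-ins, assuming each element carries core, socket and node fields and that the marker is ~(uint32_t)0; this is not libxl code and, like the proposal itself, it has not been checked against a host with offlined pCPUs.

```python
from dataclasses import dataclass

# Assumed value of LIBXL_CPUTOPOLOGY_INVALID_ENTRY, i.e. ~(uint32_t)0.
INVALID_ENTRY = 0xFFFFFFFF


@dataclass
class CpuTopology:
    """Hypothetical model of one element of the libxl topology array."""
    core: int
    socket: int
    node: int


def is_offline(entry):
    """An element counts as offline only if every field holds the marker."""
    return (entry.core == INVALID_ENTRY
            and entry.socket == INVALID_ENTRY
            and entry.node == INVALID_ENTRY)


def topology_to_cpu_map(topology):
    """Build a getCPUMap()-style (cpus, cpu_map, online) tuple."""
    cpu_map = [not is_offline(entry) for entry in topology]
    return len(cpu_map), cpu_map, sum(cpu_map)


topology = [
    CpuTopology(core=0, socket=0, node=0),
    CpuTopology(core=INVALID_ENTRY, socket=INVALID_ENTRY, node=INVALID_ENTRY),
    CpuTopology(core=1, socket=0, node=0),
]
print(topology_to_cpu_map(topology))  # (3, [True, False, True], 2)
```

Requiring all fields to carry the marker only narrows the risk Ian raises in his reply; if INVALID can appear for other reasons, this inference would still misreport that pCPU.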

On Wed, 2015-02-25 at 10:24 +0100, Dario Faggioli wrote:
Given Daniel's latest emails, I'm not sure this is useful, but libxl_get_cpu_topology() should put LIBXL_CPUTOPOLOGY_INVALID_ENTRY in all the fields of the i-th element of the array it returns if the i-th pCPU is offline (see the implementation of XEN_SYSCTL_topologyinfo in xen/common/sysctl.c).
So, scanning that array and constructing the bitmap according to whether or not we find that marker on the various elements would be the way to go, I would say.
It could work, yes, although if there were other reasons for an INVALID entry it would fall down. Thinking about it, it might be a better idea long term to expose some specific interfaces for managing or interrogating host CPU status rather than inferring it through other means; it's not as if the hypercall would be very hard to set up and plumb through. But as you say, Daniel's response might have made this all moot anyway, or at least deferrable. Ian.

On Wed, Feb 25, 2015 at 10:24:37AM +0100, Dario Faggioli wrote:
Given Daniel's latest emails, I'm not sure this is useful, but libxl_get_cpu_topology() should put LIBXL_CPUTOPOLOGY_INVALID_ENTRY in all the fields of the i-th element of the array it returns if the i-th pCPU is offline (see the implementation of XEN_SYSCTL_topologyinfo in xen/common/sysctl.c).
So, scanning that array and constructing the bitmap according to whether or not we find that marker on the various elements would be the way to go, I would say.
I've actually never tested this, i.e., I've never tried offlining a pCPU on the host. I'll give it a go as soon as I find 5 minutes, and let you know if it works.
FWIW, this code in openstack was only added for the benefit of the s390 architecture, where it is apparently common to have hosts with CPUs offlined. Presumably you have to pay IBM for each extra CPU you turn online :) Regards, Daniel

On Wed, 2015-02-25 at 15:03 +0000, Daniel P. Berrange wrote:
FWIW, this code in openstack was only added for the benefit of the s390 architecture, where it is apparently common to have hosts with CPUs offlined. Presumably you have to pay IBM for each extra CPU you turn online :)
Presumably :-) OOI, why does the code care which CPUs are online rather than just the total number (IOW why a bitmap)? Ian.

On Wed, Feb 25, 2015 at 03:13:36PM +0000, Ian Campbell wrote:
Presumably :-)
OOI, why does the code care which CPUs are online rather than just the total number (IOW why a bitmap)?
When doing strict CPU pinning, the openstack scheduler needs to have the list of all pCPUs available in the host. It then tries to place guests on pCPUs such that the guest does not span across NUMA nodes. To do this it needs to know which particular pCPUs in the host are available. So we need the full bitmap rather than just a total count. Regards, Daniel
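Daniel's explanation can be illustrated with a toy placement check that keeps a guest's pinned vCPUs within a single NUMA node, using only online pCPUs. pick_pcpus() and the node layout are hypothetical and far simpler than Nova's scheduler; the sketch only shows why a count of online CPUs is not enough.

```python
def pick_pcpus(online, numa_nodes, vcpus):
    """Pin `vcpus` vCPUs to online pCPUs of a single NUMA node.

    online:     set of online pCPU indices (derived from the CPU bitmap)
    numa_nodes: mapping of node id -> set of pCPU indices in that node
    Returns (node_id, [pcpus]) for the first node that fits, else None.
    """
    for node_id in sorted(numa_nodes):
        usable = sorted(numa_nodes[node_id] & online)
        if len(usable) >= vcpus:
            return node_id, usable[:vcpus]
    return None


nodes = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
online = {0, 1, 3, 4, 5, 6, 7}  # pCPU 2 is offline

print(pick_pcpus(online, nodes, 3))  # (0, [0, 1, 3])
print(pick_pcpus(online, nodes, 4))  # (1, [4, 5, 6, 7])
print(pick_pcpus(online, nodes, 5))  # None
```

Knowing only that 7 of 8 pCPUs are online, a placement routine could not tell that node 0 has just three usable pCPUs, so it might try to pin a 4-vCPU guest there.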

On Wed, 2015-02-25 at 15:20 +0000, Daniel P. Berrange wrote:
When doing strict CPU pinning, the openstack scheduler needs to have the list of all pCPUs available in the host. It then tries to place guests on pCPUs such that the guest does not span across NUMA nodes. To do this it needs to know which particular pCPUs in the host are available. So we need the full bitmap rather than just a total count.
Makes perfect sense, thanks. Ian.
participants (5)
- Anthony PERARD
- Daniel P. Berrange
- Dario Faggioli
- Ian Campbell
- Manish Jaggi