[libvirt] [RFC PATCH 0/4] powerpc : Libvirt for the PowerPC platform

Recent development in KVM for 64-bit Power ISA Book3S machines, allows users to run multiple KVM guest instances on POWER7 and PPC970 processor based systems. Also qemu-system-ppc64 has been enhanced to support a new machine type "pseries" suitable for Power Book3S machines. This addition effectively brings the KVM+qemu combination to run multiple guest instances on a Power Book3S machine. Libvirt continues to be the key interface to configure and manage the KVM guest instances on x86. We would like to enhance libvirt to support KVM guest configuration and management on Power Book3S machines. The following set of patches tries to illustrate the porting issues in libvirt since it has been predominantly designed and used for x86 KVM. I would like to discuss these issues and list out the possibilities of refactoring the code and make it easier for adding support for non-x86 architectures. Some of the key changes that would be needed in libvirt to manage guests on qemu-system-ppc64 'pseries' platform would be as follows: 1) A new driver would be needed for PowerPC CPU, to identify and filter supported PowerPC-CPU families. 2) A new set of interfaces would be needed to extract host system and firmware information (SMBIOS and host-OS information like CPU and memory topology, processor type, hardware threads, etc). 3) Clean abstraction of platform-specific references such as ACPI dependencies from generic domain initialization sequence. (Many such options get automatically selected by libvirt, unless they are explicitly flagged as unrequired in XML. This default list will differ for every architecture). 4) A mechanism to specify the list of allowable devices for every architecture -- the 'pseries' vm would boot with its own set of devices, some of which may not be available with x86. In order to address the above requirements, we could take one of the following implementation schemes: Approach 1: ----------- Create a new host backend for powerpc-kvm -- similar to xen, kvm, esx, etc. Advantage : Even if the qemu device model on ppc varies significantly, the difference between the device model between qemu-system-ppc64 and qemu-system-x86_64 can be easily managed. It could possibly allow easier ways of segregating supported devices, and also specifying a new set of methods to gather host system information. Drawback: - Overhead of maintaining a new backend, which is expected to have some similarities with the x86-specific 'KVM' backend. - Might get confusing for end users. Approach 2: ----------- Hack the present 'kvm' backend to add powerpc-specific features. Advantage: Having a seamless 'kvm' interface across architectures would be of more convenience to the end-user -- a single XML spec could work with only a small subset of arch-specific changes. Also, newer features that come in for one arch would be more easily ported to others. However, it would entail more run-time switches based on the host kvm architecture. One way to do this would be to add a new arch-specific layer within the 'kvm' backend. This would be compiled-in depending on the host architecture, and would expose only those features (system information, devices, features etc) which are implemented in kvm _on_that_platform_. Drawback: This will cause some rewriting of how internal qemu/kvm interfaces interact in libvirt. The attached patch series (albeit hackish in nature) attempts to illustrate the difference in the architecture and call out parts of the libvirt code that would need changes. -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

Libvirt presently depends on /proc/cpuinfo to gather information about the x86 host. Parsing of /proc/cpuinfo is arch-specific; the information fields are also not consistent. A cleaner way would be to use Sysfs. Both x86 and PowerPC specific information can be obtained from sysfs with different arch specific parsing routines. --- src/nodeinfo.c | 106 +++++++++++++++++++------------------------------------- 1 files changed, 36 insertions(+), 70 deletions(-) diff --git a/src/nodeinfo.c b/src/nodeinfo.c index 6448b79..cdd339d 100644 --- a/src/nodeinfo.c +++ b/src/nodeinfo.c @@ -30,6 +30,7 @@ #include <errno.h> #include <dirent.h> #include <sys/utsname.h> +#include <sched.h> #if HAVE_NUMACTL # define NUMA_VERSION1_COMPATIBILITY 1 @@ -191,6 +192,11 @@ static int parse_socket(unsigned int cpu) return ret; } +static int parse_core(unsigned int cpu) +{ + return get_cpu_value(cpu, "topology/core_id", false); +} + int linuxNodeInfoCPUPopulate(FILE *cpuinfo, virNodeInfoPtr nodeinfo, bool need_hyperthreads) @@ -199,15 +205,14 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, DIR *cpudir = NULL; struct dirent *cpudirent = NULL; unsigned int cpu; - unsigned long cur_threads; - int socket; - unsigned long long socket_mask = 0; - unsigned int remaining; + unsigned long core, socket, cur_threads; + cpu_set_t core_mask; + cpu_set_t socket_mask; int online; nodeinfo->cpus = 0; nodeinfo->mhz = 0; - nodeinfo->cores = 1; + nodeinfo->cores = 0; nodeinfo->nodes = 1; # if HAVE_NUMACTL @@ -221,20 +226,10 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, /* NOTE: hyperthreads are ignored here; they are parsed out of /sys */ while (fgets(line, sizeof(line), cpuinfo) != NULL) { char *buf = line; - if (STRPREFIX(buf, "processor")) { /* aka a single logical CPU */ - buf += 9; - while (*buf && c_isspace(*buf)) - buf++; - if (*buf != ':') { - nodeReportError(VIR_ERR_INTERNAL_ERROR, - "%s", _("parsing cpuinfo processor")); - return -1; - } - nodeinfo->cpus++; # if defined(__x86_64__) || \ defined(__amd64__) || \ defined(__i386__) - } else if (STRPREFIX(buf, "cpu MHz")) { + if (STRPREFIX(buf, "cpu MHz")) { char *p; unsigned int ui; buf += 9; @@ -249,24 +244,9 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, /* Accept trailing fractional part. */ && (*p == '\0' || *p == '.' || c_isspace(*p))) nodeinfo->mhz = ui; - } else if (STRPREFIX(buf, "cpu cores")) { /* aka cores */ - char *p; - unsigned int id; - buf += 9; - while (*buf && c_isspace(*buf)) - buf++; - if (*buf != ':' || !buf[1]) { - nodeReportError(VIR_ERR_INTERNAL_ERROR, - _("parsing cpuinfo cpu cores %c"), *buf); - return -1; - } - if (virStrToLong_ui(buf+1, &p, 10, &id) == 0 - && (*p == '\0' || c_isspace(*p)) - && id > nodeinfo->cores) - nodeinfo->cores = id; # elif defined(__powerpc__) || \ defined(__powerpc64__) - } else if (STRPREFIX(buf, "clock")) { + if (STRPREFIX(buf, "clock")) { char *p; unsigned int ui; buf += 5; @@ -281,53 +261,30 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, /* Accept trailing fractional part. */ && (*p == '\0' || *p == '.' || c_isspace(*p))) nodeinfo->mhz = ui; -# elif defined(__s390__) || \ - defined(__s390x__) - } else if (STRPREFIX(buf, "# processors")) { - char *p; - unsigned int ui; - buf += 12; - while (*buf && c_isspace(*buf)) - buf++; - if (*buf != ':' || !buf[1]) { - nodeReportError(VIR_ERR_INTERNAL_ERROR, - _("parsing number of processors %c"), *buf); - return -1; - } - if (virStrToLong_ui(buf+1, &p, 10, &ui) == 0 - && (*p == '\0' || c_isspace(*p))) - nodeinfo->cpus = ui; /* No other interesting infos are available in /proc/cpuinfo. * However, there is a line identifying processor's version, * identification and machine, but we don't want it to be caught * and parsed in next iteration, because it is not in expected * format and thus lead to error. */ - break; # else # warning Parser for /proc/cpuinfo needs to be adapted for your architecture # endif } } - if (!nodeinfo->cpus) { - nodeReportError(VIR_ERR_INTERNAL_ERROR, - "%s", _("no cpus found")); - return -1; - } - - if (!need_hyperthreads) - return 0; - - /* OK, we've parsed what we can out of /proc/cpuinfo. Get the socket - * and thread information from /sys + /* OK, we've parsed clock speed out of /proc/cpuinfo. Get the core, socket + * thread and topology information from /sys */ - remaining = nodeinfo->cpus; cpudir = opendir(CPU_SYS_PATH); if (cpudir == NULL) { virReportSystemError(errno, _("cannot opendir %s"), CPU_SYS_PATH); return -1; } - while ((errno = 0), remaining && (cpudirent = readdir(cpudir))) { + + CPU_ZERO(&core_mask); + CPU_ZERO(&socket_mask); + + while (cpudirent = readdir(cpudir)) { if (sscanf(cpudirent->d_name, "cpu%u", &cpu) != 1) continue; @@ -338,15 +295,19 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, } if (!online) continue; - remaining--; + nodeinfo->cpus++; - socket = parse_socket(cpu); - if (socket < 0) { - closedir(cpudir); - return -1; + /* Parse core */ + core = parse_core(cpu); + if (!CPU_ISSET(core, &core_mask)) { + CPU_SET(core, &core_mask); + nodeinfo->cores++; } - if (!(socket_mask & (1 << socket))) { - socket_mask |= (1 << socket); + + /* Parse socket */ + socket = parse_socket(cpu); + if (!CPU_ISSET(socket, &socket_mask)) { + CPU_SET(socket, &socket_mask); nodeinfo->sockets++; } @@ -367,7 +328,12 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, closedir(cpudir); - /* there should always be at least one socket and one thread */ + /* there should always be at least one cpu, socket and one thread */ + if (nodeinfo->cpus == 0) { + nodeReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("no CPUs found")); + return -1; + } if (nodeinfo->sockets == 0) { nodeReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("no sockets found")); -- 1.7.0.4 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

On Sat, Oct 08, 2011 at 12:07:08AM +0530, Prerna Saxena wrote:
Libvirt presently depends on /proc/cpuinfo to gather information about the x86 host. Parsing of /proc/cpuinfo is arch-specific; the information fields are also not consistent. A cleaner way would be to use Sysfs. Both x86 and PowerPC specific information can be obtained from sysfs with different arch specific parsing routines.
--- src/nodeinfo.c | 106 +++++++++++++++++++------------------------------------- 1 files changed, 36 insertions(+), 70 deletions(-)
diff --git a/src/nodeinfo.c b/src/nodeinfo.c index 6448b79..cdd339d 100644 --- a/src/nodeinfo.c +++ b/src/nodeinfo.c @@ -30,6 +30,7 @@ #include <errno.h> #include <dirent.h> #include <sys/utsname.h> +#include <sched.h>
#if HAVE_NUMACTL # define NUMA_VERSION1_COMPATIBILITY 1 @@ -191,6 +192,11 @@ static int parse_socket(unsigned int cpu) return ret; }
+static int parse_core(unsigned int cpu) +{ + return get_cpu_value(cpu, "topology/core_id", false); +} + int linuxNodeInfoCPUPopulate(FILE *cpuinfo, virNodeInfoPtr nodeinfo, bool need_hyperthreads) @@ -199,15 +205,14 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, DIR *cpudir = NULL; struct dirent *cpudirent = NULL; unsigned int cpu; - unsigned long cur_threads; - int socket; - unsigned long long socket_mask = 0; - unsigned int remaining; + unsigned long core, socket, cur_threads; + cpu_set_t core_mask; + cpu_set_t socket_mask; int online;
nodeinfo->cpus = 0; nodeinfo->mhz = 0; - nodeinfo->cores = 1; + nodeinfo->cores = 0;
nodeinfo->nodes = 1; # if HAVE_NUMACTL @@ -221,20 +226,10 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, /* NOTE: hyperthreads are ignored here; they are parsed out of /sys */ while (fgets(line, sizeof(line), cpuinfo) != NULL) { char *buf = line; - if (STRPREFIX(buf, "processor")) { /* aka a single logical CPU */ - buf += 9; - while (*buf && c_isspace(*buf)) - buf++; - if (*buf != ':') { - nodeReportError(VIR_ERR_INTERNAL_ERROR, - "%s", _("parsing cpuinfo processor")); - return -1; - } - nodeinfo->cpus++; # if defined(__x86_64__) || \ defined(__amd64__) || \ defined(__i386__) - } else if (STRPREFIX(buf, "cpu MHz")) { + if (STRPREFIX(buf, "cpu MHz")) { char *p; unsigned int ui; buf += 9; @@ -249,24 +244,9 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, /* Accept trailing fractional part. */ && (*p == '\0' || *p == '.' || c_isspace(*p))) nodeinfo->mhz = ui; - } else if (STRPREFIX(buf, "cpu cores")) { /* aka cores */ - char *p; - unsigned int id; - buf += 9; - while (*buf && c_isspace(*buf)) - buf++; - if (*buf != ':' || !buf[1]) { - nodeReportError(VIR_ERR_INTERNAL_ERROR, - _("parsing cpuinfo cpu cores %c"), *buf); - return -1; - } - if (virStrToLong_ui(buf+1, &p, 10, &id) == 0 - && (*p == '\0' || c_isspace(*p)) - && id > nodeinfo->cores) - nodeinfo->cores = id; # elif defined(__powerpc__) || \ defined(__powerpc64__) - } else if (STRPREFIX(buf, "clock")) { + if (STRPREFIX(buf, "clock")) { char *p; unsigned int ui; buf += 5; @@ -281,53 +261,30 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, /* Accept trailing fractional part. */ && (*p == '\0' || *p == '.' || c_isspace(*p))) nodeinfo->mhz = ui; -# elif defined(__s390__) || \ - defined(__s390x__) - } else if (STRPREFIX(buf, "# processors")) { - char *p; - unsigned int ui; - buf += 12; - while (*buf && c_isspace(*buf)) - buf++; - if (*buf != ':' || !buf[1]) { - nodeReportError(VIR_ERR_INTERNAL_ERROR, - _("parsing number of processors %c"), *buf); - return -1; - } - if (virStrToLong_ui(buf+1, &p, 10, &ui) == 0 - && (*p == '\0' || c_isspace(*p))) - nodeinfo->cpus = ui; /* No other interesting infos are available in /proc/cpuinfo. * However, there is a line identifying processor's version, * identification and machine, but we don't want it to be caught * and parsed in next iteration, because it is not in expected * format and thus lead to error. */ - break; # else # warning Parser for /proc/cpuinfo needs to be adapted for your architecture # endif } }
- if (!nodeinfo->cpus) { - nodeReportError(VIR_ERR_INTERNAL_ERROR, - "%s", _("no cpus found")); - return -1; - } - - if (!need_hyperthreads) - return 0; - - /* OK, we've parsed what we can out of /proc/cpuinfo. Get the socket - * and thread information from /sys + /* OK, we've parsed clock speed out of /proc/cpuinfo. Get the core, socket + * thread and topology information from /sys */ - remaining = nodeinfo->cpus; cpudir = opendir(CPU_SYS_PATH); if (cpudir == NULL) { virReportSystemError(errno, _("cannot opendir %s"), CPU_SYS_PATH); return -1; } - while ((errno = 0), remaining && (cpudirent = readdir(cpudir))) { + + CPU_ZERO(&core_mask); + CPU_ZERO(&socket_mask); + + while (cpudirent = readdir(cpudir)) { if (sscanf(cpudirent->d_name, "cpu%u", &cpu) != 1) continue;
@@ -338,15 +295,19 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo, } if (!online) continue; - remaining--; + nodeinfo->cpus++;
- socket = parse_socket(cpu); - if (socket < 0) { - closedir(cpudir); - return -1; + /* Parse core */ + core = parse_core(cpu); + if (!CPU_ISSET(core, &core_mask)) { + CPU_SET(core, &core_mask); + nodeinfo->cores++; } - if (!(socket_mask & (1 << socket))) { - socket_mask |= (1 << socket); + + /* Parse socket */ + socket = parse_socket(cpu); + if (!CPU_ISSET(socket, &socket_mask)) { + CPU_SET(socket, &socket_mask); nodeinfo->sockets++; }
@@ -367,7 +328,12 @@ int linuxNodeInfoCPUPopulate(FILE *cpuinfo,
closedir(cpudir);
- /* there should always be at least one socket and one thread */ + /* there should always be at least one cpu, socket and one thread */ + if (nodeinfo->cpus == 0) { + nodeReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("no CPUs found")); + return -1; + } if (nodeinfo->sockets == 0) { nodeReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("no sockets found"));
I checked the output of 'virsh nodeinfo' before and after this patch on RHEL5 and RHEL-6 hosts, and it did not change. When building, however, you get a compiele warning about the unused parameter 'need_hyperthreads'. How come this was needed before, but is not used anymore ? If it is redundant, you should remove the parameter entirely. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, 10 Oct 2011 10:55:46 +0100 "Daniel P. Berrange" <berrange@redhat.com> wrote:
On Sat, Oct 08, 2011 at 12:07:08AM +0530, Prerna Saxena wrote:
Libvirt presently depends on /proc/cpuinfo to gather information about the x86 host. Parsing of /proc/cpuinfo is arch-specific; the information fields are also not consistent. A cleaner way would be to use Sysfs. Both x86 and PowerPC specific information can be obtained from sysfs with different arch specific parsing routines.
--- src/nodeinfo.c | 106 +++++++++++++++++++------------------------------------- 1 files changed, 36 insertions(+), 70 deletions(-)
diff --git a/src/nodeinfo.c b/src/nodeinfo.c index 6448b79..cdd339d 100644 ...[snip]...
I checked the output of 'virsh nodeinfo' before and after this patch on RHEL5 and RHEL-6 hosts, and it did not change.
This validates the patch :) On an x86 system, virsh nodeinfo will give the same output; just that it will now depend on sysfs for learning about host topology, in place of /proc/cpuinfo. On a KVM-ppc64 machine, virsh nodeinfo will now work without having to write a new parser for parsing /proc/cpuinfo for PowerPC.
When building, however, you get a compiele warning about the unused parameter 'need_hyperthreads'. How come this was needed before, but is not used anymore ? If it is redundant, you should remove the parameter entirely.
Agree, I'll do that in the next series of patches which I send. Thanks, -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

This part of code primarily compares host and guest CPUs of a given architecture for feature compatibility. x86 makes this choice based on CPUID comparison. Presently the PowerPC code has stubs to just get a 'pseries' guest to boot. It would be augmented going forward, to do a detailed feature comparison between guest and host CPUs on powerpc. This part of code is presently well-classified into different architectures, and consequently does not need reorganizing. --- src/Makefile.am | 3 +- src/cpu/cpu.c | 2 + src/cpu/cpu_powerpc.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++ src/cpu/cpu_powerpc.h | 32 +++++++++++++++++++ 4 files changed, 117 insertions(+), 1 deletions(-) create mode 100644 src/cpu/cpu_powerpc.c create mode 100644 src/cpu/cpu_powerpc.h diff --git a/src/Makefile.am b/src/Makefile.am index 738ee91..4f990fd 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -511,7 +511,8 @@ CPU_SOURCES = \ cpu/cpu.h cpu/cpu.c \ cpu/cpu_generic.h cpu/cpu_generic.c \ cpu/cpu_x86.h cpu/cpu_x86.c cpu/cpu_x86_data.h \ - cpu/cpu_map.h cpu/cpu_map.c + cpu/cpu_map.h cpu/cpu_map.c cpu/cpu_powerpc.h \ + cpu/cpu_powerpc.c VMX_SOURCES = \ vmx/vmx.c vmx/vmx.h diff --git a/src/cpu/cpu.c b/src/cpu/cpu.c index 2906be9..e8a65f0 100644 --- a/src/cpu/cpu.c +++ b/src/cpu/cpu.c @@ -28,6 +28,7 @@ #include "xml.h" #include "cpu.h" #include "cpu_x86.h" +#include "cpu_powerpc.h" #include "cpu_generic.h" @@ -36,6 +37,7 @@ static struct cpuArchDriver *drivers[] = { &cpuDriverX86, + &cpuDriverPowerPC, /* generic driver must always be the last one */ &cpuDriverGeneric }; diff --git a/src/cpu/cpu_powerpc.c b/src/cpu/cpu_powerpc.c new file mode 100644 index 0000000..6ceedc3 --- /dev/null +++ b/src/cpu/cpu_powerpc.c @@ -0,0 +1,81 @@ +/* + * cpu_powerpc.h: CPU driver for PowerPC CPUs + * + * Copyright (C) Copyright (C) IBM Corporation, 2010 + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Authors: + * Anton Blanchard <anton@au.ibm.com> + * Prerna Saxena <prerna@linux.vnet.ibm.com> + */ + +#include <config.h> + +#include "memory.h" +#include "cpu.h" + + +#define VIR_FROM_THIS VIR_FROM_CPU + +static const char *archs[] = { "ppc64" }; + +static union cpuData * +PowerPCNodeData(void) +{ + union cpuData *data; + + if (VIR_ALLOC(data) < 0) { + virReportOOMError(); + return NULL; + } + + return data; +} + + +static int +PowerPCDecode(virCPUDefPtr cpu, + const union cpuData *data, + const char **models, + unsigned int nmodels, + const char *preferred) +{ + return 0; +} + +static int +PowerPCDataFree(union cpuData *data) +{ + if (data == NULL) + return 0; + + VIR_FREE(data); +} + +struct cpuArchDriver cpuDriverPowerPC = { + .name = "ppc64", + .arch = archs, + .narch = ARRAY_CARDINALITY(archs), + .compare = NULL, + .decode = PowerPCDecode, + .encode = NULL, + .free = PowerPCDataFree, + .nodeData = PowerPCNodeData, + .guestData = NULL, + .baseline = NULL, + .update = NULL, + .hasFeature = NULL, +}; diff --git a/src/cpu/cpu_powerpc.h b/src/cpu/cpu_powerpc.h new file mode 100644 index 0000000..2e0c1a5 --- /dev/null +++ b/src/cpu/cpu_powerpc.h @@ -0,0 +1,32 @@ +/* + * cpu_powerpc.h: CPU driver for PowerPC CPUs + * + * Copyright (C) Copyright (C) IBM Corporation, 2010 + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Authors: + * Anton Blanchard <anton@au.ibm.com> + * Prerna Saxena <prerna@linux.vnet.ibm.com> + */ + +#ifndef __VIR_CPU_POWERPC_H__ +# define __VIR_CPU_POWERPC_H__ + +# include "cpu.h" + +extern struct cpuArchDriver cpuDriverPowerPC; + +#endif /* __VIR_CPU_POWERPC_H__ */ -- 1.7.0.4 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

On Sat, Oct 08, 2011 at 12:12:46AM +0530, Prerna Saxena wrote:
This part of code primarily compares host and guest CPUs of a given architecture for feature compatibility. x86 makes this choice based on CPUID comparison. Presently the PowerPC code has stubs to just get a 'pseries' guest to boot. It would be augmented going forward, to do a detailed feature comparison between guest and host CPUs on powerpc. This part of code is presently well-classified into different architectures, and consequently does not need reorganizing.
--- src/Makefile.am | 3 +- src/cpu/cpu.c | 2 + src/cpu/cpu_powerpc.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++ src/cpu/cpu_powerpc.h | 32 +++++++++++++++++++ 4 files changed, 117 insertions(+), 1 deletions(-) create mode 100644 src/cpu/cpu_powerpc.c create mode 100644 src/cpu/cpu_powerpc.h
The idea here looks fine.
diff --git a/src/cpu/cpu_powerpc.c b/src/cpu/cpu_powerpc.c new file mode 100644 index 0000000..6ceedc3 --- /dev/null +++ b/src/cpu/cpu_powerpc.c + +#define VIR_FROM_THIS VIR_FROM_CPU + +static const char *archs[] = { "ppc64" };
How about 'ppc' too ?
+static union cpuData * +PowerPCNodeData(void) +{ + union cpuData *data; + + if (VIR_ALLOC(data) < 0) { + virReportOOMError(); + return NULL; + } + + return data; +} + + +static int +PowerPCDecode(virCPUDefPtr cpu, + const union cpuData *data, + const char **models, + unsigned int nmodels, + const char *preferred)
Need to annotate these with 'ATTRIBUTE_UNUSED' to avoid compiler warnings.
+{ + return 0; +} + +static int
Should be 'void'
+PowerPCDataFree(union cpuData *data) +{ + if (data == NULL) + return 0; + + VIR_FREE(data); +} + +struct cpuArchDriver cpuDriverPowerPC = { + .name = "ppc64", + .arch = archs, + .narch = ARRAY_CARDINALITY(archs), + .compare = NULL, + .decode = PowerPCDecode, + .encode = NULL, + .free = PowerPCDataFree, + .nodeData = PowerPCNodeData, + .guestData = NULL, + .baseline = NULL, + .update = NULL, + .hasFeature = NULL, +};
Should we have another copy for 'ppc' arch too ? Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, 10 Oct 2011 10:57:34 +0100 "Daniel P. Berrange" <berrange@redhat.com> wrote:
On Sat, Oct 08, 2011 at 12:12:46AM +0530, Prerna Saxena wrote:
This part of code primarily compares host and guest CPUs of a given architecture for feature compatibility. x86 makes this choice based on CPUID comparison. Presently the PowerPC code has stubs to just get a 'pseries' guest to boot. It would be augmented going forward, to do a detailed feature comparison between guest and host CPUs on powerpc. This part of code is presently well-classified into different architectures, and consequently does not need reorganizing.
--- src/Makefile.am | 3 +- src/cpu/cpu.c | 2 + src/cpu/cpu_powerpc.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++ src/cpu/cpu_powerpc.h | 32 +++++++++++++++++++ 4 files changed, 117 insertions(+), 1 deletions(-) create mode 100644 src/cpu/cpu_powerpc.c create mode 100644 src/cpu/cpu_powerpc.h
The idea here looks fine.
Thanks.
diff --git a/src/cpu/cpu_powerpc.c b/src/cpu/cpu_powerpc.c new file mode 100644 index 0000000..6ceedc3 --- /dev/null +++ b/src/cpu/cpu_powerpc.c + +#define VIR_FROM_THIS VIR_FROM_CPU + +static const char *archs[] = { "ppc64" };
How about 'ppc' too ?
+static union cpuData * +PowerPCNodeData(void) +{ + union cpuData *data; + + if (VIR_ALLOC(data) < 0) { + virReportOOMError(); + return NULL; + } + + return data; +} + + +static int +PowerPCDecode(virCPUDefPtr cpu, + const union cpuData *data, + const char **models, + unsigned int nmodels, + const char *preferred)
Need to annotate these with 'ATTRIBUTE_UNUSED' to avoid compiler warnings.
Will do.
+{ + return 0; +} + +static int
Should be 'void'
Thanks for pointing this out, will fix it.
+PowerPCDataFree(union cpuData *data) +{ + if (data == NULL) + return 0; + + VIR_FREE(data); +} + +struct cpuArchDriver cpuDriverPowerPC = { + .name = "ppc64", + .arch = archs, + .narch = ARRAY_CARDINALITY(archs), + .compare = NULL, + .decode = PowerPCDecode, + .encode = NULL, + .free = PowerPCDataFree, + .nodeData = PowerPCNodeData, + .guestData = NULL, + .baseline = NULL, + .update = NULL, + .hasFeature = NULL, +};
Should we have another copy for 'ppc' arch too ?
At present, I've tested this with KVM for Power ISA Book3S machines, which introduces a new machine type "pseries" based on the ppc64 architecture. Extending this to 'ppc' architecture should be a trivial effort -- I'll add that as soon as I get to test it. -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

This patch has a small parser which correctly interprets qemu-system-ppc64 output for a specific query. This code is independent for powerpc and largely does not interfere with x86 implementation. --- src/qemu/qemu_capabilities.c | 64 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 64 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 850d46e..5bdb304 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -181,6 +181,7 @@ static const struct qemu_arch_info const arch_info_hvm[] = { { "mipsel", 32, NULL, "qemu-system-mipsel", NULL, NULL, 0 }, { "sparc", 32, NULL, "qemu-system-sparc", NULL, NULL, 0 }, { "ppc", 32, NULL, "qemu-system-ppc", NULL, NULL, 0 }, + { "ppc64", 64, NULL, "qemu-system-ppc64", NULL, NULL, 0 }, { "itanium", 64, NULL, "qemu-system-ia64", NULL, NULL, 0 }, { "s390x", 64, NULL, "qemu-system-s390x", NULL, NULL, 0 }, }; @@ -473,6 +474,67 @@ error: return -1; } +/* ppc64 parser. + * Format : PowerPC <machine> <description> + */ +static int +qemuCapsParsePPCModels(const char *output, + unsigned int *retcount, + const char ***retcpus) +{ + const char *p = output; + const char *next; + unsigned int count = 0; + const char **cpus = NULL; + int i; + do { + const char *t; + + if ((next = strchr(p, '\n'))) + next++; + + if (!STRPREFIX(p, "PowerPC ")) + continue; + + /* Skip the preceding sub-string "PowerPC " */ + p += 8; + + /*Malformed string, does not obey the format 'PowerPC <model> <desc>'*/ + if (!(t = strchr(p, ' ')) || (next && t >= next)) + continue; + + if (*p == '\0' || *p == '\n') + continue; + + if (retcpus) { + unsigned int len; + + if (VIR_REALLOC_N(cpus, count + 1) < 0) + goto error; + + if (t) + len = t - p - 1; + + if (!(cpus[count] = strndup(p, len))) + goto error; + } + count++; + } while ((p = next)); + + if (retcount) + *retcount = count; + if (retcpus) + *retcpus = cpus; + return 0; + +error: + if (cpus) { + for (i = 0; i < count; i++) + VIR_FREE(cpus[i]); + } + VIR_FREE(cpus); + return -1; +} int qemuCapsProbeCPUModels(const char *qemu, @@ -493,6 +555,8 @@ qemuCapsProbeCPUModels(const char *qemu, if (STREQ(arch, "i686") || STREQ(arch, "x86_64")) parse = qemuCapsParseX86Models; + else if (STREQ(arch, "ppc64")) + parse = qemuCapsParsePPCModels; else { VIR_DEBUG("don't know how to parse %s CPU models", arch); return 0; -- 1.7.0.4 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

On Sat, Oct 08, 2011 at 12:17:18AM +0530, Prerna Saxena wrote:
This patch has a small parser which correctly interprets qemu-system-ppc64 output for a specific query. This code is independent for powerpc and largely does not interfere with x86 implementation.
--- src/qemu/qemu_capabilities.c | 64 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 64 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 850d46e..5bdb304 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -181,6 +181,7 @@ static const struct qemu_arch_info const arch_info_hvm[] = { { "mipsel", 32, NULL, "qemu-system-mipsel", NULL, NULL, 0 }, { "sparc", 32, NULL, "qemu-system-sparc", NULL, NULL, 0 }, { "ppc", 32, NULL, "qemu-system-ppc", NULL, NULL, 0 }, + { "ppc64", 64, NULL, "qemu-system-ppc64", NULL, NULL, 0 }, { "itanium", 64, NULL, "qemu-system-ia64", NULL, NULL, 0 }, { "s390x", 64, NULL, "qemu-system-s390x", NULL, NULL, 0 }, }; @@ -473,6 +474,67 @@ error: return -1; }
+/* ppc64 parser. + * Format : PowerPC <machine> <description> + */ +static int +qemuCapsParsePPCModels(const char *output, + unsigned int *retcount, + const char ***retcpus) +{ + const char *p = output; + const char *next; + unsigned int count = 0; + const char **cpus = NULL; + int i; + do { + const char *t; + + if ((next = strchr(p, '\n'))) + next++; + + if (!STRPREFIX(p, "PowerPC ")) + continue; + + /* Skip the preceding sub-string "PowerPC " */ + p += 8; + + /*Malformed string, does not obey the format 'PowerPC <model> <desc>'*/ + if (!(t = strchr(p, ' ')) || (next && t >= next)) + continue; + + if (*p == '\0' || *p == '\n') + continue; + + if (retcpus) { + unsigned int len; + + if (VIR_REALLOC_N(cpus, count + 1) < 0) + goto error; + + if (t) + len = t - p - 1; + + if (!(cpus[count] = strndup(p, len))) + goto error; + } + count++; + } while ((p = next)); + + if (retcount) + *retcount = count; + if (retcpus) + *retcpus = cpus; + return 0; + +error: + if (cpus) { + for (i = 0; i < count; i++) + VIR_FREE(cpus[i]); + } + VIR_FREE(cpus); + return -1; +}
int qemuCapsProbeCPUModels(const char *qemu, @@ -493,6 +555,8 @@ qemuCapsProbeCPUModels(const char *qemu,
if (STREQ(arch, "i686") || STREQ(arch, "x86_64")) parse = qemuCapsParseX86Models; + else if (STREQ(arch, "ppc64")) + parse = qemuCapsParsePPCModels; else { VIR_DEBUG("don't know how to parse %s CPU models", arch); return 0;
Same question about whether we should include 'ppc' in the arch test here too ? ACK to the rest of the code anyway. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, 10 Oct 2011 10:58:42 +0100 "Daniel P. Berrange" <berrange@redhat.com> wrote:
On Sat, Oct 08, 2011 at 12:17:18AM +0530, Prerna Saxena wrote:
This patch has a small parser which correctly interprets qemu-system-ppc64 output for a specific query. This code is independent for powerpc and largely does not interfere with x86 implementation.
--- src/qemu/qemu_capabilities.c | 64 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 64 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 850d46e..5bdb304 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c ...[snip]...
Same question about whether we should include 'ppc' in the arch test here too ?
As explained earlier, it should be a trivial effort.
ACK to the rest of the code anyway.
Thanks. -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

This patch is a hack at the moment and would need code refactoring to split-out the defaults for x86 and powerpc. Libvirt chooses a set of default options such as disk controller, network specific options, etc which are suitable for a x86 host. These defaults are arch specific and hence libvirt needs a runtime switch to setup defaults based on host architecture. Libvirt should have this routine split per-arch at the right level with minimal code duplication. --- src/qemu/qemu_command.c | 30 +++++++++++++++++++++++++++++- 1 files changed, 29 insertions(+), 1 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 0adc56a..3040f6a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1385,6 +1385,8 @@ qemuBuildDeviceAddressStr(virBufferPtr buf, * When QEMU grows support for > 1 PCI domain, then pci.0 change * to pciNN.0 where NN is the domain number */ +/* Prerna hack : remove PCI reference in command line */ +#if 0 if (qemuCapsGet(qemuCaps, QEMU_CAPS_PCI_MULTIBUS)) virBufferAsprintf(buf, ",bus=pci.0"); else @@ -1394,6 +1396,7 @@ qemuBuildDeviceAddressStr(virBufferPtr buf, info->addr.pci.slot, info->addr.pci.function); else virBufferAsprintf(buf, ",addr=0x%x", info->addr.pci.slot); +#endif } else if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_USB) { virBufferAsprintf(buf, ",bus="); qemuUsbId(buf, info->addr.usb.bus); @@ -1565,14 +1568,22 @@ qemuBuildDriveStr(virDomainDiskDefPtr disk, virBufferAsprintf(&opt, "file=%s,", disk->src); } } + +/* Prerna hack : force 'if=scsi' for powerKVM cmd line */ +#if 0 if (qemuCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) virBufferAddLit(&opt, "if=none"); else - virBufferAsprintf(&opt, "if=%s", bus); +#endif + /* force 'if=scsi' for qemu command line output */ + virBufferAsprintf(&opt, "if=%s", bus); + if (disk->device == VIR_DOMAIN_DISK_DEVICE_CDROM) virBufferAddLit(&opt, ",media=cdrom"); +/* Prerna Hack : remove 'id=drive-scsi0-0-0' */ +#if 0 if (qemuCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { virBufferAsprintf(&opt, ",id=%s%s", QEMU_DRIVE_HOST_PREFIX, disk->info.alias); } else { @@ -1586,6 +1597,8 @@ qemuBuildDriveStr(virDomainDiskDefPtr disk, virBufferAsprintf(&opt, ",unit=%d", unitid); } } +#endif + if (bootable && qemuCapsGet(qemuCaps, QEMU_CAPS_DRIVE_BOOT) && disk->device == VIR_DOMAIN_DISK_DEVICE_DISK && @@ -3380,6 +3393,8 @@ qemuBuildCommandLine(virConnectPtr conn, if (!def->graphics) virCommandAddArg(cmd, "-nographic"); +/* Prerna hack : remove unnecesary options */ +#if 0 if (qemuCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { if (qemuCapsGet(qemuCaps, QEMU_CAPS_NODEFCONFIG)) virCommandAddArg(cmd, @@ -3387,6 +3402,7 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, "-nodefaults"); /* Disable default guest devices */ } +#endif /* Serial graphics adapter */ if (def->os.bios.useserial == VIR_DOMAIN_BIOS_USESERIAL_YES) { @@ -3692,6 +3708,8 @@ qemuBuildCommandLine(virConnectPtr conn, } usblegacy = true; } else { +/* Prerna hack : remove invocation for -drive lsi */ +#if 0 virCommandAddArg(cmd, "-device"); char *devstr; @@ -3701,6 +3719,7 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, devstr); VIR_FREE(devstr); +#endif } } } @@ -3833,6 +3852,8 @@ qemuBuildCommandLine(virConnectPtr conn, bootindex); } } else { +/* Prerna hack: Remove -device string */ +#if 0 virCommandAddArg(cmd, "-device"); if (!(optstr = qemuBuildDriveDevStr(disk, bootindex, @@ -3840,6 +3861,7 @@ qemuBuildCommandLine(virConnectPtr conn, goto error; virCommandAddArg(cmd, optstr); VIR_FREE(optstr); +#endif } } } @@ -4250,9 +4272,12 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, devstr); VIR_FREE(devstr); +/* Prerna hack : remove 'isa-serial' device */ +#if 0 virCommandAddArg(cmd, "-device"); virCommandAddArgFormat(cmd, "isa-serial,chardev=char%s,id=%s", serial->info.alias, serial->info.alias); +#endif } else { virCommandAddArg(cmd, "-serial"); if (!(devstr = qemuBuildChrArgStr(&serial->source, NULL))) @@ -5089,6 +5114,8 @@ qemuBuildCommandLine(virConnectPtr conn, * NB: Earlier we declared that VirtIO balloon will always be in * slot 0x3 on bus 0x0 */ +/* Prerna Hack : Remove all virtio-balloon devices */ +#if 0 if ((def->memballoon) && (def->memballoon->model != VIR_DOMAIN_MEMBALLOON_MODEL_NONE)) { if (def->memballoon->model != VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO) { @@ -5110,6 +5137,7 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArgList(cmd, "-balloon", "virtio", NULL); } } +#endif if (snapshot) virCommandAddArgList(cmd, "-loadvm", snapshot->def->name, NULL); -- 1.7.0.4 -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

On Sat, Oct 08, 2011 at 12:19:11AM +0530, Prerna Saxena wrote:
This patch is a hack at the moment and would need code refactoring to split-out the defaults for x86 and powerpc. Libvirt chooses a set of default options such as disk controller, network specific options, etc which are suitable for a x86 host. These defaults are arch specific and hence libvirt needs a runtime switch to setup defaults based on host architecture. Libvirt should have this routine split per-arch at the right level with minimal code duplication.
--- src/qemu/qemu_command.c | 30 +++++++++++++++++++++++++++++- 1 files changed, 29 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 0adc56a..3040f6a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1385,6 +1385,8 @@ qemuBuildDeviceAddressStr(virBufferPtr buf, * When QEMU grows support for > 1 PCI domain, then pci.0 change * to pciNN.0 where NN is the domain number */ +/* Prerna hack : remove PCI reference in command line */ +#if 0 if (qemuCapsGet(qemuCaps, QEMU_CAPS_PCI_MULTIBUS)) virBufferAsprintf(buf, ",bus=pci.0"); else @@ -1394,6 +1396,7 @@ qemuBuildDeviceAddressStr(virBufferPtr buf, info->addr.pci.slot, info->addr.pci.function); else virBufferAsprintf(buf, ",addr=0x%x", info->addr.pci.slot); +#endif } else if (info->type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_USB) { virBufferAsprintf(buf, ",bus="); qemuUsbId(buf, info->addr.usb.bus); @@ -1565,14 +1568,22 @@ qemuBuildDriveStr(virDomainDiskDefPtr disk, virBufferAsprintf(&opt, "file=%s,", disk->src); } } + +/* Prerna hack : force 'if=scsi' for powerKVM cmd line */ +#if 0 if (qemuCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) virBufferAddLit(&opt, "if=none"); else - virBufferAsprintf(&opt, "if=%s", bus); +#endif + /* force 'if=scsi' for qemu command line output */ + virBufferAsprintf(&opt, "if=%s", bus); +
if (disk->device == VIR_DOMAIN_DISK_DEVICE_CDROM) virBufferAddLit(&opt, ",media=cdrom");
+/* Prerna Hack : remove 'id=drive-scsi0-0-0' */ +#if 0 if (qemuCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { virBufferAsprintf(&opt, ",id=%s%s", QEMU_DRIVE_HOST_PREFIX, disk->info.alias); } else { @@ -1586,6 +1597,8 @@ qemuBuildDriveStr(virDomainDiskDefPtr disk, virBufferAsprintf(&opt, ",unit=%d", unitid); } } +#endif + if (bootable && qemuCapsGet(qemuCaps, QEMU_CAPS_DRIVE_BOOT) && disk->device == VIR_DOMAIN_DISK_DEVICE_DISK && @@ -3380,6 +3393,8 @@ qemuBuildCommandLine(virConnectPtr conn, if (!def->graphics) virCommandAddArg(cmd, "-nographic");
+/* Prerna hack : remove unnecesary options */ +#if 0 if (qemuCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { if (qemuCapsGet(qemuCaps, QEMU_CAPS_NODEFCONFIG)) virCommandAddArg(cmd, @@ -3387,6 +3402,7 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, "-nodefaults"); /* Disable default guest devices */ } +#endif
/* Serial graphics adapter */ if (def->os.bios.useserial == VIR_DOMAIN_BIOS_USESERIAL_YES) { @@ -3692,6 +3708,8 @@ qemuBuildCommandLine(virConnectPtr conn, } usblegacy = true; } else { +/* Prerna hack : remove invocation for -drive lsi */ +#if 0 virCommandAddArg(cmd, "-device");
char *devstr; @@ -3701,6 +3719,7 @@ qemuBuildCommandLine(virConnectPtr conn,
virCommandAddArg(cmd, devstr); VIR_FREE(devstr); +#endif } } } @@ -3833,6 +3852,8 @@ qemuBuildCommandLine(virConnectPtr conn, bootindex); } } else { +/* Prerna hack: Remove -device string */ +#if 0 virCommandAddArg(cmd, "-device");
if (!(optstr = qemuBuildDriveDevStr(disk, bootindex, @@ -3840,6 +3861,7 @@ qemuBuildCommandLine(virConnectPtr conn, goto error; virCommandAddArg(cmd, optstr); VIR_FREE(optstr); +#endif } } } @@ -4250,9 +4272,12 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, devstr); VIR_FREE(devstr);
+/* Prerna hack : remove 'isa-serial' device */ +#if 0 virCommandAddArg(cmd, "-device"); virCommandAddArgFormat(cmd, "isa-serial,chardev=char%s,id=%s", serial->info.alias, serial->info.alias); +#endif } else { virCommandAddArg(cmd, "-serial"); if (!(devstr = qemuBuildChrArgStr(&serial->source, NULL))) @@ -5089,6 +5114,8 @@ qemuBuildCommandLine(virConnectPtr conn, * NB: Earlier we declared that VirtIO balloon will always be in * slot 0x3 on bus 0x0 */ +/* Prerna Hack : Remove all virtio-balloon devices */ +#if 0 if ((def->memballoon) && (def->memballoon->model != VIR_DOMAIN_MEMBALLOON_MODEL_NONE)) { if (def->memballoon->model != VIR_DOMAIN_MEMBALLOON_MODEL_VIRTIO) { @@ -5110,6 +5137,7 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArgList(cmd, "-balloon", "virtio", NULL); } } +#endif
if (snapshot) virCommandAddArgList(cmd, "-loadvm", snapshot->def->name, NULL);
I have a rough idea of why you need most of these changes, but to just clarify things, could you show us the following data from a PPC QEMU guest after you have this patch applied. - XML configuration file - The QEMU ARGV from /var/log/libvirt/qemu/$GUESTNAME.log - The output of $ virsh qemu-monitor-command --hmp $GUESTNAME 'info qtree' $ virsh qemu-monitor-command --hmp $GUESTNAME 'info pci' $ virsh qemu-monitor-command --hmp $GUESTNAME 'info block' - The output of $ qemu-kvm -device '?' Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, 10 Oct 2011 11:02:08 +0100 "Daniel P. Berrange" <berrange@redhat.com> wrote:
On Sat, Oct 08, 2011 at 12:19:11AM +0530, Prerna Saxena wrote:
This patch is a hack at the moment and would need code refactoring to split-out the defaults for x86 and powerpc. Libvirt chooses a set of default options such as disk controller, network specific options, etc which are suitable for a x86 host. These defaults are arch specific and hence libvirt needs a runtime switch to setup defaults based on host architecture. Libvirt should have this routine split per-arch at the right level with minimal code duplication.
--- src/qemu/qemu_command.c | 30 +++++++++++++++++++++++++++++- 1 files changed, 29 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 0adc56a..3040f6a 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c
...[snip]...
I have a rough idea of why you need most of these changes, but to just clarify things, could you show us the following data from a PPC QEMU guest after you have this patch applied.
- XML configuration file
<domain type='kvm'> <name>test-1</name> <memory>1048576</memory> <memoryBacking> <hugepages/> </memoryBacking> <vcpu>2</vcpu> <os> <type arch='ppc64' machine='pseries'>hvm</type> </os> <features> <acpi/> </features> <clock offset='utc'/> <devices> <emulator>qemu-system-ppc64</emulator> <disk type='file' device='disk' > <driver name='qemu' type='raw'/> <source file='/TEST.img'/> <target dev='sda' bus='scsi'/> </disk> <serial type='pty'> <target port='0'/> </serial> <console type='pty'> <target type='serial' port='0'/> </console> <graphics type='vnc' port='-1' autoport='yes' listen='www.xxx.yyy.zzz'/> </devices> </domain>
- The QEMU ARGV from /var/log/libvirt/qemu/$GUESTNAME.log
qemu-system-ppc64 -S -M pseries -enable-kvm -m 4096 -mem-prealloc \ -mem-path /hugetlbfs/libvirt/qemu -smp 2,sockets=2,cores=1,\ threads=1 -name test1 -chardev socket,id=charmonitor,\ path=/var/lib/libvirt/qemu/test1.monitor,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc \ -drive file=/TEST.img,if=scsi,format=raw -chardev pty,id=charserial0\ -usb -vnc www.xxx.yyy.zzz -vga cirrus
- The output of
$ virsh qemu-monitor-command --hmp $GUESTNAME 'info qtree'
virsh # qemu-monitor-command --hmp test-1 'info qtree' bus: main-system-bus type System dev: spapr-virtio-bridge, id "" irq 0 bus: spapr-virtio type spapr-virtio dev: spapr-vio-bridge, id "" irq 0 bus: spapr-vio type spapr-vio dev: spapr-vscsi, id "v-scsi@2000" dev-prop: reg = 8192 bus-prop: irq = 18 bus: v-scsi@2000.0 type SCSI dev: scsi-disk, id "" dev-prop: drive = scsi0-cd2 dev-prop: logical_block_size = 512 dev-prop: physical_block_size = 512 dev-prop: min_io_size = 0 dev-prop: opt_io_size = 0 dev-prop: bootindex = -1 dev-prop: discard_granularity = 0 dev-prop: ver = "0.15.50" dev-prop: serial = <null> dev-prop: removable = off bus-prop: scsi-id = 2 bus-prop: lun = 0 dev: scsi-disk, id "" dev-prop: drive = scsi0-hd0 dev-prop: logical_block_size = 512 dev-prop: physical_block_size = 512 dev-prop: min_io_size = 0 dev-prop: opt_io_size = 0 dev-prop: bootindex = -1 dev-prop: discard_granularity = 0 dev-prop: ver = "0.15.50" dev-prop: serial = <null> dev-prop: removable = off bus-prop: scsi-id = 0 bus-prop: lun = 0 dev: spapr-vlan, id "l-lan@1000" dev-prop: reg = 4096 dev-prop: mac = 52:54:00:12:34:56 dev-prop: vlan = 0 dev-prop: netdev = <null> dev-prop: bootindex = -1 bus-prop: irq = 17 dev: spapr-vty, id "vty@30000000" dev-prop: reg = 805306368 dev-prop: chardev = serial0 bus-prop: irq = 16
$ virsh qemu-monitor-command --hmp $GUESTNAME 'info pci' $ virsh qemu-monitor-command --hmp $GUESTNAME 'info block'
virsh # qemu-monitor-command --hmp test-1 'info pci' virsh # qemu-monitor-command --hmp test-1 'info block' scsi0-hd0: removable=0 file=/home/prerna/RHEL6_install.img ro=0 drv=raw encrypted=0 scsi0-cd2: removable=1 locked=0 tray-open=0 [not inserted] floppy0: removable=1 locked=0 tray-open=0 [not inserted] sd0: removable=1 locked=0 tray-open=0 [not inserted]
- The output of $ qemu-kvm -device '?'
Throws a lot of devices, I'm pasting a few here : name "xilinx,ethlite", bus System name "spapr-vty", bus spapr-vio name "spapr-vscsi", bus spapr-vio name "scsi-cd", bus SCSI, desc "virtual SCSI CD-ROM" name "scsi-hd", bus SCSI, desc "virtual SCSI disk" ... Thanks, -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

On Fri, Oct 07, 2011 at 11:58:16PM +0530, Prerna Saxena wrote:
Libvirt continues to be the key interface to configure and manage the KVM guest instances on x86. We would like to enhance libvirt to support KVM guest configuration and management on Power Book3S machines.
Yep, we'd like to see other architectures more widely used/tested/ supported in libvirt.
Some of the key changes that would be needed in libvirt to manage guests on qemu-system-ppc64 'pseries' platform would be as follows:
1) A new driver would be needed for PowerPC CPU, to identify and filter supported PowerPC-CPU families.
Ok, we were expecting that.
2) A new set of interfaces would be needed to extract host system and firmware information (SMBIOS and host-OS information like CPU and memory topology, processor type, hardware threads, etc).
Yes, we more or less expected that too.
3) Clean abstraction of platform-specific references such as ACPI dependencies from generic domain initialization sequence. (Many such options get automatically selected by libvirt, unless they are explicitly flagged as unrequired in XML. This default list will differ for every architecture).
4) A mechanism to specify the list of allowable devices for every architecture -- the 'pseries' vm would boot with its own set of devices, some of which may not be available with x86.
Yep, these 2 are the bulk of the work, mostly in the qemu_command.c file I expect.
Approach 1: ----------- Create a new host backend for powerpc-kvm -- similar to xen, kvm, esx, etc.
Advantage : Even if the qemu device model on ppc varies significantly, the difference between the device model between qemu-system-ppc64 and qemu-system-x86_64 can be easily managed. It could possibly allow easier ways of segregating supported devices, and also specifying a new set of methods to gather host system information.
Drawback: - Overhead of maintaining a new backend, which is expected to have some similarities with the x86-specific 'KVM' backend. - Might get confusing for end users.
This approach is totally out of the question. There are many 1000's of lines of code in the QEMU driver, of which very little is architecture specific. It just would not be a good use of resources to fork that.
Approach 2: ----------- Hack the present 'kvm' backend to add powerpc-specific features.
Advantage: Having a seamless 'kvm' interface across architectures would be of more convenience to the end-user -- a single XML spec could work with only a small subset of arch-specific changes. Also, newer features that come in for one arch would be more easily ported to others. However, it would entail more run-time switches based on the host kvm architecture.
One way to do this would be to add a new arch-specific layer within the 'kvm' backend. This would be compiled-in depending on the host architecture, and would expose only those features (system information, devices, features etc) which are implemented in kvm _on_that_platform_.
Drawback: This will cause some rewriting of how internal qemu/kvm interfaces interact in libvirt.
I don't think that's a drawback actually. What you're actually doing here is just improving the quality of our existing driver, by removing some bogus assumptions it has in it. So making it work with PPC will benefit us in general. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Hi Daniel, Thanks for taking time to review this. On Mon, 10 Oct 2011 11:09:34 +0100 "Daniel P. Berrange" <berrange@redhat.com> wrote:
On Fri, Oct 07, 2011 at 11:58:16PM +0530, Prerna Saxena wrote:
Libvirt continues to be the key interface to configure and manage the KVM guest instances on x86. We would like to enhance libvirt to support KVM guest configuration and management on Power Book3S machines.
Yep, we'd like to see other architectures more widely used/tested/ supported in libvirt.
Nice to hear this.
Some of the key changes that would be needed in libvirt to manage guests on qemu-system-ppc64 'pseries' platform would be as follows:
1) A new driver would be needed for PowerPC CPU, to identify and filter supported PowerPC-CPU families.
Ok, we were expecting that.
2) A new set of interfaces would be needed to extract host system and firmware information (SMBIOS and host-OS information like CPU and memory topology, processor type, hardware threads, etc).
Yes, we more or less expected that too.
3) Clean abstraction of platform-specific references such as ACPI dependencies from generic domain initialization sequence. (Many such options get automatically selected by libvirt, unless they are explicitly flagged as unrequired in XML. This default list will differ for every architecture).
4) A mechanism to specify the list of allowable devices for every architecture -- the 'pseries' vm would boot with its own set of devices, some of which may not be available with x86.
Yep, these 2 are the bulk of the work, mostly in the qemu_command.c file I expect.
Yes. For starters, I'm seeing if qemu-command.c can be split into : qemu-command-common.c qemu-command-x86.c qemu-command-ppc64.c Depending on the host architecture, qemu-command-common.o could be linked with the respective arch-specific device spec. Makefile would resemble something like this: ... qemu-command-y = qemu-command-common.o qemu-command-y += qemu-command-$(TARGET_BASE_ARCH) ... It is just an initial thought. Suggestions ??
Approach 1: ----------- Create a new host backend for powerpc-kvm -- similar to xen, kvm, esx, etc.
Advantage : Even if the qemu device model on ppc varies significantly, the difference between the device model between qemu-system-ppc64 and qemu-system-x86_64 can be easily managed. It could possibly allow easier ways of segregating supported devices, and also specifying a new set of methods to gather host system information.
Drawback: - Overhead of maintaining a new backend, which is expected to have some similarities with the x86-specific 'KVM' backend. - Might get confusing for end users.
This approach is totally out of the question. There are many 1000's of lines of code in the QEMU driver, of which very little is architecture specific. It just would not be a good use of resources to fork that.
Approach 2: ----------- Hack the present 'kvm' backend to add powerpc-specific features.
Advantage: Having a seamless 'kvm' interface across architectures would be of more convenience to the end-user -- a single XML spec could work with only a small subset of arch-specific changes. Also, newer features that come in for one arch would be more easily ported to others. However, it would entail more run-time switches based on the host kvm architecture.
One way to do this would be to add a new arch-specific layer within the 'kvm' backend. This would be compiled-in depending on the host architecture, and would expose only those features (system information, devices, features etc) which are implemented in kvm _on_that_platform_.
Drawback: This will cause some rewriting of how internal qemu/kvm interfaces interact in libvirt.
I don't think that's a drawback actually. What you're actually doing here is just improving the quality of our existing driver, by removing some bogus assumptions it has in it. So making it work with PPC will benefit us in general.
Agree. Let me work on interfaces that would allow libvirt to choose default guest devices based on host architecture, for a guest running on 'KVM' hypervisor. Will send the patches out soon.
Regards, Daniel
-- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

On Thu, Oct 13, 2011 at 17:48:52 +0530, Prerna Saxena wrote:
Yep, these 2 are the bulk of the work, mostly in the qemu_command.c file I expect.
Yes. For starters, I'm seeing if qemu-command.c can be split into : qemu-command-common.c qemu-command-x86.c qemu-command-ppc64.c
Depending on the host architecture, qemu-command-common.o could be linked with the respective arch-specific device spec. Makefile would resemble something like this: ... qemu-command-y = qemu-command-common.o qemu-command-y += qemu-command-$(TARGET_BASE_ARCH) ...
It is just an initial thought. Suggestions ??
That's not the right approach. The code that needs to be used to generate qemu command line doesn't depend on the architecture on which libvirtd is running but rather on the architecture of the domain you're going to start. There's no reason you should not be allowed to run qemu-system-ppc64 in purely emulating mode on x86_64 host. We should not artificially disable this usage. Jirka

On 10/13/2011 06:53 PM, Jiri Denemark wrote:
On Thu, Oct 13, 2011 at 17:48:52 +0530, Prerna Saxena wrote:
Yep, these 2 are the bulk of the work, mostly in the qemu_command.c file I expect.
Yes. For starters, I'm seeing if qemu-command.c can be split into : qemu-command-common.c qemu-command-x86.c qemu-command-ppc64.c
Depending on the host architecture, qemu-command-common.o could be linked with the respective arch-specific device spec. Makefile would resemble something like this: ... qemu-command-y = qemu-command-common.o qemu-command-y += qemu-command-$(TARGET_BASE_ARCH) ...
It is just an initial thought. Suggestions ??
That's not the right approach. The code that needs to be used to generate qemu command line doesn't depend on the architecture on which libvirtd is running but rather on the architecture of the domain you're going to start. There's no reason you should not be allowed to run qemu-system-ppc64 in purely emulating mode on x86_64 host. We should not artificially disable this usage.
Jirka
Hi Jirka, Thanks for pointing it out-- this aspect had escaped my attention. I'll work on how we can create an additional interface which would let libvirt choose appropriate devices at runtime, depending on guest domain architecture. Thanks, -- Prerna Saxena Linux Technology Centre, IBM Systems and Technology Lab, Bangalore, India

On Fri, Oct 14, 2011 at 01:20:01PM +0530, Prerna Saxena wrote:
On 10/13/2011 06:53 PM, Jiri Denemark wrote:
On Thu, Oct 13, 2011 at 17:48:52 +0530, Prerna Saxena wrote:
Yep, these 2 are the bulk of the work, mostly in the qemu_command.c file I expect.
Yes. For starters, I'm seeing if qemu-command.c can be split into : qemu-command-common.c qemu-command-x86.c qemu-command-ppc64.c
Depending on the host architecture, qemu-command-common.o could be linked with the respective arch-specific device spec. Makefile would resemble something like this: ... qemu-command-y = qemu-command-common.o qemu-command-y += qemu-command-$(TARGET_BASE_ARCH) ...
It is just an initial thought. Suggestions ??
That's not the right approach. The code that needs to be used to generate qemu command line doesn't depend on the architecture on which libvirtd is running but rather on the architecture of the domain you're going to start. There's no reason you should not be allowed to run qemu-system-ppc64 in purely emulating mode on x86_64 host. We should not artificially disable this usage.
Jirka
Hi Jirka, Thanks for pointing it out-- this aspect had escaped my attention. I'll work on how we can create an additional interface which would let libvirt choose appropriate devices at runtime, depending on guest domain architecture.
If you have a virDomainDefPtr object instance, you can look at the field def->os.arch and make decisions from that. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (3)
-
Daniel P. Berrange
-
Jiri Denemark
-
Prerna Saxena