[libvirt] [PATCH 1/1] Remove contiguous CPU indexes assumption

From: Li Zhang <zhlcindy@linux.vnet.ibm.com>

When getting CPU information, the code assumes that CPU indexes are
contiguous. But on the ppc64 platform, CPU indexes are not contiguous,
because SMT needs to be disabled; as a result the CPU information is
wrong on ppc64 and vcpuinfo/vcpupin can't work correctly.

This patch removes that assumption to be compatible with ppc64.

Test: 4 vcpus are assigned to one VM and the vcpuinfo command is executed.
Without the patch: only one vcpu's information is listed.
With the patch: all vcpus' information is listed correctly.

Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
---
 src/qemu/qemu_monitor_json.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 9991a0a..e130f8c 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -1230,13 +1230,6 @@ qemuMonitorJSONExtractCPUInfo(virJSONValuePtr reply,
             goto cleanup;
         }
 
-        if (cpu != i) {
-            virReportError(VIR_ERR_INTERNAL_ERROR,
-                           _("unexpected cpu index %d expecting %d"),
-                           i, cpu);
-            goto cleanup;
-        }
-
        threads[i] = thread;
     }
 
-- 
1.7.10.1
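The removed check assumed that vcpu entry i always reports CPU index i. The following is a hedged sketch, not the actual libvirt C code: a simplified Python model of the extraction loop in qemuMonitorJSONExtractCPUInfo, fed with the ppc64-style index sequence (0, 4, 8, 12) discussed later in the thread, to show why the old check bails out after the first vcpu.

```python
# Simplified model of the vcpu-extraction loop; entry dicts mimic what
# QMP "query-cpus" returns. Thread ids are illustrative values only.

def extract_threads(entries, check_contiguous):
    """Collect the host thread id for each vcpu entry in order."""
    threads = []
    for i, entry in enumerate(entries):
        if check_contiguous and entry["CPU"] != i:
            # This mirrors the check the patch removes: error out as soon
            # as the reported CPU index differs from the loop counter.
            raise ValueError("unexpected cpu index %d expecting %d"
                             % (i, entry["CPU"]))
        threads.append(entry["thread_id"])
    return threads

# ppc64 guest with SMT disabled: CPU indexes stride by 4.
ppc64 = [{"CPU": 0, "thread_id": 25964},
         {"CPU": 4, "thread_id": 25965},
         {"CPU": 8, "thread_id": 25966},
         {"CPU": 12, "thread_id": 25967}]

# With the old check, only the first entry survives before the error.
try:
    extract_threads(ppc64, check_contiguous=True)
except ValueError as e:
    print(e)            # → unexpected cpu index 1 expecting 4

# Without the check (the patched behaviour) all four threads are found.
print(extract_threads(ppc64, check_contiguous=False))
```

With the check in place the loop never reaches entries 1..3, which matches the reported symptom that vcpuinfo lists only one vcpu.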

Hi Eric,

This should belong to bug-fix, could it be pushed to 1.0.3?

Thanks.

On Wed, Feb 27, 2013 at 11:18 AM, Li Zhang <zhlcindy@gmail.com> wrote:
-- 
Best Regards
-Li

On 02/27/2013 05:13 AM, Li Zhang wrote:
Hi Eric,
This should belong to bug-fix, could it be pushed to 1.0.3?
Without any test case under 'make check' that exposes the failure, I'm a bit worried that this might cause regressions on other setups. I'm not even sure I understand the scope of the problem. Is it something specific to running a qemu-system-ppc64 emulator (but reproducible on any host architecture), or is it specific to running on a powerpc host (but happening for any qemu-* target architecture), or is it something else altogether?

We have a test framework in place that allows replaying a QMP JSON response and seeing how qemu_monitor_json.c deals with it; what I'd really like to see is a side-by-side comparison of what the QMP 'query-cpus' command returns for a guest with multiple vcpus on a setup where you are seeing the problem, compared to a setup that does not have the issue. You can get at this with virsh qemu-monitor-command $dom '{"execute":"query-cpus"}', if that helps.

To help you out, here's what I got for a guest using 3 vcpus on my x86_64 host machine with the qemu-kvm binary:

# virsh qemu-monitor-command guest '{"execute":"query-cpus"}'
{"return":[{"current":true,"CPU":0,"pc":1025829,"halted":false,"thread_id":5849},{"current":false,"CPU":1,"pc":1005735,"halted":true,"thread_id":5850},{"current":false,"CPU":2,"pc":1005735,"halted":true,"thread_id":5851}],"id":"libvirt-9"}

That is, your patch might be right, but I have not yet been convinced that it is right; and while things may currently be broken on ppc, it is not a recent breakage, so being conservative and either proving the fix is right (via a testsuite addition) or deferring the fix until after 1.0.3 seems like the safer alternative. Or maybe someone else will chime in and agree to take it now, without waiting for my desired burden of proof.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
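The suggested comparison is easy to eyeball once the reply is parsed. A quick sketch, assuming only Python's stdlib json module and the reply format pasted above, prints each vcpu entry's slot, reported CPU index, and host thread id; on a problem setup the CPU-index column would stop matching the slot column.

```python
import json

# QMP "query-cpus" reply as pasted above (x86_64 host, 3 vcpus).
reply = json.loads(
    '{"return":[{"current":true,"CPU":0,"pc":1025829,"halted":false,"thread_id":5849},'
    '{"current":false,"CPU":1,"pc":1005735,"halted":true,"thread_id":5850},'
    '{"current":false,"CPU":2,"pc":1005735,"halted":true,"thread_id":5851}],'
    '"id":"libvirt-9"}')

# Walk the vcpu entries the way libvirt does, pairing each entry's
# reported CPU index with its host thread id.
for slot, vcpu in enumerate(reply["return"]):
    print("slot %d: CPU index %d, thread %d"
          % (slot, vcpu["CPU"], vcpu["thread_id"]))
```

For this x86_64 reply the indexes come out as 0, 1, 2, i.e. contiguous and equal to the slot number, so the old check in qemu_monitor_json.c never fires here.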

On 02/27/2013 05:13 AM, Li Zhang wrote:
Hi Eric,
This should belong to bug-fix, could it be pushed to 1.0.3?

On 02/28/2013 07:13, Eric Blake wrote:

Without any test case under 'make check' that exposes the failure, I'm a bit worried that this might cause regressions on other setups. I'm not even sure I understand the scope of the problem. Is it something specific to running a qemu-system-ppc64 emulator (but reproducible on any host architecture), or is it specific to running on a powerpc host (but happening for any qemu-* target architecture), or is it something else altogether?

So sorry to see the failure. I ran 'make check' with my patch on ppc64; all of the cases pass. For x86, I tried it on an x86 server with the latest version of libvirt pulled. Even without my patch, it also fails while executing ./virshtest and the tests with cpuset:

+++ out 2013-02-28 13:02:14.290794080 +0800
@@ -1,2 +1,3 @@
 error: vcpupin: Invalid or missing vCPU number.
--- exp 2013-02-28 13:02:14.321794080 +0800
+++ out 2013-02-28 13:02:14.320794080 +0800
@@ -1,2 +1,3 @@
 error: vcpupin: Invalid vCPU number.
FAIL: vcpupin

I think this problem is specific to ppc64. I did an experiment on x86.

x86 machine: 4 CPUs (0~3). I disabled CPU1 and CPU2 by writing 0 to /sys/.../cpuX/online, so only 0 and 3 are online. Although only 2 CPUs are online, after executing "query-cpus" it still gets the information for all of them:

{"return": [{"current": true, "CPU": 0, "pc": 4294967280, "halted": false, "thread_id": 31831}, {"current": false, "CPU": 1, "pc": 4294967280, "halted": false, "thread_id": 31832}, {"current": false, "CPU": 2, "pc": 4294967280, "halted": false, "thread_id": 31833}, {"current": false, "CPU": 3, "pc": 4294967280, "halted": false, "thread_id": 31834}], "id": "libvirt-3"}

QEMU exposes all of these offline CPUs for x86. But for ppc64, it can't expose the offline CPUs. The following is the return value from ppc64:

{"return": [{"current": true, "CPU": 0, "nip": 256, "halted": false, "thread_id": 25964}, {"current": false, "CPU": 4, "nip": 256, "halted": true, "thread_id": 25965}, {"current": false, "CPU": 8, "nip": 256, "halted": true, "thread_id": 25966}, {"current": false, "CPU": 12, "nip": 256, "halted": true, "thread_id": 25967}], "id": "libvirt-3"}
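Side by side, the two replies differ only in the CPU index sequence. A small sketch (index lists copied from the outputs above) makes the contrast explicit: x86 stays contiguous, while ppc64 with SMT disabled strides by 4, which is exactly the pattern the removed check rejects.

```python
# CPU indexes as reported by "query-cpus" in the two experiments above.
x86_indexes = [0, 1, 2, 3]      # x86: offline CPUs are still enumerated 0..3
ppc64_indexes = [0, 4, 8, 12]   # ppc64 with SMT disabled: stride of 4

def contiguous(indexes):
    """True if entry i reports CPU index i — the assumption the old check made."""
    return all(cpu == i for i, cpu in enumerate(indexes))

print(contiguous(x86_indexes))    # → True
print(contiguous(ppc64_indexes))  # → False: fails already at entry 1 (CPU 4)
```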
We have a test framework in place that allows replaying a QMP JSON response and seeing how qemu_monitor_json.c deals with it; what I'd really like to see is a side-by-side comparison of what the QMP 'query-cpus' command returns for a guest with multiple vcpus on a setup where you are seeing the problem, compared to a setup that does not have the issue. You can get at this with virsh qemu-monitor-command $dom '{"execute":"query-cpus"}', if that helps.
Thanks a lot. I tried it on ppc64; the result is different from x86 even with some CPUs disabled.
To help you out, here's what I got for a guest using 3 vcpus on my x86_64 host machine and using the qemu-kvm binary:
# virsh qemu-monitor-command guest '{"execute":"query-cpus"}' {"return":[{"current":true,"CPU":0,"pc":1025829,"halted":false,"thread_id":5849},{"current":false,"CPU":1,"pc":1005735,"halted":true,"thread_id":5850},{"current":false,"CPU":2,"pc":1005735,"halted":true,"thread_id":5851}],"id":"libvirt-9"}
That is, your patch might be right, but I have not yet been convinced that it is right; and while things may currently be broken on ppc, it is not a recent breakage, so being conservative and either proving the fix is right (via testsuite addition) or deferring the fix until post 1.0.3 seems like safer alternatives. Or maybe someone else will chime in and agree to take it now, without waiting for my desired burden of proof.
OK, I will try to look into the testsuite problems on x86 even without my patch. I can't think of any problems this patch would cause now.

Really, thank you, Eric, for your review and suggestions. I don't know whether others have suggestions about my patch. Any suggestion is appreciated.

Thanks.
--Li

On 02/28/2013 14:27, Li Zhang wrote:
+++ out 2013-02-28 13:02:14.290794080 +0800
@@ -1,2 +1,3 @@
 error: vcpupin: Invalid or missing vCPU number.
--- exp 2013-02-28 13:02:14.321794080 +0800
+++ out 2013-02-28 13:02:14.320794080 +0800
@@ -1,2 +1,3 @@
 error: vcpupin: Invalid vCPU number.
FAIL: vcpupin
The above error was caused by the environment of my x86 machine; I hadn't used it for a long time. Now I have checked the latest version with my patch and there is no error; all test cases pass. Would you help double-check it? Thanks. :)
participants (2):
- Eric Blake
- Li Zhang