[libvirt] [PATCH 0/2] Fix compatibility with MTTCG

QEMU is using MTTCG by default on an increasingly large set of
host/guest combinations. This allows us to use the normal vCPU pinning
support we already have for KVM. We just need to stop throwing away
the PID info, and stop artificially blocking pinning APIs.

Daniel P. Berrangé (2):
  qemu: fix recording of vCPU pids for MTTCG
  Revert "qemu: Forbid pinning vCPUs for TCG domain"

 src/qemu/qemu_domain.c | 81 ++++++++++++++++++++++++++----------------
 src/qemu/qemu_driver.c |  7 ----
 2 files changed, 51 insertions(+), 37 deletions(-)

-- 
2.17.2
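The TCG threading model itself is picked automatically by QEMU; the
override knob referenced in patch 1 is the -accel option, roughly like
this (the guest image path is only a placeholder):

  qemu-system-x86_64 -smp 4 -accel tcg,thread=multi  guest.img  # force MTTCG
  qemu-system-x86_64 -smp 4 -accel tcg,thread=single guest.img  # force classic TCG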

MTTCG is the new multi-threaded impl of TCG which follows KVM in
having one host OS thread per vCPU. Historically we have discarded
all PIDs reported for TCG guests, but we must now selectively honour
this data.

We don't have anything in the domain XML that indicates whether a
guest is using TCG or MTTCG. While QEMU does have an option
(-accel tcg,thread=single|multi), it is not desirable to expose this
in libvirt. QEMU will automatically use MTTCG when the host/guest
architecture pairing is known to be safe. Only developers of QEMU TCG
have a strong reason to override this logic.

Thus we use two sanity checks to decide if the vCPU PID information
is usable. First we see if the PID duplicates the main emulator PID,
and second we see if the PID duplicates any other vCPUs.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 src/qemu/qemu_domain.c | 81 ++++++++++++++++++++++++++----------------
 1 file changed, 51 insertions(+), 30 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index f00f1b3fdb..c7a0c03e3f 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10326,9 +10326,10 @@ qemuDomainRefreshVcpuInfo(virQEMUDriverPtr driver,
     qemuDomainVcpuPrivatePtr vcpupriv;
     qemuMonitorCPUInfoPtr info = NULL;
     size_t maxvcpus = virDomainDefGetVcpusMax(vm->def);
-    size_t i;
+    size_t i, j;
     bool hotplug;
     bool fast;
+    bool validTIDs = true;
     int rc;
     int ret = -1;

@@ -10336,6 +10337,8 @@ qemuDomainRefreshVcpuInfo(virQEMUDriverPtr driver,
     fast = virQEMUCapsGet(QEMU_DOMAIN_PRIVATE(vm)->qemuCaps,
                           QEMU_CAPS_QUERY_CPUS_FAST);

+    VIR_DEBUG("Maxvcpus %zu hotplug %d fast query %d", maxvcpus, hotplug, fast);
+
     if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0)
         return -1;

@@ -10348,39 +10351,57 @@ qemuDomainRefreshVcpuInfo(virQEMUDriverPtr driver,
     if (rc < 0)
         goto cleanup;

+    /*
+     * The query-cpus[-fast] commands return information
+     * about the vCPUs, including the OS level PID that
+     * is executing the vCPU.
+     *
+     * For KVM there is always a 1-1 mapping between
+     * vCPUs and host OS PIDs.
+     *
+     * For TCG things are a little more complicated.
+     *
+     *  - In some cases the vCPUs will all have the same
+     *    PID as the main emulator thread.
+     *  - In some cases the first vCPU will have a distinct
+     *    PID, but other vCPUs will share the emulator thread.
+     *
+     * For MTTCG, things work the same as KVM, with each
+     * vCPU getting its own PID.
+     *
+     * We use the host OS PIDs for doing vCPU pinning
+     * and reporting. The TCG data reporting will result
+     * in bad behaviour such as pinning the wrong PID.
+     * We must thus detect and discard bogus PID info
+     * from TCG, while still honouring the modern MTTCG
+     * impl which we can support.
+     */
+    for (i = 0; i < maxvcpus && validTIDs; i++) {
+        if (info[i].tid == vm->pid) {
+            VIR_DEBUG("vCPU[%zu] PID %llu duplicates process",
+                      i, (unsigned long long)info[i].tid);
+            validTIDs = false;
+        }
+
+        for (j = 0; j < i; j++) {
+            if (info[i].tid == info[j].tid) {
+                VIR_DEBUG("vCPU[%zu] PID %llu duplicates vCPU[%zu]",
+                          i, (unsigned long long)info[i].tid, j);
+                validTIDs = false;
+            }
+        }
+
+        if (validTIDs)
+            VIR_DEBUG("vCPU[%zu] PID %llu is valid",
+                      i, (unsigned long long)info[i].tid);
+    }
+
+    VIR_DEBUG("Extracting vCPU information validTIDs=%d", validTIDs);
     for (i = 0; i < maxvcpus; i++) {
         vcpu = virDomainDefGetVcpu(vm->def, i);
         vcpupriv = QEMU_DOMAIN_VCPU_PRIVATE(vcpu);

-        /*
-         * Current QEMU *can* report info about host threads mapped
-         * to vCPUs, but it is not in a manner we can correctly
-         * deal with. The TCG CPU emulation does have a separate vCPU
-         * thread, but it runs every vCPU in that same thread. So it
-         * is impossible to setup different affinity per thread.
-         *
-         * What's more the 'query-cpus[-fast]' command returns bizarre
-         * data for the threads. It gives the TCG thread for the
-         * vCPU 0, but for vCPUs 1-> N, it actually replies with
-         * the main process thread ID.
-         *
-         * The result is that when we try to set affinity for
-         * vCPU 1, it will actually change the affinity of the
-         * emulator thread :-( When you try to set affinity for
-         * vCPUs 2, 3.... it will fail if the affinity was
-         * different from vCPU 1.
-         *
-         * We *could* allow vcpu pinning with TCG, if we made the
-         * restriction that all vCPUs had the same mask. This would
-         * at least let us separate emulator from vCPUs threads, as
-         * we do for KVM. It would need some changes to our cgroups
-         * CPU layout though, and error reporting for the config
-         * restrictions.
-         *
-         * Just disable CPU pinning with TCG until someone wants
-         * to try to do this hard work.
-         */
-        if (vm->def->virtType != VIR_DOMAIN_VIRT_QEMU)
+        if (validTIDs)
             vcpupriv->tid = info[i].tid;

         vcpupriv->socket_id = info[i].socket_id;
-- 
2.17.2
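For reference, the thread IDs being sanity-checked here come from the
QMP query-cpus-fast (or query-cpus) reply. A trimmed, illustrative
exchange - the thread IDs and QOM paths below are invented - looks
roughly like:

  -> { "execute": "query-cpus-fast" }
  <- { "return": [
         { "cpu-index": 0, "qom-path": "/machine/unattached/device[0]",
           "thread-id": 25627 },
         { "cpu-index": 1, "qom-path": "/machine/unattached/device[2]",
           "thread-id": 25628 } ] }

With single-threaded TCG the entries may instead all report the
emulator's main PID, or share one thread ID, which is exactly the
pattern the two duplicate checks in the patch detect and discard.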

On 10/17/18 10:15 AM, Daniel P. Berrangé wrote:
MTTCG is the new multi-threaded impl of TCG which follows KVM in having one host OS thread per vCPU. Historically we have discarded all PIDs reported for TCG guests, but we must now selectively honour this data.
We don't have anything in the domain XML that indicates whether a guest is using TCG or MTTCG. While QEMU does have an option (-accel tcg,thread=single|multi), it is not desirable to expose this in libvirt. QEMU will automatically use MTTCG when the host/guest architecture pairing is known to be safe. Only developers of QEMU TCG have a strong reason to override this logic.
Thus we use two sanity checks to decide if the vCPU PID information is usable. First we see if the PID duplicates the main emulator PID, and second we see if the PID duplicates any other vCPUs.
[...]

+    for (i = 0; i < maxvcpus && validTIDs; i++) {
+        if (info[i].tid == vm->pid) {
+            VIR_DEBUG("vCPU[%zu] PID %llu duplicates process",
+                      i, (unsigned long long)info[i].tid);
+            validTIDs = false;
+        }
+
If !validTIDs does the next loop matter? IOW: Should the above section
add a "continue;" since the loop exit would force the exit?

Beyond that the logic and comments look reasonable. I assume that since
the domain XML doesn't care whether MTTCG or TCG is used, and things
are handled under the covers by QEMU, there are no migration or
save/restore issues. Of course you have a much deeper understanding of
the QEMU code than I do!

The one other question I'd have is whether the validTIDs setting should
be done just once and saved, perhaps in the domain private block. There
is more than one caller (and *Launch can call twice). It's not like
it's going to change, right? So doing the same loop from a hotplug path
wouldn't matter, nor would subsequent reconnects or attaches. So
perhaps validTIDs should be a tristate that only needs to be checked
when the value is UNKNOWN. Then again, the loop isn't that expensive
since it's only numeric comparisons, so maybe it doesn't matter. I
suppose I can be easily convinced that taking this route is fine, but
figured I'd ask.

John
[...]

On 10/29/2018 10:55 PM, John Ferlan wrote:
On 10/17/18 10:15 AM, Daniel P. Berrangé wrote:
[...]

+    for (i = 0; i < maxvcpus && validTIDs; i++) {
+        if (info[i].tid == vm->pid) {
+            VIR_DEBUG("vCPU[%zu] PID %llu duplicates process",
+                      i, (unsigned long long)info[i].tid);
+            validTIDs = false;
+        }
+
If !validTIDs does the next loop matter? IOW:
Should the above section add a "continue;" since the loop exit would force the exit?
It should break rather than continue. Even a single mismatch is enough
for us to not believe QEMU's reply. At any rate, that is just an
optimization, and since nobody uses TCG with a dozen or more vCPUs the
O(n^2) algorithm is not that bad.
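That is, something along these lines (only a sketch of the idea, not a
tested change):

    for (i = 0; i < maxvcpus; i++) {
        if (info[i].tid == vm->pid) {
            VIR_DEBUG("vCPU[%zu] PID %llu duplicates process",
                      i, (unsigned long long)info[i].tid);
            validTIDs = false;
            break;  /* one bogus TID already invalidates the whole set */
        }

        for (j = 0; j < i; j++) {
            if (info[i].tid == info[j].tid) {
                VIR_DEBUG("vCPU[%zu] PID %llu duplicates vCPU[%zu]",
                          i, (unsigned long long)info[i].tid, j);
                validTIDs = false;
                break;
            }
        }

        if (!validTIDs)
            break;
    }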
[...]

The one other question I'd have is whether the validTIDs setting should
be done just once and saved, perhaps in the domain private block. [...]
Maybe we can save this, but I think it can be done in a separate patch.

ACK

Michal

On Mon, Oct 29, 2018 at 05:55:36PM -0400, John Ferlan wrote:
On 10/17/18 10:15 AM, Daniel P. Berrangé wrote:
[...]

+    for (i = 0; i < maxvcpus && validTIDs; i++) {
+        if (info[i].tid == vm->pid) {
+            VIR_DEBUG("vCPU[%zu] PID %llu duplicates process",
+                      i, (unsigned long long)info[i].tid);
+            validTIDs = false;
+        }
+
If !validTIDs does the next loop matter? IOW:
Should the above section add a "continue;" since the loop exit would force the exit?
Mostly I wanted to ensure that we log all of the vCPU PIDs, and doing
so doesn't impose an unreasonable performance impact.
[...]

The one other question I'd have is whether the validTIDs setting should
be done just once and saved, perhaps in the domain private block. [...]
AFAIK, this can only be called once per running VM for each libvirtd
run, so I don't see a need for any more advanced caching.

Regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

On 10/30/18 5:25 AM, Daniel P. Berrangé wrote:
On Mon, Oct 29, 2018 at 05:55:36PM -0400, John Ferlan wrote:
On 10/17/18 10:15 AM, Daniel P. Berrangé wrote:
[...]
If !validTIDs does the next loop matter? IOW:
Should the above section add a "continue;" since the loop exit would force the exit?
Mostly I wanted to ensure that we log all of the vCPU PIDs, and doing
so doesn't impose an unreasonable performance impact.
Those are only VIR_DEBUGs though, so who really sees them? Perhaps
using VIR_WARN would be different.
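To even see them one would have to enable debug logging first, e.g.
with something like this in libvirtd.conf (illustrative values):

  log_filters="1:qemu"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"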
[...]

The one other question I'd have is whether the validTIDs setting should
be done just once and saved, perhaps in the domain private block. [...]
AFAIK, this can only be called once per running VM for each libvirtd
run, so I don't see a need for any more advanced caching.
I looked at/for callers of qemuDomainRefreshVcpuInfo - this one stuck
out once the domain is started:

    ....
    qemuProcessLaunch()
    ...
        VIR_DEBUG("setting up hotpluggable cpus");
        if (qemuDomainHasHotpluggableStartupVcpus(vm->def)) {
            if (qemuDomainRefreshVcpuInfo(driver, vm, asyncJob, false) < 0)
                goto cleanup;

            if (qemuProcessValidateHotpluggableVcpus(vm->def) < 0)
                goto cleanup;

            if (qemuProcessSetupHotpluggableVcpus(driver, vm, asyncJob) < 0)
                goto cleanup;
        }

        VIR_DEBUG("Refreshing VCPU info");
        if (qemuDomainRefreshVcpuInfo(driver, vm, asyncJob, false) < 0)
            goto cleanup;
    ...

but this is also called during hotplug add/remove and of course
attach/reconnect (which I'd expect).

John

On Tue, Oct 30, 2018 at 07:30:52AM -0400, John Ferlan wrote:
On 10/30/18 5:25 AM, Daniel P. Berrangé wrote:
On Mon, Oct 29, 2018 at 05:55:36PM -0400, John Ferlan wrote:
On 10/17/18 10:15 AM, Daniel P. Berrangé wrote:
[...]
If !validTIDs does the next loop matter? IOW:
Should the above section add a "continue;" since the loop exit would force the exit?
Mostly I wanted to ensure that we log all of the vCPU PIDs, and doing
so doesn't impose an unreasonable performance impact.
Those are only VIR_DEBUGs though, so who really sees them? Perhaps
using VIR_WARN would be different.
We as developers / maintainers see them. It is the kind of information
I like to have captured in the logs to make it easier to diagnose bug
reports from users.

Regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

This reverts commit 8b035c84d8a7362a87a95e6114b8e7f959685ed9.

The MTTCG impl in QEMU does allow pinning vCPUs. When the guest is
running we already check if pinning is possible in the
qemuDomainPinVcpuLive method, so this check was adding no benefit.

When the guest is not running, we cannot know whether the subsequent
launch will use MTTCG or TCG, so we must allow the pinning request. If
the guest does use TCG on the next launch it will fail, but this is no
worse than if the user had done a virDomainDefineXML with an XML doc
specifying vCPU pinning.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index a52e2495d5..981db01059 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -5111,13 +5111,6 @@ qemuDomainPinVcpuFlags(virDomainPtr dom,
     if (virDomainObjGetDefs(vm, flags, &def, &persistentDef) < 0)
         goto endjob;

-    if ((def && def->virtType == VIR_DOMAIN_VIRT_QEMU) ||
-        (persistentDef && persistentDef->virtType == VIR_DOMAIN_VIRT_QEMU)) {
-        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                       _("Virt type 'qemu' does not support vCPU pinning"));
-        goto endjob;
-    }
-
     if (persistentDef &&
         !(vcpuinfo = virDomainDefGetVcpu(persistentDef, vcpu))) {
         virReportError(VIR_ERR_INVALID_ARG,
-- 
2.17.2
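For context, a pinning request of the kind the commit message mentions
would be expressed in the domain XML roughly as follows (the CPU
numbers here are only illustrative):

  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
  </cputune>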

On 10/17/18 10:15 AM, Daniel P. Berrangé wrote:
This reverts commit 8b035c84d8a7362a87a95e6114b8e7f959685ed9.
The MTTCG impl in QEMU does allow pinning vCPUs.
When the guest is running we already check if pinning is possible in the qemuDomainPinVcpuLive method, so this check was adding no benefit.
When the guest is not running, we cannot know whether the subsequent launch will use MTTCG or TCG, so we must allow the pinning request. If the guest does use TCG on the next launch it will fail, but this is no worse than if the user had done a virDomainDefineXML with an XML doc specifying vCPU pinning.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 7 -------
 1 file changed, 7 deletions(-)
While looking at another "== VIR_DOMAIN_VIRT_QEMU" check in
virQEMUCapsAddCPUDefinitions, I wondered why/when/if the
qemuCaps->tcgCPUModels handling in this code really mattered... No
matter, I agree - let's remove it.

Reviewed-by: John Ferlan <jferlan@redhat.com>

John
participants (3)

- Daniel P. Berrangé
- John Ferlan
- Michal Privoznik