[libvirt] [PATCH] qemu_capabilities: Use proper free function for caps->cpuModels
by Michal Privoznik
The cpuModels member of the _virQEMUCapsAccel struct is not a
virObject but a regular struct with a dedicated free function,
qemuMonitorCPUDefsFree(). Use that function when clearing the parent
structure, instead of virObjectUnref(), to avoid a memory leak:
==212322== 57,275 (48 direct, 57,227 indirect) bytes in 3 blocks are definitely lost in loss record 623 of 627
==212322== at 0x4838B86: calloc (vg_replace_malloc.c:762)
==212322== by 0x554A158: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.6)
==212322== by 0x17B14BF5: qemuMonitorCPUDefsNew (qemu_monitor.c:3587)
==212322== by 0x17B27BA7: qemuMonitorJSONGetCPUDefinitions (qemu_monitor_json.c:5616)
==212322== by 0x17B14B0B: qemuMonitorGetCPUDefinitions (qemu_monitor.c:3559)
==212322== by 0x17A6AFBB: virQEMUCapsFetchCPUDefinitions (qemu_capabilities.c:2571)
==212322== by 0x17A6B2CC: virQEMUCapsProbeQMPCPUDefinitions (qemu_capabilities.c:2629)
==212322== by 0x17A70C00: virQEMUCapsInitQMPMonitorTCG (qemu_capabilities.c:4769)
==212322== by 0x17A70DDF: virQEMUCapsInitQMPSingle (qemu_capabilities.c:4820)
==212322== by 0x17A70E99: virQEMUCapsInitQMP (qemu_capabilities.c:4848)
==212322== by 0x17A71044: virQEMUCapsNewForBinaryInternal (qemu_capabilities.c:4891)
==212322== by 0x17A7119C: virQEMUCapsNewData (qemu_capabilities.c:4923)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
src/qemu/qemu_capabilities.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index e5f19ddcaa..f65af5c228 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -1798,7 +1798,7 @@ virQEMUCapsAccelClear(virQEMUCapsAccelPtr caps)
VIR_FREE(caps->machineTypes);
virQEMUCapsHostCPUDataClear(&caps->hostCPU);
- virObjectUnref(caps->cpuModels);
+ qemuMonitorCPUDefsFree(caps->cpuModels);
}
--
2.23.0
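Reassembled from the hunk above, the cleanup helper reads as follows
after the patch (the storage class and unchanged lines are inferred
from the diff context, not quoted):

    static void
    virQEMUCapsAccelClear(virQEMUCapsAccelPtr caps)
    {
        VIR_FREE(caps->machineTypes);
        virQEMUCapsHostCPUDataClear(&caps->hostCPU);
        /* cpuModels is a plain struct allocated by
         * qemuMonitorCPUDefsNew(), not a refcounted virObject, so it
         * must go through its dedicated free function. */
        qemuMonitorCPUDefsFree(caps->cpuModels);
    }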
[libvirt] [PATCH v3 0/4] qemu: fix type of default video device
by Pavel Mores
This new version mostly integrates Cole's comments about the second version.
Refactoring and behaviour change are now separate commits. Tests succeed for
every individual patch in the series.
Pavel Mores (4):
qemu: default video device type selection algorithm moved into its own
function
qemu: prepare existing test for change of the default video device
type
qemu: the actual change of default video device type selection
algorithm
qemu: added tests of the new default video type selection algorithm
src/qemu/qemu_domain.c | 37 ++++++++++------
.../default-video-type-aarch64.xml | 16 +++++++
.../default-video-type-ppc64.xml | 16 +++++++
.../default-video-type-riscv64.xml | 16 +++++++
.../default-video-type-s390x.xml | 16 +++++++
.../default-video-type-x86_64-caps-test-0.xml | 17 ++++++++
.../default-video-type-x86_64-caps-test-1.xml | 17 ++++++++
tests/qemuxml2argvtest.c | 1 +
...ault-video-type-aarch64.aarch64-latest.xml | 42 +++++++++++++++++++
.../default-video-type-ppc64.ppc64-latest.xml | 31 ++++++++++++++
...ault-video-type-riscv64.riscv64-latest.xml | 39 +++++++++++++++++
.../default-video-type-s390x.s390x-latest.xml | 32 ++++++++++++++
.../default-video-type-x86_64-caps-test-0.xml | 31 ++++++++++++++
.../default-video-type-x86_64-caps-test-1.xml | 31 ++++++++++++++
tests/qemuxml2xmltest.c | 10 ++++-
15 files changed, 339 insertions(+), 13 deletions(-)
create mode 100644 tests/qemuxml2argvdata/default-video-type-aarch64.xml
create mode 100644 tests/qemuxml2argvdata/default-video-type-ppc64.xml
create mode 100644 tests/qemuxml2argvdata/default-video-type-riscv64.xml
create mode 100644 tests/qemuxml2argvdata/default-video-type-s390x.xml
create mode 100644 tests/qemuxml2argvdata/default-video-type-x86_64-caps-test-0.xml
create mode 100644 tests/qemuxml2argvdata/default-video-type-x86_64-caps-test-1.xml
create mode 100644 tests/qemuxml2xmloutdata/default-video-type-aarch64.aarch64-latest.xml
create mode 100644 tests/qemuxml2xmloutdata/default-video-type-ppc64.ppc64-latest.xml
create mode 100644 tests/qemuxml2xmloutdata/default-video-type-riscv64.riscv64-latest.xml
create mode 100644 tests/qemuxml2xmloutdata/default-video-type-s390x.s390x-latest.xml
create mode 100644 tests/qemuxml2xmloutdata/default-video-type-x86_64-caps-test-0.xml
create mode 100644 tests/qemuxml2xmloutdata/default-video-type-x86_64-caps-test-1.xml
--
2.21.0
[libvirt] [PATCH] Use virNetServerClientImmediateClose() rather than virNetServerClientClose()
by LanceLiu
---
src/remote/remote_daemon_stream.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/remote/remote_daemon_stream.c b/src/remote/remote_daemon_stream.c
index de0dca3..d206d12 100644
--- a/src/remote/remote_daemon_stream.c
+++ b/src/remote/remote_daemon_stream.c
@@ -141,7 +141,7 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
(events & VIR_STREAM_EVENT_WRITABLE)) {
if (daemonStreamHandleWrite(client, stream) < 0) {
daemonRemoveClientStream(client, stream);
- virNetServerClientClose(client);
+ virNetServerClientImmediateClose(client);
goto cleanup;
}
}
@@ -151,7 +151,7 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
events = events & ~(VIR_STREAM_EVENT_READABLE);
if (daemonStreamHandleRead(client, stream) < 0) {
daemonRemoveClientStream(client, stream);
- virNetServerClientClose(client);
+ virNetServerClientImmediateClose(client);
goto cleanup;
}
/* If we detected EOF during read processing,
@@ -176,7 +176,7 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
if (daemonStreamHandleFinish(client, stream, msg) < 0) {
virNetMessageFree(msg);
daemonRemoveClientStream(client, stream);
- virNetServerClientClose(client);
+ virNetServerClientImmediateClose(client);
goto cleanup;
}
break;
@@ -186,7 +186,7 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
if (daemonStreamHandleAbort(client, stream, msg) < 0) {
virNetMessageFree(msg);
daemonRemoveClientStream(client, stream);
- virNetServerClientClose(client);
+ virNetServerClientImmediateClose(client);
goto cleanup;
}
break;
@@ -205,7 +205,7 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
stream->recvEOF = true;
if (!(msg = virNetMessageNew(false))) {
daemonRemoveClientStream(client, stream);
- virNetServerClientClose(client);
+ virNetServerClientImmediateClose(client);
goto cleanup;
}
msg->cb = daemonStreamMessageFinished;
@@ -219,7 +219,7 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
"", 0) < 0) {
virNetMessageFree(msg);
daemonRemoveClientStream(client, stream);
- virNetServerClientClose(client);
+ virNetServerClientImmediateClose(client);
goto cleanup;
}
}
@@ -262,7 +262,7 @@ daemonStreamEvent(virStreamPtr st, int events, void *opaque)
}
daemonRemoveClientStream(client, stream);
if (ret < 0)
- virNetServerClientClose(client);
+ virNetServerClientImmediateClose(client);
goto cleanup;
}
--
1.8.3.1
[libvirt] [PATCH 0/2] qemu: cold-plug and cold-unplug of sound
by Jidong Xia
With these patches users can cold-plug and cold-unplug some sound
devices (see the example virsh session below).
Jidong Xia (2):
qemu: cold-plug of sound
qemu: cold-unplug of sound
src/conf/domain_conf.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++
src/conf/domain_conf.h | 2 ++
src/libvirt_private.syms | 2 ++
src/qemu/qemu_driver.c | 18 ++++++++++++--
4 files changed, 81 insertions(+), 2 deletions(-)
--
1.8.3.1
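For reference, cold-plug here means attaching the device to the
persistent configuration of an inactive domain; an illustrative virsh
session (the domain name "mydomain" and the choice of sound model are
placeholders):

    $ cat sound.xml
    <sound model='ich6'/>

    # cold-plug: modify only the persistent config
    $ virsh attach-device mydomain sound.xml --config

    # cold-unplug: the reverse operation
    $ virsh detach-device mydomain sound.xml --config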
[libvirt] [PATCH] Fix a libvirtd segfault when a new forced console session breaks down an existing console session. When a force-console command arrives, libvirtd breaks down the existing console session, and in that procedure it calls daemonStreamEvent() to release resources. daemonStreamFilter() therefore needs to check, after taking the client object lock and client->privData's lock, whether the stream filter still exists, and return -1 if it does not.
by LanceLiu
---
src/remote/remote_daemon_stream.c | 10 +++++++++-
src/rpc/virnetserverclient.c | 12 ++++++++++++
src/rpc/virnetserverclient.h | 2 ++
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/src/remote/remote_daemon_stream.c b/src/remote/remote_daemon_stream.c
index 82cadb6..de0dca3 100644
--- a/src/remote/remote_daemon_stream.c
+++ b/src/remote/remote_daemon_stream.c
@@ -292,10 +292,18 @@ daemonStreamFilter(virNetServerClientPtr client,
{
daemonClientStream *stream = opaque;
int ret = 0;
+ daemonClientPrivatePtr priv = NULL;
+ int filter_id = stream->filterID;
virObjectUnlock(client);
+ priv = virNetServerClientGetPrivateData(client);
virMutexLock(&stream->priv->lock);
virObjectLock(client);
+ if (!virNetServerClientCheckFilterExist(client, filter_id)) {
+ VIR_WARN("this daemon stream filter: %d have been deleted!", filter_id);
+ ret = -1;
+ goto cleanup;
+ }
if (msg->header.type != VIR_NET_STREAM &&
msg->header.type != VIR_NET_STREAM_HOLE)
@@ -317,7 +325,7 @@ daemonStreamFilter(virNetServerClientPtr client,
ret = 1;
cleanup:
- virMutexUnlock(&stream->priv->lock);
+ virMutexUnlock(&priv->lock);
return ret;
}
diff --git a/src/rpc/virnetserverclient.c b/src/rpc/virnetserverclient.c
index 67b3bf9..f80f493 100644
--- a/src/rpc/virnetserverclient.c
+++ b/src/rpc/virnetserverclient.c
@@ -287,6 +287,18 @@ void virNetServerClientRemoveFilter(virNetServerClientPtr client,
virObjectUnlock(client);
}
+int virNetServerClientCheckFilterExist(virNetServerClientPtr client,
+ int filterID)
+{
+ virNetServerClientFilterPtr tmp;
+
+ tmp = client->filters;
+ while (tmp && tmp->id != filterID) {
+ tmp = tmp->next;
+ }
+
+ return (tmp != NULL);
+}
/* Check the client's access. */
static int
diff --git a/src/rpc/virnetserverclient.h b/src/rpc/virnetserverclient.h
index 7a3061d..85fda39 100644
--- a/src/rpc/virnetserverclient.h
+++ b/src/rpc/virnetserverclient.h
@@ -93,6 +93,8 @@ int virNetServerClientAddFilter(virNetServerClientPtr client,
void virNetServerClientRemoveFilter(virNetServerClientPtr client,
int filterID);
+int virNetServerClientCheckFilterExist(virNetServerClientPtr client,
+ int filterID);
int virNetServerClientGetAuth(virNetServerClientPtr client);
void virNetServerClientSetAuthLocked(virNetServerClientPtr client, int auth);
--
1.8.3.1
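Reassembled from the hunks above, the reworked locking sequence in
daemonStreamFilter() reads roughly as follows (lines the diff omits
are paraphrased, and the rationale in the comments is an editorial
reading of the patch, not the author's wording):

    static int
    daemonStreamFilter(virNetServerClientPtr client,
                       virNetMessagePtr msg,
                       void *opaque)
    {
        daemonClientStream *stream = opaque;
        int ret = 0;
        daemonClientPrivatePtr priv = NULL;
        /* Capture the filter ID before dropping the client lock: the
         * stream may be torn down while the lock is not held. */
        int filter_id = stream->filterID;

        virObjectUnlock(client);
        priv = virNetServerClientGetPrivateData(client);
        virMutexLock(&stream->priv->lock);
        virObjectLock(client);

        /* daemonStreamEvent() may have released the stream in the
         * unlocked window; bail out if our filter is already gone. */
        if (!virNetServerClientCheckFilterExist(client, filter_id)) {
            ret = -1;
            goto cleanup;
        }

        /* ... original VIR_NET_STREAM message checks and queueing ... */

     cleanup:
        /* Unlock through the client's private data, presumably because
         * 'stream' may no longer be safe to dereference here. */
        virMutexUnlock(&priv->lock);
        return ret;
    }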
[libvirt] [ruby PATCH] Fix default values for node_cpu_stats() and node_memory_stats()
by Stefano Garzarella
ruby_libvirt_value_to_int() returns 0 if the optional value is
not defined, but in node_cpu_stats() and node_memory_stats()
the default value of cpuNum and cellNum should be -1.
Reported-by: Charlie Smurthwaite <charlie@atech.media>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
ext/libvirt/connect.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/ext/libvirt/connect.c b/ext/libvirt/connect.c
index 5932535..b2d041b 100644
--- a/ext/libvirt/connect.c
+++ b/ext/libvirt/connect.c
@@ -2079,7 +2079,12 @@ static VALUE libvirt_connect_node_cpu_stats(int argc, VALUE *argv, VALUE c)
rb_scan_args(argc, argv, "02", &intparam, &flags);
- tmp = ruby_libvirt_value_to_int(intparam);
+ if (NIL_P(intparam)) {
+ tmp = -1;
+ }
+ else {
+ tmp = ruby_libvirt_value_to_int(intparam);
+ }
return ruby_libvirt_get_parameters(c, ruby_libvirt_value_to_uint(flags),
(void *)&tmp, sizeof(virNodeCPUStats),
@@ -2139,7 +2144,12 @@ static VALUE libvirt_connect_node_memory_stats(int argc, VALUE *argv, VALUE c)
rb_scan_args(argc, argv, "02", &intparam, &flags);
- tmp = ruby_libvirt_value_to_int(intparam);
+ if (NIL_P(intparam)) {
+ tmp = -1;
+ }
+ else {
+ tmp = ruby_libvirt_value_to_int(intparam);
+ }
return ruby_libvirt_get_parameters(c, ruby_libvirt_value_to_uint(flags),
(void *)&tmp, sizeof(virNodeMemoryStats),
--
2.21.0
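Since the same NIL_P() check now appears twice, it could also be
folded into a small helper; a sketch (the helper name is hypothetical,
not part of the posted patch, and assumes the usual ruby.h environment
already included by connect.c):

    /* Return the integer value of 'val', or 'defval' when the optional
     * argument was omitted (nil). Hypothetical helper, for illustration. */
    static int
    ruby_libvirt_value_to_int_default(VALUE val, int defval)
    {
        return NIL_P(val) ? defval : ruby_libvirt_value_to_int(val);
    }

    /* Both call sites would then collapse to: */
    tmp = ruby_libvirt_value_to_int_default(intparam, -1);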
Re: [libvirt] [edk2-discuss] [OVMF] resource assignment fails for passthrough PCI GPU
by Eduardo Habkost
(+Jiri, +libvir-list)
On Fri, Nov 22, 2019 at 04:58:25PM +0000, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek(a)redhat.com) wrote:
> > (+Dave, +Eduardo)
> >
> > On 11/22/19 00:00, dann frazier wrote:
> > > On Tue, Nov 19, 2019 at 06:06:15AM +0100, Laszlo Ersek wrote:
> > >> On 11/19/19 01:54, dann frazier wrote:
> > >>> On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
> > >>>> On 11/15/19 19:56, dann frazier wrote:
> > >>>>> Hi,
> > >>>>> I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
> > >>>>> is failing to allocate resources for it. I have no issues if I boot w/
> > >>>>> a legacy BIOS, and it works fine if I tell the linux guest to do the
> > >>>>> allocation itself - but I'm looking for a way to make this work w/
> > >>>>> OVMF by default.
> > >>>>>
> > >>>>> I posted a debug log here:
> > >>>>> https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5...
> > >>>>>
> > >>>>> Linux guest lspci output is also available for both seabios/OVMF boots here:
> > >>>>> https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
> > >>>>
> > >>>> By default, OVMF exposes such a 64-bit MMIO aperture for PCI MMIO BAR
> > >>>> allocation that is 32GB in size. The generic PciBusDxe driver collects,
> > >>>> orders, and assigns / allocates the MMIO BARs, but it can work only out
> > >>>> of the aperture that platform code advertizes.
> > >>>>
> > >>>> Your GPU's region 1 is itself 32GB in size. Given that there are further
> > >>>> PCI devices in the system with further 64-bit MMIO BARs, the default
> > >>>> aperture cannot accommodate everything. In such an event, PciBusDxe
> > >>>> avoids assigning the largest BARs (to my knowledge), in order to
> > >>>> conserve the most aperture possible, for other devices -- hence break
> > >>>> the fewest possible PCI devices.
> > >>>>
> > >>>> You can control the aperture size from the QEMU command line. You can
> > >>>> also do it from the libvirt domain XML, technically speaking. The knob
> > >>>> is experimental, so no stability or compatibility guarantees are made.
> > >>>> (That's also the reason why it's a bit of a hack in the libvirt domain XML.)
> > >>>>
> > >>>> The QEMU cmdline options is described in the following edk2 commit message:
> > >>>>
> > >>>> https://github.com/tianocore/edk2/commit/7e5b1b670c38
> > >>>
> > >>> Hi Laszlo,
> > >>>
> > >>> Thanks for taking the time to describe this in detail! The -fw_cfg
> > >>> option did avoid the problem for me.
> > >>
> > >> Good to hear, thanks.
> > >>
> > >>> I also noticed that the above
> > >>> commit message mentions the existence of a 24GB card as a reasoning
> > >>> behind choosing the 32GB default aperture. From what you say below, I
> > >>> understand that bumping this above 64GB could break hosts w/ <= 37
> > >>> physical address bits.
> > >>
> > >> Right.
> > >>
> > >>> What would be the downside of bumping the
> > >>> default aperture to, say, 48GB?
> > >>
> > >> The placement of the aperture is not trivial (please see the code
> > >> comments in the linked commit). The base address of the aperture is
> > >> chosen so that the largest BAR that can fit in the aperture may be
> > >> naturally aligned. (BARs are whole powers of two.)
> > >>
> > >> The largest BAR that can fit in a 48 GB aperture is 32 GB. Therefore
> > >> such an aperture would be aligned at 32 GB -- the lowest base address
> > >> (dependent on guest RAM size) would be 32 GB. Meaning that the aperture
> > >> would end at 32 + 48 = 80 GB. That still breaches the 36-bit phys
> > >> address width.
> > >>
> > >> 32 GB is the largest aperture size that can work with 36-bit phys
> > >> address width; that's the aperture that ends at 64 GB exactly.
> > >
> > > Thanks, yeah - now that I read the code comments that is clear (as
> > > clear as it can be w/ my low level of base knowledge). In the commit you
> > > mention Gerd (CC'd) had suggested a heuristic-based approach for
> > > sizing the aperture. When you say "PCPU address width" - is that a
> > > function of the available physical bits?
> >
> > "PCPU address width" is not a "function" of the available physical bits
> > -- it *is* the available physical bits. "PCPU" simply stands for
> > "physical CPU".
> >
> > > IOW, would that approach
> > > allow OVMF to automatically grow the aperture to the max ^2 supported
> > > by the host CPU?
> >
> > Maybe.
> >
> > The current logic in OVMF works from the guest-physical address space
> > size -- as deduced from multiple factors, such as the 64-bit MMIO
> > aperture size, and others -- towards the guest-CPU (aka VCPU) address
> > width. The VCPU address width is important for a bunch of other purposes
> > in the firmware, so OVMF has to calculate it no matter what.
> >
> > Again, the current logic is to calculate the highest guest-physical
> > address, and then deduce the VCPU address width from that (and then
> > expose it to the rest of the firmware).
> >
> > Your suggestion would require passing the PCPU (physical CPU) address
> > width from QEMU/KVM into the guest, and reversing the direction of the
> > calculation. The PCPU address width would determine the VCPU address
> > width directly, and then the 64-bit PCI MMIO aperture would be
> > calculated from that.
> >
> > However, there are two caveats.
> >
> > (1) The larger your guest-phys address space (as exposed through the
> > VCPU address width to the rest of the firmware), the more guest RAM you
> > need for page tables. Because, just before entering the DXE phase, the
> > firmware builds 1:1 mapping page tables for the entire guest-phys
> > address space. This is necessary e.g. so you can access any PCI MMIO BAR.
> >
> > Now consider that you have a huge beefy virtualization host with say 46
> > phys address bits, and a wimpy guest with say 1.5GB of guest RAM. Do you
> > absolutely want tens of *terabytes* for your 64-bit PCI MMIO aperture?
> > Do you really want to pay for the necessary page tables with that meager
> > guest RAM?
> >
> > (Such machines do exist BTW, for example:
> >
> > http://mid.mail-archive.com/9BD73EA91F8E404F851CF3F519B14AA8036C67B5@DGGE...
> > )
> >
> > In other words, you'd need some kind of knob anyway, because otherwise
> > your aperture could grow too *large*.
> >
> >
> > (2) Exposing the PCPU address width to the guest may have nasty
> > consequences at the QEMU/KVM level, regardless of guest firmware. For
> > example, that kind of "guest enlightenment" could interfere with migration.
> >
> > If you boot a guest let's say with 16GB of RAM, and tell it "hey friend,
> > have 40 bits of phys address width!", then you'll have a difficult time
> > migrating that guest to a host with a CPU that only has 36-bits wide
> > physical addresses -- even if the destination host has plenty of RAM
> > otherwise, such as a full 64GB.
> >
> > There could be other QEMU/KVM / libvirt issues that I m unaware of
> > (hence the CC to Dave and Eduardo).
>
> host physical address width gets messy. There are differences as well
> between upstream qemu behaviour, and some downstreams.
> I think the story is that:
>
> a) Qemu default: 40 bits on any host
> b) -cpu blah,host-phys-bits=true to follow the host.
> c) RHEL has host-phys-bits=true by default
>
> As you say, the only real problem with host-phys-bits is migration -
> between say an E3 and an E5 xeon with different widths. The magic 40's
> is generally wrong as well - I think it came from some ancient AMD,
> but it's the default on QEMU TCG as well.
Yes, and because it affects live migration ability, we have two
constraints:
1) It needs to be exposed in the libvirt domain XML;
2) QEMU and libvirt can't choose a value that works for everybody
(because neither QEMU or libvirt know where the VM might be
migrated later).
Which is why the BZ below is important:
>
> I don't think there's a way to set it in libvirt;
> https://bugzilla.redhat.com/show_bug.cgi?id=1578278 is a bz asking for
> that.
>
> IMHO host-phys-bits is actually pretty safe; and makes most sense in a
> lot of cases.
Yeah, it is mostly safe and makes sense, but messy if you try to
migrate to a host with a different size.
>
> Dave
>
>
> > Thanks,
> > Laszlo
> >
> > >
> > > -dann
> > >
> > >>>> For example, to set a 64GB aperture, pass:
> > >>>>
> > >>>> -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
> > >>>>
> > >>>> The libvirt domain XML syntax is a bit tricky (and it might "taint" your
> > >>>> domain, as it goes outside of the QEMU features that libvirt directly
> > >>>> maps to):
> > >>>>
> > >>>> <domain
> > >>>> type='kvm'
> > >>>> xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
> > >>>> <qemu:commandline>
> > >>>> <qemu:arg value='-fw_cfg'/>
> > >>>> <qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
> > >>>> </qemu:commandline>
> > >>>> </domain>
> > >>>>
> > >>>> Some notes:
> > >>>>
> > >>>> (1) The "xmlns:qemu" namespace definition attribute in the <domain> root
> > >>>> element is important. You have to add it manually when you add
> > >>>> <qemu:commandline> and <qemu:arg> too. Without the namespace
> > >>>> definition, the latter elements will make no sense, and libvirt will
> > >>>> delete them immediately.
> > >>>>
> > >>>> (2) The above change will grow your guest's physical address space to
> > >>>> more than 64GB. As a consequence, on your *host*, *if* your physical CPU
> > >>>> supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
> > >>>> the CPU will have to support at least 37 physical address bits too, for
> > >>>> the guest to work. Otherwise, the guest will break, hard.
> > >>>>
> > >>>> Here's how to verify (on the host):
> > >>>>
> > >>>> (2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
> > >>>> output, then stop reading here; things should work. Your CPU does not
> > >>>> support nested paging, so KVM will use shadow paging, which is slower,
> > >>>> but at least you don't have to care about the CPU's phys address width.
> > >>>>
> > >>>> (2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
> > >>>> physical' /proc/cpuinfo" --> if the physical address width is >=37,
> > >>>> you're good.
> > >>>>
> > >>>> (2c) if you have nested paging but exactly 36 phys address bits, then
> > >>>> you'll have to forcibly disable nested paging (assuming you want to run
> > >>>> a guest with larger than 64GB guest-phys address space, that is). On
> > >>>> Intel, issue:
> > >>>>
> > >>>> rmmod kvm_intel
> > >>>> modprobe kvm_intel ept=N
> > >>>>
> > >>>> On AMD, go with:
> > >>>>
> > >>>> rmmod kvm_amd
> > >>>> modprobe kvm_amd npt=N
> > >>>>
> > >>>> Hope this helps,
> > >>>> Laszlo
> > >>>>
> > >>>
> > >>
> > >
> >
> --
> Dr. David Alan Gilbert / dgilbert(a)redhat.com / Manchester, UK
--
Eduardo
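Laszlo's aperture-placement arithmetic above can be spelled out in a
few lines of C (a sketch; the alignment rule is the one described in
the quoted explanation):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t gib         = 1ULL << 30;
        uint64_t aperture    = 48 * gib;    /* proposed aperture size    */
        uint64_t largest_bar = 32 * gib;    /* biggest 2^n BAR that fits */
        uint64_t base        = largest_bar; /* lowest naturally aligned
                                               base address              */
        uint64_t end         = base + aperture;

        /* Prints: aperture ends at 80 GB; a 36-bit CPU tops out at 64 GB */
        printf("aperture ends at %llu GB; a 36-bit CPU tops out at %llu GB\n",
               (unsigned long long)(end / gib),
               (unsigned long long)((1ULL << 36) / gib));
        return 0;
    }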
[libvirt] [PATCH v2 0/4] PCI hostdev partial assignment support
by Daniel Henrique Barboza
Changes from the previous version [1], all of them the result of
feedback from Alex Williamson and Abdulla Bubshait:
- use <address type='none'/> instead of creating a new subsys
attribute;
- expand the change to all PCI hostdevs. Former patch 01 was
discarded because we don't need the PCI Multifunction helpers
for now;
- series changed name to reflect what it's being done
- new patch 04: add documentation to formatdomain.html.in
To avoid a huge wall of text please refer to [1] for context
about the road up to here. Commit msgs of the first 3 patches
tells the story as well.
[1] https://www.redhat.com/archives/libvir-list/2019-October/msg00298.html
What I want to discuss here instead is a caveat that I've found
while testing this work since its first version. The test was
done on a Power 8 system with a Broadcom BCM5719 PCIe multifunction
card, with 4 virtual functions. This series enables Libvirt to
declare PCI hostdevs that will not be visible to the guest using
address type='none'. During the tests I faced a scenario that I
expected to fail, but it didn't. This is the relevant XML excerpt:
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x09' slot='0x00' function='0x0'/>
</source>
<address type='none'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x09' slot='0x00' function='0x1'/>
</source>
<address type='none'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x09' slot='0x00' function='0x2'/>
</source>
<address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x2'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x09' slot='0x00' function='0x3'/>
</source>
<address type='none'/>
</hostdev>
I'm declaring all the BCM5719 functions in the XML, but I am making
functions 0, 1 and 3 unassignable by the guest using address type='none'.
This test was meant to fail, but it didn't. To my surprise the guest
booted and the device is functional:
$ lspci
0000:00:01.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
0000:00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
0001:00:01.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
$
I've talked with Michael Roth (the QEMU PPC64 developer who worked on
the PCI multifunction hotplug/unplug support in the PPC64 machine)
about this. He mentioned that this is intended. I'll quote here what he
had to say about it:
"The current logic is that we only emit the hotplug event when function
0 is attached, but if some other function is attached at boot-time the
guest will still see it on the bus, and whether that works or not I
think is up to the device/driver"
This explains why this test didn't fail as I expected. At least for
the PPC64 machine, depending on the device support, this setup is
allowed. The PPC64 machine uses function 0 hotplug as a signal to
'plug all the queued functions and function 0', but function 0
isn't required at boot time. I would like to hear other opinions
on this because I can't say whether this is allowed on x86.
I am mentioning all this now because this had a direct impact on the
design of this work since the previous version, and I failed
to bring it up back then. I am *not* checking for the assignment of
function 0 at guest boot time in Libvirt, leaving the user free to
decide what to do. I am aware that this is inconsistent with the
logic of the PCI multifunction hotplug/unplug support, where
function 0 is required. This also puts a lot of faith in the user,
relying on the user being fully aware of the capabilities of the
hardware.
My question is: should Libvirt force function 0 to be present at
boot time as well, regardless of whether the PPC64 guest or some
cards are able to boot without it?
Thanks,
DHB
Daniel Henrique Barboza (4):
domain_conf: allow address type='none' to unassign PCI hostdevs
qemu: handle unassigned PCI hostdevs in command line and alias
virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
formatdomain.html.in: document <address type='none'/>
docs/formatdomain.html.in | 15 +++++
docs/schemas/domaincommon.rng | 5 ++
src/conf/domain_conf.c | 56 ++++++++++++++--
src/conf/domain_conf.h | 3 +
src/qemu/qemu_alias.c | 6 ++
src/qemu/qemu_command.c | 4 ++
src/qemu/qemu_domain_address.c | 6 ++
src/util/virhostdev.c | 64 +++++++++++++++++--
.../hostdev-pci-address-none.args | 31 +++++++++
.../hostdev-pci-address-none.xml | 42 ++++++++++++
...ostdev-pci-multifunction-partial-fail.args | 31 +++++++++
...hostdev-pci-multifunction-partial-fail.xml | 35 ++++++++++
tests/qemuxml2argvtest.c | 8 +++
.../hostdev-pci-address-none.xml | 58 +++++++++++++++++
tests/qemuxml2xmltest.c | 1 +
15 files changed, 352 insertions(+), 13 deletions(-)
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-none.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-none.xml
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.xml
create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-address-none.xml
--
2.21.0
[libvirt] [PATCH 0/1] Improve travis CI coverage
by Daniel P. Berrangé
There is currently a serious hardware outage on the host running
our Jenkins CI slaves, and we're not sure when we'll get it back up
and running. This leaves us with Travis CI in the meantime.
This expands the Travis CI coverage to add the latest stable Fedora
and rawhide, so that it covers a representative set of
distros overall.
Daniel P. Berrangé (1):
travis: add fedora-31 & fedora-rawhide to the build images
.travis.yml | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--
2.21.0