[libvirt] Cannot write data: Broken pipe when accessing libvirt from a forked process
by Josef Stribny
Hi all,
I am trying to add vagrant-libvirt support to the landrush plugin[0] and
found that libvirt fails for me with:
Call to virConnectNumOfNetworks failed: Cannot write data: Broken pipe
(Libvirt::RetrieveError)
I created a minimal reproducer that causes this:
```
#!/usr/bin/ruby
require 'libvirt'
conn = Libvirt::open("qemu:///system")
fork do
  puts conn.list_networks
end
puts conn.list_networks
```
This works just fine on my host, but fails on my virtualized guest
(when using nested KVM). The journal shows the following lines:
Failed to acquire pid file '/run/user/1001/libvirt/libvirtd.pid':
Resource temporarily unavailable
So the forked process here tries to acquire a non-existent libvirtd pid
file for the session user (UID 1001 on the VM).
Is this a feature/bug? How can one avoid it?
Thanks everyone
Josef
[0] https://github.com/phinze/landrush/pull/124
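A sketch of the usual remedy for this class of failure, assuming the root cause is that the child inherits the parent's connection socket across fork(): treat a libvirt connection as process-private and open a fresh one in the child. The stub and names below are illustrative, not the libvirt API.

```python
import os

def open_conn(uri):
    # Stand-in for Libvirt::open()/libvirt.open(); the point is only that
    # each process calls it itself and gets its own handle.
    return {"uri": uri, "owner_pid": os.getpid()}

def list_networks(conn):
    # Illustrative guard: refuse to use a handle inherited across fork().
    if conn["owner_pid"] != os.getpid():
        raise RuntimeError("connection used from a process that did not open it")
    return []

parent_conn = open_conn("qemu:///system")

pid = os.fork()
if pid == 0:
    # Child: do NOT touch parent_conn; open our own connection instead.
    child_conn = open_conn("qemu:///system")
    list_networks(child_conn)
    os._exit(0)

os.waitpid(pid, 0)
list_networks(parent_conn)
```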
9 years, 1 month
[libvirt] [PATCH] network: avoid existing bridges during rpm install of default network
by Laine Stump
When we install the libvirt-daemon-config-network package from an rpm,
the specfile checks that the subnet used for the default network isn't
already used, but blindly assumes that virbr0 is available. This is
almost always the case, but there could be a situation where someone
was already using virbr0 for their own network, then decided to
install libvirt-daemon-config-network, leading to a failure when they
tried to start the default network.
This patch adds a bit to the %post script for the
daemon-network-config package (in the specfile, used only for .rpm
packages) that checks "ip link show" and the existing libvirt network
xml files in a loop to find the lowest numbered virbrN that is
currently unused by either.
(Note that we already check for in-use bridge devices when defining a
network in libvirt's network driver, but the rpm install bypasses
libvirt (which may not yet be fully functional) and creates the xml
file itself.)
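In Python terms, the selection logic the shell loop performs is roughly the following (a sketch of the algorithm only; the real check shells out to "ip link show" and greps the existing network XML files):

```python
def pick_bridge(used_names, prefix="virbr", limit=256):
    """Return the lowest-numbered '<prefix>N' not in used_names.

    used_names models the union of bridges visible to "ip link show"
    and bridges referenced by existing libvirt network XML files.
    """
    for n in range(limit + 1):
        candidate = f"{prefix}{n}"
        if candidate not in used_names:
            return candidate
    # Safety cap, mirroring the 256-iteration guard in the shell loop.
    return f"{prefix}{limit}"

# e.g. someone already created virbr0 by hand and a network uses virbr1:
print(pick_bridge({"virbr0", "virbr1"}))  # -> virbr2
```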
---
I found this sitting in a source tree, written a few months ago and
forgotten. It is a followup to commit 37b8bc6f, which added a similar
check in the network driver during virNetworkDefineXML(). I vaguely
recall someone reporting this problem on IRC or maybe on one of the
mailing lists (they had created a bridge manually using the name
virbr0, then when they installed libvirt, it happily made a network
definition using virbr0). Yes, not very common, but possible.
libvirt.spec.in | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/libvirt.spec.in b/libvirt.spec.in
index 78a4cc3..931c5b9 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -1793,8 +1793,24 @@ if test $1 -eq 1 && test ! -f %{_sysconfdir}/libvirt/qemu/networks/default.xml ;
;;
esac
+ # find an unused bridge device named virbrN (this assumes that the
+ # original name in default.xml is exactly "virbr0", so do not
+ # change it there without also changing orig_br).
+ orig_br=0
+ br=${orig_br}
+ while ip link show virbr${br} >/dev/null 2>&1 ||\
+ (grep '<bridge ' %{_sysconfdir}/libvirt/qemu/networks/*.xml |\
+ grep virbr${br} >/dev/null 2>&1); do
+ br=$(expr ${br} + 1)
+ # safety to prevent an endless loop
+ if test ${br} -gt 256; then
+ break;
+ fi
+ done
+
UUID=`/usr/bin/uuidgen`
sed -e "s/${orig_sub}/${sub}/g" \
+ -e "s/virbr${orig_br}/virbr${br}/" \
-e "s,</name>,</name>\n <uuid>$UUID</uuid>," \
< %{_datadir}/libvirt/networks/default.xml \
> %{_sysconfdir}/libvirt/qemu/networks/default.xml
--
2.4.3
9 years, 1 month
[libvirt] [PATCH 0/6] Fix some Coverity issues
by John Ferlan
Newer Coverity (7.7.0) found a couple of real issues and a few more
false positives. There are still a few more to be resolved, but I'm
still trying to figure them out...
libxlDomainMigrationPrepare - the claim is that 'args' is leaked,
although it seems to be handled in libxlMigrateReceive or
libxlDoMigrateReceive. I don't know the code well enough to do proper triage.
qemuProcessStop - the claim is that the use of net->ifname in the condition
(vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH)
reads a NULL net->ifname that would have been VIR_FREE()'d
in the VIR_DOMAIN_NET_TYPE_DIRECT case a few lines above.
qemuDomainBlockRebase - the claim is that overwriting 'dest' after the call
to qemuDomainBlockCopyCommon causes a resource leak, although reading the
code says otherwise, except of course if CopyCommon goes to cleanup
rather than endjob. Even changing the location doesn't resolve
the issue.
Fix the Coverity tag for the DEADCODE... It's always a coin flip between
dead_error_begin and dead_error_condition - it seems I gave Jiri bad
advice as to which one to use.
John Ferlan (6):
Avoid Coverity FORWARD_NULL prior to strtok_r calls
tests: Resolve Coverity RESOURCE_LEAK
tests: Resolve Coverity RESOURCE_LEAK
virsh: Resolve Coverity DEADCODE
qemu: Resolve Coverity CHECKED_RETURN
qemu: Resolve Coverity RESOURCE_LEAK
src/esx/esx_vi.c | 1 +
src/libxl/libxl_conf.c | 1 +
src/openvz/openvz_conf.c | 2 ++
src/qemu/qemu_driver.c | 24 +++++++++++-------------
src/qemu/qemu_process.c | 6 +++---
src/xenapi/xenapi_utils.c | 1 +
tests/qemucaps2xmltest.c | 1 +
tests/virnetdaemontest.c | 2 ++
tools/virsh.c | 2 +-
9 files changed, 23 insertions(+), 17 deletions(-)
--
2.1.0
9 years, 1 month
[libvirt] [PATCH v3 0/14] migration: support all toURI and proto combos
by Nikolay Shirokovskiy
The current implementation of the 'toURI' migration interfaces does not support
all combinations of interface versions and protocol versions. For example,
'toURI2' with the p2p flag will not migrate if the driver supports only the
v3params protocol.
This is not convenient, as drivers that start to support migration have to
manually support older versions of the protocol. I think this should be done
in one place, namely here.
Another issue is that there is a lot of code duplication in the implementation
of the toURI interfaces, and it is not obvious from the code how they are related.
This implementation uses extensible parameters as the intermediate parameter
representation. This is possible because the interfaces are backward compatible
in terms of parameters, and later versions support all parameters of earlier
versions.
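As a sketch of that idea (illustrative Python, not the libvirt API; the parameter names are made up): the older fixed-argument interfaces can be folded into one extensible parameter set, and every protocol version is then driven from that single representation:

```python
def legacy_to_params(dname=None, miguri=None, bandwidth=0):
    # Map fixed toURI/toURI2-style arguments onto an extensible
    # (typed-params style) representation; later protocol versions accept
    # a superset of earlier parameters, so the mapping loses nothing.
    params = {}
    if dname is not None:
        params["destination_name"] = dname
    if miguri is not None:
        params["migrate_uri"] = miguri
    if bandwidth:
        params["bandwidth"] = bandwidth
    return params

print(legacy_to_params(dname="guest1", bandwidth=100))
```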
= Changes from version2
1. fix misc typos and overall english in commit messages
2. fix misc style issues
3. add 2 new patches
1. migration: refactor: rename uri parameter to miguri
2. migration: check dconnuri in p2p mode
4. rework succession of 2 patches
1. migration: refactor: prepare to reuse flag vs feature checks
2. migration: reuse flags vs features checks in toURI family
into
1. migration: refactor: introduce parameter checking function
2. migration: reuse parameters check in toURI2 and toURI3
5. rearrange functions as suggested by the reviewer to make better diffs
The overall diff as compared to the previous patchset is really small.
src/libvirt-domain.c | 531 ++++++++++++++++++++++---------------------------
1 files changed, 238 insertions(+), 293 deletions(-)
9 years, 1 month
[libvirt] [libvirt-php] free xml resources in get_string_from_xpath
by Vasiliy Tolstov
free as much as possible on return from get_string_from_xpath
Signed-off-by: Vasiliy Tolstov <v.tolstov(a)selfip.ru>
---
src/libvirt-php.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/src/libvirt-php.c b/src/libvirt-php.c
index 8588128..18499a6 100644
--- a/src/libvirt-php.c
+++ b/src/libvirt-php.c
@@ -2597,6 +2597,7 @@ char *get_string_from_xpath(char *xml, char *xpath, zval **val, int *retVal)
if (!doc) {
if (retVal)
*retVal = -2;
+ xmlFreeParserCtxt(xp);
xmlCleanupParser();
return NULL;
}
@@ -2605,6 +2606,8 @@ char *get_string_from_xpath(char *xml, char *xpath, zval **val, int *retVal)
if (!context) {
if (retVal)
*retVal = -3;
+ xmlFreeDoc(doc);
+ xmlFreeParserCtxt(xp);
xmlCleanupParser();
return NULL;
}
@@ -2614,6 +2617,8 @@ char *get_string_from_xpath(char *xml, char *xpath, zval **val, int *retVal)
if (retVal)
*retVal = -4;
xmlXPathFreeContext(context);
+ xmlFreeParserCtxt(xp);
+ xmlFreeDoc(doc);
xmlCleanupParser();
return NULL;
}
@@ -2621,6 +2626,8 @@ char *get_string_from_xpath(char *xml, char *xpath, zval **val, int *retVal)
if(xmlXPathNodeSetIsEmpty(result->nodesetval)){
xmlXPathFreeObject(result);
xmlXPathFreeContext(context);
+ xmlFreeParserCtxt(xp);
+ xmlFreeDoc(doc);
xmlCleanupParser();
if (retVal)
*retVal = 0;
@@ -2632,8 +2639,9 @@ char *get_string_from_xpath(char *xml, char *xpath, zval **val, int *retVal)
if (ret == 0) {
xmlXPathFreeObject(result);
- xmlFreeDoc(doc);
xmlXPathFreeContext(context);
+ xmlFreeParserCtxt(xp);
+ xmlFreeDoc(doc);
xmlCleanupParser();
if (retVal)
*retVal = 0;
@@ -2658,8 +2666,9 @@ char *get_string_from_xpath(char *xml, char *xpath, zval **val, int *retVal)
value = (char *)xmlNodeListGetString(doc, nodeset->nodeTab[0]->xmlChildrenNode, 1);
}
- xmlXPathFreeContext(context);
xmlXPathFreeObject(result);
+ xmlXPathFreeContext(context);
+ xmlFreeParserCtxt(xp);
xmlFreeDoc(doc);
xmlCleanupParser();
--
2.5.0
9 years, 1 month
[libvirt] [PATCH v2 0/3] Add support for gic-version machine option
by Pavel Fedin
QEMU now supports a gic-version option for the virt machine. This patchset
allows libvirt to use it.
v1 => v2:
- Added capability flag
Pavel Fedin (3):
qemu: Introduce QEMU_CAPS_MACH_VIRT_GIC_VERSION capability
qemu: Add support for gic-version machine option
qemu: Add test cases for gic-version option
src/qemu/qemu_capabilities.c | 5 +++
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_command.c | 43 +++++++++++++++-------
.../qemuxml2argv-aarch64-gicv3.args | 6 +++
.../qemuxml2argv-aarch64-gicv3.xml | 26 +++++++++++++
tests/qemuxml2argvtest.c | 5 +++
6 files changed, 73 insertions(+), 13 deletions(-)
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-aarch64-gicv3.args
create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-aarch64-gicv3.xml
--
2.1.4
9 years, 1 month
[libvirt] [PATCH] qemuDomainAttachDeviceLive: Check provided disk address
by Michal Privoznik
https://bugzilla.redhat.com/show_bug.cgi?id=1257844
Imagine a user who is trying to attach a disk to a domain with
the following XML:
<disk type='block' device='disk'>
<driver name='qemu' type='raw'/>
<source dev='/dev/sr0'/>
<target dev='vde' bus='virtio'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
The XML is obviously wrong: it's trying to attach a virtio disk
onto a non-PCI bus. We should forbid that.
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/qemu/qemu_hotplug.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index afc5408..fc8e21b 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -336,6 +336,13 @@ qemuDomainAttachVirtioDiskDevice(virConnectPtr conn,
if (!qemuCheckCCWS390AddressSupport(vm->def, disk->info, priv->qemuCaps,
disk->dst))
goto cleanup;
+ if (disk->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI &&
+ disk->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_VIRTIO_S390 &&
+ disk->info.type != VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW) {
+ virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+ _("virtio disks must have PCI or CCW address"));
+ goto cleanup;
+ }
}
for (i = 0; i < vm->def->ndisks; i++) {
--
2.4.9
9 years, 1 month
[libvirt] Entering freeze for libvirt-1.2.20
by Daniel Veillard
I'm afraid I forgot to warn about this last week, but we are close to
the end of the month and it's time! I have tagged the git tree and pushed
the release candidate 1 signed tarball and rpms at the usual place:
ftp://libvirt.org/libvirt/
I didn't really have time to give it much testing; at least it compiles
and builds fine <grin/>, so it may be broken, check it!
The plan is to have an rc2 probably on Wednesday, and the final release on Friday.
Hope that works for everyone!
Daniel
--
Daniel Veillard | Open Source and Standards, Red Hat
veillard(a)redhat.com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | virtualization library http://libvirt.org/
9 years, 1 month
[libvirt] Fwd: [PATCH v2] Close the source fd if the destination qemu exits during tunnelled migration
by Shivaprasad bhat
On Tue, Sep 29, 2015 at 2:33 AM, John Ferlan <jferlan(a)redhat.com> wrote:
>
>
> On 09/28/2015 11:02 AM, Shivaprasad bhat wrote:
>> Hi Jon,
>>
>> Thanks a lot for attaching the patch. Replies inline.
>>
>> On Thu, Sep 24, 2015 at 1:52 AM, John Ferlan <jferlan(a)redhat.com> wrote:
>>>
>>>
>>> On 09/22/2015 07:21 AM, Shivaprasad bhat wrote:
>>>> On Mon, Sep 21, 2015 at 8:04 PM, John Ferlan <jferlan(a)redhat.com> wrote:
>>>>>
>>>>>
>>>>> On 09/21/2015 05:09 AM, Shivaprasad bhat wrote:
>>>>>> Thanks John for the comments.
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 18, 2015 at 10:34 PM, John Ferlan <jferlan(a)redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 09/14/2015 10:44 AM, Shivaprasad G Bhat wrote:
>>>>>>>> Tunnelled migration can hang if the destination qemu exits despite all the
>>>>>>>> ABI checks. This happens whenever the destination qemu exits before the
>>>>>>>> complete transfer is noticed by source qemu. The savevm state checks at
>>>>>>>> runtime can fail at destination and cause qemu to error out.
>>>>>>>> The source qemu can't notice it as the EPIPE is not propagated to it.
>>>>>>>> The qemuMigrationIOFunc() notices the stream being broken from virStreamSend()
>>>>>>>> and it cleans up the stream alone. The qemuMigrationWaitForCompletion() would
>>>>>>>> never get to 100% transfer completion.
>>>>>>>> The qemuMigrationWaitForCompletion() never breaks out as well since
>>>>>>>> the ssh connection to destination is healthy, and the source qemu also thinks
>>>>>>>> the migration is ongoing as the Fd to which it transfers, is never
>>>>>>>> closed or broken. So, the migration will hang forever. Even Ctrl-C on the
>>>>>>>> virsh migrate wouldn't be honoured. Close the source side FD when there is
>>>>>>>> an error in the stream. That way, the source qemu updates itself and
>>>>>>>> qemuMigrationWaitForCompletion() notices the failure.
>>>>>>>>
>>>>>>>> Close the FD for all kinds of errors to be sure. The error message is not
>>>>>>>> copied for EPIPE so that the destination error is copied instead later.
>>>>>>>>
>>>>>>>> Note:
>>>>>>>> Reproducible with repeated migrations between Power hosts running in different
>>>>>>>> subcores-per-core modes.
>>>>>>>>
>>>>>>>> Changes from v1 -> v2:
>>>>>>>> VIR_FORCE_CLOSE() was called twice for this use case, which would log
>>>>>>>> unnecessary warnings. So, move the fd close to qemuMigrationIOFunc
>>>>>>>> so that there are no unnecessary duplicate attempts. (Would this trigger
>>>>>>>> a Coverity error? I don't have a setup to check.)
>>>>>>>>
>>>>>>>> Signed-off-by: Shivaprasad G Bhat <sbhat(a)linux.vnet.ibm.com>
>>>>>>>> ---
>>>>>>>> src/qemu/qemu_migration.c | 8 ++++++--
>>>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>>>>>>>> index ff89ab5..9602fb2 100644
>>>>>>>> --- a/src/qemu/qemu_migration.c
>>>>>>>> +++ b/src/qemu/qemu_migration.c
>>>>>>>> @@ -4012,6 +4012,7 @@ static void qemuMigrationIOFunc(void *arg)
>>>>>>>> if (virStreamFinish(data->st) < 0)
>>>>>>>> goto error;
>>>>>>>>
>>>>>>>> + VIR_FORCE_CLOSE(data->sock);
>>>>>>>> VIR_FREE(buffer);
>>>>>>>>
>>>>>>>> return;
>>>>>>>> @@ -4029,7 +4030,11 @@ static void qemuMigrationIOFunc(void *arg)
>>>>>>>> }
>>>>>>>>
>>>>>>>> error:
>>>>>>>> - virCopyLastError(&data->err);
>>>>>>>> + /* Let the source qemu know that the transfer cant continue anymore.
>>>>>>>> + * Don't copy the error for EPIPE as destination has the actual error. */
>>>>>>>> + VIR_FORCE_CLOSE(data->sock);
>>>>>>>> + if (!virLastErrorIsSystemErrno(EPIPE))
>>>>>>>> + virCopyLastError(&data->err);
>>>>>>>> virResetLastError();
>>>>>>>> VIR_FREE(buffer);
>>>>>>>> }
>>>>>>>> @@ -4397,7 +4402,6 @@ qemuMigrationRun(virQEMUDriverPtr driver,
>>>>>>>> if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0)
>>>>>>>> ret = -1;
>>>>>>>> }
>>>>>>>> - VIR_FORCE_CLOSE(fd);
>>>>>>>
>>>>>>> ^^^
>>>>>>>
>>>>>>> This causes Coverity to claim a RESOURCE_LEAK
>>>>>>>
>>>>>>> Feels like this was a mistake edit...
>>>>>>>
>>>>>>
>>>>>> The VIR_FORCE_CLOSE() inside qemuMigrationIOFunc() alone is sufficient.
>>>>>> Having this again here would lead to Warning in the logs. I too thought coverity
>>>>>> would complain. Is there a way to force ignore a coverity warning?
>>>>>>
>>>>>
>>>>> Typically a marker of sorts, such as
>>>>>
>>>>> /* coverity[leaked_handle] <some extra comment explaining why> */
>>>>>
>>>>> Although I'm not sure that's the best way to handle this either.
>>>>>
>>>>> The problem I see though is this is an "error path" issue and when
>>>>> perhaps it's safe/required to close fd/io->sock/data->sock.
>>>>>
>>>>> Your commit comment notes that the issue is seen on a fairly specific
>>>>> event (virStreamSend failure) for a specific type of migration.
>>>>
>>>> I believe the failure can be seen for all types of migration with savestate
>>>> mismatch.
>>>>
>>>
>>> My thoughts were based mostly on your commit message comments:
>>>
>>> "The qemuMigrationIOFunc() notices the stream being broken from
>>> virStreamSend() and it cleans up the stream alone. The
>>> qemuMigrationWaitForCompletion() would never get to 100% transfer
>>> completion."
>>>
>>> and the belief that the core issue you described is there's no way for
>>> qemuMigrationCompleted to know the qemuMigrationIOFunc thread failed and
>>> thus that's what caused the hang. Since we can pass the iothread "io"
>>> object to the Completion function, I figured we could also use that to
>>> check the status of the thread in the list of things already checked. I
>>> guess it's not clear to me why your test fails below. Maybe it'll help
>>> to attach/show you my thoughts in a patch format. I just took your
>>> changes and "adjusted" them, merging the differences. So you could 'git
>>> am' to top of trunk in order to compare my thoughts.
>>
>> I tried your patch. Ran migrations in a while loop and hit the hang on the
>> 7th run itself. I feel the small time window is not negligible.
>>
>
> Perhaps I should clarify my query-migrate has no timeout comment... It
> seems based on what I've read so far, the 'query-migrate' command
> started successfully, because if it hadn't we would have received a
> failure (as shown below). Thus libvirt has sent the command via the
> monitor and is waiting for a response (e.g. the virCondWait after the
> qemuMonitorSend in the trace below). The response isn't coming because
> either "A" qemu didn't send it back or "B" libvirt missed it - that
> should be determinable.
>
> There's a way to turn on debugging so the monitor dialog can be seen -
> via changes to /etc/libvirt/libvirtd.conf. I use :
>
> log_level = 1
> log_filters="3:remote 4:event 3:json 3:rpc"
> log_outputs="1:file:/var/log/libvirt/libvirtd.log"
>
> But you may need to remove the "3:json" in order to see the dialog since
> that where it "feels like" the issue might be. Then start libvirtd in
> the debugger again. Once it's hung - you should be able to scan (eg,
> edit) the libvirtd.log file and search for the "query-migrate" command
> being sent and then follow the copious output looking for the presence
> of a returned command. If there is none, then something in qemu isn't
> returning the failure correctly and it would need to be fixed there I
> would think as opposed to throwing down the big hammer of closing the fd.
Had a chance to run with your log settings. The query-migrate doesn't seem
to have a corresponding "return" in the logs. So, as you say, there may be
a qemu bug: no response comes back even though the fd is still open (as
libvirt didn't close it) while no read is actually happening there. I felt
qemu can't sense the failure while the fd is open, so I posted this patch.
That said, qemu should return the current state of the migration as it sees
it instead of not returning at all. Hope we are on the same page.
The mailer refused my reply as the attached logs pushed the mail over 300KB.
Not attaching the logs for now. Please let me know if you want me to check
anything specific.
Thanks,
Shivaprasad
>
> I think you've hit some edge/error "timing" condition - one of the more
> difficult to debug. Finding the root cause of the problem could be
> beneficial especially if it could be considered some more "common" type
> problem for other similar migrate commands.
>
> John
>>>
>>> I'm not opposed to the chosen method, although perhaps Dan or Jiri who
>>> understand more of the inner workings of migration than I do can provide
>>> some comments. Changing where the fd is closed could have some corner
>>> race condition I'm not aware of...
>>>
>>> One issue that needs to be resolved with your patch is what happens to
>>> the 'fd' when qemuMigrationIOFunc isn't involved? IOW: if "fwdType ==
>>> MIGRATION_FWD_DIRECT", then qemuMigrationStartTunnel isn't called and
>>> thus qemuMigrationIOFunc won't be closing the 'fd' in the form of
>>> data->sock. That's perhaps a path Coverity was considering (and I hadn't
>>> considered initially).
>>>
>>> I think to resolve this the following should be added to your patch (I
>>> tried it, Coverity is happy too)...
>>>
>>> if (spec->fwdType != MIGRATION_FWD_DIRECT) {
>>> if (!(iothread = qemuMigrationStartTunnel(spec->fwd.stream, fd)))
>>> goto cancel;
>>> /* If we've created a tunnel, then the 'fd' will be closed in the
>>> * qemuMigrationIOFunc as data->sock
>>> */
>>> fd = -1;
>>> }
>>>
>>> and restore the VIR_FORCE_CLOSE(fd);
>>>
>>
>> I agree. Let me know if we should go with this approach. I'll spin
>> the next version with the above corrections. I am copying Jiri on this
>> patch for his opinion as well.
>>
>>>
>>>>> As I
>>>>> read the code, that failure jumps to error (as does virStreamFinish). So
>>>>> now you have a fairly specific set of instances which perhaps would
>>>>> cause qemuMigrationWaitForCompletion to need to fail. The fix you have
>>>>> proposed is to close the data->sock (io->sock, fd). However, your
>>>>> proposal is a larger hammer. I assume closure of data->sock causes
>>>>> WaitForCompletion to fail (perhaps differently) and that's why you chose it.
>>>>>
>>>>> Going back to the beginning of qemuMigrationRun, setting the 'fd' and
>>>>> using StartTunnel seems to rely on MIGRATION_FWD_DIRECT, but if a tunnel
>>>>> is created an (ugh) 'iothread' variable is NON-NULL. What if 'iothread'
>>>>> were passed to qemuMigrationWaitForCompletion? Then again passed to
>>>>> qemuMigrationCompleted which would check if iothread non-NULL and for
>>>>> some new flag that could be created in _qemuMigrationIOThread, being
>>>>> true. If it were true, then that would cause failure so that
>>>>> WaitForCompletion would return error status. The only way the flag is
>>>>> set to true is in qemuMigrationIOFunc when the code jumps to error.
>>>>> (e.g. add bool stream_abort and data->stream_abort = true in "error:",
>>>>> then check iothread && iothread->stream_abort, force error).
>>>>>
>>>>
>>>> I tried this. Until the fd is closed, the 'info migrate' wouldn't
>>>> return from the
>>>> qemu. So, the migration hangs. Checking the stream status before/after
>>>> fetching the job status would still leave a race. So, having it in a different
>>>> thread (qemuMigrationIOFunc) here seems inevitable to me.
>>>
>>> Although perhaps susceptible to a smaller "timing" window, if we move
>>> the iothread check to before the CheckJobStatus call in Completed, then
>>> that'll work better?
>>>
>>> It seems the "query-migrate" command has no timeout... I assume qemu
>>> would indicate that if you call it when the migration pipe had an error,
>>> then you unpredictable results (only a guess - I didn't look at that
>>> code as well). Perhaps that means libvirt needs some code to handle the
>>> condition where the command doesn't finish in a timely manner.
>>>
>>
>> No, If the pipe was broken, the query-migrate returns "failed". This is
>> anyway handled in a generic way in libvirt.
>>
>> 90.408 > 0x3fff80007410 {"execute":"query-migrate","id":"libvirt-725"}
>> 90.408 < 0x3fff80007410 {"return": {"status": "failed"}, "id": "libvirt-725"}
>>
>> Thanks and Regards,
>> Shiva
>>
>>> Perhaps worth a shot... I did change my attached patch to reflect this
>>> thought..
>>>
>>> John
>>>>
>>>> #1 0x00003fff83ff593c in virCondWait (c=<optimized out>, m=<optimized out>)
>>>> at util/virthread.c:154
>>>> #2 0x00003fff76235544 in qemuMonitorSend (mon=0x3fff54003970,
>>>> msg=<optimized out>) at qemu/qemu_monitor.c:1035
>>>> #3 0x00003fff7624fb30 in qemuMonitorJSONCommandWithFd (mon=0x3fff54003970,
>>>> cmd=0x3fff5c001420, scm_fd=-1, reply=0x3fff79fbd388)
>>>> at qemu/qemu_monitor_json.c:293
>>>> #4 0x00003fff76254b90 in qemuMonitorJSONCommand (reply=0x3fff79fbd388,
>>>> cmd=0x3fff5c001420, mon=0x3fff54003970) at qemu/qemu_monitor_json.c:323
>>>> #5 qemuMonitorJSONGetMigrationStatus (mon=0x3fff54003970,
>>>> status=0x3fff79fbd538) at qemu/qemu_monitor_json.c:2620
>>>> #6 0x00003fff7623a664 in qemuMonitorGetMigrationStatus (mon=<optimized out>,
>>>> status=<optimized out>) at qemu/qemu_monitor.c:2134
>>>> #7 0x00003fff76228e6c in qemuMigrationFetchJobStatus (driver=0x3fff6c118d80,
>>>> vm=0x3fff6c27faf0, asyncJob=<optimized out>, jobInfo=0x3fff79fbd4f0)
>>>> at qemu/qemu_migration.c:2528
>>>> #8 0x00003fff76228fb4 in qemuMigrationUpdateJobStatus (driver=0x3fff6c118d80,
>>>> vm=0x3fff6c27faf0, asyncJob=QEMU_ASYNC_JOB_MIGRATION_OUT)
>>>> at qemu/qemu_migration.c:2565
>>>> #9 0x00003fff76229200 in qemuMigrationCheckJobStatus (driver=0x3fff6c118d80,
>>>> vm=0x3fff6c27faf0, asyncJob=QEMU_ASYNC_JOB_MIGRATION_OUT)
>>>> ---Type <return> to continue, or q <return> to quit---cont
>>>> at qemu/qemu_migration.c:2585
>>>> #10 0x00003fff76229408 in qemuMigrationCompleted (iothread=<optimized out>,
>>>>
>>>>> The only question then in my mind then is would this also be something
>>>>> that should be done for the virStreamFinish failure path which also
>>>>> jumps to error?
>>>>>
>>>>
>>>> Yes, I too am not sure if it's appropriate for the virStreamFinish failure
>>>> case. But, given this is an error path, I feel we can safely go ahead and
>>>> close data->sock.
>>>>
>>>>> Doing this means you shouldn't need the VIR_FILE_CLOSE(data->sock) for
>>>>> either the normal or error path. Does this seem to be a reasonable
>>>>> approach and solve the issue you're facing?
>>>>>
>>>>> John
>>>>>> Thanks and Regards,
>>>>>> Shivaprasad
>>>>>>
>>>>>>> The rest of the patch looks reasonable; however, I'm far from the expert
>>>>>>> in these matters.
>>>>>>>
>>>>>>> John
>>>>>>>>
>>>>>>>> if (priv->job.completed) {
>>>>>>>> qemuDomainJobInfoUpdateTime(priv->job.completed);
>>>>>>>>
>>>>>>>> --
>>>>>>>> libvir-list mailing list
>>>>>>>> libvir-list(a)redhat.com
>>>>>>>> https://www.redhat.com/mailman/listinfo/libvir-list
>>>>>>>>
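The EPIPE mechanics under discussion can be shown outside libvirt in a few lines (a toy Python illustration, not libvirt code): as long as the writer's fd stays open nothing forces a failure, but a write to a pipe whose read end is gone fails with EPIPE, which is the signal the patch surfaces to the source side by closing data->sock.

```python
import os
import signal

# Get EPIPE as an error return instead of dying on SIGPIPE.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

r, w = os.pipe()
os.close(r)  # the "destination" side goes away

try:
    os.write(w, b"migration data")
    print("write succeeded")
except BrokenPipeError:
    print("EPIPE: the source side can now notice and fail the migration")
finally:
    os.close(w)
```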
9 years, 1 month
[libvirt] [PATCH] testutils: Add coloring to verbose PASS/FAILED output
by Cole Robinson
Helps to visually track down test failures when debugging the test suite.
The colors match what 'make check' does for pass/fail/skip.
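The pattern the patch implements, sketched in Python for illustration (the patch gates on isatty(STDIN_FILENO); the sketch asks the output stream itself, which is the same idea): emit ANSI escapes only when writing to a real terminal, so redirected test logs stay plain.

```python
import sys

COLORS = {"OK": "32", "SKIP": "34;1", "FAILED": "31;1"}  # green, bold blue, bold red

def fmt_result(label, stream=sys.stderr):
    # Color only when the stream is a terminal; piped or redirected
    # output (isatty() False) gets the bare label.
    color = COLORS.get(label)
    if color and stream.isatty():
        return f"\033[{color}m{label}\033[0m"
    return label

# When stderr is redirected to a file, isatty() is False -> plain "OK".
print(fmt_result("OK"), file=sys.stderr)
```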
---
tests/testutils.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/tests/testutils.c b/tests/testutils.c
index 89026c6..bd4ff73 100644
--- a/tests/testutils.c
+++ b/tests/testutils.c
@@ -91,6 +91,11 @@ bool virtTestOOMActive(void)
return testOOMActive;
}
+static int virtTestUseTerminalColors(void)
+{
+ return isatty(STDIN_FILENO);
+}
+
static unsigned int
virTestGetFlag(const char *name)
{
@@ -217,11 +222,20 @@ virtTestRun(const char *title,
if (virTestGetVerbose()) {
if (ret == 0)
- fprintf(stderr, "OK\n");
+ if (virtTestUseTerminalColors())
+ fprintf(stderr, "\e[32mOK\e[0m\n"); /* green */
+ else
+ fprintf(stderr, "OK\n");
else if (ret == EXIT_AM_SKIP)
- fprintf(stderr, "SKIP\n");
+ if (virtTestUseTerminalColors())
+ fprintf(stderr, "\e[34m\e[1mSKIP\e[0m\n"); /* bold blue */
+ else
+ fprintf(stderr, "SKIP\n");
else
- fprintf(stderr, "FAILED\n");
+ if (virtTestUseTerminalColors())
+ fprintf(stderr, "\e[31m\e[1mFAILED\e[0m\n"); /* bold red */
+ else
+ fprintf(stderr, "FAILED\n");
} else {
if (testCounter != 1 &&
!((testCounter-1) % 40)) {
--
2.5.0
9 years, 1 month