On Mon, Aug 02, 2021 at 15:34:52 +0200, Vojtech Juranek wrote:
On Monday, 2 August 2021 14:30:05 CEST Peter Krempa wrote:
> On Mon, Aug 02, 2021 at 14:20:44 +0200, Vojtech Juranek wrote:
> > Hi,
> > as a follow-up of BZ #1883399 [1], we are reviewing vdsm VM migration
> > flows and solving a few follow-up bugs, e.g. BZ #1981079 [2]. I have a
> > couple of questions related to libvirt:
> >
> > * if we run disk extend during migration, it can happen that migration
> > finishes sooner than disk extend. In such a case we will try to set the
> > disk threshold on an already stopped VM (we handle the libvirt event that
> > the VM was stopped, but due to the Python GIL there can be a delay between
> > obtaining the appropriate signal from libvirt and handling it). In such a
> > case we get libvirt VIR_ERR_OPERATION_INVALID when setting the disk
> > threshold.
Actually, I was wrong here: the issue is caused by a delay in the libvirt
setBlockThreshold() call. From the vdsm log:
2021-08-02 09:06:01,918-0400 WARN (mailbox-hsm/3) [virt.vm]
(vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') setting theshold using dom
<vdsm.virt.virdomain.Notifying object at 0x7fd06610df28> (drivemonitor:122)
[...]
2021-08-02 09:06:03,967-0400 WARN (libvirt/events) [virt.vm]
(vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') libvirt event Stopped detail 3
opaque None (vm:5657)
[...]
2021-08-02 09:06:03,969-0400 WARN (mailbox-hsm/3) [virt.vm]
(vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') Domain not connected, skipping set
block threshold for drive 'sdc' (drivemonitor:133)
So it took about 2 seconds for the libvirt setBlockThreshold() call to return,
and in the meantime the migration finished and we got a
VIR_ERR_OPERATION_INVALID error from the setBlockThreshold() call.
What is the reason for this delay? Is this operation intentionally delayed until
migration finishes?
Actually, qemuDomainSetBlockThreshold, which is the backend for
virDomainSetBlockThreshold, requires a QEMU_JOB_MODIFY job on the domain,
so the threshold can't even be set _during_ migration.
What happens is that the API call waits until it can obtain the MODIFY
job, which is possible only after the migration has finished, so the call
is always serialized after the migration.
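
The serialization described above can be illustrated with a toy model in
plain Python. This is only a sketch and does not use libvirt: ToyDomain,
OperationInvalid and demo() are hypothetical names invented for the
illustration, and a single threading.Lock stands in for the per-domain job,
which is a large simplification of libvirt's real job handling. It shows only
why the caller blocks for the rest of the migration and then gets an
"operation invalid" style error.

```python
import threading
import time


class OperationInvalid(Exception):
    """Stand-in for a libvirt error carrying VIR_ERR_OPERATION_INVALID."""


class ToyDomain:
    """Toy model: one lock plays the role of the per-domain job."""

    def __init__(self):
        self._job = threading.Lock()
        self.running = True

    def migrate(self, duration, started=None):
        # The migration holds the job for its whole duration.
        with self._job:
            if started is not None:
                started.set()  # tell the caller the job is now taken
            time.sleep(duration)
            # In this model the source domain is stopped once migration ends.
            self.running = False

    def set_block_threshold(self, dev, threshold):
        # Needs the same job, so it blocks until the migration releases it.
        with self._job:
            if not self.running:
                raise OperationInvalid("domain is not running")
            return (dev, threshold)


def demo(migration_seconds=0.2):
    """Start a migration, then try to set a threshold while it is running."""
    dom = ToyDomain()
    started = threading.Event()
    worker = threading.Thread(target=dom.migrate,
                              args=(migration_seconds, started))
    worker.start()
    started.wait()  # make sure the migration already owns the job

    begin = time.monotonic()
    try:
        dom.set_block_threshold("sdc", 1024)
        outcome = "ok"
    except OperationInvalid:
        outcome = "operation invalid"
    waited = time.monotonic() - begin

    worker.join()
    return outcome, waited


if __name__ == "__main__":
    print(demo())
```

In this model the threshold call never fails quickly during a migration: it
either runs before the migration takes the job, or it waits and then fails
because the domain is already gone. A caller that sees both a long delay and
the error is therefore observing one effect, not two separate problems.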