
On Monday, 2 August 2021 14:30:05 CEST Peter Krempa wrote:
On Mon, Aug 02, 2021 at 14:20:44 +0200, Vojtech Juranek wrote:
Hi, as a follow-up to BZ #1883399 [1], we are reviewing the vdsm VM migration flows and fixing a few follow-up bugs, e.g. BZ #1981079 [2]. I have a couple of questions related to libvirt:
* If we run a disk extend during migration, the migration can finish sooner than the disk extend. In that case we try to set the disk threshold on an already stopped VM (we handle the libvirt event that the VM was stopped, but due to the Python GIL there can be a delay between receiving the signal from libvirt and handling it), and we get VIR_ERR_OPERATION_INVALID from libvirt when setting the disk threshold.
Actually, I was wrong here: the issue is caused by a delay in the libvirt setBlockThreshold() call. From the vdsm log:

2021-08-02 09:06:01,918-0400 WARN (mailbox-hsm/3) [virt.vm] (vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') setting theshold using dom <vdsm.virt.virdomain.Notifying object at 0x7fd06610df28> (drivemonitor:122)
[...]
2021-08-02 09:06:03,967-0400 WARN (libvirt/events) [virt.vm] (vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') libvirt event Stopped detail 3 opaque None (vm:5657)
[...]
2021-08-02 09:06:03,969-0400 WARN (mailbox-hsm/3) [virt.vm] (vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') Domain not connected, skipping set block threshold for drive 'sdc' (drivemonitor:133)

So it took about 2 seconds for the libvirt setBlockThreshold() call to return; in the meantime the migration finished, and we got the VIR_ERR_OPERATION_INVALID error from the setBlockThreshold() call. What is the reason for this delay? Is this operation intentionally delayed until the migration finishes? I posted the relevant libvirt debug log at https://pastebin.com/YkdKYKM5
Is it safe to catch this exception and ignore it, or is it thrown for various reasons, so that the root cause could be something other than a stopped VM?
The API to set the block threshold level can return the following errors, listed with the cases in which each can happen:
VIR_ERR_OPERATION_UNSUPPORTED <- unlikely, new qemu supports it
VIR_ERR_INVALID_ARG           <- disk was not found in VM definition
VIR_ERR_INTERNAL_ERROR        <- on error from qemu
Thus VIR_ERR_OPERATION_INVALID seems safe to ignore in your specific case, while still reporting the other errors lets you catch real problems.
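
The advice above could be sketched in Python roughly as follows. This is only an illustration under assumptions, not vdsm's actual code: the helper name set_threshold_ignoring_stopped_vm and its return convention are made up for this example, and the ImportError fallback exists only so the snippet runs where the libvirt-python bindings are not installed (its numeric code is a placeholder). It relies on setBlockThreshold() and libvirtError.get_error_code() from libvirt-python.

```python
try:
    import libvirt  # libvirt-python bindings
except ImportError:
    # Stand-in so the sketch runs without the bindings installed.
    # The numeric code below is a placeholder for this stub only.
    import types

    class _StubError(Exception):
        def get_error_code(self):
            return None

    libvirt = types.SimpleNamespace(
        libvirtError=_StubError,
        VIR_ERR_OPERATION_INVALID=55,
    )


def set_threshold_ignoring_stopped_vm(dom, drive_name, threshold):
    """Hypothetical helper: set the block threshold on a drive.

    Returns True if the threshold was set, and False if libvirt rejected
    the call with VIR_ERR_OPERATION_INVALID (for example because the VM
    stopped after a finished migration). Any other libvirt error is
    re-raised so that real problems stay visible.
    """
    try:
        dom.setBlockThreshold(drive_name, threshold)
    except libvirt.libvirtError as e:
        if e.get_error_code() == libvirt.VIR_ERR_OPERATION_INVALID:
            return False
        raise
    return True
```

Filtering on the error code, rather than swallowing every libvirtError, keeps VIR_ERR_INVALID_ARG and VIR_ERR_INTERNAL_ERROR visible to the caller, which matches the distinction drawn in the list of errors above.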
Thanks for your answer.