On 10/29/2013 11:22 AM, Cole Robinson wrote:
> On 10/29/2013 10:25 AM, Daniel P. Berrange wrote:
>> On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
>>> On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
>>>> On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
>>>>> On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
>>>>>> On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
>>>>>>> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
>>>>>>>> From: "Daniel P. Berrange" <berrange(a)redhat.com>
>>>>>>>>
>>>>>>>> The following sequence
>>>>>>>>
>>>>>>>> 1. Define a persistent QEMU guest
>>>>>>>> 2. Start the QEMU guest
>>>>>>>> 3. Stop libvirtd
>>>>>>>> 4. Kill the QEMU process
>>>>>>>> 5. Start libvirtd
>>>>>>>> 6. List persistent guests
>>>>>>>>
>>>>>>>> At the last step, the previously running persistent guest
>>>>>>>> will be missing. This is because of a race condition in the
>>>>>>>> QEMU driver startup code. It does
>>>>>>>>
>>>>>>>> 1. Load all VM state files
>>>>>>>> 2. Spawn thread to reconnect to each VM
>>>>>>>> 3. Load all VM config files
>>>>>>>>
>>>>>>>> Only at the end of step 3 does the 'virDomainObjPtr' get
>>>>>>>> marked as "persistent". There is therefore a window where
>>>>>>>> the thread reconnecting to the VM will remove the persistent
>>>>>>>> VM from the list.
>>>>>>>>
>>>>>>>> The easy fix is to simply switch the order of steps 2 & 3.
>>>>>>>>
>>>>>>>> Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
>>>>>>>> ---
>>>>>>>> src/qemu/qemu_driver.c | 3 +--
>>>>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>>>>> index c613967..9c3daad 100644
>>>>>>>> --- a/src/qemu/qemu_driver.c
>>>>>>>> +++ b/src/qemu/qemu_driver.c
>>>>>>>> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
>>>>>>>>
>>>>>>>>      conn = virConnectOpen(cfg->uri);
>>>>>>>>
>>>>>>>> -    qemuProcessReconnectAll(conn, qemu_driver);
>>>>>>>> -
>>>>>>>>      /* Then inactive persistent configs */
>>>>>>>>      if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
>>>>>>>>                                         cfg->configDir,
>>>>>>>> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
>>>>>>>>                                         NULL, NULL) < 0)
>>>>>>>>          goto error;
>>>>>>>>
>>>>>>>> +    qemuProcessReconnectAll(conn, qemu_driver);
>>>>>>>>
>>>>>>>>      virDomainObjListForEach(qemu_driver->domains,
>>>>>>>>                              qemuDomainSnapshotLoad,
>>>>>>>>
>>>>>>>
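For context: the removal described above happens in the reconnect
thread's failure path, which drops any domain not yet marked persistent.
Roughly like the sketch below - simplified, not the literal libvirt code,
with the helper and field names written from memory:

    /* Sketch of the reconnect failure path: if re-attaching to the
     * QEMU process fails, the domain is stopped and then removed from
     * the list unless it is marked persistent. Before this patch the
     * config files had not been loaded yet, so a persistent guest
     * could still look transient here and be removed. */
    static void
    qemuProcessReconnectFailure(virQEMUDriverPtr driver,
                                virDomainObjPtr vm)
    {
        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, 0);
        if (!vm->persistent)
            qemuDomainRemoveInactive(driver, vm); /* the race window */
    }
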
>>>>>>> I tried testing this patch to see if it would fix:
>>>>>>>
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1015246
>>>>>>>
>>>>>>> from current master I did:
>>>>>>>
>>>>>>> git revert a924d9d083c215df6044387057c501d9aa338b96
>>>>>>> reproduce the bug
>>>>>>> git am <your-patch>
>>>>>>>
>>>>>>> But the daemon won't even start up after your patch is built:
>>>>>>>
>>>>>>> (gdb) bt
>>>>>>> #0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
>>>>>>>     cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
>>>>>>>     opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
>>>>
>>>>> Sorry for not being clear: the daemon crashes; that's the backtrace.
>>>>
>>>> Hmm, config is NULL - do the state XML files not include the
>>>> monitor info, perhaps?
>>>>
>>>
>>> I see:
>>>
>>> pidfile for busted VM in /var/run/libvirt/qemu
>>> nothing in /var/cache/libvirt/qemu
>>> no state that I can see in /var/lib/libvirt/qemu
>>>
>>> But I'm not sure where it's supposed to be stored.
>>>
>>> FWIW reproducing this state was pretty simple: revert
>>> a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to
>>> remove all <video> and <graphics> devices, start the guest, libvirtd crashes.
>>
>> Ok, I believe you probably have SELinux disabled on your machine or in
>> libvirtd. With SELinux enabled, you hit another bug first:
>>
>> 2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to
>> set security context for monitor for rhel6x86_64
>>
>>
>> which prevents hitting the crash you report. The fix is the same in both
>> cases - we must skip VMs with a PID of zero. I've sent a v2 patch.
>>
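If I follow the v2 approach right, it presumably amounts to a guard along
these lines in the reconnect loop - my sketch, not Dan's actual patch; the
'pid' field and VIR_DEBUG usage are assumptions from memory of the tree:

    /* Sketch: skip any domain whose saved state carries no live QEMU
     * PID. The process is already gone, so there is no monitor socket
     * to re-attach to; trying anyway either crashes on the NULL
     * monitor config or fails relabelling under SELinux. */
    if (obj->pid == 0) {
        VIR_DEBUG("Skipping domain '%s' with no PID", obj->def->name);
        continue;
    }
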
>
> Hmm, SELinux is permissive here, not disabled. But I'll try your patches
> and report back.
>
Applied both patches; the original bug report and the crash I reported here
are both fixed. Thanks, Dan!