On 10/29/2013 10:25 AM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
> On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
>> On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
>>> On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
>>>> On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
>>>>> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
>>>>>> From: "Daniel P. Berrange" <berrange(a)redhat.com>
>>>>>>
>>>>>> The following sequence
>>>>>>
>>>>>> 1. Define a persistent QEMU guest
>>>>>> 2. Start the QEMU guest
>>>>>> 3. Stop libvirtd
>>>>>> 4. Kill the QEMU process
>>>>>> 5. Start libvirtd
>>>>>> 6. List persistent guests
>>>>>>
>>>>>> At the last step, the previously running persistent guest
>>>>>> will be missing. This is because of a race condition in the
>>>>>> QEMU driver startup code. It does
>>>>>>
>>>>>> 1. Load all VM state files
>>>>>> 2. Spawn thread to reconnect to each VM
>>>>>> 3. Load all VM config files
>>>>>>
>>>>>> Only at the end of step 3, does the 'virDomainObjPtr' get
>>>>>> marked as "persistent". There is therefore a window where
>>>>>> the thread reconnecting to the VM will remove the persistent
>>>>>> VM from the list.
>>>>>>
>>>>>> The easy fix is to simply switch the order of steps 2 & 3.
>>>>>>
>>>>>> Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
>>>>>> ---
>>>>>> src/qemu/qemu_driver.c | 3 +--
>>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>>> index c613967..9c3daad 100644
>>>>>> --- a/src/qemu/qemu_driver.c
>>>>>> +++ b/src/qemu/qemu_driver.c
>>>>>> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
>>>>>>
>>>>>> conn = virConnectOpen(cfg->uri);
>>>>>>
>>>>>> - qemuProcessReconnectAll(conn, qemu_driver);
>>>>>> -
>>>>>> /* Then inactive persistent configs */
>>>>>> if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
>>>>>> cfg->configDir,
>>>>>> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
>>>>>> NULL, NULL) < 0)
>>>>>> goto error;
>>>>>>
>>>>>> + qemuProcessReconnectAll(conn, qemu_driver);
>>>>>>
>>>>>> virDomainObjListForEach(qemu_driver->domains,
>>>>>> qemuDomainSnapshotLoad,
>>>>>>
>>>>>
>>>>> I tried testing this patch to see if it would fix:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1015246
>>>>>
>>>>> from current master I did:
>>>>>
>>>>> git revert a924d9d083c215df6044387057c501d9aa338b96
>>>>> reproduce the bug
>>>>> git am <your-patch>
>>>>>
>>>>> But the daemon won't even start up after your patch is built:
>>>>>
>>>>> (gdb) bt
>>>>> #0 qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
>>>>> cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
>>>>> opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
>>
>>> Sorry for not being clear: The daemon crashes, that's the backtrace.
>>
>> Hmm, config is NULL - do the state XML files not include the
>> monitor info perhaps ?
>>
>
> I see:
>
> pidfile for busted VM in /var/run/libvirt/qemu
> nothing in /var/cache/libvirt/qemu
> no state that I can see in /var/lib/libvirt/qemu
>
> But I'm not sure where it's supposed to be stored.
>
> FWIW reproducing this state was pretty simple: revert
> a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove
> all <video> and <graphics> devices, start the guest, libvirtd crashes.

Ok, I believe you probably have SELinux disabled on your machine or in
libvirtd. With SELinux enabled you hit another bug first

  2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to set security context for monitor for rhel6x86_64

which prevents hitting the crash you report. The fix is the same in both
cases - we must skip VMs with PID of zero. I've sent a v2 patch.

Hmm, SELinux is permissive here but not disabled. But I'll try your patches
and report back.
Thanks,
Cole
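
For anyone reading this thread later, here is a rough, self-contained toy model in C of the ordering problem described in the commit message and of the "skip VMs with PID of zero" idea from the last reply. None of this is libvirt code: every toy_* name, the struct layout and the exact interleaving are made up for illustration. In particular, the model assumes that the config loader already holds the domain object when the reconnect step drops it, and that the PID-zero domains are inactive ones that appear in the list once configs are loaded first. The real driver does the reconnect in a thread; here the steps are simply run in a fixed order so the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DOMAINS 8

/* Hypothetical stand-in for a domain object; not libvirt's virDomainObj. */
struct toy_domain {
    char name[32];
    int pid;          /* 0 means no known QEMU process */
    bool alive;       /* does the process behind pid still exist? */
    bool persistent;  /* set only when config loading has finished */
    bool listed;      /* still in the driver's domain list */
};

struct toy_driver {
    struct toy_domain doms[TOY_MAX_DOMAINS];
    int ndoms;
};

/* Return the listed domain with this name, adding a fresh one if needed. */
static struct toy_domain *toy_lookup_or_add(struct toy_driver *drv,
                                            const char *name)
{
    for (int i = 0; i < drv->ndoms; i++) {
        if (drv->doms[i].listed && strcmp(drv->doms[i].name, name) == 0)
            return &drv->doms[i];
    }
    assert(drv->ndoms < TOY_MAX_DOMAINS);
    struct toy_domain *d = &drv->doms[drv->ndoms++];
    memset(d, 0, sizeof(*d));
    snprintf(d->name, sizeof(d->name), "%s", name);
    d->listed = true;
    return d;
}

/* "Load all VM state files": records the last known PID. */
static void toy_load_state(struct toy_driver *drv, const char *name,
                           int pid, bool alive)
{
    struct toy_domain *d = toy_lookup_or_add(drv, name);
    d->pid = pid;
    d->alive = alive;
}

/* "Reconnect to each VM". A dead process makes the domain inactive, and a
 * domain not (yet) marked persistent is dropped from the list. Returns how
 * many PID-zero domains it tried to reconnect to, which stands in for the
 * crash seen in the backtrace. */
static int toy_reconnect_all(struct toy_driver *drv, bool skip_pid_zero)
{
    int pid_zero_attempts = 0;
    for (int i = 0; i < drv->ndoms; i++) {
        struct toy_domain *d = &drv->doms[i];
        if (!d->listed)
            continue;
        if (d->pid == 0) {
            if (!skip_pid_zero)
                pid_zero_attempts++;
            continue;
        }
        if (d->alive)
            continue;              /* reconnected fine */
        d->pid = 0;                /* process is gone */
        if (!d->persistent)
            d->listed = false;     /* looks transient, so remove it */
    }
    return pid_zero_attempts;
}

static bool toy_is_listed(const struct toy_driver *drv, const char *name)
{
    for (int i = 0; i < drv->ndoms; i++) {
        if (drv->doms[i].listed && strcmp(drv->doms[i].name, name) == 0)
            return true;
    }
    return false;
}

/* Replay the scenario: guest was running, QEMU got killed, daemon restarts.
 * Returns whether the guest is still listed afterwards. */
static bool toy_startup(bool reconnect_after_configs)
{
    struct toy_driver drv = {0};
    toy_load_state(&drv, "guest", 1234, false);

    struct toy_domain *d = toy_lookup_or_add(&drv, "guest"); /* config load begins */
    if (!reconnect_after_configs)
        toy_reconnect_all(&drv, true);   /* runs inside the window */
    d->persistent = true;                /* config load ends */
    if (reconnect_after_configs)
        toy_reconnect_all(&drv, true);

    return toy_is_listed(&drv, "guest");
}

/* With configs loaded first, an inactive persistent domain (PID 0) is in
 * the list when reconnect runs. Returns the number of bogus attempts. */
static int toy_pid_zero_attempts(bool skip_pid_zero)
{
    struct toy_driver drv = {0};
    struct toy_domain *d = toy_lookup_or_add(&drv, "inactive");
    d->persistent = true;
    return toy_reconnect_all(&drv, skip_pid_zero);
}
```

In this model, toy_startup(false) loses the guest and toy_startup(true) keeps it as an inactive persistent domain, which matches the behaviour the commit message describes. toy_pid_zero_attempts() shows why moving the reconnect later would need the PID-zero check: without it, the reconnect step touches domains that never had a running process.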