[libvirt] [PATCH] Fix race condition reconnecting to vms & loading configs

From: "Daniel P. Berrange" <berrange@redhat.com> The following sequence 1. Define a persistent QMEU guest 2. Start the QEMU guest 3. Stop libvirtd 4. Kill the QEMU process 5. Start libvirtd 6. List persistent guets At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does 1. Load all VM state files 2. Spawn thread to reconnect to each VM 3. Load all VM config files Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list. The easy fix is to simply switch the order of steps 2 & 3. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- src/qemu/qemu_driver.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index c613967..9c3daad 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged, conn = virConnectOpen(cfg->uri); - qemuProcessReconnectAll(conn, qemu_driver); - /* Then inactive persistent configs */ if (virDomainObjListLoadAllConfigs(qemu_driver->domains, cfg->configDir, @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged, NULL, NULL) < 0) goto error; + qemuProcessReconnectAll(conn, qemu_driver); virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad, -- 1.8.3.1

On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
s/QMEU/QEMU
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
s/guets/guests
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
The fix seems reasonable, although I immediately wondered why "at some time" it was considered OK to reconnect before the persistent flag was set for inactive guests. The condition/fault initially described includes a host reboot in the processing. In that case (I would assume) the restart of the guest would occur if autostart was set. The external action of the kill()'ing of a guest outside of libvirtd's control results in some unknown/unpredictable state for the guest. Is there something in that initial load that could detect this condition better? I tried following the steps without the patch, but on my host the guest was listed after the restart - so yes a timing condition - but what causes that timing condition?

Would setting the dom->persistent before the virObjectUnlock(dom) in virDomainObjListLoadAllConfigs() change the results?

Beyond that keeping the virConnectOpen() and qemuProcessReconnectAll() "together" after the loading of the inactive persistent configs seems to keep code flow more normal. Whether that comes before or after the Snapshot/ManagedSave load is I suppose just an "implementation detail".

Also, other drivers follow the load running, reconnect, and load inactive/persistent configs. Should those have similar patches as well?

John
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
-    qemuProcessReconnectAll(conn, qemu_driver);
-
     /* Then inactive persistent configs */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
                                        NULL, NULL) < 0)
         goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad,

On Mon, Oct 28, 2013 at 09:17:16AM -0400, John Ferlan wrote:
On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
s/QMEU/QEMU
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
s/guets/guests
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
The fix seems reasonable, although I immediately wondered why "at some time" it was considered OK to reconnect before the persistent flag was set for inactive guests. The condition/fault initially described includes a host reboot in the processing. In that case (I would assume) the restart of the guest would occur if autostart was set. The external action of the kill()'ing of a guest outside of libvirtd's control results in some unknown/unpredictable state for the guest. Is there something in that initial load that could detect this condition better? I tried following the steps without the patch, but on my host the guest was listed after the restart - so yes a timing condition - but what causes that timing condition?
Would setting the dom->persistent before the virObjectUnlock(dom) in virDomainObjListLoadAllConfigs() change the results?
No, I tried that and it didn't help. Nonetheless, that should be fixed separately.
Beyond that keeping the virConnectOpen() and qemuProcessReconnectAll() "together" after the loading of the inactive persistent configs seems to keep code flow more normal. Whether that comes before or after the Snapshot/ManagedSave load is I suppose just an "implementation detail".
Also, other drivers follow the load running, reconnect, and load inactive/persistent configs. Should those have similar patches as well?
Historically it was safe because reconnecting to QEMU was serialized. At some point we switched to using a separate thread for reconnecting to QEMU which introduced a race. The other drivers are still serialized IIUC.

Daniel

On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
-    qemuProcessReconnectAll(conn, qemu_driver);
-
     /* Then inactive persistent configs */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
                                        NULL, NULL) < 0)
         goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad,
I tried testing this patch to see if it would fix:

https://bugzilla.redhat.com/show_bug.cgi?id=1015246

from current master I did:

git revert a924d9d083c215df6044387057c501d9aa338b96
reproduce the bug
git am <your-patch>

But the daemon won't even start up after your patch is built:

(gdb) bt
#0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
    cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
    opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
#1  0x00007fffdda1083d in qemuConnectMonitor (
    driver=driver@entry=0x7fffd419b840, vm=vm@entry=0x7fffd4211090,
    logfd=logfd@entry=-1) at qemu/qemu_process.c:1412
#2  0x00007fffdda1685a in qemuProcessReconnect (
    opaque=opaque@entry=0x7fffd422fef0) at qemu/qemu_process.c:3086
#3  0x00007ffff7528dce in virThreadHelper (data=<optimized out>)
    at util/virthreadpthread.c:161
#4  0x00007ffff4782f33 in start_thread (arg=0x7fffcb7fe700) at pthread_create.c:309
#5  0x00007ffff40a9ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The reverted commit is the fix for a simple crash that I used to reproduce 1015246. There might be multiple issues here but I don't have time to poke at it right now.

Thanks,
Cole

On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
-    qemuProcessReconnectAll(conn, qemu_driver);
-
     /* Then inactive persistent configs */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
                                        NULL, NULL) < 0)
         goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad,
I tried testing this patch to see if it would fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1015246
from current master I did:
git revert a924d9d083c215df6044387057c501d9aa338b96
reproduce the bug
git am <your-patch>
But the daemon won't even start up after your patch is built:
(gdb) bt
#0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
    cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
    opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
#1  0x00007fffdda1083d in qemuConnectMonitor (
    driver=driver@entry=0x7fffd419b840, vm=vm@entry=0x7fffd4211090,
    logfd=logfd@entry=-1) at qemu/qemu_process.c:1412
#2  0x00007fffdda1685a in qemuProcessReconnect (
    opaque=opaque@entry=0x7fffd422fef0) at qemu/qemu_process.c:3086
#3  0x00007ffff7528dce in virThreadHelper (data=<optimized out>)
    at util/virthreadpthread.c:161
#4  0x00007ffff4782f33 in start_thread (arg=0x7fffcb7fe700) at pthread_create.c:309
#5  0x00007ffff40a9ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
What is this trace showing ? or rather what is the error reported when it fails to start ?

Daniel

On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
-    qemuProcessReconnectAll(conn, qemu_driver);
-
     /* Then inactive persistent configs */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
                                        NULL, NULL) < 0)
         goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad,
I tried testing this patch to see if it would fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1015246
from current master I did:
git revert a924d9d083c215df6044387057c501d9aa338b96
reproduce the bug
git am <your-patch>
But the daemon won't even start up after your patch is built:
(gdb) bt
#0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
    cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
    opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
#1  0x00007fffdda1083d in qemuConnectMonitor (
    driver=driver@entry=0x7fffd419b840, vm=vm@entry=0x7fffd4211090,
    logfd=logfd@entry=-1) at qemu/qemu_process.c:1412
#2  0x00007fffdda1685a in qemuProcessReconnect (
    opaque=opaque@entry=0x7fffd422fef0) at qemu/qemu_process.c:3086
#3  0x00007ffff7528dce in virThreadHelper (data=<optimized out>)
    at util/virthreadpthread.c:161
#4  0x00007ffff4782f33 in start_thread (arg=0x7fffcb7fe700) at pthread_create.c:309
#5  0x00007ffff40a9ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
What is this trace showing ? or rather what is the error reported when it fails to start ?
Sorry for not being clear: The daemon crashes, that's the backtrace. - Cole

On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
-    qemuProcessReconnectAll(conn, qemu_driver);
-
     /* Then inactive persistent configs */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
                                        NULL, NULL) < 0)
         goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad,
I tried testing this patch to see if it would fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1015246
from current master I did:
git revert a924d9d083c215df6044387057c501d9aa338b96
reproduce the bug
git am <your-patch>
But the daemon won't even start up after your patch is built:
(gdb) bt
#0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
    cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
    opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
Sorry for not being clear: The daemon crashes, that's the backtrace.
Hmm config is NULL - does the state XML files not include the monitor info perhaps ?

Daniel

On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
-    qemuProcessReconnectAll(conn, qemu_driver);
-
     /* Then inactive persistent configs */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
                                        NULL, NULL) < 0)
         goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad,
I tried testing this patch to see if it would fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1015246
from current master I did:
git revert a924d9d083c215df6044387057c501d9aa338b96
reproduce the bug
git am <your-patch>
But the daemon won't even start up after your patch is built:
(gdb) bt
#0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
    cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
    opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
Sorry for not being clear: The daemon crashes, that's the backtrace.
Hmm config is NULL - does the state XML files not include the monitor info perhaps ?
I see:

pidfile for busted VM in /var/run/libvirt/qemu
nothing in /var/cache/libvirt/qemu
no state that I can see in /var/lib/libvirt/qemu

But I'm not sure where it's supposed to be stored.

FWIW reproducing this state was pretty simple: revert a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove all <video> and <graphics> devices, start the guest, libvirtd crashes.

Thanks,
Cole

On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
-    qemuProcessReconnectAll(conn, qemu_driver);
-
     /* Then inactive persistent configs */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
                                        NULL, NULL) < 0)
         goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad,
I tried testing this patch to see if it would fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1015246
from current master I did:
git revert a924d9d083c215df6044387057c501d9aa338b96
reproduce the bug
git am <your-patch>
But the daemon won't even start up after your patch is built:
(gdb) bt
#0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
    cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
    opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
Sorry for not being clear: The daemon crashes, that's the backtrace.
Hmm config is NULL - does the state XML files not include the monitor info perhaps ?
I see:
pidfile for busted VM in /var/run/libvirt/qemu
nothing in /var/cache/libvirt/qemu
no state that I can see in /var/lib/libvirt/qemu
But I'm not sure where it's supposed to be stored.
FWIW reproducing this state was pretty simple: revert a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove all <video> and <graphics> devices, start the guest, libvirtd crashes.
Ok, I believe you probably have SELinux disabled on your machine or in libvirtd. With SELinux enabled you hit another bug first

2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to set security context for monitor for rhel6x86_64

which prevents hitting the crash you report. The fix is the same in both cases - we must skip VMs with PID of zero. I've sent a v2 patch.

Daniel
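As a rough illustration of that guard - skipping domains that have no recorded PID, since there is no process or monitor socket to attach to - here is a simplified standalone sketch. It is not the actual v2 patch, and the names (struct domain, reconnect_all, reconnect_one) are hypothetical; it only shows the shape of the check.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified domain record; not libvirt's virDomainObj. */
struct domain {
    const char *name;
    long pid;                   /* 0 means the domain was never running */
};

static void reconnect_one(const struct domain *d)
{
    /* Stand-in for attaching to the monitor of a running QEMU process. */
    printf("reconnecting to %s (pid %ld)\n", d->name, d->pid);
}

/* The guard described above: skip domains with no recorded PID, since
 * there is nothing to reconnect to for a never-live guest. */
static void reconnect_all(const struct domain *doms, size_t ndoms)
{
    for (size_t i = 0; i < ndoms; i++) {
        if (doms[i].pid == 0)
            continue;
        reconnect_one(&doms[i]);
    }
}

int main(void)
{
    const struct domain doms[] = {
        { "running-guest", 4242 },
        { "never-started-guest", 0 },
    };
    reconnect_all(doms, sizeof(doms) / sizeof(doms[0]));
    return 0;
}

Per Daniel's comment above, the same skip covers both failure modes seen in this thread: the SELinux "failed to set security context" error and the NULL monitor-config crash.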

On 10/29/2013 10:25 AM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" <berrange@redhat.com>
>
> The following sequence
>
> 1. Define a persistent QMEU guest
> 2. Start the QEMU guest
> 3. Stop libvirtd
> 4. Kill the QEMU process
> 5. Start libvirtd
> 6. List persistent guets
>
> At the last step, the previously running persistent guest
> will be missing. This is because of a race condition in the
> QEMU driver startup code. It does
>
> 1. Load all VM state files
> 2. Spawn thread to reconnect to each VM
> 3. Load all VM config files
>
> Only at the end of step 3, does the 'virDomainObjPtr' get
> marked as "persistent". There is therefore a window where
> the thread reconnecting to the VM will remove the persistent
> VM from the list.
>
> The easy fix is to simply switch the order of steps 2 & 3.
>
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
>  src/qemu/qemu_driver.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index c613967..9c3daad 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
>
>      conn = virConnectOpen(cfg->uri);
>
> -    qemuProcessReconnectAll(conn, qemu_driver);
> -
>      /* Then inactive persistent configs */
>      if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
>                                         cfg->configDir,
> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
>                                         NULL, NULL) < 0)
>          goto error;
>
> +    qemuProcessReconnectAll(conn, qemu_driver);
>
>      virDomainObjListForEach(qemu_driver->domains,
>                              qemuDomainSnapshotLoad,
>
I tried testing this patch to see if it would fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1015246
from current master I did:
git revert a924d9d083c215df6044387057c501d9aa338b96
reproduce the bug
git am <your-patch>
But the daemon won't even start up after your patch is built:
(gdb) bt
#0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
    cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
    opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
Sorry for not being clear: The daemon crashes, that's the backtrace.
Hmm config is NULL - does the state XML files not include the monitor info perhaps ?
I see:
pidfile for busted VM in /var/run/libvirt/qemu
nothing in /var/cache/libvirt/qemu
no state that I can see in /var/lib/libvirt/qemu
But I'm not sure where it's supposed to be stored.
FWIW reproducing this state was pretty simple: revert a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove all <video> and <graphics> devices, start the guest, libvirtd crashes.
Ok, I believe you probably have SELinux disabled on your machine or in libvirtd. With SELinux enabled you hit another bug first
2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to set security context for monitor for rhel6x86_64
which prevents hitting the crash you report. The fix is the same in both cases - we must skip VMs with PID of zero. I've sent a v2 patch.
Hmm, selinux is permissive here but not disabled. But I'll try your patches and report back. Thanks, Cole

On 10/29/2013 11:22 AM, Cole Robinson wrote:
On 10/29/2013 10:25 AM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
>> From: "Daniel P. Berrange" <berrange@redhat.com>
>>
>> The following sequence
>>
>> 1. Define a persistent QMEU guest
>> 2. Start the QEMU guest
>> 3. Stop libvirtd
>> 4. Kill the QEMU process
>> 5. Start libvirtd
>> 6. List persistent guets
>>
>> At the last step, the previously running persistent guest
>> will be missing. This is because of a race condition in the
>> QEMU driver startup code. It does
>>
>> 1. Load all VM state files
>> 2. Spawn thread to reconnect to each VM
>> 3. Load all VM config files
>>
>> Only at the end of step 3, does the 'virDomainObjPtr' get
>> marked as "persistent". There is therefore a window where
>> the thread reconnecting to the VM will remove the persistent
>> VM from the list.
>>
>> The easy fix is to simply switch the order of steps 2 & 3.
>>
>> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
>> ---
>>  src/qemu/qemu_driver.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index c613967..9c3daad 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
>>
>>      conn = virConnectOpen(cfg->uri);
>>
>> -    qemuProcessReconnectAll(conn, qemu_driver);
>> -
>>      /* Then inactive persistent configs */
>>      if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
>>                                         cfg->configDir,
>> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
>>                                         NULL, NULL) < 0)
>>          goto error;
>>
>> +    qemuProcessReconnectAll(conn, qemu_driver);
>>
>>      virDomainObjListForEach(qemu_driver->domains,
>>                              qemuDomainSnapshotLoad,
>>
>
> I tried testing this patch to see if it would fix:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1015246
>
> from current master I did:
>
> git revert a924d9d083c215df6044387057c501d9aa338b96
> reproduce the bug
> git am <your-patch>
>
> But the daemon won't even start up after your patch is built:
>
> (gdb) bt
> #0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
>     cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
>     opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
Sorry for not being clear: The daemon crashes, that's the backtrace.
Hmm config is NULL - does the state XML files not include the monitor info perhaps ?
I see:
pidfile for busted VM in /var/run/libvirt/qemu
nothing in /var/cache/libvirt/qemu
no state that I can see in /var/lib/libvirt/qemu
But I'm not sure where it's supposed to be stored.
FWIW reproducing this state was pretty simple: revert a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove all <video> and <graphics> devices, start the guest, libvirtd crashes.
Ok, I believe you probably have SELinux disabled on your machine or in libvirtd. With SELinux enabled you hit another bug first
2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to set security context for monitor for rhel6x86_64
which prevents hitting the crash you report. The fix is the same in both cases - we must skip VMs with PID of zero. I've sent a v2 patch.
Hmm, selinux is permissive here but not disabled. But I'll try your patches and report back.
Applied both patches, the original bug report and the crash I reported here are both fixed. Thanks Dan! - Cole

On Thu, Oct 31, 2013 at 10:23:25AM -0400, Cole Robinson wrote:
On 10/29/2013 11:22 AM, Cole Robinson wrote:
On 10/29/2013 10:25 AM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:22:39PM -0400, Cole Robinson wrote:
On 10/28/2013 01:14 PM, Daniel P. Berrange wrote:
On Mon, Oct 28, 2013 at 01:08:45PM -0400, Cole Robinson wrote:
On 10/28/2013 01:06 PM, Daniel P. Berrange wrote:
> On Mon, Oct 28, 2013 at 01:03:49PM -0400, Cole Robinson wrote:
>> On 10/28/2013 07:52 AM, Daniel P. Berrange wrote:
>>> From: "Daniel P. Berrange" <berrange@redhat.com>
>>>
>>> The following sequence
>>>
>>> 1. Define a persistent QMEU guest
>>> 2. Start the QEMU guest
>>> 3. Stop libvirtd
>>> 4. Kill the QEMU process
>>> 5. Start libvirtd
>>> 6. List persistent guets
>>>
>>> At the last step, the previously running persistent guest
>>> will be missing. This is because of a race condition in the
>>> QEMU driver startup code. It does
>>>
>>> 1. Load all VM state files
>>> 2. Spawn thread to reconnect to each VM
>>> 3. Load all VM config files
>>>
>>> Only at the end of step 3, does the 'virDomainObjPtr' get
>>> marked as "persistent". There is therefore a window where
>>> the thread reconnecting to the VM will remove the persistent
>>> VM from the list.
>>>
>>> The easy fix is to simply switch the order of steps 2 & 3.
>>>
>>> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
>>> ---
>>>  src/qemu/qemu_driver.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>> index c613967..9c3daad 100644
>>> --- a/src/qemu/qemu_driver.c
>>> +++ b/src/qemu/qemu_driver.c
>>> @@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
>>>
>>>      conn = virConnectOpen(cfg->uri);
>>>
>>> -    qemuProcessReconnectAll(conn, qemu_driver);
>>> -
>>>      /* Then inactive persistent configs */
>>>      if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
>>>                                         cfg->configDir,
>>> @@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
>>>                                         NULL, NULL) < 0)
>>>          goto error;
>>>
>>> +    qemuProcessReconnectAll(conn, qemu_driver);
>>>
>>>      virDomainObjListForEach(qemu_driver->domains,
>>>                              qemuDomainSnapshotLoad,
>>>
>>
>> I tried testing this patch to see if it would fix:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1015246
>>
>> from current master I did:
>>
>> git revert a924d9d083c215df6044387057c501d9aa338b96
>> reproduce the bug
>> git am <your-patch>
>>
>> But the daemon won't even start up after your patch is built:
>>
>> (gdb) bt
>> #0  qemuMonitorOpen (vm=vm@entry=0x7fffd4211090, config=0x0, json=false,
>>     cb=cb@entry=0x7fffddcae720 <monitorCallbacks>,
>>     opaque=opaque@entry=0x7fffd419b840) at qemu/qemu_monitor.c:852
Sorry for not being clear: The daemon crashes, that's the backtrace.
Hmm config is NULL - does the state XML files not include the monitor info perhaps ?
I see:
pidfile for busted VM in /var/run/libvirt/qemu
nothing in /var/cache/libvirt/qemu
no state that I can see in /var/lib/libvirt/qemu
But I'm not sure where it's supposed to be stored.
FWIW reproducing this state was pretty simple: revert a924d9d083c215df6044387057c501d9aa338b96, edit an existing x86 guest to remove all <video> and <graphics> devices, start the guest, libvirtd crashes.
Ok, I believe you probably have SELinux disabled on your machine or in libvirtd. With SELinux enabled you hit another bug first
2013-10-29 13:50:11.711+0000: 17579: error : qemuConnectMonitor:1401 : Failed to set security context for monitor for rhel6x86_64
which prevents hitting the crash you report. The fix is the same in both cases - we must skip VMs with PID of zero. I've sent a v2 patch.
Hmm, selinux is permissive here but not disabled. But I'll try your patches and report back.
Applied both patches, the original bug report and the crash I reported here are both fixed. Thanks Dan!
Cool, thanks for confirming.

Daniel

On Mon, Oct 28, 2013 at 11:52:50AM +0000, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
The following sequence
1. Define a persistent QMEU guest
2. Start the QEMU guest
3. Stop libvirtd
4. Kill the QEMU process
5. Start libvirtd
6. List persistent guets
At the last step, the previously running persistent guest will be missing. This is because of a race condition in the QEMU driver startup code. It does
1. Load all VM state files
2. Spawn thread to reconnect to each VM
3. Load all VM config files
Only at the end of step 3, does the 'virDomainObjPtr' get marked as "persistent". There is therefore a window where the thread reconnecting to the VM will remove the persistent VM from the list.
The easy fix is to simply switch the order of steps 2 & 3.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index c613967..9c3daad 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -816,8 +816,6 @@ qemuStateInitialize(bool privileged,
conn = virConnectOpen(cfg->uri);
-    qemuProcessReconnectAll(conn, qemu_driver);
-
     /* Then inactive persistent configs */
     if (virDomainObjListLoadAllConfigs(qemu_driver->domains,
                                        cfg->configDir,
@@ -828,6 +826,7 @@ qemuStateInitialize(bool privileged,
                                        NULL, NULL) < 0)
        goto error;
+ qemuProcessReconnectAll(conn, qemu_driver);
virDomainObjListForEach(qemu_driver->domains, qemuDomainSnapshotLoad,
Self-NACK. The qemuProcessReconnectAll() method assumes that *only* live configs have been loaded. Reordering this makes it try to connect to all VMs, even those which were never live.

Daniel
participants (3):
- Cole Robinson
- Daniel P. Berrange
- John Ferlan