On 2012-04-24 03:47, Guido Günther wrote:
>Hi,
>On Sun, Apr 22, 2012 at 02:41:54PM -0400, Jim Paris wrote:
>>Hi,
>>
>>http://bugs.debian.org/663931 is a bug I'm hitting, where virt-manager
>>times out on the initial connection to libvirt.
>
>I reassigned the bug back to libvirt. I still wonder why this is
>triggered for some users but not for others.
>Cheers,
> -- Guido
>
>>
>>The basic problem is that, while checking storage volumes,
>>virt-manager causes libvirt to call "udevadm settle". There's an
>>interaction where libvirt's earlier use of network namespaces (to probe
>>LXC features) had caused some uevents to be sent that get filtered out
>>before they reach udev. This confuses "udevadm settle" a bit, and so
>>it sits there waiting for a 2-3 minute built-in timeout before returning.
>>Eventually libvirtd prints:
>> 2012-04-22 18:22:18.678+0000: 30503: warning : virKeepAliveTimer:182 : No response from client 0x7feec4003630 after 5 keepalive messages in 30 seconds
>>and virt-manager prints:
>> 2012-04-22 18:22:18.931+0000: 30647: warning : virKeepAliveSend:128 : Failed to send keepalive response to client 0x25004e0
>>and the connection gets dropped.
>>
>>One workaround could be to specify a shorter timeout when doing the
>>settle. The patch appended below allows virt-manager to work,
>>although the connection still has to wait for the 10 second timeout
>>before it succeeds. I don't know what a better solution would be,
>>though. It seems the udevadm behavior might not be considered a bug
>>from the udev/kernel point of view:
>>
>> https://lkml.org/lkml/2012/4/22/60
>>
>>I'm using Linux 3.2.14 with libvirt 0.9.11. You can trigger the
>>udevadm issue using a program I posted at the Debian bug report link
>>above.
>>
>>-jim
>>
>>> From 17e5b9ebab76acb0d711e8bc308023372fbc4180 Mon Sep 17 00:00:00 2001
>>From: Jim Paris<jim(a)jtan.com>
>>Date: Sun, 22 Apr 2012 14:35:47 -0400
>>Subject: [PATCH] shorten udevadmin settle timeout
>>
>>Otherwise, udevadmin settle can take so long that connections from
>>e.g. virt-manager will get closed.
>>---
>> src/util/util.c | 4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>>
>>diff --git a/src/util/util.c b/src/util/util.c
>>index 6e041d6..dfe458e 100644
>>--- a/src/util/util.c
>>+++ b/src/util/util.c
>>@@ -2593,9 +2593,9 @@ virFileFindMountPoint(const char *type ATTRIBUTE_UNUSED)
>> void virFileWaitForDevices(void)
>> {
>> # ifdef UDEVADM
>>- const char *const settleprog[] = { UDEVADM, "settle", NULL };
>>+ const char *const settleprog[] = { UDEVADM, "settle", "--timeout", "10", NULL };
Though I don't have a good idea to fix it either, I guess this
change could cause "lvremove" to fail again for the udev race.
See BZs:
https://bugzilla.redhat.com/show_bug.cgi?id=702260
https://bugzilla.redhat.com/show_bug.cgi?id=570359
It seems that those bugs were caused by something like

  1. open(lv, O_RDWR)
  2. close(lv)
  3. system("lvremove ...")

where udev would fire off a command between 2 and 3 that caused 3 to
fail. Adding "udevadm settle" as step 2.5 is a good way to wait for
that command to finish, but:

- it doesn't necessarily fix the issue; something could easily re-open
  the device between 2.5 and 3 and cause the same failure.
- the race condition sounds like it was a short window, and sometimes
  the original sequence would still work even without the settle.

That would suggest to me that a timeout of 10s is still plenty long.

A few thoughts:

- For lvremove: can we try a short timeout (3 seconds), then if the
  lvremove still fails, try again with the default udevadm timeout
  (120 seconds)?
- Even in that case, we need to fix libvirtd to not kill the
  connection after 30 seconds when it's libvirtd's fault that the
  connection is blocked for so long anyway.
- When connecting with virt-manager, is the udevadm settle really
  necessary? We're not calling lvremove.
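
To make the first idea concrete, here is a rough, untested-against-libvirt sketch of the "short settle first, default settle on retry" policy. None of these names exist in libvirt: build_settle_argv(), settle_timeout_for_attempt(), remove_lv_with_retry() and the two callback types are made up for illustration, and the 3 second value is just the number suggested above. A timeout of 0 here means "pass no --timeout and let udevadm use its default".

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback types: run "udevadm settle" and "lvremove".
 * Both return 0 on success, non-zero on failure. */
typedef int (*settle_fn)(int timeout_secs, void *opaque);
typedef int (*remove_fn)(const char *lv, void *opaque);

/* Build argv for "udevadm settle", adding --timeout=N only when
 * timeout_secs > 0. Returns the argument count (excluding the
 * terminating NULL), or -1 if buf is too small for the option. */
int build_settle_argv(const char *udevadm, int timeout_secs,
                      char *buf, size_t buflen, const char *argv[4])
{
    int n = 0;

    argv[n++] = udevadm;
    argv[n++] = "settle";
    if (timeout_secs > 0) {
        int len = snprintf(buf, buflen, "--timeout=%d", timeout_secs);
        if (len < 0 || (size_t)len >= buflen)
            return -1;
        argv[n++] = buf;
    }
    argv[n] = NULL;
    return n;
}

/* Retry policy: attempt 0 uses a short 3 s settle, attempt 1 uses
 * udevadm's default (returned as 0), later attempts give up (-1). */
int settle_timeout_for_attempt(int attempt)
{
    switch (attempt) {
    case 0:  return 3;
    case 1:  return 0;
    default: return -1;
    }
}

/* Settle, then try the removal; on failure settle again with the
 * next timeout and retry. Returns the result of the last removal. */
int remove_lv_with_retry(const char *lv, settle_fn settle,
                         remove_fn remove_lv, void *opaque)
{
    int rc = -1;

    for (int attempt = 0; ; attempt++) {
        int timeout = settle_timeout_for_attempt(attempt);
        if (timeout < 0)
            break;
        /* A settle failure or timeout is not fatal here; the removal
         * attempt that follows decides whether we retry. */
        settle(timeout, opaque);
        rc = remove_lv(lv, opaque);
        if (rc == 0)
            break;
    }
    return rc;
}
```

The callbacks keep the policy separate from how the commands are actually spawned, so the common case only pays the short timeout and the long wait is reserved for the case where lvremove has already failed once.
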
Thanks,
-jim