On Tue, Apr 15, 2014 at 07:48:52PM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=789766
This patch addresses a possible misperception that libvirt is hung
during pool startup/refresh, as well as during virt-manager startup,
since virt-manager needs to start/refresh the storage pools it finds.
The problem is that the 'udevadm settle' command will wait for a
default of two minutes prior to returning a failure. For code paths
such as pool starts or refreshes the timeout doesn't necessarily cause
failures - it may just not find anything. Additionally, since no error
message or any other indication is provided, blame is placed on libvirt
for the problem; however, use of debug tools shows libvirt waiting in
udevadm. NB: This timeout may be longer for iSCSI and SCSI pools since
both seem to run through settle paths twice under certain conditions.
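For reference, the wait happens inside virFileWaitForDevices(); stripped
to its essentials (a simplified sketch, not the exact source), the
pre-patch logic is roughly:

void
virFileWaitForDevices(void)
{
#ifdef UDEVADM
    virCommandPtr cmd = virCommandNewArgList(UDEVADM, "settle", NULL);
    int exitstatus;

    /* No --timeout is passed, so udevadm's built-in two minute default
     * applies; the exit status is collected but never surfaced, which
     * is why a slow settle looks like a libvirt hang. */
    virCommandRun(cmd, &exitstatus);
    virCommandFree(cmd);
#endif
}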
Based on feedback described here:
http://lists.freedesktop.org/archives/systemd-devel/2013-July/011845.html
this timeout is part of the architecture of udevadm settle. Even when
the two-minute timeout expires, the code doesn't check the exit status;
callers can decide to continue regardless of whether settle is still
handling events.
FWIW, Kay has just done a fix for the timeout problems we're hitting
with udev when containers are in use
commit 9ea28c55a2488e6cd4a44ac5786f12b71ad5bc9f
Author: Kay Sievers <kay(a)vrfy.org>
Date: Sat Apr 12 22:35:50 2014 -0700
udev: remove seqnum API and all assumptions about seqnums
The way the kernel namespaces have been implemented breaks assumptions
udev made regarding uevent sequence numbers. Creating devices in a
namespace "steals" uevents and its sequence numbers from the host. It
confuses the "udevadm settle" logic, which might block until a
timeout is reached, even when no uevent is pending.
Remove any assumptions about sequence numbers and deprecate libudev's
API exposing these numbers; none of that can reliably be used anymore
when namespaces are involved.
so with that fix in place, IMHO, what libvirt is doing already
today should be fine.
For paths waiting for something specific, failure will be
inevitable;
however, for paths such as pool refreshes, which wait for "anything"
to be present within the directory space udevadm manages, a shorter
timeout may allow them to gather some data sooner.
It's always possible to refresh the data again at some later point.
I'm pretty wary of saying that allowing the shorter times for the
pool refresh is safe. For example, if we're about to boot a guest
from iSCSI and start an iSCSI pool, there could be hundreds of LUNs
there. If we
only wait 5 seconds, there's a reasonable chance that the LUN we need
for the VM is not going to be ready in time. I think it is a violation
of expectations if we say apps have to refresh the pool multiple times
before they see the LUNs that we know to be there. It was problems like
this that motivated the addition of the udevadm settle calls all those
years ago.
For paths that are waiting for something specific, such as finding a
node device, creating a backing disk volume, or removing a logical
volume, waiting longer for settle to work through its events will be
more important.
So given this, I modified the calls to virFileWaitForDevices() to pass
a boolean 'quiesce' indicating whether the caller should wait for the
default timeout or for a shorter time. I chose 5 seconds for the
shorter time, although there was no real logic/reason behind that value.
Additionally, in either case VIR_WARN is used to indicate that the
command to settle the database returned a failure.
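In sketch form - paraphrasing the diff below rather than quoting it, so
the option spelling and constant are approximations - the modified
helper becomes:

void
virFileWaitForDevices(bool quiesce)
{
#ifdef UDEVADM
    virCommandPtr cmd = virCommandNewArgList(UDEVADM, "settle", NULL);
    int exitstatus = 0;

    /* Callers scanning for "anything" present use the shorter window;
     * callers that need a specific device quiesce for udevadm's full
     * default timeout. */
    if (!quiesce)
        virCommandAddArg(cmd, "--timeout=5");

    if (virCommandRun(cmd, &exitstatus) < 0 || exitstatus != 0)
        VIR_WARN("udevadm settle exited with status %d", exitstatus);

    virCommandFree(cmd);
#endif
}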
---
Note that I did try a timeout value of 0, which as I read things should
indicate whether something still needed to be flushed; however, for the
veth issue described here:
http://lists.freedesktop.org/archives/systemd-devel/2013-July/011829.html
the return value of 'udevadm settle --timeout 0' was always 0, whereas
with --timeout 1 (or more) the return value was 1.
Callers to virFileWaitForDevices() and my reasoning behind why I chose
to quiesce or not (a sketch of representative call sites follows the
list)...
node_device_driver
-> Is trying to find node devices - wait long time
storage_backend_*.c: (virStorageBackend*() APIs)
disk: DiskRefreshPool
-> Will read directory path - wait short time
disk: DiskCreateVol
-> Requires new disk to be present - wait long time
iscsi: ISCSIGetHostNumber
-> called by ISCSIFindLUs which is called by RefreshPool before
opening path and scanning for devices - wait short time
logical: LogicalRefreshPool
-> Runs /usr/sbin/vgs to get report of volume groups - wait short time
logical: LogicalDeleteVol
-> Requires synchronization prior to deletion of logical volume -
wait long time
mpath: MpathRefreshPool
-> Use DM_DEVICE_LIST to get list of devices - wait short time
scsi: SCSIFindLUs
-> called by SCSIRefreshPool and ISCSIFindLUs (not sure why) prior
to opening path and scanning for available devices - wait short time
scsi: createVport
-> called by SCSIStartPool after the port is managed - there's no
search for any devices and nothing depends on or checks if the
port was created (although next call in sequence would be to
refresh the pool, which would wait again) - wait short time
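As noted above the list, a couple of representative call sites (names
expanded from the abbreviations above; paraphrased, not the literal
diff):

/* e.g. virStorageBackendDiskRefreshPool: scanning for whatever is
 * present, so the short wait is enough */
virFileWaitForDevices(false);

/* e.g. virStorageBackendDiskCreateVol: the new disk must exist before
 * we continue, so quiesce fully */
virFileWaitForDevices(true);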
Signed-off-by: John Ferlan <jferlan(a)redhat.com>
---
src/node_device/node_device_driver.c | 4 ++--
src/storage/storage_backend_disk.c | 4 ++--
src/storage/storage_backend_iscsi.c | 2 +-
src/storage/storage_backend_logical.c | 4 ++--
src/storage/storage_backend_mpath.c | 2 +-
src/storage/storage_backend_scsi.c | 4 ++--
src/util/virfile.h | 2 +-
src/util/virutil.c | 45 +++++++++++++++++++++++++++--------
8 files changed, 46 insertions(+), 21 deletions(-)
I'm afraid I just don't think this change is safe - I think it will
cause volumes to go missing from pools, as happened before we used
udevadm settle.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|