[libvirt] [PATCH] Shorten the udevadm settle timeout for refresh/start pool cases

https://bugzilla.redhat.com/show_bug.cgi?id=789766 This patch addresses a possible misperception that libvirt is hung during pool startup/refresh as well as virt-manager startup as it needs to start/refresh storage pools found. The problem is that the 'udevadm settle' command will wait for a default of two minutes prior to returning a failure. For code paths such as pool starts or refreshes the timeout doesn't necessarily cause failures - it may just not find anything. Additionally, since no error message or any other indication is proviced, blame is placed up libvirt for the problem; however, use of debug tools shows libvirt waiting in udevadm. NB: This timeout may be longer for iSCSI and SCSI pools since both seem to run through settle paths twice under certain conditions. Based on feedback described here: http://lists.freedesktop.org/archives/systemd-devel/2013-July/011845.html this timeout is part of the architecture of udevadm settle. Although the two minute timeout expires, the code doesn't check status. Callers can decide to continue regardless of whether settle is still handling events. For paths waiting for something specific, failure will be inevitable; however, for paths such as perhaps pool refreshes waiting for "anything" to be present within the directory space udevadm manages having a shorter timeout may allow them to gather some data quicker/sooner. It's always possible to refresh the data again at some later point. For paths that are waiting for something specific, such as finding a node device, creating a backing disk volume, or removing a logical volume waiting longer for settle to work through its events will be more important. So given this, I modified the calls to virFileWaitForDevices() pass a boolean 'quiesce' indicating whether the caller should wait for the default timeout or to wait a shorter time. I chose 5 seconds for the shorter time, although there was no real logic/reason for it. Additionally, for either case use VIR_WARN to indicate that the command to settle the database returned a failure. --- Note that I did try to use a timeout value of 0 which as I was reading things would somehow indicate that something needed to be flushed; however, for the veth issue described here: http://lists.freedesktop.org/archives/systemd-devel/2013-July/011829.html the return value of a udevadm settle --timeout 0 was always 0; whereas, if using --timeout 1 (or more) the return value was 1. Callers to virFileWaitForDevices() and my reasoning/logic behind why I chose to wait for quiesce or not... node_device_driver -> Is trying to find node devices - wait long time storage_backend_*.c: (virStorageBackend*() APIs) disk: DiskRefreshPool -> Will read directory path - wait short time disk: DiskCreateVol -> Requires new disk to be present - wait long time iscsi: ISCSIGetHostNumber -> called by ISCSIFindLUs which is called by RefreshPool before opening path and scanning for devices - wait short time logical: LogicalRefreshPool -> Runs /usr/sbin/vgs to get report of volume groups - wait short time logical: LogicalDeleteVol -> Requires synchronization prior to deletion of logical volume - wait long time mpath: MpathRefreshPool -> Use DM_DEVICE_LIST to get list of devices - wait short time scsi: SCSIFindLUs -> called by SCSIRefreshPool and ISCSIFindLUs (not sure why) prior to opening path and scanning for available devices - wait short time scsi: createVport -> called by SCSIStartPool after the port is managed - there's no search for any devices and nothing depends on or checks if the port was created (although next call in sequence would be to refresh the pool, which would wait again) - wait short time Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/node_device/node_device_driver.c | 4 ++-- src/storage/storage_backend_disk.c | 4 ++-- src/storage/storage_backend_iscsi.c | 2 +- src/storage/storage_backend_logical.c | 4 ++-- src/storage/storage_backend_mpath.c | 2 +- src/storage/storage_backend_scsi.c | 4 ++-- src/util/virfile.h | 2 +- src/util/virutil.c | 45 +++++++++++++++++++++++++++-------- 8 files changed, 46 insertions(+), 21 deletions(-) diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c index 6906463..fda4104 100644 --- a/src/node_device/node_device_driver.c +++ b/src/node_device/node_device_driver.c @@ -458,7 +458,7 @@ get_time(time_t *t) * possible for udev not to realize that it has work to do before we * get here. We thus keep trying to find the new device we just * created for up to LINUX_NEW_DEVICE_WAIT_TIME. Note that udev's - * default settle time is 180 seconds, so once udev realizes that it + * default settle time is 120 seconds, so once udev realizes that it * has work to do, it might take that long for the udev wait to * return. Thus the total maximum time for this function to return is * the udev settle time plus LINUX_NEW_DEVICE_WAIT_TIME. @@ -485,7 +485,7 @@ find_new_device(virConnectPtr conn, const char *wwnn, const char *wwpn) while ((now - start) < LINUX_NEW_DEVICE_WAIT_TIME) { - virFileWaitForDevices(); + virFileWaitForDevices(true); dev = nodeDeviceLookupSCSIHostByWWN(conn, wwnn, wwpn, 0); diff --git a/src/storage/storage_backend_disk.c b/src/storage/storage_backend_disk.c index 9cebcca..8678a24 100644 --- a/src/storage/storage_backend_disk.c +++ b/src/storage/storage_backend_disk.c @@ -323,7 +323,7 @@ virStorageBackendDiskRefreshPool(virConnectPtr conn ATTRIBUTE_UNUSED, VIR_FREE(pool->def->source.devices[0].freeExtents); pool->def->source.devices[0].nfreeExtent = 0; - virFileWaitForDevices(); + virFileWaitForDevices(false); if (!virFileExists(pool->def->source.devices[0].path)) { virReportError(VIR_ERR_INVALID_ARG, @@ -660,7 +660,7 @@ virStorageBackendDiskCreateVol(virConnectPtr conn ATTRIBUTE_UNUSED, goto cleanup; /* wait for device node to show up */ - virFileWaitForDevices(); + virFileWaitForDevices(true); /* Blow away free extent info, as we're about to re-populate it */ VIR_FREE(pool->def->source.devices[0].freeExtents); diff --git a/src/storage/storage_backend_iscsi.c b/src/storage/storage_backend_iscsi.c index 881159b..5680bcb 100644 --- a/src/storage/storage_backend_iscsi.c +++ b/src/storage/storage_backend_iscsi.c @@ -96,7 +96,7 @@ virStorageBackendISCSIGetHostNumber(const char *sysfs_path, VIR_DEBUG("Finding host number from '%s'", sysfs_path); - virFileWaitForDevices(); + virFileWaitForDevices(false); sysdir = opendir(sysfs_path); diff --git a/src/storage/storage_backend_logical.c b/src/storage/storage_backend_logical.c index ed3a012..868fdce 100644 --- a/src/storage/storage_backend_logical.c +++ b/src/storage/storage_backend_logical.c @@ -586,7 +586,7 @@ virStorageBackendLogicalRefreshPool(virConnectPtr conn ATTRIBUTE_UNUSED, virCommandPtr cmd = NULL; int ret = -1; - virFileWaitForDevices(); + virFileWaitForDevices(false); /* Get list of all logical volumes */ if (virStorageBackendLogicalFindLVs(pool, NULL) < 0) @@ -689,7 +689,7 @@ virStorageBackendLogicalDeleteVol(virConnectPtr conn ATTRIBUTE_UNUSED, virCheckFlags(0, -1); - virFileWaitForDevices(); + virFileWaitForDevices(true); lvchange_cmd = virCommandNewArgList(LVCHANGE, "-aln", vol->target.path, NULL); lvremove_cmd = virCommandNewArgList(LVREMOVE, "-f", vol->target.path, NULL); diff --git a/src/storage/storage_backend_mpath.c b/src/storage/storage_backend_mpath.c index f0ed189..216863b 100644 --- a/src/storage/storage_backend_mpath.c +++ b/src/storage/storage_backend_mpath.c @@ -274,7 +274,7 @@ virStorageBackendMpathRefreshPool(virConnectPtr conn ATTRIBUTE_UNUSED, pool->def->allocation = pool->def->capacity = pool->def->available = 0; - virFileWaitForDevices(); + virFileWaitForDevices(false); virStorageBackendGetMaps(pool); diff --git a/src/storage/storage_backend_scsi.c b/src/storage/storage_backend_scsi.c index c448d7f..7335433 100644 --- a/src/storage/storage_backend_scsi.c +++ b/src/storage/storage_backend_scsi.c @@ -430,7 +430,7 @@ virStorageBackendSCSIFindLUs(virStoragePoolObjPtr pool, VIR_DEBUG("Discovering LUs on host %u", scanhost); - virFileWaitForDevices(); + virFileWaitForDevices(false); devicedir = opendir(device_path); @@ -591,7 +591,7 @@ createVport(virStoragePoolSourceAdapter adapter) adapter.data.fchost.wwnn, VPORT_CREATE) < 0) return -1; - virFileWaitForDevices(); + virFileWaitForDevices(false); return 0; } diff --git a/src/util/virfile.h b/src/util/virfile.h index 46ef781..63d3f05 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -253,7 +253,7 @@ int virFileOpenTty(int *ttymaster, char *virFileFindMountPoint(const char *type); -void virFileWaitForDevices(void); +void virFileWaitForDevices(bool quiesce); /* NB: this should be combined with virFileBuildPath */ # define virBuildPath(path, ...) \ diff --git a/src/util/virutil.c b/src/util/virutil.c index 9be1590..030e5ec 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -1447,29 +1447,54 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, gid_t *groups, int ngroups, #if defined(UDEVADM) || defined(UDEVSETTLE) -void virFileWaitForDevices(void) +void virFileWaitForDevices(bool quiesce) { # ifdef UDEVADM - const char *const settleprog[] = { UDEVADM, "settle", NULL }; + const char *const settleprog = UDEVADM; # else - const char *const settleprog[] = { UDEVSETTLE, NULL }; + const char *const settleprog = UDEVSETTLE; # endif - int exitstatus; + int exitstatus = 0; + virCommandPtr settlecmd = NULL; - if (access(settleprog[0], X_OK) != 0) + if (access(settleprog, X_OK) != 0) return; + /* In less critical paths, rather than possibly wait for a two + * minute timeout for settle to just return failure because of + * some issue in the udevadm database, let's wait for 5 seconds. + * If there's an issue with the udevadm database, the timeout + * will cause a non zero return which we can check and issue a + * warning message. + */ + settlecmd = virCommandNew(settleprog); +# ifdef UDEVADM + virCommandAddArg(settlecmd, "settle"); +# endif + if (!quiesce) { + virCommandAddArg(settlecmd, "--timeout"); + virCommandAddArg(settlecmd, "5"); + } + /* - * NOTE: we ignore errors here; this is just to make sure that any device + * NOTE: we mostly ignore errors; this is just to make sure that any device * nodes that are being created finish before we try to scan them. - * If this fails for any reason, we still have the backup of polling for - * 5 seconds for device nodes. */ - if (virRun(settleprog, &exitstatus) < 0) + if (virCommandRun(settlecmd, &exitstatus) < 0) {} + + if (exitstatus) { + char *cmd_str = virCommandToString(settlecmd); + VIR_WARN("Timed out running '%s' indicating a possible issue " + "with udev event queue.", + cmd_str ? cmd_str : settleprog); + VIR_FREE(cmd_str); + } + + virCommandFree(settlecmd); } #else -void virFileWaitForDevices(void) +void virFileWaitForDevices(bool quiesce ATTRIBUTE_UNUSED) {} #endif -- 1.9.0

On Tue, Apr 15, 2014 at 07:48:52PM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=789766
This patch addresses a possible misperception that libvirt is hung during pool startup/refresh as well as virt-manager startup as it needs to start/refresh storage pools found.
The problem is that the 'udevadm settle' command will wait for a default of two minutes prior to returning a failure. For code paths such as pool starts or refreshes the timeout doesn't necessarily cause failures - it may just not find anything. Additionally, since no error message or any other indication is proviced, blame is placed up libvirt for the problem; however, use of debug tools shows libvirt waiting in udevadm. NB: This timeout may be longer for iSCSI and SCSI pools since both seem to run through settle paths twice under certain conditions.
Based on feedback described here:
http://lists.freedesktop.org/archives/systemd-devel/2013-July/011845.html
this timeout is part of the architecture of udevadm settle. Although the two minute timeout expires, the code doesn't check status. Callers can decide to continue regardless of whether settle is still handling events.
FWIW, Kay has just done a fix for the timeout problems we're hitting with udev when containers are in use commit 9ea28c55a2488e6cd4a44ac5786f12b71ad5bc9f Author: Kay Sievers <kay@vrfy.org> Date: Sat Apr 12 22:35:50 2014 -0700 udev: remove seqnum API and all assumptions about seqnums The way the kernel namespaces have been implemented breaks assumptions udev made regarding uevent sequence numbers. Creating devices in a namespace "steals" uevents and its sequence numbers from the host. It confuses the "udevadmin settle" logic, which might block until util a timeout is reached, even when no uevent is pending. Remove any assumptions about sequence numbers and deprecate libudev's API exposing these numbers; none of that can reliably be used anymore when namespaces are involved. so with that fix in place, IMHO, what libvirt is doing already today should be fine.
For paths waiting for something specific, failure will be inevitable; however, for paths such as perhaps pool refreshes waiting for "anything" to be present within the directory space udevadm manages having a shorter timeout may allow them to gather some data quicker/sooner. It's always possible to refresh the data again at some later point.
I'm pretty wary of saying that allowing the shorter times for the pool refresh is safe. eg if we're about to boot a guest from ISCSI and start an iSCSI pool, there could be 100's of LUNs there. If we only wait 5 seconds, there's a reasonable chance that the LUN we need for the VM is not going to be ready in time. I think it is a violation of expectations if we say apps have to refresh the pool multiple times for it to see the LUNs that we know to be there. It was problems like this that motivated the addition of the udevadm settle calls all those years ago.
For paths that are waiting for something specific, such as finding a node device, creating a backing disk volume, or removing a logical volume waiting longer for settle to work through its events will be more important.
So given this, I modified the calls to virFileWaitForDevices() pass a boolean 'quiesce' indicating whether the caller should wait for the default timeout or to wait a shorter time. I chose 5 seconds for the shorter time, although there was no real logic/reason for it.
Additionally, for either case use VIR_WARN to indicate that the command to settle the database returned a failure.
---
Note that I did try to use a timeout value of 0 which as I was reading things would somehow indicate that something needed to be flushed; however, for the veth issue described here:
http://lists.freedesktop.org/archives/systemd-devel/2013-July/011829.html
the return value of a udevadm settle --timeout 0 was always 0; whereas, if using --timeout 1 (or more) the return value was 1.
Callers to virFileWaitForDevices() and my reasoning/logic behind why I chose to wait for quiesce or not...
node_device_driver -> Is trying to find node devices - wait long time storage_backend_*.c: (virStorageBackend*() APIs) disk: DiskRefreshPool -> Will read directory path - wait short time disk: DiskCreateVol -> Requires new disk to be present - wait long time iscsi: ISCSIGetHostNumber -> called by ISCSIFindLUs which is called by RefreshPool before opening path and scanning for devices - wait short time logical: LogicalRefreshPool -> Runs /usr/sbin/vgs to get report of volume groups - wait short time logical: LogicalDeleteVol -> Requires synchronization prior to deletion of logical volume - wait long time mpath: MpathRefreshPool -> Use DM_DEVICE_LIST to get list of devices - wait short time scsi: SCSIFindLUs -> called by SCSIRefreshPool and ISCSIFindLUs (not sure why) prior to opening path and scanning for available devices - wait short time scsi: createVport -> called by SCSIStartPool after the port is managed - there's no search for any devices and nothing depends on or checks if the port was created (although next call in sequence would be to refresh the pool, which would wait again) - wait short time
Signed-off-by: John Ferlan <jferlan@redhat.com> --- src/node_device/node_device_driver.c | 4 ++-- src/storage/storage_backend_disk.c | 4 ++-- src/storage/storage_backend_iscsi.c | 2 +- src/storage/storage_backend_logical.c | 4 ++-- src/storage/storage_backend_mpath.c | 2 +- src/storage/storage_backend_scsi.c | 4 ++-- src/util/virfile.h | 2 +- src/util/virutil.c | 45 +++++++++++++++++++++++++++-------- 8 files changed, 46 insertions(+), 21 deletions(-)
I'm afraid I just don't think this change is safe - I think it will cause volumes to go missing from pools as we had before we used udevadm settle. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 04/16/2014 05:19 AM, Daniel P. Berrange wrote:
On Tue, Apr 15, 2014 at 07:48:52PM -0400, John Ferlan wrote: <...snip...>
I'm afraid I just don't think this change is safe - I think it will cause volumes to go missing from pools as we had before we used udevadm settle.
OK - not a problem. This'll be one of those cannot fix with explanation type bugs. Do you believe it's "reasonable" or "good" to at least have some sort of message if the udevadm settle timeout is triggered? It is notable that if the two minute timeout is triggered, then the same problem exists vis-a-vis having or not having LUNs appear. Secondary to that - the iSCSI code will end up going through the settle code twice - once during it's own FindLU's in the GetHostNumber call during RefreshPool and then once because it calls the SCSI code's FindLU's call. Additionally, for a SCSI pool start there are two calls to settle - once at the end of PoolStart (createVport() after VPORT_CREATE) and then again through the RefreshPool logic during FindLU. Do you believe both are necessary or should some rework be done? Tks, John

On Wed, Apr 16, 2014 at 08:10:58AM -0400, John Ferlan wrote:
On 04/16/2014 05:19 AM, Daniel P. Berrange wrote:
On Tue, Apr 15, 2014 at 07:48:52PM -0400, John Ferlan wrote: <...snip...>
I'm afraid I just don't think this change is safe - I think it will cause volumes to go missing from pools as we had before we used udevadm settle.
OK - not a problem. This'll be one of those cannot fix with explanation type bugs. Do you believe it's "reasonable" or "good" to at least have some sort of message if the udevadm settle timeout is triggered? It is notable that if the two minute timeout is triggered, then the same problem exists vis-a-vis having or not having LUNs appear.
I guess we could log a message, but we wouldn't want to use a high logging level, since if you have LXC guests, we know we'll get these timeouts every time :-(
Secondary to that - the iSCSI code will end up going through the settle code twice - once during it's own FindLU's in the GetHostNumber call during RefreshPool and then once because it calls the SCSI code's FindLU's call. Additionally, for a SCSI pool start there are two calls to settle - once at the end of PoolStart (createVport() after VPORT_CREATE) and then again through the RefreshPool logic during FindLU. Do you believe both are necessary or should some rework be done?
I'm not sure to be honest, but I think it is worth investigating. Ideally I'd say we should only settle once per API call, unless there's a scenario that forces us otherwise. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Apr 15, 2014 at 07:48:52PM -0400, John Ferlan wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=789766
This patch addresses a possible misperception that libvirt is hung during pool startup/refresh as well as virt-manager startup as it needs to start/refresh storage pools found.
This is much nicer than what we ship in Debian (reducing the timeout to 10 seconds for all operations). So ACK (although I probably don't grasp all the details of all affected code paths). Cheers, -- Guido
participants (3)
-
Daniel P. Berrange
-
Guido Günther
-
John Ferlan