Unfortunately, after I got home, I discovered that this patch introduces a
bug that seems to have been previously hidden behind the one that this
fixes. Doubly unfortunate, is that domains using RBD (and probably ISCSI,
but I don't have the facilities to test this) volumes with authentication
can't be reconnected upon libvirt restart without this patch (or something
similarly bypassing or fixing the check on actual vs expected secret
usage). If all four of my proposed patches are accepted, an errata entry
would likely be needed to note that using network volumes with
authentication may (or really will almost certainly) cause those domains to
be rebooted. Below are my findings and thoughts on the problem I'm seeing.
Would anyone else have suggestions on how to resolve this?
During libvirt startup, there seems to be a race condition while the
various state driver subsystems are initialized and auto-start is run. It
seems to be that all state drivers are initialized, then once they are all
initialized, the auto-start function is run (if defined) for each of them.
Due to this behavior, most of the time I restart libvirt while using
storage pool volumes (currently I only have RBD volumes in use. I need to
add a domain using a file-backed volume to test that as well), any running
domains that had been using a volume are restarted due to the storage pool
not being active when it tries to reconnect to the running domain. The odd
part is that occasionally, my domains don't restart. I have yet to get a
debug log from this, but I suspect it is related to the order in which the
state drivers are loaded and auto-started. Either that or occasionally QEMU
takes longer to get around to reconnecting to the monitor socket.
As far as a resolution for the apparent race condition, I am a bit low on
suggestions at the moment. I am assuming that there is a reason that state
drivers are all loaded, then all auto-started. If not, it could be as
simple as running auto-start for a state driver immediately after loading
it. Another possible solution might be to enforce an ordering on state
drivers. Perhaps forcing virtualization state drivers (such as QEMU, LXC,
Xen, etc) to be initialized and auto-started after all other state drivers
have completed their sequences. I'm not sure where or how this list is
created, though. Since this occasionally works without restarting domains,
I do assume that the list is either generated at run-time and/or the
initialization and auto-start sequences are asynchronous. Unfortunately, my
experience with asynchronous code is about nil, so I'm not sure how much
help I could provide in removing a race condition.
An alternate solution that, while not pretty, may be better than a reboot
of the domains, would be to change the behavior when QEMU attempts a
reconnect and finds the storage pool inactive. Instead of terminating the
domain, perhaps it could be paused. The auto-start would then resume it if
the pool was active at that point, otherwise, it would remain paused until
someone resumed it.
On Fri, Dec 6, 2013 at 3:10 PM, Adam Walters <adam(a)pandorasboxen.com> wrote:
While testing Peter's changes to my patches, I discovered that
when
restarting libvirt with domains running, they would no longer show
up when a 'virsh list' was issued. I tracked this bug down to this
section of code within virDomainDiskDefParseXML. The symptom was
that libvirt was issuing an internal error saying that the secret
type was invalid. This is caused because this function didn't handle
storage pool volumes. I do not have any ISCSI volumes with which to
test, but it would theoretically affect those, as well. I attempted
to make the function smarter, by checking the pool type and setting
expected_secret_usage properly, but this function does not have a
valid connection pointer, and I was thus unable to do so.
---
src/conf/domain_conf.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 332cb50..82587eb 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -5311,7 +5311,8 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr
xmlopt,
cur = cur->next;
}
- if (auth_secret_usage != -1 && auth_secret_usage !=
expected_secret_usage) {
+ if (auth_secret_usage != -1 && auth_secret_usage !=
expected_secret_usage &&
+ def->type != VIR_DOMAIN_DISK_TYPE_VOLUME) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("invalid secret type '%s'"),
virSecretUsageTypeTypeToString(auth_secret_usage));
--
1.8.4.2