[libvirt] [PATCHv5 0/4] Implement RBD storage pool support

Here are updated patches that implement RBD storage pool support in libvirt. By and large, my changes are confined to qemu_conf.c, but I did need a couple of changes in domain_conf.c to get things working. I have tested these patches on my own equipment, where they work, but they could use testing by others if anyone else has Ceph configured. The last patch in this series is not needed to make RBD volumes work for domains, but it is needed if you do not want to have to kill the domain to get it back under the control of libvirt.

Adam Walters (4):
  qemu: conf: Add qemuAddRBDPoolSourceHost helper function
  qemu: conf: Implement RBD storage pool support
  domain: conf: Bypass virDomainDiskDefForeachPath for network volumes
  domain: conf: Resolve bug related to network volume types

 src/conf/domain_conf.c |  6 +++--
 src/qemu/qemu_conf.c   | 62 +++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 3 deletions(-)

--
1.8.4.2

This helper function pulls the host definitions from the RBD storage pool definition and applies them to the disk definition.
---
 src/qemu/qemu_conf.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 4378791..2d42c3b 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -1247,6 +1247,45 @@ cleanup:
 }
 
 static int
+qemuAddRBDPoolSourceHost(virDomainDiskDefPtr def,
+                         virStoragePoolDefPtr pooldef)
+{
+    int ret = -1;
+    size_t i = 0;
+    char **tokens = NULL;
+
+    def->nhosts = pooldef->source.nhost;
+
+    if (VIR_ALLOC_N(def->hosts, def->nhosts) < 0)
+        goto cleanup;
+
+    for (i = 0; i < def->nhosts; i++) {
+        if (VIR_STRDUP(def->hosts[i].name, pooldef->source.hosts[i].name) < 0)
+            goto cleanup;
+
+        if (virAsprintf(&def->hosts[i].port, "%d",
+                        pooldef->source.hosts[i].port ?
+                        pooldef->source.hosts[i].port :
+                        6789) < 0)
+            goto cleanup;
+
+        /* Storage pools have not supported these 2 attributes yet,
+         * use the defaults.
+         */
+        def->hosts[i].transport = VIR_DOMAIN_DISK_PROTO_TRANS_TCP;
+        def->hosts[i].socket = NULL;
+    }
+
+    def->protocol = VIR_DOMAIN_DISK_PROTOCOL_RBD;
+
+    ret = 0;
+
+cleanup:
+    virStringFreeList(tokens);
+    return ret;
+}
+
+static int
 qemuTranslateDiskSourcePoolAuth(virDomainDiskDefPtr def,
                                 virStoragePoolDefPtr pooldef)
 {
--
1.8.4.2
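
For readers who want to see the effect in isolation, below is a minimal, self-contained sketch of the pattern the helper implements: copy each monitor host from a pool-style host list into a disk-style host list, defaulting the port to the standard Ceph monitor port 6789 when none was given. The struct names and functions are purely illustrative stand-ins, not libvirt types or APIs.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins for the pool and disk host lists; not libvirt types. */
struct pool_host {
    const char *name;
    int port;               /* 0 means "not specified" in this sketch */
};

struct disk_host {
    char *name;
    char *port;             /* kept as a string, as in the disk definition */
};

static int
copy_hosts(struct disk_host **out, size_t *nout,
           const struct pool_host *in, size_t nin)
{
    size_t i;
    struct disk_host *hosts = calloc(nin, sizeof(*hosts));

    if (!hosts)
        return -1;

    for (i = 0; i < nin; i++) {
        char buf[16];

        hosts[i].name = strdup(in[i].name);
        /* Default to the standard Ceph monitor port when none was given. */
        snprintf(buf, sizeof(buf), "%d", in[i].port ? in[i].port : 6789);
        hosts[i].port = strdup(buf);
        if (!hosts[i].name || !hosts[i].port)
            return -1;      /* error-path cleanup omitted for brevity */
    }

    *out = hosts;
    *nout = nin;
    return 0;
}

int main(void)
{
    const struct pool_host pool_hosts[] = {
        { "mon1.example.com", 0 },      /* no port given -> defaults to 6789 */
        { "mon2.example.com", 6790 },
    };
    struct disk_host *disk_hosts = NULL;
    size_t ndisk_hosts = 0, i;

    if (copy_hosts(&disk_hosts, &ndisk_hosts, pool_hosts, 2) == 0) {
        for (i = 0; i < ndisk_hosts; i++)
            printf("%s:%s\n", disk_hosts[i].name, disk_hosts[i].port);
    }
    return 0;
}

Compiled and run, this prints mon1.example.com:6789 and mon2.example.com:6790.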

This version of the patch no longer contains a switch statement on the pool mode. The patch as a whole allows RBD storage pool volumes to be used in domain XML.
---
 src/qemu/qemu_conf.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 2d42c3b..c28908a 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -1478,8 +1478,29 @@ qemuTranslateDiskSourcePool(virConnectPtr conn,
         }
         break;
 
-    case VIR_STORAGE_POOL_MPATH:
     case VIR_STORAGE_POOL_RBD:
+        if (def->startupPolicy) {
+            virReportError(VIR_ERR_XML_ERROR, "%s",
+                           _("'startupPolicy' is only valid for "
+                             "'file' type volume"));
+            goto cleanup;
+        }
+
+        def->srcpool->actualtype = VIR_DOMAIN_DISK_TYPE_NETWORK;
+        def->protocol = VIR_DOMAIN_DISK_PROTOCOL_RBD;
+
+        if (!(def->src = virStorageVolGetPath(vol)))
+            goto cleanup;
+
+        if (qemuTranslateDiskSourcePoolAuth(def, pooldef) < 0)
+            goto cleanup;
+
+        if (qemuAddRBDPoolSourceHost(def, pooldef) < 0)
+            goto cleanup;
+
+        break;
+
+    case VIR_STORAGE_POOL_MPATH:
     case VIR_STORAGE_POOL_SHEEPDOG:
     case VIR_STORAGE_POOL_GLUSTER:
     case VIR_STORAGE_POOL_LAST:
--
1.8.4.2
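
To make the control flow easier to follow outside the full switch statement, here is a small, self-contained sketch of what the new VIR_STORAGE_POOL_RBD case does conceptually: reject startupPolicy, mark the disk as a network disk speaking the RBD protocol, and take the volume path as the source. The enums, struct and function below are illustrative only, not libvirt's types; in the real code the source path comes from virStorageVolGetPath() and the auth and monitor hosts are filled in by the helpers from the earlier patches.

#include <stdio.h>

/* Illustrative stand-ins, not libvirt's types. */
enum pool_type     { POOL_DIR, POOL_RBD };
enum disk_type     { DISK_FILE, DISK_NETWORK };
enum disk_protocol { PROTO_NONE, PROTO_RBD };

struct disk {
    enum disk_type actualtype;
    enum disk_protocol protocol;
    const char *src;        /* volume path, e.g. "libvirt-pool/test-image" */
    int startup_policy;     /* non-zero if a startupPolicy was requested */
};

/* Returns 0 on success, -1 on error, mirroring the goto-cleanup pattern. */
static int
translate_pool_volume(struct disk *def, enum pool_type ptype, const char *volpath)
{
    switch (ptype) {
    case POOL_RBD:
        if (def->startup_policy) {
            fprintf(stderr, "'startupPolicy' is only valid for 'file' type volume\n");
            return -1;
        }
        def->actualtype = DISK_NETWORK;
        def->protocol = PROTO_RBD;
        def->src = volpath;   /* libvirt gets this from virStorageVolGetPath() */
        /* auth and monitor hosts would be copied here (see the helper above) */
        return 0;
    case POOL_DIR:
        def->actualtype = DISK_FILE;
        def->src = volpath;
        return 0;
    }
    return -1;
}

int main(void)
{
    struct disk d = { DISK_FILE, PROTO_NONE, NULL, 0 };

    if (translate_pool_volume(&d, POOL_RBD, "libvirt-pool/test-image") == 0)
        printf("network disk, protocol rbd, source %s\n", d.src);
    return 0;
}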

virDomainDiskDefForeachPath previously returned 0 for disks of type VIR_DOMAIN_DISK_TYPE_NETWORK. This patch extends that check to also bypass the function for VIR_DOMAIN_DISK_TYPE_VOLUME disks whose new srcpool->actualtype field contains VIR_DOMAIN_DISK_TYPE_NETWORK.
---
 src/conf/domain_conf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index b76cf26..332cb50 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -17970,7 +17970,8 @@ virDomainDiskDefForeachPath(virDomainDiskDefPtr disk,
     if (!disk->src || disk->type == VIR_DOMAIN_DISK_TYPE_NETWORK ||
         (disk->type == VIR_DOMAIN_DISK_TYPE_VOLUME &&
          disk->srcpool &&
-         disk->srcpool->mode == VIR_DOMAIN_DISK_SOURCE_POOL_MODE_DIRECT))
+         (disk->srcpool->mode == VIR_DOMAIN_DISK_SOURCE_POOL_MODE_DIRECT ||
+          disk->srcpool->actualtype == VIR_DOMAIN_DISK_TYPE_NETWORK)))
         return 0;
 
     if (iter(disk, disk->src, 0, opaque) < 0)
--
1.8.4.2

While testing Peter's changes to my patches, I discovered that when restarting libvirt with domains running, they would no longer show up when 'virsh list' was issued. I tracked this bug down to this section of code within virDomainDiskDefParseXML. The symptom was that libvirt issued an internal error saying that the secret type was invalid, which happens because this function did not handle storage pool volumes. I do not have any iSCSI volumes with which to test, but it would theoretically affect those as well. I attempted to make the function smarter by checking the pool type and setting expected_secret_usage properly, but this function does not have a valid connection pointer, so I was unable to do so.
---
 src/conf/domain_conf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 332cb50..82587eb 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -5311,7 +5311,8 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt,
         cur = cur->next;
     }
 
-    if (auth_secret_usage != -1 && auth_secret_usage != expected_secret_usage) {
+    if (auth_secret_usage != -1 && auth_secret_usage != expected_secret_usage &&
+        def->type != VIR_DOMAIN_DISK_TYPE_VOLUME) {
         virReportError(VIR_ERR_INTERNAL_ERROR,
                        _("invalid secret type '%s'"),
                        virSecretUsageTypeTypeToString(auth_secret_usage));
--
1.8.4.2

Unfortunately, after I got home, I discovered that this patch introduces a bug that seems to have been previously hidden behind the one it fixes. Doubly unfortunate is that domains using RBD (and probably iSCSI, but I don't have the facilities to test this) volumes with authentication can't be reconnected upon libvirt restart without this patch (or something similar that bypasses or fixes the check on actual vs. expected secret usage). If all four of my proposed patches are accepted, an errata entry would likely be needed to note that using network volumes with authentication may (really, almost certainly will) cause those domains to be rebooted. Below are my findings and thoughts on the problem I'm seeing. Does anyone have suggestions on how to resolve it?

During libvirt startup there seems to be a race condition while the various state driver subsystems are initialized and auto-started. It appears that all state drivers are initialized first, and only once they are all initialized is the auto-start function run (if defined) for each of them. Because of this, most of the time I restart libvirt while using storage pool volumes (currently I only have RBD volumes in use; I need to add a domain using a file-backed volume to test that as well), any running domains that had been using a volume are restarted, because the storage pool is not active when libvirt tries to reconnect to the running domain. The odd part is that occasionally my domains don't restart. I have yet to get a debug log of this, but I suspect it is related to the order in which the state drivers are loaded and auto-started, or that QEMU occasionally takes longer to get around to reconnecting to the monitor socket.

As for resolving the apparent race condition, I am a bit low on suggestions at the moment. I assume there is a reason that the state drivers are all loaded first and then all auto-started. If not, the fix could be as simple as running auto-start for each state driver immediately after loading it. Another option would be to enforce an ordering on state drivers, for example forcing the virtualization state drivers (QEMU, LXC, Xen, etc.) to be initialized and auto-started only after all other state drivers have completed their sequences. I'm not sure where or how this list is created, though. Since a restart occasionally works without restarting domains, I assume the list is either generated at run time and/or the initialization and auto-start sequences are asynchronous. Unfortunately, my experience with asynchronous code is about nil, so I'm not sure how much help I could be in removing a race condition.

An alternate solution that, while not pretty, may be better than rebooting the domains would be to change the behavior when QEMU attempts a reconnect and finds the storage pool inactive: instead of terminating the domain, it could be paused. Auto-start would then resume it if the pool were active at that point; otherwise it would remain paused until someone resumed it.
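
To make the ordering question concrete, here is a small, self-contained model of the two sequences described above. The struct, driver names and functions are purely illustrative, not libvirt's virStateDriver machinery, and for the sake of the sketch the "reconnect to running domains" step is folded into the QEMU driver's auto-start callback.

#include <stdio.h>

/* Illustrative stand-ins for state drivers; not libvirt's virStateDriver. */
struct state_driver {
    const char *name;
    void (*initialize)(void);
    void (*autostart)(void);    /* may be NULL */
};

static void storage_init(void)      { printf("storage: initialized\n"); }
static void storage_autostart(void) { printf("storage: pools started\n"); }
static void qemu_init(void)         { printf("qemu: initialized\n"); }
static void qemu_autostart(void)    { printf("qemu: reconnect to running domains\n"); }

/* With this ordering, the QEMU reconnect step runs before the storage
 * pools have been auto-started in either scheme below. */
static struct state_driver drivers[] = {
    { "qemu",    qemu_init,    qemu_autostart },
    { "storage", storage_init, storage_autostart },
};

int main(void)
{
    size_t i, ndrivers = sizeof(drivers) / sizeof(drivers[0]);

    /* Current behaviour as described above: two separate passes. */
    printf("-- init all, then autostart all --\n");
    for (i = 0; i < ndrivers; i++)
        drivers[i].initialize();
    for (i = 0; i < ndrivers; i++)
        if (drivers[i].autostart)
            drivers[i].autostart();

    /* First suggestion above: auto-start each driver right after it
     * initializes; in this model that only helps if storage is ordered first. */
    printf("-- init + autostart per driver --\n");
    for (i = 0; i < ndrivers; i++) {
        drivers[i].initialize();
        if (drivers[i].autostart)
            drivers[i].autostart();
    }
    return 0;
}

With the ordering shown, both schemes try to reconnect before the storage pools are up; swapping the two array entries, i.e. making sure the storage driver completes before the virtualization drivers, fixes both in this model, which is essentially the second suggestion above.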

I built a test domain using the default storage pool, backed by files under /var/lib/libvirt/images/, and I can confirm that the apparent race condition appears there, too. So it definitely seems that using volumes in general can cause domains to be restarted after a libvirt restart. I can post more of the debug-enabled log file if anyone wants, but the error message is "qemuTranslateDiskSourcePool:1376 : unsupported configuration: storage pool 'default' containing volume 'test.img' is not active." In my troubleshooting, the race condition rebooted at least one domain (out of three) about 40% of the time. My troubleshooting also seems to confirm my suspicion that this bug had been more or less hidden by the fact that libvirt previously refused to take over an existing domain with RBD pools, combined with the seemingly low adoption rate of storage pools in frontend software so far. I'll try to think about and implement possible solutions to this race condition over the weekend.