[libvirt] [PATCH 00/14] Basic implementation of persistent reservations

QEMU added support for SCSI persistent reservations. The way QEMU implemented that makes it slightly harder for libvirt to adapt to - it's a small binary that needs to start before qemu so that it can connect to it. Basically, what libvirt does is: 1) start qemu-pr-helper (the small binary from above), 2) records its PID for tracking purposes 3) start qemu and when shutting down a domain the process is reversed: 1) shut down qemu process, 2) kill qemu-pr-helper (PID was saved earlier). Now, here also lies the last missing piece to he puzzle - what if qemu-pr-helper dies while qemu is running? After some discussion with Paolo it was agreed that qemu will emit an event for libvirt so that we can respawn the process. Anyway, that shouldn't be a show stopper for these patches (if it was the patch set might get too big). Please test, review and comment. As usual, you can find the patches on my github too: https://github.com/zippy2/libvirt/tree/pr Michal Privoznik (14): virstoragefile: Introduce virStoragePRDef qemuDomainDiskChangeSupported: Deny changing reservations qemu: Introduce pr-manager-helper capability qemu_domain: Introduce qemuDomainDiskPRObject qemu: Generate alias for pr-helper qemu: Store prAlias and prPath in status XML qemu: Generate cmd line at startup qemu: Introduce pr_helper to qemu.conf qemu: Start PR daemons on domain startup qemu: Track PR daemons PIDs in status XML qemu_domain: Introduce qemuDomainGetPRUsageCount qemu_process.c: Introduce qemuProcessSetupPRDaemon qemu_hotplug: Hotplug of reservations qemu_hotplug: Hotunplug of reservations docs/formatdomain.html.in | 25 +- docs/schemas/domaincommon.rng | 19 +- docs/schemas/storagecommon.rng | 34 ++ m4/virt-driver-qemu.m4 | 5 + src/conf/domain_conf.c | 36 ++ src/libvirt_private.syms | 4 + src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 4 + src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 59 +++ src/qemu/qemu_conf.c | 7 +- src/qemu/qemu_conf.h | 1 + src/qemu/qemu_domain.c | 435 ++++++++++++++++++++- src/qemu/qemu_domain.h | 29 ++ src/qemu/qemu_hotplug.c | 110 ++++++ src/qemu/qemu_process.c | 160 ++++++++ src/qemu/qemu_process.h | 4 + src/qemu/test_libvirtd_qemu.aug.in | 1 + src/util/virstoragefile.c | 166 ++++++++ src/util/virstoragefile.h | 17 + .../disk-virtio-scsi-reservations-not-managed.args | 28 ++ .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++ .../disk-virtio-scsi-reservations.args | 29 ++ .../disk-virtio-scsi-reservations.xml | 38 ++ tests/qemuxml2argvtest.c | 8 + .../disk-virtio-scsi-reservations-not-managed.xml | 1 + .../disk-virtio-scsi-reservations.xml | 1 + tests/qemuxml2xmltest.c | 4 + 29 files changed, 1250 insertions(+), 19 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.args create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.args create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml -- 2.13.6

This is a definition that holds information on SCSI persistent reservation settings. The XML part looks like this: <reservations enabled='yes' managed='no'> <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/> </reservations> If @managed is set to 'yes' then the <source/> is not parsed. This design was agreed on here: https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 25 +++- docs/schemas/domaincommon.rng | 19 +-- docs/schemas/storagecommon.rng | 34 +++++ src/conf/domain_conf.c | 36 +++++ src/libvirt_private.syms | 3 + src/util/virstoragefile.c | 148 +++++++++++++++++++++ src/util/virstoragefile.h | 15 +++ .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++++++ .../disk-virtio-scsi-reservations.xml | 38 ++++++ .../disk-virtio-scsi-reservations-not-managed.xml | 1 + .../disk-virtio-scsi-reservations.xml | 1 + tests/qemuxml2xmltest.c | 4 + 12 files changed, 348 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index d272cc1ba..23c5f071e 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2457,7 +2457,10 @@ </disk> <disk type='block' device='lun'> <driver name='qemu' type='raw'/> - <source dev='/dev/sda'/> + <source dev='/dev/sda'> + <reservations enabled='yes' managed='no'> + <source type='unix' path='/path/to/qemu-pr-helper' mode='client'/> + </reservations> <target dev='sda' bus='scsi'/> <address type='drive' controller='0' bus='0' target='3' unit='0'/> </disk> @@ -2819,6 +2822,26 @@ <a href="formatstorageencryption.html">Storage Encryption</a> page for more information. </dd> + <dt><code>reservations</code></dt> + <dd><span class="since">Since libvirt 3.10.0</span>, the + <code>reservations</code> can be a sub-element of the + <code>source</code> element for storage sources (QEMU driver only). + If present (and enabled) it enabled persistent reservations for + SCSI based disks. The element has one mandatory attribute + <code>enabled</code> with accepted values <code>yes</code> and + <code>no</code>. If the feature is enabled, then there's another + mandatory attribute <code>managed</code> (accepted values are the + same as for <code>enabled</code>) that enables or disables libvirt + spawning any helper process (should one be needed). However, if + libvirt is not enabled spawning helper process (i.e. hypervisor + should just use one which is already running), path to its socket + must be provided in child element <code>source</code>, which + currently accepts only the following attributes: <code>type</code> + with one value <code>unix</code>, <code>path</code> with path the + socket, and finally <code>mode</code> which accepts either + <code>server</code> or <code>client</code> and specifies the role + of hypervisor. + </dd> </dl> <p> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f22c932f6..884081bd7 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1498,6 +1498,9 @@ <optional> <ref name="encryption"/> </optional> + <optional> + <ref name="reservations"/> + </optional> <zeroOrMore> <ref name='devSeclabel'/> </zeroOrMore> @@ -2447,21 +2450,7 @@ <value>vhostuser</value> </attribute> <interleave> - <element name="source"> - <attribute name="type"> - <value>unix</value> - </attribute> - <attribute name="path"> - <ref name="absFilePath"/> - </attribute> - <attribute name="mode"> - <choice> - <value>server</value> - <value>client</value> - </choice> - </attribute> - <empty/> - </element> + <ref name="unixSocketSource"/> <ref name="interface-options"/> </interleave> </group> diff --git a/docs/schemas/storagecommon.rng b/docs/schemas/storagecommon.rng index edee1b084..9071667b0 100644 --- a/docs/schemas/storagecommon.rng +++ b/docs/schemas/storagecommon.rng @@ -39,6 +39,40 @@ </element> </define> + <define name='unixSocketSource'> + <element name="source"> + <attribute name="type"> + <value>unix</value> + </attribute> + <attribute name="path"> + <ref name="absFilePath"/> + </attribute> + <attribute name="mode"> + <choice> + <value>server</value> + <value>client</value> + </choice> + </attribute> + <empty/> + </element> + </define> + + <define name='reservations'> + <element name='reservations'> + <attribute name='enabled'> + <ref name='virYesNo'/> + </attribute> + <optional> + <attribute name='managed'> + <ref name='virYesNo'/> + </attribute> + </optional> + <optional> + <ref name='unixSocketSource'/> + </optional> + </element> + </define> + <define name='secret'> <element name='secret'> <attribute name='type'> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a1c25060f..222c6dd15 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -5163,6 +5163,13 @@ virDomainDiskDefValidate(const virDomainDiskDef *disk) } } + if (disk->src->pr && + disk->device != VIR_DOMAIN_DISK_DEVICE_LUN) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("<reservations/> allowed only for lun disks")); + return -1; + } + /* Reject disks with a bus type that is not compatible with the * given address type. The function considers only buses that are * handled in common code. For other bus types it's not possible @@ -8579,6 +8586,29 @@ virDomainDiskSourcePrivateDataParse(xmlXPathContextPtr ctxt, } +static int +virDomainDiskSourcePRParse(xmlNodePtr node, + virStoragePRDefPtr *prsrc) +{ + xmlNodePtr child; + virStoragePRDefPtr pr = NULL; + + for (child = node->children; child; child = child->next) { + if (child->type == XML_ELEMENT_NODE && + virXMLNodeNameEqual(child, "reservations")) { + + if (!(pr = virStoragePRDefParseNode(node->doc, child))) + return -1; + + *prsrc = pr; + return 0; + } + } + + return 0; +} + + int virDomainDiskSourceParse(xmlNodePtr node, xmlXPathContextPtr ctxt, @@ -8623,6 +8653,9 @@ virDomainDiskSourceParse(xmlNodePtr node, if (virDomainDiskSourceEncryptionParse(node, &src->encryption) < 0) goto cleanup; + if (virDomainDiskSourcePRParse(node, &src->pr) < 0) + goto cleanup; + if (virDomainDiskSourcePrivateDataParse(ctxt, src, flags, xmlopt) < 0) goto cleanup; @@ -22519,6 +22552,9 @@ virDomainDiskSourceFormatInternal(virBufferPtr buf, virStorageEncryptionFormat(&childBuf, src->encryption) < 0) return -1; + if (src->pr) + virStoragePRDefFormat(&childBuf, src->pr); + if (virDomainDiskSourceFormatPrivateData(&childBuf, src, flags, xmlopt) < 0) return -1; diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index bc8cc1fba..b0dc76b91 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2717,6 +2717,9 @@ virStorageNetHostDefFree; virStorageNetHostTransportTypeFromString; virStorageNetHostTransportTypeToString; virStorageNetProtocolTypeToString; +virStoragePRDefFormat; +virStoragePRDefFree; +virStoragePRDefParseNode; virStorageSourceBackingStoreClear; virStorageSourceClear; virStorageSourceCopy; diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 5780180a9..fda6617f7 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -1913,6 +1913,153 @@ virStorageAuthDefFormat(virBufferPtr buf, } +void +virStoragePRDefFree(virStoragePRDefPtr prd) +{ + if (!prd) + return; + + VIR_FREE(prd->path); + VIR_FREE(prd); +} + + +static virStoragePRDefPtr +virStoragePRDefParseXML(xmlXPathContextPtr ctxt) +{ + virStoragePRDefPtr prd, ret = NULL; + char *enabled = NULL; + char *managed = NULL; + char *type = NULL; + char *path = NULL; + char *mode = NULL; + + if (VIR_ALLOC(prd) < 0) + return NULL; + + if (!(enabled = virXPathString("string(./@enabled)", ctxt))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing @enabled attribute for <reservations/>")); + goto cleanup; + } + + if ((prd->enabled = virTristateBoolTypeFromString(enabled)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid value for 'enabled': %s"), NULLSTR(enabled)); + goto cleanup; + } + + if (prd->enabled == VIR_TRISTATE_BOOL_YES) { + managed = virXPathString("string(./@managed)", ctxt); + if ((prd->managed = virTristateBoolTypeFromString(managed)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid value for 'managed': %s"), NULLSTR(managed)); + goto cleanup; + } + + if (prd->managed == VIR_TRISTATE_BOOL_NO) { + type = virXPathString("string(./source[1]/@type)", ctxt); + path = virXPathString("string(./source[1]/@path)", ctxt); + mode = virXPathString("string(./source[1]/@mode)", ctxt); + + if (!type) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing connection type")); + goto cleanup; + } + + if (!path) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing path")); + goto cleanup; + } + + if (!mode) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing connection mode")); + goto cleanup; + } + + if (STRNEQ(type, "unix")) { + virReportError(VIR_ERR_XML_ERROR, + _("unsupported type: %s"), type); + goto cleanup; + } + + if (STRNEQ(mode, "client")) { + virReportError(VIR_ERR_XML_ERROR, + _("unsupported mode: %s"), mode); + goto cleanup; + } + + prd->path = path; + path = NULL; + + } + } + + ret = prd; + prd = NULL; + + cleanup: + VIR_FREE(mode); + VIR_FREE(path); + VIR_FREE(type); + VIR_FREE(managed); + VIR_FREE(enabled); + virStoragePRDefFree(prd); + return ret; +} + + +virStoragePRDefPtr +virStoragePRDefParseNode(xmlDocPtr xml, + xmlNodePtr root) +{ + xmlXPathContextPtr ctxt = NULL; + virStoragePRDefPtr prd = NULL; + + ctxt = xmlXPathNewContext(xml); + if (ctxt == NULL) { + virReportOOMError(); + goto cleanup; + } + + ctxt->node = root; + prd = virStoragePRDefParseXML(ctxt); + + cleanup: + xmlXPathFreeContext(ctxt); + return prd; +} + + +void +virStoragePRDefFormat(virBufferPtr buf, + virStoragePRDefPtr prd) +{ + virBufferAsprintf(buf, "<reservations enabled='%s'", + virTristateBoolTypeToString(prd->enabled)); + if (prd->enabled == VIR_TRISTATE_BOOL_YES) { + virBufferAsprintf(buf, " managed='%s'", + virTristateBoolTypeToString(prd->managed)); + if (prd->managed == VIR_TRISTATE_BOOL_NO) { + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + virBufferAddLit(buf, "<source type='unix'"); + virBufferEscapeString(buf, " path='%s'", prd->path); + virBufferAddLit(buf, " mode='client'/>\n"); + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</reservations>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } + } else { + virBufferAddLit(buf, "/>\n"); + } +} + + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model) @@ -2291,6 +2438,7 @@ virStorageSourceClear(virStorageSourcePtr def) virBitmapFree(def->features); VIR_FREE(def->compat); virStorageEncryptionFree(def->encryption); + virStoragePRDefFree(def->pr); virStorageSourceSeclabelsClear(def); virStoragePermsFree(def->perms); VIR_FREE(def->timestamps); diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index ecd806c93..f82c20cf4 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -216,6 +216,14 @@ struct _virStorageAuthDef { virSecretLookupTypeDef seclookupdef; }; +typedef struct _virStoragePRDef virStoragePRDef; +typedef virStoragePRDef *virStoragePRDefPtr; +struct _virStoragePRDef { + int enabled; /* enum virTristateBool */ + int managed; /* enum virTristateBool */ + char *path; +}; + typedef struct _virStorageDriverData virStorageDriverData; typedef virStorageDriverData *virStorageDriverDataPtr; @@ -243,6 +251,7 @@ struct _virStorageSource { bool authInherited; virStorageEncryptionPtr encryption; bool encryptionInherited; + virStoragePRDefPtr pr; virObjectPtr privateData; @@ -369,6 +378,12 @@ virStorageAuthDefPtr virStorageAuthDefCopy(const virStorageAuthDef *src); virStorageAuthDefPtr virStorageAuthDefParse(xmlDocPtr xml, xmlNodePtr root); int virStorageAuthDefFormat(virBufferPtr buf, virStorageAuthDefPtr authdef); +void virStoragePRDefFree(virStoragePRDefPtr prd); +virStoragePRDefPtr virStoragePRDefParseNode(xmlDocPtr xml, + xmlNodePtr root); +void virStoragePRDefFormat(virBufferPtr buf, + virStoragePRDefPtr prd); + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model); diff --git a/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml new file mode 100644 index 000000000..8ee623a70 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml @@ -0,0 +1,40 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>8</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='lun'> + <source dev='/dev/HostVG/QEMUGuest1'> + <reservations enabled='yes' managed='no'> + <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/> + </reservations> + </source> + <target dev='sdb' bus='scsi'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='scsi' index='0' model='virtio-scsi'> + <driver queues='8'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml new file mode 100644 index 000000000..874446f7b --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml @@ -0,0 +1,38 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>8</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='lun'> + <source dev='/dev/HostVG/QEMUGuest1'> + <reservations enabled='yes' managed='yes'/> + </source> + <target dev='sdb' bus='scsi'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='scsi' index='0' model='virtio-scsi'> + <driver queues='8'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml b/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml new file mode 120000 index 000000000..f96d62cb6 --- /dev/null +++ b/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml \ No newline at end of file diff --git a/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml b/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml new file mode 120000 index 000000000..dde52fd1d --- /dev/null +++ b/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/disk-virtio-scsi-reservations.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 2be8eb2c1..3d70f13c8 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -531,6 +531,10 @@ mymain(void) QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-num_queues", QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); + DO_TEST("disk-virtio-scsi-reservations", + QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); + DO_TEST("disk-virtio-scsi-reservations-not-managed", + QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-cmd_per_lun", QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-max_sectors", -- 2.13.6

On 01/18/2018 11:04 AM, Michal Privoznik wrote:
This is a definition that holds information on SCSI persistent reservation settings. The XML part looks like this:
<reservations enabled='yes' managed='no'> <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/> </reservations>
If @managed is set to 'yes' then the <source/> is not parsed. This design was agreed on here:
https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 25 +++- docs/schemas/domaincommon.rng | 19 +-- docs/schemas/storagecommon.rng | 34 +++++ src/conf/domain_conf.c | 36 +++++ src/libvirt_private.syms | 3 + src/util/virstoragefile.c | 148 +++++++++++++++++++++ src/util/virstoragefile.h | 15 +++ .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++++++ .../disk-virtio-scsi-reservations.xml | 38 ++++++ .../disk-virtio-scsi-reservations-not-managed.xml | 1 + .../disk-virtio-scsi-reservations.xml | 1 + tests/qemuxml2xmltest.c | 4 + 12 files changed, 348 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
Before digging too deep into this... - I assume we're avoiding <disk> iSCSI mainly because those reservations would take place elsewhere, safe assumption? - What about using lun's from a storage pool (and what could become your favorite, NPIV devices ;-)) <disk type='volume' device='lun'> <driver name='qemu' type='raw'/> <source pool='sourcepool' volume='unit:0:4:0'/> <target dev='sda' bus='scsi'/> </disk> - What about <hostdev>'s? <hostdev mode='subsystem' type='scsi'> but not iSCSI or vHost hostdev's. I think that creates the SCSI generic LUN, but it's been a while since I've thought about the terminology used for hostdev's... I also have this faint recollection of PR related to sgio filtering and perhaps even rawio, but dredging that back up could send me down the path of some "downstream only" type bz's. Although searching on just VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO. And finally... I assume there is one qemu-pr-manager (qemu.conf changes eventually)... Eventually there's magic that allows/adds per domain *and* per LUN some socket path. If libvirt provided it's generated via the domain temporary directory; however, what's not really clear is how that unmanaged path really works. Need a virtual whiteboard... I only commented at this first patch for now. I did briefly scan the others, but too many questions here to dig deep into those. I did run the entire series through my Coverity checker - there's 3 things that get tagged in later patches. John
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index d272cc1ba..23c5f071e 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -2457,7 +2457,10 @@ </disk> <disk type='block' device='lun'> <driver name='qemu' type='raw'/> - <source dev='/dev/sda'/> + <source dev='/dev/sda'> + <reservations enabled='yes' managed='no'> + <source type='unix' path='/path/to/qemu-pr-helper' mode='client'/> + </reservations> <target dev='sda' bus='scsi'/> <address type='drive' controller='0' bus='0' target='3' unit='0'/> </disk> @@ -2819,6 +2822,26 @@ <a href="formatstorageencryption.html">Storage Encryption</a> page for more information. </dd> + <dt><code>reservations</code></dt> + <dd><span class="since">Since libvirt 3.10.0</span>, the
Will be 4.1.0 at least...
+ <code>reservations</code> can be a sub-element of the + <code>source</code> element for storage sources (QEMU driver only). + If present (and enabled) it enabled persistent reservations for
Doesn't read well - even with the second enabled changed to enables. Furthermore what's the purpose of enabled=no? Isn't that just like removing <reservations>? If so, then I think enabled just gets removed and the "managed" is the only required attribute. Besides you don't have a "enabled='no'" example. And, no, sorry I didn't go back through the entire "New QEMU daemon for persistent reservations" discussion that started in August to see if this was discussed there. Looking at virStoragePRDefFormat when enabled='no' anything anyone set for managed would be lost (e.g. not printed). IOW: by changing from enabled=yes and managed=no to enabled=no, then everything in the <source...> subelement is lost.
+ SCSI based disks. The element has one mandatory attribute + <code>enabled</code> with accepted values <code>yes</code> and
NB: Other uses go with "yes" and "no" not the code wrapped tag.
+ <code>no</code>. If the feature is enabled, then there's another + mandatory attribute <code>managed</code> (accepted values are the + same as for <code>enabled</code>) that enables or disables libvirt + spawning any helper process (should one be needed). However, if
Things get a bit confusing from here on in...
+ libvirt is not enabled spawning helper process (i.e. hypervisor + should just use one which is already running), path to its socket + must be provided in child element <code>source</code>, which + currently accepts only the following attributes: <code>type</code> + with one value <code>unix</code>, <code>path</code> with path the + socket, and finally <code>mode</code> which accepts either + <code>server</code> or <code>client</code> and specifies the role + of hypervisor.
Let's see if I can suggest something... Since libvirt 4.1.0, when the disk is a SCSI based lun, then this element may be used to enable persistent reservations on the storage source (QEMU driver only). The required attribute <code>managed</code> can be either "yes" or "no" indicating whether libvirt should spawn the helper process to communicate with QEMU via a <code>source</code> sub-element. [ok, I'm kind of lost, but hopefully there's something that can be expounded up that can help clear my vision...]
+ </dd> </dl>
<p> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index f22c932f6..884081bd7 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -1498,6 +1498,9 @@ <optional> <ref name="encryption"/> </optional> + <optional> + <ref name="reservations"/> + </optional>
domainschema fails syntax-check with your added XML You need to add this hunk to diskSourceBlock instead of diskSourceFile because this is only supported for 'lun' which as your XML examples exhibit have the following syntax: + <disk type='block' device='lun'> + <source dev='/dev/HostVG/QEMUGuest1'> + <reservations enabled='yes' managed='no'> and + <disk type='block' device='lun'> + <source dev='/dev/HostVG/QEMUGuest1'> + <reservations enabled='yes' managed='yes'/> BTW: Since you're going w/ 'lun' (which I believe is the same as a hostdev, then having a hostdev example and support probably is needed.
<zeroOrMore> <ref name='devSeclabel'/> </zeroOrMore> @@ -2447,21 +2450,7 @@ <value>vhostuser</value> </attribute> <interleave> - <element name="source"> - <attribute name="type"> - <value>unix</value> - </attribute> - <attribute name="path"> - <ref name="absFilePath"/> - </attribute> - <attribute name="mode"> - <choice> - <value>server</value> - <value>client</value> - </choice> - </attribute> - <empty/> - </element> + <ref name="unixSocketSource"/> <ref name="interface-options"/> </interleave> </group> diff --git a/docs/schemas/storagecommon.rng b/docs/schemas/storagecommon.rng index edee1b084..9071667b0 100644 --- a/docs/schemas/storagecommon.rng +++ b/docs/schemas/storagecommon.rng @@ -39,6 +39,40 @@ </element> </define>
+ <define name='unixSocketSource'> + <element name="source"> + <attribute name="type"> + <value>unix</value> + </attribute> + <attribute name="path"> + <ref name="absFilePath"/> + </attribute> + <attribute name="mode"> + <choice> + <value>server</value> + <value>client</value> + </choice> + </attribute> + <empty/> + </element> + </define> + + <define name='reservations'> + <element name='reservations'> + <attribute name='enabled'> + <ref name='virYesNo'/> + </attribute> + <optional> + <attribute name='managed'> + <ref name='virYesNo'/> + </attribute> + </optional> + <optional> + <ref name='unixSocketSource'/> + </optional> + </element> + </define> + <define name='secret'> <element name='secret'> <attribute name='type'> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a1c25060f..222c6dd15 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -5163,6 +5163,13 @@ virDomainDiskDefValidate(const virDomainDiskDef *disk) } }
+ if (disk->src->pr && + disk->device != VIR_DOMAIN_DISK_DEVICE_LUN) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("<reservations/> allowed only for lun disks"));
Do you really need the < >/? IOW: why not just "reservations allowed only..." And what about other things that support having LUN such as VOLUME and NETWORK/ISCSI (e.g. above).
+ return -1; + } + /* Reject disks with a bus type that is not compatible with the * given address type. The function considers only buses that are * handled in common code. For other bus types it's not possible @@ -8579,6 +8586,29 @@ virDomainDiskSourcePrivateDataParse(xmlXPathContextPtr ctxt, }
+static int +virDomainDiskSourcePRParse(xmlNodePtr node, + virStoragePRDefPtr *prsrc) +{ + xmlNodePtr child; + virStoragePRDefPtr pr = NULL; + + for (child = node->children; child; child = child->next) { + if (child->type == XML_ELEMENT_NODE && + virXMLNodeNameEqual(child, "reservations")) { + + if (!(pr = virStoragePRDefParseNode(node->doc, child))) + return -1; + + *prsrc = pr; + return 0; + } + } + + return 0; +} + + int virDomainDiskSourceParse(xmlNodePtr node, xmlXPathContextPtr ctxt, @@ -8623,6 +8653,9 @@ virDomainDiskSourceParse(xmlNodePtr node, if (virDomainDiskSourceEncryptionParse(node, &src->encryption) < 0) goto cleanup;
+ if (virDomainDiskSourcePRParse(node, &src->pr) < 0) + goto cleanup; + if (virDomainDiskSourcePrivateDataParse(ctxt, src, flags, xmlopt) < 0) goto cleanup;
@@ -22519,6 +22552,9 @@ virDomainDiskSourceFormatInternal(virBufferPtr buf, virStorageEncryptionFormat(&childBuf, src->encryption) < 0) return -1;
+ if (src->pr) + virStoragePRDefFormat(&childBuf, src->pr); + if (virDomainDiskSourceFormatPrivateData(&childBuf, src, flags, xmlopt) < 0) return -1;
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index bc8cc1fba..b0dc76b91 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2717,6 +2717,9 @@ virStorageNetHostDefFree; virStorageNetHostTransportTypeFromString; virStorageNetHostTransportTypeToString; virStorageNetProtocolTypeToString; +virStoragePRDefFormat; +virStoragePRDefFree; +virStoragePRDefParseNode; virStorageSourceBackingStoreClear; virStorageSourceClear; virStorageSourceCopy; diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 5780180a9..fda6617f7 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -1913,6 +1913,153 @@ virStorageAuthDefFormat(virBufferPtr buf, }
+void +virStoragePRDefFree(virStoragePRDefPtr prd) +{ + if (!prd) + return; + + VIR_FREE(prd->path); + VIR_FREE(prd); +} + + +static virStoragePRDefPtr +virStoragePRDefParseXML(xmlXPathContextPtr ctxt) +{ + virStoragePRDefPtr prd, ret = NULL; + char *enabled = NULL; + char *managed = NULL; + char *type = NULL; + char *path = NULL; + char *mode = NULL; + + if (VIR_ALLOC(prd) < 0) + return NULL; + + if (!(enabled = virXPathString("string(./@enabled)", ctxt))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing @enabled attribute for <reservations/>")); + goto cleanup; + } + + if ((prd->enabled = virTristateBoolTypeFromString(enabled)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid value for 'enabled': %s"), NULLSTR(enabled)); + goto cleanup; + } + + if (prd->enabled == VIR_TRISTATE_BOOL_YES) {
As noted before, I see no purpose for enabled=no
+ managed = virXPathString("string(./@managed)", ctxt); + if ((prd->managed = virTristateBoolTypeFromString(managed)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, + _("invalid value for 'managed': %s"), NULLSTR(managed)); + goto cleanup; + } + + if (prd->managed == VIR_TRISTATE_BOOL_NO) { + type = virXPathString("string(./source[1]/@type)", ctxt); + path = virXPathString("string(./source[1]/@path)", ctxt); + mode = virXPathString("string(./source[1]/@mode)", ctxt); + + if (!type) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing connection type")); + goto cleanup; + } + + if (!path) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing path")); + goto cleanup; + } + + if (!mode) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("missing connection mode")); + goto cleanup; + } + + if (STRNEQ(type, "unix")) { + virReportError(VIR_ERR_XML_ERROR, + _("unsupported type: %s"), type); + goto cleanup; + } + + if (STRNEQ(mode, "client")) { + virReportError(VIR_ERR_XML_ERROR, + _("unsupported mode: %s"), mode); + goto cleanup; + }
But docs say client or server... If only going to be client, then no sense adding this yet.
+ + prd->path = path; + path = NULL; + + } + } + + ret = prd; + prd = NULL; + + cleanup: + VIR_FREE(mode); + VIR_FREE(path); + VIR_FREE(type); + VIR_FREE(managed); + VIR_FREE(enabled); + virStoragePRDefFree(prd); + return ret; +} + + +virStoragePRDefPtr +virStoragePRDefParseNode(xmlDocPtr xml, + xmlNodePtr root) +{ + xmlXPathContextPtr ctxt = NULL; + virStoragePRDefPtr prd = NULL; + + ctxt = xmlXPathNewContext(xml); + if (ctxt == NULL) { + virReportOOMError(); + goto cleanup; + } + + ctxt->node = root; + prd = virStoragePRDefParseXML(ctxt); + + cleanup: + xmlXPathFreeContext(ctxt); + return prd; +} + + +void +virStoragePRDefFormat(virBufferPtr buf, + virStoragePRDefPtr prd) +{ + virBufferAsprintf(buf, "<reservations enabled='%s'", + virTristateBoolTypeToString(prd->enabled)); + if (prd->enabled == VIR_TRISTATE_BOOL_YES) { + virBufferAsprintf(buf, " managed='%s'", + virTristateBoolTypeToString(prd->managed)); + if (prd->managed == VIR_TRISTATE_BOOL_NO) { + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + virBufferAddLit(buf, "<source type='unix'"); + virBufferEscapeString(buf, " path='%s'", prd->path); + virBufferAddLit(buf, " mode='client'/>\n");
What about mode='server'?
+ virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</reservations>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } + } else { + virBufferAddLit(buf, "/>\n"); + } +} + + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model) @@ -2291,6 +2438,7 @@ virStorageSourceClear(virStorageSourcePtr def) virBitmapFree(def->features); VIR_FREE(def->compat); virStorageEncryptionFree(def->encryption); + virStoragePRDefFree(def->pr); virStorageSourceSeclabelsClear(def); virStoragePermsFree(def->perms); VIR_FREE(def->timestamps); diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index ecd806c93..f82c20cf4 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -216,6 +216,14 @@ struct _virStorageAuthDef { virSecretLookupTypeDef seclookupdef; };
+typedef struct _virStoragePRDef virStoragePRDef; +typedef virStoragePRDef *virStoragePRDefPtr; +struct _virStoragePRDef { + int enabled; /* enum virTristateBool */ + int managed; /* enum virTristateBool */ + char *path; +}; + typedef struct _virStorageDriverData virStorageDriverData; typedef virStorageDriverData *virStorageDriverDataPtr;
@@ -243,6 +251,7 @@ struct _virStorageSource { bool authInherited; virStorageEncryptionPtr encryption; bool encryptionInherited;
Add an empty line here - if only to separate auth/encryption from this... Auth and Encryption could probably be separated to. It's just a visual thing.
+ virStoragePRDefPtr pr;
Not a fan of pr, but persistentReservation would be far worse and pR is no better. I find I mess up typing encryption often, so either expanded word would be tough ;-)
virObjectPtr privateData;
@@ -369,6 +378,12 @@ virStorageAuthDefPtr virStorageAuthDefCopy(const virStorageAuthDef *src); virStorageAuthDefPtr virStorageAuthDefParse(xmlDocPtr xml, xmlNodePtr root); int virStorageAuthDefFormat(virBufferPtr buf, virStorageAuthDefPtr authdef);
+void virStoragePRDefFree(virStoragePRDefPtr prd); +virStoragePRDefPtr virStoragePRDefParseNode(xmlDocPtr xml, + xmlNodePtr root); +void virStoragePRDefFormat(virBufferPtr buf, + virStoragePRDefPtr prd); + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model); diff --git a/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml new file mode 100644 index 000000000..8ee623a70 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml @@ -0,0 +1,40 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>8</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='lun'> + <source dev='/dev/HostVG/QEMUGuest1'> + <reservations enabled='yes' managed='no'> + <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/>
Not that it was parsed, but 'server' was described so if supported there should be an example.
+ </reservations> + </source> + <target dev='sdb' bus='scsi'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='scsi' index='0' model='virtio-scsi'> + <driver queues='8'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml new file mode 100644 index 000000000..874446f7b --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml @@ -0,0 +1,38 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu placement='static'>8</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-i686</emulator> + <disk type='block' device='lun'> + <source dev='/dev/HostVG/QEMUGuest1'> + <reservations enabled='yes' managed='yes'/> + </source> + <target dev='sdb' bus='scsi'/>
If enabled='no' is important, then a second one should be added here without it just so we parse it.
+ <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='scsi' index='0' model='virtio-scsi'> + <driver queues='8'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml b/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml new file mode 120000 index 000000000..f96d62cb6 --- /dev/null +++ b/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml \ No newline at end of file diff --git a/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml b/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml new file mode 120000 index 000000000..dde52fd1d --- /dev/null +++ b/tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/disk-virtio-scsi-reservations.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 2be8eb2c1..3d70f13c8 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -531,6 +531,10 @@ mymain(void) QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-num_queues", QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); + DO_TEST("disk-virtio-scsi-reservations", + QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); + DO_TEST("disk-virtio-scsi-reservations-not-managed", + QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-cmd_per_lun", QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_VIRTIO_SCSI); DO_TEST("disk-virtio-scsi-max_sectors",

On 01/26/2018 04:07 AM, John Ferlan wrote:
On 01/18/2018 11:04 AM, Michal Privoznik wrote:
This is a definition that holds information on SCSI persistent reservation settings. The XML part looks like this:
<reservations enabled='yes' managed='no'> <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/> </reservations>
If @managed is set to 'yes' then the <source/> is not parsed. This design was agreed on here:
https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 25 +++- docs/schemas/domaincommon.rng | 19 +-- docs/schemas/storagecommon.rng | 34 +++++ src/conf/domain_conf.c | 36 +++++ src/libvirt_private.syms | 3 + src/util/virstoragefile.c | 148 +++++++++++++++++++++ src/util/virstoragefile.h | 15 +++ .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++++++ .../disk-virtio-scsi-reservations.xml | 38 ++++++ .../disk-virtio-scsi-reservations-not-managed.xml | 1 + .../disk-virtio-scsi-reservations.xml | 1 + tests/qemuxml2xmltest.c | 4 + 12 files changed, 348 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
Before digging too deep into this...
- I assume we're avoiding <disk> iSCSI mainly because those reservations would take place elsewhere, safe assumption?
I believe so, but I'll let Paolo answer that. The way I understand reservations is that qemu needs to issue 'privileged' SCSI commands and thus for regular SCSI (which for purpose of this argument involves iSCSI emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process to which it'll pass the FD and which will issue the 'privileged' SCSI commands on qemu's behalf.
- What about using lun's from a storage pool (and what could become your favorite, NPIV devices ;-))
<disk type='volume' device='lun'> <driver name='qemu' type='raw'/> <source pool='sourcepool' volume='unit:0:4:0'/> <target dev='sda' bus='scsi'/> </disk>
These should work too with my patches (not tested though - I don't have any real SCSI machine).
- What about <hostdev>'s?
<hostdev mode='subsystem' type='scsi'>
but not iSCSI or vHost hostdev's. I think that creates the SCSI generic LUN, but it's been a while since I've thought about the terminology used for hostdev's...
I think these don't need the feature since qemu can access the device directly.
I also have this faint recollection of PR related to sgio filtering and perhaps even rawio, but dredging that back up could send me down the path of some "downstream only" type bz's. Although searching on just VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO.
And finally... I assume there is one qemu-pr-manager (qemu.conf changes eventually)... Eventually there's magic that allows/adds per domain *and* per LUN some socket path. If libvirt provided it's generated via the domain temporary directory; however, what's not really clear is how that unmanaged path really works. Need a virtual whiteboard...
So, in case of unmanaged path, here are the assumptions that my patches are built on: 1) unmanaged helper process (UHP) is spawned by somebody else's than libvirtd (hence unmanaged) - it doesn't have to be user, it can be systemd for instance. 2) path to UHP's socket has to be labeled correctly - libvirt doesn't touch that, because it knows nothing about usage scenario, whether sys admin intended one UHP per whole host and thus configured label that way, or it is spawned by mgmt app (or systemd, or whomever) per one domain, or even disk. Therefore, we can do nothing more than shrug shoulders and require users to label the socket correctly. Or use managed helper. 3) in future, when UHP dies, libvirt will NOT spawn it again. It's unmanaged after all. It's user/sysadmin responsibility to spawn it again. Now, for the managed helper process (MHP) the assumptions are: 1) there's one MHP per domain (all SCSI disks in the domain share the same MHP). 2) the MHP runs as root, but is placed into the same CGroup, mount namespace as qemu process it serves 3) MHP is lives and dies with the domain it is associated with. The code might be complicated more than needed - it is prepared to have one MHP per disk rather than domain (should we ever need it). Therefore instead of storing one pid_t, we store them in a hash table where more can be stored. Michal

On 01/29/2018 08:03 AM, Michal Privoznik wrote:
On 01/26/2018 04:07 AM, John Ferlan wrote:
On 01/18/2018 11:04 AM, Michal Privoznik wrote:
This is a definition that holds information on SCSI persistent reservation settings. The XML part looks like this:
<reservations enabled='yes' managed='no'> <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/> </reservations>
If @managed is set to 'yes' then the <source/> is not parsed. This design was agreed on here:
https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 25 +++- docs/schemas/domaincommon.rng | 19 +-- docs/schemas/storagecommon.rng | 34 +++++ src/conf/domain_conf.c | 36 +++++ src/libvirt_private.syms | 3 + src/util/virstoragefile.c | 148 +++++++++++++++++++++ src/util/virstoragefile.h | 15 +++ .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++++++ .../disk-virtio-scsi-reservations.xml | 38 ++++++ .../disk-virtio-scsi-reservations-not-managed.xml | 1 + .../disk-virtio-scsi-reservations.xml | 1 + tests/qemuxml2xmltest.c | 4 + 12 files changed, 348 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
Before digging too deep into this...
- I assume we're avoiding <disk> iSCSI mainly because those reservations would take place elsewhere, safe assumption?
I believe so, but I'll let Paolo answer that. The way I understand reservations is that qemu needs to issue 'privileged' SCSI commands and thus for regular SCSI (which for purpose of this argument involves iSCSI emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process to which it'll pass the FD and which will issue the 'privileged' SCSI commands on qemu's behalf.
Just trying to cover all the bases... Disks, Hostdevs, Storage Pools, SCSI, iSCSI, NPIV, etc. A few different ways to figure LUNs. I think you're right regarding the privileged command - it's what I have a recollection of when dealing with PR before with the rawio and sgio properties.
- What about using lun's from a storage pool (and what could become your favorite, NPIV devices ;-))
<disk type='volume' device='lun'> <driver name='qemu' type='raw'/> <source pool='sourcepool' volume='unit:0:4:0'/> <target dev='sda' bus='scsi'/> </disk>
These should work too with my patches (not tested though - I don't have any real SCSI machine).
I have access to the NPIV environment I use and can let you use it as well... So that part is not a problem. IIRC, this was a two pronged question, but I neglected to go back and be more explicit.. The second half of the question is when using a storage pool for your LUN, then how is the PR added? The <reservations> are only being processed from one very specific disk source type. For reference, see how many times encryption is added to various RNG definitions - including diskSourceVolume which is what I think you'd want to process a 'srcpool' definition. The virStorageTranslateDiskSourcePool handles the magic of generating the disk like structure used when building the command line. There's also a 'mode' attribute where if set to 'host' for an iSCSI LUN's will result in using the "local" filesystem path to the LUN rather than going through the iSCSI server. I would think for this case, then one would have to have the PR set up (if configured of course).
- What about <hostdev>'s?
<hostdev mode='subsystem' type='scsi'>
but not iSCSI or vHost hostdev's. I think that creates the SCSI generic LUN, but it's been a while since I've thought about the terminology used for hostdev's...
I think these don't need the feature since qemu can access the device directly.
Well that'll make life a bit easier. IIRC, this creates a /dev/sgN on the guest.
I also have this faint recollection of PR related to sgio filtering and perhaps even rawio, but dredging that back up could send me down the path of some "downstream only" type bz's. Although searching on just VIR_DOMAIN_DISK_DEVICE_LUN does bring up qemuSetUnprivSGIO.
And finally... I assume there is one qemu-pr-manager (qemu.conf changes eventually)... Eventually there's magic that allows/adds per domain *and* per LUN some socket path. If libvirt provided it's generated via the domain temporary directory; however, what's not really clear is how that unmanaged path really works. Need a virtual whiteboard...
So, in case of unmanaged path, here are the assumptions that my patches are built on:
1) unmanaged helper process (UHP) is spawned by somebody else's than libvirtd (hence unmanaged) - it doesn't have to be user, it can be systemd for instance.
2) path to UHP's socket has to be labeled correctly - libvirt doesn't touch that, because it knows nothing about usage scenario, whether sys admin intended one UHP per whole host and thus configured label that way, or it is spawned by mgmt app (or systemd, or whomever) per one domain, or even disk. Therefore, we can do nothing more than shrug shoulders and require users to label the socket correctly. Or use managed helper.
3) in future, when UHP dies, libvirt will NOT spawn it again. It's unmanaged after all. It's user/sysadmin responsibility to spawn it again.
Now, for the managed helper process (MHP) the assumptions are:
1) there's one MHP per domain (all SCSI disks in the domain share the same MHP).
2) the MHP runs as root, but is placed into the same CGroup, mount namespace as qemu process it serves
3) MHP is lives and dies with the domain it is associated with.
The code might be complicated more than needed - it is prepared to have one MHP per disk rather than domain (should we ever need it). Therefore instead of storing one pid_t, we store them in a hash table where more can be stored.
I know the design was agreed upon previously, but nothing like actual patches to open Pandora's Box again ;-). I might be off in the weeds here, but taking a step back and typing thoughts... If MHP is a QEMU related process - IOW, something QEMU creates, then why is it libvirt's responsibility to manage it? Why would/should libvirt be responsible for restarting it if it goes away? The death would be noticeable if we were notified that the socket closed or perhaps if QEMU sent an event when it noticed. If the daemon was having a problem, then wouldn't restarting it possibly cause a never ending fail/restart cycle? What/Who stops that? Perhaps the UHP model is how we should be treating things - that is "mostly" not our problem. Shouldn't the extent of libvirt's "management" be nothing more than using it and handling when the connection dies by some action defined for the device that's using it (e.g. domain shutdown, domain pause, device hot unplug)?. Subsequent hot-plug's would need to determine that the connection to the daemon can be made; otherwise, the attempt fails. That may need to be something checked after the device add. Defining the close action on a device by device basis allows the domain configurer to make a choice based on whether it's a boot/system/important disk/lun or just something being used for something. Another random, but related thought... QEMU has made this change to add this PR daemon to manage something and communicate with libvirt by some defined socket... Someone else says, great idea, let's copy that and libvirt will be able to manage it. After all imitation is a form of flattery. That leaves libvirt trying to fit this new thing into the existing model - either by liberally copying the same code and changing the names to protect the innocent or refactoring to generate common paths for separate technologies. Still in the long run what we have for the current mechanism is a socket by path. So if we consider that the common point why not have a way to define socket paths to external daemon's by path. IOW: Trying to make this generic enough that we can just have a "type" that would be able to label the type of socket (e.g. "PR") <disk>... <source>.. <socket type='unix' path='/path/to/socket.sock' mode='client|server' closeaction='shutdown|pause|unplug' [daemon='/path/to/daemon' args='list of daemon args']>NAME</socket> Where : optional daemon & args will be used to spin out a new daemon NAME is just some string, but must be UNIQUE amongs other sockets (we could store these in a hash table any number of sockets) The thought behind 'daemon' attribute is to avoid the qemu.conf changes... The args allows defining a way to start the daemon debug or however it's desired - it's just a string copied/passed to the daemon. I would think it's up to daemon code to make all the rules and manage everything including security, cgroups, acls, etc. etc. We don't care if it's a LUN, a Disk, an LV, a Block device, etc. If the daemon wants to somehow support migration, then have at it. We probably should be as hands off as possible. OK - rambled on long enough ;-) John FWIW: Coverity issues found in current series... Patch 9: * In qemuProcessSetupOnePRDaemon, @pidfile is NULL if jump to cleanup if prHelperName is not executable causing a unlink(NULL) which is bad. Also the @cfg is leaked. Patch 10: * In qemuDomainObjPrivateXMLParsePRs, virXPathNodeSet can return a -1 into an unsigned @n value, thus the subsequent for loop will take a while.

On 29/01/2018 14:03, Michal Privoznik wrote:
On 01/26/2018 04:07 AM, John Ferlan wrote:
On 01/18/2018 11:04 AM, Michal Privoznik wrote:
This is a definition that holds information on SCSI persistent reservation settings. The XML part looks like this:
<reservations enabled='yes' managed='no'> <source type='unix' path='/path/to/qemu-pr-helper.sock' mode='client'/> </reservations>
If @managed is set to 'yes' then the <source/> is not parsed. This design was agreed on here:
https://www.redhat.com/archives/libvir-list/2017-November/msg01005.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 25 +++- docs/schemas/domaincommon.rng | 19 +-- docs/schemas/storagecommon.rng | 34 +++++ src/conf/domain_conf.c | 36 +++++ src/libvirt_private.syms | 3 + src/util/virstoragefile.c | 148 +++++++++++++++++++++ src/util/virstoragefile.h | 15 +++ .../disk-virtio-scsi-reservations-not-managed.xml | 40 ++++++ .../disk-virtio-scsi-reservations.xml | 38 ++++++ .../disk-virtio-scsi-reservations-not-managed.xml | 1 + .../disk-virtio-scsi-reservations.xml | 1 + tests/qemuxml2xmltest.c | 4 + 12 files changed, 348 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.xml create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations-not-managed.xml create mode 120000 tests/qemuxml2xmloutdata/disk-virtio-scsi-reservations.xml
Before digging too deep into this...
- I assume we're avoiding <disk> iSCSI mainly because those reservations would take place elsewhere, safe assumption?
I believe so, but I'll let Paolo answer that. The way I understand reservations is that qemu needs to issue 'privileged' SCSI commands and thus for regular SCSI (which for purpose of this argument involves iSCSI emulated by kernel) either qemu needs CAP_SYS_RAWIO or a helper process to which it'll pass the FD and which will issue the 'privileged' SCSI commands on qemu's behalf.
Yes. There are two reasons for QEMU to access the helper. First, in order to be able to issue the command without CAP_SYS_RAWIO. Second, in order to access /dev/mapper/control and issue the command to all targets in a multipath setup. iSCSI in kernel, including multipath over iSCSI is included. iSCSI in userspace does not need qemu-pr-manager because QEMU 1) can just send the command down a TCP socket without needing CAP_SYS_RAWIO 2) does not support multipath for iSCSI in userspace.
- What about using lun's from a storage pool (and what could become your favorite, NPIV devices ;-))
<disk type='volume' device='lun'> <driver name='qemu' type='raw'/> <source pool='sourcepool' volume='unit:0:4:0'/> <target dev='sda' bus='scsi'/> </disk>
These should work too with my patches (not tested though - I don't have any real SCSI machine).
- What about <hostdev>'s?
<hostdev mode='subsystem' type='scsi'>
but not iSCSI or vHost hostdev's. I think that creates the SCSI generic LUN, but it's been a while since I've thought about the terminology used for hostdev's...
I think these don't need the feature since qemu can access the device directly.
They actually need the feature, but it can be added later.
And finally... I assume there is one qemu-pr-manager (qemu.conf changes eventually)... Eventually there's magic that allows/adds per domain *and* per LUN some socket path. If libvirt provided it's generated via the domain temporary directory; however, what's not really clear is how that unmanaged path really works. Need a virtual whiteboard...
So, in case of unmanaged path, here are the assumptions that my patches are built on:
1) unmanaged helper process (UHP) is spawned by somebody else's than libvirtd (hence unmanaged) - it doesn't have to be user, it can be systemd for instance.
2) path to UHP's socket has to be labeled correctly - libvirt doesn't touch that
3) in future, when UHP dies, libvirt will NOT spawn it again. It's unmanaged after all. It's user/sysadmin responsibility to spawn it again.
Correct.
Now, for the managed helper process (MHP) the assumptions are:
1) there's one MHP per domain (all SCSI disks in the domain share the same MHP).
2) the MHP runs as root, but is placed into the same CGroup, mount namespace as qemu process it serves
3) MHP is lives and dies with the domain it is associated with.
Correct, with the caveat that QEMU must provide the MHP state and death event for this to be complete. Thanks, Paolo
The code might be complicated more than needed - it is prepared to have one MHP per disk rather than domain (should we ever need it). Therefore instead of storing one pid_t, we store them in a hash table where more can be stored.
Michal

Couple of reasons for that: a) there's no monitor command to change path where the pr-helper connects to, or b) there's no monitor command to introduce a new pr-helper for a disk that already exists. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/qemu/qemu_domain.c | 8 ++++++++ src/util/virstoragefile.c | 18 ++++++++++++++++++ src/util/virstoragefile.h | 2 ++ 4 files changed, 29 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b0dc76b91..424f2ef64 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2719,6 +2719,7 @@ virStorageNetHostTransportTypeToString; virStorageNetProtocolTypeToString; virStoragePRDefFormat; virStoragePRDefFree; +virStoragePRDefIsEqual; virStoragePRDefParseNode; virStorageSourceBackingStoreClear; virStorageSourceClear; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 441bf5935..e8539dcab 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6865,6 +6865,14 @@ qemuDomainDiskChangeSupported(virDomainDiskDefPtr disk, CHECK_EQ(src->readonly, "readonly", true); CHECK_EQ(src->shared, "shared", true); + if (!virStoragePRDefIsEqual(disk->src->pr, + orig_disk->src->pr)) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("cannot modify field '%s' of the disk"), + "reservations"); + return false; + } + #undef CHECK_EQ return true; diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index fda6617f7..469f7c97b 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -2060,6 +2060,24 @@ virStoragePRDefFormat(virBufferPtr buf, } +bool +virStoragePRDefIsEqual(virStoragePRDefPtr a, + virStoragePRDefPtr b) +{ + if (!a && !b) + return true; + + if (!a || !b) + return false; + + if (a->enabled != b->enabled || + a->managed != b->managed || + STRNEQ_NULLABLE(a->path, b->path)) + return false; + + return true; +} + virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, const char *model) diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index f82c20cf4..d07035d65 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -383,6 +383,8 @@ virStoragePRDefPtr virStoragePRDefParseNode(xmlDocPtr xml, xmlNodePtr root); void virStoragePRDefFormat(virBufferPtr buf, virStoragePRDefPtr prd); +bool virStoragePRDefIsEqual(virStoragePRDefPtr a, + virStoragePRDefPtr b); virSecurityDeviceLabelDefPtr virStorageSourceGetSecurityLabelDef(virStorageSourcePtr src, -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:34 +0100, Michal Privoznik wrote:
Couple of reasons for that:
a) there's no monitor command to change path where the pr-helper connects to, or b) there's no monitor command to introduce a new pr-helper for a disk that already exists.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/qemu/qemu_domain.c | 8 ++++++++ src/util/virstoragefile.c | 18 ++++++++++++++++++ src/util/virstoragefile.h | 2 ++ 4 files changed, 29 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b0dc76b91..424f2ef64 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2719,6 +2719,7 @@ virStorageNetHostTransportTypeToString; virStorageNetProtocolTypeToString; virStoragePRDefFormat; virStoragePRDefFree; +virStoragePRDefIsEqual; virStoragePRDefParseNode; virStorageSourceBackingStoreClear; virStorageSourceClear; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 441bf5935..e8539dcab 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -6865,6 +6865,14 @@ qemuDomainDiskChangeSupported(virDomainDiskDefPtr disk, CHECK_EQ(src->readonly, "readonly", true); CHECK_EQ(src->shared, "shared", true);
+ if (!virStoragePRDefIsEqual(disk->src->pr, + orig_disk->src->pr)) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("cannot modify field '%s' of the disk"), + "reservations");
Is there a particular reason to format 'reservations' separately? Translations should not translate it, but doing this is generally deemed translation unfriendly. Rest looks good to me.

On Mon, 2018-02-12 at 16:40 +0100, Peter Krempa wrote:
@@ -6865,6 +6865,14 @@ qemuDomainDiskChangeSupported(virDomainDiskDefPtr disk, CHECK_EQ(src->readonly, "readonly", true); CHECK_EQ(src->shared, "shared", true);
+ if (!virStoragePRDefIsEqual(disk->src->pr, + orig_disk->src->pr)) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("cannot modify field '%s' of the disk"), + "reservations");
Is there a particular reason to format 'reservations' separately?
Translations should not translate it, but doing this is generally deemed translation unfriendly.
Wouldn't that actually be better for translators? Assuming there's going to be more than a single field, you can translate the message once and it will work for all instances. Of course that only works if, like in this case, the variable part itself does not need translation. -- Andrea Bolognani / Red Hat / Virtualization

On Mon, Feb 12, 2018 at 17:19:24 +0100, Andrea Bolognani wrote:
On Mon, 2018-02-12 at 16:40 +0100, Peter Krempa wrote:
@@ -6865,6 +6865,14 @@ qemuDomainDiskChangeSupported(virDomainDiskDefPtr disk, CHECK_EQ(src->readonly, "readonly", true); CHECK_EQ(src->shared, "shared", true);
+ if (!virStoragePRDefIsEqual(disk->src->pr, + orig_disk->src->pr)) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("cannot modify field '%s' of the disk"), + "reservations");
Is there a particular reason to format 'reservations' separately?
Translations should not translate it, but doing this is generally deemed translation unfriendly.
Wouldn't that actually be better for translators? Assuming there's going to be more than a single field, you can translate the message once and it will work for all instances.
Of course that only works if, like in this case, the variable part itself does not need translation.
Um yes, in this case it will actually help due to the surrounding code using that pattern a lot and also tha the field should not be translated. I retract my comment. ACK

The capability tracks if qemu has pr-manager-helper object. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 3 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index ab0ea8ec0..f6abc33ce 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -457,6 +457,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, /* 280 */ "pl011", "machine.pseries.max-cpu-compat", + "pr-manager-helper", ); @@ -1690,6 +1691,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "sclplmconsole", QEMU_CAPS_DEVICE_SCLPLMCONSOLE }, { "isa-serial", QEMU_CAPS_DEVICE_ISA_SERIAL }, { "pl011", QEMU_CAPS_DEVICE_PL011 }, + { "pr-manager-helper", QEMU_CAPS_PR_MANAGER_HELPER }, }; static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsVirtioBalloon[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 3dfc77f87..44eba98ec 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -443,6 +443,7 @@ typedef enum { /* 280 */ QEMU_CAPS_DEVICE_PL011, /* -device pl011 (not user-instantiable) */ QEMU_CAPS_MACHINE_PSERIES_MAX_CPU_COMPAT, /* -machine pseries,max-cpu-compat= */ + QEMU_CAPS_PR_MANAGER_HELPER, /* -object pr-manager-helper */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:35 +0100, Michal Privoznik wrote:
The capability tracks if qemu has pr-manager-helper object.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 3 insertions(+)
ACK

This is an extended definition of virStoragePRDef because it contains runtime information (like path to pr helper socket, its pid and alias). Since these are driver dependant we should have a driver specific structure instead of putting all of that into driver agnostic structure. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 18 ++++++++ 2 files changed, 138 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e8539dcab..7fa8c93b7 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -65,6 +65,7 @@ #endif #include <sys/time.h> #include <fcntl.h> +#include <signal.h> #if defined(HAVE_SYS_MOUNT_H) # include <sys/mount.h> #endif @@ -1829,6 +1830,9 @@ qemuDomainObjPrivateDataClear(qemuDomainObjPrivatePtr priv) virBitmapFree(priv->migrationCaps); priv->migrationCaps = NULL; + + virHashFree(priv->prHelpers); + priv->prHelpers = NULL; } @@ -10917,6 +10921,122 @@ qemuDomainCheckMigrationCapabilities(virQEMUDriverPtr driver, } +static void +qemuDomainDiskPRObjectHashFree(void *payload, + const void *name) +{ + qemuDomainDiskPRObjectPtr tmp = payload; + + if (tmp->managed && + tmp->pid != (pid_t) -1) { + VIR_DEBUG("Forcibly killing pr-manager: %s", (const char *) name); + virProcessKillPainfully(tmp->pid, true); + } + VIR_FREE(tmp->path); + VIR_FREE(tmp); +} + + +/** + * qemuDomainDiskPRObjectRegister: + * @priv: Domain private data + * @alias: alias of the pr-manager object + * @managed: true if pr-managed object is manged by libvirt + * @path: socket path for the pr-manager object + * + * Records [alias, managed, path] tuple for pr-manager objects. + * On successful return @path is stolen and set to NULL. + * + * Returns 0 on success (with @path stolen), + * -1 otherwise (with error reported). + */ +int +qemuDomainDiskPRObjectRegister(qemuDomainObjPrivatePtr priv, + const char *alias, + bool managed, + char **path) +{ + qemuDomainDiskPRObjectPtr tmp; + int ret = -1; + + if (!priv->prHelpers && + !(priv->prHelpers = virHashCreate(10, qemuDomainDiskPRObjectHashFree))) + return -1; + + if ((tmp = virHashLookup(priv->prHelpers, alias))) { + /* Entry exists, check if it matches path. This shouldn't + * happen, but it's better to be safe than sorry. */ + if (STRNEQ(tmp->path, *path)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("trying to change path for pr helper object")); + + return -1; + } + + /* Claim success */ + VIR_FREE(*path); + return 0; + } + + if (VIR_ALLOC(tmp) < 0) + goto cleanup; + + tmp->managed = managed, + tmp->path = *path; + tmp->pid = (pid_t) -1; + + if (virHashAddEntry(priv->prHelpers, alias, tmp) < 0) + goto cleanup; + + *path = NULL; + tmp = NULL; + ret = 0; + cleanup: + VIR_FREE(tmp); + return ret; +} + + +static int +qemuDomainDiskPRObjectKillOne(void *payload, + const void *name, + void *data ATTRIBUTE_UNUSED) +{ + qemuDomainDiskPRObjectPtr tmp = payload; + + if (!tmp->managed) + return 0; + + VIR_DEBUG("Killing pr-manager: %s", (const char *) name); + if (tmp->pid != (pid_t) -1 && + virProcessKill(tmp->pid, SIGTERM) < 0) { + virReportSystemError(errno, + _("Unable to kill pr-manager: %s"), + (const char *) name); + /* Don't return error; we want to kill as many as + * possible. */ + } else { + tmp->pid = (pid_t) -1; + } + + return 0; +} + + +void +qemuDomainDiskPRObjectKillAll(qemuDomainObjPrivatePtr priv) +{ + if (!priv->prHelpers) + return; + + virHashForEach(priv->prHelpers, + qemuDomainDiskPRObjectKillOne, NULL); + + virHashFree(priv->prHelpers); + priv->prHelpers = NULL; +} + + int qemuDomainPrepareDiskSource(virConnectPtr conn, virDomainDiskDefPtr disk, diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index ddfc46dcd..f741f3039 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -329,6 +329,8 @@ struct _qemuDomainObjPrivate { /* Migration capabilities. Rechecked on reconnect, not to be saved in * private XML. */ virBitmapPtr migrationCaps; + + virHashTablePtr prHelpers; }; # define QEMU_DOMAIN_PRIVATE(vm) \ @@ -990,4 +992,20 @@ qemuDomainPrepareDiskSource(virConnectPtr conn, qemuDomainObjPrivatePtr priv, virQEMUDriverConfigPtr cfg); +typedef struct _qemuDomainDiskPRObject qemuDomainDiskPRObject; +typedef qemuDomainDiskPRObject *qemuDomainDiskPRObjectPtr; +struct _qemuDomainDiskPRObject { + bool managed; + char *path; /* socket path */ + pid_t pid; /* daemon pid */ +}; + +int +qemuDomainDiskPRObjectRegister(qemuDomainObjPrivatePtr priv, + const char *alias, + bool managed, + char **path); +void +qemuDomainDiskPRObjectKillAll(qemuDomainObjPrivatePtr priv); + #endif /* __QEMU_DOMAIN_H__ */ -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:36 +0100, Michal Privoznik wrote:
This is an extended definition of virStoragePRDef because it contains runtime information (like path to pr helper socket, its pid and alias). Since these are driver dependant we should have a driver specific structure instead of putting all of that into driver agnostic structure.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 18 ++++++++ 2 files changed, 138 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e8539dcab..7fa8c93b7 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -65,6 +65,7 @@ #endif #include <sys/time.h> #include <fcntl.h> +#include <signal.h> #if defined(HAVE_SYS_MOUNT_H) # include <sys/mount.h> #endif @@ -1829,6 +1830,9 @@ qemuDomainObjPrivateDataClear(qemuDomainObjPrivatePtr priv)
virBitmapFree(priv->migrationCaps); priv->migrationCaps = NULL; + + virHashFree(priv->prHelpers); + priv->prHelpers = NULL; }
@@ -10917,6 +10921,122 @@ qemuDomainCheckMigrationCapabilities(virQEMUDriverPtr driver, }
+static void +qemuDomainDiskPRObjectHashFree(void *payload, + const void *name) +{ + qemuDomainDiskPRObjectPtr tmp = payload; + + if (tmp->managed && + tmp->pid != (pid_t) -1) { + VIR_DEBUG("Forcibly killing pr-manager: %s", (const char *) name); + virProcessKillPainfully(tmp->pid, true);
Is this really a good idea? If you restart libvirtd gracefully, this will kill all the PR daemons. Also I'd prefer if all the process management code would be in qemu_process.c.

While we're not generating the command line just yet (look for the next commit), we can generate the alias for pr-manager. A domain can have up to one managed pr-manager (in which case socket path is decided by libvirt and pr-helper is spawned by libvirt too), but up to ndisks of unmanaged ones (one per disk basically). In case of the former we can generate a simple alias and be sure it'll not conflict. But in case of the latter to avoid any conflicts let's generate alias that's based on something unique - like disk target. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 3 +++ 2 files changed, 64 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 7fa8c93b7..e8d2adf56 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -972,6 +972,8 @@ qemuDomainStorageSourcePrivateDispose(void *obj) qemuDomainSecretInfoFree(&priv->secinfo); qemuDomainSecretInfoFree(&priv->encinfo); + + VIR_FREE(priv->prAlias); } @@ -11037,6 +11039,62 @@ qemuDomainDiskPRObjectKillAll(qemuDomainObjPrivatePtr priv) } +static int +qemuDomainPrepareDiskPR(qemuDomainObjPrivatePtr priv, + virDomainDiskDefPtr disk) +{ + qemuDomainStorageSourcePrivatePtr srcPriv; + virStoragePRDefPtr prd = disk->src->pr; + char *prPath = NULL; + bool managed; + int ret = -1; + + if (!prd || + prd->enabled != VIR_TRISTATE_BOOL_YES) + return 0; + + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_PR_MANAGER_HELPER)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("reservations not supported with this QEMU binary")); + goto cleanup; + } + + if (!disk->src->privateData && + !(disk->src->privateData = qemuDomainStorageSourcePrivateNew())) + return -1; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + managed = prd->managed == VIR_TRISTATE_BOOL_YES; + + if (managed) { + /* Generate PR helper socket path & alias that are same + * for each disk in the domain. */ + + if (VIR_STRDUP(srcPriv->prAlias, "pr-helper0") < 0) + return -1; + + if (virAsprintf(&prPath, "%s/pr-helper0.sock", priv->libDir) < 0) + return -1; + + } else { + if (virAsprintf(&srcPriv->prAlias, "pr-helper-%s", disk->dst) < 0) + return -1; + + if (VIR_STRDUP(prPath, prd->path) < 0) + return -1; + } + + if (qemuDomainDiskPRObjectRegister(priv, srcPriv->prAlias, + managed, &prPath) < 0) + goto cleanup; + + ret = 0; + cleanup: + VIR_FREE(prPath); + return ret; +} + + int qemuDomainPrepareDiskSource(virConnectPtr conn, virDomainDiskDefPtr disk, @@ -11049,6 +11107,9 @@ qemuDomainPrepareDiskSource(virConnectPtr conn, if (qemuDomainSecretDiskPrepare(conn, priv, disk) < 0) return -1; + if (qemuDomainPrepareDiskPR(priv, disk) < 0) + return -1; + if (disk->src->type == VIR_STORAGE_TYPE_NETWORK && disk->src->protocol == VIR_STORAGE_NET_PROTOCOL_GLUSTER && virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_GLUSTER_DEBUG_LEVEL)) { diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index f741f3039..5fdb5b931 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -375,6 +375,9 @@ struct _qemuDomainStorageSourcePrivate { /* data required for decryption of encrypted storage source */ qemuDomainSecretInfoPtr encinfo; + + /* alias for pr-manager-helper */ + char *prAlias; }; virObjectPtr qemuDomainStorageSourcePrivateNew(void); -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:37 +0100, Michal Privoznik wrote:
While we're not generating the command line just yet (look for the next commit), we can generate the alias for pr-manager. A domain can have up to one managed pr-manager (in which case socket path is decided by libvirt and pr-helper is spawned by libvirt too), but up to ndisks of unmanaged ones (one per disk basically). In case of the former we can generate a simple alias and be sure it'll not conflict. But in case of the latter to avoid any conflicts let's generate alias that's based on something unique - like disk target.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 3 +++ 2 files changed, 64 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 7fa8c93b7..e8d2adf56 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -972,6 +972,8 @@ qemuDomainStorageSourcePrivateDispose(void *obj)
qemuDomainSecretInfoFree(&priv->secinfo); qemuDomainSecretInfoFree(&priv->encinfo); + + VIR_FREE(priv->prAlias); }
@@ -11037,6 +11039,62 @@ qemuDomainDiskPRObjectKillAll(qemuDomainObjPrivatePtr priv) }
+static int +qemuDomainPrepareDiskPR(qemuDomainObjPrivatePtr priv, + virDomainDiskDefPtr disk) +{ + qemuDomainStorageSourcePrivatePtr srcPriv; + virStoragePRDefPtr prd = disk->src->pr; + char *prPath = NULL; + bool managed; + int ret = -1; + + if (!prd || + prd->enabled != VIR_TRISTATE_BOOL_YES) + return 0; + + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_PR_MANAGER_HELPER)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("reservations not supported with this QEMU binary")); + goto cleanup; + } + + if (!disk->src->privateData && + !(disk->src->privateData = qemuDomainStorageSourcePrivateNew())) + return -1; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + managed = prd->managed == VIR_TRISTATE_BOOL_YES; + + if (managed) { + /* Generate PR helper socket path & alias that are same + * for each disk in the domain. */ + + if (VIR_STRDUP(srcPriv->prAlias, "pr-helper0") < 0) + return -1; + + if (virAsprintf(&prPath, "%s/pr-helper0.sock", priv->libDir) < 0) + return -1; + + } else { + if (virAsprintf(&srcPriv->prAlias, "pr-helper-%s", disk->dst) < 0) + return -1;
I though that unmanaged PRs would be shared which would make sense given the data structure you've introduced in patch 4. Since this code would in that case make problems with hotplug I looked back at that code. So, this means that there's exactly one managed PR daemon and every unmanaged PR daemon has possibly multiple instances/connections from qemu. So, that brings me to the question: Do you really need the prHelpers hash table? If the instances are duplicated, you can store the data in virStorageSource (in the private data) and don't really need the table. If there's intent to add sharing of the pr objects in qemu, you'll need to rething this code, since generating the alias will create problems when you hotunplug a shared PR instance and try to plug back a disk with a different one.

Now that we generate pr-manger alias and socket path store them in status XML so that they are preserved across daemon restarts. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 70 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e8d2adf56..97996b053 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2449,6 +2449,74 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, } +static int +qemuStorageSourcePrivateDataParsePR(xmlXPathContextPtr ctxt, + virStorageSourcePtr src) +{ + qemuDomainStorageSourcePrivatePtr srcPriv; + + if (!src->pr || + src->pr->enabled != VIR_TRISTATE_BOOL_YES) + return 0; + + if (!src->privateData && + !(src->privateData = qemuDomainStorageSourcePrivateNew())) + return -1; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(src); + + if (!(srcPriv->prAlias = virXPathString("string(./prAlias)", ctxt))) + return -1; + + return 0; +} + + +static int +qemuStorageSourcePrivateDataFormatPR(virStorageSourcePtr src, + virBufferPtr buf) +{ + qemuDomainStorageSourcePrivatePtr srcPriv; + + if (!src->pr || + src->pr->enabled != VIR_TRISTATE_BOOL_YES) + return 0; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(src); + + virBufferAsprintf(buf, "<prAlias>%s</prAlias>\n", srcPriv->prAlias); + return 0; +} + + +static int +qemuStorageSourcePrivateDataParse(xmlXPathContextPtr ctxt, + virStorageSourcePtr src) +{ + if (virStorageSourcePrivateDataParseRelPath(ctxt, src) < 0) + return -1; + + if (qemuStorageSourcePrivateDataParsePR(ctxt, src) < 0) + return -1; + + return 0; +} + + +static int +qemuStorageSourcePrivateDataFormat(virStorageSourcePtr src, + virBufferPtr buf) +{ + if (virStorageSourcePrivateDataFormatRelPath(src, buf) < 0) + return -1; + + if (qemuStorageSourcePrivateDataFormatPR(src, buf) < 0) + return -1; + + return 0; +} + + virDomainXMLPrivateDataCallbacks virQEMUDriverPrivateDataCallbacks = { .alloc = qemuDomainObjPrivateAlloc, .free = qemuDomainObjPrivateFree, @@ -2457,8 +2525,8 @@ virDomainXMLPrivateDataCallbacks virQEMUDriverPrivateDataCallbacks = { .chrSourceNew = qemuDomainChrSourcePrivateNew, .parse = qemuDomainObjPrivateXMLParse, .format = qemuDomainObjPrivateXMLFormat, - .storageParse = virStorageSourcePrivateDataParseRelPath, - .storageFormat = virStorageSourcePrivateDataFormatRelPath, + .storageParse = qemuStorageSourcePrivateDataParse, + .storageFormat = qemuStorageSourcePrivateDataFormat, }; -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:38 +0100, Michal Privoznik wrote:
Now that we generate pr-manger alias and socket path store them in status XML so that they are preserved across daemon restarts.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 70 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index e8d2adf56..97996b053 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2449,6 +2449,74 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, }
+static int +qemuStorageSourcePrivateDataParsePR(xmlXPathContextPtr ctxt, + virStorageSourcePtr src) +{ + qemuDomainStorageSourcePrivatePtr srcPriv; + + if (!src->pr || + src->pr->enabled != VIR_TRISTATE_BOOL_YES) + return 0;
This logic should not be here. If there is the field parse it. If it's not don't parse it.
+ + if (!src->privateData && + !(src->privateData = qemuDomainStorageSourcePrivateNew())) + return -1;
Hmmm, we probably should make sure that this is always allocated in the qemu driver as a separate patch preceeding this series.
+ + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(src); + + if (!(srcPriv->prAlias = virXPathString("string(./prAlias)", ctxt))) + return -1; + + return 0; +} + + +static int +qemuStorageSourcePrivateDataFormatPR(virStorageSourcePtr src, + virBufferPtr buf) +{ + qemuDomainStorageSourcePrivatePtr srcPriv; + + if (!src->pr || + src->pr->enabled != VIR_TRISTATE_BOOL_YES) + return 0;
Same here, this logic should not be here. Presence of the alias should be the key.
+ + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(src); + + virBufferAsprintf(buf, "<prAlias>%s</prAlias>\n", srcPriv->prAlias); + return 0; +}

This is the easier part. All we need to do here is put -object pr-manger-helper,id=$alias,path=$socketPath and then just reference the object in -drive file.pr-manger=$alias. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_command.c | 59 ++++++++++++++++++++++ .../disk-virtio-scsi-reservations-not-managed.args | 28 ++++++++++ .../disk-virtio-scsi-reservations.args | 29 +++++++++++ tests/qemuxml2argvtest.c | 8 +++ 4 files changed, 124 insertions(+) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.args create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.args diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index b8aede32d..13f2e4fd0 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1515,6 +1515,20 @@ qemuDiskSourceGetProps(virStorageSourcePtr src) } +static void +qemuBuildDriveSourcePR(virBufferPtr buf, + virStorageSourcePtr src) +{ + qemuDomainStorageSourcePrivatePtr srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(src); + + if (!src->pr || + src->pr->enabled != VIR_TRISTATE_BOOL_YES) + return; + + virBufferAsprintf(buf, ",file.pr-manager=%s", srcPriv->prAlias); +} + + static int qemuBuildDriveSourceStr(virDomainDiskDefPtr disk, virQEMUCapsPtr qemuCaps, @@ -1591,6 +1605,8 @@ qemuBuildDriveSourceStr(virDomainDiskDefPtr disk, if (disk->src->debug) virBufferAsprintf(buf, ",file.debug=%d", disk->src->debugLevel); + + qemuBuildDriveSourcePR(buf, disk->src); } else { if (!(source = virQEMUBuildDriveCommandlineFromJSON(srcprops))) goto cleanup; @@ -10033,6 +10049,46 @@ qemuBuildPanicCommandLine(virCommandPtr cmd, } +struct qemuBuildMasterPRCommandLineData { + virCommandPtr cmd; +}; + + +static int +qemuBuildMasterPRCommandLineHelper(void *payload, + const void *name, + void *opaque) +{ + qemuDomainDiskPRObjectPtr obj = payload; + struct qemuBuildMasterPRCommandLineData *data = opaque; + virBuffer buf = VIR_BUFFER_INITIALIZER; + const char *alias = name; + + virBufferAsprintf(&buf, "pr-manager-helper,id=%s,path=%s", alias, obj->path); + virCommandAddArg(data->cmd, "-object"); + virCommandAddArgBuffer(data->cmd, &buf); + return 0; +} + + +static int +qemuBuildMasterPRCommandLine(virCommandPtr cmd, + qemuDomainObjPrivatePtr priv) +{ + struct qemuBuildMasterPRCommandLineData data = {.cmd = cmd }; + int ret = -1; + + if (priv->prHelpers && + virHashForEach(priv->prHelpers, + qemuBuildMasterPRCommandLineHelper, &data) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} + + /** * qemuBuildCommandLineValidate: * @@ -10185,6 +10241,9 @@ qemuBuildCommandLine(virQEMUDriverPtr driver, if (qemuBuildMasterKeyCommandLine(cmd, priv) < 0) goto error; + if (qemuBuildMasterPRCommandLine(cmd, priv) < 0) + goto error; + if (enableFips) virCommandAddArg(cmd, "-enable-fips"); diff --git a/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.args b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.args new file mode 100644 index 000000000..387432c18 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.args @@ -0,0 +1,28 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i686 \ +-name QEMUGuest1 \ +-S \ +-object pr-manager-helper,id=pr-helper-sdb,path=/path/to/qemu-pr-helper.sock \ +-M pc \ +-m 214 \ +-smp 8,sockets=8,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-nographic \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=readline \ +-no-acpi \ +-boot c \ +-device virtio-scsi-pci,id=scsi0,num_queues=8,bus=pci.0,addr=0x3 \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,file.pr-manager=pr-helper-sdb,format=raw,\ +if=none,id=drive-scsi0-0-0-0,boot=on \ +-device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,\ +drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 diff --git a/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.args b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.args new file mode 100644 index 000000000..2dc53cf05 --- /dev/null +++ b/tests/qemuxml2argvdata/disk-virtio-scsi-reservations.args @@ -0,0 +1,29 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/home/test \ +USER=test \ +LOGNAME=test \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-i686 \ +-name QEMUGuest1 \ +-S \ +-object pr-manager-helper,id=pr-helper0,\ +path=/tmp/lib/domain--1-QEMUGuest1/pr-helper0.sock \ +-M pc \ +-m 214 \ +-smp 8,sockets=8,cores=1,threads=1 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-nographic \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=readline \ +-no-acpi \ +-boot c \ +-device virtio-scsi-pci,id=scsi0,num_queues=8,bus=pci.0,addr=0x3 \ +-usb \ +-drive file=/dev/HostVG/QEMUGuest1,file.pr-manager=pr-helper0,format=raw,\ +if=none,id=drive-scsi0-0-0-0,boot=on \ +-device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=0,\ +drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index be32d891e..153098e8c 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -2959,6 +2959,14 @@ mymain(void) QEMU_CAPS_HDA_DUPLEX); DO_TEST("user-aliases2", QEMU_CAPS_DEVICE_IOH3420, QEMU_CAPS_ICH9_AHCI); + DO_TEST("disk-virtio-scsi-reservations", + QEMU_CAPS_DRIVE_BOOT, QEMU_CAPS_VIRTIO_SCSI, + QEMU_CAPS_SCSI_BLOCK, QEMU_CAPS_PR_MANAGER_HELPER); + + DO_TEST("disk-virtio-scsi-reservations-not-managed", + QEMU_CAPS_DRIVE_BOOT, QEMU_CAPS_VIRTIO_SCSI, + QEMU_CAPS_SCSI_BLOCK, QEMU_CAPS_PR_MANAGER_HELPER); + if (getenv("LIBVIRT_SKIP_CLEANUP") == NULL) virFileDeleteTree(fakerootdir); -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:39 +0100, Michal Privoznik wrote:
This is the easier part. All we need to do here is put -object pr-manger-helper,id=$alias,path=$socketPath and then just reference the object in -drive file.pr-manger=$alias.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_command.c | 59 ++++++++++++++++++++++ .../disk-virtio-scsi-reservations-not-managed.args | 28 ++++++++++ .../disk-virtio-scsi-reservations.args | 29 +++++++++++ tests/qemuxml2argvtest.c | 8 +++ 4 files changed, 124 insertions(+) create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations-not-managed.args create mode 100644 tests/qemuxml2argvdata/disk-virtio-scsi-reservations.args
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index b8aede32d..13f2e4fd0 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1515,6 +1515,20 @@ qemuDiskSourceGetProps(virStorageSourcePtr src) }
+static void +qemuBuildDriveSourcePR(virBufferPtr buf, + virStorageSourcePtr src) +{ + qemuDomainStorageSourcePrivatePtr srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(src); + + if (!src->pr || + src->pr->enabled != VIR_TRISTATE_BOOL_YES)
This code should really just look whether the alias is allocated rather than duplicating the logic of when it's used.
+ return; + + virBufferAsprintf(buf, ",file.pr-manager=%s", srcPriv->prAlias);
According qemu.git/qapi/block-core.json, the pr-manager field is supported only for '@BlockdevOptionsFile'. This means that you need to check that it's used only for local paths which eventually translate to BlockdevOptionsFile. Otherwise qemu will refuse this with any network based disk. virStorageSourceIsLocalStorage is your friend.
+ static int qemuBuildDriveSourceStr(virDomainDiskDefPtr disk, virQEMUCapsPtr qemuCaps, @@ -1591,6 +1605,8 @@ qemuBuildDriveSourceStr(virDomainDiskDefPtr disk,
if (disk->src->debug) virBufferAsprintf(buf, ",file.debug=%d", disk->src->debugLevel); + + qemuBuildDriveSourcePR(buf, disk->src); } else { if (!(source = virQEMUBuildDriveCommandlineFromJSON(srcprops))) goto cleanup; @@ -10033,6 +10049,46 @@ qemuBuildPanicCommandLine(virCommandPtr cmd, }
+struct qemuBuildMasterPRCommandLineData { + virCommandPtr cmd; +};
Do you really need a single member struct for the void pointer?
+ + +static int +qemuBuildMasterPRCommandLineHelper(void *payload, + const void *name, + void *opaque) +{ + qemuDomainDiskPRObjectPtr obj = payload; + struct qemuBuildMasterPRCommandLineData *data = opaque; + virBuffer buf = VIR_BUFFER_INITIALIZER; + const char *alias = name; + + virBufferAsprintf(&buf, "pr-manager-helper,id=%s,path=%s", alias, obj->path); + virCommandAddArg(data->cmd, "-object"); + virCommandAddArgBuffer(data->cmd, &buf); + return 0; +} +

Just like we allow users overriding path to bridge-helper detected at compile time we can allow them to override path to qemu-pr-helper. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- m4/virt-driver-qemu.m4 | 5 +++++ src/qemu/libvirtd_qemu.aug | 1 + src/qemu/qemu.conf | 4 ++++ src/qemu/qemu_conf.c | 7 ++++++- src/qemu/qemu_conf.h | 1 + src/qemu/test_libvirtd_qemu.aug.in | 1 + 6 files changed, 18 insertions(+), 1 deletion(-) diff --git a/m4/virt-driver-qemu.m4 b/m4/virt-driver-qemu.m4 index b9bafdab9..0e0f261d6 100644 --- a/m4/virt-driver-qemu.m4 +++ b/m4/virt-driver-qemu.m4 @@ -57,6 +57,11 @@ AC_DEFUN([LIBVIRT_DRIVER_CHECK_QEMU], [ [/usr/libexec:/usr/lib/qemu:/usr/lib]) AC_DEFINE_UNQUOTED([QEMU_BRIDGE_HELPER], ["$QEMU_BRIDGE_HELPER"], [QEMU bridge helper]) + AC_PATH_PROG([QEMU_PR_HELPER], [qemu-pr-helper], + [/usr/libexec/qemu-pr-helper], + [/usr/libexec:/usr/lib/qemu:/usr/lib]) + AC_DEFINE_UNQUOTED([QEMU_PR_HELPER], ["$QEMU_PR_HELPER"], + [QEMU PR helper]) ]) AC_DEFUN([LIBVIRT_DRIVER_RESULT_QEMU], [ diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug index c19bf3a43..2dc16e91f 100644 --- a/src/qemu/libvirtd_qemu.aug +++ b/src/qemu/libvirtd_qemu.aug @@ -86,6 +86,7 @@ module Libvirtd_qemu = let process_entry = str_entry "hugetlbfs_mount" | bool_entry "clear_emulator_capabilities" | str_entry "bridge_helper" + | str_entry "pr_helper" | bool_entry "set_process_name" | int_entry "max_processes" | int_entry "max_files" diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf index 43dd561cc..4bc668406 100644 --- a/src/qemu/qemu.conf +++ b/src/qemu/qemu.conf @@ -775,3 +775,7 @@ # This directory is used for memoryBacking source if configured as file. # NOTE: big files will be stored here #memory_backing_dir = "/var/lib/libvirt/qemu/ram" + +# Path to the SCSI persistent reservations helper. This helper is +# used whenever <reservations/> are enabled for SCSI disks. +#pr_helper = "/usr/libexec/qemu-pr-helper" diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c index af503d31c..b842fb4ab 100644 --- a/src/qemu/qemu_conf.c +++ b/src/qemu/qemu_conf.c @@ -307,7 +307,8 @@ virQEMUDriverConfigPtr virQEMUDriverConfigNew(bool privileged) goto error; } - if (VIR_STRDUP(cfg->bridgeHelperName, QEMU_BRIDGE_HELPER) < 0) + if (VIR_STRDUP(cfg->bridgeHelperName, QEMU_BRIDGE_HELPER) < 0 || + VIR_STRDUP(cfg->prHelperName, QEMU_PR_HELPER) < 0) goto error; cfg->clearEmulatorCapabilities = true; @@ -392,6 +393,7 @@ static void virQEMUDriverConfigDispose(void *obj) } VIR_FREE(cfg->hugetlbfs); VIR_FREE(cfg->bridgeHelperName); + VIR_FREE(cfg->prHelperName); VIR_FREE(cfg->saveImageFormat); VIR_FREE(cfg->dumpImageFormat); @@ -759,6 +761,9 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg, if (virConfGetValueString(conf, "bridge_helper", &cfg->bridgeHelperName) < 0) goto cleanup; + if (virConfGetValueString(conf, "pr_helper", &cfg->prHelperName) < 0) + goto cleanup; + if (virConfGetValueBool(conf, "mac_filter", &cfg->macFilter) < 0) goto cleanup; diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h index a553e30e2..11d986a53 100644 --- a/src/qemu/qemu_conf.h +++ b/src/qemu/qemu_conf.h @@ -153,6 +153,7 @@ struct _virQEMUDriverConfig { size_t nhugetlbfs; char *bridgeHelperName; + char *prHelperName; bool macFilter; diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in index 688e5b9fd..c0efae47b 100644 --- a/src/qemu/test_libvirtd_qemu.aug.in +++ b/src/qemu/test_libvirtd_qemu.aug.in @@ -100,3 +100,4 @@ module Test_libvirtd_qemu = { "1" = "mount" } } { "memory_backing_dir" = "/var/lib/libvirt/qemu/ram" } +{ "pr_helper" = "/usr/libexec/qemu-pr-helper" } -- 2.13.6

Before we exec() qemu we have to spawn pr-helper processes for all managed reservations (well, technically there can only one). The only caveat there is that we should place the process into the same namespace and cgroup as qemu (so that it shares the same view of the system). But we can do that only after we've forked. That means calling the setup function between fork() and exec(). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 25ec464d3..02608c1f3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2507,6 +2507,151 @@ qemuProcessSetupEmulator(virDomainObjPtr vm) } +static int +qemuProcessSetupOnePRDaemonHook(void *opaque) +{ + virDomainObjPtr vm = opaque; + size_t i, nfds = 0; + int *fds = NULL; + int ret = -1; + + if (virProcessGetNamespaces(vm->pid, &nfds, &fds) < 0) + return ret; + + if (virProcessSetNamespaces(nfds, fds) < 0) + goto cleanup; + + ret = 0; + cleanup: + for (i = 0; i < nfds; i++) + VIR_FORCE_CLOSE(fds[i]); + VIR_FREE(fds); + return ret; +} + + +static int +qemuProcessSetupOnePRDaemon(void *payload, + const void *name, + void *opaque) +{ + qemuDomainDiskPRObjectPtr prObj = payload; + virDomainObjPtr vm = opaque; + qemuDomainObjPrivatePtr priv = vm->privateData; + virQEMUDriverPtr driver = priv->driver; + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + char *pidfile = NULL; + pid_t cpid = -1; + virCommandPtr cmd = NULL; + virTimeBackOffVar timebackoff; + const unsigned long long timeout = 500000; /* ms */ + int ret = -1; + + if (!prObj->managed) + return 0; + + if (!virFileIsExecutable(cfg->prHelperName)) { + virReportSystemError(errno, _("'%s' is not a suitable pr helper"), + cfg->prHelperName); + goto cleanup; + } + + if (!(pidfile = virPidFileBuildPath(cfg->stateDir, name))) + goto cleanup; + + if (unlink(pidfile) < 0 && + errno != ENOENT) { + virReportSystemError(errno, + _("Cannot remove stale PID file %s"), + pidfile); + goto cleanup; + } + + if (!(cmd = virCommandNewArgList(cfg->prHelperName, + "-k", prObj->path, + NULL))) + goto cleanup; + + virCommandDaemonize(cmd); + virCommandSetPidFile(cmd, pidfile); + virCommandSetPreExecHook(cmd, qemuProcessSetupOnePRDaemonHook, vm); + +#ifdef CAP_SYS_RAWIO + virCommandAllowCap(cmd, CAP_SYS_RAWIO); +#else + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Raw I/O is not supported on this platform")); + goto cleanup; +#endif + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + if (virPidFileReadPath(pidfile, &cpid) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("pr helper %s didn't show up"), (const char *) name); + goto cleanup; + } + + if (virTimeBackOffStart(&timebackoff, 1, timeout) < 0) + goto cleanup; + while (virTimeBackOffWait(&timebackoff)) { + if (virFileExists(prObj->path)) + break; + + if (virProcessKill(cpid, 0) == 0) + continue; + + virReportSystemError(errno, "%s", + _("pr helper died unexpectedly")); + goto cleanup; + } + + if (priv->cgroup && + virCgroupAddMachineTask(priv->cgroup, cpid) < 0) + goto cleanup; + + if (qemuSecurityDomainSetPathLabel(driver->securityManager, + vm->def, prObj->path, true) < 0) + goto cleanup; + + prObj->pid = cpid; + ret = 0; + cleanup: + if (ret < 0) { + virCommandAbort(cmd); + virProcessKillPainfully(cpid, true); + } + virCommandFree(cmd); + if (unlink(pidfile) < 0 && + errno != ENOENT && + !virGetLastError()) + virReportSystemError(errno, + _("Cannot remove stale PID file %s"), + pidfile); + VIR_FREE(pidfile); + virObjectUnref(cfg); + return ret; +} + + +static int +qemuProcessSetupPRDaemons(virDomainObjPtr vm) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + + if (priv->prHelpers && + virHashForEach(priv->prHelpers, + qemuProcessSetupOnePRDaemon, vm) < 0) + goto cleanup; + + ret = 0; + cleanup: + return ret; +} + + static int qemuProcessInitPasswords(virConnectPtr conn, virQEMUDriverPtr driver, @@ -5896,6 +6041,10 @@ qemuProcessLaunch(virConnectPtr conn, if (qemuProcessSetupEmulator(vm) < 0) goto cleanup; + VIR_DEBUG("Setting up PR daemons"); + if (qemuProcessSetupPRDaemons(vm) < 0) + goto cleanup; + VIR_DEBUG("Setting domain security labels"); if (qemuSecuritySetAllLabel(driver, vm, @@ -6475,6 +6624,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, VIR_FREE(vm->def->seclabels[i]->imagelabel); } + qemuDomainDiskPRObjectKillAll(priv); + qemuHostdevReAttachDomainDevices(driver, vm->def); def = vm->def; -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:41 +0100, Michal Privoznik wrote:
Before we exec() qemu we have to spawn pr-helper processes for all managed reservations (well, technically there can only one). The only caveat there is that we should place the process into the same namespace and cgroup as qemu (so that it shares the same view of the system). But we can do that only after we've forked. That means calling the setup function between fork() and exec().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 25ec464d3..02608c1f3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2507,6 +2507,151 @@ qemuProcessSetupEmulator(virDomainObjPtr vm) }
+static int +qemuProcessSetupOnePRDaemonHook(void *opaque) +{ + virDomainObjPtr vm = opaque; + size_t i, nfds = 0; + int *fds = NULL; + int ret = -1; + + if (virProcessGetNamespaces(vm->pid, &nfds, &fds) < 0)
If this detects '0' namespaces ...
+ return ret; + + if (virProcessSetNamespaces(nfds, fds) < 0)
... this call will be unhappy.
+ goto cleanup; + + ret = 0; + cleanup: + for (i = 0; i < nfds; i++) + VIR_FORCE_CLOSE(fds[i]); + VIR_FREE(fds); + return ret; +} + + +static int +qemuProcessSetupOnePRDaemon(void *payload, + const void *name, + void *opaque) +{ + qemuDomainDiskPRObjectPtr prObj = payload; + virDomainObjPtr vm = opaque; + qemuDomainObjPrivatePtr priv = vm->privateData; + virQEMUDriverPtr driver = priv->driver; + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + char *pidfile = NULL; + pid_t cpid = -1; + virCommandPtr cmd = NULL; + virTimeBackOffVar timebackoff; + const unsigned long long timeout = 500000; /* ms */ + int ret = -1; + + if (!prObj->managed) + return 0;
Again. It does not make much sense to me to have the hash table and everything if this is ever going to be called for the managed daemon.
+ + if (!virFileIsExecutable(cfg->prHelperName)) { + virReportSystemError(errno, _("'%s' is not a suitable pr helper"),
This error message does not really report what it's checking.
+ cfg->prHelperName); + goto cleanup; + } + + if (!(pidfile = virPidFileBuildPath(cfg->stateDir, name))) + goto cleanup; + + if (unlink(pidfile) < 0 && + errno != ENOENT) { + virReportSystemError(errno, + _("Cannot remove stale PID file %s"), + pidfile); + goto cleanup; + } + + if (!(cmd = virCommandNewArgList(cfg->prHelperName, + "-k", prObj->path, + NULL))) + goto cleanup; + + virCommandDaemonize(cmd); + virCommandSetPidFile(cmd, pidfile); + virCommandSetPreExecHook(cmd, qemuProcessSetupOnePRDaemonHook, vm); + +#ifdef CAP_SYS_RAWIO + virCommandAllowCap(cmd, CAP_SYS_RAWIO); +#else + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Raw I/O is not supported on this platform")); + goto cleanup;
This also looks like worth putting in the validation code for this so that we don't really get to here to find out.
+#endif + + if (virCommandRun(cmd, NULL) < 0) + goto cleanup; + + if (virPidFileReadPath(pidfile, &cpid) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("pr helper %s didn't show up"), (const char *) name); + goto cleanup; + } + + if (virTimeBackOffStart(&timebackoff, 1, timeout) < 0) + goto cleanup; + while (virTimeBackOffWait(&timebackoff)) { + if (virFileExists(prObj->path)) + break; + + if (virProcessKill(cpid, 0) == 0) + continue; + + virReportSystemError(errno, "%s", + _("pr helper died unexpectedly")); + goto cleanup; + }
Sooo, after the timeout you assume success? That does not seem like a reasonable idea.
+ + if (priv->cgroup && + virCgroupAddMachineTask(priv->cgroup, cpid) < 0) + goto cleanup; + + if (qemuSecurityDomainSetPathLabel(driver->securityManager, + vm->def, prObj->path, true) < 0) + goto cleanup; + + prObj->pid = cpid; + ret = 0; + cleanup: + if (ret < 0) { + virCommandAbort(cmd); + virProcessKillPainfully(cpid, true); + } + virCommandFree(cmd); + if (unlink(pidfile) < 0 && + errno != ENOENT && + !virGetLastError()) + virReportSystemError(errno, + _("Cannot remove stale PID file %s"), + pidfile); + VIR_FREE(pidfile); + virObjectUnref(cfg); + return ret; +}

On Mon, Feb 12, 2018 at 05:54:19PM +0100, Peter Krempa wrote:
On Thu, Jan 18, 2018 at 17:04:41 +0100, Michal Privoznik wrote:
Before we exec() qemu we have to spawn pr-helper processes for all managed reservations (well, technically there can only one). The only caveat there is that we should place the process into the same namespace and cgroup as qemu (so that it shares the same view of the system). But we can do that only after we've forked. That means calling the setup function between fork() and exec().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 25ec464d3..02608c1f3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2507,6 +2507,151 @@ qemuProcessSetupEmulator(virDomainObjPtr vm) }
+static int +qemuProcessSetupOnePRDaemonHook(void *opaque) +{ + virDomainObjPtr vm = opaque; + size_t i, nfds = 0; + int *fds = NULL; + int ret = -1; + + if (virProcessGetNamespaces(vm->pid, &nfds, &fds) < 0)
If this detects '0' namespaces ...
+ return ret; + + if (virProcessSetNamespaces(nfds, fds) < 0)
... this call will be unhappy.
Namespaces have been around since at least RHEL-5 vintage so I reckon we can assume non-zero number of namespaces. In fact perhaps we should just make virProcessGetNamespaces() return an explicit error in the unlikely case it finds zero namespaces. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Mon, Feb 12, 2018 at 17:10:23 +0000, Daniel Berrange wrote:
On Mon, Feb 12, 2018 at 05:54:19PM +0100, Peter Krempa wrote:
On Thu, Jan 18, 2018 at 17:04:41 +0100, Michal Privoznik wrote:
Before we exec() qemu we have to spawn pr-helper processes for all managed reservations (well, technically there can only one). The only caveat there is that we should place the process into the same namespace and cgroup as qemu (so that it shares the same view of the system). But we can do that only after we've forked. That means calling the setup function between fork() and exec().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 25ec464d3..02608c1f3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2507,6 +2507,151 @@ qemuProcessSetupEmulator(virDomainObjPtr vm) }
+static int +qemuProcessSetupOnePRDaemonHook(void *opaque) +{ + virDomainObjPtr vm = opaque; + size_t i, nfds = 0; + int *fds = NULL; + int ret = -1; + + if (virProcessGetNamespaces(vm->pid, &nfds, &fds) < 0)
If this detects '0' namespaces ...
+ return ret; + + if (virProcessSetNamespaces(nfds, fds) < 0)
... this call will be unhappy.
Namespaces have been around since at least RHEL-5 vintage so I reckon we can assume non-zero number of namespaces. In fact perhaps we should just make virProcessGetNamespaces() return an explicit error in the unlikely case it finds zero namespaces.
Fair enough, but it still does not look like the right usage semantics in this case.

We need to keep track of spawned processes so that we can kill them when qemu process dies. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 130 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 97996b053..4079165f9 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -131,6 +131,10 @@ static virClassPtr qemuDomainSaveCookieClass; static void qemuDomainLogContextDispose(void *obj); static void qemuDomainSaveCookieDispose(void *obj); +static void +qemuDomainDiskPRObjectHashFree(void *payload, + const void *name); + static int qemuDomainOnceInit(void) { @@ -1954,6 +1958,46 @@ qemuDomainObjPrivateXMLFormatAllowReboot(virBufferPtr buf, } +static int +qemuDomainObjPrivateXMLFormatOnePR(void *payload, + const void *name, + void *data) +{ + qemuDomainDiskPRObjectPtr prObj = payload; + virBufferPtr buf = data; + + virBufferAddLit(buf, "<manager>\n"); + virBufferAdjustIndent(buf, 2); + virBufferAsprintf(buf, "<alias>%s</alias>\n", (const char *) name); + virBufferAsprintf(buf, "<managed>%s</managed>\n", + virTristateBoolTypeToString(virTristateBoolFromBool(prObj->managed))); + virBufferEscapeString(buf, "<path>%s</path>\n", prObj->path); + virBufferAsprintf(buf, "<pid>%lld</pid>\n", (long long) prObj->pid); + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</manager>\n"); + + return 0; +} + + +static void +qemuDomainObjPrivateXMLFormatPRs(virBufferPtr buf, + virDomainObjPtr vm) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!priv->prHelpers) + return; + + virBufferAddLit(buf, "<prManagers>\n"); + virBufferAdjustIndent(buf, 2); + virHashForEach(priv->prHelpers, + qemuDomainObjPrivateXMLFormatOnePR, buf); + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</prManagers>\n"); +} + + static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, virDomainObjPtr vm) @@ -2081,6 +2125,8 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, if (qemuDomainObjPrivateXMLFormatBlockjobs(buf, vm) < 0) return -1; + qemuDomainObjPrivateXMLFormatPRs(buf, vm); + return 0; } @@ -2213,6 +2259,87 @@ qemuDomainObjPrivateXMLParseAllowReboot(xmlXPathContextPtr ctxt, } +static qemuDomainDiskPRObjectPtr +qemuDomainObjPrivateXMLParseOnePR(xmlXPathContextPtr ctxt, + char **alias) +{ + qemuDomainDiskPRObjectPtr ret; + char *managedStr = NULL; + int managed; + char *path = NULL; + long long cpid; + + if (VIR_ALLOC(ret) < 0) + return NULL; + + if (!(managedStr = virXPathString("string(./managed)", ctxt)) || + !(*alias = virXPathString("string(./alias)", ctxt)) || + !(path = virXPathString("string(./path)", ctxt)) || + virXPathLongLong("string(./pid)", ctxt, &cpid) < 0 || + (managed = virTristateBoolTypeFromString(managedStr)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Malformed status XML in <prManagers/>")); + goto error; + } + + VIR_FREE(managedStr); + ret->managed = managed == VIR_TRISTATE_BOOL_YES; + ret->path = path; + ret->pid = cpid; + path = NULL; + + return ret; + + error: + VIR_FREE(managedStr); + VIR_FREE(*alias); + VIR_FREE(path); + VIR_FREE(ret); + return NULL; +} + + +static int +qemuDomainObjPrivateXMLParsePRs(qemuDomainObjPrivatePtr priv, + xmlXPathContextPtr ctxt) +{ + xmlNodePtr saveNode = ctxt->node; + xmlNodePtr *nodeset = NULL; + virHashTablePtr prHelpers = NULL; + size_t i, n; + int ret = -1; + + if ((n = virXPathNodeSet("./prManagers/manager", ctxt, &nodeset)) > 0) { + if (!(prHelpers = virHashCreate(10, qemuDomainDiskPRObjectHashFree))) + goto cleanup; + + for (i = 0; i < n; i++) { + qemuDomainDiskPRObjectPtr tmp; + char *alias = NULL; + + ctxt->node = nodeset[i]; + if (!(tmp = qemuDomainObjPrivateXMLParseOnePR(ctxt, &alias))) + goto cleanup; + + if (virHashAddEntry(prHelpers, alias, tmp) < 0) { + qemuDomainDiskPRObjectHashFree(tmp, alias); + VIR_FREE(alias); + goto cleanup; + } + } + } + + priv->prHelpers = prHelpers; + prHelpers = NULL; + ret = 0; + cleanup: + virHashFree(prHelpers); + VIR_FREE(nodeset); + ctxt->node = saveNode; + return ret; +} + + static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, virDomainObjPtr vm, @@ -2433,6 +2560,9 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, if (qemuDomainObjPrivateXMLParseBlockjobs(priv, ctxt) < 0) goto error; + if (qemuDomainObjPrivateXMLParsePRs(priv, ctxt) < 0) + goto error; + return 0; error: -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:42 +0100, Michal Privoznik wrote:
We need to keep track of spawned processes so that we can kill them when qemu process dies.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 130 insertions(+)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 97996b053..4079165f9 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -131,6 +131,10 @@ static virClassPtr qemuDomainSaveCookieClass; static void qemuDomainLogContextDispose(void *obj); static void qemuDomainSaveCookieDispose(void *obj);
+static void +qemuDomainDiskPRObjectHashFree(void *payload, + const void *name); + static int qemuDomainOnceInit(void) { @@ -1954,6 +1958,46 @@ qemuDomainObjPrivateXMLFormatAllowReboot(virBufferPtr buf, }
+static int +qemuDomainObjPrivateXMLFormatOnePR(void *payload, + const void *name, + void *data) +{ + qemuDomainDiskPRObjectPtr prObj = payload; + virBufferPtr buf = data; + + virBufferAddLit(buf, "<manager>\n"); + virBufferAdjustIndent(buf, 2); + virBufferAsprintf(buf, "<alias>%s</alias>\n", (const char *) name); + virBufferAsprintf(buf, "<managed>%s</managed>\n", + virTristateBoolTypeToString(virTristateBoolFromBool(prObj->managed))); + virBufferEscapeString(buf, "<path>%s</path>\n", prObj->path); + virBufferAsprintf(buf, "<pid>%lld</pid>\n", (long long) prObj->pid);
You don't really need any of this data for the unmanaged ones and need the PID and path for the managed manager. So this will also need some tweaking.

For hotplug, we will need to know how many disks use given pr-manager so that we know if to plug it too (for the first usage) or unlpug it (for the last one). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 26 ++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++++ 2 files changed, 30 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4079165f9..c327dfc86 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11236,6 +11236,32 @@ qemuDomainDiskPRObjectKillAll(qemuDomainObjPrivatePtr priv) priv->prHelpers = NULL; } +size_t +qemuDomainGetPRUsageCount(const virDomainDef *def, + const char *prAlias) +{ + size_t used = 0; + size_t i; + + for (i = 0; i < def->ndisks; i++) { + const virDomainDiskDef *disk = def->disks[i]; + qemuDomainStorageSourcePrivatePtr srcPriv; + + if (!disk->src || + !disk->src->pr) + continue; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + + if (STRNEQ(srcPriv->prAlias, prAlias)) + continue; + + used++; + } + + return used; +} + static int qemuDomainPrepareDiskPR(qemuDomainObjPrivatePtr priv, diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 5fdb5b931..231583256 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -1011,4 +1011,8 @@ qemuDomainDiskPRObjectRegister(qemuDomainObjPrivatePtr priv, void qemuDomainDiskPRObjectKillAll(qemuDomainObjPrivatePtr priv); +size_t +qemuDomainGetPRUsageCount(const virDomainDef *def, + const char *prAlias); + #endif /* __QEMU_DOMAIN_H__ */ -- 2.13.6

Again, for hotplug we need to be able to spawn just one process. Not all of them. Expose the static function we already have for that. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 9 +++++++++ src/qemu/qemu_process.h | 4 ++++ 2 files changed, 13 insertions(+) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 02608c1f3..b4c4c64fa 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2635,6 +2635,15 @@ qemuProcessSetupOnePRDaemon(void *payload, } +int +qemuProcessSetupPRDaemon(virDomainObjPtr vm, + qemuDomainDiskPRObjectPtr prObj, + const char *prAlias) +{ + return qemuProcessSetupOnePRDaemon(prObj, prAlias, vm); +} + + static int qemuProcessSetupPRDaemons(virDomainObjPtr vm) { diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index cd9a72031..50946e16d 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -199,4 +199,8 @@ int qemuProcessRefreshDisks(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob asyncJob); +int qemuProcessSetupPRDaemon(virDomainObjPtr vm, + qemuDomainDiskPRObjectPtr prObj, + const char *prAlias); + #endif /* __QEMU_PROCESS_H__ */ -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:44 +0100, Michal Privoznik wrote:
Again, for hotplug we need to be able to spawn just one process. Not all of them. Expose the static function we already have for that.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 9 +++++++++ src/qemu/qemu_process.h | 4 ++++ 2 files changed, 13 insertions(+)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 02608c1f3..b4c4c64fa 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2635,6 +2635,15 @@ qemuProcessSetupOnePRDaemon(void *payload, }
+int +qemuProcessSetupPRDaemon(virDomainObjPtr vm, + qemuDomainDiskPRObjectPtr prObj, + const char *prAlias) +{ + return qemuProcessSetupOnePRDaemon(prObj, prAlias, vm);
Do we really need a wrapper that only modifies argument order?

Surprisingly, nothing special is happening here. If we are the first to use the managed helper then spawn it. If not, we're almost done. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 6b245bd6a..ded666633 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -350,6 +350,84 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver, } +static int +qemuBuildPRDefInfoProps(virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virJSONValuePtr *prmgrProps, + const char **prAlias) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + qemuDomainStorageSourcePrivatePtr srcPriv; + qemuDomainDiskPRObjectPtr tmp; + virJSONValuePtr props = NULL; + int ret = -1; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + + *prmgrProps = NULL; + + if (!priv->prHelpers || + !srcPriv->prAlias || + !(tmp = virHashLookup(priv->prHelpers, srcPriv->prAlias))) + return 0; + + if (qemuDomainGetPRUsageCount(vm->def, srcPriv->prAlias) != 0) { + /* We're not the first ones to use pr-manager with that + * alias. Return early and leave @prmgrProps NULL so + * that caller knows this. */ + return 0; + } + + if (virJSONValueObjectCreate(&props, + "s:path", tmp->path, + NULL) < 0) + goto cleanup; + + if (qemuProcessSetupPRDaemon(vm, tmp, srcPriv->prAlias) < 0) + goto cleanup; + + *prAlias = srcPriv->prAlias; + *prmgrProps = props; + props = NULL; + ret = 0; + cleanup: + virJSONValueFree(props); + return ret; +} + + +static void +qemuDestroyPRDefObject(virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + qemuDomainStorageSourcePrivatePtr srcPriv; + qemuDomainDiskPRObjectPtr tmp; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + + if (!priv->prHelpers || + !srcPriv->prAlias || + !(tmp = virHashLookup(priv->prHelpers, srcPriv->prAlias))) + return; + + if (qemuDomainGetPRUsageCount(vm->def, srcPriv->prAlias) > 1) { + /* The function might return one or even zero if the + * pr-manager is unused depending whether the disk was + * added to domain def already or not. Anyway, if the + * return value is greater than one we are certain that + * there's another disk using it so return early. */ + return; + } + + /* This also kills the pr-manager daemon. See + * qemuDomainDiskPRObjectHashFree. */ + virHashRemoveEntry(priv->prHelpers, srcPriv->prAlias); + + VIR_FREE(srcPriv->prAlias); +} + + /** * qemuDomainAttachDiskGeneric: * @@ -368,12 +446,15 @@ qemuDomainAttachDiskGeneric(virConnectPtr conn, char *devstr = NULL; char *drivestr = NULL; char *drivealias = NULL; + const char *prAlias = NULL; bool driveAdded = false; bool secobjAdded = false; bool encobjAdded = false; + bool prmgrAdded = false; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); virJSONValuePtr secobjProps = NULL; virJSONValuePtr encobjProps = NULL; + virJSONValuePtr prmgrProps = NULL; qemuDomainStorageSourcePrivatePtr srcPriv; qemuDomainSecretInfoPtr secinfo = NULL; qemuDomainSecretInfoPtr encinfo = NULL; @@ -406,6 +487,9 @@ qemuDomainAttachDiskGeneric(virConnectPtr conn, disk->info.alias) < 0) goto error; + if (qemuBuildPRDefInfoProps(vm, disk, &prmgrProps, &prAlias) < 0) + goto error; + if (!(drivestr = qemuBuildDriveStr(disk, false, priv->qemuCaps))) goto error; @@ -438,6 +522,15 @@ qemuDomainAttachDiskGeneric(virConnectPtr conn, encobjAdded = true; } + if (prmgrProps) { + rv = qemuMonitorAddObject(priv->mon, "pr-manager-helper", prAlias, + prmgrProps); + prmgrProps = NULL; /* qemuMonitorAddObject consumes */ + if (rv < 0) + goto exit_monitor; + prmgrAdded = true; + } + if (qemuMonitorAddDrive(priv->mon, drivestr) < 0) goto exit_monitor; driveAdded = true; @@ -458,6 +551,7 @@ qemuDomainAttachDiskGeneric(virConnectPtr conn, cleanup: virJSONValueFree(secobjProps); virJSONValueFree(encobjProps); + virJSONValueFree(prmgrProps); qemuDomainSecretDiskDestroy(disk); VIR_FREE(devstr); VIR_FREE(drivestr); @@ -475,6 +569,8 @@ qemuDomainAttachDiskGeneric(virConnectPtr conn, ignore_value(qemuMonitorDelObject(priv->mon, secinfo->s.aes.alias)); if (encobjAdded) ignore_value(qemuMonitorDelObject(priv->mon, encinfo->s.aes.alias)); + if (prmgrAdded) + ignore_value(qemuMonitorDelObject(priv->mon, prAlias)); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -2; virErrorRestore(&orig_err); @@ -484,6 +580,7 @@ qemuDomainAttachDiskGeneric(virConnectPtr conn, error: qemuDomainDelDiskSrcTLSObject(driver, vm, disk->src); ignore_value(qemuHotplugPrepareDiskAccess(driver, vm, disk, NULL, true)); + qemuDestroyPRDefObject(vm, disk); goto cleanup; } -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:45 +0100, Michal Privoznik wrote:
Surprisingly, nothing special is happening here. If we are the first to use the managed helper then spawn it. If not, we're almost done.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_hotplug.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+)
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 6b245bd6a..ded666633 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -350,6 +350,84 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver, }
+static int +qemuBuildPRDefInfoProps(virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virJSONValuePtr *prmgrProps, + const char **prAlias) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + qemuDomainStorageSourcePrivatePtr srcPriv; + qemuDomainDiskPRObjectPtr tmp; + virJSONValuePtr props = NULL; + int ret = -1;
This function used in conjucntion with the JSON->commandline formatter should be used instead of qemuBuildMasterPRCommandLineHelper so that we don't have multiple implementations which need to be kept in sync ...
+ + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + + *prmgrProps = NULL; + + if (!priv->prHelpers || + !srcPriv->prAlias || + !(tmp = virHashLookup(priv->prHelpers, srcPriv->prAlias))) + return 0; + + if (qemuDomainGetPRUsageCount(vm->def, srcPriv->prAlias) != 0) { + /* We're not the first ones to use pr-manager with that + * alias. Return early and leave @prmgrProps NULL so + * that caller knows this. */ + return 0; + }
Obviously you'll still need a wrapper for this logic ...
+ + if (virJSONValueObjectCreate(&props, + "s:path", tmp->path, + NULL) < 0)
... but we should not have two different implementations of this.
+ goto cleanup; + + if (qemuProcessSetupPRDaemon(vm, tmp, srcPriv->prAlias) < 0) + goto cleanup; + + *prAlias = srcPriv->prAlias; + *prmgrProps = props; + props = NULL; + ret = 0; + cleanup: + virJSONValueFree(props); + return ret; +} + + +static void +qemuDestroyPRDefObject(virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + qemuDomainStorageSourcePrivatePtr srcPriv; + qemuDomainDiskPRObjectPtr tmp; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + + if (!priv->prHelpers || + !srcPriv->prAlias || + !(tmp = virHashLookup(priv->prHelpers, srcPriv->prAlias))) + return; + + if (qemuDomainGetPRUsageCount(vm->def, srcPriv->prAlias) > 1) { + /* The function might return one or even zero if the + * pr-manager is unused depending whether the disk was + * added to domain def already or not. Anyway, if the + * return value is greater than one we are certain that + * there's another disk using it so return early. */
Shouldn't we use a different approach rather than comments like this?
+ return; + } + + /* This also kills the pr-manager daemon. See + * qemuDomainDiskPRObjectHashFree. */ + virHashRemoveEntry(priv->prHelpers, srcPriv->prAlias); + + VIR_FREE(srcPriv->prAlias); +} + + /** * qemuDomainAttachDiskGeneric: *

Just line in previous commit, if we are the last ones that are using the pr-manager delete it and also kill the pr-helper process. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 18 ++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++++ src/qemu/qemu_hotplug.c | 13 +++++++++++++ 3 files changed, 35 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index c327dfc86..d15c29f4a 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11263,6 +11263,24 @@ qemuDomainGetPRUsageCount(const virDomainDef *def, } +bool +qemuDomainDiskNeedRemovePR(virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + qemuDomainStorageSourcePrivatePtr srcPriv; + + if (!disk->src) + return false; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + if (!srcPriv || + !srcPriv->prAlias) + return false; + + return qemuDomainGetPRUsageCount(vm->def, srcPriv->prAlias) == 1; +} + + static int qemuDomainPrepareDiskPR(qemuDomainObjPrivatePtr priv, virDomainDiskDefPtr disk) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 231583256..6f33d3f36 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -1015,4 +1015,8 @@ size_t qemuDomainGetPRUsageCount(const virDomainDef *def, const char *prAlias); +bool +qemuDomainDiskNeedRemovePR(virDomainObjPtr vm, + virDomainDiskDefPtr disk); + #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index ded666633..e2064648b 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -3825,6 +3825,7 @@ qemuDomainRemoveDiskDevice(virQEMUDriverPtr driver, char *drivestr; char *objAlias = NULL; char *encAlias = NULL; + const char *prAlias = NULL; VIR_DEBUG("Removing disk %s from domain %p %s", disk->info.alias, vm, vm->def->name); @@ -3862,6 +3863,13 @@ qemuDomainRemoveDiskDevice(virQEMUDriverPtr driver, } } + if (qemuDomainDiskNeedRemovePR(vm, disk)) { + qemuDomainStorageSourcePrivatePtr srcPriv; + + srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src); + prAlias = srcPriv->prAlias; + } + qemuDomainObjEnterMonitor(driver, vm); qemuMonitorDriveDel(priv->mon, drivestr); @@ -3877,6 +3885,9 @@ qemuDomainRemoveDiskDevice(virQEMUDriverPtr driver, ignore_value(qemuMonitorDelObject(priv->mon, encAlias)); VIR_FREE(encAlias); + if (prAlias) + ignore_value(qemuMonitorDelObject(priv->mon, prAlias)); + if (disk->src->haveTLS) ignore_value(qemuMonitorDelObject(priv->mon, disk->src->tlsAlias)); @@ -3895,6 +3906,8 @@ qemuDomainRemoveDiskDevice(virQEMUDriverPtr driver, } } + qemuDestroyPRDefObject(vm, disk); + qemuDomainReleaseDeviceAddress(vm, &disk->info, src); if (qemuSecurityRestoreDiskLabel(driver, vm, disk) < 0) -- 2.13.6

On Thu, Jan 18, 2018 at 17:04:32 +0100, Michal Privoznik wrote:
QEMU added support for SCSI persistent reservations. The way QEMU implemented that makes it slightly harder for libvirt to adapt to - it's a small binary that needs to start before qemu so that it can connect to it. Basically, what libvirt does is:
1) start qemu-pr-helper (the small binary from above), 2) records its PID for tracking purposes 3) start qemu
and when shutting down a domain the process is reversed:
1) shut down qemu process, 2) kill qemu-pr-helper (PID was saved earlier).
Now, here also lies the last missing piece to he puzzle - what if qemu-pr-helper dies while qemu is running? After some discussion with Paolo it was agreed that qemu will emit an event for libvirt so that we can respawn the process. Anyway, that shouldn't be a show stopper for these patches (if it was the patch set might get too big).
Well, in general we should not allow these patches without that sub-feature so that we don't have a version of libvirt out there which will not handle that part.
Please test, review and comment.
As usual, you can find the patches on my github too:
https://github.com/zippy2/libvirt/tree/pr
Michal Privoznik (14): virstoragefile: Introduce virStoragePRDef qemuDomainDiskChangeSupported: Deny changing reservations qemu: Introduce pr-manager-helper capability qemu_domain: Introduce qemuDomainDiskPRObject qemu: Generate alias for pr-helper qemu: Store prAlias and prPath in status XML qemu: Generate cmd line at startup qemu: Introduce pr_helper to qemu.conf qemu: Start PR daemons on domain startup qemu: Track PR daemons PIDs in status XML qemu_domain: Introduce qemuDomainGetPRUsageCount qemu_process.c: Introduce qemuProcessSetupPRDaemon qemu_hotplug: Hotplug of reservations qemu_hotplug: Hotunplug of reservations
Ordering of the patches should be modified in such way, that the feature can be enabled only once all the supporting code is already in. E.g. hotplug should be fixed prior to enabling it, or command line generators should not allow formatting it prior to libvirtd being able to start the PR daemon.
participants (6)
-
Andrea Bolognani
-
Daniel P. Berrangé
-
John Ferlan
-
Michal Privoznik
-
Paolo Bonzini
-
Peter Krempa