[libvirt] [PATCH 00/15 v5] Unprivileged SG_IO support

Hi,

As a result of RFC [1], this series implements unprivileged SG_IO support.
Testing is not yet sufficient, but I'd like to get the review started
earlier; meanwhile I will continue with further testing.

v4 - v5 (5 new patches):
  * Set sysfs unpriv_sgio when attaching a disk
  * Restore sysfs unpriv_sgio when detaching a disk
  * Error out when attaching a disk if it is shared by other domain(s)
    and the disk config conflicts
  * Do not restore sysfs unpriv_sgio when detaching a disk if the disk
    is still being used by other domain(s)
  * Dump the original unpriv_sgio state in the status XML, so that it
    won't be lost after restarting or reloading libvirtd

v3 - v4:
  * Rebase on top
  * More testing

v2 - v3:
  * Change the XML attribute name to "cdbfilter"
  * Maintain an internal list of shared disks in the QEMU driver

Patches 1 ~ 4 introduce the internal list for shared disks.

Osier Yang (15):
  qemu: Introduce a list to maintain the shared disks between domains
  qemu: Init/Free the list with the driver's lifecycle
  qemu: Add/remove the shared disk entry during domain's lifecycle
  qemu: Add/Remove the entry of sharedDisks when live attaching/detaching
  docs: Add docs and rng schema for new XML cdbfilter
  conf: Parse and format the new XML tag cdbfilter
  util: Prepare helpers for unpriv_sgio setting
  qemu: Manage disk's cdbfilter in domain's lifecycle
  qemu: Do not restore the sysfs unpriv_sgio if the disk is being shared
  qemu: Error out when domain starting if the cdbfilter setting conflicts
  qemu: Set unpriv_sgio when attaching disk
  qemu: Restore unpriv_sgio when detaching disk
  qemu: Error out if the shared disk conf conflicts with others when attaching
  qemu: Do not restore unpriv_sgio if the disk is shared by other domain
  conf: Save disk's original unpriv_sgio state into status XML

 docs/formatdomain.html.in                          |   13 ++-
 docs/schemas/domaincommon.rng                      |   52 ++++--
 src/conf/domain_conf.c                             |  106 ++++++++++--
 src/conf/domain_conf.h                             |   13 ++
 src/libvirt_private.syms                           |    5 +
 src/qemu/qemu_conf.c                               |  170 ++++++++++++++++++++
 src/qemu/qemu_conf.h                               |   30 ++++
 src/qemu/qemu_driver.c                             |   78 +++++++++
 src/qemu/qemu_process.c                            |  141 ++++++++++++++++-
 src/qemu/qemu_process.h                            |    4 +
 src/util/util.c                                    |  145 +++++++++++++++++
 src/util/util.h                                    |    7 +
 ...ml2argv-disk-scsi-lun-passthrough-cdbfilter.xml |   32 ++++
 tests/qemuxml2xmltest.c                            |    1 +
 14 files changed, 761 insertions(+), 36 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-lun-passthrough-cdbfilter.xml

Regards,
Osier

A single entry of the list consists of the _disk path_ and a sub-list of
the names of the domains which are using the disk.

* src/qemu/qemu_conf.h (New structs qemuSharedDisk, qemuSharedDiskList;
  new types qemuSharedDiskPtr, qemuSharedDiskListPtr; declare the helpers
  qemuSharedDiskListFree, qemuSharedDiskListAdd, qemuSharedDiskListFind,
  qemuSharedDiskListDel)
* src/qemu/qemu_conf.c (Implement the helpers)
---
 src/qemu/qemu_conf.c |  170 ++++++++++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_conf.h |   30 +++++++
 2 files changed, 200 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 8d380a1..4fdfb8c 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -556,3 +556,173 @@ qemuDriverCloseCallbackRunAll(virQEMUDriverPtr driver,
     virHashForEach(driver->closeCallbacks, qemuDriverCloseCallbackRun, &data);
 }
+
+void
+qemuSharedDiskListFree(qemuSharedDiskListPtr list)
+{
+    int i;
+    int j;
+
+    for (i = 0; i < list->ndisks; i++) {
+        VIR_FREE(list->disks[i]->path);
+
+        for (j = 0; j < list->disks[i]->ndomains; j++)
+            VIR_FREE(list->disks[i]->domains[j]);
+        VIR_FREE(list->disks[i]->domains);
+        VIR_FREE(list->disks[i]);
+    }
+    VIR_FREE(list);
+}
+
+/* Return the matched entry on success, with @idx set to
+ * the index of the matched entry, or NULL on failure.
+ *
+ * If @domain_name is passed as NULL, it simply returns
+ * the entry which matches @disk_path.
+ */
+qemuSharedDiskPtr
+qemuSharedDiskListFind(qemuSharedDiskListPtr list,
+                       const char *disk_path,
+                       const char *domain_name,
+                       int *idx)
+{
+    int i;
+    int j;
+
+    for (i = 0; i < list->ndisks; i++) {
+        if (STREQ(disk_path, list->disks[i]->path)) {
+            if (!domain_name)
+                return list->disks[i];
+
+            for (j = 0; j < list->disks[i]->ndomains; j++) {
+                if (STREQ(domain_name, list->disks[i]->domains[j])) {
+                    *idx = i;
+                    return list->disks[i];
+                }
+            }
+        }
+    }
+
+    return NULL;
+}
+
+int
+qemuSharedDiskListAdd(qemuSharedDiskListPtr list,
+                      const char *disk_path,
+                      const char *domain_name)
+{
+    int i;
+    bool existed = false;
+
+    if (qemuSharedDiskListFind(list, disk_path, domain_name, NULL)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("disk '%s' of domain '%s' is already "
+                         "maintained"), disk_path, domain_name);
+        return -1;
+    }
+
+    for (i = 0; i < list->ndisks; i++) {
+        qemuSharedDiskPtr disk = list->disks[i];
+
+        /* Append the domain name to the existing entry */
+        if (STREQ(disk->path, disk_path)) {
+            if (VIR_REALLOC_N(disk->domains,
+                              disk->ndomains + 1) < 0) {
+                virReportOOMError();
+                return -1;
+            }
+
+            disk->domains[disk->ndomains++] = strdup(domain_name);
+            existed = true;
+            break;
+        }
+    }
+
+    /* No existing entry for @disk_path yet */
+    if (!existed) {
+        if (VIR_REALLOC_N(list->disks, list->ndisks + 1) < 0) {
+            virReportOOMError();
+            return -1;
+        }
+
+        qemuSharedDiskPtr disk = NULL;
+
+        if ((VIR_ALLOC(disk) < 0) ||
+            (VIR_ALLOC_N(disk->domains, 1) < 0)) {
+            virReportOOMError();
+            return -1;
+        }
+
+        disk->path = strdup(disk_path);
+        disk->ndomains = 1;
+        disk->domains[0] = strdup(domain_name);
+
+        list->disks[list->ndisks] = disk;
+        list->ndisks++;
+    }
+
+    return 0;
+}
+
+int
+qemuSharedDiskListDel(qemuSharedDiskListPtr list,
+                      const char *disk_path,
+                      const char *domain_name)
+{
+    qemuSharedDiskPtr disk = NULL;
+    int i;
+    int idx;
+
+    if (!qemuSharedDiskListFind(list, disk_path, domain_name, &idx)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("disk '%s' of domain '%s' is not "
+                         "maintained yet"), disk_path, domain_name);
+        return -1;
+    }
+
+    disk = list->disks[idx];
+
+    if (disk->ndomains == 1) {
+        /* Free the disk entry if there is only one domain using it */
+        if (idx != list->ndisks - 1)
+            memmove(&list->disks[idx],
+                    &list->disks[idx + 1],
+                    sizeof(*list->disks) * (list->ndisks - idx - 1));
+
+        if (VIR_REALLOC_N(list->disks, list->ndisks - 1) < 0) {
+            virReportOOMError();
+            return -1;
+        }
+
+        VIR_FREE(disk->domains[0]);
+        VIR_FREE(disk->domains);
+        VIR_FREE(disk->path);
+        VIR_FREE(disk);
+        list->ndisks--;
+    } else {
+        /* Otherwise just remove the domain name from the sub-list
+         * list->disk[i]->domains.
+         */
+        for (i = 0; i < disk->ndomains; i++) {
+            if (STREQ(disk->domains[i], domain_name)) {
+                char *name = disk->domains[i];
+
+                if (i != disk->ndomains - 1)
+                    memmove(&disk->domains[i],
+                            &disk->domains[i + 1],
+                            sizeof(*disk->domains) * (disk->ndomains - i - 1));
+
+                if (VIR_REALLOC_N(disk->domains, disk->ndomains - 1) < 0) {
+                    virReportOOMError();
+                    return -1;
+                }
+
+                disk->ndomains--;
+                VIR_FREE(name);
+                break;
+            }
+        }
+    }
+
+    return 0;
+}
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index d0d25ce..01768f7 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -53,6 +53,22 @@ typedef qemuDriverCloseDef *qemuDriverCloseDefPtr;
 typedef struct _virQEMUDriver virQEMUDriver;
 typedef virQEMUDriver *virQEMUDriverPtr;
+typedef struct _qemuSharedDisk qemuSharedDisk;
+typedef struct _qemuSharedDiskList qemuSharedDiskList;
+typedef qemuSharedDiskList *qemuSharedDiskListPtr;
+typedef qemuSharedDisk *qemuSharedDiskPtr;
+
+struct _qemuSharedDisk {
+    char *path;       /* Disk path */
+    char **domains;   /* List of domain names which share the disk */
+    int ndomains;
+};
+
+struct _qemuSharedDiskList {
+    int ndisks;
+    qemuSharedDiskPtr *disks;
+};
+
 /* Main driver state */
 struct _virQEMUDriver {
     virMutex lock;
@@ -147,6 +163,8 @@ struct _virQEMUDriver {
     /* The devices which are not in use by the host or any guest. */
     pciDeviceList *inactivePciHostdevs;
+    qemuSharedDiskListPtr sharedDisks;
+
     virBitmapPtr reservedRemotePorts;
     virSysinfoDefPtr hostsysinfo;
@@ -211,4 +229,16 @@ qemuDriverCloseCallback qemuDriverCloseCallbackGet(virQEMUDriverPtr driver,
 void qemuDriverCloseCallbackRunAll(virQEMUDriverPtr driver,
                                    virConnectPtr conn);
+void qemuSharedDiskListFree(qemuSharedDiskListPtr list);
+int qemuSharedDiskListAdd(qemuSharedDiskListPtr list,
+                          const char *disk_path,
+                          const char *domain_name);
+qemuSharedDiskPtr qemuSharedDiskListFind(qemuSharedDiskListPtr list,
+                                         const char *disk_path,
+                                         const char *domain_name,
+                                         int *idx);
+int qemuSharedDiskListDel(qemuSharedDiskListPtr list,
+                          const char *disk_path,
+                          const char *domain_name);
+
 #endif /* __QEMUD_CONF_H */
--
1.7.7.6

Lifecycle here only means startup and shutdown.
---
 src/qemu/qemu_driver.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index bb1ec05..3d7f52e 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -810,6 +810,11 @@ qemuStartup(bool privileged,
     if ((qemu_driver->inactivePciHostdevs = pciDeviceListNew()) == NULL)
         goto error;
+    if (VIR_ALLOC(qemu_driver->sharedDisks) < 0) {
+        virReportOOMError();
+        goto error;
+    }
+
     if (privileged) {
         if (chown(qemu_driver->libDir, qemu_driver->user, qemu_driver->group) < 0) {
             virReportSystemError(errno,
@@ -1064,6 +1069,7 @@ qemuShutdown(void) {
     pciDeviceListFree(qemu_driver->activePciHostdevs);
     pciDeviceListFree(qemu_driver->inactivePciHostdevs);
     usbDeviceListFree(qemu_driver->activeUsbHostdevs);
+    qemuSharedDiskListFree(qemu_driver->sharedDisks);
     virCapabilitiesFree(qemu_driver->caps);
     qemuCapsCacheFree(qemu_driver->capsCache);
--
1.7.7.6

Lifecycle here only means startup and shutdown.
---
 src/qemu/qemu_process.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index ab04599..a44cca1 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3708,6 +3708,13 @@ int qemuProcessStart(virConnectPtr conn,
     for (i = 0; i < vm->def->ndisks; i++) {
         if (vm->def->disks[i]->rawio == 1)
             virCommandAllowCap(cmd, CAP_SYS_RAWIO);
+
+        if (vm->def->disks[i]->shared &&
+            (qemuSharedDiskListAdd(driver->sharedDisks,
+                                   vm->def->disks[i]->src,
+                                   vm->def->name) < 0)) {
+            goto cleanup;
+        }
     }
     virCommandSetPreExecHook(cmd, qemuProcessHook, &hookData);
@@ -4104,6 +4111,19 @@ void qemuProcessStop(virQEMUDriverPtr driver,
                                           flags & VIR_QEMU_PROCESS_STOP_MIGRATED);
     virSecurityManagerReleaseLabel(driver->securityManager, vm->def);
+    /* Remove the shared disk entry from qemud_driver->sharedDisks */
+    for (i = 0; i < vm->def->ndisks; i++) {
+        if (vm->def->disks[i]->shared &&
+            (qemuSharedDiskListDel(driver->sharedDisks,
+                                   vm->def->disks[i]->src,
+                                   vm->def->name) < 0)) {
+            VIR_WARN("Unable to remove shared disk entry for "
+                     "disk = '%s', domain = '%s'",
+                     vm->def->disks[i]->src,
+                     vm->def->name);
+        }
+    }
+
     /* Clear out dynamically assigned labels */
     for (i = 0; i < vm->def->nseclabels; i++) {
         if (vm->def->seclabels[i]->type == VIR_DOMAIN_SECLABEL_DYNAMIC) {
--
1.7.7.6

This adds an entry to the list qemud_driver->sharedDisks when attaching
a shared disk, and removes the entry from the list when detaching.
---
 src/qemu/qemu_driver.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 3d7f52e..2c36935 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6041,6 +6041,18 @@ qemuDomainAttachDeviceDiskLive(virConnectPtr conn,
             VIR_WARN("Failed to teardown cgroup for disk path %s",
                      NULLSTR(disk->src));
     }
+
+    /* If the attached disk is shareable, add it to the list
+     * qemud_driver->sharedDisks if attachment succeeded
+     */
+    if (ret == 0 && disk->shared) {
+        if (qemuSharedDiskListAdd(driver->sharedDisks,
+                                  disk->src,
+                                  vm->def->name) < 0) {
+            VIR_DEBUG("Failed to add the disk '%s' of domain '%s' to "
+                      "sharedDisks list", disk->src, vm->def->name);
+        }
+    }
 end:
     if (cgroup)
         virCgroupFree(&cgroup);
@@ -6156,6 +6168,16 @@ qemuDomainDetachDeviceDiskLive(virQEMUDriverPtr driver,
                        virDomainDiskDeviceTypeToString(disk->type));
         break;
     }
+
+    if (ret == 0 && disk->shared) {
+        if (qemuSharedDiskListDel(driver->sharedDisks,
+                                  disk->src,
+                                  vm->def->name) < 0) {
+            VIR_DEBUG("Failed to remove the entry of disk '%s', domain "
+                      "'%s' from sharedDisks", disk->src, vm->def->name);
+        }
+    }
+
     return ret;
 }
--
1.7.7.6

Since "rawio" and "cdbfilter" are only valid for "lun", this groups them
together; and since both are intended to allow an unprivileged user to
use SG_IO commands, they must be mutually exclusive.
---
 docs/formatdomain.html.in     |   13 +++++++++-
 docs/schemas/domaincommon.rng |   52 ++++++++++++++++++++++++++++------------
 2 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 0574e68..cab6952 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1395,7 +1395,18 @@
         rawio='yes', rawio capability will be enabled for all disks in
         the domain (because, in the case of QEMU, this capability can
         only be set on a per-process basis). This attribute is only
-        valid when device is "lun".
+        valid when device is "lun". NB, <code>rawio</code> intends to
+        confine the capability per-device, however, the current QEMU
+        implementation gives the domain process a broader capability
+        than that (per-process basis, affecting all the domain's disks).
+        To confine the capability as much as possible for the QEMU driver
+        at this stage, <code>cdbfilter</code> is recommended.
+        The optional <code>cdbfilter</code> attribute
+        (<span class="since">since 1.0.1</span>) indicates whether the
+        kernel will filter unprivileged SG_IO for the disk; valid settings
+        are "yes" or "no" (defaults to "yes"). Note that it is mutually
+        exclusive with the <code>rawio</code> attribute; as with
+        <code>rawio</code>, <code>cdbfilter</code> is only valid for
+        device 'lun'.
         The optional <code>snapshot</code> attribute indicates the default
         behavior of the disk during disk snapshots: "internal"
         requires a file format such as qcow2 that can store both the
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 0e85739..4113521 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -957,24 +957,44 @@
     -->
   <define name="disk">
     <element name="disk">
-      <optional>
-        <attribute name="device">
-          <choice>
-            <value>floppy</value>
-            <value>disk</value>
-            <value>cdrom</value>
-            <value>lun</value>
-          </choice>
-        </attribute>
-      </optional>
-      <optional>
-        <attribute name="rawio">
+      <choice>
+        <group>
+          <optional>
+            <attribute name="device">
+              <choice>
+                <value>floppy</value>
+                <value>disk</value>
+                <value>cdrom</value>
+              </choice>
+            </attribute>
+          </optional>
+        </group>
+        <group>
+          <optional>
+            <attribute name="device">
+              <value>lun</value>
+            </attribute>
+          </optional>
           <choice>
-            <value>yes</value>
-            <value>no</value>
+            <optional>
+              <attribute name="rawio">
+                <choice>
+                  <value>yes</value>
+                  <value>no</value>
+                </choice>
+              </attribute>
+            </optional>
+            <optional>
+              <attribute name="cdbfilter">
+                <choice>
+                  <value>yes</value>
+                  <value>no</value>
+                </choice>
+              </attribute>
+            </optional>
           </choice>
-        </attribute>
-      </optional>
+        </group>
+      </choice>
       <optional>
        <ref name="snapshot"/>
      </optional>
--
1.7.7.6

Setting "cdbfilter" to "no" enables unprivileged SG_IO, i.e. it disables
the kernel's CDB filtering; "yes" enables the filtering, which is the
kernel's default behaviour. A later patch will do the actual setting.
---
 src/conf/domain_conf.c                             |   58 +++++++++++++----
 src/conf/domain_conf.h                             |   10 ++++
 ...ml2argv-disk-scsi-lun-passthrough-cdbfilter.xml |   32 +++++++++++
 tests/qemuxml2xmltest.c                            |    1 +
 4 files changed, 88 insertions(+), 13 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-lun-passthrough-cdbfilter.xml

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 4ffb39d..50d853e 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -242,6 +242,10 @@ VIR_ENUM_IMPL(virDomainDiskIo, VIR_DOMAIN_DISK_IO_LAST,
               "default",
               "native",
               "threads")
+VIR_ENUM_IMPL(virDomainDiskCDBFilter, VIR_DOMAIN_DISK_CDB_FILTER_LAST,
+              "default",
+              "yes",
+              "no")
 VIR_ENUM_IMPL(virDomainIoEventFd, VIR_DOMAIN_IO_EVENT_FD_LAST,
               "default",
               "on",
@@ -3526,6 +3530,7 @@ virDomainDiskDefParseXML(virCapsPtr caps,
     char *device = NULL;
     char *snapshot = NULL;
     char *rawio = NULL;
+    char *cdbfilter = NULL;
     char *driverName = NULL;
     char *driverType = NULL;
     char *source = NULL;
@@ -3588,6 +3593,7 @@ virDomainDiskDefParseXML(virCapsPtr caps,
     snapshot = virXMLPropString(node, "snapshot");
     rawio = virXMLPropString(node, "rawio");
+    cdbfilter = virXMLPropString(node, "cdbfilter");
     cur = node->children;
     while (cur != NULL) {
@@ -4008,24 +4014,46 @@ virDomainDiskDefParseXML(virCapsPtr caps,
             def->snapshot = VIR_DOMAIN_SNAPSHOT_LOCATION_NONE;
     }
+    if (rawio && cdbfilter) {
+        virReportError(VIR_ERR_XML_ERROR, "%s",
+                       _("rawio and cdbfilter are exclusive"));
+        goto error;
+    }
+
+    if ((rawio || cdbfilter) &&
+        (def->device != VIR_DOMAIN_DISK_DEVICE_LUN)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("rawio or cdbfilter can be used only with "
+                         "device='lun'"));
+        goto error;
+    }
+
     if (rawio) {
         def->rawio_specified = true;
-        if (def->device == VIR_DOMAIN_DISK_DEVICE_LUN) {
-            if (STREQ(rawio, "yes")) {
-                def->rawio = 1;
-            } else if (STREQ(rawio, "no")) {
-                def->rawio = 0;
-            } else {
-                virReportError(VIR_ERR_XML_ERROR,
-                               _("unknown disk rawio setting '%s'"),
-                               rawio);
-                goto error;
-            }
+        if (STREQ(rawio, "yes")) {
+            def->rawio = 1;
+        } else if (STREQ(rawio, "no")) {
+            def->rawio = 0;
         } else {
-            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                           _("rawio can be used only with device='lun'"));
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("unknown disk rawio setting '%s'"),
+                           rawio);
+            goto error;
+        }
+    }
+
+    if (cdbfilter) {
+        int cdbfilter_val = 0;
+
+        if ((cdbfilter_val =
+             virDomainDiskCDBFilterTypeFromString(cdbfilter)) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("unknown disk cdbfilter setting '%s'"),
+                           cdbfilter);
             goto error;
         }
+
+        def->cdbfilter = cdbfilter_val;
     }
     if (bus) {
@@ -4262,6 +4290,7 @@ cleanup:
     VIR_FREE(type);
     VIR_FREE(snapshot);
     VIR_FREE(rawio);
+    VIR_FREE(cdbfilter);
     VIR_FREE(target);
     VIR_FREE(source);
     VIR_FREE(tray);
@@ -11940,6 +11969,9 @@ virDomainDiskDefFormat(virBufferPtr buf,
             virBufferAddLit(buf, " rawio='no'");
         }
     }
+    if (def->cdbfilter)
+        virBufferAsprintf(buf, " cdbfilter='%s'",
+                          virDomainDiskCDBFilterTypeToString(def->cdbfilter));
     if (def->snapshot &&
         !(def->snapshot == VIR_DOMAIN_SNAPSHOT_LOCATION_NONE && def->readonly))
         virBufferAsprintf(buf, " snapshot='%s'",
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 4ab15e9..6e65cfd 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -507,6 +507,14 @@ enum virDomainDiskIo {
     VIR_DOMAIN_DISK_IO_LAST
 };
+enum virDomainDiskCDBFilter {
+    VIR_DOMAIN_DISK_CDB_FILTER_DEFAULT = 0,
+    VIR_DOMAIN_DISK_CDB_FILTER_YES, /* maps to 0 of sysfs unpriv_sgio */
+    VIR_DOMAIN_DISK_CDB_FILTER_NO,  /* maps to 1 of sysfs unpriv_sgio */
+
+    VIR_DOMAIN_DISK_CDB_FILTER_LAST
+};
+
 enum virDomainIoEventFd {
     VIR_DOMAIN_IO_EVENT_FD_DEFAULT = 0,
     VIR_DOMAIN_IO_EVENT_FD_ON,
@@ -618,6 +626,7 @@ struct _virDomainDiskDef {
     virStorageEncryptionPtr encryption;
     bool rawio_specified;
     int rawio; /* no = 0, yes = 1 */
+    int cdbfilter;
     size_t nseclabels;
     virSecurityDeviceLabelDefPtr *seclabels;
@@ -2209,6 +2218,7 @@ VIR_ENUM_DECL(virDomainDiskErrorPolicy)
 VIR_ENUM_DECL(virDomainDiskProtocol)
 VIR_ENUM_DECL(virDomainDiskProtocolTransport)
 VIR_ENUM_DECL(virDomainDiskIo)
+VIR_ENUM_DECL(virDomainDiskCDBFilter)
 VIR_ENUM_DECL(virDomainDiskSecretType)
 VIR_ENUM_DECL(virDomainDiskTray)
 VIR_ENUM_DECL(virDomainIoEventFd)
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-lun-passthrough-cdbfilter.xml b/tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-lun-passthrough-cdbfilter.xml
new file mode 100644
index 0000000..731c9bd
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-disk-scsi-lun-passthrough-cdbfilter.xml
@@ -0,0 +1,32 @@
+<domain type='qemu'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219136</memory>
+  <currentMemory unit='KiB'>219136</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='i686' machine='pc'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu</emulator>
+    <disk type='block' device='lun' cdbfilter='yes'>
+      <source dev='/dev/HostVG/QEMUGuest1'/>
+      <target dev='hda' bus='scsi'/>
+      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
+    </disk>
+    <disk type='block' device='lun' cdbfilter='no'>
+      <source dev='/dev/HostVG/QEMUGuest2'/>
+      <target dev='hda' bus='scsi'/>
+      <address type='drive' controller='0' bus='0' target='1' unit='1'/>
+    </disk>
+    <controller type='scsi' index='0' model='virtio-scsi'/>
+    <controller type='scsi' index='1' model='lsilogic'/>
+    <controller type='usb' index='0'/>
+    <memballoon model='virtio'/>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
index 3d8176c..af97982 100644
--- a/tests/qemuxml2xmltest.c
+++ b/tests/qemuxml2xmltest.c
@@ -238,6 +238,7 @@ mymain(void)
     DO_TEST("seclabel-static");
     DO_TEST("seclabel-none");
     DO_TEST("numad-static-vcpu-no-numatune");
+    DO_TEST("disk-scsi-lun-passthrough-cdbfilter");
     /* These tests generate different XML */
     DO_TEST_DIFFERENT("balloon-device-auto");
--
1.7.7.6

virGetDevice{Major,Minor} could be used elsewhere across the sources,
but that doesn't relate to this series and can be done later.

* src/util/util.h: (Declare virGetDevice{Major,Minor} and
  vir{Get,Set}DeviceUnprivSGIO)
* src/util/util.c: (Implement virGetDevice{Major,Minor} and
  vir{Get,Set}DeviceUnprivSGIO)
* src/libvirt_private.syms: Export private symbols of the above helpers
---
 src/libvirt_private.syms |    4 +
 src/util/util.c          |  145 ++++++++++++++++++++++++++++++++++++++++++++++
 src/util/util.h          |    7 ++
 3 files changed, 156 insertions(+), 0 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index df54f1e..78e4a5c 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1271,6 +1271,9 @@ virFileWaitForDevices;
 virFileWriteStr;
 virFindFileInPath;
 virFormatIntDecimal;
+virGetDeviceMajor;
+virGetDeviceMinor;
+virGetDeviceUnprivSGIO;
 virGetGroupID;
 virGetGroupName;
 virGetHostname;
@@ -1289,6 +1292,7 @@ virPipeReadUntilEOF;
 virScaleInteger;
 virSetBlocking;
 virSetCloseExec;
+virSetDeviceUnprivSGIO;
 virSetInherit;
 virSetNonBlock;
 virSetUIDGID;
diff --git a/src/util/util.c b/src/util/util.c
index 2fd0f2c..4dab6ac 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -3113,3 +3113,148 @@ virValidateWWN(const char *wwn) {
     return true;
 }
+
+#if defined(major) && defined(minor)
+int
+virGetDeviceMajor(const char *path)
+{
+    struct stat sb;
+
+    if (stat(path, &sb) < 0)
+        return -errno;
+
+    if (!S_ISBLK(sb.st_mode))
+        return -EINVAL;
+
+    return major(sb.st_rdev);
+}
+
+int
+virGetDeviceMinor(const char *path)
+{
+    struct stat sb;
+
+    if (stat(path, &sb) < 0)
+        return -errno;
+
+    if (!S_ISBLK(sb.st_mode))
+        return -EINVAL;
+
+    return minor(sb.st_rdev);
+}
+#else
+int
+virGetDeviceMajor(const char *path)
+{
+    return -ENOSYS;
+}
+
+int
+virGetDeviceMinor(const char *path)
+{
+    return -ENOSYS;
+}
+#endif
+
+static char *
+virGetUnprivSGIOSysfsPath(const char *path)
+{
+    int major, minor;
+    char *sysfs_path = NULL;
+
+    if ((major = virGetDeviceMajor(path)) < 0) {
+        virReportSystemError(-major,
+                             _("Unable to get major number of device '%s'"),
+                             path);
+        return NULL;
+    }
+
+    if ((minor = virGetDeviceMinor(path)) < 0) {
+        virReportSystemError(-minor,
+                             _("Unable to get minor number of device '%s'"),
+                             path);
+        return NULL;
+    }
+
+    if (virAsprintf(&sysfs_path, "/sys/dev/block/%d:%d/queue/unpriv_sgio",
+                    major, minor) < 0) {
+        virReportOOMError();
+        return NULL;
+    }
+
+    return sysfs_path;
+}
+
+int
+virSetDeviceUnprivSGIO(const char *path,
+                       int unpriv_sgio)
+{
+    char *sysfs_path = NULL;
+    char *val = NULL;
+    int ret = -1;
+    int rc = -1;
+
+    if (!(sysfs_path = virGetUnprivSGIOSysfsPath(path)))
+        return -1;
+
+    if (!virFileExists(sysfs_path)) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("unpriv_sgio is not supported by this kernel"));
+        goto cleanup;
+    }
+
+    if (virAsprintf(&val, "%d", unpriv_sgio) < 0) {
+        virReportOOMError();
+        goto cleanup;
+    }
+
+    if ((rc = virFileWriteStr(sysfs_path, val, 0)) < 0) {
+        virReportSystemError(-rc, _("failed to set %s"), sysfs_path);
+        goto cleanup;
+    }
+
+    ret = 0;
+cleanup:
+    VIR_FREE(sysfs_path);
+    VIR_FREE(val);
+    return ret;
+}
+
+int
+virGetDeviceUnprivSGIO(const char *path,
+                       int *unpriv_sgio)
+{
+    char *sysfs_path = NULL;
+    char *buf = NULL;
+    char *tmp = NULL;
+    int ret = -1;
+    int rc = -1;
+
+    if (!(sysfs_path = virGetUnprivSGIOSysfsPath(path)))
+        return -1;
+
+    if (!virFileExists(sysfs_path)) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("unpriv_sgio is not supported by this kernel"));
+        goto cleanup;
+    }
+
+    if (virFileReadAll(sysfs_path, 1024, &buf) < 0)
+        goto cleanup;
+
+    if ((tmp = strchr(buf, '\n')))
+        *tmp = '\0';
+
+    if ((rc = virStrToLong_i(buf, NULL, 10,
+                             unpriv_sgio)) < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("failed to parse value of %s"), sysfs_path);
+        goto cleanup;
+    }
+
+    ret = 0;
+cleanup:
+    VIR_FREE(sysfs_path);
+    VIR_FREE(buf);
+    return ret;
+}
diff --git a/src/util/util.h b/src/util/util.h
index 4316ab1..41146c1 100644
--- a/src/util/util.h
+++ b/src/util/util.h
@@ -280,4 +280,11 @@ bool virIsDevMapperDevice(const char *dev_name) ATTRIBUTE_NONNULL(1);
 bool virValidateWWN(const char *wwn);
+int virGetDeviceMajor(const char *path);
+int virGetDeviceMinor(const char *path);
+int virSetDeviceUnprivSGIO(const char *path,
+                           int unpriv_sgio);
+int virGetDeviceUnprivSGIO(const char *path,
+                           int *unpriv_sgio);
+
 #endif /* __VIR_UTIL_H__ */
--
1.7.7.6

Here lifecycle only means startup and shutdown. To record the original
"unpriv_sgio" value, this introduces "old_cdbfilter" for the disk def.
When the domain is starting, the disk's "unpriv_sgio" is set according
to the config in the domain XML; and when the domain is destroyed, it
is restored to the original value ("old_cdbfilter").
---
 src/conf/domain_conf.h  |    1 +
 src/qemu/qemu_process.c |   53 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 6e65cfd..d7c9b6b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -627,6 +627,7 @@ struct _virDomainDiskDef {
     bool rawio_specified;
     int rawio; /* no = 0, yes = 1 */
     int cdbfilter;
+    int old_cdbfilter; /* Record the original state, internal only */
     size_t nseclabels;
     virSecurityDeviceLabelDefPtr *seclabels;
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index a44cca1..3923272 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3704,17 +3704,39 @@ int qemuProcessStart(virConnectPtr conn,
     if (driver->clearEmulatorCapabilities)
         virCommandClearCaps(cmd);
-    /* in case a certain disk is desirous of CAP_SYS_RAWIO, add this */
     for (i = 0; i < vm->def->ndisks; i++) {
-        if (vm->def->disks[i]->rawio == 1)
+        virDomainDiskDefPtr disk = vm->def->disks[i];
+        int val;
+
+        /* Add CAP_SYS_RAWIO if the disk is desirous of it */
+        if (disk->rawio == 1)
             virCommandAllowCap(cmd, CAP_SYS_RAWIO);
-        if (vm->def->disks[i]->shared &&
+        /* Add to qemud_driver->sharedDisks list if the disk is shared */
+        if (disk->shared &&
             (qemuSharedDiskListAdd(driver->sharedDisks,
-                                   vm->def->disks[i]->src,
+                                   disk->src,
                                    vm->def->name) < 0)) {
             goto cleanup;
         }
+
+        if (!disk->cdbfilter)
+            continue;
+
+        /* Set sysfs unpriv_sgio if cdbfilter is specified */
+        if (virGetDeviceUnprivSGIO(disk->src, &val) < 0)
+            goto cleanup;
+
+        if (val == 0)
+            disk->old_cdbfilter = VIR_DOMAIN_DISK_CDB_FILTER_YES;
+        else
+            disk->old_cdbfilter = VIR_DOMAIN_DISK_CDB_FILTER_NO;
+
+        if (virSetDeviceUnprivSGIO(disk->src,
+                                   (disk->cdbfilter ==
+                                    VIR_DOMAIN_DISK_CDB_FILTER_NO)
+                                   ? 1 : 0) < 0)
+            goto cleanup;
     }
     virCommandSetPreExecHook(cmd, qemuProcessHook, &hookData);
@@ -4111,17 +4133,32 @@ void qemuProcessStop(virQEMUDriverPtr driver,
                                           flags & VIR_QEMU_PROCESS_STOP_MIGRATED);
     virSecurityManagerReleaseLabel(driver->securityManager, vm->def);
-    /* Remove the shared disk entry from qemud_driver->sharedDisks */
     for (i = 0; i < vm->def->ndisks; i++) {
-        if (vm->def->disks[i]->shared &&
+        virDomainDiskDefPtr disk = vm->def->disks[i];
+        int val;
+
+        /* Remove the shared disk entry from qemud_driver->sharedDisks */
+        if (disk->shared &&
             (qemuSharedDiskListDel(driver->sharedDisks,
-                                   vm->def->disks[i]->src,
+                                   disk->src,
                                    vm->def->name) < 0)) {
             VIR_WARN("Unable to remove shared disk entry for "
                      "disk = '%s', domain = '%s'",
-                     vm->def->disks[i]->src,
+                     disk->src,
                      vm->def->name);
         }
+
+        if (!disk->cdbfilter)
+            continue;
+
+        /* Restore sysfs unpriv_sgio for the disk */
+        if (disk->old_cdbfilter == VIR_DOMAIN_DISK_CDB_FILTER_YES)
+            val = 0;
+        else
+            val = 1;
+
+        if (virSetDeviceUnprivSGIO(disk->src, val) < 0)
+            VIR_WARN("Unable to restore unpriv_sgio for disk '%s'", disk->src);
     }
     /* Clear out dynamically assigned labels */
--
1.7.7.6

This prevents restoring the unpriv_sgio if the disk is shared and is
still being used by another active domain, since we don't want to risk
corrupting the data of the domains still using it.
---
 src/qemu/qemu_process.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 3923272..05f8622 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4151,6 +4151,17 @@ void qemuProcessStop(virQEMUDriverPtr driver,
         if (!disk->cdbfilter)
             continue;
+        /* Don't try to restore the unpriv_sgio if the disk is shared
+         * by other active domain(s), to avoid corrupting the data of
+         * the domains still using it.
+         */
+        if (disk->shared &&
+            qemuSharedDiskListFind(driver->sharedDisks,
+                                   disk->src,
+                                   NULL,
+                                   NULL))
+            continue;
+
         /* Restore sysfs unpriv_sgio for the disk */
         if (disk->old_cdbfilter == VIR_DOMAIN_DISK_CDB_FILTER_YES)
             val = 0;
--
1.7.7.6

This prevents a domain from starting if a shared disk's setting conflicts with that of other active domain(s), e.g. one domain has "cdbfilter" set to "yes" while another active domain using the disk has it set to "no".

* src/conf/domain_conf.h: (Declare helper virDomainDiskFindByPath)
* src/conf/domain_conf.c: (Implement virDomainDiskFindByPath)
* src/libvirt_private.syms: (Export virDomainDiskFindByPath)
* src/qemu/qemu_process.c: (Error out if the shared disk's cdbfilter conflicts with others)
---
 src/conf/domain_conf.c   |   13 ++++++++++
 src/conf/domain_conf.h   |    2 +
 src/libvirt_private.syms |    1 +
 src/qemu/qemu_process.c  |   56 +++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 66 insertions(+), 6 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 50d853e..1d6bb1f 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -3062,6 +3062,19 @@ virDomainDiskFindControllerModel(virDomainDefPtr def,
     return model;
 }

+virDomainDiskDefPtr
+virDomainDiskFindByPath(virDomainDefPtr def,
+                        const char *path)
+{
+    int i;
+
+    for (i = 0; i < def->ndisks; i++)
+        if (STREQ_NULLABLE(def->disks[i]->src, path))
+            return def->disks[i];
+
+    return NULL;
+}
+
 int
 virDomainDiskDefAssignAddress(virCapsPtr caps, virDomainDiskDefPtr def)
 {
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index d7c9b6b..2f24a3f 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1897,6 +1897,8 @@ void virDomainDiskHostDefFree(virDomainDiskHostDefPtr def);
 int virDomainDiskFindControllerModel(virDomainDefPtr def,
                                      virDomainDiskDefPtr disk,
                                      int controllerType);
+virDomainDiskDefPtr virDomainDiskFindByPath(virDomainDefPtr def,
+                                            const char *path);
 void virDomainControllerDefFree(virDomainControllerDefPtr def);
 void virDomainFSDefFree(virDomainFSDefPtr def);
 void virDomainActualNetDefFree(virDomainActualNetDefPtr def);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 78e4a5c..b220a76 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -352,6 +352,7 @@ virDomainDiskDefFree;
 virDomainDiskDeviceTypeToString;
 virDomainDiskErrorPolicyTypeFromString;
 virDomainDiskErrorPolicyTypeToString;
+virDomainDiskFindByPath;
 virDomainDiskFindControllerModel;
 virDomainDiskGeometryTransTypeFromString;
 virDomainDiskGeometryTransTypeToString;
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 05f8622..2938a65 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3712,12 +3712,56 @@ int qemuProcessStart(virConnectPtr conn,
         if (disk->rawio == 1)
             virCommandAllowCap(cmd, CAP_SYS_RAWIO);

-        /* Add to qemud_driver->sharedDisks list if the disk is shared */
-        if (disk->shared &&
-            (qemuSharedDiskListAdd(driver->sharedDisks,
-                                   disk->src,
-                                   vm->def->name) < 0)) {
-            goto cleanup;
+        if (disk->shared) {
+            /* Error out if the cdbfilter setting is different with what
+             * other domain(s) uses.
+             */
+            qemuSharedDiskPtr entry = NULL;
+
+            if ((entry = qemuSharedDiskListFind(driver->sharedDisks,
+                                                disk->src,
+                                                NULL,
+                                                NULL))) {
+                virDomainObjUnlock(vm);
+                for (i = 0; i < entry->ndomains; i++) {
+                    virDomainObjPtr domobj = NULL;
+                    virDomainDiskDefPtr diskdef = NULL;
+
+                    if (!(domobj = virDomainFindByName(&driver->domains,
+                                                       entry->domains[i]))) {
+                        virDomainObjLock(vm);
+                        goto cleanup;
+                    }
+
+                    if (!(diskdef = virDomainDiskFindByPath(domobj->def,
+                                                            disk->src))) {
+                        virDomainObjUnlock(domobj);
+                        virDomainObjLock(vm);
+                        goto cleanup;
+                    }
+
+                    /* XXX: Can be abstracted into a function when there
+                     * are more stuffs to check in future.
+                     */
+                    if (diskdef->cdbfilter != disk->cdbfilter) {
+                        virReportError(VIR_ERR_INTERNAL_ERROR,
+                                       _("cdbfilter of shared disk '%s' "
+                                         "conflicts with other active "
+                                         "domains"), disk->src);
+                        virDomainObjUnlock(domobj);
+                        virDomainObjLock(vm);
+                        goto cleanup;
+                    }
+                    virDomainObjUnlock(domobj);
+                }
+                virDomainObjLock(vm);
+            }
+
+            /* Add to qemud_driver->sharedDisks list if the disk is shared */
+            if (qemuSharedDiskListAdd(driver->sharedDisks,
+                                      disk->src,
+                                      vm->def->name) < 0)
+                goto cleanup;
         }

         if (!disk->cdbfilter)
-- 
1.7.7.6

On Wed, Dec 05, 2012 at 17:25:11 +0800, Osier Yang wrote:
This prevents a domain from starting if a shared disk's setting conflicts with that of other active domain(s), e.g. one domain has "cdbfilter" set to "yes" while another active domain using the disk has it set to "no".
...
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 05f8622..2938a65 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3712,12 +3712,56 @@ int qemuProcessStart(virConnectPtr conn, if (disk->rawio == 1) virCommandAllowCap(cmd, CAP_SYS_RAWIO);
- /* Add to qemud_driver->sharedDisks list if the disk is shared */ - if (disk->shared && - (qemuSharedDiskListAdd(driver->sharedDisks, - disk->src, - vm->def->name) < 0)) { - goto cleanup; + if (disk->shared) { + /* Error out if the cdbfilter setting is different with what + * other domain(s) uses. + */ + qemuSharedDiskPtr entry = NULL; + + if ((entry = qemuSharedDiskListFind(driver->sharedDisks, + disk->src, + NULL, + NULL))) { + virDomainObjUnlock(vm); + for (i = 0; i < entry->ndomains; i++) { + virDomainObjPtr domobj = NULL; + virDomainDiskDefPtr diskdef = NULL; + + if (!(domobj = virDomainFindByName(&driver->domains, + entry->domains[i]))) { + virDomainObjLock(vm); + goto cleanup; + } + + if (!(diskdef = virDomainDiskFindByPath(domobj->def, + disk->src))) { + virDomainObjUnlock(domobj); + virDomainObjLock(vm); + goto cleanup; + } + + /* XXX: Can be abstracted into a function when there + * are more stuffs to check in future. + */ + if (diskdef->cdbfilter != disk->cdbfilter) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cdbfilter of shared disk '%s' " + "conflicts with other active " + "domains"), disk->src); + virDomainObjUnlock(domobj); + virDomainObjLock(vm); + goto cleanup; + } + virDomainObjUnlock(domobj); + } + virDomainObjLock(vm); + }
NACK. This is still doing what Daniel complained about in v1. Although it does not go through all domains, it is still locking some other domains. Not to mention that it is absolute overkill. IIUC, all domains in entry->domains have to have exactly the same cdbfilter, otherwise libvirt would refuse to start them. And why do we even need to maintain a list of domains sharing the same disk? Wouldn't just storing the value of cdbfilter there be enough? It seems like it would, at least for this purpose of checking that it matches the cdbfilter specified in the current domain.

Jirka

On 2012年12月05日 18:20, Jiri Denemark wrote:
On Wed, Dec 05, 2012 at 17:25:11 +0800, Osier Yang wrote:
This prevents a domain from starting if a shared disk's setting conflicts with that of other active domain(s), e.g. one domain has "cdbfilter" set to "yes" while another active domain using the disk has it set to "no".
...
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 05f8622..2938a65 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3712,12 +3712,56 @@ int qemuProcessStart(virConnectPtr conn, if (disk->rawio == 1) virCommandAllowCap(cmd, CAP_SYS_RAWIO);
- /* Add to qemud_driver->sharedDisks list if the disk is shared */ - if (disk->shared&& - (qemuSharedDiskListAdd(driver->sharedDisks, - disk->src, - vm->def->name)< 0)) { - goto cleanup; + if (disk->shared) { + /* Error out if the cdbfilter setting is different with what + * other domain(s) uses. + */ + qemuSharedDiskPtr entry = NULL; + + if ((entry = qemuSharedDiskListFind(driver->sharedDisks, + disk->src, + NULL, + NULL))) { + virDomainObjUnlock(vm); + for (i = 0; i< entry->ndomains; i++) { + virDomainObjPtr domobj = NULL; + virDomainDiskDefPtr diskdef = NULL; + + if (!(domobj = virDomainFindByName(&driver->domains, + entry->domains[i]))) { + virDomainObjLock(vm); + goto cleanup; + } + + if (!(diskdef = virDomainDiskFindByPath(domobj->def, + disk->src))) { + virDomainObjUnlock(domobj); + virDomainObjLock(vm); + goto cleanup; + } + + /* XXX: Can be abstracted into a function when there + * are more stuffs to check in future. + */ + if (diskdef->cdbfilter != disk->cdbfilter) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cdbfilter of shared disk '%s' " + "conflicts with other active " + "domains"), disk->src); + virDomainObjUnlock(domobj); + virDomainObjLock(vm); + goto cleanup; + } + virDomainObjUnlock(domobj); + } + virDomainObjLock(vm); + }
NACK. This is still doing what Daniel complained about in v1. Although it does not go through all domains, it is still locking some other domains. Not to mention that it is absolute overkill.
I mentioned in v4 that it's not that comfortable, especially the nested locking, but I don't have a better idea yet. IIUC, all domains in entry->domains
have to have exactly the same cdbfilter, otherwise libvirt would refuse to start them.
For this patch only, yes. And why do we even need to maintain a list of domains sharing the same
disk? Wouldn't just storing the value of cdbfilter there be enough? It seems like it would, at least for this purpose of checking that it matches cdbfilter specified in current domain.
Yes, if it were only for this purpose (preventing the domain from starting). But the truth is that we also have to care about restoring the disk's unpriv_sgio, i.e. checking whether any domain is still using the disk. You might say that having a ref count would solve this problem. However, the main reason I chose to use a sub-list of domain names is future extension: assuming there are other disk settings (you never know how many there will be) that we have to keep consistent among guests in future, looking up the disk def by domain and disk path gives us much more flexibility, IMHO.

Regards, Osier

On 2012年12月05日 18:50, Osier Yang wrote:
On 2012年12月05日 18:20, Jiri Denemark wrote:
On Wed, Dec 05, 2012 at 17:25:11 +0800, Osier Yang wrote:
This prevents a domain from starting if a shared disk's setting conflicts with that of other active domain(s), e.g. one domain has "cdbfilter" set to "yes" while another active domain using the disk has it set to "no".
...
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 05f8622..2938a65 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3712,12 +3712,56 @@ int qemuProcessStart(virConnectPtr conn, if (disk->rawio == 1) virCommandAllowCap(cmd, CAP_SYS_RAWIO);
- /* Add to qemud_driver->sharedDisks list if the disk is shared */ - if (disk->shared&& - (qemuSharedDiskListAdd(driver->sharedDisks, - disk->src, - vm->def->name)< 0)) { - goto cleanup; + if (disk->shared) { + /* Error out if the cdbfilter setting is different with what + * other domain(s) uses. + */ + qemuSharedDiskPtr entry = NULL; + + if ((entry = qemuSharedDiskListFind(driver->sharedDisks, + disk->src, + NULL, + NULL))) { + virDomainObjUnlock(vm); + for (i = 0; i< entry->ndomains; i++) { + virDomainObjPtr domobj = NULL; + virDomainDiskDefPtr diskdef = NULL; + + if (!(domobj = virDomainFindByName(&driver->domains, + entry->domains[i]))) { + virDomainObjLock(vm); + goto cleanup; + } + + if (!(diskdef = virDomainDiskFindByPath(domobj->def, + disk->src))) { + virDomainObjUnlock(domobj); + virDomainObjLock(vm); + goto cleanup; + } + + /* XXX: Can be abstracted into a function when there + * are more stuffs to check in future. + */ + if (diskdef->cdbfilter != disk->cdbfilter) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cdbfilter of shared disk '%s' " + "conflicts with other active " + "domains"), disk->src); + virDomainObjUnlock(domobj); + virDomainObjLock(vm); + goto cleanup; + } + virDomainObjUnlock(domobj); + } + virDomainObjLock(vm); + }
NACK. This is still doing what Daniel complained about in v1. Although it does not go through all domains, it is still locking some other domains. Not to mention that it is absolute overkill.
I mentioned in v4 that it's not that comfortable, especially the nested locking, but I don't have a better idea yet.
IIUC, all domains in entry->domains
have to have exactly the same cdbfilter, otherwise libvirt would refuse to start them.
For this patch only, yes.
And why do we even need to maintain a list of domains sharing the same
disk? Wouldn't just storing the value of cdbfilter there be enough? It seems like it would, at least for this purpose of checking that it matches cdbfilter specified in current domain.
Yes, if it were only for this purpose (preventing the domain from starting). But the truth is that we also have to care about restoring the disk's unpriv_sgio, i.e. checking whether any domain is still using the disk. You might say that having a ref count would solve this problem.
However, the main reason I chose to use a sub-list of domain names is future extension: assuming there are other disk settings (you never know how many there will be) that we have to keep consistent among guests in future, looking up the disk def by domain and disk path gives us much more flexibility, IMHO.
So the point of the argument is the trade-off between the flexibility and the uncomfortable locking.

Regards, Osier

On Wed, Dec 05, 2012 at 18:54:42 +0800, Osier Yang wrote: ...
However, the main reason I chose to use a sub-list of domain names is future extension: assuming there are other disk settings (you never know how many there will be) that we have to keep consistent among guests in future, looking up the disk def by domain and disk path gives us much more flexibility, IMHO.
So the point of the argument is the trade-off between the flexibility and the uncomfortable locking.
OK, I guess we can store more info in the sharedDisks list (either today or later when we need it), we may even store the domain list there, but we definitely don't want to go through all the domains to get the required details.

Jirka

On 2012年12月05日 19:03, Jiri Denemark wrote:
On Wed, Dec 05, 2012 at 18:54:42 +0800, Osier Yang wrote: ...
However, the main reason I chose to use a sub-list of domain names is future extension: assuming there are other disk settings (you never know how many there will be) that we have to keep consistent among guests in future, looking up the disk def by domain and disk path gives us much more flexibility, IMHO.
So the point of the argument is the trade-off between the flexibility and the uncomfortable locking.
OK, I guess we can store more info in the sharedDisks list (either today or later when we need it), we may even store the domain list there, but we definitely don't want to go through all the domains to get the required details.
In practice I think only a few guests will share a disk, which means sharedDisks->disks[i]->ndomains should be short. That's the other reason why I can live with the locks, though in theory a large number of domains could share the disk. The other question is: aren't we already going through all domain objects in many places?

Regards, Osier

On 12/05/2012 04:14 AM, Osier Yang wrote:
On 2012年12月05日 19:03, Jiri Denemark wrote:
On Wed, Dec 05, 2012 at 18:54:42 +0800, Osier Yang wrote: ...
However, the main reason I chose to use a sub-list of domain names is future extension: assuming there are other disk settings (you never know how many there will be) that we have to keep consistent among guests in future, looking up the disk def by domain and disk path gives us much more flexibility, IMHO.
So the point of the argument is the trade-off between the flexibility and the uncomfortable locking.
OK, I guess we can store more info in the sharedDisks list (either today or later when we need it), we may even store the domain list there, but we definitely don't want to go through all the domains to get the required details.
In practice I think only a few guests will share a disk, which means sharedDisks->disks[i]->ndomains should be short. That's the other reason why I can live with the locks, though in theory a large number of domains could share the disk.
The other question is: aren't we already going through all domain objects in many places?
I'm stepping in a bit late, and haven't fully looked at the series yet, but one of the things that I would love for libvirt to someday have is a way to get a list of all virStorageVolPtr associated with any given virDomainPtr, and conversely, given a virStorageVolPtr, return a list of all virDomainPtr that use the file (whether directly, or whether as a backing file). To get to that point, I envision having a way for every <disk> element to be tied to a virStorageVolPtr, even if it means the implicit creation of a transient virStoragePoolPtr directory-based pool for any <disk> specified without an explicit pool. Then, creation of a new domain, as well as hot-plugging of any disks to an existing domain, will update each affected virStorageVolPtr; and you really _should_ be maintaining a list of all associated domains with the disk object. Thus, I think searching whether a shared disk is allowed should be a matter of asking that shared disk what domains it is already associated with, rather than asking each domain whether it uses the shared disk. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 2012年12月07日 09:00, Eric Blake wrote:
On 12/05/2012 04:14 AM, Osier Yang wrote:
On 2012年12月05日 19:03, Jiri Denemark wrote:
On Wed, Dec 05, 2012 at 18:54:42 +0800, Osier Yang wrote: ...
However, the main reason I chose to use a sub-list of domain names is future extension: assuming there are other disk settings (you never know how many there will be) that we have to keep consistent among guests in future, looking up the disk def by domain and disk path gives us much more flexibility, IMHO.
So the point of the argument is the trade-off between the flexibility and the uncomfortable locking.
OK, I guess we can store more info in the sharedDisks list (either today or later when we need it), we may even store the domain list there, but we definitely don't want to go through all the domains to get the required details.
In practice I think only a few guests will share a disk, which means sharedDisks->disks[i]->ndomains should be short. That's the other reason why I can live with the locks, though in theory a large number of domains could share the disk.
The other question is: aren't we already going through all domain objects in many places?
I'm stepping in a bit late, and haven't fully looked at the series yet, but one of the things that I would love for libvirt to someday have is a way to get a list of all virStorageVolPtr associated with any given virDomainPtr, and conversely, given a virStorageVolPtr, return a list of all virDomainPtr that use the file (whether directly, or whether as a backing file).
This is actually next on my TODO list, i.e. associating domains with storage. See the link below (I'm just taking a break from this SG_IO set, halfway through the persistent NPIV support):

https://www.redhat.com/archives/libvir-list/2012-November/msg00878.html

To get to that point, I envision having a way for every
<disk> element to be tied to a virStorageVolPtr, even if it means the implicit creation of a transient virStoragePoolPtr directory-based pool for any<disk> specified without an explicit pool. Then, creation of a new domain, as well as hot-plugging of any disks to an existing domain, will update each affected virStorageVolPtr; and you really _should_ be maintaining a list of all associated domains with the disk object. Thus, I think searching whether a shared disk is allowed should be a matter of asking that shared disk what domains it is already associated with, rather than asking each domain whether it uses the shared disk.
I think this gets rid of the argument completely. We can introduce a new struct as a member of virStorageVolDef to store the disk settings which have to be consistent among the domains sharing it, e.g.:

typedef struct virStorageVolSharedSettings virStorageVolSharedSettings;
struct virStorageVolSharedSettings {
    int cdbfilter;
    /* Others in future */
};

struct _virStorageVolDef {
    char *name;
    char *key;
    int type; /* virStorageVolType enum */
    ...
    virStorageVolSharedSettings sharedSettings;
    ...
};

As each domain disk is mapped to a volume object, we can initialize sharedSettings during parsing or startup. sharedSettings can then be used for the checks done when starting or attaching. From the whole project's point of view, this sounds quite nice to me.

Regards, Osier

On Fri, Dec 07, 2012 at 12:16:08PM +0800, Osier Yang wrote:
On 2012年12月07日 09:00, Eric Blake wrote:
On 12/05/2012 04:14 AM, Osier Yang wrote:
On 2012年12月05日 19:03, Jiri Denemark wrote:
On Wed, Dec 05, 2012 at 18:54:42 +0800, Osier Yang wrote: ...
However, the main reason I chose to use a sub-list of domain names is future extension: assuming there are other disk settings (you never know how many there will be) that we have to keep consistent among guests in future, looking up the disk def by domain and disk path gives us much more flexibility, IMHO.
So the point of the argument is the trade-off between the flexibility and the uncomfortable locking.
OK, I guess we can store more info in the sharedDisks list (either today or later when we need it), we may even store the domain list there, but we definitely don't want to go through all the domains to get the required details.
In practice I think only a few guests will share a disk, which means sharedDisks->disks[i]->ndomains should be short. That's the other reason why I can live with the locks, though in theory a large number of domains could share the disk.
The other question is: aren't we already going through all domain objects in many places?
I'm stepping in a bit late, and haven't fully looked at the series yet, but one of the things that I would love for libvirt to someday have is a way to get a list of all virStorageVolPtr associated with any given virDomainPtr, and conversely, given a virStorageVolPtr, return a list of all virDomainPtr that use the file (whether directly, or whether as a backing file).
This is actually next on my TODO list, i.e. associating domains with storage. See the link below (I'm just taking a break from this SG_IO set, halfway through the persistent NPIV support):
https://www.redhat.com/archives/libvir-list/2012-November/msg00878.html
To get to that point, I envision having a way for every
<disk> element to be tied to a virStorageVolPtr, even if it means the implicit creation of a transient virStoragePoolPtr directory-based pool for any<disk> specified without an explicit pool. Then, creation of a new domain, as well as hot-plugging of any disks to an existing domain, will update each affected virStorageVolPtr; and you really _should_ be maintaining a list of all associated domains with the disk object. Thus, I think searching whether a shared disk is allowed should be a matter of asking that shared disk what domains it is already associated with, rather than asking each domain whether it uses the shared disk.
I think this gets rid of the argument completely. We can introduce a new struct as a member of virStorageVolDef to store the disk settings which have to be consistent among the domains sharing it, e.g.:

typedef struct virStorageVolSharedSettings virStorageVolSharedSettings;
struct virStorageVolSharedSettings {
    int cdbfilter;
    /* Others in future */
};

struct _virStorageVolDef {
    char *name;
    char *key;
    int type; /* virStorageVolType enum */
    ...
    virStorageVolSharedSettings sharedSettings;
    ...
};

As each domain disk is mapped to a volume object, we can initialize sharedSettings during parsing or startup. sharedSettings can then be used for the checks done when starting or attaching.
I don't really like this because you are mixing up configuration with runtime state. virStorageVolDef should be exclusively about configuration data. Further, I don't like the fact that information about domain state is being stored in the storage driver. I prefer to have strict separation between the two, with the only interaction being based on passing (uuid,name) between the domain & storage drivers. This is key to allowing us to split libvirtd up into multiple daemons in the future.

I think this is rather over-engineering things for the needs of the CDB filter. Fundamentally we do not need to know /which/ domains are using the same disk, only how many. If the number currently using the disk is non-zero, then the new domain must match the current CDB filter setting. If the number using the disk is zero, then the new domain can have any CDB filter setting. This is really just a hashtable of (path, reference count). No need to store virDomainObjPtr instances, nor virStorageVolPtr instances.

Regards, Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Just as for domain startup, the original unpriv_sgio state is recorded so that it can be restored when detaching.
---
 src/qemu/qemu_driver.c |   48 +++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 2c36935..b75b6fe 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6042,17 +6042,43 @@ qemuDomainAttachDeviceDiskLive(virConnectPtr conn,
                        NULLSTR(disk->src));
     }

-    /* If the attached disk is shareable, add it to the list
-     * qemud_driver->sharedDisks if attachment succeeded
-     */
-    if (ret == 0 && disk->shared) {
-        if (qemuSharedDiskListAdd(driver->sharedDisks,
-                                  disk->src,
-                                  vm->def->name) < 0) {
-            VIR_DEBUG("Failed to add the disk '%s' of domain '%s' to "
-                      "sharedDisks list", disk->src, vm->def->name);
-        }
-    }
+    if (ret == 0) {
+        if (disk->cdbfilter) {
+            int val;
+
+            /* Store the original unpriv_sgio state */
+            if (virGetDeviceUnprivSGIO(disk->src, &val) < 0) {
+                VIR_DEBUG("Failed to get unpriv_sgio of disk '%s'", disk->src);
+                goto end;
+            }
+
+            if (val == 0)
+                disk->old_cdbfilter = VIR_DOMAIN_DISK_CDB_FILTER_YES;
+            else
+                disk->old_cdbfilter = VIR_DOMAIN_DISK_CDB_FILTER_NO;
+
+            /* Set unpriv_sgio */
+            if (virSetDeviceUnprivSGIO(disk->src,
+                                       (disk->cdbfilter ==
+                                        VIR_DOMAIN_DISK_CDB_FILTER_NO)
+                                       ? 1 : 0) < 0) {
+                VIR_DEBUG("Failed to set unpriv_sgio of disk '%s'", disk->src);
+                goto end;
+            }
+        }
+
+        /* If the attached disk is shareable, add it to the list
+         * qemud_driver->sharedDisks if attachment succeeded
+         */
+        if (disk->shared) {
+            if (qemuSharedDiskListAdd(driver->sharedDisks,
+                                      disk->src,
+                                      vm->def->name) < 0) {
+                VIR_DEBUG("Failed to add the disk '%s' of domain '%s' to "
+                          "sharedDisks list", disk->src, vm->def->name);
+            }
+        }
+    }

 end:
     if (cgroup)
         virCgroupFree(&cgroup);
-- 
1.7.7.6
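[The attach path above mirrors domain startup: read the device's current unpriv_sgio so it can be restored later, then write the value implied by cdbfilter. A minimal sketch of what such helpers do is below; the names and the idea of passing the sysfs attribute path directly are assumptions for illustration — the real virGetDeviceUnprivSGIO/virSetDeviceUnprivSGIO helpers in this series resolve the sysfs location from the device node, and the exact sysfs path depends on the kernel-side unpriv_sgio patch.]

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a sysfs-style attribute file.
 * Returns 0 on success, -1 on failure. */
static int
get_unpriv_sgio(const char *attr_path, int *val)
{
    FILE *fp = fopen(attr_path, "r");
    if (!fp)
        return -1;
    if (fscanf(fp, "%d", val) != 1) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}

/* Write a single integer to a sysfs-style attribute file.
 * Returns 0 on success, -1 on failure. */
static int
set_unpriv_sgio(const char *attr_path, int val)
{
    FILE *fp = fopen(attr_path, "w");
    if (!fp)
        return -1;
    if (fprintf(fp, "%d\n", val) < 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

The attach code then becomes: get the old value, remember it in old_cdbfilter, and set 1 (filtering off) for cdbfilter="no" or 0 (filtering on) for cdbfilter="yes".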

A later patch will prohibit restoring it if the disk is shared and still being used by other domain(s).
---
 src/qemu/qemu_driver.c |   28 ++++++++++++++++++++++------
 1 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b75b6fe..cedf636 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6195,12 +6195,28 @@ qemuDomainDetachDeviceDiskLive(virQEMUDriverPtr driver,
         break;
     }

-    if (ret == 0 && disk->shared) {
-        if (qemuSharedDiskListDel(driver->sharedDisks,
-                                  disk->src,
-                                  vm->def->name) < 0) {
-            VIR_DEBUG("Failed to remove the entry of disk '%s', domain "
-                      "%s from sharedDisks", disk->src, vm->def->name);
+    if (ret == 0) {
+        /* Restore the disk's unpriv_sgio */
+        if (disk->cdbfilter) {
+            int val;
+
+            if (disk->old_cdbfilter == VIR_DOMAIN_DISK_CDB_FILTER_YES)
+                val = 0;
+            else
+                val = 1;
+
+            if (virSetDeviceUnprivSGIO(disk->src, val) < 0)
+                VIR_DEBUG("Failed to restore unpriv_sgio for disk '%s'",
+                          disk->src);
+        }
+
+        if (disk->shared) {
+            if (qemuSharedDiskListDel(driver->sharedDisks,
+                                      disk->src,
+                                      vm->def->name) < 0) {
+                VIR_DEBUG("Failed to remove the entry of disk '%s', domain "
+                          "%s from sharedDisks", disk->src, vm->def->name);
+            }
         }
     }
-- 
1.7.7.6

Just as for domain startup, this checks whether the shared disk's configuration conflicts with that of disks already in use. Currently only the cdbfilter setting is checked.

* src/qemu/qemu_process.h: (Declare helper qemuCheckSharedDisk)
* src/qemu/qemu_process.c: (Implement the helper)
* src/qemu/qemu_driver.c: (Check the shared disk's configuration)
---
 src/qemu/qemu_driver.c  |    4 ++
 src/qemu/qemu_process.c |  105 +++++++++++++++++++++++++++++------------------
 src/qemu/qemu_process.h |    4 ++
 3 files changed, 73 insertions(+), 40 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index cedf636..7288ad6 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -5991,6 +5991,10 @@ qemuDomainAttachDeviceDiskLive(virConnectPtr conn,
         goto end;
     }

+    if (disk->shared &&
+        (qemuCheckSharedDisk(driver, disk, vm) < 0))
+        goto end;
+
     if (qemuDomainDetermineDiskChain(driver, disk, false) < 0)
         goto end;

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2938a65..b38fc7b 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3380,6 +3380,68 @@ qemuProcessReconnectAll(virConnectPtr conn, virQEMUDriverPtr driver)
     virHashForEach(driver->domains.objs, qemuProcessReconnectHelper, &data);
 }

+/*
+ * Check if the shared disk's configuration conflicts with others.
+ *
+ * Return 0 if there is no conflict, otherwise return -1.
+ */
+int
+qemuCheckSharedDisk(virQEMUDriverPtr driver,
+                    virDomainDiskDefPtr disk,
+                    virDomainObjPtr vm)
+{
+    qemuSharedDiskPtr entry = NULL;
+    int i;
+
+    if ((entry = qemuSharedDiskListFind(driver->sharedDisks,
+                                        disk->src,
+                                        NULL,
+                                        NULL))) {
+        virDomainObjUnlock(vm);
+        for (i = 0; i < entry->ndomains; i++) {
+            virDomainObjPtr domobj = NULL;
+            virDomainDiskDefPtr diskdef = NULL;
+
+            if (!(domobj = virDomainFindByName(&driver->domains,
+                                               entry->domains[i]))) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("Failed to look up domain '%s'"),
+                               entry->domains[i]);
+                virDomainObjLock(vm);
+                return -1;
+            }
+
+            if (!(diskdef = virDomainDiskFindByPath(domobj->def,
+                                                    disk->src))) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("Failed to find disk '%s' of domain '%s'"),
+                               disk->src, domobj->def->name);
+                virDomainObjUnlock(domobj);
+                virDomainObjLock(vm);
+                return -1;
+            }
+
+            /* XXX: Can be abstracted into a function when there
+             * are more stuffs to check in future.
+             */
+            if (diskdef->cdbfilter != disk->cdbfilter) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("cdbfilter of shared disk '%s' "
+                                 "conflicts with other active "
+                                 "domains"), disk->src);
+                virDomainObjUnlock(domobj);
+                virDomainObjLock(vm);
+                return -1;
+            }
+            virDomainObjUnlock(domobj);
+        }
+        virDomainObjLock(vm);
+    }
+
+    return 0;
+}
+
+
 int qemuProcessStart(virConnectPtr conn,
                      virQEMUDriverPtr driver,
                      virDomainObjPtr vm,
@@ -3716,46 +3778,9 @@ int qemuProcessStart(virConnectPtr conn,
             /* Error out if the cdbfilter setting is different with what
              * other domain(s) uses.
              */
-            qemuSharedDiskPtr entry = NULL;
-
-            if ((entry = qemuSharedDiskListFind(driver->sharedDisks,
-                                                disk->src,
-                                                NULL,
-                                                NULL))) {
-                virDomainObjUnlock(vm);
-                for (i = 0; i < entry->ndomains; i++) {
-                    virDomainObjPtr domobj = NULL;
-                    virDomainDiskDefPtr diskdef = NULL;
-
-                    if (!(domobj = virDomainFindByName(&driver->domains,
-                                                       entry->domains[i]))) {
-                        virDomainObjLock(vm);
-                        goto cleanup;
-                    }
-
-                    if (!(diskdef = virDomainDiskFindByPath(domobj->def,
-                                                            disk->src))) {
-                        virDomainObjUnlock(domobj);
-                        virDomainObjLock(vm);
-                        goto cleanup;
-                    }
-
-                    /* XXX: Can be abstracted into a function when there
-                     * are more stuffs to check in future.
-                     */
-                    if (diskdef->cdbfilter != disk->cdbfilter) {
-                        virReportError(VIR_ERR_INTERNAL_ERROR,
-                                       _("cdbfilter of shared disk '%s' "
-                                         "conflicts with other active "
-                                         "domains"), disk->src);
-                        virDomainObjUnlock(domobj);
-                        virDomainObjLock(vm);
-                        goto cleanup;
-                    }
-                    virDomainObjUnlock(domobj);
-                }
-                virDomainObjLock(vm);
-            }
+
+            if (qemuCheckSharedDisk(driver, disk, vm) < 0)
+                goto cleanup;

             /* Add to qemud_driver->sharedDisks list if the disk is shared */
             if (qemuSharedDiskListAdd(driver->sharedDisks,
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index c12df32..501f206 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -99,4 +99,8 @@ bool qemuProcessAutoDestroyActive(virQEMUDriverPtr driver,
 virBitmapPtr qemuPrepareCpumap(virQEMUDriverPtr driver,
                                virBitmapPtr nodemask);

+int qemuCheckSharedDisk(virQEMUDriverPtr driver,
+                        virDomainDiskDefPtr disk,
+                        virDomainObjPtr vm);
+
 #endif /* __QEMU_PROCESS_H__ */
-- 
1.7.7.6

Just as at domain shutdown, this prevents restoring the disk's unpriv_sgio if the disk is still being shared by other domain(s).
---
 src/qemu/qemu_driver.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 7288ad6..153e780 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6201,7 +6201,11 @@ qemuDomainDetachDeviceDiskLive(virQEMUDriverPtr driver,

     if (ret == 0) {
         /* Restore the disk's unpriv_sgio */
-        if (disk->cdbfilter) {
+        if (disk->cdbfilter &&
+            !qemuSharedDiskListFind(driver->sharedDisks,
+                                    disk->src,
+                                    NULL,
+                                    NULL)) {
             int val;

             if (disk->old_cdbfilter == VIR_DOMAIN_DISK_CDB_FILTER_YES)
-- 
1.7.7.6

This ensures the disk's original unpriv_sgio value is not lost after restarting or reloading libvirtd.
---
 src/conf/domain_conf.c |   35 +++++++++++++++++++++++++++++++----
 1 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 1d6bb1f..bc77429 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -70,6 +70,7 @@ typedef enum {
     VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES = (1<<18),
     VIR_DOMAIN_XML_INTERNAL_ALLOW_ROM = (1<<19),
     VIR_DOMAIN_XML_INTERNAL_ALLOW_BOOT = (1<<20),
+    VIR_DOMAIN_XML_INTERNAL_OLD_DISK_CDBFILTER = (1<<21),
 } virDomainXMLInternalFlags;

 VIR_ENUM_IMPL(virDomainTaint, VIR_DOMAIN_TAINT_LAST,
@@ -3544,6 +3545,7 @@ virDomainDiskDefParseXML(virCapsPtr caps,
     char *snapshot = NULL;
     char *rawio = NULL;
     char *cdbfilter = NULL;
+    char *old_cdbfilter = NULL;
     char *driverName = NULL;
     char *driverType = NULL;
     char *source = NULL;
@@ -3607,6 +3609,8 @@ virDomainDiskDefParseXML(virCapsPtr caps,
     rawio = virXMLPropString(node, "rawio");
     cdbfilter = virXMLPropString(node, "cdbfilter");
+    if (flags & VIR_DOMAIN_XML_INTERNAL_OLD_DISK_CDBFILTER)
+        old_cdbfilter = virXMLPropString(node, "old_cdbfilter");

     cur = node->children;
     while (cur != NULL) {
@@ -4069,6 +4073,19 @@ virDomainDiskDefParseXML(virCapsPtr caps,
         def->cdbfilter = cdbfilter_val;
     }

+    if (old_cdbfilter) {
+        int old_cdbfilter_val = 0;
+
+        if ((old_cdbfilter_val =
+             virDomainDiskCDBFilterTypeFromString(old_cdbfilter)) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("unknown disk cdbfilter setting '%s'"),
+                           old_cdbfilter);
+            goto error;
+        }
+        def->old_cdbfilter = old_cdbfilter_val;
+    }
+
     if (bus) {
         if ((def->bus = virDomainDiskBusTypeFromString(bus)) < 0) {
             virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -11985,6 +12002,12 @@ virDomainDiskDefFormat(virBufferPtr buf,
     if (def->cdbfilter)
         virBufferAsprintf(buf, " cdbfilter='%s'",
                           virDomainDiskCDBFilterTypeToString(def->cdbfilter));
+
+    if ((flags & VIR_DOMAIN_XML_INTERNAL_OLD_DISK_CDBFILTER) &&
+        def->old_cdbfilter)
+        virBufferAsprintf(buf, " old_cdbfilter='%s'",
+                          virDomainDiskCDBFilterTypeToString(def->old_cdbfilter));
+
     if (def->snapshot &&
         !(def->snapshot == VIR_DOMAIN_SNAPSHOT_LOCATION_NONE && def->readonly))
         virBufferAsprintf(buf, " snapshot='%s'",
@@ -13654,7 +13677,8 @@ virDomainIsAllVcpupinInherited(virDomainDefPtr def)
 verify(((VIR_DOMAIN_XML_INTERNAL_STATUS |
          VIR_DOMAIN_XML_INTERNAL_ACTUAL_NET |
-         VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES)
+         VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES |
+         VIR_DOMAIN_XML_INTERNAL_OLD_DISK_CDBFILTER)
         & DUMPXML_FLAGS) == 0);

 /* This internal version can accept VIR_DOMAIN_XML_INTERNAL_*,
@@ -13676,7 +13700,8 @@ virDomainDefFormatInternal(virDomainDefPtr def,
     virCheckFlags(DUMPXML_FLAGS |
                   VIR_DOMAIN_XML_INTERNAL_STATUS |
                   VIR_DOMAIN_XML_INTERNAL_ACTUAL_NET |
-                  VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES,
+                  VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES |
+                  VIR_DOMAIN_XML_INTERNAL_OLD_DISK_CDBFILTER,
                   -1);

     if (!(type = virDomainVirtTypeToString(def->virtType))) {
@@ -14434,7 +14459,8 @@ int virDomainSaveStatus(virCapsPtr caps,
     unsigned int flags = (VIR_DOMAIN_XML_SECURE |
                           VIR_DOMAIN_XML_INTERNAL_STATUS |
                           VIR_DOMAIN_XML_INTERNAL_ACTUAL_NET |
-                          VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES);
+                          VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES |
+                          VIR_DOMAIN_XML_INTERNAL_OLD_DISK_CDBFILTER);
     int ret = -1;
     char *xml;
@@ -14534,7 +14560,8 @@ static virDomainObjPtr virDomainLoadStatus(virCapsPtr caps,
     if (!(obj = virDomainObjParseFile(caps, statusFile, expectedVirtTypes,
                                       VIR_DOMAIN_XML_INTERNAL_STATUS |
                                       VIR_DOMAIN_XML_INTERNAL_ACTUAL_NET |
-                                      VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES)))
+                                      VIR_DOMAIN_XML_INTERNAL_PCI_ORIG_STATES |
+                                      VIR_DOMAIN_XML_INTERNAL_OLD_DISK_CDBFILTER)))
         goto error;

     virUUIDFormat(obj->def->uuid, uuidstr);
-- 
1.7.7.6
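With the new internal flag set, the status XML saved by libvirtd would carry both attributes on the disk element, along these lines. This is an illustrative, hypothetical fragment; the patch only shows the attribute formatting, not the surrounding disk element:

```xml
<disk type='block' device='lun' cdbfilter='no' old_cdbfilter='yes'>
  <source dev='/dev/sdb'/>
  <target dev='sda' bus='scsi'/>
  <shareable/>
</disk>
```

Because old_cdbfilter is guarded by VIR_DOMAIN_XML_INTERNAL_OLD_DISK_CDBFILTER, it only ever appears in the internal status XML, never in user-visible dumpxml output.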

On 12/05/2012 02:25 AM, Osier Yang wrote:
Hi,
As a result of RFC [1], this implements the unprivileged SG_IO support. Testing is not sufficient yet, but I'd like to get reviews started earlier; meanwhile, I'm not giving up on further testing.
I didn't see where [1] links to.
Patches 1/10 ~ 4/10 are to introduce the internal list for shared disks.
Osier Yang (15):
  qemu: Introduce a list to maintain the shared disks between domains
  qemu: Init/Free the list with the driver's lifecycle
  qemu: Add/remove the shared disk entry during domain's lifecycle
  qemu: Add/Remove the entry of sharedDisks when live attaching/detaching
Are you properly handling backing files as potential shared disks? I think we really need to reach the point where we can specify the entire backing chain as part of the domain XML, and where we fill in the backing chain if the user didn't provide it up front. After all, it is entirely reasonable to allow for:

base <- A.img <- B.img

where A.img is used by domain A, B.img is used by domain B, but where we'd like to have SG_IO support for both domains' read-only use of base. I don't know if this will impact what needs to happen in your series.

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 12/07/2012 08:59, Eric Blake wrote:
On 12/05/2012 02:25 AM, Osier Yang wrote:
Hi,
As a result of RFC [1], this implements the unprivileged SG_IO support.
<...>
Testing is not sufficient yet, but I'd like to get reviews started earlier; meanwhile, I'm not giving up on further testing.
</...>
This should have been removed, as I did much testing after v2.
I didn't see where [1] links to.
It's https://www.redhat.com/archives/libvir-list/2012-November/msg00988.html; it was missed when posting the new set. :-)
Patches 1/10 ~ 4/10 are to introduce the internal list for shared disks.
Osier Yang (15):
  qemu: Introduce a list to maintain the shared disks between domains
  qemu: Init/Free the list with the driver's lifecycle
  qemu: Add/remove the shared disk entry during domain's lifecycle
  qemu: Add/Remove the entry of sharedDisks when live attaching/detaching
Are you properly handling backing files as potential shared disks?
Not yet, thanks for the reminder. But could the base disk be a SCSI block device? AFAIK, it can currently only be raw or qcow2.
I think we really need to reach the point where we can specify the entire backing chain as part of domain XML, and where we fill in the backing chain if the user didn't provide it up front.
Yeah, regardless of this patch set, I think it would be nice not to have to determine the backing chain every time. And isn't the backing chain lost after restarting or reloading libvirtd? Though the backing chain could be changed outside of libvirt, any external change is not supported by libvirt, so that will be fine. Should we also determine the backing chains for an attached domain process, i.e. the API qemuDomainAttach? These are not related to this topic, but good to raise.

After all, it is entirely reasonable to allow for:

base <- A.img <- B.img

where A.img is used by domain A, B.img is used by domain B, but where we'd like to have SG_IO support for both domains' read-only use of base. I don't know if this will impact what needs to happen in your series.

Let's ignore whether the base disk could be a block device or not here. Assuming that the backing chain is already exposed in the domain XML, that means we have to add "cdbfilter" and "old_cdbfilter" to struct _virStorageFileMetadata, and allow them only for the root base once the backing files are exposed in the domain XML. And then, just like what this set does for general disks, there has to be various checking/setting/restoring when a domain is started or destroyed and when disks are attached or detached. block-pull/block-commit should not be affected, as they don't change the root base.

Regards,
Osier
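Handling the shared-base case Eric describes would mean comparing whole backing chains, not just the top images. A rough sketch of that check, using a hypothetical simplified node type (libvirt's real chain lives in struct _virStorageFileMetadata):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified backing-chain node; each image points at
 * the image it is backed by, with NULL at the root base. */
struct chain_node {
    const char *path;
    struct chain_node *backing;
};

/* Two disks share storage if any element of one chain appears in the
 * other; this covers the common-base case from the thread:
 * base <- A.img and base <- B.img. Paths are assumed canonicalized. */
static bool
chains_share_storage(const struct chain_node *a, const struct chain_node *b)
{
    for (; a; a = a->backing) {
        for (const struct chain_node *n = b; n; n = n->backing) {
            if (strcmp(a->path, n->path) == 0)
                return true;
        }
    }
    return false;
}
```

With such a check, the shared-disk list would have to record every chain element rather than only disk->src before the restore/conflict logic in this series could be correct for chained images.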

On Thu, Dec 6, 2012 at 6:59 PM, Eric Blake <eblake@redhat.com> wrote:
On 12/05/2012 02:25 AM, Osier Yang wrote:
Hi,
As a result of RFC [1], this implements the unprivileged SG_IO support. Testing is not sufficient yet, but I'd like to get reviews started earlier; meanwhile, I'm not giving up on further testing.
I didn't see where [1] links to.
Patches 1/10 ~ 4/10 are to introduce the internal list for shared disks.
Osier Yang (15):
  qemu: Introduce a list to maintain the shared disks between domains
  qemu: Init/Free the list with the driver's lifecycle
  qemu: Add/remove the shared disk entry during domain's lifecycle
  qemu: Add/Remove the entry of sharedDisks when live attaching/detaching
Are you properly handling backing files as potential shared disks? I think we really need to reach the point where we can specify the entire backing chain as part of domain XML, and where we fill in the backing chain if the user didn't provide it up front. After all, it is entirely reasonable to allow for:
base <- A.img <- B.img
where A.img is used by domain A, B.img is used by domain B, but where we'd like to have SG_IO support for both domains' read-only use of base. I don't know if this will impact what needs to happen in your series.
Not to just throw a +1 here, but I really have to agree with you. I'd like to see this, as it would likely lead to some good cleanup and improved functionality regarding backing chains and cross-dependencies.

-- 
Doug Goldstein
participants (5)
- Daniel P. Berrange
- Doug Goldstein
- Eric Blake
- Jiri Denemark
- Osier Yang