[PATCH 0/3] ch: monitor daemonization, sync after reboot, and shutdown fixes
by Kirill Shchetiniuk
1. Run CH monitor as a daemon
Made the monitor process daemonized to prevent VM termination if
the CH driver crashes. Added pidfile for daemon's pid aquiring and tracking
as well as its init.
2. Update domain info after reboot
Fixed an issue where domain properties (e.g., serial console path)
were not updated after VM reboot. Added VIR_CH_EVENT_VM_REBOOTED
handling to keep the transient domain definition consistent.
3. Update VM shutdown event handler
VM monitor was still up even if VM was shut off, which led to an
inability to start the domain again.
virsh # shutdown ch-test
Domain 'ch-test' is being shutdown
virsh # list
Id Name State
------------------------------
722117 ch-test shut off
Ensured the CH monitor process terminates along with the
VM shutdown (e.g., executed using virsh). Updated
virCHEventStopProcess to have proper job type.
Kirill Shchetiniuk (3):
ch: virCHMonitorNew() run new CH monitor daemonized
ch: virCHProcessEvent() update domain info after reboot
ch: virCHProcessEvent() vm shutdown event handler fix
src/ch/ch_domain.c | 1 +
src/ch/ch_domain.h | 1 +
src/ch/ch_events.c | 8 ++++----
src/ch/ch_monitor.c | 24 ++++++++++++++++++++++--
src/ch/ch_process.c | 18 +++++++++++++++++-
src/ch/ch_process.h | 2 ++
6 files changed, 47 insertions(+), 7 deletions(-)
--
2.48.1
1 hour, 9 minutes
[PATCH] build: clang stack frame size handling improvement
by Roman Bogorodskiy
The 'plain' optimization type also triggers the clang stack frame size
issues, so increase limit for it as well.
Signed-off-by: Roman Bogorodskiy <bogorodskiy(a)gmail.com>
---
meson.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/meson.build b/meson.build
index 56823ca25b..0a402a19a2 100644
--- a/meson.build
+++ b/meson.build
@@ -259,7 +259,7 @@ alloc_max = run_command(
stack_frame_size = 2048
# clang without optimization enlarges stack frames in certain corner cases
-if cc.get_id() == 'clang' and get_option('optimization') == '0'
+if cc.get_id() == 'clang' and get_option('optimization') in ['plain', '0']
stack_frame_size = 4096
endif
--
2.49.0
2 hours, 7 minutes
[PATCH v1 00/18] LIBVIRT: X86: TDX support
by Zhenzhong Duan
Hi,
This series brings libvirt the x86 TDX support.
* What's TDX?
TDX stands for Trust Domain Extensions which isolates VMs from
the virtual-machine manager (VMM)/hypervisor and any other software on
the platform.
This patchset extends libvirt to support TDX, with which one can start a TDX
guest from high level rather than running qemu directly.
* Misc
As QEMU use a software emulated way to reset guest which isn't supported by TDX
guest for security reason. We simulate reboot for TDX guest by kill and create a
new one in FakeReboot framework.
Complete code can be found at [1], matching qemu code can be found at [2].
There is a 'QGS' element for attestation which isn't in matching qemu[2] yet.
I keep them intentionally as they will be implemented in qemu as extention
series of [2].
* Test
start/stop/reboot/reset with virsh
stop/reboot trigger in guest
stop with on_poweroff=destroy/restart
reboot with on_reboot=destroy/restart
* Patch organization
- patch 1-4: Support query of TDX capabilities
- patch 5-11: Add TDX type to launchsecurity framework
- patch 12-17: Add reboot/reset support to TDX guest
- patch 18: Add docs
TODO:
- add reconnect logic in virsh command
[1] https://github.com/intel/libvirt-tdx/commits/tdx_for_upstream_v1
[2] https://github.com/intel-staging/qemu-tdx/tree/tdx-qemu-upstream-v8
Thanks
Zhenzhong
Changelog:
v1:
- s/virQEMUCapsKVMSupportsSecureGuestINTEL/virQEMUCapsKVMSupportsSecureGuestTDX (Daniel)
- make policy element optional and expose to QEMU directly (Daniel)
- s/qemuProcessSecFakeReboot/qemuProcessFakeRebootViaRecreate (Daniel)
- simplify QGS element schema by supporting only UNIX socket (Daniel)
- add new events VIR_DOMAIN_EVENT_[STOPPED|STARTED] for control plane (Daniel)
- s/quoteGenerationService/quoteGenerationSocket as QEMU
- add virsh reset support
rfcv4:
- add a check to tools/virt-host-validate-qemu.c (Daniel)
- remove check of q35 (Daniel)
- model 'SocktetAddress' QAPI in xml schema (Daniel)
- s/Quote-Generation-Service/quoteGenerationService/ (Daniel)
- define bits in tdx->policy and add validating logic (Daniel)
- presume QEMU choose split kernel irqchip for TDX guest by default (Daniel)
- utilize existing FakeReboot framework to do reboot for TDX guest (Daniel)
- drop patch11 'conf: Add support to keep same domid for hard reboot' (Daniel)
- add test in tests/ to validate parsing and formatting logic (Daniel)
- add doc in docs/formatdomain.rst (Daniel)
- add R-B
rfcv3:
- Change to generate qemu cmdline with -bios
- drop firmware auto match as -bios is used
- add a hard reboot method to reboot TDX guest
rfcv3: https://www.mail-archive.com/devel@lists.libvirt.org/msg00385.html
rfcv2:
- give up using qmp cmd and check TDX directly on host for TDX capabilities.
- use launchsecurity framework to support TDX
- use <os>.<loader> for general loader
- add auto firmware match feature for TDX
A example TDVF fimware description file 70-edk2-x86_64-tdx.json:
{
"description": "UEFI firmware for x86_64, supporting Intel TDX",
"interface-types": [
"uefi"
],
"mapping": {
"device": "generic",
"filename": "/usr/share/OVMF/OVMF_CODE-tdx.fd"
},
"targets": [
{
"architecture": "x86_64",
"machines": [
"pc-q35-*"
]
}
],
"features": [
"intel-tdx",
"verbose-dynamic"
],
"tags": [
]
}
rfcv2: https://www.mail-archive.com/libvir-list@redhat.com/msg219378.html
Zhenzhong Duan (18):
tools: Secure guest check for Intel in virt-host-validate
qemu: Check if INTEL Trust Domain Extention support is enabled
qemu: Add TDX capability
conf: Expose TDX feature in domain capabilities
conf: Add tdx as launch security type
conf: Validate TDX launchSecurity element
mrConfigId/mrOwner/mrOwnerConfig
qemu: Add command line and validation for TDX type
conf: Expose TDX type in domain launch security capability
qemu: Force special parameters enabled for TDX guest
conf: Add Intel TDX Quote Generation Service(QGS) support
qemu: Add command line for TDX Quote Generation Service(QGS)
qemu: Add FakeReboot support for TDX guest
qemu: Support reboot command in guest
qemu: Avoid duplicate FakeReboot for secure guest
qemu: Send event VIR_DOMAIN_EVENT_[STOPPED|STARTED] during recreation
qemu: Bypass sending VIR_DOMAIN_EVENT_RESUMED event when TD VM reboot
qemu: Support domain reset command for TDX guest
docs: domain: Add documentation for Intel TDX guest
docs/formatdomain.rst | 63 ++++++++++++++++++
docs/formatdomaincaps.rst | 1 +
examples/c/misc/event-test.c | 6 ++
include/libvirt/libvirt-domain.h | 2 +
src/conf/domain_capabilities.c | 1 +
src/conf/domain_capabilities.h | 1 +
src/conf/domain_conf.c | 82 +++++++++++++++++++++++
src/conf/domain_conf.h | 21 ++++++
src/conf/domain_validate.c | 11 ++++
src/conf/schemas/domaincaps.rng | 9 +++
src/conf/schemas/domaincommon.rng | 41 ++++++++++++
src/conf/virconftypes.h | 2 +
src/qemu/qemu_capabilities.c | 38 ++++++++++-
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_cgroup.c | 1 +
src/qemu/qemu_command.c | 54 +++++++++++++++
src/qemu/qemu_driver.c | 7 ++
src/qemu/qemu_firmware.c | 1 +
src/qemu/qemu_monitor.c | 28 +++++++-
src/qemu/qemu_monitor.h | 2 +-
src/qemu/qemu_monitor_json.c | 6 +-
src/qemu/qemu_namespace.c | 1 +
src/qemu/qemu_process.c | 105 ++++++++++++++++++++++++++++--
src/qemu/qemu_process.h | 2 +
src/qemu/qemu_validate.c | 45 +++++++++++++
src/security/security_dac.c | 2 +
tools/virsh-domain-event.c | 6 +-
tools/virt-host-validate-common.c | 31 ++++++++-
tools/virt-host-validate-common.h | 1 +
29 files changed, 558 insertions(+), 13 deletions(-)
--
2.34.1
3 hours, 31 minutes
[PATCH 0/3] ch: Misc fixes
by Michal Privoznik
*** BLURB HERE ***
Michal Prívozník (3):
ch: Use CH_DOMAIN_PRIVATE() more
ch: Drop pid from monitor
ch: Fix printf format strings wrt size_t argument
src/ch/ch_events.c | 4 ++--
src/ch/ch_monitor.c | 14 ++++----------
src/ch/ch_monitor.h | 2 --
src/ch/ch_process.c | 11 +++++------
4 files changed, 11 insertions(+), 20 deletions(-)
--
2.49.0
3 hours, 42 minutes
[PATCH 1/5] virpci: Switch to an implementation using libpciaccess
by Alexander Shursha
The current virpci code uses the Linux-specific sysfs subsystem, which makes
it impossible to use on other Unix-like systems. The libpciaccess library
provides a cross-platform API for accessing the PCI bus. Employ it to make
the code portable.
This makes libpciaccess a non-optional dependency of libvirt.
Signed-off-by: Alexander Shursha <kekek2(a)ya.ru>
---
meson.build | 11 +-
meson_options.txt | 4 +-
src/meson.build | 1 +
src/util/virpci.c | 465 +++++++++++----------------------------------
tests/virpcimock.c | 22 ++-
5 files changed, 139 insertions(+), 364 deletions(-)
diff --git a/meson.build b/meson.build
index 2d76a0846c..85bd882406 100644
--- a/meson.build
+++ b/meson.build
@@ -1207,7 +1207,10 @@ parallels_sdk_version = '7.0.22'
parallels_sdk_dep = dependency('parallels-sdk', version: '>=' + parallels_sdk_version, required: false)
pciaccess_version = '0.10.0'
-pciaccess_dep = dependency('pciaccess', version: '>=' + pciaccess_version, required: get_option('pciaccess'))
+pciaccess_dep = dependency('pciaccess', version: '>=' + pciaccess_version)
+if not pciaccess_dep.found()
+ error('pciaccess is required to build libvirt')
+endif
rbd_dep = cc.find_library('rbd', required: get_option('storage_rbd'))
rados_dep = cc.find_library('rados', required: get_option('storage_rbd'))
@@ -1461,11 +1464,6 @@ if not get_option('polkit').disabled()
endif
endif
-if udev_dep.found() and not pciaccess_dep.found()
- error('You must install the pciaccess module to build with udev')
-endif
-
-
# build driver options
remote_default_mode = get_option('remote_default_mode')
@@ -2341,7 +2339,6 @@ libs_summary = {
'numactl': numactl_dep.found(),
'openwsman': openwsman_dep.found(),
'parallels-sdk': parallels_sdk_dep.found(),
- 'pciaccess': pciaccess_dep.found(),
'polkit': conf.has('WITH_POLKIT'),
'rbd': rbd_dep.found(),
'readline': readline_dep.found(),
diff --git a/meson_options.txt b/meson_options.txt
index 3dc3e8667b..69605238ba 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -38,7 +38,6 @@ option('netcf', type: 'feature', value: 'auto', description: 'netcf support')
option('nls', type: 'feature', value: 'auto', description: 'nls support')
option('numactl', type: 'feature', value: 'auto', description: 'numactl support')
option('openwsman', type: 'feature', value: 'auto', description: 'openwsman support')
-option('pciaccess', type: 'feature', value: 'auto', description: 'pciaccess support')
option('polkit', type: 'feature', value: 'auto', description: 'use PolicyKit for UNIX socket access checks')
option('readline', type: 'feature', value: 'auto', description: 'readline support')
option('sanlock', type: 'feature', value: 'auto', description: 'sanlock support')
@@ -46,7 +45,6 @@ option('sasl', type: 'feature', value: 'auto', description: 'sasl support')
option('selinux', type: 'feature', value: 'auto', description: 'selinux support')
option('selinux_mount', type: 'string', value: '', description: 'set SELinux mount point')
option('sshconfdir', type: 'string', value: '', description: 'directory for SSH client configuration')
-# dep:pciaccess
option('udev', type: 'feature', value: 'auto', description: 'udev support')
# dep:driver_remote
option('wireshark_dissector', type: 'feature', value: 'auto', description: 'wireshark support')
@@ -59,7 +57,7 @@ option('driver_bhyve', type: 'feature', value: 'auto', description: 'bhyve drive
option('driver_esx', type: 'feature', value: 'auto', description: 'esx driver')
# dep:openwsman
option('driver_hyperv', type: 'feature', value: 'auto', description: 'Hyper-V driver')
-# dep:pciaccess dep:udev dep:driver_remote dep:driver_libvirtd
+# dep:udev dep:driver_remote dep:driver_libvirtd
option('driver_interface', type: 'feature', value: 'auto', description: 'host interface driver')
# dep:driver_remote
option('driver_libvirtd', type: 'feature', value: 'auto', description: 'libvirtd driver')
diff --git a/src/meson.build b/src/meson.build
index 9413192a55..39788ac4d7 100644
--- a/src/meson.build
+++ b/src/meson.build
@@ -411,6 +411,7 @@ libvirt_lib = shared_library(
dtrace_gen_objects,
dependencies: [
src_dep,
+ pciaccess_dep
],
link_args: libvirt_link_args,
link_whole: [
diff --git a/src/util/virpci.c b/src/util/virpci.c
index 90617e69c6..b5bbe73ece 100644
--- a/src/util/virpci.c
+++ b/src/util/virpci.c
@@ -2,6 +2,7 @@
* virpci.c: helper APIs for managing host PCI devices
*
* Copyright (C) 2009-2015 Red Hat, Inc.
+ * Copyright (C) 2024-2025 Future Crew, LLC
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -29,6 +30,7 @@
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
+#include <pciaccess.h>
#ifdef __linux__
# include <sys/utsname.h>
@@ -72,7 +74,7 @@ struct _virPCIDevice {
char *name; /* domain:bus:slot.function */
char id[PCI_ID_LEN]; /* product vendor */
- char *path;
+ struct pci_device *device;
/* The driver:domain which uses the device */
char *used_by_drvname;
@@ -359,121 +361,6 @@ virPCIDeviceGetCurrentDriverNameAndType(virPCIDevice *dev,
}
-static int
-virPCIDeviceConfigOpenInternal(virPCIDevice *dev, bool readonly, bool fatal)
-{
- int fd;
-
- fd = open(dev->path, readonly ? O_RDONLY : O_RDWR);
-
- if (fd < 0) {
- if (fatal) {
- virReportSystemError(errno,
- _("Failed to open config space file '%1$s'"),
- dev->path);
- } else {
- VIR_WARN("Failed to open config space file '%s': %s",
- dev->path, g_strerror(errno));
- }
- return -1;
- }
-
- VIR_DEBUG("%s %s: opened %s", dev->id, dev->name, dev->path);
- return fd;
-}
-
-static int
-virPCIDeviceConfigOpen(virPCIDevice *dev)
-{
- return virPCIDeviceConfigOpenInternal(dev, true, true);
-}
-
-static int
-virPCIDeviceConfigOpenTry(virPCIDevice *dev)
-{
- return virPCIDeviceConfigOpenInternal(dev, true, false);
-}
-
-static int
-virPCIDeviceConfigOpenWrite(virPCIDevice *dev)
-{
- return virPCIDeviceConfigOpenInternal(dev, false, true);
-}
-
-static void
-virPCIDeviceConfigClose(virPCIDevice *dev, int cfgfd)
-{
- if (VIR_CLOSE(cfgfd) < 0) {
- VIR_WARN("Failed to close config space file '%s': %s",
- dev->path, g_strerror(errno));
- }
-}
-
-
-static int
-virPCIDeviceRead(virPCIDevice *dev,
- int cfgfd,
- unsigned int pos,
- uint8_t *buf,
- unsigned int buflen)
-{
- memset(buf, 0, buflen);
- errno = 0;
-
- if (lseek(cfgfd, pos, SEEK_SET) != pos ||
- saferead(cfgfd, buf, buflen) != buflen) {
- VIR_DEBUG("Failed to read %u bytes at %u from '%s' : %s",
- buflen, pos, dev->path, g_strerror(errno));
- return -1;
- }
- return 0;
-}
-
-
-/**
- * virPCIDeviceReadN:
- * @dev: virPCIDevice object (used only to log name of config file)
- * @cfgfd: open file descriptor for device config file in sysfs
- * @pos: byte offset in the file to read from
- *
- * read "N" (where "N" is "8", "16", or "32", and appears at the end
- * of the function name) bytes from a PCI device's already-opened
- * sysfs config file and return them as the return value from the
- * function.
- *
- * Returns the value at @pos in the file, or 0 if there was an
- * error. NB: since 0 could be a valid value, occurrence of an error
- * must be determined by examining errno. errno is always reset to 0
- * before the seek/read is attempted (see virPCIDeviceRead()), so if
- * errno != 0 on return from one of these functions, then either the
- * seek or the read operation failed for some reason. If errno == 0
- * and the return value is 0, then the config file really does contain
- * the value 0 at @pos.
- */
-static uint8_t
-virPCIDeviceRead8(virPCIDevice *dev, int cfgfd, unsigned int pos)
-{
- uint8_t buf;
- virPCIDeviceRead(dev, cfgfd, pos, &buf, sizeof(buf));
- return buf;
-}
-
-static uint16_t
-virPCIDeviceRead16(virPCIDevice *dev, int cfgfd, unsigned int pos)
-{
- uint8_t buf[2];
- virPCIDeviceRead(dev, cfgfd, pos, &buf[0], sizeof(buf));
- return (buf[0] << 0) | (buf[1] << 8);
-}
-
-static uint32_t
-virPCIDeviceRead32(virPCIDevice *dev, int cfgfd, unsigned int pos)
-{
- uint8_t buf[4];
- virPCIDeviceRead(dev, cfgfd, pos, &buf[0], sizeof(buf));
- return (buf[0] << 0) | (buf[1] << 8) | (buf[2] << 16) | (buf[3] << 24);
-}
-
static int
virPCIDeviceReadClass(virPCIDevice *dev, uint16_t *device_class)
{
@@ -499,36 +386,6 @@ virPCIDeviceReadClass(virPCIDevice *dev, uint16_t *device_class)
return 0;
}
-static int
-virPCIDeviceWrite(virPCIDevice *dev,
- int cfgfd,
- unsigned int pos,
- uint8_t *buf,
- unsigned int buflen)
-{
- if (lseek(cfgfd, pos, SEEK_SET) != pos ||
- safewrite(cfgfd, buf, buflen) != buflen) {
- VIR_WARN("Failed to write to '%s' : %s", dev->path,
- g_strerror(errno));
- return -1;
- }
- return 0;
-}
-
-static void
-virPCIDeviceWrite16(virPCIDevice *dev, int cfgfd, unsigned int pos, uint16_t val)
-{
- uint8_t buf[2] = { (val >> 0), (val >> 8) };
- virPCIDeviceWrite(dev, cfgfd, pos, &buf[0], sizeof(buf));
-}
-
-static void
-virPCIDeviceWrite32(virPCIDevice *dev, int cfgfd, unsigned int pos, uint32_t val)
-{
- uint8_t buf[4] = { (val >> 0), (val >> 8), (val >> 16), (val >> 24) };
- virPCIDeviceWrite(dev, cfgfd, pos, &buf[0], sizeof(buf));
-}
-
typedef int (*virPCIDeviceIterPredicate)(virPCIDevice *, virPCIDevice *,
void *);
@@ -610,7 +467,6 @@ virPCIDeviceIterDevices(virPCIDeviceIterPredicate predicate,
*/
static int
virPCIDeviceFindCapabilityOffset(virPCIDevice *dev,
- int cfgfd,
unsigned int capability,
unsigned int *offset)
{
@@ -619,11 +475,13 @@ virPCIDeviceFindCapabilityOffset(virPCIDevice *dev,
*offset = 0; /* assume failure (*nothing* can be at offset 0) */
- status = virPCIDeviceRead16(dev, cfgfd, PCI_STATUS);
+ pci_device_cfg_read_u16(dev->device, &status, PCI_STATUS);
+
if (errno != 0 || !(status & PCI_STATUS_CAP_LIST))
goto error;
- pos = virPCIDeviceRead8(dev, cfgfd, PCI_CAPABILITY_LIST);
+ pci_device_cfg_read_u8(dev->device, &pos, PCI_CAPABILITY_LIST);
+
if (errno != 0)
goto error;
@@ -635,7 +493,9 @@ virPCIDeviceFindCapabilityOffset(virPCIDevice *dev,
* capabilities here.
*/
while (pos >= PCI_CONF_HEADER_LEN && pos != 0xff) {
- uint8_t capid = virPCIDeviceRead8(dev, cfgfd, pos);
+ uint8_t capid;
+ pci_device_cfg_read_u8(dev->device, &capid, pos);
+
if (errno != 0)
goto error;
@@ -646,7 +506,8 @@ virPCIDeviceFindCapabilityOffset(virPCIDevice *dev,
return 0;
}
- pos = virPCIDeviceRead8(dev, cfgfd, pos + 1);
+ pci_device_cfg_read_u8(dev->device, &pos, pos + 1);
+
if (errno != 0)
goto error;
}
@@ -665,7 +526,6 @@ virPCIDeviceFindCapabilityOffset(virPCIDevice *dev,
static unsigned int
virPCIDeviceFindExtendedCapabilityOffset(virPCIDevice *dev,
- int cfgfd,
unsigned int capability)
{
int ttl;
@@ -677,7 +537,7 @@ virPCIDeviceFindExtendedCapabilityOffset(virPCIDevice *dev,
pos = PCI_EXT_CAP_BASE;
while (ttl > 0 && pos >= PCI_EXT_CAP_BASE) {
- header = virPCIDeviceRead32(dev, cfgfd, pos);
+ header = pci_device_cfg_read_u32(dev->device, &header, pos);
if ((header & PCI_EXT_CAP_ID_MASK) == capability)
return pos;
@@ -693,7 +553,7 @@ virPCIDeviceFindExtendedCapabilityOffset(virPCIDevice *dev,
* not have FLR, 1 if it does, and -1 on error
*/
static bool
-virPCIDeviceDetectFunctionLevelReset(virPCIDevice *dev, int cfgfd)
+virPCIDeviceDetectFunctionLevelReset(virPCIDevice *dev)
{
uint32_t caps;
unsigned int pos;
@@ -707,7 +567,7 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevice *dev, int cfgfd)
* on SR-IOV NICs at the moment.
*/
if (dev->pcie_cap_pos) {
- caps = virPCIDeviceRead32(dev, cfgfd, dev->pcie_cap_pos + PCI_EXP_DEVCAP);
+ pci_device_cfg_read_u32(dev->device, &caps, dev->pcie_cap_pos + PCI_EXP_DEVCAP);
if (caps & PCI_EXP_DEVCAP_FLR) {
VIR_DEBUG("%s %s: detected PCIe FLR capability", dev->id, dev->name);
return true;
@@ -718,11 +578,13 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevice *dev, int cfgfd)
* the same thing, except for conventional PCI
* devices. This is not common yet.
*/
- if (virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_AF, &pos) < 0)
+ if (virPCIDeviceFindCapabilityOffset(dev, PCI_CAP_ID_AF, &pos) < 0)
goto error;
if (pos) {
- caps = virPCIDeviceRead16(dev, cfgfd, pos + PCI_AF_CAP);
+ uint16_t caps16;
+ pci_device_cfg_read_u16(dev->device, &caps16, pos + PCI_AF_CAP);
+ caps = caps16;
if (caps & PCI_AF_CAP_FLR) {
VIR_DEBUG("%s %s: detected PCI FLR capability", dev->id, dev->name);
return true;
@@ -754,13 +616,13 @@ virPCIDeviceDetectFunctionLevelReset(virPCIDevice *dev, int cfgfd)
* internal reset, not just a soft reset.
*/
static bool
-virPCIDeviceDetectPowerManagementReset(virPCIDevice *dev, int cfgfd)
+virPCIDeviceDetectPowerManagementReset(virPCIDevice *dev)
{
if (dev->pci_pm_cap_pos) {
uint32_t ctl;
/* require the NO_SOFT_RESET bit is clear */
- ctl = virPCIDeviceRead32(dev, cfgfd, dev->pci_pm_cap_pos + PCI_PM_CTRL);
+ pci_device_cfg_read_u32(dev->device, &ctl, dev->pci_pm_cap_pos + PCI_PM_CTRL);
if (!(ctl & PCI_PM_CTRL_NO_SOFT_RESET)) {
VIR_DEBUG("%s %s: detected PM reset capability", dev->id, dev->name);
return true;
@@ -811,26 +673,22 @@ virPCIDeviceIsParent(virPCIDevice *dev, virPCIDevice *check, void *data)
uint8_t header_type, secondary, subordinate;
virPCIDevice **best = data;
int ret = 0;
- int fd;
if (dev->address.domain != check->address.domain)
return 0;
- if ((fd = virPCIDeviceConfigOpenTry(check)) < 0)
- return 0;
-
/* Is it a bridge? */
ret = virPCIDeviceReadClass(check, &device_class);
if (ret < 0 || device_class != PCI_CLASS_BRIDGE_PCI)
- goto cleanup;
+ return ret;
/* Is it a plane? */
- header_type = virPCIDeviceRead8(check, fd, PCI_HEADER_TYPE);
+ pci_device_cfg_read_u8(check->device, &header_type, PCI_HEADER_TYPE);
if ((header_type & PCI_HEADER_TYPE_MASK) != PCI_HEADER_TYPE_BRIDGE)
- goto cleanup;
+ return ret;
- secondary = virPCIDeviceRead8(check, fd, PCI_SECONDARY_BUS);
- subordinate = virPCIDeviceRead8(check, fd, PCI_SUBORDINATE_BUS);
+ pci_device_cfg_read_u8(check->device, &secondary, PCI_SECONDARY_BUS);
+ pci_device_cfg_read_u8(check->device, &subordinate, PCI_SUBORDINATE_BUS);
VIR_DEBUG("%s %s: found parent device %s", dev->id, dev->name, check->name);
@@ -838,8 +696,7 @@ virPCIDeviceIsParent(virPCIDevice *dev, virPCIDevice *check, void *data)
* the direct parent. No further work is necessary
*/
if (dev->address.bus == secondary) {
- ret = 1;
- goto cleanup;
+ return 1;
}
/* otherwise, SRIOV allows VFs to be on different buses than their PFs.
@@ -850,35 +707,26 @@ virPCIDeviceIsParent(virPCIDevice *dev, virPCIDevice *check, void *data)
if (*best == NULL) {
*best = virPCIDeviceNew(&check->address);
if (*best == NULL) {
- ret = -1;
- goto cleanup;
+ return -1;
}
} else {
/* OK, we had already recorded a previous "best" match for the
* parent. See if the current device is more restrictive than the
* best, and if so, make it the new best
*/
- int bestfd;
uint8_t best_secondary;
- if ((bestfd = virPCIDeviceConfigOpenTry(*best)) < 0)
- goto cleanup;
- best_secondary = virPCIDeviceRead8(*best, bestfd, PCI_SECONDARY_BUS);
- virPCIDeviceConfigClose(*best, bestfd);
+ pci_device_cfg_read_u8((*best)->device, &best_secondary, PCI_SECONDARY_BUS);
if (secondary > best_secondary) {
virPCIDeviceFree(*best);
*best = virPCIDeviceNew(&check->address);
if (*best == NULL) {
- ret = -1;
- goto cleanup;
+ return -1;
}
}
}
}
-
- cleanup:
- virPCIDeviceConfigClose(check, fd);
return ret;
}
@@ -902,15 +750,14 @@ virPCIDeviceGetParent(virPCIDevice *dev, virPCIDevice **parent)
*/
static int
virPCIDeviceTrySecondaryBusReset(virPCIDevice *dev,
- int cfgfd,
virPCIDeviceList *inactiveDevs)
{
g_autoptr(virPCIDevice) parent = NULL;
g_autoptr(virPCIDevice) conflict = NULL;
uint8_t config_space[PCI_CONF_LEN];
uint16_t ctl;
- int ret = -1;
- int parentfd;
+ pciaddr_t bytes_read;
+ pciaddr_t bytes_written;
/* Refuse to do a secondary bus reset if there are other
* devices/functions behind the bus are used by the host
@@ -932,8 +779,6 @@ virPCIDeviceTrySecondaryBusReset(virPCIDevice *dev,
dev->name);
return -1;
}
- if ((parentfd = virPCIDeviceConfigOpenWrite(parent)) < 0)
- goto out;
VIR_DEBUG("%s %s: doing a secondary bus reset", dev->id, dev->name);
@@ -941,38 +786,37 @@ virPCIDeviceTrySecondaryBusReset(virPCIDevice *dev,
* for the supplied device since we refuse to do a reset if there
* are multiple devices/functions
*/
- if (virPCIDeviceRead(dev, cfgfd, 0, config_space, PCI_CONF_LEN) < 0) {
+ pci_device_cfg_read(dev->device, config_space, 0, PCI_CONF_LEN, &bytes_read);
+ if (bytes_read < PCI_CONF_LEN) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("Failed to read PCI config space for %1$s"),
dev->name);
- goto out;
+ return -1;
}
/* Read the control register, set the reset flag, wait 200ms,
* unset the reset flag and wait 200ms.
*/
- ctl = virPCIDeviceRead16(dev, parentfd, PCI_BRIDGE_CONTROL);
- virPCIDeviceWrite16(parent, parentfd, PCI_BRIDGE_CONTROL,
- ctl | PCI_BRIDGE_CTL_RESET);
+ pci_device_cfg_read_u16(parent->device, &ctl, PCI_BRIDGE_CONTROL);
+
+ pci_device_cfg_write_u16(parent->device, ctl | PCI_BRIDGE_CTL_RESET, PCI_BRIDGE_CONTROL);
g_usleep(200 * 1000); /* sleep 200ms */
- virPCIDeviceWrite16(parent, parentfd, PCI_BRIDGE_CONTROL, ctl);
+ pci_device_cfg_write_u16(parent->device, ctl, PCI_BRIDGE_CONTROL);
g_usleep(200 * 1000); /* sleep 200ms */
- if (virPCIDeviceWrite(dev, cfgfd, 0, config_space, PCI_CONF_LEN) < 0) {
+ pci_device_cfg_write(dev->device, config_space, 0, PCI_CONF_LEN, &bytes_written);
+ if (bytes_written < PCI_CONF_LEN) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("Failed to restore PCI config space for %1$s"),
dev->name);
- goto out;
+ return -1;
}
- ret = 0;
- out:
- virPCIDeviceConfigClose(parent, parentfd);
- return ret;
+ return 0;
}
/* Power management reset attempts to reset a device using a
@@ -980,16 +824,19 @@ virPCIDeviceTrySecondaryBusReset(virPCIDevice *dev,
* above we require the device supports a full internal reset.
*/
static int
-virPCIDeviceTryPowerManagementReset(virPCIDevice *dev, int cfgfd)
+virPCIDeviceTryPowerManagementReset(virPCIDevice *dev)
{
uint8_t config_space[PCI_CONF_LEN];
uint32_t ctl;
+ pciaddr_t bytes_read;
+ pciaddr_t bytes_written;
if (!dev->pci_pm_cap_pos)
return -1;
/* Save and restore the device's config space. */
- if (virPCIDeviceRead(dev, cfgfd, 0, &config_space[0], PCI_CONF_LEN) < 0) {
+ pci_device_cfg_read(dev->device, config_space, 0, PCI_CONF_LEN, &bytes_read);
+ if (bytes_read < PCI_CONF_LEN) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("Failed to read PCI config space for %1$s"),
dev->name);
@@ -998,20 +845,19 @@ virPCIDeviceTryPowerManagementReset(virPCIDevice *dev, int cfgfd)
VIR_DEBUG("%s %s: doing a power management reset", dev->id, dev->name);
- ctl = virPCIDeviceRead32(dev, cfgfd, dev->pci_pm_cap_pos + PCI_PM_CTRL);
+ pci_device_cfg_read_u32(dev->device, &ctl, dev->pci_pm_cap_pos + PCI_PM_CTRL);
ctl &= ~PCI_PM_CTRL_STATE_MASK;
- virPCIDeviceWrite32(dev, cfgfd, dev->pci_pm_cap_pos + PCI_PM_CTRL,
- ctl | PCI_PM_CTRL_STATE_D3hot);
+ pci_device_cfg_write_u32(dev->device, ctl | PCI_PM_CTRL_STATE_D3hot, dev->pci_pm_cap_pos + PCI_PM_CTRL);
g_usleep(10 * 1000); /* sleep 10ms */
- virPCIDeviceWrite32(dev, cfgfd, dev->pci_pm_cap_pos + PCI_PM_CTRL,
- ctl | PCI_PM_CTRL_STATE_D0);
+ pci_device_cfg_write_u32(dev->device, ctl | PCI_PM_CTRL_STATE_D0, dev->pci_pm_cap_pos + PCI_PM_CTRL);
g_usleep(10 * 1000); /* sleep 10ms */
- if (virPCIDeviceWrite(dev, cfgfd, 0, &config_space[0], PCI_CONF_LEN) < 0) {
+ pci_device_cfg_write(dev->device, config_space, 0, PCI_CONF_LEN, &bytes_written);
+ if (bytes_written < PCI_CONF_LEN) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("Failed to restore PCI config space for %1$s"),
dev->name);
@@ -1046,10 +892,10 @@ virPCIDeviceTryPowerManagementReset(virPCIDevice *dev, int cfgfd)
* Always returns success (0) (for now)
*/
static int
-virPCIDeviceInit(virPCIDevice *dev, int cfgfd)
+virPCIDeviceInit(virPCIDevice *dev)
{
dev->is_pcie = false;
- if (virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_EXP, &dev->pcie_cap_pos) < 0) {
+ if (virPCIDeviceFindCapabilityOffset(dev, PCI_CAP_ID_EXP, &dev->pcie_cap_pos) < 0) {
/* an unprivileged process is unable to read *all* of a
* device's PCI config (it can only read the first 64
* bytes, which isn't enough for see the Express
@@ -1065,18 +911,13 @@ virPCIDeviceInit(virPCIDevice *dev, int cfgfd)
* -1), then we blindly assume the most likely outcome -
* PCIe.
*/
- off_t configLen = virFileLength(virPCIDeviceGetConfigPath(dev), -1);
-
- if (configLen != 256)
- dev->is_pcie = true;
-
} else {
dev->is_pcie = (dev->pcie_cap_pos != 0);
}
- virPCIDeviceFindCapabilityOffset(dev, cfgfd, PCI_CAP_ID_PM, &dev->pci_pm_cap_pos);
- dev->has_flr = virPCIDeviceDetectFunctionLevelReset(dev, cfgfd);
- dev->has_pm_reset = virPCIDeviceDetectPowerManagementReset(dev, cfgfd);
+ virPCIDeviceFindCapabilityOffset(dev, PCI_CAP_ID_PM, &dev->pci_pm_cap_pos);
+ dev->has_flr = virPCIDeviceDetectFunctionLevelReset(dev);
+ dev->has_pm_reset = virPCIDeviceDetectPowerManagementReset(dev);
return 0;
}
@@ -1089,7 +930,6 @@ virPCIDeviceReset(virPCIDevice *dev,
g_autofree char *drvName = NULL;
virPCIStubDriver drvType;
int ret = -1;
- int fd = -1;
int hdrType = -1;
if (virPCIGetHeaderType(dev, &hdrType) < 0)
@@ -1114,29 +954,26 @@ virPCIDeviceReset(virPCIDevice *dev,
* be redundant.
*/
if (virPCIDeviceGetCurrentDriverNameAndType(dev, &drvName, &drvType) < 0)
- goto cleanup;
+ return -1;
if (drvType == VIR_PCI_STUB_DRIVER_VFIO) {
VIR_DEBUG("Device %s is bound to %s - skip reset", dev->name, drvName);
ret = 0;
- goto cleanup;
+ return 0;
}
VIR_DEBUG("Resetting device %s", dev->name);
- if ((fd = virPCIDeviceConfigOpenWrite(dev)) < 0)
- goto cleanup;
-
- if (virPCIDeviceInit(dev, fd) < 0)
- goto cleanup;
+ if (virPCIDeviceInit(dev) < 0)
+ return -1;
/* KVM will perform FLR when starting and stopping
* a guest, so there is no need for us to do it here.
*/
if (dev->has_flr) {
ret = 0;
- goto cleanup;
+ return 0;
}
/* If the device supports PCI power management reset,
@@ -1144,11 +981,11 @@ virPCIDeviceReset(virPCIDevice *dev,
* the function, not the whole device.
*/
if (dev->has_pm_reset)
- ret = virPCIDeviceTryPowerManagementReset(dev, fd);
+ ret = virPCIDeviceTryPowerManagementReset(dev);
/* Bus reset is not an option with the root bus */
if (ret < 0 && dev->address.bus != 0)
- ret = virPCIDeviceTrySecondaryBusReset(dev, fd, inactiveDevs);
+ ret = virPCIDeviceTrySecondaryBusReset(dev, inactiveDevs);
if (ret < 0) {
virErrorPtr err = virGetLastError();
@@ -1159,8 +996,6 @@ virPCIDeviceReset(virPCIDevice *dev,
_("no FLR, PM reset or bus reset available"));
}
- cleanup:
- virPCIDeviceConfigClose(dev, fd);
return ret;
}
@@ -1756,28 +1591,6 @@ virPCIDeviceReattach(virPCIDevice *dev,
return 0;
}
-static char *
-virPCIDeviceReadID(virPCIDevice *dev, const char *id_name)
-{
- g_autofree char *path = NULL;
- g_autofree char *id_str = NULL;
-
- path = virPCIFile(dev->name, id_name);
-
- /* ID string is '0xNNNN\n' ... i.e. 7 bytes */
- if (virFileReadAll(path, 7, &id_str) < 0)
- return NULL;
-
- /* Check for 0x suffix */
- if (id_str[0] != '0' || id_str[1] != 'x')
- return NULL;
-
- /* Chop off the newline; we know the string is 7 bytes */
- id_str[6] = '\0';
-
- return g_steal_pointer(&id_str);
-}
-
bool
virPCIDeviceAddressIsValid(virPCIDeviceAddress *addr,
bool report)
@@ -1865,9 +1678,9 @@ virPCIDeviceExists(const virPCIDeviceAddress *addr)
virPCIDevice *
virPCIDeviceNew(const virPCIDeviceAddress *address)
{
+ struct pci_device * device;
+
g_autoptr(virPCIDevice) dev = NULL;
- g_autofree char *vendor = NULL;
- g_autofree char *product = NULL;
dev = g_new0(virPCIDevice, 1);
@@ -1875,31 +1688,21 @@ virPCIDeviceNew(const virPCIDeviceAddress *address)
dev->name = virPCIDeviceAddressAsString(&dev->address);
- dev->path = g_strdup_printf(PCI_SYSFS "devices/%s/config", dev->name);
-
- if (!virFileExists(dev->path)) {
- virReportSystemError(errno,
- _("Device %1$s not found: could not access %2$s"),
- dev->name, dev->path);
+ pci_system_init();
+ device = pci_device_find_by_slot(address->domain, address->bus, address->slot, address->function);
+ if (!device)
return NULL;
- }
-
- vendor = virPCIDeviceReadID(dev, "vendor");
- product = virPCIDeviceReadID(dev, "device");
-
- if (!vendor || !product) {
- virReportError(VIR_ERR_INTERNAL_ERROR,
- _("Failed to read product/vendor ID for %1$s"),
- dev->name);
+ dev->device = g_memdup(device, sizeof(*dev->device));
+ if (!dev->device) {
+ virReportSystemError(errno,
+ _("Not found device domain: %1$d, bus: %2$d, slot: %3$d, function: %4$d"),
+ address->domain, address->bus, address->slot, address->function);
return NULL;
}
-
- /* strings contain '0x' prefix */
- if (g_snprintf(dev->id, sizeof(dev->id), "%s %s", &vendor[2],
- &product[2]) >= sizeof(dev->id)) {
+ if (g_snprintf(dev->id, sizeof(dev->id), "%x %x", dev->device->vendor_id, dev->device->device_id) >= sizeof(dev->id)) {
virReportError(VIR_ERR_INTERNAL_ERROR,
- _("dev->id buffer overflow: %1$s %2$s"),
- &vendor[2], &product[2]);
+ _("dev->id buffer overflow: %1$d %2$d"),
+ dev->device->vendor_id, dev->device->device_id);
return NULL;
}
@@ -1918,10 +1721,10 @@ virPCIDeviceCopy(virPCIDevice *dev)
/* shallow copy to take care of most attributes */
*copy = *dev;
- copy->path = NULL;
- copy->used_by_drvname = copy->used_by_domname = NULL;
+ copy->device = NULL;
+ copy->device = g_memdup(dev->device, sizeof(*dev->device));
+ copy->name = copy->used_by_drvname = copy->used_by_domname = copy->stubDriverName = NULL;
copy->name = g_strdup(dev->name);
- copy->path = g_strdup(dev->path);
copy->used_by_drvname = g_strdup(dev->used_by_drvname);
copy->used_by_domname = g_strdup(dev->used_by_domname);
copy->stubDriverName = g_strdup(dev->stubDriverName);
@@ -1936,10 +1739,11 @@ virPCIDeviceFree(virPCIDevice *dev)
return;
VIR_DEBUG("%s %s: freeing", dev->id, dev->name);
g_free(dev->name);
- g_free(dev->path);
g_free(dev->used_by_drvname);
g_free(dev->used_by_domname);
g_free(dev->stubDriverName);
+ if (dev->device)
+ g_free(dev->device);
g_free(dev);
}
@@ -1971,9 +1775,9 @@ virPCIDeviceGetName(virPCIDevice *dev)
* config file.
*/
const char *
-virPCIDeviceGetConfigPath(virPCIDevice *dev)
+virPCIDeviceGetConfigPath(virPCIDevice *dev G_GNUC_UNUSED)
{
- return dev->path;
+ return NULL;
}
void virPCIDeviceSetManaged(virPCIDevice *dev, bool managed)
@@ -2484,47 +2288,37 @@ virPCIDeviceDownstreamLacksACS(virPCIDevice *dev)
uint16_t flags;
uint16_t ctrl;
unsigned int pos;
- int fd;
- int ret = 0;
uint16_t device_class;
- if ((fd = virPCIDeviceConfigOpen(dev)) < 0)
+ if (virPCIDeviceInit(dev) < 0) {
return -1;
-
- if (virPCIDeviceInit(dev, fd) < 0) {
- ret = -1;
- goto cleanup;
}
if (virPCIDeviceReadClass(dev, &device_class) < 0)
- goto cleanup;
+ return 0;
pos = dev->pcie_cap_pos;
if (!pos || device_class != PCI_CLASS_BRIDGE_PCI)
- goto cleanup;
+ return 0;
- flags = virPCIDeviceRead16(dev, fd, pos + PCI_EXP_FLAGS);
+ pci_device_cfg_read_u16(dev->device, &flags, pos + PCI_EXP_FLAGS);
if (((flags & PCI_EXP_FLAGS_TYPE) >> 4) != PCI_EXP_TYPE_DOWNSTREAM)
- goto cleanup;
+ return 0;
- pos = virPCIDeviceFindExtendedCapabilityOffset(dev, fd, PCI_EXT_CAP_ID_ACS);
+ pos = virPCIDeviceFindExtendedCapabilityOffset(dev, PCI_EXT_CAP_ID_ACS);
if (!pos) {
VIR_DEBUG("%s %s: downstream port lacks ACS", dev->id, dev->name);
- ret = 1;
- goto cleanup;
+ return 1;
}
- ctrl = virPCIDeviceRead16(dev, fd, pos + PCI_EXT_ACS_CTRL);
+ pci_device_cfg_read_u16(dev->device, &ctrl, pos + PCI_EXT_ACS_CTRL);
if ((ctrl & PCI_EXT_CAP_ACS_ENABLED) != PCI_EXT_CAP_ACS_ENABLED) {
VIR_DEBUG("%s %s: downstream port has ACS disabled",
dev->id, dev->name);
- ret = 1;
- goto cleanup;
+ return 1;
}
- cleanup:
- virPCIDeviceConfigClose(dev, fd);
- return ret;
+ return 0;
}
static int
@@ -3189,48 +2983,27 @@ virPCIDeviceGetVPD(virPCIDevice *dev G_GNUC_UNUSED)
int
virPCIDeviceIsPCIExpress(virPCIDevice *dev)
{
- int fd;
- int ret = -1;
-
- if ((fd = virPCIDeviceConfigOpen(dev)) < 0)
- return ret;
-
- if (virPCIDeviceInit(dev, fd) < 0)
- goto cleanup;
-
- ret = dev->is_pcie;
+ if (virPCIDeviceInit(dev) < 0)
+ return -1;
- cleanup:
- virPCIDeviceConfigClose(dev, fd);
- return ret;
+ return dev->is_pcie;
}
int
virPCIDeviceHasPCIExpressLink(virPCIDevice *dev)
{
- int fd;
- int ret = -1;
uint16_t cap, type;
-
- if ((fd = virPCIDeviceConfigOpen(dev)) < 0)
- return ret;
-
- if (virPCIDeviceInit(dev, fd) < 0)
- goto cleanup;
+ if (virPCIDeviceInit(dev) < 0)
+ return -1;
if (dev->pcie_cap_pos == 0) {
- ret = 0;
- goto cleanup;
+ return 0;
}
- cap = virPCIDeviceRead16(dev, fd, dev->pcie_cap_pos + PCI_CAP_FLAGS);
+ pci_device_cfg_read_u16(dev->device, &cap, dev->pcie_cap_pos + PCI_CAP_FLAGS);
type = (cap & PCI_EXP_FLAGS_TYPE) >> 4;
- ret = type != PCI_EXP_TYPE_ROOT_INT_EP && type != PCI_EXP_TYPE_ROOT_EC;
-
- cleanup:
- virPCIDeviceConfigClose(dev, fd);
- return ret;
+ return type != PCI_EXP_TYPE_ROOT_INT_EP && type != PCI_EXP_TYPE_ROOT_EC;
}
int
@@ -3242,53 +3015,39 @@ virPCIDeviceGetLinkCapSta(virPCIDevice *dev,
unsigned int *sta_width)
{
uint32_t t;
- int fd;
- int ret = -1;
-
- if ((fd = virPCIDeviceConfigOpen(dev)) < 0)
- return ret;
-
- if (virPCIDeviceInit(dev, fd) < 0)
- goto cleanup;
+ uint16_t t16;
+ if (virPCIDeviceInit(dev) < 0)
+ return -1;
if (!dev->pcie_cap_pos) {
virReportError(VIR_ERR_INTERNAL_ERROR,
_("pci device %1$s is not a PCI-Express device"),
dev->name);
- goto cleanup;
+ return -1;
}
- t = virPCIDeviceRead32(dev, fd, dev->pcie_cap_pos + PCI_EXP_LNKCAP);
+ pci_device_cfg_read_u32(dev->device, &t, dev->pcie_cap_pos + PCI_EXP_LNKCAP);
*cap_port = t >> 24;
*cap_speed = t & PCI_EXP_LNKCAP_SPEED;
*cap_width = (t & PCI_EXP_LNKCAP_WIDTH) >> 4;
- t = virPCIDeviceRead16(dev, fd, dev->pcie_cap_pos + PCI_EXP_LNKSTA);
+ pci_device_cfg_read_u16(dev->device, &t16, dev->pcie_cap_pos + PCI_EXP_LNKSTA);
+ t = t16;
*sta_speed = t & PCI_EXP_LNKSTA_SPEED;
*sta_width = (t & PCI_EXP_LNKSTA_WIDTH) >> 4;
- ret = 0;
-
- cleanup:
- virPCIDeviceConfigClose(dev, fd);
- return ret;
+ return 0;
}
int virPCIGetHeaderType(virPCIDevice *dev, int *hdrType)
{
- int fd;
uint8_t type;
*hdrType = -1;
- if ((fd = virPCIDeviceConfigOpen(dev)) < 0)
- return -1;
-
- type = virPCIDeviceRead8(dev, fd, PCI_HEADER_TYPE);
-
- virPCIDeviceConfigClose(dev, fd);
+ pci_device_cfg_read_u8(dev->device, &type, PCI_HEADER_TYPE);
type &= PCI_HEADER_TYPE_MASK;
if (type >= VIR_PCI_HEADER_LAST) {
diff --git a/tests/virpcimock.c b/tests/virpcimock.c
index 5b923c63ce..36bb57edb0 100644
--- a/tests/virpcimock.c
+++ b/tests/virpcimock.c
@@ -1,5 +1,6 @@
/*
* Copyright (C) 2013 Red Hat, Inc.
+ * Copyright (C) 2024-2025 Future Crew, LLC
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -22,7 +23,7 @@
#include "virpcivpdpriv.h"
-#if defined(__linux__) || defined(__FreeBSD__) || defined(__APPLE__)
+#if defined(__linux__) || defined(__APPLE__)
# define VIR_MOCK_LOOKUP_MAIN
# include "virmock.h"
# include "virpci.h"
@@ -42,6 +43,10 @@ static int (*real___open_2)(const char *path, int flags);
static int (*real_close)(int fd);
static DIR * (*real_opendir)(const char *name);
static char *(*real_virFileCanonicalizePath)(const char *path);
+static int (*real_scandir)(const char *restrict dirp,
+ struct dirent ***restrict namelist,
+ typeof(int(const struct dirent *)) *filter,
+ typeof(int(const struct dirent **, const struct dirent **)) *compar);
static char *fakerootdir;
@@ -955,6 +960,7 @@ init_syms(void)
VIR_MOCK_REAL_INIT(opendir);
# endif
VIR_MOCK_REAL_INIT(virFileCanonicalizePath);
+ VIR_MOCK_REAL_INIT(scandir);
}
static void
@@ -1172,6 +1178,20 @@ virFileCanonicalizePath(const char *path)
return real_virFileCanonicalizePath(newpath);
}
+int scandir(const char *restrict dirp, struct dirent ***restrict namelist,
+ typeof(int(const struct dirent *)) *filter,
+ typeof(int(const struct dirent **, const struct dirent **)) *compar)
+{
+ g_autofree char *newpath = NULL;
+
+ init_syms();
+
+ if (getrealpath(&newpath, dirp) < 0)
+ return -1;
+
+ return real_scandir(newpath, namelist, filter, compar);
+}
+
# include "virmockstathelpers.c"
#else
--
2.46.1
8 hours, 56 minutes
[PATCH] bhyve: support interface type 'network'
by Roman Bogorodskiy
Add support for interface type 'network'. While bridge remains the only
supported options for networks in bhyve, supporting interface type
'network' allows easier configuration and makes domain XMLs more
compatible with the other drivers.
While here, update the error message for the unsupported interface type
to print the requested network type string instead of an integer to make
it more user-friendly.
Signed-off-by: Roman Bogorodskiy <bogorodskiy(a)gmail.com>
---
src/bhyve/bhyve_command.c | 44 ++++++++++++++++++++++++++++++++++-----
src/bhyve/bhyve_process.c | 30 +++++++++++++++++++-------
2 files changed, 62 insertions(+), 12 deletions(-)
diff --git a/src/bhyve/bhyve_command.c b/src/bhyve/bhyve_command.c
index bc287307c8..123d81699f 100644
--- a/src/bhyve/bhyve_command.c
+++ b/src/bhyve/bhyve_command.c
@@ -2,6 +2,7 @@
* bhyve_command.c: bhyve command generation
*
* Copyright (C) 2014 Roman Bogorodskiy
+ * Copyright (C) 2025 The FreeBSD Foundation
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -26,6 +27,7 @@
#include "bhyve_domain.h"
#include "bhyve_conf.h"
#include "bhyve_driver.h"
+#include "domain_validate.h"
#include "datatypes.h"
#include "viralloc.h"
#include "virfile.h"
@@ -40,7 +42,7 @@
VIR_LOG_INIT("bhyve.bhyve_command");
static int
-bhyveBuildNetArgStr(const virDomainDef *def,
+bhyveBuildNetArgStr(virDomainDef *def,
virDomainNetDef *net,
struct _bhyveConn *driver,
virCommand *cmd,
@@ -52,6 +54,7 @@ bhyveBuildNetArgStr(const virDomainDef *def,
char *nic_model = NULL;
int ret = -1;
virDomainNetType actualType = virDomainNetGetActualType(net);
+ g_autoptr(virConnect) netconn = NULL;
if (net->model == VIR_DOMAIN_NET_MODEL_VIRTIO) {
nic_model = g_strdup("virtio-net");
@@ -69,12 +72,43 @@ bhyveBuildNetArgStr(const virDomainDef *def,
return -1;
}
- if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) {
+ if (net->type == VIR_DOMAIN_NET_TYPE_NETWORK) {
+ if (!netconn && !(netconn = virGetConnectNetwork()))
+ goto cleanup;
+ if (virDomainNetAllocateActualDevice(netconn, def, net) < 0)
+ goto cleanup;
+ }
+ /* final validation now that actual type is known */
+ if (virDomainActualNetDefValidate(net) < 0)
+ return -1;
+
+ switch (actualType) {
+ case VIR_DOMAIN_NET_TYPE_NETWORK:
+ case VIR_DOMAIN_NET_TYPE_BRIDGE:
brname = g_strdup(virDomainNetGetActualBridgeName(net));
- } else {
+ if (!brname) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("No bridge name specified"));
+ goto cleanup;
+ }
+ break;
+ case VIR_DOMAIN_NET_TYPE_ETHERNET:
+ case VIR_DOMAIN_NET_TYPE_DIRECT:
+ case VIR_DOMAIN_NET_TYPE_USER:
+ case VIR_DOMAIN_NET_TYPE_VHOSTUSER:
+ case VIR_DOMAIN_NET_TYPE_SERVER:
+ case VIR_DOMAIN_NET_TYPE_CLIENT:
+ case VIR_DOMAIN_NET_TYPE_MCAST:
+ case VIR_DOMAIN_NET_TYPE_UDP:
+ case VIR_DOMAIN_NET_TYPE_INTERNAL:
+ case VIR_DOMAIN_NET_TYPE_HOSTDEV:
+ case VIR_DOMAIN_NET_TYPE_VDPA:
+ case VIR_DOMAIN_NET_TYPE_NULL:
+ case VIR_DOMAIN_NET_TYPE_VDS:
+ case VIR_DOMAIN_NET_TYPE_LAST:
virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
- _("Network type %1$d is not supported"),
- virDomainNetGetActualType(net));
+ _("Unsupported network type %1$s"),
+ virDomainNetTypeToString(actualType));
goto cleanup;
}
diff --git a/src/bhyve/bhyve_process.c b/src/bhyve/bhyve_process.c
index 3e6f678cf5..a17994e2a0 100644
--- a/src/bhyve/bhyve_process.c
+++ b/src/bhyve/bhyve_process.c
@@ -2,6 +2,7 @@
* bhyve_process.c: bhyve process management
*
* Copyright (C) 2014 Roman Bogorodskiy
+ * Copyright (C) 2025 The FreeBSD Foundation
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -71,18 +72,23 @@ static void
bhyveNetCleanup(virDomainObj *vm)
{
size_t i;
+ g_autoptr(virConnect) conn = NULL;
for (i = 0; i < vm->def->nnets; i++) {
virDomainNetDef *net = vm->def->nets[i];
virDomainNetType actualType = virDomainNetGetActualType(net);
- if (actualType == VIR_DOMAIN_NET_TYPE_BRIDGE) {
- if (net->ifname) {
- ignore_value(virNetDevBridgeRemovePort(
- virDomainNetGetActualBridgeName(net),
- net->ifname));
- ignore_value(virNetDevTapDelete(net->ifname, NULL));
- }
+ if (net->ifname) {
+ ignore_value(virNetDevBridgeRemovePort(
+ virDomainNetGetActualBridgeName(net),
+ net->ifname));
+ ignore_value(virNetDevTapDelete(net->ifname, NULL));
+ }
+ if (actualType == VIR_DOMAIN_NET_TYPE_NETWORK) {
+ if (conn || (conn = virGetConnectNetwork()))
+ virDomainNetReleaseActualDevice(conn, net);
+ else
+ VIR_WARN("Unable to release network device '%s'", NULLSTR(net->ifname));
}
}
}
@@ -437,6 +443,8 @@ virBhyveProcessReconnect(virDomainObj *vm,
char **proc_argv;
char *expected_proctitle = NULL;
bhyveDomainObjPrivate *priv = vm->privateData;
+ g_autoptr(virConnect) conn = NULL;
+ size_t i;
int ret = -1;
if (!virDomainObjIsActive(vm))
@@ -469,6 +477,14 @@ virBhyveProcessReconnect(virDomainObj *vm,
}
}
+ for (i = 0; i < vm->def->nnets; i++) {
+ virDomainNetDef *net = vm->def->nets[i];
+ if (net->type == VIR_DOMAIN_NET_TYPE_NETWORK && !conn)
+ conn = virGetConnectNetwork();
+
+ virDomainNetNotifyActualDevice(conn, vm->def, net);
+ }
+
cleanup:
if (ret < 0) {
/* If VM is reported to be in active state, but we cannot find
--
2.49.0
20 hours, 52 minutes
[libvirt PATCH] tools: virsh: metadata: do not report error on missing metadata
by Ján Tomko
Similarly to `desc` and `net-desc`, return an empty string if
there is no metadata to be returned.
https://issues.redhat.com/browse/RHEL-27172
Signed-off-by: Ján Tomko <jtomko(a)redhat.com>
---
tools/virsh-domain.c | 10 ++++++++--
tools/virsh-network.c | 10 ++++++++--
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index f3da2f903f..e104aa909a 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -8480,8 +8480,14 @@ cmdMetadata(vshControl *ctl, const vshCmd *cmd)
g_autofree char *data = NULL;
/* get */
if (!(data = virDomainGetMetadata(dom, VIR_DOMAIN_METADATA_ELEMENT,
- uri, flags)))
- return false;
+ uri, flags))) {
+ if (virGetLastErrorCode() == VIR_ERR_NO_DOMAIN_METADATA) {
+ virResetLastError();
+ data = g_strdup("");
+ } else {
+ return false;
+ }
+ }
vshPrint(ctl, "%s\n", data);
}
diff --git a/tools/virsh-network.c b/tools/virsh-network.c
index 6fcc7fd8ee..bcdb76ae36 100644
--- a/tools/virsh-network.c
+++ b/tools/virsh-network.c
@@ -604,8 +604,14 @@ cmdNetworkMetadata(vshControl *ctl, const vshCmd *cmd)
/* get */
if (!(data = virNetworkGetMetadata(net, VIR_NETWORK_METADATA_ELEMENT,
- uri, flags)))
- return false;
+ uri, flags))) {
+ if (virGetLastErrorCode() == VIR_ERR_NO_NETWORK_METADATA) {
+ virResetLastError();
+ data = g_strdup("");
+ } else {
+ return false;
+ }
+ }
vshPrint(ctl, "%s\n", data);
}
--
2.48.1
23 hours, 33 minutes
Re: [PATCH v8 55/55] docs: Add TDX documentation
by Daniel P. Berrangé
CC libvirt / Jiri, for confirmation about whether the CPUID restrictions
listed below will have any possible impact on libvirt CPUID handling...
On Tue, Apr 01, 2025 at 09:02:05AM -0400, Xiaoyao Li wrote:
> Add docs/system/i386/tdx.rst for TDX support, and add tdx in
> confidential-guest-support.rst
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li(a)intel.com>
> ---
> Changes in v6:
> - Add more information of "Feature configuration"
> - Mark TD Attestation as future work because KVM now drops the support
> of it.
>
> Changes in v5:
> - Add TD attestation section and update the QEMU parameter;
>
> Changes since v1:
> - Add prerequisite of private gmem;
> - update example command to launch TD;
>
> Changes since RFC v4:
> - add the restriction that kernel-irqchip must be split
> ---
> docs/system/confidential-guest-support.rst | 1 +
> docs/system/i386/tdx.rst | 156 +++++++++++++++++++++
> docs/system/target-i386.rst | 1 +
> 3 files changed, 158 insertions(+)
> create mode 100644 docs/system/i386/tdx.rst
> +Feature Configuration
> +---------------------
> +
> +Unlike non-TDX VM, the CPU features (enumerated by CPU or MSR) of a TD are not
> +under full control of VMM. VMM can only configure part of features of a TD on
> +``KVM_TDX_INIT_VM`` command of VM scope ``MEMORY_ENCRYPT_OP`` ioctl.
> +
> +The configurable features have three types:
> +
> +- Attributes:
> + - PKS (bit 30) controls whether Supervisor Protection Keys is exposed to TD,
> + which determines related CPUID bit and CR4 bit;
> + - PERFMON (bit 63) controls whether PMU is exposed to TD.
> +
> +- XSAVE related features (XFAM):
> + XFAM is a 64b mask, which has the same format as XCR0 or IA32_XSS MSR. It
> + determines the set of extended features available for use by the guest TD.
> +
> +- CPUID features:
> + Only some bits of some CPUID leaves are directly configurable by VMM.
> +
> +What features can be configured is reported via TDX capabilities.
> +
> +TDX capabilities
> +~~~~~~~~~~~~~~~~
> +
> +The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_CAPABILITIES``
> +to get the TDX capabilities from KVM. It returns a data structure of
> +``struct kvm_tdx_capabilities``, which tells the supported configuration of
> +attributes, XFAM and CPUIDs.
> +
> +TD attributes
> +~~~~~~~~~~~~~
> +
> +QEMU supports configuring raw 64-bit TD attributes directly via "attributes"
> +property of "tdx-guest" object. Note, it's users' responsibility to provide a
> +valid value because some bits may not supported by current QEMU or KVM yet.
> +
> +QEMU also supports the configuration of individual attribute bits that are
> +supported by it, via properties of "tdx-guest" object.
> +E.g., "sept-ve-disable" (bit 28).
> +
> +MSR based features
> +~~~~~~~~~~~~~~~~~~
> +
> +Current KVM doesn't support MSR based feature (e.g., MSR_IA32_ARCH_CAPABILITIES)
> +configuration for TDX, and it's a future work to enable it in QEMU when KVM adds
> +support of it.
> +
> +Feature check
> +~~~~~~~~~~~~~
> +
> +QEMU checks if the final (CPU) features, determined by given cpu model and
> +explicit feature adjustment of "+featureA/-featureB", can be supported or not.
> +It can produce feature not supported warning like
> +
> + "warning: host doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]"
> +
> +It can also produce warning like
> +
> + "warning: TDX forcibly sets the feature: CPUID.80000007H:EDX.invtsc [bit 8]"
> +
> +if the fixed-1 feature is requested to be disabled explicitly. This is newly
> +added to QEMU for TDX because TDX has fixed-1 features that are forcibly enabled
> +by TDX module and VMM cannot disable them.
This is where I'm wondering if libvirt has anything to be concerned
about. Possibly when libvirt queries the actual CPUID after launching
the guest it will just "do the right thing" ? Wondering if there's any
need for libvirt to be aware of CPUID restrictions before that point
though ?
> +
> +Launching a TD (TDX VM)
> +-----------------------
> +
> +To launch a TD, the necessary command line options are tdx-guest object and
> +split kernel-irqchip, as below:
> +
> +.. parsed-literal::
> +
> + |qemu_system_x86| \\
> + -object tdx-guest,id=tdx0 \\
> + -machine ...,kernel-irqchip=split,confidential-guest-support=tdx0 \\
> + -bios OVMF.fd \\
I don't think we need to show 'kernel-irqchip=split' now that we "do the
right thing" by default
This surely also ought to include '-accel kvm', as IIUC there's no
TCG support for TDX.
And presumably '-cpu host', since QEMU's default 'qemu64' CPU model
is likely a terrible match for what TDX will force set.
> +
> +Restrictions
> +------------
> +
> + - kernel-irqchip must be split;
Can append
"This is set by default for TDX guests if kernel-irqchip is left on
its default 'auto' setting."
> +
> + - No readonly support for private memory;
> +
> + - No SMM support: SMM support requires manipulating the guest register states
> + which is not allowed;
> +
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
1 day, 2 hours
[PATCH 0/4] Allow xml-configured coredump format on VM crash
by Nikolai Barybin
When libvirt processes VM crash event it always dumps core in raw
format.
This series makes it possible to configure dump format via domain xml.
This would be especcialy helpful for Windows guests, because it requires
a lot effort to convert raw dump into wingdb.
Nikolai Barybin (4):
conf: schemas: add coredump_format element to events section
src: conf: add parsing/formatting for 'coredump_format' value
qemu: use configurable dump format in doCoreDumpToAutoDumpPath()
docs: formatdomain: document 'coredump_format' element
docs/formatdomain.rst | 9 +++++
src/conf/domain_conf.c | 64 +++++++++++++++++++++++++++++++
src/conf/domain_conf.h | 2 +
src/conf/schemas/domaincommon.rng | 19 +++++++++
src/libvirt_private.syms | 2 +
src/qemu/qemu_driver.c | 2 +-
6 files changed, 97 insertions(+), 1 deletion(-)
--
2.43.5
1 day, 4 hours