[PATCH v2 0/4] multiple memory backend support for CPR Live Updates
by mgalaxy@akamai.com
From: Michael Galaxy <mgalaxy(a)akamai.com>
CPR-based support for whole-hypervisor kexec-based live updates is
now finally merged into QEMU. In support of this, we need NUMA to be
supported in these kinds of environments. To do this we use a technology
called PMEM (persistent memory) in Linux, which underpins the ability for
CPR Live Updates to work so that QEMU memory can remain in RAM and
be recovered after a kexec operation has completed. Our systems are highly
NUMA-aware, and so this patch series enables NUMA awareness for live updates.
Further, we make a small change that allows live migrations to work
between *non* PMEM-based systems and PMEM-based systems (and
vice-versa). This allows for seamless upgrades from non-live-compatible
systems to live-update-compatible systems without any downtime.
Michael Galaxy (4):
qemu.conf changes to support multiple memory backend
Support live migration between file-backed memory and anonymous
memory.
Update unit test to support multiple memory backends
Update documentation to reflect memory_backing_dir change in qemu.conf
NEWS.rst | 7 ++
docs/kbase/virtiofs.rst | 2 +
src/qemu/qemu.conf.in | 2 +
src/qemu/qemu_command.c | 8 ++-
src/qemu/qemu_conf.c | 141 +++++++++++++++++++++++++++++++++++-----
src/qemu/qemu_conf.h | 14 ++--
src/qemu/qemu_domain.c | 24 +++++--
src/qemu/qemu_driver.c | 29 +++++----
src/qemu/qemu_hotplug.c | 6 +-
src/qemu/qemu_process.c | 44 +++++++------
src/qemu/qemu_process.h | 7 +-
tests/testutilsqemu.c | 5 +-
12 files changed, 221 insertions(+), 68 deletions(-)
--
2.34.1
1 month
[PATCH rfcv4 00/13] LIBVIRT: X86: TDX support
by Zhenzhong Duan
Hi,
This series brings libvirt the x86 TDX support.
* What's TDX?
TDX stands for Trust Domain Extensions which isolates VMs from
the virtual-machine manager (VMM)/hypervisor and any other software on
the platform.
To support TDX, multiple software components, not only KVM but also QEMU,
guest Linux and virtual bios, need to be updated. For more details, please
check link[1].
This patchset is another software component to extend libvirt to support TDX,
with which one can start a TDX guest from high level rather than running qemu
directly.
* Misc
Because QEMU resets a guest through software emulation, which TDX guests do not
support for security reasons, we simulate a reboot for a TDX guest by killing it
and creating a new one within the FakeReboot framework.
Complete code can be found at [2], matching qemu code can be found at [3].
There is a 'debug' property for the tdx-guest object which isn't in the matching qemu[3]
yet. I keep it intentionally, as it will be implemented in qemu as an extension
series of [3].
* Test
start/stop/reboot with virsh
stop/reboot trigger in guest
stop with on_poweroff=destroy/restart
reboot with on_reboot=destroy/restart
* Patch organization
- patch 1-4: Support query of TDX capabilities.
- patch 5-8: Add TDX type to launchsecurity framework.
- patch 9-11: Add reboot support to TDX guest
- patch 12-13: Add test and docs
TODO:
- update QEMU capabilities data in tests, depending on qemu TDX merged beforehand
- add reconnect logic in virsh command
[1] https://lore.kernel.org/kvm/cover.1708933498.git.isaku.yamahata@intel.com
[2] https://github.com/intel/libvirt-tdx/commits/tdx_for_upstream_rfcv4
[3] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v5
Thanks
Zhenzhong
Changelog:
rfcv4:
- add a check to tools/virt-host-validate-qemu.c (Daniel)
- remove check of q35 (Daniel)
- model 'SocketAddress' QAPI in xml schema (Daniel)
- s/Quote-Generation-Service/quoteGenerationService/ (Daniel)
- define bits in tdx->policy and add validating logic (Daniel)
- presume QEMU choose split kernel irqchip for TDX guest by default (Daniel)
- utilize existing FakeReboot framework to do reboot for TDX guest (Daniel)
- drop patch11 'conf: Add support to keep same domid for hard reboot' (Daniel)
- add test in tests/ to validate parsing and formatting logic (Daniel)
- add doc in docs/formatdomain.rst (Daniel)
- add R-B
rfcv3:
- Change to generate qemu cmdline with -bios
- drop firmware auto match as -bios is used
- add a hard reboot method to reboot TDX guest
rfcv3: https://www.mail-archive.com/devel@lists.libvirt.org/msg00385.html
rfcv2:
- give up using qmp cmd and check TDX directly on host for TDX capabilities.
- use launchsecurity framework to support TDX
- use <os>.<loader> for general loader
- add auto firmware match feature for TDX
An example TDVF firmware description file 70-edk2-x86_64-tdx.json:
{
    "description": "UEFI firmware for x86_64, supporting Intel TDX",
    "interface-types": [
        "uefi"
    ],
    "mapping": {
        "device": "generic",
        "filename": "/usr/share/OVMF/OVMF_CODE-tdx.fd"
    },
    "targets": [
        {
            "architecture": "x86_64",
            "machines": [
                "pc-q35-*"
            ]
        }
    ],
    "features": [
        "intel-tdx",
        "verbose-dynamic"
    ],
    "tags": [
    ]
}
rfcv2: https://www.mail-archive.com/libvir-list@redhat.com/msg219378.html
Zhenzhong Duan (13):
tools: Secure guest check for Intel in virt-host-validate
qemu: Check if INTEL Trust Domain Extention support is enabled
qemu: Add TDX capability
conf: expose TDX feature in domain capabilities
conf: add tdx as launch security type
qemu: Add command line and validation for TDX type
qemu: force special parameters enabled for TDX guest
Add Intel TDX Quote Generation Service(QGS) support
qemu: add FakeReboot support for TDX guest
qemu: Support reboot command in guest
qemu: Avoid duplicate FakeReboot for secure guest
Add test cases for Intel TDX
docs: domain: Add documentation for Intel TDX guest
docs/formatdomain.rst | 68 ++++
docs/formatdomaincaps.rst | 1 +
src/conf/domain_capabilities.c | 1 +
src/conf/domain_capabilities.h | 1 +
src/conf/domain_conf.c | 312 ++++++++++++++++++
src/conf/domain_conf.h | 75 +++++
src/conf/schemas/domaincaps.rng | 9 +
src/conf/schemas/domaincommon.rng | 135 ++++++++
src/conf/virconftypes.h | 2 +
src/qemu/qemu_capabilities.c | 36 +-
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_command.c | 139 ++++++++
src/qemu/qemu_firmware.c | 1 +
src/qemu/qemu_monitor.c | 28 +-
src/qemu/qemu_monitor.h | 2 +-
src/qemu/qemu_monitor_json.c | 6 +-
src/qemu/qemu_namespace.c | 1 +
src/qemu/qemu_process.c | 75 +++++
src/qemu/qemu_validate.c | 44 +++
...unch-security-tdx-qgs-fd.x86_64-latest.xml | 77 +++++
.../launch-security-tdx-qgs-fd.xml | 30 ++
...ch-security-tdx-qgs-inet.x86_64-latest.xml | 77 +++++
.../launch-security-tdx-qgs-inet.xml | 30 ++
...ch-security-tdx-qgs-unix.x86_64-latest.xml | 77 +++++
.../launch-security-tdx-qgs-unix.xml | 30 ++
...h-security-tdx-qgs-vsock.x86_64-latest.xml | 77 +++++
.../launch-security-tdx-qgs-vsock.xml | 30 ++
tests/qemuxmlconftest.c | 24 ++
tools/virt-host-validate-common.c | 22 +-
tools/virt-host-validate-common.h | 1 +
30 files changed, 1407 insertions(+), 5 deletions(-)
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-fd.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-fd.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-inet.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-inet.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-unix.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-unix.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-vsock.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-tdx-qgs-vsock.xml
--
2.34.1
1 month
Re: [PATCH] Check snapshot disk is not NULL when searching it in the VM config
by Peter Krempa
On Mon, May 20, 2024 at 14:48:47 +0000, Efim Shevrin via Devel wrote:
> Hello,
>
> > If vmdisk is NULL, shouldn't this function (qemuSnapshotDeleteValidate()) return an error?
>
> I think this qemuSnapshotDeleteValidate should not return an error.
>
> It seems to me that when vmdisk is NULL, this does not invalidate
> the snapshot itself, but indicates that the config has changed since
> the snapshot was done. And if the VM config has changed, this adds evidence that the snapshot should be deleted,
> because the snapshot does not reflect the real vm config.
>
> Since we do not have an analogue of the --force option for deleting a snapshot, in the case when qemuSnapshotDeleteValidate returns
> an error when vmdisk is NULL, we will never delete a snapshot which has invalid disk.
Snapshot deletion does have something that can be considered force and
that is the '--metadata' option that removes just the snapshot
definition (metadata) and doesn't touch the disk images.
> > Similarly, disk can be NULL too
> Thank you for the comment regarding the disk variable. I`ve reworked patch.
>
> When creating a snapshot of a VM with multiple hard disks,
> the snapshot takes into account the presence of all disks
> in the system. If, over time, one of the disks is deleted,
> the snapshot will continue to store knowledge of the deleted disk.
> This results in the fact that at the moment of deleting the snapshot,
> at the validation stage, a disk from the snapshot will be searched which
> is not in the VM configuration. As a result, vmdisk variable will
> be equal to NULL. Dereferencing a null pointer at the time of calling
> virStorageSourceIsSameLocation(vmdisk->src, disk->src)
> will result in SIGSEGV.
Crashing is obviously not okay ...
> Also, the disk variable can also be equal to NULL and this
> requires to check that disk != NULL before calling the
> virStorageSourceIsSameLocation function to avoid SIGSEGV.
.. but going ahead with the snapshot deletion isn't always okay either.
The disk isn't referenced by the VM so the disk state can't be merged,
while the state would be merged for any other disk.
When reverting back to a previous snapshot, which is still referencing
the older state of the disk which was removed from the VM, the VM would
see that the image state of disks that were present at deletion would
contain the merged state, but only a partial state for the disk which
was later removed.
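
The two concerns raised in this thread (never dereference a NULL disk, but do not silently treat a missing disk as fine either) can be sketched as a three-way check. This is only an illustration under stated assumptions: StorageSource, DiskDef, DiskCheckResult and check_snapshot_disk() are hypothetical stand-ins, not libvirt's types or the code of the patch, and comparing paths with strcmp() merely stands in for virStorageSourceIsSameLocation().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real libvirt structures carry far more state. */
typedef struct { const char *path; } StorageSource;
typedef struct { StorageSource *src; } DiskDef;

typedef enum {
    DISK_CHECK_SAME,      /* both disks present and pointing at the same image */
    DISK_CHECK_DIFFERENT, /* both present, but the images differ */
    DISK_CHECK_MISSING    /* one side is absent; the caller decides what to do */
} DiskCheckResult;

/* Compare a snapshot disk with its VM counterpart without touching NULL. */
DiskCheckResult
check_snapshot_disk(const DiskDef *vmdisk, const DiskDef *disk)
{
    /* Guard every pointer before it is dereferenced. */
    if (!vmdisk || !disk || !vmdisk->src || !disk->src)
        return DISK_CHECK_MISSING;

    if (!vmdisk->src->path || !disk->src->path)
        return DISK_CHECK_MISSING;

    if (strcmp(vmdisk->src->path, disk->src->path) == 0)
        return DISK_CHECK_SAME;

    return DISK_CHECK_DIFFERENT;
}
```

Returning a distinct "missing" result, rather than folding it into success or failure, leaves the policy question discussed above (refuse the deletion, or allow only a metadata-only deletion) to the caller.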
2 months
[PATCH RFC v3 00/16] Support throttle block filters
by wucf@linux.ibm.com
From: Chun Feng Wu <wucf(a)linux.ibm.com>
Hi,
I am thinking of leveraging the "throttle block filter" in QEMU to support more flexible I/O limits (e.g. tiered I/O groups). One sample provided by the QEMU docs is:
https://github.com/qemu/qemu/blob/master/docs/throttle.txt
"For example, let's say that we have three different drives and we want to set I/O limits for
each one of them and an additional set of limits for the combined I/O of all three drives."
The implementation idea is to
- Define throttle groups(limit) in domain
- Define throttle filter to reference throttle group within disk
- Within a domain disk, throttle filters reference multiple throttle groups to form a filter chain that applies multiple limits in QEMU, as in the sample above
- Add new virsh cmds for throttle group management:
throttlegroupset Add or update a throttling group.
throttlegroupdel Delete a throttling group.
throttlegroupinfo Get a throttling group.
throttlegrouplist list all domain throttlegroups
- Update "attach-disk" to add one more option "--throttle-groups" to apply throttle filters e.g. "virsh attach-disk $VM_ID ${DISK_PATH}/vm1_disk_2.qcow2 vdd --driver qemu --subdriver qcow2 --targetbus virtio --throttle-groups limit2,limit012"
- I chose the above semantics because they seemed appropriate; if there are better ones, please suggest them.
Note, this implementation requires flag "QEMU_CAPS_OBJECT_JSON".
From QMP perspective, the sample flow works this way:
- Throttle group creation:
virsh qemu-monitor-command 1 '{"execute":"object-add", "arguments":{"qom-type":"throttle-group","id":"limit0","limits":{"iops-total":200,"iops-read":0,"iops-total-max":200,"iops-total-max-length":1}}}'
virsh qemu-monitor-command 1 '{"execute":"object-add", "arguments":{"qom-type":"throttle-group","id":"limit1","limits":{"iops-total":250,"iops-read":0,"iops-total-max":250,"iops-total-max-length":1}}}'
virsh qemu-monitor-command 1 '{"execute":"object-add", "arguments":{"qom-type":"throttle-group","id":"limit2","limits":{"iops-total":300,"iops-read":0,"iops-total-max":300,"iops-total-max-length":1}}}'
virsh qemu-monitor-command 1 '{"execute":"object-add", "arguments":{"qom-type":"throttle-group","id":"limit012","limits":{"iops-total":400,"iops-read":0,"iops-total-max":400,"iops-total-max-length":1}}}'
- Chain up filters during attaching disk to apply two filters(limit0 and limit012):
virsh qemu-monitor-command 1 '{"execute":"blockdev-add", "arguments": {"driver":"file","filename":"/virt/disks/vm1_disk_1.qcow2","node-name":"test-3-storage","auto-read-only":true,"discard":"unmap"}}'
virsh qemu-monitor-command 1 '{"execute":"blockdev-add", "arguments":{"node-name":"test-4-format","read-only":false,"driver":"qcow2","file":"test-3-storage","backing":null}}'
virsh qemu-monitor-command 1 '{"execute":"blockdev-add", "arguments":{"driver":"throttle","node-name":"libvirt-5-filter","throttle-group": "limit0","file":"test-4-format"}}'
virsh qemu-monitor-command 1 '{"execute":"blockdev-add", "arguments": {"driver":"throttle","node-name":"libvirt-6-filter","throttle-group":"limit012","file":"libvirt-5-filter"}}'
virsh qemu-monitor-command 1 '{"execute": "device_add", "arguments": {"driver":"virtio-blk-pci","scsi":false,"bus":"pci.0","addr":"0x5","drive":"libvirt-6-filter","id":"virtio-disk1"}}'
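
The chaining in the flow above (each throttle node names the node beneath it as "file", and the device uses the topmost node as "drive") can be sketched as follows. This is a minimal illustration, not code from the series: format_throttle_filter() and build_filter_chain() are hypothetical helpers, the "libvirt-N-filter" naming simply mirrors the sample commands, and no JSON escaping is done, so names are assumed to contain no quotes or backslashes.

```c
#include <stdio.h>
#include <string.h>

/* Format the blockdev-add arguments for one throttle filter node. */
int
format_throttle_filter(char *buf, size_t buflen,
                       const char *node_name,
                       const char *group,
                       const char *child_node)
{
    return snprintf(buf, buflen,
                    "{\"driver\":\"throttle\",\"node-name\":\"%s\","
                    "\"throttle-group\":\"%s\",\"file\":\"%s\"}",
                    node_name, group, child_node);
}

/*
 * Walk the groups in order; every new filter sits on top of the previous
 * node. On success the name of the topmost node is copied into 'top'.
 */
int
build_filter_chain(const char *format_node,
                   const char *const *groups, size_t ngroups,
                   unsigned int first_index,
                   char *top, size_t toplen)
{
    char child[64];
    char node[64];
    char json[256];
    size_t i;
    int len;

    snprintf(child, sizeof(child), "%s", format_node);

    for (i = 0; i < ngroups; i++) {
        snprintf(node, sizeof(node), "libvirt-%u-filter",
                 first_index + (unsigned int)i);

        len = format_throttle_filter(json, sizeof(json), node, groups[i], child);
        if (len < 0 || len >= (int)sizeof(json))
            return -1; /* output did not fit */

        puts(json); /* arguments for one blockdev-add call */
        snprintf(child, sizeof(child), "%s", node);
    }

    snprintf(top, toplen, "%s", child);
    return 0;
}
```

Called with the format node "test-4-format", the groups limit0 and limit012, and a first index of 5, this prints the arguments of the two throttle blockdev-add calls shown above and leaves "libvirt-6-filter" as the top node.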
This patchset includes:
- Throttle group XML schema definition in patch 1
- Throttle filter XML schema definition in patch 2
- Throttle group struct definition, parsing and formatting in patch 3
- Throttle filter struct definition, parsing and formatting in patch 4
- New QMP processing to update and get throttle group in patch 5&6
- New API definition and implementation in patch 7
- QEMU driver implementation in patch 8
- Hotplug processing for throttle filters in patch 9
- Extract common iotune validation in patch 10
- qemuProcessLaunch flow implementation for throttle group in patch 11
- qemuProcessLaunch flow implementation for throttle filter in patch 12
- Domain XML test for processing throttle groups and filters in patch 13
- Test new implemented driver in patch 14
- New virsh cmd implementation for group in patch 15
- Update Virsh cmd "attach_disk" to include throttle filters in patch 16
v3 changes:
- re-org commits by splitting changes containing throttle group and filters
- update commits msgs
- move schema commits to be the first ones
- refactor "diskIoTune" to extract common schema "iotune"
- add new tests for throttle groups and filters in qemuxmlconftest
- check flag "QEMU_CAPS_OBJECT_JSON" when preparing "-object"(qemu: command: Support throttle groups during qemuProcessLaunch ) or creating throttle group (qemu: Implement qemu driver for throttle API)
- when creating throttle group through "object-add" (qemu: Implement qemu driver for throttle API), reuse "qemuMonitorAddObject" to check if "objectAddNoWrap"("props") is required
- remove "virObject parent;" in "_virDomainThrottleFilterDef" in domain_conf.h
- remove "virDomainThrottleGroupIndexByName" in both domain_conf.h and domain_conf.c
- remove "virDomainThrottleFilterDefNew" in domain_conf.c
- update "virDomainThrottleFilterDefFree" to use "g_free" rather than "VIR_FREE" in domain_conf.c
- update "virDomainDiskThrottleFilterDefParse" to remove "xmlXPathContextPtr ctx" parameter and check NULL against "filter->group_name" in domain_conf.c
- use "virBufferEscapeString" instead of "virBufferAsprintf" in "virDomainDiskDefFormatThrottleFilterChain"
- use "group->val > 0" instead of "if (group->val)" in FORMAT_THROTTLE_GROUP
- remove NULL check for "group->group_name" since virBufferEscapeString checked NULL already in "virDomainThrottleGroupFormat"
- I haven't added new conf module (src/conf/virdomainthrottle.c/h) because "virDomainThrottleGroupDef" is alias of "_virDomainBlockIoTuneInfo", try to avoid circular dependency
- remove "NULLSTR" in qemuMonitorUpdateThrottleGroup and qemuMonitorGetThrottleGroup in qemu_monitor.c
- refactor "qemuMonitorMakeThrottleGroupLimits" to use virJSONValueObjectAdd in qemu_monitor_json.c
- use "g_strdup_printf" to avoid static buffers in "qemuMonitorJSONGetThrottleGroup"
- remove virReportError after qemuMonitorJSONGetReply in "qemuMonitorJSONGetThrottleGroup" to avoid overriding error
- remove "VIR_DOMAIN_THROTTLE_GROUP" in libvirt/include/libvirt/libvirt-domain.h
- update "virDomainGetThrottleGroup" to not first query the number of parameters,
- update "remote_domain_get_throttle_group_args" and "remote_domain_get_throttle_group_ret" to remove "nparams" in src/remote_protocol-structs, also updated "remote_domain_get_throttle_group_args" and "remote_domain_get_throttle_group_ret" in src/remote/remote_protocol.x
- update parameter "virTypedParameterPtr params" to be "virTypedParameterPtr *params" in "virDrvDomainGetThrottleGroup" in driver-hypervisor.h
- update "qemuDomainSetThrottleGroup" to not query number of parameters first
- remove wrapper "qemuDomainThrottleGroupByName" and "qemuDomainSetThrottleGroupDefaults"
- refactor "qemuDomainSetThrottleGroup" and "qemuDomainSetBlockIoTune" to use common logic
- update "qemuDomainDelThrottleGroup" to use VIR_JOB_MODIFY by referencing "qemuDomainHotplugDelIOThread"
- check if group is still being used by some filter(qemuDomainCheckThrottleGroupRef) during deletion
- replace "ThrottleFilterChain" with "ThrottleFilters"
- update "qemuDomainDiskGetTopNodename" to take top throttle node name if disk has throttles, and reuse "qemuDomainDiskGetBackendAlias" in "qemuBuildDiskDeviceProps" to get top node name as "drive"
- after enabling throttle filters, if a disk has throttle filters, the top node name during blockcommit is no longer "libvirt-x-format"; instead it references the top filter, e.g. "libvirt-x-filter"
- add check "cdrom device with throttle filters isn't supported"
- delete "filternodenameindex" and reuse "nodenameindex" to generate index for throttle nodes
- refactor detaching filters by adding "qemuBuildThrottleFiltersDetachPrepareBlockdev" to just build parameters for "blockdev-del"
- refactor "testDomainSetBlockIoTune" and "testDomainSetThrottleGroup" to use common logic
Any comments/suggestions will be appreciated!
Chun Feng Wu (16):
schema: Add new domain elements to support multiple throttle groups
schema: Add new domain elements to support multiple throttle filters
config: Introduce ThrottleGroup and corresponding XML parsing
config: Introduce ThrottleFilter and corresponding XML parsing
qemu: monitor: Add support for ThrottleGroup operations
tests: Test qemuMonitorJSONGetThrottleGroup and
qemuMonitorJSONUpdateThrottleGroup
remote: New APIs for ThrottleGroup lifecycle management
qemu: Implement qemu driver for throttle API
qemu: hotplug: Support hot attach and detach block disk along with
throttle filters
config: validate: Refactor disk iotune validation for reuse
qemu: command: Support throttle groups during qemuProcessLaunch
qemu: command: Support throttle filters during qemuProcessLaunch
qemuxmlconftest: Add 'throttlefilter' tests
test_driver: Test throttle group lifecycle APIs
virsh: Add support for throttle group operations
virsh: Add option "throttle-groups" to "attach_disk"
docs/formatdomain.rst | 48 ++
include/libvirt/libvirt-domain.h | 21 +
src/conf/domain_conf.c | 376 +++++++++++
src/conf/domain_conf.h | 51 ++
src/conf/domain_validate.c | 118 +++-
src/conf/schemas/domaincommon.rng | 293 +++++----
src/conf/virconftypes.h | 4 +
src/driver-hypervisor.h | 22 +
src/libvirt-domain.c | 196 ++++++
src/libvirt_private.syms | 9 +
src/libvirt_public.syms | 7 +
src/qemu/qemu_block.c | 131 ++++
src/qemu/qemu_block.h | 53 ++
src/qemu/qemu_command.c | 182 ++++++
src/qemu/qemu_command.h | 12 +
src/qemu/qemu_domain.c | 39 +-
src/qemu/qemu_domain.h | 8 +
src/qemu/qemu_driver.c | 617 +++++++++++++++---
src/qemu/qemu_hotplug.c | 33 +
src/qemu/qemu_monitor.c | 34 +
src/qemu/qemu_monitor.h | 14 +
src/qemu/qemu_monitor_json.c | 150 +++++
src/qemu/qemu_monitor_json.h | 14 +
src/remote/remote_daemon_dispatch.c | 44 ++
src/remote/remote_driver.c | 40 ++
src/remote/remote_protocol.x | 48 +-
src/remote_protocol-structs | 28 +
src/test/test_driver.c | 452 +++++++++----
tests/qemumonitorjsontest.c | 86 +++
.../throttlefilter.x86_64-latest.args | 43 ++
.../throttlefilter.x86_64-latest.xml | 65 ++
tests/qemuxmlconfdata/throttlefilter.xml | 55 ++
tests/qemuxmlconftest.c | 1 +
tools/virsh-completer-domain.c | 64 ++
tools/virsh-completer-domain.h | 5 +
tools/virsh-domain.c | 453 ++++++++++++-
36 files changed, 3447 insertions(+), 369 deletions(-)
create mode 100644 tests/qemuxmlconfdata/throttlefilter.x86_64-latest.args
create mode 100644 tests/qemuxmlconfdata/throttlefilter.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/throttlefilter.xml
--
2.34.1
3 months
[PATCH RFC 0/9] qemu: Support mapped-ram migration capability
by Jim Fehlig
This series is a RFC for support of QEMU's mapped-ram migration
capability [1] for saving and restoring VMs. It implements the first
part of the design approach we discussed for supporting parallel
save/restore [2]. In summary, the approach is
1. Add mapped-ram migration capability
2. Steal an element from save header 'unused' for a 'features' variable
and bump save version to 3.
3. Add /etc/libvirt/qemu.conf knob for the save format version,
defaulting to latest v3
4. Use v3 (aka mapped-ram) by default
5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
6. include: Define constants for parallel save/restore
7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
8. qemu: Add support for parallel restore. Implies mapped-ram.
Reject if v2
9. tools: add parallel parameter to virsh save command
10. tools: add parallel parameter to virsh restore command
This series implements 1-5, with the BYPASS_CACHE support in patches 8
and 9 being quite hacky. They are included to discuss approaches to make
them less hacky. See the patches for details.
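
Step 2 of the list above (reusing one 'unused' slot of the save header for a 'features' variable and bumping the version to 3) could look roughly like the sketch below. Everything here is an assumption made for illustration: the struct layouts, the field names other than 'unused' and 'features', and the SAVE_FEATURE_MAPPED_RAM bit are hypothetical and are not the definitions used by the patches.

```c
#include <stdint.h>

#define SAVE_VERSION_V2 2
#define SAVE_VERSION_V3 3

/* Hypothetical feature bit; the real flag values are defined by the series. */
#define SAVE_FEATURE_MAPPED_RAM (1u << 0)

/* Hypothetical v2 header: a block of reserved words at the end. */
typedef struct {
    char magic[16];
    uint32_t version;
    uint32_t data_len;
    uint32_t unused[16];
} SaveHeaderV2;

/* Hypothetical v3 header: the first reserved word becomes 'features'. */
typedef struct {
    char magic[16];
    uint32_t version;
    uint32_t data_len;
    uint32_t features;
    uint32_t unused[15];
} SaveHeaderV3;

/* Stealing a reserved slot must not change the on-disk header size. */
_Static_assert(sizeof(SaveHeaderV2) == sizeof(SaveHeaderV3),
               "header size must not change");

/* A reader can pick the restore path from the version and feature bits. */
int
image_uses_mapped_ram(const SaveHeaderV3 *hdr)
{
    return hdr->version >= SAVE_VERSION_V3 &&
           (hdr->features & SAVE_FEATURE_MAPPED_RAM) != 0;
}
```

Because 'features' occupies the offset of a formerly reserved word, a v2 image whose reserved words were written as zero would read back with no feature bits set; that zero-fill is itself an assumption of this sketch.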
The QEMU mapped-ram capability currently does not support directio.
Fabiano is working on that now [3]. This complicates merging support
in libvirt. I don't think it's reasonable to enable mapped-ram by
default when BYPASS_CACHE cannot be supported. Should we wait until
the mapped-ram directio support is merged in QEMU before supporting
mapped-ram in libvirt?
For the moment, compression is ignored in the new save version.
Currently, libvirt connects the output of QEMU's save stream to the
specified compression program via a pipe. This approach is incompatible
with mapped-ram since the fd provided to QEMU must be seekable. One
option is to reopen and compress the saved image after the actual save
operation has completed. This has the downside of requiring the iohelper
to handle BYPASS_CACHE, which would preclude us from removing it
sometime in the future. Other suggestions much welcomed.
Note the logical file size of mapped-ram saved images is slightly
larger than guest RAM size, so the files are often much larger than the
files produced by the existing, sequential format. However, the number of blocks
actually written to disk is often lower with mapped-ram saved images. E.g. a saved
image from a 30G, freshly booted, idle guest results in the following
'Size' and 'Blocks' values reported by stat(1)
                    Size     Blocks
sequential     998595770    1950392
mapped-ram   34368584225    1800456
With the same guest running a workload that dirties memory
                    Size     Blocks
sequential   33173330615   64791672
mapped-ram   34368578210   64706944
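
To compare the two columns directly, the block counts can be converted to bytes. The sketch below assumes stat(1) reports 'Blocks' in 512-byte units, which is the usual default on Linux but should be checked against %B on the system in question; allocated_bytes() and allocated_percent() are hypothetical helper names.

```c
#include <stdint.h>

/* Assumption: stat(1) counts 'Blocks' in 512-byte units. */
#define STAT_BLOCK_SIZE 512ULL

/* Bytes actually allocated on disk for a given block count. */
uint64_t
allocated_bytes(uint64_t blocks)
{
    return blocks * STAT_BLOCK_SIZE;
}

/* Share of the logical file size that is allocated, rounded down. */
unsigned int
allocated_percent(uint64_t size, uint64_t blocks)
{
    return (unsigned int)(allocated_bytes(blocks) * 100 / size);
}
```

Under that assumption the idle mapped-ram image allocates 921833472 bytes, about 2% of its 34368584225-byte logical size, while the idle sequential image allocates 998600704 bytes, essentially all of its logical size. With the memory-dirtying workload the mapped-ram image allocates about 96% of its logical size.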
Thanks for any comments on this RFC!
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/m...
[2] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/K...
[3] https://mail.gnu.org/archive/html/qemu-devel/2024-05/msg04432.html
Jim Fehlig (9):
qemu: Enable mapped-ram migration capability
qemu_fd: Add function to retrieve fdset ID
qemu: Add function to get migration params for save
qemu: Add a 'features' element to save image header and bump version
qemu: conf: Add setting for save image version
qemu: Add support for mapped-ram on save
qemu: Enable mapped-ram on restore
qemu: Support O_DIRECT with mapped-ram on save
qemu: Support O_DIRECT with mapped-ram on restore
src/qemu/libvirtd_qemu.aug | 1 +
src/qemu/qemu.conf.in | 6 +
src/qemu/qemu_conf.c | 8 ++
src/qemu/qemu_conf.h | 1 +
src/qemu/qemu_driver.c | 25 ++--
src/qemu/qemu_fd.c | 18 +++
src/qemu/qemu_fd.h | 3 +
src/qemu/qemu_migration.c | 99 ++++++++++++++-
src/qemu/qemu_migration.h | 11 +-
src/qemu/qemu_migration_params.c | 20 +++
src/qemu/qemu_migration_params.h | 4 +
src/qemu/qemu_monitor.c | 40 ++++++
src/qemu/qemu_monitor.h | 5 +
src/qemu/qemu_process.c | 63 +++++++---
src/qemu/qemu_process.h | 16 ++-
src/qemu/qemu_saveimage.c | 187 +++++++++++++++++++++++------
src/qemu/qemu_saveimage.h | 20 ++-
src/qemu/qemu_snapshot.c | 12 +-
src/qemu/test_libvirtd_qemu.aug.in | 1 +
19 files changed, 455 insertions(+), 85 deletions(-)
--
2.44.0
3 months, 1 week
[PATCH 00/12] Introduce SEV-SNP support
by Michal Privoznik
SEV-SNP support just landed in QEMU. Here is the first round of patches
to incorporate support into libvirt.
TODOs (aka problems of future me):
- Teach tools/virt-qemu-sev-validate how to deal with SEV-SNP
- Try to find a SEV-SNP machine and test these patches in the real world
- Write a kbase article on attestation with SEV-SNP
Michal Prívozník (12):
qemu_monitor_json: Report error in error paths in SEV related code
conf: Move some members of virDomainSEVDef into virDomainSEVCommonDef
conf: Separate SEV formatting into a function
Drop needless typecast to virDomainLaunchSecurity
src: Convert some _virDomainSecDef::sectype checks to switch()
qemu_monitor: Allow querying SEV-SNP state in 'query-sev'
qemu: Report snp-policy in virDomainGetLaunchSecurityInfo()
qemu_capabilities: Introduce QEMU_CAPS_SEV_SNP_GUEST
conf: Introduce SEV-SNP support
qemu: Build cmd line for SEV-SNP
qemu: Allow setting launch security for SEV-SNP
qemu_firmware: Pick the right firmware for SEV-SNP guests
docs/formatdomain.rst | 108 ++++++++++++
include/libvirt/libvirt-domain.h | 10 ++
src/conf/domain_conf.c | 156 ++++++++++++++----
src/conf/domain_conf.h | 28 +++-
src/conf/domain_validate.c | 44 +++++
src/conf/schemas/domaincommon.rng | 73 ++++++--
src/conf/virconftypes.h | 4 +
src/qemu/qemu_capabilities.c | 4 +
src/qemu/qemu_capabilities.h | 3 +
src/qemu/qemu_cgroup.c | 19 ++-
src/qemu/qemu_command.c | 56 ++++++-
src/qemu/qemu_driver.c | 60 +++++--
src/qemu/qemu_firmware.c | 20 ++-
src/qemu/qemu_monitor.c | 7 +-
src/qemu/qemu_monitor.h | 41 ++++-
src/qemu/qemu_monitor_json.c | 67 ++++++--
src/qemu/qemu_monitor_json.h | 8 +-
src/qemu/qemu_namespace.c | 3 +-
src/qemu/qemu_process.c | 34 ++--
src/qemu/qemu_validate.c | 13 +-
src/security/security_dac.c | 34 +++-
.../caps_9.1.0_x86_64.xml | 1 +
.../firmware/60-edk2-ovmf-x64-amdsev.json | 1 +
tests/qemumonitorjsontest.c | 65 +++++++-
...launch-security-sev-snp.x86_64-latest.args | 35 ++++
.../launch-security-sev-snp.x86_64-latest.xml | 1 +
.../launch-security-sev-snp.xml | 47 ++++++
tests/qemuxmlconftest.c | 2 +
28 files changed, 817 insertions(+), 127 deletions(-)
create mode 100644 tests/qemuxmlconfdata/launch-security-sev-snp.x86_64-latest.args
create mode 120000 tests/qemuxmlconfdata/launch-security-sev-snp.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/launch-security-sev-snp.xml
--
2.44.2
3 months, 2 weeks
[PATCH v2 0/7] introduce job-change qmp command
by Vladimir Sementsov-Ogievskiy
Hi all!
This is an updated first part of my "[RFC 00/15] block job API"
Supersedes: <20240313150907.623462-1-vsementsov(a)yandex-team.ru>
v2:
- only job-change for now, as a first step
- drop "type-based unions", and keep type parameter as is for now (I now
doubt that this was a good idea, as it makes the QAPI protocol dependent on
context)
03: improve documentation
06: deprecated only block-job-change for now
07: new
Vladimir Sementsov-Ogievskiy (7):
qapi: rename BlockJobChangeOptions to JobChangeOptions
blockjob: block_job_change_locked(): check job type
qapi: block-job-change: make copy-mode parameter optional
blockjob: move change action implementation to job from block-job
qapi: add job-change
qapi/block-core: derpecate block-job-change
iotests/mirror-change-copy-mode: switch to job-change command
block/mirror.c | 13 +++++---
blockdev.c | 4 +--
blockjob.c | 20 ------------
docs/about/deprecated.rst | 5 +++
include/block/blockjob.h | 11 -------
include/block/blockjob_int.h | 7 -----
include/qemu/job.h | 12 +++++++
job-qmp.c | 15 +++++++++
job.c | 23 ++++++++++++++
qapi/block-core.json | 31 ++++++++++++++-----
.../tests/mirror-change-copy-mode | 2 +-
11 files changed, 90 insertions(+), 53 deletions(-)
--
2.34.1
3 months, 3 weeks
[PATCH v2] virsh: Provide completer for some pool-X-as commands
by Abhiram Tilak
Provide completers for the auth-type and source-format options of the
virsh pool-create-as and pool-define-as commands. Use Empty completers
for options where completions are not required.
Related Issue: https://gitlab.com/libvirt/libvirt/-/issues/9
Signed-off-by: Abhiram Tilak <atp.exp(a)gmail.com>
---
Changes in v2:
- Fix all formatting errors
- Change some options using Empty completers, to use
LocalPath completers.
- Add completers for AdapterName and AdapterParent using information
from node devices.
src/libvirt_private.syms | 2 +
tools/virsh-completer-pool.c | 128 +++++++++++++++++++++++++++++++++++
tools/virsh-completer-pool.h | 20 ++++++
tools/virsh-pool.c | 9 +++
4 files changed, 159 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index f0f7aa8654..fcb0ef7afe 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1117,6 +1117,8 @@ virStorageAuthDefCopy;
virStorageAuthDefFormat;
virStorageAuthDefFree;
virStorageAuthDefParse;
+virStorageAuthTypeFromString;
+virStorageAuthTypeToString;
virStorageFileFeatureTypeFromString;
virStorageFileFeatureTypeToString;
virStorageFileFormatTypeFromString;
diff --git a/tools/virsh-completer-pool.c b/tools/virsh-completer-pool.c
index 3568bb985b..7db2a20347 100644
--- a/tools/virsh-completer-pool.c
+++ b/tools/virsh-completer-pool.c
@@ -23,6 +23,7 @@
#include "virsh-completer-pool.h"
#include "virsh-util.h"
#include "conf/storage_conf.h"
+#include "conf/node_device_conf.h"
#include "virsh-pool.h"
#include "virsh.h"
@@ -106,3 +107,130 @@ virshPoolTypeCompleter(vshControl *ctl,
return virshCommaStringListComplete(type_str, (const char **)tmp);
}
+
+
+char **
+virshPoolFormatCompleter(vshControl *ctl G_GNUC_UNUSED,
+ const vshCmd *cmd G_GNUC_UNUSED,
+ unsigned int flags)
+{
+ size_t i = 0;
+ size_t j = 0;
+ g_auto(GStrv) tmp = NULL;
+ size_t nformats = VIR_STORAGE_POOL_FS_LAST + VIR_STORAGE_POOL_NETFS_LAST +
+ VIR_STORAGE_POOL_DISK_LAST + VIR_STORAGE_POOL_LOGICAL_LAST;
+
+ virCheckFlags(0, NULL);
+
+ tmp = g_new0(char *, nformats + 1);
+
+ /* Club all PoolFormats for completion */
+ for (i = 0; i < VIR_STORAGE_POOL_FS_LAST; i++)
+ tmp[j++] = g_strdup(virStoragePoolFormatFileSystemTypeToString(i));
+
+ for (i = 0; i < VIR_STORAGE_POOL_NETFS_LAST; i++)
+ tmp[j++] = g_strdup(virStoragePoolFormatFileSystemNetTypeToString(i));
+
+ for (i = 1; i < VIR_STORAGE_POOL_DISK_LAST; i++)
+ tmp[j++] = g_strdup(virStoragePoolFormatDiskTypeToString(i));
+
+ for (i = 1; i < VIR_STORAGE_POOL_LOGICAL_LAST; i++)
+ tmp[j++] = g_strdup(virStoragePoolFormatLogicalTypeToString(i));
+
+ return g_steal_pointer(&tmp);
+}
+
+
+char **
+virshPoolAuthTypeCompleter(vshControl *ctl G_GNUC_UNUSED,
+ const vshCmd *cmd G_GNUC_UNUSED,
+ unsigned int flags)
+{
+ size_t i = 0;
+ g_auto(GStrv) tmp = NULL;
+
+ virCheckFlags(0, NULL);
+
+ tmp = g_new0(char *, VIR_STORAGE_AUTH_TYPE_LAST + 1);
+
+ for (i = 0; i < VIR_STORAGE_AUTH_TYPE_LAST; i++)
+ tmp[i] = g_strdup(virStorageAuthTypeToString(i));
+
+ return g_steal_pointer(&tmp);
+}
+
+
+char **
+virshAdapterNameCompleter(vshControl *ctl,
+                          const vshCmd *cmd G_GNUC_UNUSED,
+                          unsigned int flags)
+{
+    virshControl *priv = ctl->privData;
+    virNodeDevicePtr *devs = NULL;
+    int ndevs = 0;
+    size_t i = 0;
+    char **ret = NULL;
+    g_auto(GStrv) tmp = NULL;
+
+    virCheckFlags(0, NULL);
+
+    if (!priv->conn || virConnectIsAlive(priv->conn) <= 0)
+        return NULL;
+
+    flags |= VIR_CONNECT_LIST_NODE_DEVICES_CAP_SCSI_HOST;
+    if ((ndevs = virConnectListAllNodeDevices(priv->conn, &devs, flags)) < 0)
+        return NULL;
+
+    tmp = g_new0(char *, ndevs + 1);
+
+    for (i = 0; i < ndevs; i++) {
+        const char *name = virNodeDeviceGetName(devs[i]);
+
+        tmp[i] = g_strdup(name);
+    }
+
+    ret = g_steal_pointer(&tmp);
+
+    for (i = 0; i < ndevs; i++)
+        virshNodeDeviceFree(devs[i]);
+    g_free(devs);
+    return ret;
+}
+
+
+char **
+virshAdapterParentCompleter(vshControl *ctl,
+                            const vshCmd *cmd G_GNUC_UNUSED,
+                            unsigned int flags)
+{
+    virshControl *priv = ctl->privData;
+    virNodeDevicePtr *devs = NULL;
+    int ndevs = 0;
+    size_t i = 0;
+    char **ret = NULL;
+    g_auto(GStrv) tmp = NULL;
+
+    virCheckFlags(0, NULL);
+
+    if (!priv->conn || virConnectIsAlive(priv->conn) <= 0)
+        return NULL;
+
+    flags |= VIR_CONNECT_LIST_NODE_DEVICES_CAP_VPORTS;
+    if ((ndevs = virConnectListAllNodeDevices(priv->conn, &devs, flags)) < 0)
+        return NULL;
+
+    tmp = g_new0(char *, ndevs + 1);
+
+    for (i = 0; i < ndevs; i++) {
+        const char *name = virNodeDeviceGetName(devs[i]);
+
+        tmp[i] = g_strdup(name);
+    }
+
+    ret = g_steal_pointer(&tmp);
+
+    for (i = 0; i < ndevs; i++)
+        virshNodeDeviceFree(devs[i]);
+    g_free(devs);
+    return ret;
+}
diff --git a/tools/virsh-completer-pool.h b/tools/virsh-completer-pool.h
index bff3e5742b..eccc08a73f 100644
--- a/tools/virsh-completer-pool.h
+++ b/tools/virsh-completer-pool.h
@@ -40,3 +40,23 @@ char **
virshPoolTypeCompleter(vshControl *ctl,
const vshCmd *cmd,
unsigned int flags);
+
+char **
+virshPoolFormatCompleter(vshControl *ctl,
+ const vshCmd *cmd,
+ unsigned int flags);
+
+char **
+virshPoolAuthTypeCompleter(vshControl *ctl,
+ const vshCmd *cmd,
+ unsigned int flags);
+
+char **
+virshAdapterNameCompleter(vshControl *ctl,
+ const vshCmd *cmd,
+ unsigned int flags);
+
+char **
+virshAdapterParentCompleter(vshControl *ctl,
+ const vshCmd *cmd,
+ unsigned int flags);
diff --git a/tools/virsh-pool.c b/tools/virsh-pool.c
index f9aad8ded0..0cbd1417e6 100644
--- a/tools/virsh-pool.c
+++ b/tools/virsh-pool.c
@@ -80,31 +80,37 @@
{.name = "source-path", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompletePathLocalExisting, \
.help = N_("source path for underlying storage") \
}, \
{.name = "source-dev", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompletePathLocalExisting, \
.help = N_("source device for underlying storage") \
}, \
{.name = "source-name", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompleteEmpty, \
.help = N_("source name for underlying storage") \
}, \
{.name = "target", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompletePathLocalExisting, \
.help = N_("target for underlying storage") \
}, \
{.name = "source-format", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshPoolFormatCompleter, \
.help = N_("format for underlying storage") \
}, \
{.name = "auth-type", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshPoolAuthTypeCompleter, \
.help = N_("auth type to be used for underlying storage") \
}, \
{.name = "auth-username", \
@@ -126,6 +132,7 @@
{.name = "adapter-name", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshAdapterNameCompleter, \
.help = N_("adapter name to be used for underlying storage") \
}, \
{.name = "adapter-wwnn", \
@@ -141,6 +148,7 @@
{.name = "adapter-parent", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshAdapterParentCompleter, \
.help = N_("adapter parent scsi_hostN to be used for underlying vHBA storage") \
}, \
{.name = "adapter-parent-wwnn", \
@@ -161,6 +169,7 @@
{.name = "source-protocol-ver", \
.type = VSH_OT_STRING, \
.unwanted_positional = true, \
+ .completer = virshCompleteEmpty, \
.help = N_("nfsvers value for NFS pool mount option") \
}, \
{.name = "source-initiator", \
--
2.39.2
4 months
cpugroups for hyperv hypervisor
by Praveen K Paladugu
Hey folks,
My team is working on exposing `cpugroups` to Libvirt when using the
'hyperv' hypervisor with cloud-hypervisor (VMM). cpugroups are relevant
in a specific configuration of hyperv called 'minroot'. In the minroot
configuration, the hypervisor artificially restricts Dom0 to run on a
subset of CPUs (Logical Processors). The rest of the CPUs can be
assigned to guests.
cpugroups manage the CPUs assigned to guests and their scheduling
properties. Initially this looks similar to `cpuset` (in cgroups), but
the controls available with cpugroups don't map easily to those in
cgroups. For example:
* "IdleLPs" is the number of Logical Processors in a cpugroup that
should be reserved for a guest even if they are idle.
* "SchedulingPriority" is the priority (a value in the range 0..7) with
which to schedule CPUs in a cpugroup.
As controls like the above don't easily map to anything in cgroups,
using a driver-specific element in the domain XML to configure cpugroups
seems like the right approach. For example:
<ch:cpugroups>
<idle_lps value='4'/>
<scheduling_priority value='6'/>
</ch:cpugroups>
As cpugroups are only relevant when using the minroot configuration on
hyperv, I don't see any value in generalizing this setting. So, having
some "ch" driver-specific settings seems like a good approach to
implement this feature.
Question1: Do you see any concerns with this approach?
The cpugroup settings can be applied/modified using a sysfs interface or
using a cmdline tool on the host. I see Libvirt uses both of these
mechanisms for various use cases. But, given a choice, a sysfs-based
interface seems like the simpler approach to me. With a sysfs interface,
Libvirt does not have to take install-time dependencies on new tools.
Question2: Of "sysfs" vs "cmdline tool" which is preferred, given a choice?
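To illustrate why the sysfs route needs no extra tooling, here is a
rough sketch of writing one setting to a sysfs-style attribute file
using only POSIX calls. The helper name sysfsWriteUint is hypothetical,
and no actual cpugroups sysfs path is given in this mail, so the caller
must supply whatever attribute file the host exposes.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write an unsigned value, followed by a newline,
 * to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value on failure. */
int
sysfsWriteUint(const char *path, unsigned int value)
{
    char buf[32];
    int len;
    int fd;
    ssize_t written;
    int ret = 0;

    if (path == NULL)
        return -EINVAL;

    len = snprintf(buf, sizeof(buf), "%u\n", value);
    if (len < 0 || (size_t)len >= sizeof(buf))
        return -EINVAL;

    /* Attribute files already exist, so no O_CREAT here. */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    written = write(fd, buf, (size_t)len);
    if (written < 0)
        ret = -errno;
    else if (written != len)
        ret = -EIO; /* treat a short write as an error */

    if (close(fd) < 0 && ret == 0)
        ret = -errno;

    return ret;
}
```

A cmdline tool would instead require spawning a process and parsing its
exit status and output, plus a packaging dependency on that tool.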
Early feedback from the community will help us invest in the preferred
choices sooner rather than later. Thanks for your consideration.
References:
* https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/m...
--
Regards,
Praveen
4 months
[PATCH v2 0/1] Expose availability of SEV-ES
by Takashi Kajinami
This introduces the new "model" field in sev elements so that clients can
check whether SEV-ES, the 2nd generation of AMD SEV, is available in
the target hypervisor. The maxESGuests field already exists (along with
the maxGuests field), but it does not indicate whether SEV-ES is actually
enabled in KVM.
Changes since v1:
* Fixed one code path where available models are not added
* Fixed missing update of "report" flag
* Updated the documentation to explain the new model field in addition
to the existing but undocumented cpu0Id field
Takashi Kajinami (1):
Expose available AMD SEV models in domain capabilities
docs/formatdomaincaps.rst | 5 ++
src/conf/domain_capabilities.c | 2 +
src/conf/domain_capabilities.h | 1 +
src/conf/domain_conf.c | 7 +++
src/conf/domain_conf.h | 8 ++++
src/qemu/qemu_capabilities.c | 84 +++++++++++++++++++++++++---------
6 files changed, 85 insertions(+), 22 deletions(-)
--
2.43.0
4 months