[libvirt PATCH 00/28] native support for nftables in virtual network driver
by Laine Stump
This patch series enables libvirt to use nftables rules rather than
iptables *when setting up virtual networks* (it does *not* add
nftables support to the nwfilter driver). It accomplishes this by
abstracting several iptables functions (from viriptables.[ch] called
by the virtual network driver into a rudimentary "virNetfilter API"
(in virnetfilter.[ch], having the virtual network driver call the
virNetFilter API rather than calling the existing iptables functions
directly, and then finally adding an equivalent virNftables backend
that can be used instead of iptables (selected manually via a
network.conf setting, or automatically if iptables isn't found on the
host).
A first look at the result may have you thinking that it's filled with
a lot of bad decisions. While I would agree with that in many cases, I
think that overall they are the "least bad" decisions, or at least
"bad within acceptable limits / no worse than something else", and
point out that it's been done in a way that minimizes (actually
eliminates) the need for immediate changes to nwfilter (the other
consumer of iptables, which *also* needs to be updated to use native
nftables), and makes it much easier to change our mind about the
details in the future.
When I first started on this (long, protracted, repeatedly interrupted
for extended periods - many of these patches are > a year old) task, I
considered doing an all-at-once complete replacement of iptables with
nftables, since all the Linux distros we support have had nftables for
several years, and I'm pretty sure nobody has it disabled (not even
sure if it's possible to disable nftables while still enabling
iptables, since they both use xtables in the kernel). But due to
libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit
fd5b15ff all the way back in July 2010 for details) which has no
equivalent in nftables rules (and we don't *want* it to!!), and the
desire to be able to easily switch back to iptables in case of an
unforeseen regression, we decided that both iptables and nftables need
to be supported (for now), with the default (for now) remaining as
iptables.
Just allowing for dual backends complicated matters, since it means
that we have to have a config file, a setting, detection of which
backends are available, and of course some sort of concept of an
abstracted frontend that can use either backend based on the config
setting (and/or auto-detection). Combining that with the fact that it
would just be "too big" of a project to switch over nwfilter's
iptables usage at the same time means that we have to keep around a
lot of existing code for compatibility's sake rather than just wiping
it all away and starting over.
So, what I've ended up with is:
1) a network.conf file (didn't exist before) with a single setting
"firewall_backend". If unset, the network driver tries to use iptables
on the backend, and if that's missing, then tries to use nftables.
2) a new (internal-only, so transient!) virNetFilterXXX API that is
used by the network driver in place of the iptablesXXX API, and calls
either iptablesXXX or:
3) a virNftablesXXX API that exactly replicates the filtering rules of
the existing iptablesXXX API (except in the custom "libvirt" base
table rather than the system "filter" and "nat" tables). This means
that:
4) when the nftables backend is used, the rules added are *exactly the
same* (functionally speaking) as we currently add for iptables (except
they are in the "libvirt" table).
We had spent some time in IRC discussing different ways of using new
functionality available in nftables to make a more
efficient/performant implemention of the desired filtering, and there
are some really great possibilities that need to be explored, but in
the end there were too many details up in the air, and I decided that
it would be more "accomplishable" (coined a new word there!) to first
replicate existing behavior with nftables, but do it inside a
framework that makes it easy to modify the details in the future (in
particular making it painless to switch back and forth between builds
with differing filter models at runtime) - this way we'll be able to
separate the infrastructure work from the details of the rules (which
we can then more easily work on and experiment with). (This implies
that the main objective right now is "get rid of iptables
dependencies", not "make the filtering faster and more efficient").
Notable features of this patchset:
* allows switching between iptables/nftables backends without
rebooting or restarting networks/guests.
Because the commands required to remove a network's filter rules are
now saved in the network status XML, each time libvirtd (or
virtnetworkd) is restarted, it will execute exactly the commands
needed to remove the filter rules that had been added by the
previous libvirtd/virtnetworkd (rather than just making a guess, as
we've always done up until now), and then add new rules using the
current backend+binary's set of rules (while also saving the info
needed for future removal of these new rules back into the network's
status XML).
* firewall_backend can be explicitly set in (new)
/etc/libvirt/network.conf, but if it's not explicitly set, libvirt
will default to the iptables backend if the iptables binary is
found, and otherwise fall back to nftables as long as the nft
binary is found; otherwise the first attempt to start a network will
fail with an appropriate error.
Things that seem ugly / that I would like to clean up / that I think
are just fine as they are:
* virFirewall does *not* provide a backend-agnostic interface [this is fine]
* We need to maintain a backward-compatible API for virFirewall so
that we don't have to touch nwfilter code. Trying to make its API
backend-agnostic would require individually considering/changing
every nwfilter use of virFirewall.
* instead virFirewall objects are just a way to build a collection
of commands to execute to build a firewall, then execute them
while collecting info for and building a collection of commands
that will tear down that firewall in the future.
Do I want to "fix" this in the future by making virFirewall a higher
level interface that accepts tokens describing the type of rule to
add (rather than backend-specific arguments to a backend-specific
command)? No. I think I like the way virFirewall works (as
described in that previous bullet-point), instead I'm thinking that
it is just slightly mis-named - I've lately been thinking of it as a
"virNetFilterCmdList". Similarly, the virFirewallRules that it has a
list of aren't really "rules", they are better described as commands
or actions, so maybe they should be renamed to virNetfilterCmd or
virNetfilterAction. But that is just cosmetic, so I didn't want to
get into it in these patches (especially in case someone disagrees,
or has a better idea for naming).
* Speaking of renaming - I should probably rename all the
"iptablesXXX" functions to "virIptablesXXX" to be consistent with so
much of our other code. I lost the ambition to deal with it right
now though, so I'm leaving that for later cleanup (or I could do it
now if it really makes someone's day :-).
* I could have chosen a higher place in the callchain to make the
virNetfilter abstraction, e.g. at the level of
"networkAddXXXFirewallRules()" rather than at the lower level of
iptablesXXX(). That is actually probably what will happen in the
future (since it will be necessary in order for an nftables-based
firewall to be significantly different in structure from an
iptables-based firewall). But that's the beauty of an API being
private - we can freely add/remove things as needed. the important
thing is that we now have the basic structure there.
For now, the split is just above the existing iptablesXXX API
(util/viriptables.[ch], which seems like a "narrow" enough
place. Most iptablesXXX functions are written in terms of just 10
*other* iptablesXXX functions that add iptables-specific commands -
I've just moved those functions into virnetfilter.[ch]
(appropriately renamed), and changed them to call the 10
virNetfilterXXX functions that will in-turn call those 10
iptablesXXX (or equivalent virNftablesXXX) functions.
* Some people may dislike that the 10 virNetfilterXXX functions are
each written with a switch statement that has cases to directly call
each backend, rather than each backend driver having a table of
pointers to API functions, with the virNetfilter API function
calling backends[fwBackend]->XXX() (ie the pattern for so many
drivers in libvirt). But for just 2 backends, that really seemed
like overkill and unnecessary obfuscation.
* As implemented here, I am storing a "<fwRemoval>" element in the
network status XML - it contains a serialized virFirewall object
that directly contains the commands necessary to remove the
firewall. I could instead just store "<firewall>", which would
include all the commands that were used to *create* the firewall in
addition to the commands needed to remove the firewall. The way it's
done currently takes up less space; switching to storing the full
firewall *might* be more informative to somebody, but on the other
hand would make the network status XML *very* long. If anybody has
an opinion about this, now is the time to bring it up - do you think
it's worth having a separate list of all the commands that were used
to create a network's firewall (keeping in mind that there is no
public API to access it)? Or is it enough to just store what's
needed to remove the firewall?
* Several months ago Eric Garver posted patches for a pure firewalld
backend, and I requested that they not be pushed because I wanted
that to be integrated with my nftables backend support. Due to the
fact that the firewalld backend is almost entirely implemented by
putting the bridge into a new firewalld "zone", with no individual
rules added, that won't happen as just another backend driver file
in parallel to iptables and nftables; it will instead work by
checking firewall_backend at a higher level in the network driver,
thus avoiding the calls to virNetfilterXXX() entirely. I have
locally merged Eric's patches over the top of these patches, and
there are surprisingly few conflicts, but since his patches didn't
account for a user-settable config (but instead just always used the
firewalld backend if firewalld was active), some of the patches are
going to require a bit of rework, which I'll take care of after
getting these patches in.
Laine Stump (28):
util: add -w/--concurrent when applying the rule rather than when
building it
util: new virFirewallRuleGet*() APIs
util: determine ignoreErrors value when creating rule, not when
applying
util: rename iptables helpers that will become the frontend for
ip&nftables
util: move backend-agnostic virNetfilter*() functions to their own
file
util: make netfilter action a proper typedefed (virFirewall) enum
util: #define the names used for private packet filter chains
util: move/rename virFirewallApplyRuleDirect to
virIptablesApplyFirewallRule
util/network: reintroduce virFirewallBackend, but different
network: add (empty) network.conf file to distribution files
network: allow setting firewallBackend from network.conf
network: do not add DHCP checksum mangle rule unless using iptables
network: call backend agnostic function to init private filter chains
util: setup functions in virnetfilter which will call appropriate
backend
build: add nft to the list of binaries we attempt to locate
util: add nftables backend to virnetfilter API used by network driver
tests: test cases for nftables backend
util: new functions to support adding individual rollback rules
util: check for 0 args when applying iptables rule
util: implement rollback rule autosave for iptables backend
util: implement rollback rule autosave for nftables backend
network: turn on auto-rollback for the rules added for virtual
networks
util: new function virFirewallNewFromRollback()
util: new functions virFirewallParseXML() and virFirewallFormat()
conf: add a virFirewall object to virNetworkObj
network: use previously saved list of firewall rules when removing
network: save network status when firewall rules are reloaded
network: improve log message when reloading virtual network firewall
rules
libvirt.spec.in | 5 +
meson.build | 1 +
po/POTFILES | 2 +
src/conf/virnetworkobj.c | 40 +
src/conf/virnetworkobj.h | 11 +
src/libvirt_private.syms | 68 +-
src/network/bridge_driver.c | 40 +-
src/network/bridge_driver_conf.c | 44 +
src/network/bridge_driver_conf.h | 3 +
src/network/bridge_driver_linux.c | 241 +++--
src/network/bridge_driver_nop.c | 6 +-
src/network/bridge_driver_platform.h | 6 +-
src/network/libvirtd_network.aug | 39 +
src/network/meson.build | 11 +
src/network/network.conf | 24 +
src/network/test_libvirtd_network.aug.in | 5 +
src/nwfilter/nwfilter_ebiptables_driver.c | 16 +-
src/util/meson.build | 2 +
src/util/virebtables.c | 4 +-
src/util/virfirewall.c | 490 ++++++++--
src/util/virfirewall.h | 51 +-
src/util/viriptables.c | 762 ++++-----------
src/util/viriptables.h | 222 ++---
src/util/virnetfilter.c | 892 ++++++++++++++++++
src/util/virnetfilter.h | 159 ++++
src/util/virnftables.c | 698 ++++++++++++++
src/util/virnftables.h | 118 +++
.../{base.args => base.iptables} | 0
tests/networkxml2firewalldata/base.nftables | 256 +++++
...-linux.args => nat-default-linux.iptables} | 0
.../nat-default-linux.nftables | 248 +++++
...pv6-linux.args => nat-ipv6-linux.iptables} | 0
.../nat-ipv6-linux.nftables | 384 ++++++++
...rgs => nat-ipv6-masquerade-linux.iptables} | 0
.../nat-ipv6-masquerade-linux.nftables | 456 +++++++++
...linux.args => nat-many-ips-linux.iptables} | 0
.../nat-many-ips-linux.nftables | 472 +++++++++
...-linux.args => nat-no-dhcp-linux.iptables} | 0
.../nat-no-dhcp-linux.nftables | 384 ++++++++
...ftp-linux.args => nat-tftp-linux.iptables} | 0
.../nat-tftp-linux.nftables | 274 ++++++
...inux.args => route-default-linux.iptables} | 0
.../route-default-linux.nftables | 162 ++++
tests/networkxml2firewalltest.c | 56 +-
tests/virfirewalltest.c | 20 +-
45 files changed, 5718 insertions(+), 954 deletions(-)
create mode 100644 src/network/libvirtd_network.aug
create mode 100644 src/network/network.conf
create mode 100644 src/network/test_libvirtd_network.aug.in
create mode 100644 src/util/virnetfilter.c
create mode 100644 src/util/virnetfilter.h
create mode 100644 src/util/virnftables.c
create mode 100644 src/util/virnftables.h
rename tests/networkxml2firewalldata/{base.args => base.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/base.nftables
rename tests/networkxml2firewalldata/{nat-default-linux.args => nat-default-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-default-linux.nftables
rename tests/networkxml2firewalldata/{nat-ipv6-linux.args => nat-ipv6-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-ipv6-linux.nftables
rename tests/networkxml2firewalldata/{nat-ipv6-masquerade-linux.args => nat-ipv6-masquerade-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables
rename tests/networkxml2firewalldata/{nat-many-ips-linux.args => nat-many-ips-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-many-ips-linux.nftables
rename tests/networkxml2firewalldata/{nat-no-dhcp-linux.args => nat-no-dhcp-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-no-dhcp-linux.nftables
rename tests/networkxml2firewalldata/{nat-tftp-linux.args => nat-tftp-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/nat-tftp-linux.nftables
rename tests/networkxml2firewalldata/{route-default-linux.args => route-default-linux.iptables} (100%)
create mode 100644 tests/networkxml2firewalldata/route-default-linux.nftables
--
2.39.2
4 months, 2 weeks
[PATCH] Extend libvirt-guests to shutdown only persistent VMs
by Benjamin Taubmann
At the moment, there is no configuration option for the libvirt-guests
service that allows users to define that only persistent virtual machines
should be shutdown on host shutdown.
Currently, the service config allows to choose between two ON_SHUTDOWN
actions that are executed on running virtual machines when the host goes
down: shutdown, suspend.
The ON_SHUTDOWN action should be orthogonal to the type of the virtual
machine. However, the existing implementation, does not suspend
transient virtual machines.
This is the matrix of actions that is executed on virtual machines based
on the configured ON_SHUTDOWN action and the type of a virtual machine.
| persistent | transient
shutdown | shutdown | shutdown (what we want to change)
suspend | suspend | nothing
Add config option PERSISTENT_ONLY to libvirt-guests config that allows
users to define if the ON_SHUTDOWN action should be applied only on
persistent virtual machines. PERSISTENT_ONLY can be set to true, false,
default. The default option will implement the already existing logic.
Case 1: PERSISTENT_ONLY=default
| persistent | transient
shutdown | shutdown | shutdown
suspend | suspend | nothing
Case 2: PERSISTENT_ONLY=true
| persistent | transient
shutdown | shutdown | nothing
suspend | suspend | nothing
Case 3: PERSISTENT_ONLY=false
| persistent | transient
shutdown | shutdown | shutdown
suspend | suspend | suspend
Change-Id: Ib03013d00b3ec60716287dad4743a038cf000763
---
tools/libvirt-guests.sh.in | 37 ++++++++++++++++++++++++++++++-------
1 file changed, 30 insertions(+), 7 deletions(-)
diff --git a/tools/libvirt-guests.sh.in b/tools/libvirt-guests.sh.in
index 344b54390a..c3c5954e17 100644
--- a/tools/libvirt-guests.sh.in
+++ b/tools/libvirt-guests.sh.in
@@ -38,6 +38,7 @@ PARALLEL_SHUTDOWN=0
START_DELAY=0
BYPASS_CACHE=0
SYNC_TIME=0
+PERSISTENT_ONLY="default"
test -f "$initconfdir"/libvirt-guests &&
. "$initconfdir"/libvirt-guests
@@ -438,14 +439,16 @@ shutdown_guests_parallel()
# stop
# Shutdown or save guests on the configured uris
stop() {
- local suspending="true"
local uri=
+ local action="suspend"
+ local persistent_only="default"
+
# last stop was not followed by start
[ -f "$LISTFILE" ] && return 0
if [ "x$ON_SHUTDOWN" = xshutdown ]; then
- suspending="false"
+ action="shutdown"
if [ $SHUTDOWN_TIMEOUT -lt 0 ]; then
gettext "SHUTDOWN_TIMEOUT must be equal or greater than 0"
echo
@@ -454,6 +457,22 @@ stop() {
fi
fi
+ case "x$PERSISTENT_ONLY" in
+ xtrue)
+ persistent_only="true"
+ ;;
+ xfalse)
+ persistent_only="false"
+ ;;
+ *)
+ if [ "x$action" = xshutdown ]; then
+ persistent_only="false"
+ elif [ "x$action" = xsuspend ]; then
+ persistent_only="true"
+ fi
+ ;;
+ esac
+
: >"$LISTFILE"
set -f
for uri in $URIS; do
@@ -478,7 +497,7 @@ stop() {
echo
fi
- if "$suspending"; then
+ if "$persistent_only"; then
local transient="$(list_guests "$uri" "--transient")"
if [ $? -eq 0 ]; then
local empty="true"
@@ -486,7 +505,11 @@ stop() {
for uuid in $transient; do
if "$empty"; then
- eval_gettext "Not suspending transient guests on URI: \$uri: "
+ if [ "x$action" = xsuspend ]; then
+ eval_gettext "Not suspending transient guests on URI: \$uri: "
+ else
+ eval_gettext "Not shutting down transient guests on URI: \$uri: "
+ fi
empty="false"
else
printf ", "
@@ -520,19 +543,19 @@ stop() {
if [ -s "$LISTFILE" ]; then
while read uri list; do
- if "$suspending"; then
+ if [ "x$action" = xsuspend ]; then
eval_gettext "Suspending guests on \$uri URI..."; echo
else
eval_gettext "Shutting down guests on \$uri URI..."; echo
fi
if [ "$PARALLEL_SHUTDOWN" -gt 1 ] &&
- ! "$suspending"; then
+ [ "x$action" = xshutdown ]; then
shutdown_guests_parallel "$uri" "$list"
else
local guest=
for guest in $list; do
- if "$suspending"; then
+ if [ "x$action" = xsuspend ]; then
suspend_guest "$uri" "$guest"
else
shutdown_guest "$uri" "$guest"
--
2.39.2
6 months, 4 weeks
[libvirt PATCH 00/12] Add vmx-* features from qemu
by Tim Wiederhake
For rationale, see patch 3 (cpu_map: No longer ignore vmx- features in
sync_qemu_features_i386.py).
Adding features in bunches (one patch per msr index), as this series
is adding ~100 features.
This also adds cpu features "gds-no" and "amx-complex" and brings
libvirt in sync with qemu commit ad6ef0a42e.
Tim Wiederhake (12):
cpu_map: Add missing feature "gds-no"
cpu_map: Add missing feature "amx-complex"
cpu_map: No longer ignore vmx- features in sync_qemu_features_i386.py
cpu_map: Add missing vmx features from MSR 0x480
cpu_map: Add missing vmx features from MSR 0x485
cpu_map: Add missing vmx features from MSR 0x48B
cpu_map: Add missing vmx features from MSR 0x48C
cpu_map: Add missing vmx features from MSR 0x48D
cpu_map: Add missing vmx features from MSR 0x48E
cpu_map: Add missing vmx features from MSR 0x48F
cpu_map: Add missing vmx features from MSR 0x490
cpu_map: Add missing vmx features from MSR 0x491
src/cpu_map/sync_qemu_features_i386.py | 2 +-
src/cpu_map/x86_features.xml | 295 ++++++++++++++++++
.../x86_64-cpuid-Atom-P5362-enabled.xml | 9 +
.../x86_64-cpuid-Atom-P5362-json.xml | 70 +++++
.../x86_64-cpuid-Cooperlake-enabled.xml | 9 +
.../x86_64-cpuid-Cooperlake-json.xml | 71 +++++
.../x86_64-cpuid-Core-i7-8550U-enabled.xml | 9 +
.../x86_64-cpuid-Core-i7-8550U-json.xml | 68 ++++
...86_64-cpuid-Xeon-Platinum-9242-enabled.xml | 9 +
.../x86_64-cpuid-Xeon-Platinum-9242-json.xml | 71 +++++
...-cpuid-baseline-Cooperlake+Cascadelake.xml | 71 +++++
.../domaincapsdata/qemu_4.2.0-q35.x86_64.xml | 68 ++++
tests/domaincapsdata/qemu_4.2.0.x86_64.xml | 68 ++++
.../domaincapsdata/qemu_5.0.0-q35.x86_64.xml | 68 ++++
tests/domaincapsdata/qemu_5.0.0.x86_64.xml | 68 ++++
.../domaincapsdata/qemu_8.2.0-q35.x86_64.xml | 1 +
tests/domaincapsdata/qemu_8.2.0.x86_64.xml | 1 +
.../cpu-host-model.x86_64-4.2.0.args | 2 +-
.../cpu-host-model.x86_64-5.0.0.args | 2 +-
.../cpu-host-model.x86_64-latest.args | 2 +-
20 files changed, 960 insertions(+), 4 deletions(-)
--
2.39.2
7 months, 2 weeks
[PATCH rfcv3 00/11] LIBVIRT: X86: TDX support
by Zhenzhong Duan
Hi,
This series brings libvirt the x86 TDX support.
* What's TDX?
TDX stands for Trust Domain Extensions which isolates VMs from
the virtual-machine manager (VMM)/hypervisor and any other software on
the platform.
To support TDX, multiple software components, not only KVM but also QEMU,
guest Linux and virtual bios, need to be updated. For more details, please
check link[1], there are TDX spec links and public repository link at github
for each software component.
This patchset is another software component to extend libvirt to support TDX,
with which one can start a VM from high level rather than running qemu directly.
* Misc
As QEMU use a software emulated way to reset guest which isn't supported by TDX
guest for security reason. We add a new way to emulate the reset for TDX guest,
called "hard reboot". We achieve this by killing old qemu and start a new one.
Complete code can be found at [1], matching qemu code can be found at [2].
There are some new properties for tdx-guest object, i.e. `mrconfigid`, `mrowner`,
`mrownerconfig` and `debug` which aren't in matching qemu[2] yet. I keep them
intentionally as they will be implemented in qemu as extention series of [2].
* Test
start/stop/reboot with virsh
stop/reboot trigger in guest
stop with on_poweroff=destroy/restart
reboot with on_reboot=destroy/restart
* Patch organization
- patch 1-3: Support query of TDX capabilities.
- patch 4-6: Add TDX type to launchsecurity framework.
- patch 7-11: Add hard reboot support to TDX guest
[1] https://github.com/intel/libvirt-tdx/commits/tdx_for_upstream_rfcv3
[2] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v3
Thanks
Zhenzhong
Changelog:
rfcv3:
- Change to generate qemu cmdline with -bios
- drop firmware auto match as -bios is used
- add a hard reboot method to reboot TDX guest
rfcv2:
- give up using qmp cmd and check TDX directly on host for TDX capabilities.
- use launchsecurity framework to support TDX
- use <os>.<loader> for general loader
- add auto firmware match feature for TDX
A example TDVF fimware description file 70-edk2-x86_64-tdx.json:
{
"description": "UEFI firmware for x86_64, supporting Intel TDX",
"interface-types": [
"uefi"
],
"mapping": {
"device": "generic",
"filename": "/usr/share/OVMF/OVMF_CODE-tdx.fd"
},
"targets": [
{
"architecture": "x86_64",
"machines": [
"pc-q35-*"
]
}
],
"features": [
"intel-tdx",
"verbose-dynamic"
],
"tags": [
]
}
rfcv2:
https://www.mail-archive.com/libvir-list@redhat.com/msg219378.html
Chenyi Qiang (3):
qemu: add hard reboot in QEMU driver
qemu: make hard reboot as the TDX default reboot mode
virsh: add new option "timekeep" to keep virsh console alive
Zhenzhong Duan (8):
qemu: Check if INTEL Trust Domain Extention support is enabled
qemu: Add TDX capability
conf: expose TDX feature in domain capabilities
conf: add tdx as launch security type
qemu: Add command line and validation for TDX type
qemu: force special parameters enabled for TDX guest
qemu: Extend hard reboot in Qemu driver
conf: Add support to keep same domid for hard reboot
docs/formatdomaincaps.rst | 1 +
include/libvirt/libvirt-domain.h | 2 +
src/conf/domain_capabilities.c | 1 +
src/conf/domain_capabilities.h | 1 +
src/conf/domain_conf.c | 50 ++++++++++++++++
src/conf/domain_conf.h | 11 ++++
src/conf/schemas/domaincaps.rng | 9 +++
src/conf/schemas/domaincommon.rng | 34 +++++++++++
src/conf/virconftypes.h | 2 +
src/qemu/qemu_capabilities.c | 38 +++++++++++-
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_command.c | 29 +++++++++
src/qemu/qemu_domain.c | 18 ++++++
src/qemu/qemu_domain.h | 4 ++
src/qemu/qemu_driver.c | 85 ++++++++++++++++++++------
src/qemu/qemu_firmware.c | 1 +
src/qemu/qemu_monitor.c | 19 +++++-
src/qemu/qemu_monitor.h | 2 +-
src/qemu/qemu_monitor_json.c | 6 +-
src/qemu/qemu_namespace.c | 1 +
src/qemu/qemu_process.c | 99 ++++++++++++++++++++++++++++++-
src/qemu/qemu_validate.c | 18 ++++++
tools/virsh-console.c | 3 +
tools/virsh-domain.c | 64 +++++++++++++++-----
tools/virsh.h | 1 +
25 files changed, 463 insertions(+), 37 deletions(-)
--
2.34.1
9 months, 3 weeks
[libvirt PATCH v2 00/15] Support for VFIO variant drivers, Part 2
by Laine Stump
(Thisis "V2 of Part 2". "V1 of Part 2" is here:
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/5G... )
Part 1 (which simply made it possible to use virsh nodedev-detach to
bind a device to a manually-specified variant driver, and at guest
runtime allowed libvirt to ignore the fact that the driver found to
the device was something other than exactly "vfio-pci") was here:
https://listman.redhat.com/archives/libvir-list/2023-August/241338.html
and pushed upstream as of commit v9.6.0-153-g24beaffec3
Part 2 adds two new pieces of functionality:
1) It is possible to manually specify a VFIO variant driver (or force
the generic vfio-pci driver) for a device in the domain XML with,
e.g.:
<driver name='mlx5_vfio_pci'/>
(for the former) or:
<driver name='vfio-pci'/>
(for the latter).
2) By default libvirt will now find the "best match" VFIO or VFIO
variant driver by comparing the device's modalias file contents (in
sysfs) with vfio drivers found in the running kernel's
modules.alias file. This means that "virsh nodedev-detach" of a
host device will bind it to its appropriate VFIO variant driver (if
one is available), and also if a <hostdev> decice in a domain
config has "managed='yes'", libvirt will bind it to a variant
driver if possible (in order to force binding to the basic vfio-pci
driver instead, you just need to add the <driver> element mentioned
above).
The first 12 patches are all just getting (1) going (a lot of it is
refactoring code to use common code for the four places that use the
hostdev <driver> element), and the final 3 patches implement (2).
More details are spread along the way.
Differences from V1:
I squashed together a couple of the patches, and fixed some things that caused
CI jobs to fail (I'd forgotten to push the branch to gitlab and
trigger CI).
Laine Stump (15):
util: properly deal with module vs. driver when binding device to
driver
schema: consolidate RNG for all hostdev <driver> elements
conf: move/rename hostdev PCI driver type enum to device_conf.h
conf: normalize hostdev <driver> parsing to simplify adding new attr
conf: put hostdev PCI backend into a struct
conf: use virDeviceHostdevPCIDriverInfo in network and networkport
objects
conf: split out hostdev <driver> parse/format to their own functions
conf: use new common parser/formatter for hostdev driver in network
XML
tests: remove explicit <driver name='vfio'/> from hostdev test cases
xen: explicitly set hostdev driver.type at runtime, not in postparse
conf: replace virHostdevIsVFIODevice with virHostdevIsPCIDevice
conf: support manually specifying VFIO variant driver in <hostdev> XML
util: new function virStringSkipToSpace()
tests: mock virPCIDevice(BindTo|UnbindFrom)Stub with nop functions
qemu: automatically bind to a vfio variant driver, if available
docs/formatdomain.rst | 55 ++-
src/conf/device_conf.c | 75 +++
src/conf/device_conf.h | 27 ++
src/conf/domain_capabilities.c | 2 +-
src/conf/domain_capabilities.h | 2 +-
src/conf/domain_conf.c | 98 +---
src/conf/domain_conf.h | 18 +-
src/conf/network_conf.c | 43 +-
src/conf/network_conf.h | 17 +-
src/conf/schemas/basictypes.rng | 20 +
src/conf/schemas/domaincommon.rng | 173 ++++---
src/conf/schemas/network.rng | 10 +-
src/conf/schemas/networkport.rng | 10 +-
src/conf/virconftypes.h | 2 +
src/conf/virnetworkportdef.c | 22 +-
src/conf/virnetworkportdef.h | 4 +-
src/hypervisor/virhostdev.c | 16 +-
src/hypervisor/virhostdev.h | 2 -
src/libvirt_private.syms | 10 +-
src/libxl/libxl_capabilities.c | 2 +-
src/libxl/libxl_domain.c | 65 ++-
src/libxl/libxl_driver.c | 25 +-
src/network/bridge_driver.c | 2 +-
src/qemu/qemu_capabilities.c | 4 +-
src/qemu/qemu_command.c | 14 +-
src/qemu/qemu_domain.c | 29 +-
src/qemu/qemu_hostdev.c | 2 +-
src/qemu/qemu_hotplug.c | 2 +-
src/qemu/qemu_validate.c | 6 +-
src/security/security_apparmor.c | 2 +-
src/security/security_dac.c | 4 +-
src/security/security_selinux.c | 4 +-
src/security/virt-aa-helper.c | 8 +-
src/util/virpci.c | 438 ++++++++++++++++--
src/util/virpci.h | 3 +
src/util/virstring.c | 17 +
src/util/virstring.h | 1 +
tests/domaincapstest.c | 4 +-
tests/libxlxml2domconfigdata/moredevs-hvm.xml | 1 -
tests/networkxml2xmlin/hostdev-pf-old.xml | 8 +
tests/networkxml2xmlin/hostdev-pf.xml | 2 +-
tests/networkxml2xmlout/hostdev-pf-old.xml | 8 +
tests/networkxml2xmlout/hostdev-pf.xml | 2 +-
tests/networkxml2xmltest.c | 6 +
tests/qemuhotplugmock.c | 15 +
.../qemuhotplug-hostdev-pci.xml | 1 -
.../qemuhotplug-base-live+hostdev-pci.xml | 2 +-
...uhotplug-pseries-base-live+hostdev-pci.xml | 2 +-
tests/qemustatusxml2xmldata/modern-in.xml | 2 +-
.../hostdev-pci-address-unassigned.xml | 4 -
.../hostdev-pci-multifunction.xml | 7 -
.../hostdev-vfio-multidomain.xml | 1 -
.../hostdev-vfio-zpci-autogenerate-fids.xml | 2 -
.../hostdev-vfio-zpci-autogenerate-uids.xml | 2 -
.../hostdev-vfio-zpci-autogenerate.xml | 1 -
.../hostdev-vfio-zpci-boundaries.xml | 2 -
.../hostdev-vfio-zpci-ccw-memballoon.xml | 1 -
.../hostdev-vfio-zpci-duplicate.xml | 2 -
...ostdev-vfio-zpci-invalid-uid-valid-fid.xml | 1 -
.../hostdev-vfio-zpci-multidomain-many.xml | 8 -
.../hostdev-vfio-zpci-set-zero.xml | 1 -
.../hostdev-vfio-zpci-uid-set-zero.xml | 1 -
.../hostdev-vfio-zpci-wrong-arch.xml | 1 -
tests/qemuxml2argvdata/hostdev-vfio-zpci.xml | 1 -
.../hostdev-vfio.x86_64-latest.args | 5 +-
tests/qemuxml2argvdata/hostdev-vfio.xml | 19 +-
.../net-hostdev-vfio-multidomain.xml | 1 -
tests/qemuxml2argvdata/net-hostdev-vfio.xml | 1 -
tests/qemuxml2argvdata/pseries-hostdevs-1.xml | 3 -
tests/qemuxml2argvdata/pseries-hostdevs-2.xml | 2 -
tests/qemuxml2argvdata/pseries-hostdevs-3.xml | 2 -
...v-pci-address-unassigned.x86_64-latest.xml | 4 -
...ostdev-pci-multifunction.x86_64-latest.xml | 7 -
...io-zpci-autogenerate-fids.s390x-latest.xml | 2 -
...io-zpci-autogenerate-uids.s390x-latest.xml | 2 -
...ev-vfio-zpci-autogenerate.s390x-latest.xml | 1 -
...tdev-vfio-zpci-boundaries.s390x-latest.xml | 2 -
...-vfio-zpci-ccw-memballoon.s390x-latest.xml | 1 -
...fio-zpci-multidomain-many.s390x-latest.xml | 8 -
.../hostdev-vfio-zpci.s390x-latest.xml | 1 -
.../hostdev-vfio.x86_64-latest.xml | 24 +-
.../net-hostdev-vfio.x86_64-latest.xml | 1 -
.../pseries-hostdevs-1.ppc64-latest.xml | 3 -
.../pseries-hostdevs-2.ppc64-latest.xml | 2 -
.../pseries-hostdevs-3.ppc64-latest.xml | 2 -
tests/virhostdevtest.c | 2 +-
.../plug-hostdev-pci-unmanaged.xml | 2 +-
.../plug-hostdev-pci.xml | 2 +-
tests/xlconfigdata/test-fullvirt-pci.xml | 2 -
tests/xmconfigdata/test-pci-dev-syntax.xml | 2 -
tests/xmconfigdata/test-pci-devs.xml | 2 -
tools/virsh-completer-nodedev.c | 4 +-
92 files changed, 933 insertions(+), 498 deletions(-)
create mode 100644 tests/networkxml2xmlin/hostdev-pf-old.xml
create mode 100644 tests/networkxml2xmlout/hostdev-pf-old.xml
--
2.41.0
9 months, 4 weeks
[libvirt PATCH 0/5] Initial network support in ch driver.
by Praveen K Paladugu
This series introduces network support for ch guests. ETHERNET network mode is
the only mode supported. More modes will be add once this series is merged.
For security reasons, cloud-hypervisor requries FDs for guests to be sent using
SCM_RIGHTS. This patch series introduces 2 methods that support sending FDs to
ch in Patch 3. Network support in ch driver is introduced by Patch 5.
I previously sent "hypervisor: Move interface mgmt methods to hypervisor" patch
by itself. I merged this patch also into Patch 2 in this series.
Praveen K Paladugu (5):
conf: Drop unused parameter
hypervisor: Move domain interface mgmt methods
util: Add util methods required by ch networking
ch: Introduce version based cap for network support
ch: Enable ETHERNET Network mode support
po/POTFILES | 3 +
src/ch/ch_capabilities.c | 9 +
src/ch/ch_capabilities.h | 1 +
src/ch/ch_conf.h | 4 +
src/ch/ch_domain.c | 41 +++
src/ch/ch_domain.h | 3 +
src/ch/ch_interface.c | 104 +++++++
src/ch/ch_interface.h | 35 +++
src/ch/ch_monitor.c | 205 +++++---------
src/ch/ch_monitor.h | 7 +-
src/ch/ch_process.c | 160 ++++++++++-
src/ch/meson.build | 2 +
src/conf/domain_conf.c | 1 -
src/conf/domain_conf.h | 3 +-
src/hypervisor/domain_interface.c | 457 ++++++++++++++++++++++++++++++
src/hypervisor/domain_interface.h | 45 +++
src/hypervisor/meson.build | 1 +
src/libvirt_private.syms | 11 +
src/libxl/libxl_domain.c | 2 +-
src/libxl/libxl_driver.c | 4 +-
src/lxc/lxc_driver.c | 2 +-
src/lxc/lxc_process.c | 4 +-
src/qemu/qemu_command.c | 6 +-
src/qemu/qemu_hotplug.c | 15 +-
src/qemu/qemu_interface.c | 339 +---------------------
src/qemu/qemu_interface.h | 11 -
src/qemu/qemu_process.c | 73 +----
src/util/virsocket.c | 109 +++++++
src/util/virsocket.h | 3 +
29 files changed, 1093 insertions(+), 567 deletions(-)
create mode 100644 src/ch/ch_interface.c
create mode 100644 src/ch/ch_interface.h
create mode 100644 src/hypervisor/domain_interface.c
create mode 100644 src/hypervisor/domain_interface.h
--
2.41.0
10 months
[libvirt PATCH v3] qemu: add runtime config option for nbdkit
by Jonathon Jongsma
Currently when we build with nbdkit support, libvirt will always try to
use nbdkit to access remote disk sources when it is available. But
without an up-to-date selinux policy allowing this, it will fail.
because the required selinux policies are not yet widely available, we
have disabled nbdkit support on rpm builds for all distributions before
Fedora 40.
Unfortunately, this makes it more difficult to test nbdkit support.
After someone updates to the necessary selinux policies, they would also
need to rebuild libvirt to enable nbdkit support. By introducing a
configure option (nbdkit_config_default), we can build packages with
nbdkit support but have it disabled by default.
Signed-off-by: Jonathon Jongsma <jjongsma(a)redhat.com>
Suggested-by: Andrea Bolognani <abologna(a)redhat.com>
---
Sorry for the delay in getting out a new version. Thanksgiving break in
the USA, etc.
Changes in v3:
- nbdkit_config_default changed from a meson 'boolean' to a meson
'feature' configure option
- print error message for senseless configurations where nbdkit is
disabled but nbdkit_config_default is enabled
- added a comment to example config file about possibility of the
default value changing in the future
- don't include the storage_use_nbdkit in the config file if nbdkit
support is not compiled in. I used the same strategy that is used in
the remote driver where certain config options are disabled based on
whether WITH_IP is defined or not. When WITH_NBDKIT is not defined,
the value is not included in the config file and we don't attempt to
parse it from the config file either.
libvirt.spec.in | 18 +++++++--
meson.build | 10 +++++
meson_options.txt | 3 +-
...libvirtd_qemu.aug => libvirtd_qemu.aug.in} | 7 ++++
src/qemu/meson.build | 37 ++++++++++++++++++-
src/qemu/qemu.conf.in | 13 +++++++
src/qemu/qemu_conf.c | 19 ++++++++++
src/qemu/qemu_conf.h | 2 +
src/qemu/qemu_domain.c | 3 ++
tests/qemuxml2argvtest.c | 15 +++++---
10 files changed, 114 insertions(+), 13 deletions(-)
rename src/qemu/{libvirtd_qemu.aug => libvirtd_qemu.aug.in} (98%)
diff --git a/libvirt.spec.in b/libvirt.spec.in
index f3fdedfde4..7aa497a565 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -96,6 +96,7 @@
%define with_sanlock 0
%define with_numad 0
%define with_nbdkit 0
+%define with_nbdkit_config_default 0
%define with_firewalld_zone 0
%define with_netcf 0
%define with_libssh2 0
@@ -174,15 +175,17 @@
%endif
%endif
-# We should only enable nbdkit support if the OS ships a SELinux policy that
-# allows libvirt to launch it. Right now that's not the case anywhere, but
-# things should be fine by the time Fedora 40 is released.
+# We want to build with nbdkit support, but should only enable nbdkit by
+# default if the OS ships a SELinux policy that allows libvirt to launch it.
+# Right now that's not the case anywhere, but things should be fine by the time
+# Fedora 40 is released.
#
# TODO: add RHEL 9 once a minor release that contains the necessary SELinux
# bits exists (we only support the most recent minor release)
%if %{with_qemu}
+ %define with_nbdkit 0%{!?_without_nbdkit:1}
%if 0%{?fedora} >= 40
- %define with_nbdkit 0%{!?_without_nbdkit:1}
+ %define with_nbdkit_config_default 0%{!?_without_nbdkit_config_default:1}
%endif
%endif
@@ -1214,6 +1217,12 @@ exit 1
%define arg_nbdkit -Dnbdkit=disabled
%endif
+%if %{with_nbdkit_config_default}
+ %define arg_nbdkit_config_default -Dnbdkit_config_default=enabled
+%else
+ %define arg_nbdkit_config_default -Dnbdkit_config_default=disabled
+%endif
+
%if %{with_fuse}
%define arg_fuse -Dfuse=enabled
%else
@@ -1329,6 +1338,7 @@ export SOURCE_DATE_EPOCH=$(stat --printf='%Y' %{_specdir}/libvirt.spec)
%{?arg_sanlock} \
-Dlibpcap=enabled \
%{?arg_nbdkit} \
+ %{?arg_nbdkit_config_default} \
-Dlibnl=enabled \
-Daudit=enabled \
-Ddtrace=enabled \
diff --git a/meson.build b/meson.build
index 62481a008d..e63b807f21 100644
--- a/meson.build
+++ b/meson.build
@@ -1012,6 +1012,16 @@ if not conf.has('WITH_NBDKIT')
libnbd_dep = dependency('', required: false)
endif
+# default value for storage_use_nbdkit config option.
+# For now 'auto' just maps to disabled, but in the future it may depend on
+# which security drivers are enabled
+use_nbdkit_default = get_option('nbdkit_config_default').enabled()
+
+if use_nbdkit_default and not conf.has('WITH_NBDKIT')
+ error('nbdkit_config_default requires nbdkit to be enabled')
+endif
+conf.set10('USE_NBDKIT_DEFAULT', use_nbdkit_default)
+
libnl_version = '3.0'
if not get_option('libnl').disabled() and host_machine.system() == 'linux'
libnl_dep = dependency('libnl-3.0', version: '>=' + libnl_version, required: get_option('libnl'))
diff --git a/meson_options.txt b/meson_options.txt
index a0928102bf..05fd63dd41 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -104,7 +104,8 @@ option('loader_nvram', type: 'string', value: '', description: 'Pass list of pai
option('login_shell', type: 'feature', value: 'auto', description: 'build virt-login-shell')
option('nss', type: 'feature', value: 'auto', description: 'enable Name Service Switch plugin for resolving guest IP addresses')
option('numad', type: 'feature', value: 'auto', description: 'use numad to manage CPU placement dynamically')
-option('nbdkit', type: 'feature', value: 'auto', description: 'use nbdkit to access network disks')
+option('nbdkit', type: 'feature', value: 'auto', description: 'Build nbdkit storage backend')
+option('nbdkit_config_default', type: 'feature', value: 'disabled', description: 'Whether to use nbdkit storage backend for network disks by default (configurable)')
option('pm_utils', type: 'feature', value: 'auto', description: 'use pm-utils for power management')
option('sysctl_config', type: 'feature', value: 'auto', description: 'Whether to install sysctl configs')
option('tls_priority', type: 'string', value: 'NORMAL', description: 'set the default TLS session priority string')
diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug.in
similarity index 98%
rename from src/qemu/libvirtd_qemu.aug
rename to src/qemu/libvirtd_qemu.aug.in
index ed097ea3d9..bcc0aa8938 100644
--- a/src/qemu/libvirtd_qemu.aug
+++ b/src/qemu/libvirtd_qemu.aug.in
@@ -147,6 +147,10 @@ module Libvirtd_qemu =
let capability_filters_entry = str_array_entry "capability_filters"
+@CUT_WITH_NBDKIT@
+ let storage_entry = bool_entry "storage_use_nbdkit"
+
+@END@
(* Each entry in the config is one of the following ... *)
let entry = default_tls_entry
| vnc_entry
@@ -170,6 +174,9 @@ module Libvirtd_qemu =
| nbd_entry
| swtpm_entry
| capability_filters_entry
+@CUT_WITH_NBDKIT@
+ | storage_entry
+@END@
| obsolete_entry
let comment = [ label "#comment" . del /#[ \t]*/ "# " . store /([^ \t\n][^\n]*)?/ . del /\n/ "\n" ]
diff --git a/src/qemu/meson.build b/src/qemu/meson.build
index 2279fef2ca..43519e2834 100644
--- a/src/qemu/meson.build
+++ b/src/qemu/meson.build
@@ -137,9 +137,41 @@ if conf.has('WITH_QEMU')
qemu_user_group_conf = configuration_data({
'QEMU_USER': qemu_user,
'QEMU_GROUP': qemu_group,
+ 'USE_NBDKIT_DEFAULT': use_nbdkit_default.to_int(),
})
+
+ # preprocess config files to remove items related to features that are not
+ # compiled in
+ preprocess_config_files = [
+ {
+ 'inpath': 'qemu.conf.in',
+ 'outpath': 'qemu.conf.tmp',
+ 'outvar': 'qemu_conf_tmp',
+ },
+ {
+ 'inpath': 'libvirtd_qemu.aug.in',
+ 'outpath': 'libvirtd_qemu.aug',
+ 'outvar': 'libvirtd_qemu_aug',
+ }
+ ]
+ foreach item : preprocess_config_files
+ if conf.has('WITH_NBDKIT')
+ preprocess_cmd = [ 'sed', '-e', '/[@]CUT_WITH_NBDKIT[@]/d', '-e', '/[@]END[@]/d', '@INPUT@' ]
+ else
+ preprocess_cmd = [ 'sed', '-e', '/[@]CUT_WITH_NBDKIT[@]/,/[@]END[@]/d', '@INPUT@' ]
+ endif
+
+ tmp = configure_file(
+ input: item['inpath'],
+ output: item['outpath'],
+ command: preprocess_cmd,
+ capture: true,
+ )
+ set_variable(item['outvar'], tmp)
+ endforeach
+
qemu_conf = configure_file(
- input: 'qemu.conf.in',
+ input: qemu_conf_tmp,
output: 'qemu.conf',
configuration: qemu_user_group_conf,
)
@@ -147,6 +179,7 @@ if conf.has('WITH_QEMU')
qemu_user_group_hack_conf = configuration_data({
'QEMU_USER': qemu_user,
'QEMU_GROUP': qemu_group,
+ 'USE_NBDKIT_DEFAULT': use_nbdkit_default.to_int(),
# This hack is necessary because the output file is going to be
# used as input for another configure_file() call later, which
# will take care of substituting @CONFIG@ with useful data
@@ -159,7 +192,7 @@ if conf.has('WITH_QEMU')
)
virt_conf_files += qemu_conf
- virt_aug_files += files('libvirtd_qemu.aug')
+ virt_aug_files += libvirtd_qemu_aug
virt_test_aug_files += {
'name': 'test_libvirtd_qemu.aug',
'aug': test_libvirtd_qemu_aug_tmp,
diff --git a/src/qemu/qemu.conf.in b/src/qemu/qemu.conf.in
index 6897e0f760..76633c6a2f 100644
--- a/src/qemu/qemu.conf.in
+++ b/src/qemu/qemu.conf.in
@@ -974,3 +974,16 @@
# "full" - both QEMU and its helper processes are placed into separate
# scheduling group
#sched_core = "none"
+
+@CUT_WITH_NBDKIT@
+# Using nbdkit to access remote disk sources
+#
+# If this is set then libvirt will use nbdkit to access remote disk sources
+# when available. nbdkit will export an NBD share to QEMU rather than having
+# QEMU attempt to access the remote server directly.
+#
+# Possible values are 0 or 1. Default value is @USE_NBDKIT_DEFAULT@. Please
+# note that the default might change in future releases.
+#
+# storage_use_nbdkit = @USE_NBDKIT_DEFAULT@
+@END@
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 513b5ebb1e..e1a1bc3655 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -285,6 +285,7 @@ virQEMUDriverConfig *virQEMUDriverConfigNew(bool privileged,
return NULL;
cfg->deprecationBehavior = g_strdup("none");
+ cfg->storageUseNbdkit = USE_NBDKIT_DEFAULT;
return g_steal_pointer(&cfg);
}
@@ -1065,6 +1066,19 @@ virQEMUDriverConfigLoadCapsFiltersEntry(virQEMUDriverConfig *cfg,
}
+#if WITH_NBDKIT
+static int
+virQEMUDriverConfigLoadStorageEntry(virQEMUDriverConfig *cfg,
+ virConf *conf)
+{
+ if (virConfGetValueBool(conf, "storage_use_nbdkit", &cfg->storageUseNbdkit) < 0)
+ return -1;
+
+ return 0;
+}
+#endif /* WITH_NBDKIT */
+
+
int virQEMUDriverConfigLoadFile(virQEMUDriverConfig *cfg,
const char *filename,
bool privileged)
@@ -1136,6 +1150,11 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfig *cfg,
if (virQEMUDriverConfigLoadCapsFiltersEntry(cfg, conf) < 0)
return -1;
+#if WITH_NBDKIT
+ if (virQEMUDriverConfigLoadStorageEntry(cfg, conf) < 0)
+ return -1;
+#endif /* WITH_NBDKIT */
+
return 0;
}
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index 1a3ba3a0fb..36049b4bfa 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -230,6 +230,8 @@ struct _virQEMUDriverConfig {
char *deprecationBehavior;
+ bool storageUseNbdkit;
+
virQEMUSchedCore schedCore;
};
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 953808fcfe..0891ddabe9 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -10296,6 +10296,9 @@ qemuDomainPrepareStorageSourceNbdkit(virStorageSource *src,
{
g_autoptr(qemuNbdkitCaps) nbdkit = NULL;
+ if (!cfg->storageUseNbdkit)
+ return false;
+
if (virStorageSourceGetActualType(src) != VIR_STORAGE_TYPE_NETWORK)
return false;
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index b2ea2191dc..74d20b9132 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -1125,7 +1125,6 @@ mymain(void)
DO_TEST_CAPS_LATEST("disk-cdrom-empty-network-invalid");
DO_TEST_CAPS_LATEST("disk-cdrom-bus-other");
DO_TEST_CAPS_LATEST("disk-cdrom-network");
- DO_TEST_CAPS_LATEST_NBDKIT("disk-cdrom-network-nbdkit", QEMU_NBDKIT_CAPS_PLUGIN_CURL);
DO_TEST_CAPS_LATEST("disk-cdrom-tray");
DO_TEST_CAPS_LATEST("disk-floppy");
DO_TEST_CAPS_LATEST("disk-floppy-q35");
@@ -1171,8 +1170,6 @@ mymain(void)
DO_TEST_CAPS_VER("disk-network-sheepdog", "6.0.0");
DO_TEST_CAPS_LATEST("disk-network-source-auth");
DO_TEST_CAPS_LATEST("disk-network-source-curl");
- DO_TEST_CAPS_LATEST_NBDKIT("disk-network-source-curl-nbdkit", QEMU_NBDKIT_CAPS_PLUGIN_CURL);
- DO_TEST_CAPS_LATEST_NBDKIT("disk-network-source-curl-nbdkit-backing", QEMU_NBDKIT_CAPS_PLUGIN_CURL);
DO_TEST_CAPS_LATEST("disk-network-nfs");
driver.config->vxhsTLS = 1;
driver.config->nbdTLSx509secretUUID = g_strdup("6fd3f62d-9fe7-4a4e-a869-7acd6376d8ea");
@@ -1183,13 +1180,10 @@ mymain(void)
DO_TEST_CAPS_LATEST("disk-network-tlsx509-nbd-hostname");
DO_TEST_CAPS_VER("disk-network-tlsx509-vxhs", "5.0.0");
DO_TEST_CAPS_LATEST("disk-network-http");
- DO_TEST_CAPS_LATEST_NBDKIT("disk-network-http-nbdkit", QEMU_NBDKIT_CAPS_PLUGIN_CURL);
VIR_FREE(driver.config->nbdTLSx509secretUUID);
VIR_FREE(driver.config->vxhsTLSx509secretUUID);
driver.config->vxhsTLS = 0;
DO_TEST_CAPS_LATEST("disk-network-ssh");
- DO_TEST_CAPS_LATEST_NBDKIT("disk-network-ssh-nbdkit", QEMU_NBDKIT_CAPS_PLUGIN_SSH);
- DO_TEST_CAPS_LATEST_NBDKIT("disk-network-ssh-password", QEMU_NBDKIT_CAPS_PLUGIN_SSH);
DO_TEST_CAPS_LATEST("disk-no-boot");
DO_TEST_CAPS_LATEST("disk-nvme");
DO_TEST_CAPS_VER("disk-vhostuser-numa", "4.2.0");
@@ -1259,6 +1253,15 @@ mymain(void)
DO_TEST_CAPS_LATEST("disk-geometry");
DO_TEST_CAPS_LATEST("disk-blockio");
+ driver.config->storageUseNbdkit = 1;
+ DO_TEST_CAPS_LATEST_NBDKIT("disk-cdrom-network-nbdkit", QEMU_NBDKIT_CAPS_PLUGIN_CURL);
+ DO_TEST_CAPS_LATEST_NBDKIT("disk-network-source-curl-nbdkit", QEMU_NBDKIT_CAPS_PLUGIN_CURL);
+ DO_TEST_CAPS_LATEST_NBDKIT("disk-network-source-curl-nbdkit-backing", QEMU_NBDKIT_CAPS_PLUGIN_CURL);
+ DO_TEST_CAPS_LATEST_NBDKIT("disk-network-http-nbdkit", QEMU_NBDKIT_CAPS_PLUGIN_CURL);
+ DO_TEST_CAPS_LATEST_NBDKIT("disk-network-ssh-nbdkit", QEMU_NBDKIT_CAPS_PLUGIN_SSH);
+ DO_TEST_CAPS_LATEST_NBDKIT("disk-network-ssh-password", QEMU_NBDKIT_CAPS_PLUGIN_SSH);
+ driver.config->storageUseNbdkit = 0;
+
DO_TEST_CAPS_VER("disk-virtio-scsi-reservations", "5.2.0");
DO_TEST_CAPS_LATEST("disk-virtio-scsi-reservations");
--
2.41.0
10 months
[PATCH 0/4] Support for dirty-limit live migration
by Hyman Huang
v2:
- mark the VIR_MIGRATE_DIRTY_LIMIT flag since 9.10.0
v1:
The dirty-limit functionality for live migration was
introduced since qemu>=8.1.
In the live migration scenario, it implements the force
convergence using the dirty-limit approach, which results
in better reliable read performance.
A straightforward dirty-limit capability for live migration
is added by this patchset. Users might not care about other
dirty-limit arguments like "x-vcpu-dirty-limit-period"
or "vcpu-dirty-limit," thus do not expose them to Libvirt
and Keep the default configurations and values in place.
For more details about dirty-limit, please see the following
reference:
https://lore.kernel.org/qemu-
devel/169024923116.19090.10825599068950039132-0(a)git.sr.ht/
Hyman Huang (4):
Add VIR_MIGRATE_DIRTY_LIMIT flag
qemu_migration: Implement VIR_MIGRATE_DIRTY_LIMIT flag
virsh: Add support for VIR_MIGRATE_DIRTY_LIMIT flag
NEWS: document support for dirty-limit live migration
NEWS.rst | 8 ++++++++
docs/manpages/virsh.rst | 10 +++++++++-
include/libvirt/libvirt-domain.h | 5 +++++
src/libvirt-domain.c | 8 ++++++++
src/qemu/qemu_migration.c | 8 ++++++++
src/qemu/qemu_migration.h | 1 +
src/qemu/qemu_migration_params.c | 6 ++++++
src/qemu/qemu_migration_params.h | 1 +
tools/virsh-domain.c | 10 ++++++++++
9 files changed, 56 insertions(+), 1 deletion(-)
--
2.39.1
10 months, 2 weeks
[PATCH v1] qemu_driver: Don't handle the EOF event if monitor changed
by tugy@chinatelecom.cn
From: Guoyi Tu <tugy(a)chinatelecom.cn>
Currently, libvirt creates a thread pool with only on thread to handle all
qemu monitor events for virtual machines, In the cases that if the thread
gets stuck while handling a monitor EOF event, such as unable to kill the
virtual machine process or release resources, the events of other virtual
machine will be also blocked, which will lead to the abnormal behavior of
other virtual machines.
For instance, when another virtual machine completes a shutdown operation
and the monitor EOF event has been queued but remains unprocessed, we
immediately destroy and start the virtual machine again, at a later time
when EOF event get processed, the processMonitorEOFEvent() will kill the
virtual machine that just started.
To address this issue, in the processMonitorEOFEvent(), we check whether
the current virtual machine's monitor object matches the one at the time
the event was generated. If they do not match, we immediately return.
Signed-off-by: Guoyi Tu <tugy(a)chinatelecom.cn>
Signed-off-by: dengpengcheng <dengpc12(a)chinatelecom.cn>
---
src/qemu/qemu_domain.c | 2 +-
src/qemu/qemu_driver.c | 11 +++++++++--
src/qemu/qemu_process.c | 2 +-
3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 953808fcfe..435ee621df 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11470,7 +11470,6 @@ qemuProcessEventFree(struct qemuProcessEvent *event)
case QEMU_PROCESS_EVENT_NETDEV_STREAM_DISCONNECTED:
case QEMU_PROCESS_EVENT_NIC_RX_FILTER_CHANGED:
case QEMU_PROCESS_EVENT_SERIAL_CHANGED:
- case QEMU_PROCESS_EVENT_MONITOR_EOF:
case QEMU_PROCESS_EVENT_GUEST_CRASHLOADED:
g_free(event->data);
break;
@@ -11484,6 +11483,7 @@ qemuProcessEventFree(struct qemuProcessEvent *event)
case QEMU_PROCESS_EVENT_UNATTENDED_MIGRATION:
case QEMU_PROCESS_EVENT_RESET:
case QEMU_PROCESS_EVENT_NBDKIT_EXITED:
+ case QEMU_PROCESS_EVENT_MONITOR_EOF:
case QEMU_PROCESS_EVENT_LAST:
break;
}
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index d00d2a27c6..57acafb48b 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -3854,7 +3854,8 @@ processJobStatusChangeEvent(virDomainObj *vm,
static void
processMonitorEOFEvent(virQEMUDriver *driver,
- virDomainObj *vm)
+ virDomainObj *vm,
+ qemuMonitor *mon)
{
qemuDomainObjPrivate *priv = vm->privateData;
int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN;
@@ -3863,6 +3864,12 @@ processMonitorEOFEvent(virQEMUDriver *driver,
unsigned int stopFlags = 0;
virObjectEvent *event = NULL;
+ if (priv->mon != mon) {
+ VIR_DEBUG("Domain %p '%s' has been shutdown, ignoring EOF",
+ vm, vm->def->name);
+ return;
+ }
+
if (qemuProcessBeginStopJob(vm, VIR_JOB_DESTROY, true) < 0)
return;
@@ -4082,7 +4089,7 @@ static void qemuProcessEventHandler(void *data, void *opaque)
processJobStatusChangeEvent(vm, processEvent->data);
break;
case QEMU_PROCESS_EVENT_MONITOR_EOF:
- processMonitorEOFEvent(driver, vm);
+ processMonitorEOFEvent(driver, vm, processEvent->data);
break;
case QEMU_PROCESS_EVENT_PR_DISCONNECT:
processPRDisconnectEvent(vm);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f32e82bbd1..ff54e947fe 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -316,7 +316,7 @@ qemuProcessHandleMonitorEOF(qemuMonitor *mon,
}
qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_MONITOR_EOF,
- 0, 0, NULL);
+ 0, 0, priv->mon);
/* We don't want this EOF handler to be called over and over while the
* thread is waiting for a job.
--
2.17.1
10 months, 4 weeks
[libvirt PATCH] test: remove redundant cpuTestGuestCPUID test
by Jonathon Jongsma
DO_TEST_CPUID(arch, host, json) is a multipart test. It consists of the
following tests:
- cpuTestHostCPUID()
- cpuTestGuestCPUID(with JSON_* flag)
- cpuTestCPUIDSignature()
- DO_TEST_JSON():
- if json==JSON_MODELS:
- cpuTestGuestCPUID(without JSON_* flag)
- cpuTestJSONCPUID()
- cputestJSONSignature()
Notice that for tests with json==JSON_MODELS, cpuTestGuestCPUID() is
actually called twice but with different arguments. The first one passes
JSON_MODELS to the test function, while the second one passes 0.
The main difference in behavior when calling cpuTestGuestCPUID() with or
without the flag is that in the first case, it parses the captured qemu
output from $ARCH-cpuid-$CPU.json. It extracts the cpu model list from
that JSON, and uses that to filter out possible cpu models to match.
In other words, it tries to match the cpu to a model that was supported
by the qemu version that was used to generate this JSON file. When it
finds a match, it generates a cpu definition and compares the xml form
of that definition with the file $ARCH-cpuid-$CPU-guest.xml.
When called without the JSON_MODELS flag, it simply attempts to match it
against the full libvirt cpu map and doesn't attempt to filter out any
matches based on the JSON qemu cpu model list. After it finds a match,
it generates an xml definition for the cpu and compares it to the same
file listed above. So if these two invocations disagree on the cpu match
(e.g. because libvirt has added a cpu model to its cpu map that matches
better than one that was supported by the version of qemu that generated
the JSON file) the test will fail.
This duplicate call to cpuTestGuestCPUID() was originally added in
commit 49c945a6f5c885394507f88086cc2f9461df7c27. The original
justification for that commit was to fix test failures when the Qemu
driver was disabled. But since DO_TEST_JSON() is #defined empty when
qemu is disabled, this particular invocation would not even be executed
in this scenario, so it doesn't seem relevant.
Signed-off-by: Jonathon Jongsma <jjongsma(a)redhat.com>
---
tests/cputest.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/tests/cputest.c b/tests/cputest.c
index b3253e3116..93cd0e12a7 100644
--- a/tests/cputest.c
+++ b/tests/cputest.c
@@ -993,10 +993,6 @@ mymain(void)
#if WITH_QEMU
# define DO_TEST_JSON(arch, host, json) \
do { \
- if (json == JSON_MODELS) { \
- DO_TEST(arch, cpuTestGuestCPUID, host, host, \
- NULL, NULL, 0, NULL, 0, 0); \
- } \
if (json != JSON_NONE) { \
DO_TEST(arch, cpuTestJSONCPUID, host, host, \
NULL, NULL, 0, NULL, json, 0); \
--
2.41.0
10 months, 4 weeks