[PATCH 00/16] Farewell rpcgen
by Daniel P. Berrangé
This series something I was hacking on a little while back in an
attempt to make our RPC layer more maintainable. There are many
aspects I'm unhappy about with current code
* When serializing a message we have no clue how big
it will be, but xdrmem_create wants a fixed size,
so we have to keep trying to serialize in a loop
making it bigger each time
* We don't control memory allocation/free'ing directly
so we can't do a virSecureErase on fields inside the
RPC message struct that handle secrets.
* The XDR API is generally unpleasant to use as it is
outside our virNetMessage object. Ideally we would
be reading/writing directly from/to the virNetMessage
buffer with APIs on virNetMessage,instead of indirectly
via a XDR object.
* We want more from XDR than it actually gives us. Our
XDR protocol files have annotations to express what
we want our code generator todo, or for ACLs. The
relationship between the structs and the message
numbers is implicit. Essentially we've defined our
own language indirectly via comments, and then
parse this with regexes which is horrid.
* The code rpcgen creates is poor quality which we have
to post-process to fix bugs/problems. It also lacks
support for modern features like g_auto.
Anyway, in a fit of rage I looked at the XDR RFC and thought..
This language is trivial, why do we need to outsource to
rpcgen and libtirpc instead of dealing with it directly.
This small series moves in that direction. It creates an
XDR language lexer and parser, and then a code generator
which emits code that is (nearly) identical to what rpcgen
would emit. This is sufficient to eliminate rpcgen usage
and support g_auto. Since we're still using libtirpc
at this stage we can be confident we're still doing the
same thing on the wire. I've got some unit tests too
with the rpcgen generation to validate stuff.
The next step is to change the code generator so that
instead of generating code for libtirpc APIs, it will
instead directly speak virNetMessage APIs. That would
give us full control over our RPC stack guaranteed to
be platform portable instead of fighting slight differences
in RPC libraries (eg xdr_quad vs xdr_int64 madness).
I was going to wait until I had written such code before
sending this series, but I've got diverted onto other more
important tasks. So here at least is what I have so far.
After that foundation is done, we are in a place where
we can actually do more innovative things. For example
we can directly extend the XDR protocol language if we
like, turning our magic comments into properly parsable
constructs.
With this, we would be in a position to replace our
Perl RPC client/server dispatch code generators with
something that is more supportable in Python. The python
code would work with properly represented objects and
formal parsers and not regexes and anonymous complex
perl data structures.
Daniel P. Berrangé (16):
rpcgen: drop type-puning workarounds
build-aux: skip E203 and W503 flake8 checks
build-aux: introduce 'black' tool for python formatting
rpcgen: add an XDR protocol lexer
rpcgen: add an XDR protocol abstract syntax tree
rpcgen: add an XDR protocol parser
rpcgen: define a visitor API for XDR protocol specs
rpcgen: add a C code generator for XDR protocol specs
rpcgen: add test case for XDR serialization
rpcgen: define entrypoint for running new rpcgen impl
build: switch over to new rpc generator code
rpcgen: add g_auto function support
rpc: use g_auto for client RPC return parameters
admin: use g_auto for client RPC return parameters
remote: use g_auto for client RPC return parameters
rpc: add helpers for XDR type serialization
build-aux/Makefile.in | 1 +
build-aux/meson.build | 5 +
build-aux/syntax-check.mk | 42 +-
libvirt.spec.in | 2 +-
meson.build | 13 +-
scripts/meson.build | 2 +
scripts/rpcgen/main.py | 90 ++
scripts/rpcgen/meson.build | 16 +
scripts/rpcgen/rpcgen/ast.py | 270 ++++++
scripts/rpcgen/rpcgen/generator.py | 509 ++++++++++++
scripts/rpcgen/rpcgen/lexer.py | 213 +++++
scripts/rpcgen/rpcgen/meson.build | 7 +
scripts/rpcgen/rpcgen/parser.py | 497 +++++++++++
scripts/rpcgen/rpcgen/visitor.py | 156 ++++
scripts/rpcgen/tests/demo.c | 495 +++++++++++
scripts/rpcgen/tests/demo.h | 264 ++++++
scripts/rpcgen/tests/demo.x | 127 +++
scripts/rpcgen/tests/meson.build | 20 +
scripts/rpcgen/tests/simple.x | 35 +
scripts/rpcgen/tests/test_demo.c | 782 ++++++++++++++++++
scripts/rpcgen/tests/test_demo_enum.bin | Bin 0 -> 4 bytes
.../tests/test_demo_enum_fixed_array.bin | Bin 0 -> 52 bytes
.../tests/test_demo_enum_pointer_null.bin | Bin 0 -> 4 bytes
.../tests/test_demo_enum_pointer_set.bin | Bin 0 -> 8 bytes
.../rpcgen/tests/test_demo_enum_scalar.bin | Bin 0 -> 4 bytes
.../test_demo_enum_variable_array_empty.bin | Bin 0 -> 4 bytes
.../test_demo_enum_variable_array_set.bin | Bin 0 -> 16 bytes
.../tests/test_demo_int_fixed_array.bin | Bin 0 -> 12 bytes
.../tests/test_demo_int_pointer_null.bin | Bin 0 -> 4 bytes
.../tests/test_demo_int_pointer_set.bin | Bin 0 -> 8 bytes
scripts/rpcgen/tests/test_demo_int_scalar.bin | Bin 0 -> 4 bytes
.../test_demo_int_variable_array_empty.bin | Bin 0 -> 4 bytes
.../test_demo_int_variable_array_set.bin | Bin 0 -> 16 bytes
.../tests/test_demo_opaque_fixed_array.bin | Bin 0 -> 12 bytes
.../test_demo_opaque_variable_array_empty.bin | Bin 0 -> 4 bytes
.../test_demo_opaque_variable_array_set.bin | Bin 0 -> 8 bytes
.../test_demo_string_variable_array_empty.bin | Bin 0 -> 4 bytes
.../test_demo_string_variable_array_set.bin | Bin 0 -> 12 bytes
scripts/rpcgen/tests/test_demo_struct.bin | Bin 0 -> 8 bytes
.../tests/test_demo_struct_fixed_array.bin | Bin 0 -> 136 bytes
.../tests/test_demo_struct_pointer_null.bin | Bin 0 -> 4 bytes
.../tests/test_demo_struct_pointer_set.bin | Bin 0 -> 12 bytes
.../rpcgen/tests/test_demo_struct_scalar.bin | 1 +
.../test_demo_struct_variable_array_empty.bin | Bin 0 -> 4 bytes
.../test_demo_struct_variable_array_set.bin | Bin 0 -> 28 bytes
.../tests/test_demo_test_struct_all_types.bin | Bin 0 -> 1752 bytes
scripts/rpcgen/tests/test_demo_union_case.bin | Bin 0 -> 8 bytes
.../rpcgen/tests/test_demo_union_default.bin | Bin 0 -> 8 bytes
.../tests/test_demo_union_fixed_array.bin | Bin 0 -> 168 bytes
.../tests/test_demo_union_no_default_case.bin | Bin 0 -> 8 bytes
.../tests/test_demo_union_pointer_null.bin | Bin 0 -> 4 bytes
.../tests/test_demo_union_pointer_set.bin | Bin 0 -> 12 bytes
.../rpcgen/tests/test_demo_union_scalar.bin | Bin 0 -> 8 bytes
.../test_demo_union_variable_array_empty.bin | Bin 0 -> 4 bytes
.../test_demo_union_variable_array_set.bin | Bin 0 -> 28 bytes
.../test_demo_union_void_default_case.bin | Bin 0 -> 8 bytes
.../test_demo_union_void_default_default.bin | 1 +
scripts/rpcgen/tests/test_generator.py | 60 ++
scripts/rpcgen/tests/test_lexer.py | 116 +++
scripts/rpcgen/tests/test_parser.py | 91 ++
src/admin/admin_remote.c | 50 +-
src/admin/meson.build | 8 +-
src/locking/meson.build | 8 +-
src/logging/meson.build | 8 +-
src/lxc/meson.build | 12 +-
src/remote/meson.build | 8 +-
src/remote/remote_driver.c | 754 ++++++-----------
src/rpc/gendispatch.pl | 60 +-
src/rpc/genprotocol.pl | 144 ----
src/rpc/meson.build | 9 +-
src/rpc/virnetmessage.c | 704 ++++++++++++++++
src/rpc/virnetmessage.h | 88 ++
72 files changed, 4897 insertions(+), 771 deletions(-)
create mode 100755 scripts/rpcgen/main.py
create mode 100644 scripts/rpcgen/meson.build
create mode 100644 scripts/rpcgen/rpcgen/ast.py
create mode 100644 scripts/rpcgen/rpcgen/generator.py
create mode 100644 scripts/rpcgen/rpcgen/lexer.py
create mode 100644 scripts/rpcgen/rpcgen/meson.build
create mode 100644 scripts/rpcgen/rpcgen/parser.py
create mode 100644 scripts/rpcgen/rpcgen/visitor.py
create mode 100644 scripts/rpcgen/tests/demo.c
create mode 100644 scripts/rpcgen/tests/demo.h
create mode 100644 scripts/rpcgen/tests/demo.x
create mode 100644 scripts/rpcgen/tests/meson.build
create mode 100644 scripts/rpcgen/tests/simple.x
create mode 100644 scripts/rpcgen/tests/test_demo.c
create mode 100644 scripts/rpcgen/tests/test_demo_enum.bin
create mode 100644 scripts/rpcgen/tests/test_demo_enum_fixed_array.bin
create mode 100644 scripts/rpcgen/tests/test_demo_enum_pointer_null.bin
create mode 100644 scripts/rpcgen/tests/test_demo_enum_pointer_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_enum_scalar.bin
create mode 100644 scripts/rpcgen/tests/test_demo_enum_variable_array_empty.bin
create mode 100644 scripts/rpcgen/tests/test_demo_enum_variable_array_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_int_fixed_array.bin
create mode 100644 scripts/rpcgen/tests/test_demo_int_pointer_null.bin
create mode 100644 scripts/rpcgen/tests/test_demo_int_pointer_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_int_scalar.bin
create mode 100644 scripts/rpcgen/tests/test_demo_int_variable_array_empty.bin
create mode 100644 scripts/rpcgen/tests/test_demo_int_variable_array_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_opaque_fixed_array.bin
create mode 100644 scripts/rpcgen/tests/test_demo_opaque_variable_array_empty.bin
create mode 100644 scripts/rpcgen/tests/test_demo_opaque_variable_array_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_string_variable_array_empty.bin
create mode 100644 scripts/rpcgen/tests/test_demo_string_variable_array_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_struct.bin
create mode 100644 scripts/rpcgen/tests/test_demo_struct_fixed_array.bin
create mode 100644 scripts/rpcgen/tests/test_demo_struct_pointer_null.bin
create mode 100644 scripts/rpcgen/tests/test_demo_struct_pointer_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_struct_scalar.bin
create mode 100644 scripts/rpcgen/tests/test_demo_struct_variable_array_empty.bin
create mode 100644 scripts/rpcgen/tests/test_demo_struct_variable_array_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_test_struct_all_types.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_case.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_default.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_fixed_array.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_no_default_case.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_pointer_null.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_pointer_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_scalar.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_variable_array_empty.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_variable_array_set.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_void_default_case.bin
create mode 100644 scripts/rpcgen/tests/test_demo_union_void_default_default.bin
create mode 100644 scripts/rpcgen/tests/test_generator.py
create mode 100644 scripts/rpcgen/tests/test_lexer.py
create mode 100644 scripts/rpcgen/tests/test_parser.py
delete mode 100755 src/rpc/genprotocol.pl
--
2.39.1
6 months, 1 week
Re: [libvirt] [Qemu-devel] Qemu migration with vhost-user-blk on top of local storage
by Stefan Hajnoczi
On Wed, Jan 09, 2019 at 06:23:42PM +0800, wuzhouhui wrote:
> Hi everyone,
>
> I'm working qemu with vhost target (e.g. spdk), and I attempt to migrate VM with
> 2 local storages. One local storage is a regular file, e.g. /tmp/c74.qcow2, and
> the other is a malloc bdev that spdk created. This malloc bdev will exported to
> VM via vhost-user-blk. When I execute following command:
>
> virsh migrate --live --persistent --unsafe --undefinesource --copy-storage-all \
> --p2p --auto-converge --verbose --desturi qemu+tcp://<uri>/system vm0
>
> The libvirt reports:
>
> qemu-2.12.1: error: internal error: unable to execute QEMU command \
> 'nbd-server-add': Cannot find device=drive-virtio-disk1 nor \
> node_name=drive-virtio-disk1
Please post your libvirt domain XML.
> Does it means that qemu with spdk on top of local storage don't support migration?
>
> QEMU: 2.12.1
> SPDK: 18.10
vhost-user-blk bypasses the QEMU block layer, so NBD storage migration
at the QEMU level will not work for the vhost-user-blk disk.
Stefan
6 months, 2 weeks
[libvirt] [PATCH v3] openvswitch: Add new port VLAN mode "dot1q-tunnel"
by luzhipeng@uniudc.com
From: ZhiPeng Lu <luzhipeng(a)uniudc.com>
Signed-off-by: ZhiPeng Lu <luzhipeng(a)uniudc.com>
---
v1->v2:
1. Fix "make syntax-check" failure
v2->v3:
1. remove other_config when updating vlan
docs/formatnetwork.html.in | 17 +++++++++--------
docs/schemas/networkcommon.rng | 1 +
src/conf/netdev_vlan_conf.c | 2 +-
src/util/virnetdevopenvswitch.c | 7 +++++++
src/util/virnetdevvlan.h | 1 +
5 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/docs/formatnetwork.html.in b/docs/formatnetwork.html.in
index 363a72b..3c1ae62 100644
--- a/docs/formatnetwork.html.in
+++ b/docs/formatnetwork.html.in
@@ -688,16 +688,17 @@
</p>
<p>
For network connections using Open vSwitch it is also possible
- to configure 'native-tagged' and 'native-untagged' VLAN modes
+ to configure 'native-tagged' and 'native-untagged' and 'dot1q-tunnel'
+ VLAN modes.
<span class="since">Since 1.1.0.</span> This is done with the
- optional <code>nativeMode</code> attribute on
- the <code><tag></code> subelement: <code>nativeMode</code>
- may be set to 'tagged' or 'untagged'. The <code>id</code>
- attribute of the <code><tag></code> subelement
- containing <code>nativeMode</code> sets which VLAN is considered
- to be the "native" VLAN for this interface, and
+ optional <code>nativeMode</code> attribute on the
+ <code><tag></code> subelement: <code>nativeMode</code>
+ may be set to 'tagged' or 'untagged' or 'dot1q-tunnel'.
+ The <code>id</code> attribute of the <code><tag></code>
+ subelement containing <code>nativeMode</code> sets which VLAN is
+ considered to be the "native" VLAN for this interface, and
the <code>nativeMode</code> attribute determines whether or not
- traffic for that VLAN will be tagged.
+ traffic for that VLAN will be tagged or QinQ.
</p>
<p>
<code><vlan></code> elements can also be specified in
diff --git a/docs/schemas/networkcommon.rng b/docs/schemas/networkcommon.rng
index 2699555..11c48ff 100644
--- a/docs/schemas/networkcommon.rng
+++ b/docs/schemas/networkcommon.rng
@@ -223,6 +223,7 @@
<choice>
<value>tagged</value>
<value>untagged</value>
+ <value>dot1q-tunnel</value>
</choice>
</attribute>
</optional>
diff --git a/src/conf/netdev_vlan_conf.c b/src/conf/netdev_vlan_conf.c
index dff49c6..79710d9 100644
--- a/src/conf/netdev_vlan_conf.c
+++ b/src/conf/netdev_vlan_conf.c
@@ -29,7 +29,7 @@
#define VIR_FROM_THIS VIR_FROM_NONE
VIR_ENUM_IMPL(virNativeVlanMode, VIR_NATIVE_VLAN_MODE_LAST,
- "default", "tagged", "untagged")
+ "default", "tagged", "untagged", "dot1q-tunnel")
int
virNetDevVlanParse(xmlNodePtr node, xmlXPathContextPtr ctxt, virNetDevVlanPtr def)
diff --git a/src/util/virnetdevopenvswitch.c b/src/util/virnetdevopenvswitch.c
index 8fe06fd..9fec30b 100644
--- a/src/util/virnetdevopenvswitch.c
+++ b/src/util/virnetdevopenvswitch.c
@@ -91,6 +91,11 @@ virNetDevOpenvswitchConstructVlans(virCommandPtr cmd, virNetDevVlanPtr virtVlan)
virCommandAddArg(cmd, "vlan_mode=native-untagged");
virCommandAddArgFormat(cmd, "tag=%d", virtVlan->nativeTag);
break;
+ case VIR_NATIVE_VLAN_MODE_DOT1Q_TUNNEL:
+ virCommandAddArg(cmd, "vlan_mode=dot1q-tunnel");
+ virCommandAddArg(cmd, "other_config:qinq-ethtype=802.1q");
+ virCommandAddArgFormat(cmd, "tag=%d", virtVlan->nativeTag);
+ break;
case VIR_NATIVE_VLAN_MODE_DEFAULT:
default:
break;
@@ -504,6 +509,8 @@ int virNetDevOpenvswitchUpdateVlan(const char *ifname,
"--", "--if-exists", "clear", "Port", ifname, "tag",
"--", "--if-exists", "clear", "Port", ifname, "trunk",
"--", "--if-exists", "clear", "Port", ifname, "vlan_mode",
+ "--", "--if-exists", "remove", "Port", ifname, "other_config",
+ "qinq-ethtype", NULL,
"--", "--if-exists", "set", "Port", ifname, NULL);
if (virNetDevOpenvswitchConstructVlans(cmd, virtVlan) < 0)
diff --git a/src/util/virnetdevvlan.h b/src/util/virnetdevvlan.h
index be85f59..0667f9d 100644
--- a/src/util/virnetdevvlan.h
+++ b/src/util/virnetdevvlan.h
@@ -29,6 +29,7 @@ typedef enum {
VIR_NATIVE_VLAN_MODE_DEFAULT = 0,
VIR_NATIVE_VLAN_MODE_TAGGED,
VIR_NATIVE_VLAN_MODE_UNTAGGED,
+ VIR_NATIVE_VLAN_MODE_DOT1Q_TUNNEL,
VIR_NATIVE_VLAN_MODE_LAST
} virNativeVlanMode;
--
1.8.3.1
6 months, 2 weeks
[libvirt] [PATCH] Fix compile error for stable 1.2.9
by Yang hongyang
Seems a backport miss. An extra member is passed to struct
virLXCBasicMountInfo.
Signed-off-by: Yang hongyang <hongyang.yang(a)easystack.cn>
---
src/lxc/lxc_container.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 28dabec..1c65fa9 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -760,7 +760,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = {
{ "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false },
- { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
+ { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false },
{ "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false },
{ "securityfs", "/sys/kernel/security", "securityfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true },
#if WITH_SELINUX
--
1.7.1
6 months, 2 weeks
[libvirt] Supporting vhost-net and macvtap in libvirt for QEMU
by Anthony Liguori
Disclaimer: I am neither an SR-IOV nor a vhost-net expert, but I've CC'd
people that are who can throw tomatoes at me for getting bits wrong :-)
I wanted to start a discussion about supporting vhost-net in libvirt.
vhost-net has not yet been merged into qemu but I expect it will be soon
so it's a good time to start this discussion.
There are two modes worth supporting for vhost-net in libvirt. The
first mode is where vhost-net backs to a tun/tap device. This is
behaves in very much the same way that -net tap behaves in qemu today.
Basically, the difference is that the virtio backend is in the kernel
instead of in qemu so there should be some performance improvement.
Current, libvirt invokes qemu with -net tap,fd=X where X is an already
open fd to a tun/tap device. I suspect that after we merge vhost-net,
libvirt could support vhost-net in this mode by just doing -net
vhost,fd=X. I think the only real question for libvirt is whether to
provide a user visible switch to use vhost or to just always use vhost
when it's available and it makes sense. Personally, I think the later
makes sense.
The more interesting invocation of vhost-net though is one where the
vhost-net device backs directly to a physical network card. In this
mode, vhost should get considerably better performance than the current
implementation. I don't know the syntax yet, but I think it's
reasonable to assume that it will look something like -net
tap,dev=eth0. The effect will be that eth0 is dedicated to the guest.
On most modern systems, there is a small number of network devices so
this model is not all that useful except when dealing with SR-IOV
adapters. In that case, each physical device can be exposed as many
virtual devices (VFs). There are a few restrictions here though. The
biggest is that currently, you can only change the number of VFs by
reloading a kernel module so it's really a parameter that must be set at
startup time.
I think there are a few ways libvirt could support vhost-net in this
second mode. The simplest would be to introduce a new tag similar to
<source network='br0'>. In fact, if you probed the device type for the
network parameter, you could probably do something like <source
network='eth0'> and have it Just Work.
Another model would be to have libvirt see an SR-IOV adapter as a
network pool whereas it handled all of the VF management. Considering
how inflexible SR-IOV is today, I'm not sure whether this is the best model.
Has anyone put any more thought into this problem or how this should be
modeled in libvirt? Michael, could you share your current thinking for
-net syntax?
--
Regards,
Anthony Liguori
6 months, 2 weeks
[PATCH libvirt v1 0/3] Ensure full early console access with libvirt
by Marc Hartmayer
Currently, early console output may be lost, e.g. if starting a guest with
`virsh start --console` guest, which can make debugging of early failures very
difficult
(like zipl messages or disabled wait conditions happening early). This is
because QEMU may emit serial console output before the libvirt console client
starts to consume data from the pts. This can be prevented by starting the guest
in paused state, connect to the console and then resume the guest.
Note: There is still a problem in QEMU itself, see QEMU patch series `[PATCH]
chardev/char-pty: Avoid losing bytes when the other side just (re-)connected`
[1]
Changelog:
RFCv1->v1:
+ rebased on current master
+ worked in comments from Daniel
[1] https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg02725.html
Marc Hartmayer (3):
virsh: add `console --resume` support
Improve `virsh start --console` behavior
Improve `virsh create --console` behavior
tools/virsh-console.c | 8 ++++
tools/virsh-console.h | 1 +
tools/virsh-domain.c | 94 ++++++++++++++++++++++++++++++++-----------
3 files changed, 80 insertions(+), 23 deletions(-)
base-commit: dd403f8873cf8de7675b89ed757a4228af7bc05e
--
2.34.1
6 months, 2 weeks
[libvirt RFCv11 00/33] multifd save restore prototype
by Claudio Fontana
This is v11 of the multifd save prototype, which focuses on saving
to a single file instead of requiring multiple separate files for
multifd channels.
This series demonstrates a way to save and restore from a single file
by using interleaved channels of a size equal to the transfer buffer
size (1MB), and relying on UNIX holes to avoid wasting physical disk
space due to channels of different size.
KNOWN ISSUES:
a) still applies only to save/restore (no managed save etc)
b) this is not not done in QEMU, where it could be possible to teach
QEMU to migrate directly to a file or block device in a
block-aligned way by altering the migration stream code and the
state migration code of all devices.
changes from v10:
* virfile: add new API virFileDiskCopyChannel, which extends the
existing virFileDiskCopy to work with parallel channels in the file.
* drop use of virthread API, use GLIB for threads.
* pass only a single FD to the multifd-helper, which will then open
additional FDs as required for the multithreaded I/O.
* simplify virQEMUSaveFd API, separating the initialization from the
addition of extra channels.
* adapt all documentation to mention a single file instead of multiple.
* remove the "Lim" versions of virFileDirectRead and Write, they are
not needed.
---
changes from v9:
* exposed virFileDirectAlign
* separated the >= 2 QEMU_SAVE_VERSION change in own patch
* reworked the write code to add the alignment padding to the
data_len, making the on disk format compatible when loaded
from an older libvirt.
* reworked the read code to use direct I/O APIs only for actual
direct I/O file descriptors, so as to make old images work
with newer libvirt.
---
changes from v8:
* rebased on master
* reordered patches to add more upstreamable content at the start
* split introduction of virQEMUSaveFd, so the first part is multifd-free
* new virQEMUSaveDataRead as a mirror of virQEMUSaveDataWrite
* introduced virFileDirect API, using it in virFileDisk operations and
for virQEMUSaveRead and virQEMUSaveWrite
---
changes from v7:
* [ base params API and iohelper refactoring upstreamed ]
* extended the QEMU save image format more, to record the nr
of multifd channels on save. Made the data header struct packed.
* removed --parallel-connections from the restore command, as now
it is useless due to QEMU save image format extension.
* separate out patches to expose migration_params APIs to saveimage,
including qemuMigrationParamsSetString, SetCap, SetInt.
* fixed bugs in the ImageOpen patch (missing saveFd init), removed
some whitespace, and fixed some convoluted code paths for return
value -3.
---
changes from v6:
* improved error path handling, with error messages and especially
cancellation of qemu process on error during restore.
* split patches more and reordered them to keep general refactoring
at the beginning before the --parallel stuff is introduced.
* improved multifd compression support, including adding an enum
and extending the QEMU save image format to record the compression
used on save, and pick it up automatically on restore.
---
changes from v4:
* runIO renamed to virFileDiskCopy and rethought arguments
* renamed new APIs from ...ParametersFlags to ...Params
* introduce the new virDomainSaveParams and virDomainRestoreParams
without any additional parameters, so they can be upstreamed first.
* solved the issue in the gendispatch.pl script generating code that
was missing the conn parameter.
---
changes from v3:
* reordered series to have all helper-related change at the start
* solved all reported issues from ninja test, including documentation
* fixed most broken migration capabilities code (still imperfect likely)
* added G_GNUC_UNUSED as needed
* after multifd restore, added what I think were the missing operations:
qemuProcessRefreshState(),
qemuProcessStartCPUs() - most importantly,
virDomainObjSave()
The domain now starts running after restore without further encouragement
* removed the sleep(10) from the multifd-helper
---
changes from v2:
* added ability to restore the VM from disk using multifd
* fixed the multifd-helper to work in both directions,
assuming the need to listen for save, and connect for restore.
* fixed a large number of bugs, and probably introduced some :-)
---
Claudio Fontana (33):
virfile: introduce virFileDirect APIs
virfile: use virFileDirect API in runIOCopy
qemu: saveimage: rework image read/write to be O_DIRECT friendly
qemu: saveimage: assume future formats will also support compression
virfile: virFileDiskCopy: prepare for O_DIRECT files without wrapper
qemu: saveimage: introduce virQEMUSaveFd
qemu: saveimage: convert qemuSaveImageCreate to use virQEMUSaveFd
qemu: saveimage: convert qemuSaveImageOpen to use virQEMUSaveFd
tools: prepare doSave to use parameters
tools: prepare cmdRestore to use parameters
libvirt: add new VIR_DOMAIN_SAVE_PARALLEL flag and parameter
qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in save
qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in restore
virfile: add new API virFileDiskCopyChannel
multifd-helper: new helper for parallel save/restore
qemu: saveimage: update virQEMUSaveFd struct for parallel save
qemu: saveimage: wire up saveimage code with the multifd helper
qemu: capabilities: add multifd to the probed migration capabilities
qemu: saveimage: add multifd related fields to save format
qemu: migration_params: add APIs to set Int and Cap
qemu: migration: implement qemuMigrationSrcToFilesMultiFd for save
qemu: add parameter to qemuMigrationDstRun to skip waiting
qemu: implement qemuSaveImageLoadMultiFd for restore
tools: add parallel parameter to virsh save command
tools: add parallel parameter to virsh restore command
qemu: add migration parameter multifd-compression
libvirt: add new VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION
qemu: saveimage: add parallel compression argument to ImageCreate
qemu: saveimage: add stub support for multifd compression parameter
qemu: migration: expose qemuMigrationParamsSetString
qemu: saveimage: implement multifd-compression in parallel save
qemu: saveimage: restore compressed parallel images
tools: add parallel-compression parameter to virsh save command
docs/manpages/virsh.rst | 26 +-
include/libvirt/libvirt-domain.h | 24 +
po/POTFILES | 1 +
src/libvirt_private.syms | 6 +
src/qemu/qemu_capabilities.c | 4 +
src/qemu/qemu_capabilities.h | 2 +
src/qemu/qemu_driver.c | 146 ++--
src/qemu/qemu_migration.c | 160 ++--
src/qemu/qemu_migration.h | 16 +-
src/qemu/qemu_migration_params.c | 71 +-
src/qemu/qemu_migration_params.h | 15 +
src/qemu/qemu_process.c | 3 +-
src/qemu/qemu_process.h | 5 +-
src/qemu/qemu_saveimage.c | 703 +++++++++++++-----
src/qemu/qemu_saveimage.h | 69 +-
src/qemu/qemu_snapshot.c | 6 +-
src/util/iohelper.c | 3 +
src/util/meson.build | 16 +
src/util/multifd-helper.c | 359 +++++++++
src/util/virfile.c | 391 +++++++---
src/util/virfile.h | 11 +
.../caps_4.0.0.aarch64.xml | 1 +
.../qemucapabilitiesdata/caps_4.0.0.ppc64.xml | 1 +
.../caps_4.0.0.riscv32.xml | 1 +
.../caps_4.0.0.riscv64.xml | 1 +
.../qemucapabilitiesdata/caps_4.0.0.s390x.xml | 1 +
.../caps_4.0.0.x86_64.xml | 1 +
.../caps_4.1.0.x86_64.xml | 1 +
.../caps_4.2.0.aarch64.xml | 1 +
.../qemucapabilitiesdata/caps_4.2.0.ppc64.xml | 1 +
.../qemucapabilitiesdata/caps_4.2.0.s390x.xml | 1 +
.../caps_4.2.0.x86_64.xml | 1 +
.../caps_5.0.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_5.0.0.ppc64.xml | 2 +
.../caps_5.0.0.riscv64.xml | 2 +
.../caps_5.0.0.x86_64.xml | 2 +
.../qemucapabilitiesdata/caps_5.1.0.sparc.xml | 2 +
.../caps_5.1.0.x86_64.xml | 2 +
.../caps_5.2.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_5.2.0.ppc64.xml | 2 +
.../caps_5.2.0.riscv64.xml | 2 +
.../qemucapabilitiesdata/caps_5.2.0.s390x.xml | 2 +
.../caps_5.2.0.x86_64.xml | 2 +
.../caps_6.0.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_6.0.0.s390x.xml | 2 +
.../caps_6.0.0.x86_64.xml | 2 +
.../caps_6.1.0.x86_64.xml | 2 +
.../caps_6.2.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_6.2.0.ppc64.xml | 2 +
.../caps_6.2.0.x86_64.xml | 2 +
.../caps_7.0.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_7.0.0.ppc64.xml | 2 +
.../caps_7.0.0.x86_64.xml | 2 +
.../caps_7.1.0.x86_64.xml | 2 +
tools/virsh-domain.c | 101 ++-
55 files changed, 1748 insertions(+), 445 deletions(-)
create mode 100644 src/util/multifd-helper.c
--
2.26.2
6 months, 2 weeks
[PATCH v2] lxc: fix lxcContainerMountAllFS() DEREF_BEFORE_CHECK
by Dmitry Frolov
vmDef->fss[i]->src->path may be NULL,
so check is needed before passing it to VIR_DEBUG.
Also removed checking vmDef->fss[i]->src for NULL, since it may not be NULL.
Fixes: 57487085dc ("lxc: don't try to reference NULL when mounting filesystems")
Signed-off-by: Dmitry Frolov <frolov(a)swemel.ru>
---
src/lxc/lxc_container.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 21220661f7..9601061b23 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -1467,12 +1467,12 @@ static int lxcContainerMountAllFS(virDomainDef *vmDef,
if (STREQ(vmDef->fss[i]->dst, "/"))
continue;
- VIR_DEBUG("Mounting '%s' -> '%s'", vmDef->fss[i]->src->path, vmDef->fss[i]->dst);
+ VIR_DEBUG("Mounting '%s' -> '%s'", NULLSTR(vmDef->fss[i]->src->path), vmDef->fss[i]->dst);
if (lxcContainerResolveSymlinks(vmDef->fss[i], false) < 0)
return -1;
- if (!(vmDef->fss[i]->src && vmDef->fss[i]->src->path &&
+ if (!(vmDef->fss[i]->src->path &&
STRPREFIX(vmDef->fss[i]->src->path, vmDef->fss[i]->dst)) &&
lxcContainerUnmountSubtree(vmDef->fss[i]->dst, false) < 0)
return -1;
--
2.34.1
6 months, 3 weeks
[libvirt PATCH] logging: lockdown the systemd service configuration
by Daniel P. Berrangé
The 'systemd-analyze security' command looks at the unit file
configuration and reports on any settings which increase the
attack surface for the daemon. Since most systemd units are
fairly minimalist, this is generally informing us about settings
that we never put any thought into using before.
In its current configuration it reports
# systemd-analyze security virtlogd.service
...snip...
→ Overall exposure level for virtlogd.service: 9.6 UNSAFE 😨
which is pretty terrible as a score.
If we apply all of the recommendations that appear possible
without (knowingly) breaking functionality it reports:
# systemd-analyze security virtlogd.service
...snip...
→ Overall exposure level for virtlogd.service: 2.2 OK 🙂
which is a pretty decent improvement.
Some of the settings we would like to enable require a systemd
version that is newer than that available in our oldest distro
target - RHEL-8 at v239.
NB, RestrictSUIDSGID is technically newer than 239, but RHEL-8
backported it, and other distros we target have it by default.
Remaining recommendations are
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)
We block FOWNER/IPC_OWNER, but can't block the two DAC
capabilities. Historically apps/users might point QEMU
to log files in $HOME, pre-created with their own user
ID.
✗ IPAddressDeny=
Not required since RestrictAddressFamilies blocks IP
usage. Ignoring this avoids the overhead of creating
a traffic filter than will never be used.
✗ NoNewPrivileges=
Highly desirable, but cannot enable it yet, because it
will block the ability to transition to the virtlogd_t
SELinux domain during execve. The SELinux policy needs
fixing to permit this transition under NNP first.
✗ PrivateTmp=
There is a decent chance people have VMs configured
with a serial port logfile pointing at /tmp. We would
cause a regression to use private /tmp for logging
✗ PrivateUsers=
This would put virtlogd inside a user namespace where
its root is in fact unprivileged. Same problem as the
User= setting below
✗ ProcSubset=
Libraries we link to might read certain non-PID related
files from /proc
✗ ProtectClock=
Requires v245
✗ ProtectHome=
Same problem as PrivateTmp=. There's a decent chance
that someone has a VM configured to write a logfile
to /home
✗ ProtectHostname=
Requires v241
✗ ProtectKernelLogs
Requires v244
✗ ProtectProc
Requires v247
✗ ProtectSystem=
We only set it to 'full', as 'strict' is not viable for
our required usage
✗ RootDirectory=/RootImage=
We are not capable of running inside a custom chroot
given needs to write log files to arbitrary places
✗ RestrictAddressFamilies=~AF_UNIX
We need AF_UNIX to communicate with other libvirt daemons
✗ SystemCallFilter=~@resources
We link to libvirt.so which links to libnuma.so which has
a constructor that calls set_mempolicy. This is highly
undesirable todo during a constructor.
✗ User=/DynamicUser=
This is highly desirable, but we currently read/write
logs as root, and directories we're told to write into
could be anywhere. So using a non-root user would have
a major risk of regressions for applications and also
have upgrade implications
Signed-off-by: Daniel P. Berrangé <berrange(a)redhat.com>
---
src/logging/virtlogd.service.in | 94 +++++++++++++++++++++++++++++++++
1 file changed, 94 insertions(+)
diff --git a/src/logging/virtlogd.service.in b/src/logging/virtlogd.service.in
index 8e245ddb43..9e3838ff34 100644
--- a/src/logging/virtlogd.service.in
+++ b/src/logging/virtlogd.service.in
@@ -20,5 +20,99 @@ OOMScoreAdjust=-900
# per systemd recommendations
LimitNOFILE=1024:524288
+CapabilityBoundingSet=~CAP_AUDIT_CONTROL
+CapabilityBoundingSet=~CAP_AUDIT_READ
+CapabilityBoundingSet=~CAP_AUDIT_WRITE
+CapabilityBoundingSet=~CAP_BLOCK_SUSPEND
+CapabilityBoundingSet=~CAP_CHOWN
+# Mgmt app/user might have pre-created log files that we're
+# told to open and write to, or be storing them in otherwise
+# inaccessible locations like $HOME. So we need to ignore
+# DAC permission checks.
+#CapabilityBoundingSet=~CAP_DAC_OVERRIDE
+#CapabilityBoundingSet=~CAP_DAC_READ_SEARCH
+CapabilityBoundingSet=~CAP_FOWNER
+CapabilityBoundingSet=~CAP_FSETID
+CapabilityBoundingSet=~CAP_IPC_LOCK
+CapabilityBoundingSet=~CAP_IPC_OWNER
+CapabilityBoundingSet=~CAP_KILL
+CapabilityBoundingSet=~CAP_LEASE
+CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE
+CapabilityBoundingSet=~CAP_MAC_ADMIN
+CapabilityBoundingSet=~CAP_MAC_OVERRIDE
+CapabilityBoundingSet=~CAP_MKNOD
+CapabilityBoundingSet=~CAP_NET_ADMIN
+CapabilityBoundingSet=~CAP_NET_BIND_SERVICE
+CapabilityBoundingSet=~CAP_NET_BROADCAST
+CapabilityBoundingSet=~CAP_NET_RAW
+CapabilityBoundingSet=~CAP_SETFCAP
+CapabilityBoundingSet=~CAP_SETPCAP
+CapabilityBoundingSet=~CAP_SETGID
+CapabilityBoundingSet=~CAP_SETUID
+CapabilityBoundingSet=~CAP_SYSLOG
+CapabilityBoundingSet=~CAP_SYS_ADMIN
+CapabilityBoundingSet=~CAP_SYS_BOOT
+CapabilityBoundingSet=~CAP_SYS_CHROOT
+CapabilityBoundingSet=~CAP_SYS_MODULE
+CapabilityBoundingSet=~CAP_SYS_NICE
+CapabilityBoundingSet=~CAP_SYS_PACCT
+CapabilityBoundingSet=~CAP_SYS_PTRACE
+CapabilityBoundingSet=~CAP_SYS_RAWIO
+CapabilityBoundingSet=~CAP_SYS_RESOURCE
+CapabilityBoundingSet=~CAP_SYS_TIME
+CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG
+CapabilityBoundingSet=~CAP_WAKE_ALARM
+
+LockPersonality=true
+MemoryDenyWriteExecute=true
+# Cannot enable this as it prevents transitioning to
+# the confined SELinux virtlogd_t domain on execve
+# unless we modify the policy to allow this.
+#NoNewPrivileges=true
+PrivateDevices=true
+PrivateMounts=true
+PrivateNetwork=true
+# XXX someone could configure QEMU to log a serial port to an
+# arbitrary directory, including /tmp, even if this is ill-advised
+#PrivateTmp=true
+# Not until oldest build target has systemd >= v245
+#ProtectClock=true
+ProtectControlGroups=true
+# Not until oldest build target has systemd >= v241
+#ProtectHostname=true
+# Not until oldest build target has systemd >= v244
+#ProtectKernelLogs=true
+ProtectKernelModules=true
+ProtectKernelTunables=true
+# Not until oldest build target has systemd >= v247
+#ProtectProc=invisible
+ProtectSystem=full
+RestrictAddressFamilies=AF_UNIX
+RestrictNamespaces=~cgroup
+RestrictNamespaces=~ipc
+RestrictNamespaces=~mnt
+RestrictNamespaces=~net
+RestrictNamespaces=~pid
+RestrictNamespaces=~user
+RestrictNamespaces=~uts
+RestrictRealtime=true
+RestrictSUIDSGID=true
+SystemCallArchitectures=native
+SystemCallFilter=~@clock
+SystemCallFilter=~@debug
+SystemCallFilter=~@module
+SystemCallFilter=~@mount
+SystemCallFilter=~@raw-io
+SystemCallFilter=~@reboot
+SystemCallFilter=~@swap
+SystemCallFilter=~@privileged
+# Unfortunately we link to libnuma via libvirt.so which
+# has a constructor that runs unconditionally that invokes
+# set_mempolicy()
+#SystemCallFilter=~@resources
+SystemCallFilter=~@cpu-emulation
+SystemCallFilter=~@obsolete
+UMask=077
+
[Install]
Also=virtlogd.socket
--
2.41.0
6 months, 3 weeks