[libvirt] Supporting vhost-net and macvtap in libvirt for QEMU
by Anthony Liguori
Disclaimer: I am neither an SR-IOV nor a vhost-net expert, but I've CC'd
people that are who can throw tomatoes at me for getting bits wrong :-)
I wanted to start a discussion about supporting vhost-net in libvirt.
vhost-net has not yet been merged into qemu but I expect it will be soon
so it's a good time to start this discussion.
There are two modes worth supporting for vhost-net in libvirt. The
first mode is where vhost-net backs to a tun/tap device. This
behaves in very much the same way that -net tap behaves in qemu today.
Basically, the difference is that the virtio backend is in the kernel
instead of in qemu so there should be some performance improvement.
Currently, libvirt invokes qemu with -net tap,fd=X where X is an already
open fd to a tun/tap device. I suspect that after we merge vhost-net,
libvirt could support vhost-net in this mode by just doing -net
vhost,fd=X. I think the only real question for libvirt is whether to
provide a user visible switch to use vhost or to just always use vhost
when it's available and it makes sense. Personally, I think the latter
makes sense.
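As a rough sketch of the "always use vhost when it's available" option: the
helper below only formats the value passed to -net. The name format_net_arg
is hypothetical, and the "vhost,fd=X" form is just the guess made above, not
a merged qemu syntax.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the value for qemu's -net option into buf.
 * "vhost,fd=N" is the syntax guessed at in this mail, not a final one.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_net_arg(char *buf, size_t len, int tap_fd, int vhost_available)
{
    const char *backend = vhost_available ? "vhost" : "tap";
    int n = snprintf(buf, len, "%s,fd=%d", backend, tap_fd);

    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The caller keeps opening the tun/tap fd exactly as it does today; only the
backend name in the argument changes.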
The more interesting invocation of vhost-net though is one where the
vhost-net device backs directly to a physical network card. In this
mode, vhost should get considerably better performance than the current
implementation. I don't know the syntax yet, but I think it's
reasonable to assume that it will look something like -net
tap,dev=eth0. The effect will be that eth0 is dedicated to the guest.
On most modern systems, there is a small number of network devices so
this model is not all that useful except when dealing with SR-IOV
adapters. In that case, each physical device can be exposed as many
virtual devices (VFs). There are a few restrictions here though. The
biggest is that currently, you can only change the number of VFs by
reloading a kernel module so it's really a parameter that must be set at
startup time.
I think there are a few ways libvirt could support vhost-net in this
second mode. The simplest would be to introduce a new tag similar to
<source network='br0'>. In fact, if you probed the device type for the
network parameter, you could probably do something like <source
network='eth0'> and have it Just Work.
Another model would be to have libvirt see an SR-IOV adapter as a
network pool where it handles all of the VF management. Considering
how inflexible SR-IOV is today, I'm not sure whether this is the best model.
Has anyone put any more thought into this problem or how this should be
modeled in libvirt? Michael, could you share your current thinking for
-net syntax?
--
Regards,
Anthony Liguori
1 year, 1 month
[libvirt] Libvirt multi queue support
by Naor Shlomo
Hello experts,
Could anyone please tell me if Multi Queue is fully supported in Libvirt and, if so, which version contains it?
Thanks,
Naor
8 years, 6 months
[libvirt] securityselinuxlabeltest test fails on v1.2.5
by Scott Sullivan
I am trying to build v1.2.5-maint, however I have one test failing
causing the build to fail:
TEST: securityselinuxlabeltest
!!!. 4 FAIL
PASS: virsh-undefine
=======================================
1 of 112 tests failed
Please report to libvir-list(a)redhat.com
=======================================
make[2]: *** [check-TESTS] Error 1
make[2]: Leaving directory `/home/rpmbuild/packages/libvirt/tests'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/home/rpmbuild/packages/libvirt/tests'
make: *** [check-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.UGNUaq (%build)
Is anyone else having this problem? I'm building on CentOS 6.5. I'm happy
to provide any further information as needed.
9 years, 8 months
[libvirt] [PATCH] LXC: create a bind mount for sysfs when enable userns but disable netns
by Chen Hanxiao
kernel commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e
forbids us from doing a fresh mount of sysfs
when userns is enabled but netns is disabled.
This patch creates a bind mount in this scenario.
Signed-off-by: Chen Hanxiao <chenhanxiao(a)cn.fujitsu.com>
---
src/lxc/lxc_container.c | 44 +++++++++++++++++++++++++++++++++-----------
1 file changed, 33 insertions(+), 11 deletions(-)
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 4d89677..8a27215 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -815,10 +815,13 @@ static int lxcContainerSetReadOnly(void)
}
-static int lxcContainerMountBasicFS(bool userns_enabled)
+static int lxcContainerMountBasicFS(bool userns_enabled,
+ bool netns_disabled)
{
size_t i;
int rc = -1;
+ char* mnt_src = NULL;
+ int mnt_mflags;
VIR_DEBUG("Mounting basic filesystems");
@@ -826,8 +829,25 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
bool bindOverReadonly;
virLXCBasicMountInfo const *mnt = &lxcBasicMounts[i];
+ /* When enable userns but disable netns, kernel will
+ * forbid us doing a new fresh mount for sysfs.
+ * So we had to do a bind mount for sysfs instead.
+ */
+ if (userns_enabled && netns_disabled &&
+ STREQ(mnt->src, "sysfs")) {
+ if (VIR_STRDUP(mnt_src, "/sys") < 0) {
+ goto cleanup;
+ }
+ mnt_mflags = MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY|MS_BIND;
+ } else {
+ if (VIR_STRDUP(mnt_src, mnt->src) < 0) {
+ goto cleanup;
+ }
+ mnt_mflags = mnt->mflags;
+ }
+
VIR_DEBUG("Processing %s -> %s",
- mnt->src, mnt->dst);
+ mnt_src, mnt->dst);
if (mnt->skipUnmounted) {
char *hostdir;
@@ -856,7 +876,7 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
if (virFileMakePath(mnt->dst) < 0) {
virReportSystemError(errno,
_("Failed to mkdir %s"),
- mnt->src);
+ mnt_src);
goto cleanup;
}
@@ -867,24 +887,24 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
* we mount the filesystem in read-write mode initially, and then do a
* separate read-only bind mount on top of that.
*/
- bindOverReadonly = !!(mnt->mflags & MS_RDONLY);
+ bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
VIR_DEBUG("Mount %s on %s type=%s flags=%x",
- mnt->src, mnt->dst, mnt->type, mnt->mflags & ~MS_RDONLY);
- if (mount(mnt->src, mnt->dst, mnt->type, mnt->mflags & ~MS_RDONLY, NULL) < 0) {
+ mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
+ if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY, NULL) < 0) {
virReportSystemError(errno,
_("Failed to mount %s on %s type %s flags=%x"),
- mnt->src, mnt->dst, NULLSTR(mnt->type),
- mnt->mflags & ~MS_RDONLY);
+ mnt_src, mnt->dst, NULLSTR(mnt->type),
+ mnt_mflags & ~MS_RDONLY);
goto cleanup;
}
if (bindOverReadonly &&
- mount(mnt->src, mnt->dst, NULL,
+ mount(mnt_src, mnt->dst, NULL,
MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
virReportSystemError(errno,
_("Failed to re-mount %s on %s flags=%x"),
- mnt->src, mnt->dst,
+ mnt_src, mnt->dst,
MS_BIND|MS_REMOUNT|MS_RDONLY);
goto cleanup;
}
@@ -893,6 +913,7 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
rc = 0;
cleanup:
+ VIR_FREE(mnt_src);
VIR_DEBUG("rc=%d", rc);
return rc;
}
@@ -1643,7 +1664,8 @@ static int lxcContainerSetupPivotRoot(virDomainDefPtr vmDef,
goto cleanup;
/* Mounts the core /proc, /sys, etc filesystems */
- if (lxcContainerMountBasicFS(vmDef->idmap.nuidmap) < 0)
+ if (lxcContainerMountBasicFS(vmDef->idmap.nuidmap,
+ !vmDef->nnets) < 0)
goto cleanup;
/* Ensure entire root filesystem (except /.oldroot) is readonly */
--
1.9.0
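The source/flags selection made by the patch above can be sketched in
isolation as follows. This is only an illustration: struct mount_choice and
choose_basic_mount are hypothetical names, not libvirt code, and the sketch
returns the chosen values instead of calling mount().

```c
#include <stdbool.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical container for the mount source and flags to use. */
struct mount_choice {
    const char *src;
    unsigned long flags;
};

/*
 * With userns enabled and netns disabled, a fresh sysfs mount is refused
 * by the kernel, so fall back to a read-only bind mount of the host /sys.
 * In every other case the table entry is used unchanged.
 */
struct mount_choice choose_basic_mount(const char *src, unsigned long mflags,
                                       bool userns_enabled, bool netns_disabled)
{
    struct mount_choice c = { src, mflags };

    if (userns_enabled && netns_disabled && strcmp(src, "sysfs") == 0) {
        c.src = "/sys";
        c.flags = MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_RDONLY | MS_BIND;
    }
    return c;
}
```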
9 years, 9 months
[libvirt] [PATCH] Fix reporting of i/o errors by iohelper process
by Jason J. Herne
From: "Jason J. Herne" <jjherne(a)us.ibm.com>
libvirt_iohelper is a helper process that is exec'ed and used to handle I/O
during a Qemu managed save operation. Due to a missing call to
virFileWrapperFdClose, all I/O error messages reported by iohelper are lost.
This patch adds a call to virFileWrapperFdClose to the cleanup phase of
qemuDomainSaveMemory.
This patch also modifies virFileWrapperFdClose such that errors are only
reported when the length of the err_msg buffer is > 0. Before now, the
existence of the buffer would trigger error reporting in virFileWrapperFdClose.
Signed-off-by: Jason J. Herne <jjherne(a)us.ibm.com>
---
src/qemu/qemu_driver.c | 1 +
src/util/virfile.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index ecccf6c..8d78805 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -3015,6 +3015,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver,
cleanup:
VIR_FORCE_CLOSE(fd);
+ virFileWrapperFdClose(wrapperFd);
virFileWrapperFdFree(wrapperFd);
VIR_FREE(xml);
diff --git a/src/util/virfile.c b/src/util/virfile.c
index 463064c..813b4f5 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -322,7 +322,7 @@ virFileWrapperFdClose(virFileWrapperFdPtr wfd)
return 0;
ret = virCommandWait(wfd->cmd, NULL);
- if (wfd->err_msg)
+ if (wfd->err_msg && strlen(wfd->err_msg))
VIR_WARN("iohelper reports: %s", wfd->err_msg);
return ret;
--
1.8.3.2
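The changed condition in virFileWrapperFdClose amounts to "report only when
the buffer exists and is non-empty". A minimal standalone sketch of that
check, using a hypothetical function name and testing the first byte rather
than calling strlen (the two are equivalent for this purpose):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the patched condition: an allocated but
 * empty error buffer must not trigger a warning.
 */
bool should_report_iohelper_error(const char *err_msg)
{
    return err_msg != NULL && err_msg[0] != '\0';
}
```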
9 years, 10 months
[libvirt] [libvirt-test-API][PATCH V4 0/4] Add test case for virconnect V4
by Jincheng Miao
V3->V4:
Remove getSysinfo() check for lxc connection.
Fix minor problems.
V2->V3:
Refactor connection_nodeinfo.
Change the way of getting version number.
V1->V2:
Separate check functions in each test case.
Improve log message.
V1:
Add test case for virconnect
Add test case for nodeinfo of virconnect
Add connection_version test case
Add conf file of virconnect test
jmiao (4):
Add test case for virConnect
Add connection_nodeinfo test case
Add connection_version test case
Add test_connection.conf
cases/test_connection.conf | 31 +++++++
repos/virconn/__init__.py | 0
repos/virconn/connection_attributes.py | 92 +++++++++++++++++++++
repos/virconn/connection_nodeinfo.py | 146 +++++++++++++++++++++++++++++++++
repos/virconn/connection_version.py | 119 +++++++++++++++++++++++++++
5 files changed, 388 insertions(+)
create mode 100644 cases/test_connection.conf
create mode 100644 repos/virconn/__init__.py
create mode 100644 repos/virconn/connection_attributes.py
create mode 100644 repos/virconn/connection_nodeinfo.py
create mode 100644 repos/virconn/connection_version.py
--
1.8.3.1
10 years
[libvirt] ANNOUNCE: libguestfs 1.26 released
by Richard W.M. Jones
I'm pleased to announce libguestfs 1.26, a library and set of tools
for accessing and modifying virtual machine disk images. This release
took more than 6 months of work by a considerable number of people,
and has many new features (see release notes below).
You can get libguestfs 1.26 here:
Main website: http://libguestfs.org/
Source: http://libguestfs.org/download/1.26-stable/
You will also need latest supermin from here:
http://libguestfs.org/download/supermin/
Fedora 20/21: http://koji.fedoraproject.org/koji/packageinfo?packageID=8391
It will appear as an update for F20 in about a week.
Debian/experimental coming soon, see:
https://packages.debian.org/experimental/libguestfs0
The Fedora and Debian packages have split dependencies so you can
download just the features you need.
From http://libguestfs.org/guestfs-release-notes.1.html :
RELEASE NOTES FOR LIBGUESTFS 1.26
New features
Tools
virt-customize(1) is a new tool for customizing virtual machine disk
images. It lets you install packages, edit configuration files, run
scripts, set passwords and so on. virt-builder(1) and virt-sysprep(1)
use virt-customize, and command line options across all these tools are
now identical.
virt-diff(1) is a new tool for showing the differences between the
filesystems of two virtual machines. It is mainly useful when showing
what files have been changed between snapshots.
virt-builder(1) has been greatly enhanced. There are many more ways to
customize the virtual machine. It can pull templates from multiple
repositories. A parallelized internal xzcat implementation speeds up
template decompression. Virt-builder uses an optimizing planner to
choose the fastest way to build the VM. It is now easier to use
virt-builder from other programs. Internationalization support has been
added to metadata. More efficient SELinux relabelling of files. Can
build guests for multiple architectures. Error messages have been
improved. (Pino Toscano)
virt-sparsify(1) has a new --in-place option. This sparsifies an image
in place (without copying it) and is also much faster. (Lots of help
provided by Paolo Bonzini)
virt-sysprep(1) can delete and scrub files under user control. You can
lock user accounts or set random passwords on accounts. Can remove more
log files. Can unsubscribe a guest from Red Hat Subscription Manager.
New flexible way to enable and disable operations. (Wanlong Gao, Pino
Toscano)
virt-win-reg(1) allows you to use URIs to specify remote disk images.
virt-format(1) can now pass the extra space that it recovers back to
the host.
guestfish(1) has additional environment variables to give fine control
over the ><fs> prompt. Guestfish reads its (rarely used) configuration
file in a different order now so that local settings override global
settings. (Pino Toscano)
virt-make-fs(1) was rewritten in C, but is unchanged in terms of
functionality and command line usage.
Language bindings
The OCaml bindings have a new Guestfs.Errno module, used to check the
error number returned by Guestfs.last_errno.
PHP tests now work. (Pino Toscano)
Inspection
Inspection can recognize Debian live images.
Architectures
ARMv7 (32 bit) now supports KVM acceleration.
Aarch64 (ARM 64 bit) is supported, but the appliance part does not work
yet.
PPC64 support has been fixed and enhanced.
Security
Denial of service when inspecting disk images with corrupt btrfs
volumes
It was possible to crash libguestfs (and programs that use libguestfs
as a library) by presenting a disk image containing a corrupt btrfs
volume.
This was caused by a NULL pointer dereference causing a denial of
service, and is not thought to be exploitable any further.
See commit d70ceb4cbea165c960710576efac5a5716055486 for the fix. This
fix is included in libguestfs stable branches ≥ 1.26.0, ≥ 1.24.6 and
≥ 1.22.8, and also in RHEL ≥ 7.0. Earlier versions of libguestfs are
not vulnerable.
Better generation of random root passwords and random seeds
When generating random root passwords and random seeds, two bugs were
fixed which are possibly security related. Firstly we no longer read
excessive bytes from /dev/urandom (most of which were just thrown
away). Secondly we changed the code to avoid modulo bias. These
issues were not thought to be exploitable. (Both changes suggested by
Edwin Török)
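For readers unfamiliar with modulo bias, here is a generic sketch of one
common way to avoid it (rejection sampling). It is not taken from the
libguestfs sources; the function names are hypothetical and the technique
shown is only assumed to be similar in spirit to the fix described above.

```c
#include <stdio.h>

/*
 * Map one random byte to the range 0..n-1 without modulo bias.
 * Bytes at or above the largest multiple of n that fits in 256 are
 * rejected (-1), so every result is equally likely. Requires 1 <= n <= 256.
 */
int unbiased_from_byte(unsigned char byte, unsigned int n)
{
    unsigned int limit = 256 - (256 % n);

    if (byte >= limit)
        return -1;
    return (int)(byte % n);
}

/*
 * Read bytes from rng (for example a FILE opened on /dev/urandom) until
 * one is accepted. Returns -1 on bad n or if the stream ends.
 */
int random_index(FILE *rng, unsigned int n)
{
    int c;

    if (n == 0 || n > 256)
        return -1;
    while ((c = fgetc(rng)) != EOF) {
        int idx = unbiased_from_byte((unsigned char)c, n);
        if (idx >= 0)
            return idx;
    }
    return -1;
}
```

With a 62-character alphabet, bytes 248..255 are discarded; a plain
byte % 62 would instead make the first 8 characters slightly more likely
than the rest.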
API
GUID parameters are now validated when they are passed to API calls,
whereas previously you could have passed any string. (Pino Toscano)
New APIs
guestfs_add_drive_opts: new discard parameter
The new discard parameter allows fine-grained control over
discard/trim support for a particular disk. This allows the host file
to become more sparse (or thin-provisioned) when you delete files or
issue the guestfs_fstrim API call.
guestfs_add_domain: new parameters: cachemode, discard
These parameters are passed through when adding the domain's disks.
guestfs_blkdiscard
Discard all blocks on a guestfs device. Combined with the discard
parameter above, this makes the host file sparse.
guestfs_blkdiscardzeroes
Test if discarded blocks read back as zeroes.
guestfs_compare_*
guestfs_copy_*
For each struct returned through the API, libguestfs now generates
guestfs_compare_* and guestfs_copy_* functions to allow you to
compare and copy structs.
guestfs_copy_attributes
Copy attributes (like permissions, xattrs, ownership) from one file
to another. (Pino Toscano)
guestfs_disk_create
A flexible API for creating empty disk images from scratch. This
avoids the need to call out to external programs like qemu-img(1).
guestfs_get_backend_settings
guestfs_set_backend_settings
Per-backend settings (can also be set via the environment variable
LIBGUESTFS_BACKEND_SETTINGS). The main use for this is forcing TCG
mode in the qemu-based backends, for example:
export LIBGUESTFS_BACKEND=direct
export LIBGUESTFS_BACKEND_SETTINGS=force_tcg
guestfs_part_get_name
Get the label or name of a partition (for GPT disk images).
Build changes
The following extra packages are required to build libguestfs 1.26:
supermin ≥ 5
Supermin version 5 is required to build this version of libguestfs.
flex, bison
Virt-builder now uses a real parser to parse its metadata file, so
these tools are required.
xz
This is now a required build dependency, where previously it was (in
theory) optional.
Internals
PO message extraction rewritten to be more robust. (Pino Toscano)
podwrapper gives an error if the --insert or --verbatim argument
pattern is not found.
Libguestfs now passes the qemu -enable-fips option to enable FIPS, if
qemu supports it.
./configure --without-qemu can be used if you don't want to specify a
default hypervisor.
Copy-on-write [COW] overlays, used for example for read-only drives,
are now created through an internal backend API (.create_cow_overlay).
Libvirt backend uses some funky C macros to generate XML. These are
simpler and safer.
The ChangeLog file format has changed. It is now just the same as git
log, instead of using a custom format.
Appliance start-up has changed:
* The libguestfs appliance now initializes LVM the same way as it is
done on physical machines.
* The libguestfs appliance does not write an empty string to
/proc/sys/kernel/hotplug when starting up.
Note that you must configure your kernel to have
CONFIG_UEVENT_HELPER_PATH="" otherwise you will get strange LVM
errors (this applies as much to any Linux machine, not just
libguestfs). (Peter Rajnoha)
Libguestfs can now be built on arches that have ocamlc(1) but not
ocamlopt(1). (Hilko Bengen, Olaf Hering)
You cannot use ./configure --disable-daemon --enable-appliance. It made
no sense anyway. Now it is expressly forbidden by the configure script.
The packagelist file uses m4 for macro expansion instead of cpp.
Bugs fixed
https://bugzilla.redhat.com/1073906
java bindings inspect_list_applications2 throws
java.lang.ArrayIndexOutOfBoundsException:
https://bugzilla.redhat.com/1063374
[RFE] enable subscription manager clean or unregister operation to
sysprep
https://bugzilla.redhat.com/1060404
virt-resize does not preserve GPT partition names
https://bugzilla.redhat.com/1057504
mount-local should give a clearer error if root is not mounted
https://bugzilla.redhat.com/1056290
virt-sparsify overwrites block devices if used as output files
https://bugzilla.redhat.com/1055452
libguestfs: error: invalid backend: appliance
https://bugzilla.redhat.com/1054761
guestfs_pvs prints "unknown device" if a physical volume is missing
https://bugzilla.redhat.com/1053847
Recommended default clock/timer settings
https://bugzilla.redhat.com/1046509
ruby-libguestfs throws "expecting 0 or 1 arguments" on
Guestfs::Guestfs.new
https://bugzilla.redhat.com/1045450
Cannot inspect cirros 0.3.1 disk image fully
https://bugzilla.redhat.com/1045033
LIBVIRT_DEFAULT_URI=qemu:///system breaks libguestfs
https://bugzilla.redhat.com/1044585
virt-builder network (eg. --install) doesn't work if resolv.conf sets
nameserver 127.0.0.1
https://bugzilla.redhat.com/1044014
When SSSD is installed, libvirt configuration requires
authentication, but not clear to user
https://bugzilla.redhat.com/1039995
virt-make-fs fails making fat/vfat whole disk: Device partition
expected, not making filesystem on entire device '/dev/sda' (use -I
to override)
https://bugzilla.redhat.com/1039540
virt-sysprep to delete more logfiles
https://bugzilla.redhat.com/1033207
RFE: libguestfs inspection does not recognize Free4NAS live CD
https://bugzilla.redhat.com/1028660
RFE: virt-sysprep/virt-builder should have an option to lock a user
account
https://bugzilla.redhat.com/1026688
libguestfs fails examining libvirt guest with ceph drives: rbd: image
name must begin with a '/'
https://bugzilla.redhat.com/1022431
virt-builder fails if $HOME/.cache doesn't exist
https://bugzilla.redhat.com/1022184
libguestfs: do not use versioned jar file
https://bugzilla.redhat.com/1020806
All libguestfs LVM operations fail on Debian/Ubuntu
https://bugzilla.redhat.com/1008417
Need update helpout of part-set-gpt-type
https://bugzilla.redhat.com/953907
virt-sysprep does not correctly set the hostname on Debian/Ubuntu
https://bugzilla.redhat.com/923355
guestfish prints literal "\n" in error messages
https://bugzilla.redhat.com/660687
guestmount: "touch" command fails: touch: setting times of
`timestamp': Invalid argument
https://bugzilla.redhat.com/593511
[RFE] function to get partition name
https://bugzilla.redhat.com/563450
list-devices returns devices of different types out of order
---
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
10 years
Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1
by Richard Weinberger
On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm(a)xmission.com> wrote:
>
> Linus,
>
> Please pull the for-linus branch from the git tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
>
> HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
>
> This is a bunch of small changes built against 3.16-rc6. The most
> significant change for users is the first patch which makes setns
> dramatically faster by removing unneeded rcu handling.
>
> The next chunk of changes are so that "mount -o remount,.." will not
> allow the user namespace root to drop flags on a mount set by the system
> wide root. Aka this forces read-only mounts to stay read-only, no-dev
> mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
> stay no exec and it prevents unprivileged users from messing with a
> mounts atime settings. I have included my test case as the last patch
> in this series so people performing backports can verify this change
> works correctly.
>
> The next change fixes a bug in NFS that was discovered while auditing
> nsproxy users for the first optimization. Today you can oops the kernel
> by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
> namespaces. I rebased and fixed the build of the !CONFIG_NFS_FS case
> yesterday when a build bot caught my typo. Given that no one to my
> knowledge bases anything on my tree fixing the typo in place seems more
> responsible than requiring a typo-fix to be backported as well.
>
> The last change is a small semantic cleanup introducing
> /proc/thread-self and pointing /proc/mounts and /proc/net at it. This
> prevents several kinds of problematic corner cases. It is a
> user-visible change so it has a minute chance of causing regressions so
> the change to /proc/mounts and /proc/net are individual one line commits
> that can be trivially reverted. Unfortunately I lost and could not find
> the email of the original reporter so he is not credited. From at least
> one perspective this change to /proc/net is a regression fix to allow
> pthread /proc/net uses that were broken by the introduction of the network
> namespace.
>
> Eric
>
> Eric W. Biederman (11):
> namespaces: Use task_lock and not rcu to protect nsproxy
> mnt: Only change user settable mount flags in remount
> mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
> mnt: Correct permission checks in do_remount
This commit breaks libvirt-lxc.
libvirt does in lxcContainerMountBasicFS():
/*
* We can't immediately set the MS_RDONLY flag when mounting filesystems
* because (in at least some kernel versions) this will propagate back
* to the original mount in the host OS, turning it readonly too. Thus
* we mount the filesystem in read-write mode initially, and then do a
* separate read-only bind mount on top of that.
*/
bindOverReadonly = !!(mnt_mflags & MS_RDONLY);
VIR_DEBUG("Mount %s on %s type=%s flags=%x",
mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
if (mount(mnt_src, mnt->dst, mnt->type, mnt_mflags &
~MS_RDONLY, NULL) < 0) {
^^^^ Here it fails for sysfs because with user namespaces we bind the
existing /sys into the container
and would have to read out all existing mount flags from the current /sys mount.
Otherwise mount() fails with EPERM.
On my test system /sys is mounted with
"rw,nosuid,nodev,noexec,relatime" and libvirt
misses the relatime...
virReportSystemError(errno,
_("Failed to mount %s on %s type %s flags=%x"),
mnt_src, mnt->dst, NULLSTR(mnt->type),
mnt_mflags & ~MS_RDONLY);
goto cleanup;
}
if (bindOverReadonly &&
mount(mnt_src, mnt->dst, NULL,
MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {
^^^ Here it fails because now we'd have to specify all flags as used
for the first
mount. For the procfs case MS_NOSUID|MS_NOEXEC|MS_NODEV.
See lxcBasicMounts[].
In this case the fix is easy, add mnt_mflags to the mount flags.
virReportSystemError(errno,
_("Failed to re-mount %s on %s flags=%x"),
mnt_src, mnt->dst,
MS_BIND|MS_REMOUNT|MS_RDONLY);
goto cleanup;
}
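The easy fix suggested for the second mount() call, carrying the original
flags into the read-only remount, could look roughly like this. The helper
name is hypothetical and this is a sketch of the idea, not the change that
was eventually applied to libvirt.

```c
#include <sys/mount.h>

/*
 * Hypothetical helper: build the flags for the read-only bind remount.
 * The flags used for the first mount (e.g. MS_NOSUID|MS_NOEXEC|MS_NODEV
 * for procfs) are kept, since the kernel no longer lets a user namespace
 * root drop them on remount.
 */
unsigned long readonly_remount_flags(unsigned long mnt_mflags)
{
    return MS_BIND | MS_REMOUNT | MS_RDONLY | mnt_mflags;
}
```

This does not cover the sysfs case above, where the flags of the existing
host /sys mount (such as relatime) would have to be read out first.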
--
Thanks,
//richard
10 years
[libvirt] [PATCH v3] leaseshelper: improvements to support all events
by Nehal J Wani
This patch enables the helper program to detect event(s) triggered when there
is a change in lease length or expiry and client-id. This transfers complete
control of leases database to libvirt and obsoletes use of the lease database
file (<network-name>.leases). That file will not be created, read, or written.
This is achieved by adding the option --leasefile-ro to dnsmasq and passing a
custom env var to leaseshelper, which helps us map events related to leases
with their corresponding network bridges, no matter what the event be.
Also, this requires the addition of a new non-lease entry in our custom lease
database: "server-duid". It is required to identify a DHCPv6 server.
Now that dnsmasq doesn't maintain its own leases database, it relies on our
helper program to tell it about previous leases and server duid. Thus, this
patch makes our leases program honor an extra action: "init", in which it sends
the known info in a particular format to dnsmasq by printing it to stdout.
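The interface lookup described above (prefer the variable set by dnsmasq,
fall back to the one set by libvirtd) can be sketched on its own as below.
The function name is hypothetical, and plain getenv() stands in for
libvirt's virGetEnvAllowSUID() so that the sketch is self-contained.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: return the bridge/interface name for a lease event.
 * DNSMASQ_INTERFACE is not set when dnsmasq (re)starts and emits 'del'
 * events, so VIR_BRIDGE_NAME (set by libvirtd) is used as a fallback.
 * Returns NULL if neither variable is set.
 */
const char *lease_interface(void)
{
    const char *iface = getenv("DNSMASQ_INTERFACE");

    if (!iface)
        iface = getenv("VIR_BRIDGE_NAME");
    return iface;
}
```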
---
This is compatible with libvirt 1.2.6 as only additions have been
introduced, and the old leases file (*.status) will still be supported.
v3: * Add server-duid as an entry in the lease object for every ipv6 lease.
* Remove unnecessary variables and double copies.
* Take value from DNSMASQ_OLD_HOSTNAME if hostname is not known.
v2: http://www.redhat.com/archives/libvir-list/2014-July/msg01109.html
v1: https://www.redhat.com/archives/libvir-list/2014-July/msg00568.html
src/network/bridge_driver.c | 3 +
src/network/leaseshelper.c | 132 +++++++++++++++++++++++++++++++++++---------
2 files changed, 109 insertions(+), 26 deletions(-)
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index 965fdec..b578b3a 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c
@@ -1288,7 +1288,10 @@ networkBuildDhcpDaemonCommandLine(virNetworkObjPtr network,
cmd = virCommandNew(dnsmasqCapsGetBinaryPath(caps));
virCommandAddArgFormat(cmd, "--conf-file=%s", configfile);
+ /* Libvirt gains full control of leases database */
+ virCommandAddArgFormat(cmd, "--leasefile-ro");
virCommandAddArgFormat(cmd, "--dhcp-script=%s", leaseshelper_path);
+ virCommandAddEnvPair(cmd, "VIR_BRIDGE_NAME", network->def->bridge);
*cmdout = cmd;
ret = 0;
diff --git a/src/network/leaseshelper.c b/src/network/leaseshelper.c
index c8543a2..e984cbb 100644
--- a/src/network/leaseshelper.c
+++ b/src/network/leaseshelper.c
@@ -50,6 +50,12 @@
*/
#define VIR_NETWORK_DHCP_LEASE_FILE_SIZE_MAX (32 * 1024 * 1024)
+/*
+ * Use this when passing possibly-NULL strings to printf-a-likes.
+ * Required for unknown parameters during init call.
+ */
+#define EMPTY_STR(s) ((s) ? (s) : "*")
+
static const char *program_name;
/* Display version information. */
@@ -65,7 +71,7 @@ usage(int status)
if (status) {
fprintf(stderr, _("%s: try --help for more details\n"), program_name);
} else {
- printf(_("Usage: %s add|old|del mac|clientid ip [hostname]\n"
+ printf(_("Usage: %s add|old|del|init mac|clientid ip [hostname]\n"
"Designed for use with 'dnsmasq --dhcp-script'\n"
"Refer to man page of dnsmasq for more details'\n"),
program_name);
@@ -89,6 +95,7 @@ enum virLeaseActionFlags {
VIR_LEASE_ACTION_ADD, /* Create new lease */
VIR_LEASE_ACTION_OLD, /* Lease already exists, renew it */
VIR_LEASE_ACTION_DEL, /* Delete the lease */
+ VIR_LEASE_ACTION_INIT, /* Tell dnsmasq of existing leases on restart */
VIR_LEASE_ACTION_LAST
};
@@ -96,7 +103,7 @@ enum virLeaseActionFlags {
VIR_ENUM_DECL(virLeaseAction);
VIR_ENUM_IMPL(virLeaseAction, VIR_LEASE_ACTION_LAST,
- "add", "old", "del");
+ "add", "old", "del", "init");
int
main(int argc, char **argv)
@@ -112,20 +119,24 @@ main(int argc, char **argv)
const char *interface = virGetEnvAllowSUID("DNSMASQ_INTERFACE");
const char *exptime_tmp = virGetEnvAllowSUID("DNSMASQ_LEASE_EXPIRES");
const char *hostname = virGetEnvAllowSUID("DNSMASQ_SUPPLIED_HOSTNAME");
+ const char *server_duid = virGetEnvAllowSUID("DNSMASQ_SERVER_DUID");
const char *leases_str = NULL;
long long currtime = 0;
long long expirytime = 0;
size_t i = 0;
+ size_t count_ipv6 = 0;
+ size_t count_ipv4 = 0;
int action = -1;
int pid_file_fd = -1;
int rv = EXIT_FAILURE;
int custom_lease_file_len = 0;
- bool add = false;
bool delete = false;
virJSONValuePtr lease_new = NULL;
virJSONValuePtr lease_tmp = NULL;
virJSONValuePtr leases_array = NULL;
virJSONValuePtr leases_array_new = NULL;
+ virJSONValuePtr *leases_ipv4 = NULL;
+ virJSONValuePtr *leases_ipv6 = NULL;
virSetErrorFunc(NULL, NULL);
virSetErrorLogPriorityFunc(NULL);
@@ -156,16 +167,17 @@ main(int argc, char **argv)
}
}
- if (argc != 4 && argc != 5) {
+ if (argc != 4 && argc != 5 && argc != 2) {
/* Refer man page of dnsmasq --dhcp-script for more details */
usage(EXIT_FAILURE);
}
/* Make sure dnsmasq knows the interface. The interface name is not known
- * when dnsmasq (re)starts and throws 'del' events for expired leases.
- * So, if any old lease has expired, it will be automatically removed the
- * next time this program is invoked */
- if (!interface)
+ * via env variable set by dnsmasq when dnsmasq (re)starts and throws 'del'
+ * events for expired leases. So, libvirtd sets another env var for this
+ * purpose */
+ if (!interface &&
+ !(interface = virGetEnvAllowSUID("VIR_BRIDGE_NAME")))
goto cleanup;
ip = argv[3];
@@ -176,6 +188,10 @@ main(int argc, char **argv)
if (argc == 5)
hostname = argv[4];
+ /* In case hostname is still unkown, use the last known one */
+ if (!hostname)
+ hostname = virGetEnvAllowSUID("DNSMASQ_OLD_HOSTNAME");
+
if (VIR_STRDUP(exptime, exptime_tmp) < 0)
goto cleanup;
@@ -185,7 +201,7 @@ main(int argc, char **argv)
exptime[strlen(exptime) - 1] = '\0';
/* Check if it is an IPv6 lease */
- if (virGetEnvAllowSUID("DNSMASQ_IAID")) {
+ if (iaid) {
mac = virGetEnvAllowSUID("DNSMASQ_MAC");
clientid = argv[2];
}
@@ -235,7 +251,6 @@ main(int argc, char **argv)
delete = true;
if (action == VIR_LEASE_ACTION_ADD ||
action == VIR_LEASE_ACTION_OLD) {
- add = true;
/* Create new lease */
if (!(lease_new = virJSONValueNewObject())) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
@@ -260,11 +275,13 @@ main(int argc, char **argv)
goto cleanup;
if (clientid && virJSONValueObjectAppendString(lease_new, "client-id", clientid) < 0)
goto cleanup;
+ if (server_duid && virJSONValueObjectAppendString(lease_new, "server-duid", server_duid) < 0)
+ goto cleanup;
if (expirytime && virJSONValueObjectAppendNumberLong(lease_new, "expiry-time", expirytime) < 0)
goto cleanup;
}
}
- } else {
+ } else if (action != VIR_LEASE_ACTION_INIT) {
fprintf(stderr, _("Unsupported action: %s\n"),
virLeaseActionTypeToString(action));
exit(EXIT_FAILURE);
@@ -294,7 +311,7 @@ main(int argc, char **argv)
i = 0;
while (i < virJSONValueArraySize(leases_array)) {
const char *ip_tmp = NULL;
- long long expirytime_tmp = -1;
+ const char *server_duid_tmp = NULL;
if (!(lease_tmp = virJSONValueArrayGet(leases_array, i))) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
@@ -303,14 +320,13 @@ main(int argc, char **argv)
}
if (!(ip_tmp = virJSONValueObjectGetString(lease_tmp, "ip-address")) ||
- (virJSONValueObjectGetNumberLong(lease_tmp, "expiry-time", &expirytime_tmp) < 0)) {
+ (virJSONValueObjectGetNumberLong(lease_tmp, "expiry-time", &expirytime) < 0)) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("failed to parse json"));
goto cleanup;
}
-
/* Check whether lease has expired or not */
- if (expirytime_tmp < currtime) {
+ if (expirytime < currtime) {
i++;
continue;
}
@@ -321,6 +337,30 @@ main(int argc, char **argv)
continue;
}
+ /* Store pointers to ipv4 and ipv6 leases */
+ if (strchr(ip_tmp, ':')) {
+ /* This is an ipv6 lease */
+ ignore_value(VIR_APPEND_ELEMENT_COPY(leases_ipv6, count_ipv6, lease_tmp));
+ if ((server_duid_tmp
+ = virJSONValueObjectGetString(lease_tmp, "server-duid"))) {
+ if (!server_duid) {
+ /* Control reaches here when the 'action' is not for an
+ * ipv6 lease or, for some weird reason the env var
+ * DNSMASQ_SERVER_DUID wasn't set */
+ server_duid = server_duid_tmp;
+ }
+ } else {
+ /* Inject server-duid into those ipv6 leases which
+ * didn't have it previously, for example, those
+ * created by leaseshelper from libvirt 1.2.6 */
+ if (virJSONValueObjectAppendString(lease_tmp, "server-duid", server_duid) < 0)
+ goto cleanup;
+ }
+ } else {
+ /* This is an ipv4 lease */
+ ignore_value(VIR_APPEND_ELEMENT_COPY(leases_ipv4, count_ipv4, lease_tmp));
+ }
+
/* Move old lease to new array */
if (virJSONValueArrayAppend(leases_array_new, lease_tmp) < 0) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
@@ -333,31 +373,71 @@ main(int argc, char **argv)
}
}
- if (add) {
+ switch ((enum virLeaseActionFlags) action) {
+ case VIR_LEASE_ACTION_INIT:
+ /* Man page of dnsmasq says: the script (helper program, in our case)
+ * should write the saved state of the lease database, in dnsmasq
+ * leasefile format, to stdout and exit with zero exit code, when
+ * called with argument init. Format:
+ * $expirytime $mac $ip $hostname $clientid # For all ipv4 leases
+ * duid $server-duid # If DHCPv6 is present
+ * $expirytime $iaid $ip $hostname $clientduid # For all ipv6 leases */
+ for (i = 0; i < count_ipv4; i++) {
+ lease_tmp = leases_ipv4[i];
+ virJSONValueObjectGetNumberLong(lease_tmp, "expiry-time", &expirytime);
+ printf("%lld %s %s %s %s\n",
+ expirytime,
+ virJSONValueObjectGetString(lease_tmp, "mac-address"),
+ virJSONValueObjectGetString(lease_tmp, "ip-address"),
+ EMPTY_STR(virJSONValueObjectGetString(lease_tmp, "hostname")),
+ EMPTY_STR(virJSONValueObjectGetString(lease_tmp, "client-id")));
+ }
+ if (server_duid) {
+ printf("duid %s\n", server_duid);
+ for (i = 0; i < count_ipv6; i++) {
+ lease_tmp = leases_ipv6[i];
+ virJSONValueObjectGetNumberLong(lease_tmp, "expiry-time", &expirytime);
+ printf("%lld %s %s %s %s\n",
+ expirytime,
+ virJSONValueObjectGetString(lease_tmp, "iaid"),
+ virJSONValueObjectGetString(lease_tmp, "ip-address"),
+ EMPTY_STR(virJSONValueObjectGetString(lease_tmp, "hostname")),
+ EMPTY_STR(virJSONValueObjectGetString(lease_tmp, "client-id")));
+ }
+ }
+ break;
+
+ case VIR_LEASE_ACTION_OLD:
+ case VIR_LEASE_ACTION_ADD:
if (virJSONValueArrayAppend(leases_array_new, lease_new) < 0) {
virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("failed to create json"));
goto cleanup;
}
lease_new = NULL;
- }
- if (!(leases_str = virJSONValueToString(leases_array_new, true))) {
- virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
- _("empty json array"));
- goto cleanup;
- }
+ default:
+ if (!(leases_str = virJSONValueToString(leases_array_new, true))) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("empty json array"));
+ goto cleanup;
+ }
- /* Write to file */
- if (virFileRewrite(custom_lease_file, 0644,
- customLeaseRewriteFile, &leases_str) < 0)
- goto cleanup;
+ /* Write to file */
+ if (virFileRewrite(custom_lease_file, 0644,
+ customLeaseRewriteFile, &leases_str) < 0)
+ goto cleanup;
+ }
rv = EXIT_SUCCESS;
cleanup:
if (pid_file_fd != -1)
virPidFileReleasePath(pid_file, pid_file_fd);
+ for (i = 0; i < count_ipv4; i++)
+ VIR_FREE(leases_ipv4);
+ for (i = 0; i < count_ipv6; i++)
+ VIR_FREE(leases_ipv6);
VIR_FREE(pid_file);
VIR_FREE(exptime);
--
1.9.3
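
For readers following the VIR_LEASE_ACTION_INIT case, here is a minimal
standalone sketch of how one line of the leasefile output described in the
comment ("$expirytime $mac $ip $hostname $clientid") could be formatted.
It is not part of the patch. The helper names lease_is_ipv6() and
format_lease_line() are hypothetical, and the "*" placeholder for missing
fields is an assumption about what EMPTY_STR() expands to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder for a missing field. Assumption: this matches what
 * EMPTY_STR() yields in the patch ("*" for unknown values). */
#define LEASE_EMPTY "*"

/* Hypothetical helper: an address containing ':' is treated as IPv6,
 * the same test the patch applies with strchr(ip_tmp, ':'). */
static int
lease_is_ipv6(const char *ip)
{
    return strchr(ip, ':') != NULL;
}

/* Hypothetical helper: format one leasefile line into buf.
 * 'id' is the MAC address for IPv4 leases or the IAID for IPv6 leases.
 * Returns the number of characters written, or -1 if buf is too small. */
static int
format_lease_line(char *buf, size_t buflen, long long expirytime,
                  const char *id, const char *ip,
                  const char *hostname, const char *clientid)
{
    int n = snprintf(buf, buflen, "%lld %s %s %s %s\n",
                     expirytime, id, ip,
                     hostname ? hostname : LEASE_EMPTY,
                     clientid ? clientid : LEASE_EMPTY);

    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return n;
}
```

A caller would format each IPv4 lease first, then emit the "duid" line,
then format each IPv6 lease, mirroring the order used in the patch.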