[libvirt] [PATCH] docs: formatdomain: unify naming for CPUs/vCPUs
by Katerina Koukiou
CPU is an acronym and should be written in uppercase
when it is part of plain text and not referring to an element.
Signed-off-by: Katerina Koukiou <kkoukiou(a)redhat.com>
---
As requested in the review here:
https://www.redhat.com/archives/libvir-list/2018-July/msg01093.html
docs/formatdomain.html.in | 84 +++++++++++++++++++--------------------
1 file changed, 42 insertions(+), 42 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index b00971a945..d08ede9ab5 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -631,45 +631,45 @@
</dd>
<dt><code>vcpus</code></dt>
<dd>
- The vcpus element allows to control state of individual vcpus.
+ The vcpus element allows to control state of individual vCPUs.
The <code>id</code> attribute specifies the vCPU id as used by libvirt
- in other places such as vcpu pinning, scheduler information and NUMA
- assignment. Note that the vcpu ID as seen in the guest may differ from
- libvirt ID in certain cases. Valid IDs are from 0 to the maximum vcpu
+ in other places such as vCPU pinning, scheduler information and NUMA
+ assignment. Note that the vCPU ID as seen in the guest may differ from
+ libvirt ID in certain cases. Valid IDs are from 0 to the maximum vCPU
count as set by the <code>vcpu</code> element minus 1.
The <code>enabled</code> attribute allows to control the state of the
- vcpu. Valid values are <code>yes</code> and <code>no</code>.
+ vCPU. Valid values are <code>yes</code> and <code>no</code>.
- <code>hotpluggable</code> controls whether given vcpu can be hotplugged
- and hotunplugged in cases when the cpu is enabled at boot. Note that
- all disabled vcpus must be hotpluggable. Valid values are
+ <code>hotpluggable</code> controls whether given vCPU can be hotplugged
+ and hotunplugged in cases when the CPU is enabled at boot. Note that
+ all disabled vCPUs must be hotpluggable. Valid values are
<code>yes</code> and <code>no</code>.
- <code>order</code> allows to specify the order to add the online vcpus.
- For hypervisors/platforms that require to insert multiple vcpus at once
- the order may be duplicated across all vcpus that need to be
- enabled at once. Specifying order is not necessary, vcpus are then
+ <code>order</code> allows to specify the order to add the online vCPUs.
+ For hypervisors/platforms that require to insert multiple vCPUs at once
+ the order may be duplicated across all vCPUs that need to be
+ enabled at once. Specifying order is not necessary, vCPUs are then
added in an arbitrary order. If order info is used, it must be used for
- all online vcpus. Hypervisors may clear or update ordering information
+ all online vCPUs. Hypervisors may clear or update ordering information
during certain operations to assure valid configuration.
- Note that hypervisors may create hotpluggable vcpus differently from
- boot vcpus thus special initialization may be necessary.
+ Note that hypervisors may create hotpluggable vCPUs differently from
+ boot vCPUs thus special initialization may be necessary.
- Hypervisors may require that vcpus enabled on boot which are not
+ Hypervisors may require that vCPUs enabled on boot which are not
hotpluggable are clustered at the beginning starting with ID 0. It may
- be also required that vcpu 0 is always present and non-hotpluggable.
+ be also required that vCPU 0 is always present and non-hotpluggable.
- Note that providing state for individual cpus may be necessary to enable
+ Note that providing state for individual CPUs may be necessary to enable
support of addressable vCPU hotplug and this feature may not be
supported by all hypervisors.
- For QEMU the following conditions are required. Vcpu 0 needs to be
- enabled and non-hotpluggable. On PPC64 along with it vcpus that are in
- the same core need to be enabled as well. All non-hotpluggable cpus
- present at boot need to be grouped after vcpu 0.
+ For QEMU the following conditions are required. vCPU 0 needs to be
+ enabled and non-hotpluggable. On PPC64 along with it vCPUs that are in
+ the same core need to be enabled as well. All non-hotpluggable CPUs
+ present at boot need to be grouped after vCPU 0.
<span class="since">Since 2.2.0 (QEMU only)</span>
</dd>
</dl>
@@ -774,11 +774,11 @@
<dt><code>vcpupin</code></dt>
<dd>
The optional <code>vcpupin</code> element specifies which of host's
- physical CPUs the domain VCPU will be pinned to. If this is omitted,
+ physical CPUs the domain vCPU will be pinned to. If this is omitted,
and attribute <code>cpuset</code> of element <code>vcpu</code> is
not specified, the vCPU is pinned to all the physical CPUs by default.
It contains two required attributes, the attribute <code>vcpu</code>
- specifies vcpu id, and the attribute <code>cpuset</code> is same as
+ specifies vCPU id, and the attribute <code>cpuset</code> is same as
attribute <code>cpuset</code> of element <code>vcpu</code>.
(NB: Only qemu driver support)
<span class="since">Since 0.9.0</span>
@@ -786,7 +786,7 @@
<dt><code>emulatorpin</code></dt>
<dd>
The optional <code>emulatorpin</code> element specifies which of host
- physical CPUs the "emulator", a subset of a domain not including vcpu
+ physical CPUs the "emulator", a subset of a domain not including vCPU
or iothreads will be pinned to. If this is omitted, and attribute
<code>cpuset</code> of element <code>vcpu</code> is not specified,
"emulator" is pinned to all the physical CPUs by default. It contains
@@ -820,7 +820,7 @@
<dt><code>period</code></dt>
<dd>
The optional <code>period</code> element specifies the enforcement
- interval(unit: microseconds). Within <code>period</code>, each vcpu of
+ interval(unit: microseconds). Within <code>period</code>, each vCPU of
the domain will not be allowed to consume more than <code>quota</code>
worth of runtime. The value should be in range [1000, 1000000]. A period
with value 0 means no value.
@@ -835,7 +835,7 @@
vCPU threads, which means that it is not bandwidth controlled. The value
should be in range [1000, 18446744073709551] or less than 0. A quota
with value 0 means no value. You can use this feature to ensure that all
- vcpus run at the same speed.
+ vCPUs run at the same speed.
<span class="since">Only QEMU driver support since 0.9.4, LXC since
0.9.10</span>
</dd>
@@ -864,7 +864,7 @@
<dd>
The optional <code>emulator_period</code> element specifies the enforcement
interval(unit: microseconds). Within <code>emulator_period</code>, emulator
- threads(those excluding vcpus) of the domain will not be allowed to consume
+ threads(those excluding vCPUs) of the domain will not be allowed to consume
more than <code>emulator_quota</code> worth of runtime. The value should be
in range [1000, 1000000]. A period with value 0 means no value.
<span class="since">Only QEMU driver support since 0.10.0</span>
@@ -873,9 +873,9 @@
<dd>
The optional <code>emulator_quota</code> element specifies the maximum
allowed bandwidth(unit: microseconds) for domain's emulator threads(those
- excluding vcpus). A domain with <code>emulator_quota</code> as any negative
+ excluding vCPUs). A domain with <code>emulator_quota</code> as any negative
value indicates that the domain has infinite bandwidth for emulator threads
- (those excluding vcpus), which means that it is not bandwidth controlled.
+ (those excluding vCPUs), which means that it is not bandwidth controlled.
The value should be in range [1000, 18446744073709551] or less than 0. A
quota with value 0 means no value.
<span class="since">Only QEMU driver support since 0.10.0</span>
@@ -2131,13 +2131,13 @@
QEMU, the user-configurable extended TSEG feature was unavailable up
to and including <code>pc-q35-2.9</code>. Starting with
<code>pc-q35-2.10</code> the feature is available, with default size
- 16 MiB. That should suffice for up to roughly 272 VCPUs, 5 GiB guest
+ 16 MiB. That should suffice for up to roughly 272 vCPUs, 5 GiB guest
RAM in total, no hotplug memory range, and 32 GiB of 64-bit PCI MMIO
- aperture. Or for 48 VCPUs, with 1TB of guest RAM, no hotplug DIMM
+ aperture. Or for 48 vCPUs, with 1TB of guest RAM, no hotplug DIMM
range, and 32GB of 64-bit PCI MMIO aperture. The values may also vary
based on the loader the VM is using.
</p><p>
- Additional size might be needed for significantly higher VCPU counts
+ Additional size might be needed for significantly higher vCPU counts
or increased address space (that can be memory, maxMemory, 64-bit PCI
MMIO aperture size; roughly 8 MiB of TSEG per 1 TiB of address space)
which can also be rounded up.
@@ -2147,7 +2147,7 @@
documentation of the guest OS or loader (if there is any), or test
this by trial-and-error changing the value until the VM boots
successfully. Yet another guiding value for users might be the fact
- that 48 MiB should be enough for pretty large guests (240 VCPUs and
+ that 48 MiB should be enough for pretty large guests (240 vCPUs and
4TB guest RAM), but it is on purpose not set as default as 48 MiB of
unavailable RAM might be too much for small guests (e.g. with 512 MiB
of RAM).
@@ -2425,7 +2425,7 @@
</tr>
<tr>
<td><code>cpu_cycles</code></td>
- <td>the count of cpu cycles (total/elapsed)</td>
+ <td>the count of CPU cycles (total/elapsed)</td>
<td><code>perf.cpu_cycles</code></td>
</tr>
<tr>
@@ -2460,25 +2460,25 @@
</tr>
<tr>
<td><code>stalled_cycles_frontend</code></td>
- <td>the count of stalled cpu cycles in the frontend of the instruction
+ <td>the count of stalled CPU cycles in the frontend of the instruction
processor pipeline by applications running on the platform</td>
<td><code>perf.stalled_cycles_frontend</code></td>
</tr>
<tr>
<td><code>stalled_cycles_backend</code></td>
- <td>the count of stalled cpu cycles in the backend of the instruction
+ <td>the count of stalled CPU cycles in the backend of the instruction
processor pipeline by applications running on the platform</td>
<td><code>perf.stalled_cycles_backend</code></td>
</tr>
<tr>
<td><code>ref_cpu_cycles</code></td>
- <td>the count of total cpu cycles not affected by CPU frequency scaling
+ <td>the count of total CPU cycles not affected by CPU frequency scaling
by applications running on the platform</td>
<td><code>perf.ref_cpu_cycles</code></td>
</tr>
<tr>
<td><code>cpu_clock</code></td>
- <td>the count of cpu clock time, as measured by a monotonic
+ <td>the count of CPU clock time, as measured by a monotonic
high-resolution per-CPU timer, by applications running on
the platform</td>
<td><code>perf.cpu_clock</code></td>
@@ -2505,7 +2505,7 @@
</tr>
<tr>
<td><code>cpu_migrations</code></td>
- <td>the count of cpu migrations, that is, where the process
+ <td>the count of CPU migrations, that is, where the process
moved from one logical processor to another, by
applications running on the platform</td>
<td><code>perf.cpu_migrations</code></td>
@@ -5621,8 +5621,8 @@ qemu-kvm -net nic,model=? /dev/null
The resulting difference, according to the qemu developer who
added the option is: "bh makes tx more asynchronous and reduces
latency, but potentially causes more processor bandwidth
- contention since the cpu doing the tx isn't necessarily the
- cpu where the guest generated the packets."<br/><br/>
+ contention since the CPU doing the tx isn't necessarily the
+ CPU where the guest generated the packets."<br/><br/>
<b>In general you should leave this option alone, unless you
are very certain you know what you are doing.</b>
--
2.17.1
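
For readers following along, a minimal domain XML fragment exercising the
elements documented in the hunks above might look like this (illustrative
values only, not part of the patch):

  <vcpu placement='static' current='2'>4</vcpu>
  <vcpus>
    <!-- vCPU 0 enabled and non-hotpluggable, as QEMU requires -->
    <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
    <vcpu id='1' enabled='yes' hotpluggable='yes' order='2'/>
    <!-- disabled vCPUs must be hotpluggable -->
    <vcpu id='2' enabled='no' hotpluggable='yes'/>
    <vcpu id='3' enabled='no' hotpluggable='yes'/>
  </vcpus>
  <cputune>
    <vcpupin vcpu='0' cpuset='0-1'/>
    <emulatorpin cpuset='2-3'/>
    <!-- quota/period = 50000/100000: each vCPU may use at most
         half of one host CPU -->
    <period>100000</period>
    <quota>50000</quota>
  </cputune>
  <perf>
    <event name='cpu_cycles' enabled='yes'/>
    <event name='cpu_clock' enabled='yes'/>
  </perf>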
[libvirt] [dbus PATCH] tests: Explicitly spawn a session libvirt-dbus instance
by Andrea Bolognani
Tests are performed using the session D-Bus instance, so we
should launch libvirt-dbus in session mode as well.
This was working fine when running the tests as a regular
user, because in that case libvirt-dbus would default to
session mode, but fail when running them as root because
libvirt-dbus would run in system mode and consequently not
show up on the session bus.
Of course building and running the test suite as root is
a pretty bad idea in general, but a lot of distributions
run at least part of their package build steps with pretend
root privileges (e.g. fakeroot), so we have to make sure it
works in that scenario too.
Signed-off-by: Andrea Bolognani <abologna(a)redhat.com>
---
tests/libvirttest.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/libvirttest.py b/tests/libvirttest.py
index 3741abd..4653055 100644
--- a/tests/libvirttest.py
+++ b/tests/libvirttest.py
@@ -33,7 +33,7 @@ class BaseTestClass():
"""Start libvirt-dbus for each test function
"""
os.environ['LIBVIRT_DEBUG'] = '3'
- self.libvirt_dbus = subprocess.Popen([exe])
+ self.libvirt_dbus = subprocess.Popen([exe, '--session'])
self.bus = dbus.SessionBus()
for i in range(10):
--
2.17.1
[libvirt] [jenkins-ci PATCH v2 00/12] lcitool: Rewrite in Python (and add Dockerfile generator)
by Andrea Bolognani
Read the initial cover letter for background and motivations.
Changes from [v1]:
* add Dockerfile generator;
* rename the 'list' action to 'hosts' to better fit along with
the additional 'projects' action;
* always list items in alphabetical order;
* move some generic functions to a Util class.
[v1] https://www.redhat.com/archives/libvir-list/2018-July/msg00717.html
Andrea Bolognani (12):
lcitool: Drop shell implementation
lcitool: Stub out Python implementation
lcitool: Add tool configuration handling
lcitool: Add inventory handling
lcitool: Implement the 'hosts' action
lcitool: Implement the 'install' action
lcitool: Implement the 'update' action
guests: Update documentation
guests: Add Docker-related information to the inventory
lcitool: Add projects information handling
lcitool: Implement the 'projects' action
lcitool: Implement the 'dockerfile' action
guests/README.markdown | 8 +-
guests/host_vars/libvirt-centos-7/docker.yml | 2 +
guests/host_vars/libvirt-debian-8/docker.yml | 2 +
guests/host_vars/libvirt-debian-9/docker.yml | 2 +
.../host_vars/libvirt-debian-sid/docker.yml | 2 +
guests/host_vars/libvirt-fedora-27/docker.yml | 2 +
guests/host_vars/libvirt-fedora-28/docker.yml | 2 +
.../libvirt-fedora-rawhide/docker.yml | 2 +
guests/host_vars/libvirt-ubuntu-16/docker.yml | 2 +
guests/host_vars/libvirt-ubuntu-18/docker.yml | 2 +
guests/lcitool | 722 ++++++++++++------
11 files changed, 521 insertions(+), 227 deletions(-)
create mode 100644 guests/host_vars/libvirt-centos-7/docker.yml
create mode 100644 guests/host_vars/libvirt-debian-8/docker.yml
create mode 100644 guests/host_vars/libvirt-debian-9/docker.yml
create mode 100644 guests/host_vars/libvirt-debian-sid/docker.yml
create mode 100644 guests/host_vars/libvirt-fedora-27/docker.yml
create mode 100644 guests/host_vars/libvirt-fedora-28/docker.yml
create mode 100644 guests/host_vars/libvirt-fedora-rawhide/docker.yml
create mode 100644 guests/host_vars/libvirt-ubuntu-16/docker.yml
create mode 100644 guests/host_vars/libvirt-ubuntu-18/docker.yml
--
2.17.1
[libvirt] [PATCH 0/2] Some additions/fixes in formatdomain docs
by Katerina Koukiou
Katerina Koukiou (2):
docs: formatdomain: add info about global_period and global_quota for
cputune
docs: formatdomain: clarify period cputune subelement
docs/formatdomain.html.in | 31 ++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)
--
2.17.1
[libvirt] [jenkins-ci PATCH v3 00/12] lcitool: Rewrite in Python (and add Dockerfile generator)
by Andrea Bolognani
pylint is reasonably happy with the script now:
$ pylint lcitool
No config file found, using default configuration
************* Module lcitool
C: 1, 0: Missing module docstring (missing-docstring)
C: 37, 0: Missing class docstring (missing-docstring)
C: 47, 4: Missing method docstring (missing-docstring)
W:108,15: Catching too general exception Exception (broad-except)
R:427, 4: Too many branches (14/12) (too-many-branches)
R:289, 0: Too few public methods (1/2) (too-few-public-methods)
-------------------------------------------------------------------
Your code has been rated at 9.21/10 (previous run: 9.21/10, +0.00)
The remaining issues are not considered blockers and will be
addressed, if at all, later down the line.
Changes from [v2]:
* address review comments;
* improve pycodestyle and pylint compliance;
* replace FSF address with an URL.
Changes from [v1]:
* add Dockerfile generator;
* rename the 'list' action to 'hosts' to better fit along with
the additional 'projects' action;
* always list items in alphabetical order;
* move some generic functions to a Util class.
[v2] https://www.redhat.com/archives/libvir-list/2018-July/msg00795.html
[v1] https://www.redhat.com/archives/libvir-list/2018-July/msg00717.html
Andrea Bolognani (12):
lcitool: Drop shell implementation
lcitool: Stub out Python implementation
lcitool: Add tool configuration handling
lcitool: Add inventory handling
lcitool: Implement the 'hosts' action
lcitool: Implement the 'install' action
lcitool: Implement the 'update' action
guests: Update documentation
guests: Add Docker-related information to the inventory
lcitool: Add projects information handling
lcitool: Implement the 'projects' action
lcitool: Implement the 'dockerfile' action
guests/README.markdown | 8 +-
guests/host_vars/libvirt-centos-7/docker.yml | 2 +
guests/host_vars/libvirt-debian-8/docker.yml | 2 +
guests/host_vars/libvirt-debian-9/docker.yml | 2 +
.../host_vars/libvirt-debian-sid/docker.yml | 2 +
guests/host_vars/libvirt-fedora-27/docker.yml | 2 +
guests/host_vars/libvirt-fedora-28/docker.yml | 2 +
.../libvirt-fedora-rawhide/docker.yml | 2 +
guests/host_vars/libvirt-ubuntu-16/docker.yml | 2 +
guests/host_vars/libvirt-ubuntu-18/docker.yml | 2 +
guests/lcitool | 730 ++++++++++++------
11 files changed, 529 insertions(+), 227 deletions(-)
create mode 100644 guests/host_vars/libvirt-centos-7/docker.yml
create mode 100644 guests/host_vars/libvirt-debian-8/docker.yml
create mode 100644 guests/host_vars/libvirt-debian-9/docker.yml
create mode 100644 guests/host_vars/libvirt-debian-sid/docker.yml
create mode 100644 guests/host_vars/libvirt-fedora-27/docker.yml
create mode 100644 guests/host_vars/libvirt-fedora-28/docker.yml
create mode 100644 guests/host_vars/libvirt-fedora-rawhide/docker.yml
create mode 100644 guests/host_vars/libvirt-ubuntu-16/docker.yml
create mode 100644 guests/host_vars/libvirt-ubuntu-18/docker.yml
--
2.17.1
[libvirt] [PATCH] qemu: stop qemu process when restore fails
by Jie Wang
From 29482622218f525f0133be0b7db74835174035d9 Mon Sep 17 00:00:00 2001
From: Jie Wang <wangjie88(a)huawei.com>
Date: Thu, 5 Jul 2018 09:52:03 +0800
Subject: [PATCH] qemu: stop qemu process when restore fails
If qemuProcessStartCPUs fails in qemuDomainSaveImageStartVM,
we need to stop the QEMU process, otherwise a stray VM remains
which can't be managed by libvirt.
Signed-off-by: Jie Wang <wangjie88.huawei.com>
---
src/qemu/qemu_driver.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 9a35e04a85..639b57316d 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6621,6 +6621,8 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
if (virGetLastErrorCode() == VIR_ERR_OK)
virReportError(VIR_ERR_OPERATION_FAILED,
"%s", _("failed to resume domain"));
+
+ qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, asyncJob, 0);
goto cleanup;
}
if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm, driver->caps) < 0) {
--
2.15.0.windows.1
[libvirt] [PATCH v4 0/6] Support network stats for VF representor interface
by Jai Singh Rana
With the availability of the switchdev model in Linux, it is possible to
capture stats for an SR-IOV device with interface_type 'hostdev', provided
the device supports a VF representor in switchdev mode on the host.
These stats are supported by adding helper APIs that get/verify the VF
representor name based on the PCI Bus:Device:Function information in the
domain's 'hostdev' structure and query the required net sysfs directory and
file entries on the host according to the switchdev model. These helper APIs
are then used in qemu/conf to get the interface stats for the VF representor
of a PCI SR-IOV device.
V4 includes changes based on feedback received on the v3 patchset. A new
generic API for Linux is added to fetch stats from /proc/net/dev, which will
be used by tap and hostdev devices. Also introduced is a new API to retrieve
the net def from a given domain based on the given hostdev which supports
networking.
[1] https://www.kernel.org/doc/Documentation/networking/switchdev.txt
V3: https://www.redhat.com/archives/libvir-list/2018-April/msg00306.html
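
As a rough illustration of the generic /proc/net/dev scraping mentioned in
the cover letter (this is not the code from the series; the helper name and
struct are made up for the example), reading one interface's counters on
Linux boils down to:

  #include <stdio.h>
  #include <string.h>

  struct ifstats {
      unsigned long long rx_bytes, rx_packets, rx_errs, rx_drop;
      unsigned long long tx_bytes, tx_packets, tx_errs, tx_drop;
  };

  /* Hypothetical helper: look up @ifname in /proc/net/dev and fill @st.
   * Returns 0 on success, -1 if the interface is not found. */
  static int
  get_proc_netdev_stats(const char *ifname, struct ifstats *st)
  {
      char line[512];
      FILE *fp = fopen("/proc/net/dev", "r");

      if (!fp)
          return -1;

      while (fgets(line, sizeof(line), fp)) {
          char name[64];
          unsigned long long rx[8], tx[8];

          /* data lines look like:
           *   "  eth0: 1234 10 0 0 0 0 0 0   5678 20 0 0 0 0 0 0" */
          if (sscanf(line,
                     " %63[^:]:"
                     " %llu %llu %llu %llu %llu %llu %llu %llu"
                     " %llu %llu %llu %llu %llu %llu %llu %llu",
                     name,
                     &rx[0], &rx[1], &rx[2], &rx[3],
                     &rx[4], &rx[5], &rx[6], &rx[7],
                     &tx[0], &tx[1], &tx[2], &tx[3],
                     &tx[4], &tx[5], &tx[6], &tx[7]) != 17)
              continue;                 /* skip the two header lines */

          if (strcmp(name, ifname) != 0)
              continue;

          st->rx_bytes = rx[0]; st->rx_packets = rx[1];
          st->rx_errs  = rx[2]; st->rx_drop    = rx[3];
          st->tx_bytes = tx[0]; st->tx_packets = tx[1];
          st->tx_errs  = tx[2]; st->tx_drop    = tx[3];
          fclose(fp);
          return 0;
      }

      fclose(fp);
      return -1;
  }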
Jai Singh Rana (6):
util: Add helper function to clean extra spaces in string
util: Add generic API to fetch network stats from procfs
util: Add helper APIs to get/verify VF Representor name
conf: util: Add API to find net def given its domain's hostdev
qemu: Network stats support for VF Representor
docs: Update news about Network stats support for VF Representor
docs/news.xml | 9 ++
po/POTFILES | 1 +
src/conf/domain_conf.c | 43 +++++++
src/conf/domain_conf.h | 2 +
src/libvirt_private.syms | 11 ++
src/qemu/qemu_driver.c | 34 ++++-
src/util/Makefile.inc.am | 2 +
src/util/virhostdev.c | 4 +-
src/util/virhostdev.h | 11 ++
src/util/virnetdev.c | 202 ++++++++++++++++++++++++++++-
src/util/virnetdev.h | 5 +
src/util/virnetdevhostdev.c | 300 ++++++++++++++++++++++++++++++++++++++++++++
src/util/virnetdevhostdev.h | 34 +++++
src/util/virnetdevtap.c | 71 +----------
src/util/virstring.c | 36 ++++++
src/util/virstring.h | 3 +
16 files changed, 691 insertions(+), 77 deletions(-)
create mode 100644 src/util/virnetdevhostdev.c
create mode 100644 src/util/virnetdevhostdev.h
--
2.13.7
[libvirt] [PATCH 0/3] virStr*cpy*() related fixes and cleanups
by Andrea Bolognani
1/3 contains bug fixes, 2/3 a bunch of cleanups and 3/3 makes
libvirt build again on MinGW.
Andrea Bolognani (3):
src: Use virStrcpyStatic() to avoid truncation
src: Use virStrcpyStatic() wherever possible
m4: Work around MinGW detection of strncpy() usage
cfg.mk | 2 +-
m4/virt-compile-warnings.m4 | 5 +++++
src/conf/nwfilter_conf.c | 3 +--
src/esx/esx_driver.c | 4 +---
src/hyperv/hyperv_driver.c | 3 +--
src/util/virfdstream.c | 2 +-
src/util/virlog.c | 5 ++---
src/util/virnetdev.c | 3 +--
src/xenconfig/xen_xl.c | 17 ++++-------------
9 files changed, 17 insertions(+), 27 deletions(-)
--
2.17.1
Re: [libvirt] opening tap devices that are created in a container
by Roman Mohr
On Wed, Jul 11, 2018 at 12:10 PM <nert@wheatley> wrote:
> On Mon, Jul 09, 2018 at 05:00:49PM -0400, Jason Baron wrote:
> >
> >
> >On 07/08/2018 02:01 AM, Martin Kletzander wrote:
> >> On Thu, Jul 05, 2018 at 06:24:20PM +0200, Roman Mohr wrote:
> >>> On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron(a)akamai.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Opening tap devices, such as macvtap, that are created in containers is
> >>>> problematic because the interface for opening tap devices is via
> >>>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> >>>> its not namespace aware. It is possible to do a mknod() in the
> >>>> container, once the tap devices are created, however, since the tap
> >>>> devices are created dynamically its not possible to apriori allow access
> >>>> to certain major/minor numbers, since we don't know what these are going
> >>>> to be. In addition, its desirable to not allow the mknod capability in
> >>>> containers. This behavior, I think is somewhat inconsistent with the
> >>>> tuntap driver where one can create tuntap devices inside a container by
> >>>> first opening /dev/net/tun and then using them by supplying the tuntap
> >>>> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
> >>>> network namespace, one is limited to opening network devices that belong
> >>>> to your current network namespace.
> >>>>
> >>>> Here are some options to this issue, that I wanted to get feedback
> >>>> about, and just wondering if anybody else has run into this.
> >>>>
> >>>> 1)
> >>>>
> >>>> Don't create the tap device, such as macvtap in the container. Instead,
> >>>> create the tap device outside of the container and then move it into the
> >>>> desired container network namespace. In addition, do a mknod() for the
> >>>> corresponding /dev/tapNN device from outside the container before doing
> >>>> chroot().
> >>>>
> >>>> This solution still doesn't allow tap devices to be created inside the
> >>>> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >>>> a container, it would mean changing libvirtd to open existing tap
> >>>> devices (as opposed to the current behavior of creating new ones). This
> >>>> would not require any kernel changes, but as mentioned seems
> >>>> inconsistent with the tuntap interface.
> >>>>
> >>>
> >>> For KubeVirt, apart from how exactly the device ends up in the
> >>> container, I
> >>> would want to pursue a way where all network preparations which require
> >>> privileges happens from a privileged process *outside* of the container.
> >>> Like CNI solutions do it. They run outside, have privileges and then
> >>> create
> >>> devices in the right network/mount namespace or move them there. The
> >>> final
> >>> goal for KubeVirt is that our pod with the qemu process is completely
> >>> unprivileged and privileged setup happens from outside.
> >>>
> >>> As a consequence, and depending on which route Dan pursues with the
> >>> restructured libvirt, I would assume that either a privileged
> >>> libvirtd-part
> >>> outside of containers creates the devices by entering the right
> >>> namespaces,
> >>> or that libvirt in the container can consume pre-created tun/tap
> devices,
> >>> like qemu.
> >>>
> >>
> >> That would be nice, but as far as I understand there will always be a
> >> need for
> >> some privileges if you want to use a tap device. It's nice that CNI
> >> does that
> >> and all the containers can run unprivileged, but that's because they do
> >> not open
> >> the tap device and they do not do any privileged operations on it. But
> >> QEMU
> >> needs to. So the only way would be passing an opened fd to the
> >> container or
> >> opening the tap device there and making the fd usable for one process in
> >> the
> >> container. Is this already supported for some type of containers in
> >> some way?
> >>
> >> Martin
> >
> >Hi,
> >
> >So another option here call it #3 is to pass open fds via unix sockets.
> >If there are privileged operations that QEMU is trying to do with the fd
> >though, how will opening it first and then passing it to an unprivileged
> >QEMU address that? Is the opener doing those operations first?
> >
>
> Sorry for the confusion, but QEMU is not doing any privileged operations.
> I got
> confused by the fact that anyone can open and do a R/W on a tap device.
> But it
> looks like that's on purpose. No capabilities are needed for opening
> /dev/net/tun and calling ioctl(TUNSETIFF) with existing name and then
> doing R/W
> operations on it. It just works.
>
> Correct me if I'm wrong, but to sum it all up, the only things that we
> need to
> figure out (which might possibly be solved by ideas in the other thread)
> are:
>
> tap:
> - Existence of /dev/net/tun
> - Having permissions to open it (0666 by default, shouldn't be a big deal)
> - Knowing the device name
>
> macvtap:
> - Existence of /dev/tapXX
> - Having permissions to open /dev/tapXX
> - One of the following:
> - Knowing the device name (and being able to translate it using a
> netlink socket)
> - Knowing the device index
>
> The rest should be an implementation detail.
>
> Am I right? Did I miss anything?
At least from the KubeVirt use-case, those sound like the things we would
need in order to solve the networking setup in a similar way to how the
Container Network Interface implementations solve the setup in k8s.
Best Regards,
Roman
[libvirt] opening tap devices that are created in a container
by Jason Baron
Hi,
Opening tap devices, such as macvtap, that are created in containers is
problematic because the interface for opening tap devices is via
/dev/tapNN and devtmpfs is not typically mounted inside a container as
it's not namespace-aware. It is possible to do a mknod() in the
container, once the tap devices are created; however, since the tap
devices are created dynamically it's not possible to a priori allow access
to certain major/minor numbers, since we don't know what these are going
to be. In addition, it's desirable to not allow the mknod capability in
containers. This behavior, I think, is somewhat inconsistent with the
tuntap driver where one can create tuntap devices inside a container by
first opening /dev/net/tun and then using them by supplying the tuntap
device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
network namespace, one is limited to opening network devices that belong
to your current network namespace.
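
For comparison, the tuntap flow described above (attaching to an existing
tap device by name through /dev/net/tun) is only a few calls. A minimal
sketch, with error handling trimmed and not taken from libvirt:

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/if.h>
  #include <linux/if_tun.h>

  /* Attach to an already-existing tap device in the current network
   * namespace; returns an fd usable for read()/write(), or -1 on error. */
  static int tap_attach(const char *ifname)
  {
      struct ifreq ifr;
      int fd = open("/dev/net/tun", O_RDWR);

      if (fd < 0)
          return -1;

      memset(&ifr, 0, sizeof(ifr));
      ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
      strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

      /* TUNSETIFF resolves the name in the caller's network namespace,
       * which is the validation step mentioned above. */
      if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
          close(fd);
          return -1;
      }
      return fd;
  }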
Here are some options for this issue that I wanted to get feedback
about; I'm also wondering if anybody else has run into this.
1)
Don't create the tap device, such as macvtap in the container. Instead,
create the tap device outside of the container and then move it into the
desired container network namespace. In addition, do a mknod() for the
corresponding /dev/tapNN device from outside the container before doing
chroot().
This solution still doesn't allow tap devices to be created inside the
container. Thus, in the case of kubevirt, which runs libvirtd inside of
a container, it would mean changing libvirtd to open existing tap
devices (as opposed to the current behavior of creating new ones). This
would not require any kernel changes, but as mentioned seems
inconsistent with the tuntap interface.
2)
Add a new kernel interface for tap devices similar to how /dev/net/tun
currently works. It might be nice to use TUNSETIFF for tap devices, but
because tap devices have different fops they can't be easily switched
after open(). So the suggestion is a new ioctl (TUNGETFDBYNAME?), where
the tap device name is supplied and a new fd (distinct from the fd
returned by the open of /dev/net/tun) is returned as an output field as
part of the new ioctl parameter.
It may not make sense to have this new ioctl call for /dev/net/tun since
it's really about opening a tap device, so it may make sense to introduce
it as part of a new device, such as /dev/net/tap. This new ioctl could
be used for macvtap and ipvtap (or any tap device). I think it might
also improve performance for tuntap devices themselves, if they are
opened this way since currently all tun operations such as read() and
write() take a reference count on the underlying tuntap device, since it
can be changed via TUNSETIFF. I tested this interface out, so I can
provide the kernel changes if that's helpful for clarification.
Thanks,
-Jason