[libvirt] [PATCH] qemu: Relax hard RSS limit
by Michal Privoznik
Currently, if there's no hard memory limit defined for a domain,
libvirt tries to calculate one, based on domain definition and magic
equation and set it upon the domain startup. The rationale behind was,
if there's a memory leak or exploit in qemu, we should prevent the
host system trashing. However, the equation was too tightening, as it
didn't reflect what the kernel counts into the memory used by a
process. Since many hosts do have a swap, nobody hasn't noticed
anything, because if hard memory limit is reached, process can
continue allocating memory on a swap. However, if there is no swap on
the host, the process gets killed by OOM killer. In our case, the qemu
process it is.
To prevent this, we need to relax the hard RSS limit. Moreover, we
should reflect more precisely the kernel way of accounting the memory
for process. That is, even the kernel caches are counted within the
memory used by a process (within cgroups at least). Hence the magic
equation has to be changed:
limit = 1.5 * (domain memory + total video memory) + (32MB for cache
per each disk) + 200MB
---
There is a bit more that should be taken into account, e.g. shared
pages, where accounting is even more complicated:
"Shared pages are accounted on the basis of the first touch approach.
The cgroup that first touches a page is accounted for the page." [1]
I don't we even want to try to reflect this in our code. That's why
the coefficient of domain memory has been lifted from 1.02 to 1.5, in
hope it will just be enough.
1: http://www.kernel.org/doc/Documentation/cgroups/memory.txt
src/qemu/qemu_cgroup.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 7faf025..16a9d7c 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -343,15 +343,18 @@ int qemuSetupCgroup(virQEMUDriverPtr driver,
unsigned long long hard_limit = vm->def->mem.hard_limit;
if (!hard_limit) {
- /* If there is no hard_limit set, set a reasonable
- * one to avoid system trashing caused by exploited qemu.
- * As 'reasonable limit' has been chosen:
- * (1 + k) * (domain memory + total video memory) + F
- * where k = 0.02 and F = 200MB. */
+ /* If there is no hard_limit set, set a reasonable one to avoid
+ * system trashing caused by exploited qemu. As 'reasonable limit'
+ * has been chosen:
+ * (1 + k) * (domain memory + total video memory) + (32MB for
+ * cache per each disk) + F
+ * where k = 0.5 and F = 200MB. The cache for disks is important as
+ * kernel cache on the host side counts into the RSS limit. */
hard_limit = vm->def->mem.max_balloon;
for (i = 0; i < vm->def->nvideos; i++)
hard_limit += vm->def->videos[i]->vram;
- hard_limit = hard_limit * 1.02 + 204800;
+ hard_limit = hard_limit * 1.5 + 204800;
+ hard_limit += vm->def->ndisks * 32768;
}
rc = virCgroupSetMemoryHardLimit(cgroup, hard_limit);
--
1.8.0.2
11 years, 11 months
[libvirt] [PATCH v2 00/11] Rework storage migration
by Michal Privoznik
This patch set re-implements migration with storage for enough new qemu.
Currently, you can migrate a domain to a host without need for shared storage.
This is done by setting 'blk' or 'inc' attribute (representing
VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC flags respectively)
of 'migrate' monitor command. However, the qemu implementation is
buggy and applications are advised to switch to new impementation
which, moreover, offers some nice features, like migrating only explicitly
specified disks.
The new functionality is controlled via 'nbd-server-*' and 'drive-mirror'
commands. The flow is meant to look like this:
1) User invokes libvirt's migrate functionality.
2) libvirt checks that no block jobs are active on the source.
3) libvirt starts the destination QEMU and sets up the NBD server using the
nbd-server-start and nbd-server-add commands.
4) libvirt starts drive-mirror with a destination pointing to the remote NBD
server, for example nbd:host:port:exportname=diskname (where diskname is the
-drive id specified on the destination).
5) once all mirroring jobs reach steady state, libvirt invokes the migrate
command.
6) once migration completed, libvirt invokes the nbd-server-stop command on the
destination QEMU.
If we just skip the 2nd step and there is an active block-job, qemu will fail in
step 4. No big deal.
Since we try to NOT break migration and keep things compatible, this feature is
enabled iff both sides support it. Since there's obvious need for some data
transfer between src and dst, I've put it into qemuCookieMigration:
1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE)
<nbd>
<disk size='17179869184'/>
</nbd>
Hey destination, I know how to use this cool new feature. Moreover,
these are the disks I'll send you. Each one of them is X bytes big.
It's one of the prerequisite - the file (disk->src) on dst exists and has
at least the same size as on dst.
2) dst -> src: (QEMU_MIGRATION_PHASE_PREPARE -> QEMU_MIGRATION_PHASE_PERFORM3)
<nbd port='X'/>
Okay, I (destination) support this feature as well. I've created all
files as you (src) told me to and you can start rolling data. I am listening
on port X.
3) src -> dst: (QEMU_MIGRATION_PHASE_PERFORM3 -> QEMU_MIGRATION_PHASE_FINISH3)
<nbd port='-1'/>
Migration completed, destination, you may shut the NBD server down.
If either src or dst doesn't support NBD, it is not used and whole process fall
backs to old implementation.
diff to v1:
-Eric's and Daniel's suggestions worked in. To point out the bigger ones:
don't do NBD style when TUNNELLED requested, added 'b:writable' to
'nbd-server-add'
-drop '/qemu-migration/nbd/disk/@src' attribute from migration cookie.
As pointed out by Jirka, disk->src can be changed during migration (e.g. by
migration hook or by passed xml). So I've tried (as suggested on the list)
passing disk alias. However, since qemu hasn't been started on destination yet,
the aliases hasn't been generated yet. So we have to rely on ordering
completely.
The patches 1,3 and 5 has been ACKed already.
Michal Privoznik (11):
qemu: Introduce NBD_SERVER capability
Introduce NBD migration cookie
qemu: Introduce nbd-server-start command
qemu: Introduce nbd-server-add command
qemu: Introduce nbd-server-stop command
qemu_migration: Introduce qemuMigrationStartNBDServer
qemu_migration: Move port allocation to a separate func
qemu_migration: Implement qemuMigrationStartNBDServer()
qemu_migration: Implement qemuMigrationDriveMirror
qemu_migration: Check size prerequisites
qemu_migration: Stop NBD server at Finish phase
src/qemu/qemu_capabilities.c | 3 +
src/qemu/qemu_capabilities.h | 1 +
src/qemu/qemu_driver.c | 8 +-
src/qemu/qemu_migration.c | 620 +++++++++++++++++++++++++++++++++++++++---
src/qemu/qemu_migration.h | 6 +-
src/qemu/qemu_monitor.c | 63 +++++
src/qemu/qemu_monitor.h | 7 +
src/qemu/qemu_monitor_json.c | 95 +++++++
src/qemu/qemu_monitor_json.h | 7 +
9 files changed, 772 insertions(+), 38 deletions(-)
--
1.7.8.6
11 years, 11 months
[libvirt] [PATCH] BZ 657918 Default iptables setup in libvirt breaks mDNS
by Brian J. Murrell
Hi,
Per the request on https://bugzilla.redhat.com/show_bug.cgi?id=657918
please find attached a patch that should address the issue.
I'm not subscribed to this list though (I know, it's pretty rude, but
my e-mail traffic is already too heavy to add another list to it), so
if you could either CC me on any follow-up or just move followups to
the BZ ticket where the patch also appears, that would be great.
Cheers,
b.
--- src/network/bridge_driver.c.orig 2012-10-27 16:56:23.000000000 -0400
+++ src/network/bridge_driver.c 2012-12-11 15:49:13.937133883 -0500
@@ -1301,9 +1301,10 @@
*
* We need to end up with 3 rules in the table in this order
*
- * 1. protocol=tcp with sport mapping restriction
- * 2. protocol=udp with sport mapping restriction
- * 3. generic any protocol
+ * 1. multicast is exempted
+ * 2. protocol=tcp with sport mapping restriction
+ * 3. protocol=udp with sport mapping restriction
+ * 4. generic any protocol
*
* The sport mappings are required, because default IPtables
* MASQUERADE maintain port numbers unchanged where possible.
@@ -1361,8 +1362,21 @@
goto masqerr5;
}
+ /* exempt multicast traffic */
+ if (iptablesAddForwardMasqueradeExempt(driver->iptables) < 0) {
+ virReportError(VIR_ERR_SYSTEM_ERROR,
+ _("failed to add iptables rule to exempt multicast traffic from masquerading"));
+ goto masqerr6;
+ }
+
return 0;
+ masqerr6:
+ iptablesRemoveForwardMasquerade(driver->iptables,
+ &ipdef->address,
+ prefix,
+ forwardIf,
+ "tcp");
masqerr5:
iptablesRemoveForwardMasquerade(driver->iptables,
&ipdef->address,
--- src/util/iptables.c.orig 2012-10-27 16:56:23.000000000 -0400
+++ src/util/iptables.c 2012-12-11 15:53:28.715044866 -0500
@@ -858,6 +858,26 @@
}
/**
+ * iptablesAddForwardMasqueradeExempt:
+ * @ctx: pointer to the IP table context
+ *
+ * Add rules to the IP table context to exempt masquerading
+ * for multicast networks
+ *
+ * Returns 0 in case of success or an error code otherwise
+ */
+int
+iptablesAddForwardMasqueradeExempt(iptablesContext *ctx)
+{
+ return iptablesAddRemoveRule(ctx->nat_postrouting,
+ AF_INET,
+ ADD,
+ "--destination", "224.0.0.0/4",
+ "--jump", "RETURN",
+ NULL);
+}
+
+/**
* iptablesAddForwardMasquerade:
* @ctx: pointer to the IP table context
* @network: the source network name
--- src/util/iptables.h.orig 2012-10-27 16:56:23.000000000 -0400
+++ src/util/iptables.h 2012-12-11 15:57:03.284144679 -0500
@@ -101,6 +101,7 @@
int family,
const char *iface);
+int iptablesAddForwardMasqueradeExempt (iptablesContext *ctx);
int iptablesAddForwardMasquerade (iptablesContext *ctx,
virSocketAddr *netaddr,
unsigned int prefix,
--- src/libvirt_private.syms.orig 2012-12-11 15:46:11.141932324 -0500
+++ src/libvirt_private.syms 2012-12-11 15:58:11.715865516 -0500
@@ -681,6 +681,7 @@
iptablesAddForwardAllowOut;
iptablesAddForwardAllowRelatedIn;
iptablesAddForwardMasquerade;
+iptablesAddForwardMasqueradeExempt;
iptablesAddForwardRejectIn;
iptablesAddForwardRejectOut;
iptablesAddOutputFixUdpChecksum;
11 years, 11 months
[libvirt] [PATCH RESEND] build: libvirt-guests files misplaced in specfile
by Viktor Mihajlovski
In a non-systemd environment the post and preun scripts of libvirt-client
fail, since the required files are in libvirt-daemon. Moved them to client.
Doing that I noticed %{_unitdir}/libvirt-guests.service was contained in
both libvirt-client and libvirt-daemon, which I don't think was intended.
Removed the extra copy from daemon.
Signed-off-by: Viktor Mihajlovski <mihajlov(a)linux.vnet.ibm.com>
---
libvirt.spec.in | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/libvirt.spec.in b/libvirt.spec.in
index 0a5a8e0..52da4f4 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -1665,15 +1665,12 @@ fi
%dir %attr(0700, root, root) %{_sysconfdir}/libvirt/nwfilter/
-%attr(0755, root, root) %{_libexecdir}/libvirt-guests.sh
%if %{with_systemd}
%{_unitdir}/libvirtd.service
-%{_unitdir}/libvirt-guests.service
%{_unitdir}/virtlockd.service
%{_unitdir}/virtlockd.socket
%else
%{_sysconfdir}/rc.d/init.d/libvirtd
-%{_sysconfdir}/rc.d/init.d/libvirt-guests
%{_sysconfdir}/rc.d/init.d/virtlockd
%endif
%doc daemon/libvirtd.upstart
@@ -1951,8 +1948,11 @@ rm -f $RPM_BUILD_ROOT%{_prefix}/lib/sysctl.d/libvirtd.conf
%if %{with_systemd}
%{_unitdir}/libvirt-guests.service
+%else
+%{_sysconfdir}/rc.d/init.d/libvirt-guests
%endif
%config(noreplace) %{_sysconfdir}/sysconfig/libvirt-guests
+%attr(0755, root, root) %{_libexecdir}/libvirt-guests.sh
%dir %attr(0755, root, root) %{_localstatedir}/lib/libvirt/
%if %{with_sasl}
--
1.7.9.5
11 years, 11 months
[libvirt] [PATCH v2] Resolve FORWARD_NULL errors found by Coverity
by John Ferlan
Reworked the usbFreeDevice changes to allow NULL to be passed and added
usbFreeDevice to the cfg.mk file.
John Ferlan (1):
util: Check for NULL 'dev' on input to usbFreeDevice
cfg.mk | 1 +
src/util/virusb.c | 5 +++--
2 files changed, 4 insertions(+), 2 deletions(-)
--
1.7.11.7
11 years, 11 months
[libvirt] [PATCH] docs: remove duplicate check in index.add
by Claudio Bley
Signed-off-by: Claudio Bley <cbley(a)av-test.de>
---
Pushing under the trivial rule.
docs/apibuild.py | 2 --
1 file changed, 2 deletions(-)
diff --git a/docs/apibuild.py b/docs/apibuild.py
index e73a85d..7b336b3 100755
--- a/docs/apibuild.py
+++ b/docs/apibuild.py
@@ -240,8 +240,6 @@ class index:
self.includes[name] = d
elif type == "struct":
self.structs[name] = d
- elif type == "struct":
- self.structs[name] = d
elif type == "union":
self.unions[name] = d
elif type == "enum":
--
1.7.9.5
11 years, 11 months
[libvirt] [PATCH] maint: avoid potential promotion issues with [ug]id_t
by Eric Blake
POSIX does not guarantee whether uid_t and gid_t are signed or
unsigned, nor does it guarantee whether they are smaller, same
size, or larger than int (or even the same size as one another).
Therefore, it is possible to have platforms where '(uid_t)-1==-1'
is false or where 'uid = gid = -1' sets uid to the wrong value,
thanks to integer promotion rules. The only portable way to use
the placeholder value of these two types is to always use a cast.
Thankfully, the issue is mostly theoretical - sanlock only
compiles on Linux for now, and on Linux, these types do not
suffer from strange promotion problems.
* src/locking/lock_driver_sanlock.c
(virLockManagerSanlockSetupLockspace, virLockManagerSanlockInit)
(virLockManagerSanlockCreateLease): Cast -1 to proper type before
comparing with uid_t or gid_t.
---
src/locking/lock_driver_sanlock.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/src/locking/lock_driver_sanlock.c b/src/locking/lock_driver_sanlock.c
index c955003..d06fa66 100644
--- a/src/locking/lock_driver_sanlock.c
+++ b/src/locking/lock_driver_sanlock.c
@@ -1,7 +1,7 @@
/*
* lock_driver_sanlock.c: A lock driver for Sanlock
*
- * Copyright (C) 2010-2012 Red Hat, Inc.
+ * Copyright (C) 2010-2013 Red Hat, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -236,7 +236,7 @@ static int virLockManagerSanlockSetupLockspace(void)
goto error;
}
- if (driver->group != -1)
+ if (driver->group != (gid_t) -1)
perms |= 0060;
if ((fd = open(path, O_WRONLY|O_CREAT|O_EXCL, perms)) < 0) {
@@ -249,7 +249,7 @@ static int virLockManagerSanlockSetupLockspace(void)
VIR_DEBUG("Someone else just created lockspace %s", path);
} else {
/* chown() the path to make sure sanlock can access it */
- if ((driver->user != -1 || driver->group != -1) &&
+ if ((driver->user != (uid_t) -1 || driver->group != (gid_t) -1) &&
(fchown(fd, driver->user, driver->group) < 0)) {
virReportSystemError(errno,
_("cannot chown '%s' to (%u, %u)"),
@@ -303,8 +303,8 @@ static int virLockManagerSanlockSetupLockspace(void)
}
} else if (S_ISREG(st.st_mode)) {
/* okay, the lease file exists. Check the permissions */
- if (((driver->user != -1 && driver->user != st.st_uid) ||
- (driver->group != -1 && driver->group != st.st_gid)) &&
+ if (((driver->user != (uid_t) -1 && driver->user != st.st_uid) ||
+ (driver->group != (gid_t) -1 && driver->group != st.st_gid)) &&
(chown(path, driver->user, driver->group) < 0)) {
virReportSystemError(errno,
_("cannot chown '%s' to (%u, %u)"),
@@ -314,7 +314,7 @@ static int virLockManagerSanlockSetupLockspace(void)
goto error;
}
- if ((driver->group != -1 && (st.st_mode & 0060) != 0060) &&
+ if ((driver->group != (gid_t) -1 && (st.st_mode & 0060) != 0060) &&
chmod(path, 0660) < 0) {
virReportSystemError(errno,
_("cannot chmod '%s' to 0660"),
@@ -397,7 +397,8 @@ static int virLockManagerSanlockInit(unsigned int version,
driver->requireLeaseForDisks = true;
driver->hostID = 0;
driver->autoDiskLease = false;
- driver->user = driver->group = -1;
+ driver->user = (uid_t) -1;
+ driver->group = (gid_t) -1;
if (!(driver->autoDiskLeasePath = strdup(LOCALSTATEDIR "/lib/libvirt/sanlock"))) {
VIR_FREE(driver);
virReportOOMError();
@@ -680,7 +681,7 @@ static int virLockManagerSanlockCreateLease(struct sanlk_resource *res)
VIR_DEBUG("Someone else just created lockspace %s", res->disks[0].path);
} else {
/* chown() the path to make sure sanlock can access it */
- if ((driver->user != -1 || driver->group != -1) &&
+ if ((driver->user != (uid_t) -1 || driver->group != (gid_t) -1) &&
(fchown(fd, driver->user, driver->group) < 0)) {
virReportSystemError(errno,
_("cannot chown '%s' to (%u, %u)"),
--
1.8.0.2
11 years, 11 months
[libvirt] [PATCH] build: avoid compiler warning
by Eric Blake
gcc 4.1.2 on RHEL 5 warned:
conf/network_conf.c:3136: warning: 'foundIdx' may be used uninitialized in this function
The warning is spurious, but initializing the variable doesn't hurt.
* src/conf/network_conf.c (virNetworkDefUpdateDNSHost): Silence
unused variable warning.
---
Pushing under the build-breaker rule.
src/conf/network_conf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c
index d185b46..48639a9 100644
--- a/src/conf/network_conf.c
+++ b/src/conf/network_conf.c
@@ -3133,7 +3133,7 @@ virNetworkDefUpdateDNSHost(virNetworkDefPtr def,
/* virNetworkUpdateFlags */
unsigned int fflags ATTRIBUTE_UNUSED)
{
- int ii, jj, kk, foundIdx, ret = -1;
+ int ii, jj, kk, foundIdx = -1, ret = -1;
virNetworkDNSDefPtr dns = &def->dns;
virNetworkDNSHostDef host;
bool isAdd = (command == VIR_NETWORK_UPDATE_COMMAND_ADD_FIRST ||
--
1.8.0.2
11 years, 11 months
[libvirt] [PATCH] Resolve COPY_PASTE error found by Coverity
by John Ferlan
Fix copy-paste error doing handshake.
The clientShake was not set to true, thus we'd potentially never leave
the handshake while loop.
---
tests/virnettlscontexttest.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/virnettlscontexttest.c b/tests/virnettlscontexttest.c
index a1e71dc..49462df 100644
--- a/tests/virnettlscontexttest.c
+++ b/tests/virnettlscontexttest.c
@@ -662,7 +662,7 @@ static int testTLSSessionInit(const void *opaque)
if (rv < 0)
goto cleanup;
if (rv == VIR_NET_TLS_HANDSHAKE_COMPLETE)
- serverShake = true;
+ clientShake = true;
}
} while (!clientShake && !serverShake);
--
1.7.11.7
11 years, 11 months
[libvirt] [PATCH] storage: fix leak in virStorageBackendLogicalMakeVol
by Ján Tomko
Use regfree instead of VIR_FREE.
---
src/storage/storage_backend_logical.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/src/storage/storage_backend_logical.c b/src/storage/storage_backend_logical.c
index 2734689..bd902fe 100644
--- a/src/storage/storage_backend_logical.c
+++ b/src/storage/storage_backend_logical.c
@@ -264,7 +264,7 @@ virStorageBackendLogicalMakeVol(virStoragePoolObjPtr pool,
cleanup:
VIR_FREE(regex);
- VIR_FREE(reg);
+ regfree(reg);
VIR_FREE(vars);
if (is_new_vol && (ret == -1))
virStorageVolDefFree(vol);
--
1.7.8.6
11 years, 11 months