On Mon, Jan 18, 2016 at 11:39:34AM +0100, Michal Privoznik wrote:
So I can observe this crasher that with freshly started daemon
(and virtlogd enabled) I am trying to startup a domain that
immediately dies (because it's said to use huge pages but I
haven't allocated a single one in the pool). But what's
interesting is that the daemon crashes too. Hardly reproducible
with -O0 or under valgrind. But I just got lucky:
==20469== Invalid write of size 8
==20469== at 0x4C2E99B: memcpy(a)GLIBC_2.2.5 (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20469== by 0x217EDD07: qemuProcessReadLog (qemu_process.c:1670)
==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696)
==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957)
==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955)
==20469== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152)
==20469== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396)
==20469== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450)
==20469== by 0x21846845: qemuDomainCreate (qemu_driver.c:7468)
==20469== by 0x5611CD0: virDomainCreate (libvirt-domain.c:6753)
==20469== by 0x125D9A: remoteDispatchDomainCreate (remote_dispatch.h:3613)
==20469== by 0x125CB7: remoteDispatchDomainCreateHelper (remote_dispatch.h:3589)
==20469== Address 0x27a52ad0 is 0 bytes after a block of size 5,584 alloc'd
==20469== at 0x4C29F80: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==20469== by 0x9B8D1DB: xdr_string (in /lib64/libc-2.21.so)
==20469== by 0x563B39C: xdr_virLogManagerProtocolNonNullString (log_protocol.c:24)
==20469== by 0x563B6B7: xdr_virLogManagerProtocolDomainReadLogFileRet
(log_protocol.c:123)
==20469== by 0x164B34: virNetMessageDecodePayload (virnetmessage.c:407)
==20469== by 0x5682360: virNetClientProgramCall (virnetclientprogram.c:379)
==20469== by 0x563B30E: virLogManagerDomainReadLogFile (log_manager.c:272)
==20469== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485)
==20469== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660)
==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696)
==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957)
==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955)
This points to memmove() in qemuProcessReadLog(). And in fact
after few moments and a couple of papers (to debug the code) I
can observe the bug. Imagine we just read the following string
from qemu:
"abc\n2016-01-18T09:40:44.022744Z qemu-system-x86_64: Error\n"
After the first pass of the while() loop in the
qemuProcessReadLog() @buf still points to the beginning of the
string, @filter_next points to the beginning of the second line.
So we start second iteration because there is yet another newline
character at the end. In this iteration @eol points to it
actually. Now, the control gets inside true branch of if(). Just
to remind you:
got = 58
filter_next = buf + 5,
eol = buf + 58.
So several weeks ago I spent non-negligible amount of hours on this,
with papers written all over and everything. I couldn't find a problem
in it, but I didn't consider the fact that in first iteration we can go
with the if condition being false, hence taking the else branch which
forgets to update 'got'. I always set got -= skip. Good catch, but
please mention that in the first iteration we take the false branch,
thanks.
Therefore skip = 54 which is correct. The message we want to skip
is 54 bytes long. However:
memmove(filter_next, eol + 1, (got - skip) +1);
which is
memmove(filter_next, eol + 1, 5)
is obviously wrong as there is only one byte we can access, not 5!
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/qemu/qemu_process.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 05cbda2..f85afd5 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1667,7 +1667,7 @@ qemuProcessReadLog(qemuDomainLogContextPtr logCtxt, char **msg)
if (virLogProbablyLogMessage(filter_next) ||
STRPREFIX(filter_next, "char device redirected to")) {
size_t skip = (eol + 1) - filter_next;
- memmove(filter_next, eol + 1, (got - skip) + 1);
+ memmove(filter_next, eol + 1, buf + got - eol);
I wonder why do we even deal with this when qemuDomainLogContextRead()
always returns buffer that is zero-terminated, but that's a question for
another day, I guess.
ACK and we could also probably backport to maint branches.
Martin