[libvirt] [PATCH 0/2] Fix log related crasher

*** BLURB HERE *** Michal Privoznik (2): qemuProcessReadLog: Fix memcpy arguments virLogManagerDomainReadLogFile: Don't do dummy allocs src/logging/log_manager.c | 4 +--- src/qemu/qemu_process.c | 2 +- 2 files changed, 2 insertions(+), 4 deletions(-) -- 2.4.10

So I can observe this crasher that with freshly started daemon (and virtlogd enabled) I am trying to startup a domain that immediately dies (because it's said to use huge pages but I haven't allocated a single one in the pool). But what's interesting is that the daemon crashes too. Hardly reproducible with -O0 or under valgrind. But I just got lucky: ==20469== Invalid write of size 8 ==20469== at 0x4C2E99B: memcpy@GLIBC_2.2.5 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20469== by 0x217EDD07: qemuProcessReadLog (qemu_process.c:1670) ==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955) ==20469== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152) ==20469== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396) ==20469== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450) ==20469== by 0x21846845: qemuDomainCreate (qemu_driver.c:7468) ==20469== by 0x5611CD0: virDomainCreate (libvirt-domain.c:6753) ==20469== by 0x125D9A: remoteDispatchDomainCreate (remote_dispatch.h:3613) ==20469== by 0x125CB7: remoteDispatchDomainCreateHelper (remote_dispatch.h:3589) ==20469== Address 0x27a52ad0 is 0 bytes after a block of size 5,584 alloc'd ==20469== at 0x4C29F80: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20469== by 0x9B8D1DB: xdr_string (in /lib64/libc-2.21.so) ==20469== by 0x563B39C: xdr_virLogManagerProtocolNonNullString (log_protocol.c:24) ==20469== by 0x563B6B7: xdr_virLogManagerProtocolDomainReadLogFileRet (log_protocol.c:123) ==20469== by 0x164B34: virNetMessageDecodePayload (virnetmessage.c:407) ==20469== by 0x5682360: virNetClientProgramCall (virnetclientprogram.c:379) ==20469== by 0x563B30E: virLogManagerDomainReadLogFile (log_manager.c:272) ==20469== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485) ==20469== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660) ==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955) This points to memmove() in qemuProcessReadLog(). And in fact after few moments and a couple of papers (to debug the code) I can observe the bug. Imagine we just read the following string from qemu: "abc\n2016-01-18T09:40:44.022744Z qemu-system-x86_64: Error\n" After the first pass of the while() loop in the qemuProcessReadLog() @buf still points to the beginning of the string, @filter_next points to the beginning of the second line. So we start second iteration because there is yet another newline character at the end. In this iteration @eol points to it actually. Now, the control gets inside true branch of if(). Just to remind you: got = 58 filter_next = buf + 5, eol = buf + 58. Therefore skip = 54 which is correct. The message we want to skip is 54 bytes long. However: memmove(filter_next, eol + 1, (got - skip) +1); which is memmove(filter_next, eol + 1, 5) is obviously wrong as there is only one byte we can access, not 5! Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 05cbda2..f85afd5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1667,7 +1667,7 @@ qemuProcessReadLog(qemuDomainLogContextPtr logCtxt, char **msg) if (virLogProbablyLogMessage(filter_next) || STRPREFIX(filter_next, "char device redirected to")) { size_t skip = (eol + 1) - filter_next; - memmove(filter_next, eol + 1, (got - skip) + 1); + memmove(filter_next, eol + 1, buf + got - eol); got -= skip; } else { filter_next = eol + 1; -- 2.4.10

s/memcpy/memmove/ in the summary On Mon, Jan 18, 2016 at 11:39:34AM +0100, Michal Privoznik wrote:
So I can observe this crasher that with freshly started daemon (and virtlogd enabled) I am trying to startup a domain that immediately dies (because it's said to use huge pages but I haven't allocated a single one in the pool).
But what's interesting is that the daemon crashes too.
Redundant sentence.
Hardly reproducible with -O0 or under valgrind.
But I just got lucky:
Also redundant.
==20469== Invalid write of size 8 ==20469== at 0x4C2E99B: memcpy@GLIBC_2.2.5 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20469== by 0x217EDD07: qemuProcessReadLog (qemu_process.c:1670) ==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955) ==20469== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152) ==20469== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396) ==20469== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450) ==20469== by 0x21846845: qemuDomainCreate (qemu_driver.c:7468) ==20469== by 0x5611CD0: virDomainCreate (libvirt-domain.c:6753) ==20469== by 0x125D9A: remoteDispatchDomainCreate (remote_dispatch.h:3613) ==20469== by 0x125CB7: remoteDispatchDomainCreateHelper (remote_dispatch.h:3589) ==20469== Address 0x27a52ad0 is 0 bytes after a block of size 5,584 alloc'd ==20469== at 0x4C29F80: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20469== by 0x9B8D1DB: xdr_string (in /lib64/libc-2.21.so) ==20469== by 0x563B39C: xdr_virLogManagerProtocolNonNullString (log_protocol.c:24) ==20469== by 0x563B6B7: xdr_virLogManagerProtocolDomainReadLogFileRet (log_protocol.c:123) ==20469== by 0x164B34: virNetMessageDecodePayload (virnetmessage.c:407) ==20469== by 0x5682360: virNetClientProgramCall (virnetclientprogram.c:379) ==20469== by 0x563B30E: virLogManagerDomainReadLogFile (log_manager.c:272) ==20469== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485) ==20469== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660) ==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955)
The backtrace matches this report by Luyao: https://www.redhat.com/archives/libvir-list/2015-December/msg00329.html
This points to memmove() in qemuProcessReadLog().
And in fact after few moments and a couple of papers (to debug the code) I can observe the bug.
This can be dropped too.
Imagine we just read the following string from qemu:
"abc\n2016-01-18T09:40:44.022744Z qemu-system-x86_64: Error\n"
After the first pass of the while() loop in the qemuProcessReadLog() @buf still points to the beginning of the string, @filter_next points to the beginning of the second line. So we start second iteration because there is yet another newline character at the end. In this iteration @eol points to it actually. Now, the control gets inside true branch of if(). Just to remind you:
got = 58 filter_next = buf + 5, eol = buf + 58.
Therefore skip = 54 which is correct. The message we want to skip is 54 bytes long. However:
memmove(filter_next, eol + 1, (got - skip) +1);
which is
memmove(filter_next, eol + 1, 5)
is obviously wrong as there is only one byte we can access, not 5!
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Introduced by v1.2.21-150-g69b0992
--- src/qemu/qemu_process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
ACK if you strip your life story from the commit message. Jan
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 05cbda2..f85afd5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1667,7 +1667,7 @@ qemuProcessReadLog(qemuDomainLogContextPtr logCtxt, char **msg) if (virLogProbablyLogMessage(filter_next) || STRPREFIX(filter_next, "char device redirected to")) { size_t skip = (eol + 1) - filter_next; - memmove(filter_next, eol + 1, (got - skip) + 1); + memmove(filter_next, eol + 1, buf + got - eol); got -= skip; } else { filter_next = eol + 1; -- 2.4.10
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Mon, Jan 18, 2016 at 11:39:34AM +0100, Michal Privoznik wrote:
So I can observe this crasher that with freshly started daemon (and virtlogd enabled) I am trying to startup a domain that immediately dies (because it's said to use huge pages but I haven't allocated a single one in the pool). But what's interesting is that the daemon crashes too. Hardly reproducible with -O0 or under valgrind. But I just got lucky:
==20469== Invalid write of size 8 ==20469== at 0x4C2E99B: memcpy@GLIBC_2.2.5 (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20469== by 0x217EDD07: qemuProcessReadLog (qemu_process.c:1670) ==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955) ==20469== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152) ==20469== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396) ==20469== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450) ==20469== by 0x21846845: qemuDomainCreate (qemu_driver.c:7468) ==20469== by 0x5611CD0: virDomainCreate (libvirt-domain.c:6753) ==20469== by 0x125D9A: remoteDispatchDomainCreate (remote_dispatch.h:3613) ==20469== by 0x125CB7: remoteDispatchDomainCreateHelper (remote_dispatch.h:3589) ==20469== Address 0x27a52ad0 is 0 bytes after a block of size 5,584 alloc'd ==20469== at 0x4C29F80: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20469== by 0x9B8D1DB: xdr_string (in /lib64/libc-2.21.so) ==20469== by 0x563B39C: xdr_virLogManagerProtocolNonNullString (log_protocol.c:24) ==20469== by 0x563B6B7: xdr_virLogManagerProtocolDomainReadLogFileRet (log_protocol.c:123) ==20469== by 0x164B34: virNetMessageDecodePayload (virnetmessage.c:407) ==20469== by 0x5682360: virNetClientProgramCall (virnetclientprogram.c:379) ==20469== by 0x563B30E: virLogManagerDomainReadLogFile (log_manager.c:272) ==20469== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485) ==20469== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660) ==20469== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==20469== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==20469== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955)
This points to memmove() in qemuProcessReadLog(). And in fact after few moments and a couple of papers (to debug the code) I can observe the bug. Imagine we just read the following string from qemu:
"abc\n2016-01-18T09:40:44.022744Z qemu-system-x86_64: Error\n"
After the first pass of the while() loop in the qemuProcessReadLog() @buf still points to the beginning of the string, @filter_next points to the beginning of the second line. So we start second iteration because there is yet another newline character at the end. In this iteration @eol points to it actually. Now, the control gets inside true branch of if(). Just to remind you:
got = 58 filter_next = buf + 5, eol = buf + 58.
So several weeks ago I spent non-negligible amount of hours on this, with papers written all over and everything. I couldn't find a problem in it, but I didn't consider the fact that in first iteration we can go with the if condition being false, hence taking the else branch which forgets to update 'got'. I always set got -= skip. Good catch, but please mention that in the first iteration we take the false branch, thanks.
Therefore skip = 54 which is correct. The message we want to skip is 54 bytes long. However:
memmove(filter_next, eol + 1, (got - skip) +1);
which is
memmove(filter_next, eol + 1, 5)
is obviously wrong as there is only one byte we can access, not 5!
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 05cbda2..f85afd5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1667,7 +1667,7 @@ qemuProcessReadLog(qemuDomainLogContextPtr logCtxt, char **msg) if (virLogProbablyLogMessage(filter_next) || STRPREFIX(filter_next, "char device redirected to")) { size_t skip = (eol + 1) - filter_next; - memmove(filter_next, eol + 1, (got - skip) + 1); + memmove(filter_next, eol + 1, buf + got - eol);
I wonder why do we even deal with this when qemuDomainLogContextRead() always returns buffer that is zero-terminated, but that's a question for another day, I guess. ACK and we could also probably backport to maint branches. Martin

Since we pass a dummy variables @fdout and @fdoutlen into virNetClientProgramCall() we make it alloc @fdout array (even though it's array of 0 elements since vitlogd can hardly pass us some FDs at this stage). Nevertheless, it's an allocation not followed by free(): ==29385== 0 bytes in 60 blocks are definitely lost in loss record 2 of 1,009 ==29385== at 0x4C2C070: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==29385== by 0x54B99EF: virAllocN (viralloc.c:191) ==29385== by 0x56821B1: virNetClientProgramCall (virnetclientprogram.c:359) ==29385== by 0x563B304: virLogManagerDomainReadLogFile (log_manager.c:272) ==29385== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485) ==29385== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660) ==29385== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==29385== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==29385== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955) ==29385== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152) ==29385== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396) ==29385== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450) Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/logging/log_manager.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/logging/log_manager.c b/src/logging/log_manager.c index 290e2c8..a9c9e72 100644 --- a/src/logging/log_manager.c +++ b/src/logging/log_manager.c @@ -256,8 +256,6 @@ virLogManagerDomainReadLogFile(virLogManagerPtr mgr, { struct virLogManagerProtocolDomainReadLogFileArgs args; struct virLogManagerProtocolDomainReadLogFileRet ret; - int *fdout = NULL; - size_t fdoutlen = 0; char *rv = NULL; memset(&args, 0, sizeof(args)); @@ -273,7 +271,7 @@ virLogManagerDomainReadLogFile(virLogManagerPtr mgr, mgr->client, mgr->serial++, VIR_LOG_MANAGER_PROTOCOL_PROC_DOMAIN_READ_LOG_FILE, - 0, NULL, &fdoutlen, &fdout, + 0, NULL, 0, NULL, (xdrproc_t)xdr_virLogManagerProtocolDomainReadLogFileArgs, &args, (xdrproc_t)xdr_virLogManagerProtocolDomainReadLogFileRet, &ret) < 0) goto cleanup; -- 2.4.10

On Mon, Jan 18, 2016 at 11:39:35AM +0100, Michal Privoznik wrote:
Since we pass a dummy variables @fdout and @fdoutlen into virNetClientProgramCall() we make it alloc @fdout array (even though it's array of 0 elements since vitlogd can hardly pass us some FDs at this stage). Nevertheless, it's an allocation not followed by free():
==29385== 0 bytes in 60 blocks are definitely lost in loss record 2 of 1,009 ==29385== at 0x4C2C070: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==29385== by 0x54B99EF: virAllocN (viralloc.c:191) ==29385== by 0x56821B1: virNetClientProgramCall (virnetclientprogram.c:359) ==29385== by 0x563B304: virLogManagerDomainReadLogFile (log_manager.c:272) ==29385== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485) ==29385== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660) ==29385== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==29385== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==29385== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955) ==29385== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152) ==29385== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396) ==29385== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/logging/log_manager.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
ACK Jan

On Mon, Jan 18, 2016 at 11:39:35AM +0100, Michal Privoznik wrote:
Since we pass a dummy variables @fdout and @fdoutlen into
s/a //
virNetClientProgramCall() we make it alloc @fdout array (even though it's array of 0 elements since vitlogd can hardly pass us
s/array/an array/ And moreover, they are named fdout and fdoutlen, but we were passing them as ninfds and infds. But I don't think we need to describe all of that, it's just the reason why they were allocated.
some FDs at this stage). Nevertheless, it's an allocation not followed by free():
==29385== 0 bytes in 60 blocks are definitely lost in loss record 2 of 1,009 ==29385== at 0x4C2C070: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==29385== by 0x54B99EF: virAllocN (viralloc.c:191) ==29385== by 0x56821B1: virNetClientProgramCall (virnetclientprogram.c:359) ==29385== by 0x563B304: virLogManagerDomainReadLogFile (log_manager.c:272) ==29385== by 0x217CD613: qemuDomainLogContextRead (qemu_domain.c:2485) ==29385== by 0x217EDC76: qemuProcessReadLog (qemu_process.c:1660) ==29385== by 0x217EDE1D: qemuProcessReportLogError (qemu_process.c:1696) ==29385== by 0x217EE8C1: qemuProcessWaitForMonitor (qemu_process.c:1957) ==29385== by 0x217F6636: qemuProcessLaunch (qemu_process.c:4955) ==29385== by 0x217F71A4: qemuProcessStart (qemu_process.c:5152) ==29385== by 0x21846582: qemuDomainObjStart (qemu_driver.c:7396) ==29385== by 0x218467DE: qemuDomainCreateWithFlags (qemu_driver.c:7450)
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/logging/log_manager.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/src/logging/log_manager.c b/src/logging/log_manager.c index 290e2c8..a9c9e72 100644 --- a/src/logging/log_manager.c +++ b/src/logging/log_manager.c @@ -256,8 +256,6 @@ virLogManagerDomainReadLogFile(virLogManagerPtr mgr, { struct virLogManagerProtocolDomainReadLogFileArgs args; struct virLogManagerProtocolDomainReadLogFileRet ret; - int *fdout = NULL; - size_t fdoutlen = 0; char *rv = NULL;
memset(&args, 0, sizeof(args)); @@ -273,7 +271,7 @@ virLogManagerDomainReadLogFile(virLogManagerPtr mgr, mgr->client, mgr->serial++, VIR_LOG_MANAGER_PROTOCOL_PROC_DOMAIN_READ_LOG_FILE, - 0, NULL, &fdoutlen, &fdout, + 0, NULL, 0, NULL,
Anyway, the 7th parameter is a pointer to size_t, so it should rather be NULL, hence making this whole line a "0, NULL, NULL, NULL,". ACK with that changed.
(xdrproc_t)xdr_virLogManagerProtocolDomainReadLogFileArgs, &args, (xdrproc_t)xdr_virLogManagerProtocolDomainReadLogFileRet, &ret) < 0) goto cleanup; -- 2.4.10
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
participants (3)
-
Ján Tomko
-
Martin Kletzander
-
Michal Privoznik