[libvirt] [PATCH] lxc: Don't pass a local variable address randomly

So, recently I was testing the LXC driver. You know, startup some domains. But to my surprise, I was not able to start a single one: virsh # start --console test error: Reconnected to the hypervisor error: Failed to start domain test error: internal error: guest failed to start: unexpected exit status 125 So I've start digging. It turns out, that in virExec(), when I printed out the @cmd, I got strange values: *(cmd->outfdptr) was certainly not valid FD number: it has random value of several millions. This obviously made prepareStdFd(childout, STDOUT_FILENO) fail (line 611). But outfdptr is set in virCommandSetOutputFD(). The only place within LXC driver where the function is called is in virLXCProcessBuildControllerCmd(). If you take a closer look at the function it looks like this: static virCommandPtr virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, .. int logfd, const char *pidfile) { ... virCommandSetOutputFD(cmd, &logfd); virCommandSetErrorFD(cmd, &logfd); ... } Yes, you guessed it. @logfd is passed into the function by value. However, in the function we try to get its address (an address of a local variable) which is no longer valid once function is finished and stack is cleaned. Therefore when cmd->outfdptr is evaluated at any point after this function, we may get a random number, depending on what's currently on the stack. Of course, this may work sometimes too - it depends on the compiler how it arranges the code, when the stack is wiped out. In order to fix this, lets pass a pointer to @logfd instead of figuring out (wrong) its value in a function. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- I'd love to target this into the current release - in my case LXC driver is unusable without this patch. There might be other users affected too. src/lxc/lxc_process.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 6c23a0b..2bdce3b 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -750,7 +750,7 @@ virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, int *files, size_t nfiles, int handshakefd, - int logfd, + int * const logfd, const char *pidfile) { size_t i; @@ -820,8 +820,8 @@ virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, virCommandPassFD(cmd, handshakefd, 0); virCommandDaemonize(cmd); virCommandSetPidFile(cmd, pidfile); - virCommandSetOutputFD(cmd, &logfd); - virCommandSetErrorFD(cmd, &logfd); + virCommandSetOutputFD(cmd, logfd); + virCommandSetErrorFD(cmd, logfd); /* So we can pause before exec'ing the controller to * write the live domain status XML with the PID */ virCommandRequireHandshake(cmd); @@ -1208,7 +1208,7 @@ int virLXCProcessStart(virConnectPtr conn, ttyFDs, nttyFDs, files, nfiles, handshakefds[1], - logfd, + &logfd, pidfile))) goto cleanup; -- 2.3.6

On Wed, Jul 01, 2015 at 05:36:06PM +0200, Michal Privoznik wrote:
So, recently I was testing the LXC driver. You know, startup some domains. But to my surprise, I was not able to start a single one:
virsh # start --console test error: Reconnected to the hypervisor error: Failed to start domain test error: internal error: guest failed to start: unexpected exit status 125
So I've start digging. It turns out, that in virExec(), when I printed out the @cmd, I got strange values: *(cmd->outfdptr) was certainly not valid FD number: it has random value of several millions. This obviously made prepareStdFd(childout, STDOUT_FILENO) fail (line 611). But outfdptr is set in virCommandSetOutputFD(). The only place within LXC driver where the function is called is in virLXCProcessBuildControllerCmd(). If you take a closer look at the function it looks like this:
static virCommandPtr virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, .. int logfd, const char *pidfile) { ... virCommandSetOutputFD(cmd, &logfd); virCommandSetErrorFD(cmd, &logfd); ... }
Yes, you guessed it. @logfd is passed into the function by value. However, in the function we try to get its address (an address of a local variable) which is no longer valid once function is finished and stack is cleaned. Therefore when cmd->outfdptr is evaluated at any point after this function, we may get a random number, depending on what's currently on the stack. Of course, this may work sometimes too - it depends on the compiler how it arranges the code, when the stack is wiped out.
In order to fix this, lets pass a pointer to @logfd instead of figuring out (wrong) its value in a function.
Nice catch and it's a tricky bug hard to find. ACK and safe for release. Pavel
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
I'd love to target this into the current release - in my case LXC driver is unusable without this patch. There might be other users affected too.
src/lxc/lxc_process.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c index 6c23a0b..2bdce3b 100644 --- a/src/lxc/lxc_process.c +++ b/src/lxc/lxc_process.c @@ -750,7 +750,7 @@ virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, int *files, size_t nfiles, int handshakefd, - int logfd, + int * const logfd, const char *pidfile) { size_t i; @@ -820,8 +820,8 @@ virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, virCommandPassFD(cmd, handshakefd, 0); virCommandDaemonize(cmd); virCommandSetPidFile(cmd, pidfile); - virCommandSetOutputFD(cmd, &logfd); - virCommandSetErrorFD(cmd, &logfd); + virCommandSetOutputFD(cmd, logfd); + virCommandSetErrorFD(cmd, logfd); /* So we can pause before exec'ing the controller to * write the live domain status XML with the PID */ virCommandRequireHandshake(cmd); @@ -1208,7 +1208,7 @@ int virLXCProcessStart(virConnectPtr conn, ttyFDs, nttyFDs, files, nfiles, handshakefds[1], - logfd, + &logfd, pidfile))) goto cleanup;
-- 2.3.6
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Wed, Jul 01, 2015 at 17:36:06 +0200, Michal Privoznik wrote:
So, recently I was testing the LXC driver. You know, startup some domains. But to my surprise, I was not able to start a single one:
virsh # start --console test error: Reconnected to the hypervisor error: Failed to start domain test error: internal error: guest failed to start: unexpected exit status 125
So I've start digging. It turns out, that in virExec(), when I printed out the @cmd, I got strange values: *(cmd->outfdptr) was certainly not valid FD number: it has random value of several millions. This obviously made prepareStdFd(childout, STDOUT_FILENO) fail (line 611). But outfdptr is set in virCommandSetOutputFD(). The only place within LXC driver where the function is called is in virLXCProcessBuildControllerCmd(). If you take a closer look at the function it looks like this:
static virCommandPtr virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, .. int logfd, const char *pidfile) { ... virCommandSetOutputFD(cmd, &logfd); virCommandSetErrorFD(cmd, &logfd); ... }
Yes, you guessed it. @logfd is passed into the function by value. However, in the function we try to get its address (an address of a local variable) which is no longer valid once function is finished and stack is cleaned. Therefore when cmd->outfdptr is evaluated at any point after this function, we may get a random number, depending on what's currently on the stack. Of course, this may work sometimes too - it depends on the compiler how it arranges the code, when the stack is wiped out.
In order to fix this, lets pass a pointer to @logfd instead of figuring out (wrong) its value in a function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
I'd love to target this into the current release - in my case LXC driver is unusable without this patch. There might be other users affected too.
ACK in freeze. BTW it was caused by commit e1de5521. Peter

On Wed, Jul 01, 2015 at 05:48:32PM +0200, Peter Krempa wrote:
On Wed, Jul 01, 2015 at 17:36:06 +0200, Michal Privoznik wrote:
So, recently I was testing the LXC driver. You know, startup some domains. But to my surprise, I was not able to start a single one:
virsh # start --console test error: Reconnected to the hypervisor error: Failed to start domain test error: internal error: guest failed to start: unexpected exit status 125
So I've start digging. It turns out, that in virExec(), when I printed out the @cmd, I got strange values: *(cmd->outfdptr) was certainly not valid FD number: it has random value of several millions. This obviously made prepareStdFd(childout, STDOUT_FILENO) fail (line 611). But outfdptr is set in virCommandSetOutputFD(). The only place within LXC driver where the function is called is in virLXCProcessBuildControllerCmd(). If you take a closer look at the function it looks like this:
static virCommandPtr virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, .. int logfd, const char *pidfile) { ... virCommandSetOutputFD(cmd, &logfd); virCommandSetErrorFD(cmd, &logfd); ... }
Yes, you guessed it. @logfd is passed into the function by value. However, in the function we try to get its address (an address of a local variable) which is no longer valid once function is finished and stack is cleaned. Therefore when cmd->outfdptr is evaluated at any point after this function, we may get a random number, depending on what's currently on the stack. Of course, this may work sometimes too - it depends on the compiler how it arranges the code, when the stack is wiped out.
In order to fix this, lets pass a pointer to @logfd instead of figuring out (wrong) its value in a function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
I'd love to target this into the current release - in my case LXC driver is unusable without this patch. There might be other users affected too.
ACK in freeze. BTW it was caused by commit e1de5521.
Wow, how did this work so well for so long. We need to apply this to the stable branches from 1.2.13 onwards. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 01.07.2015 18:11, Daniel P. Berrange wrote:
On Wed, Jul 01, 2015 at 05:48:32PM +0200, Peter Krempa wrote:
On Wed, Jul 01, 2015 at 17:36:06 +0200, Michal Privoznik wrote:
So, recently I was testing the LXC driver. You know, startup some domains. But to my surprise, I was not able to start a single one:
virsh # start --console test error: Reconnected to the hypervisor error: Failed to start domain test error: internal error: guest failed to start: unexpected exit status 125
So I've start digging. It turns out, that in virExec(), when I printed out the @cmd, I got strange values: *(cmd->outfdptr) was certainly not valid FD number: it has random value of several millions. This obviously made prepareStdFd(childout, STDOUT_FILENO) fail (line 611). But outfdptr is set in virCommandSetOutputFD(). The only place within LXC driver where the function is called is in virLXCProcessBuildControllerCmd(). If you take a closer look at the function it looks like this:
static virCommandPtr virLXCProcessBuildControllerCmd(virLXCDriverPtr driver, .. int logfd, const char *pidfile) { ... virCommandSetOutputFD(cmd, &logfd); virCommandSetErrorFD(cmd, &logfd); ... }
Yes, you guessed it. @logfd is passed into the function by value. However, in the function we try to get its address (an address of a local variable) which is no longer valid once function is finished and stack is cleaned. Therefore when cmd->outfdptr is evaluated at any point after this function, we may get a random number, depending on what's currently on the stack. Of course, this may work sometimes too - it depends on the compiler how it arranges the code, when the stack is wiped out.
In order to fix this, lets pass a pointer to @logfd instead of figuring out (wrong) its value in a function.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
I'd love to target this into the current release - in my case LXC driver is unusable without this patch. There might be other users affected too.
ACK in freeze. BTW it was caused by commit e1de5521.
Wow, how did this work so well for so long. We need to apply this to the stable branches from 1.2.13 onwards.
Done. Michal
participants (4)
-
Daniel P. Berrange
-
Michal Privoznik
-
Pavel Hrdina
-
Peter Krempa