[libvirt] heisenbug in command.c

Hi,
It seems I've run into quite the heisenbug, reported at https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/922628
It manifests itself as virPidWait returning status=4 for iptables (which should never exit with status=4).
But it's only been seen on two (very different) machines, and the slightest shifting of the winds makes it go away. Given how sneaky this bug appears to be, there's a slight temptation to have iptablesAddRemoveRule pass in a int* for status and better deal with the -EINTR. But I fear that might be papering over a worse race.
Does anyone have any ideas on this? Has anyone else ever seen this?
-serge

On 03/16/2012 10:36 AM, Serge Hallyn wrote:
> Hi,
> It seems I've run into quite the heisenbug, reported at https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/922628
> It manifests itself as virPidWait returning status=4 for iptables (which should never exit with status=4).
Maybe iptables isn't documented as exiting with $? of 4, but that's what is happening. The libvirt code in question is quite clear that it grabbed an accurate exit status from the child process.
    ret = virPidWait(cmd->pid, exitstatus ? exitstatus : &status);
    if (ret == 0) {
        cmd->pid = -1;
        cmd->reap = false;
        if (status) {
            char *str = virCommandToString(cmd);
            char *st = virCommandTranslateStatus(status);
            virCommandError(VIR_ERR_INTERNAL_ERROR,
                            _("Child process (%s) status unexpected: %s"),
                            str ? str : cmd->args[0], NULLSTR(st));
> But it's only been seen on two (very different) machines, and the slightest shifting of the winds makes it go away. Given how sneaky this bug appears to be, there's a slight temptation to have iptablesAddRemoveRule pass in a int* for status and better deal with the -EINTR. But I fear that might be papering over a worse race.
I don't follow how you think there is a -EINTR being encountered in libvirt.
I think you'd be better off investigating why iptables really is exiting with status 4.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
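To illustrate the distinction Eric is drawing, here is a minimal sketch of where EINTR can and cannot appear when waiting on a child. It is not libvirt's virPidWait; the helper name run_and_get_exit_code is made up for this example. EINTR is an errno value set when waitpid() itself returns -1, while the child's exit code is carried separately inside the status word.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not libvirt code): run argv[0] and return the
 * child's exit code, or -1 if it could not be run or died from a signal. */
static int run_and_get_exit_code(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* exec failed; conventional shell code */
    }

    /* EINTR can only show up here, as errno after waitpid() returns -1
     * because a signal interrupted the call. It is never stored in
     * 'status', so simply retry the wait. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);  /* the value the child passed to exit() */
    return -1;                       /* terminated by a signal */
}
```

With this shape, a result of 4 can only mean the child itself exited with code 4; an interrupted wait is retried and never reaches the caller as a status.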

On 03/16/2012 11:50 AM, Eric Blake wrote:
> On 03/16/2012 10:36 AM, Serge Hallyn wrote:
>> Hi,
>> It seems I've run into quite the heisenbug, reported at https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/922628
>> It manifests itself as virPidWait returning status=4 for iptables (which should never exit with status=4).
> Maybe iptables isn't documented as exiting with $? of 4, but that's what is happening. The libvirt code in question is quite clear that it grabbed an accurate exit status from the child process.
Well, yes. I figured that either (1) iptables actually got -EINTR from the kernel and passed that along as its exit code, or (2) something went wrong with memory being overwritten in libvirt, however unlikely. Stranger things have happened. If (1), I was wondering if it was being ignored on purpose.
>     ret = virPidWait(cmd->pid, exitstatus ? exitstatus : &status);
>     if (ret == 0) {
>         cmd->pid = -1;
>         cmd->reap = false;
>         if (status) {
>             char *str = virCommandToString(cmd);
>             char *st = virCommandTranslateStatus(status);
>             virCommandError(VIR_ERR_INTERNAL_ERROR,
>                             _("Child process (%s) status unexpected: %s"),
>                             str ? str : cmd->args[0], NULLSTR(st));
>> But it's only been seen on two (very different) machines, and the slightest shifting of the winds makes it go away. Given how sneaky this bug appears to be, there's a slight temptation to have iptablesAddRemoveRule pass in a int* for status and better deal with the -EINTR. But I fear that might be papering over a worse race.
> I don't follow how you think there is a -EINTR being encountered in libvirt.
Yeah I don't really either.
> I think you'd be better off investigating why iptables really is exiting with status 4.
Well, given what EINTR means, shouldn't src/util/iptables.c re-try the command if it gets that?
Anyway I'll keep digging, but was wondering if anyone else has seen this.
-serge

Serge Hallyn wrote:
> On 03/16/2012 11:50 AM, Eric Blake wrote:
>> On 03/16/2012 10:36 AM, Serge Hallyn wrote:
>>> Hi,
>>> It seems I've run into quite the heisenbug, reported at https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/922628
>>> It manifests itself as virPidWait returning status=4 for iptables (which should never exit with status=4).
>> Maybe iptables isn't documented as exiting with $? of 4, but that's what is happening. The libvirt code in question is quite clear that it grabbed an accurate exit status from the child process.
> Well, yes. I figured that either (1) iptables actually got -EINTR from the kernel and passed that along as its exit code, or (2) something went wrong with memory being overwritten in libvirt, however unlikely. Stranger things have happened. If (1), I was wondering if it was being ignored on purpose.
Why do you bring up EINTR at all? Just because EINTR is 4?
That seems very much unrelated.
This is from iptables:
    enum xtables_exittype {
        OTHER_PROBLEM = 1,
        PARAMETER_PROBLEM,
        VERSION_PROBLEM,
        RESOURCE_PROBLEM,
        XTF_ONLY_ONCE,
        XTF_NO_INVERT,
        XTF_BAD_VALUE,
        XTF_ONE_ACTION,
    };
So it looks like iptables is returning RESOURCE_PROBLEM (which could explain why it's intermittent).
-jim
>>     ret = virPidWait(cmd->pid, exitstatus ? exitstatus : &status);
>>     if (ret == 0) {
>>         cmd->pid = -1;
>>         cmd->reap = false;
>>         if (status) {
>>             char *str = virCommandToString(cmd);
>>             char *st = virCommandTranslateStatus(status);
>>             virCommandError(VIR_ERR_INTERNAL_ERROR,
>>                             _("Child process (%s) status unexpected: %s"),
>>                             str ? str : cmd->args[0], NULLSTR(st));
>>> But it's only been seen on two (very different) machines, and the slightest shifting of the winds makes it go away. Given how sneaky this bug appears to be, there's a slight temptation to have iptablesAddRemoveRule pass in a int* for status and better deal with the -EINTR. But I fear that might be papering over a worse race.
>> I don't follow how you think there is a -EINTR being encountered in libvirt.
> Yeah I don't really either.
>> I think you'd be better off investigating why iptables really is exiting with status 4.
> Well, given what EINTR means, shouldn't src/util/iptables.c re-try the command if it gets that?
> Anyway I'll keep digging, but was wondering if anyone else has seen this.
> -serge
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
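Since the enum Jim quotes starts at 1 and counts up implicitly, the numeric values are easy to misread. The sketch below reproduces the enum as quoted in this thread and adds a lookup helper, xt_exit_name, which is a made-up name for illustration and not part of iptables or libvirt. It only restates the quoted values; it makes no claim about which conditions cause iptables to return each one.

```c
#include <stddef.h>

/* Exit codes exactly as quoted from iptables in this thread. */
enum xtables_exittype {
    OTHER_PROBLEM = 1,
    PARAMETER_PROBLEM,      /* 2 */
    VERSION_PROBLEM,        /* 3 */
    RESOURCE_PROBLEM,       /* 4: the status seen in the bug report */
    XTF_ONLY_ONCE,          /* 5 */
    XTF_NO_INVERT,          /* 6 */
    XTF_BAD_VALUE,          /* 7 */
    XTF_ONE_ACTION,         /* 8 */
};

/* Hypothetical helper: name an exit code, or return NULL if unknown. */
static const char *xt_exit_name(int code)
{
    switch (code) {
    case 0:                 return "success";
    case OTHER_PROBLEM:     return "OTHER_PROBLEM";
    case PARAMETER_PROBLEM: return "PARAMETER_PROBLEM";
    case VERSION_PROBLEM:   return "VERSION_PROBLEM";
    case RESOURCE_PROBLEM:  return "RESOURCE_PROBLEM";
    case XTF_ONLY_ONCE:     return "XTF_ONLY_ONCE";
    case XTF_NO_INVERT:     return "XTF_NO_INVERT";
    case XTF_BAD_VALUE:     return "XTF_BAD_VALUE";
    case XTF_ONE_ACTION:    return "XTF_ONE_ACTION";
    default:                return NULL;
    }
}
```

That EINTR also happens to be 4 on Linux is a coincidence of two unrelated numbering schemes: one is an errno value, the other a process exit code.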

On 03/16/2012 03:52 PM, Jim Paris wrote:
> Serge Hallyn wrote:
>> On 03/16/2012 11:50 AM, Eric Blake wrote:
>>> On 03/16/2012 10:36 AM, Serge Hallyn wrote:
>>>> Hi,
>>>> It seems I've run into quite the heisenbug, reported at https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/922628
>>>> It manifests itself as virPidWait returning status=4 for iptables (which should never exit with status=4).
>>> Maybe iptables isn't documented as exiting with $? of 4, but that's what is happening. The libvirt code in question is quite clear that it grabbed an accurate exit status from the child process.
>> Well, yes. I figured that either (1) iptables actually got -EINTR from the kernel and passed that along as its exit code, or (2) something went wrong with memory being overwritten in libvirt, however unlikely. Stranger things have happened. If (1), I was wondering if it was being ignored on purpose.
> Why do you bring up EINTR at all? Just because EINTR is 4?
Yup.
> That seems very much unrelated.
I didn't think so, but looks like you're right :)
> This is from iptables:
>     enum xtables_exittype {
>         OTHER_PROBLEM = 1,
>         PARAMETER_PROBLEM,
>         VERSION_PROBLEM,
>         RESOURCE_PROBLEM,
>         XTF_ONLY_ONCE,
>         XTF_NO_INVERT,
>         XTF_BAD_VALUE,
>         XTF_ONE_ACTION,
>     };
> So it looks like iptables is returning RESOURCE_PROBLEM (which could explain why it's intermittent).
That makes a lot more sense then, yes. When scripting a bunch of parallel iptables rules adds (followed by waits), I do once in awhile get
    iptables: Resource temporarily unavailable.
(and sometimes a -EINVAL message)
though 'wait' still says status was 0. I've never gotten 4. The next run then succeeds.
This still however sounds like src/util/iptables.c might ought to re-run the command if it gets 4.
thanks,
-serge
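As a rough sketch of the re-run idea, and not a patch against src/util/iptables.c, a bounded retry on exit code 4 could look like the following. Everything here is hypothetical: the callback type attempt_fn, the function retry_on_resource_problem, and the stand-in flaky_attempt that simulates a command failing a few times before succeeding. Whether retrying is the right fix, rather than serializing the callers, is a separate question.

```c
#include <stddef.h>
#include <time.h>

#define XT_RESOURCE_PROBLEM 4   /* RESOURCE_PROBLEM in the quoted iptables enum */

/* Hypothetical callback: run the command once and return its exit code. */
typedef int (*attempt_fn)(void *ctx);

/* Retry only while the command reports RESOURCE_PROBLEM, up to
 * max_attempts tries, sleeping delay_ms between tries. Returns the exit
 * code of the last attempt, or -1 if max_attempts is not positive. */
static int retry_on_resource_problem(attempt_fn attempt, void *ctx,
                                     int max_attempts, long delay_ms)
{
    int code = -1;

    for (int i = 0; i < max_attempts; i++) {
        code = attempt(ctx);
        if (code != XT_RESOURCE_PROBLEM)
            break;              /* success, or a failure not worth retrying */
        if (i + 1 < max_attempts && delay_ms > 0) {
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (delay_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);
        }
    }
    return code;
}

/* Stand-in for "run iptables": fails with 4 until the counter hits 0. */
static int flaky_attempt(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return XT_RESOURCE_PROBLEM;
    }
    return 0;
}
```

Keeping the attempt behind a callback means the retry policy can be exercised without actually invoking iptables; any other non-zero code is returned immediately so genuine errors are not masked.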

On Fri, Mar 16, 2012 at 04:42:42PM -0500, Serge Hallyn wrote:
> On 03/16/2012 03:52 PM, Jim Paris wrote:
>> Serge Hallyn wrote:
>>> On 03/16/2012 11:50 AM, Eric Blake wrote:
>>>> On 03/16/2012 10:36 AM, Serge Hallyn wrote:
>>>>> Hi,
>>>>> It seems I've run into quite the heisenbug, reported at https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/922628
>>>>> It manifests itself as virPidWait returning status=4 for iptables (which should never exit with status=4).
>>>> Maybe iptables isn't documented as exiting with $? of 4, but that's what is happening. The libvirt code in question is quite clear that it grabbed an accurate exit status from the child process.
>>> Well, yes. I figured that either (1) iptables actually got -EINTR from the kernel and passed that along as its exit code, or (2) something went wrong with memory being overwritten in libvirt, however unlikely. Stranger things have happened. If (1), I was wondering if it was being ignored on purpose.
>> Why do you bring up EINTR at all? Just because EINTR is 4?
> Yup.
>> That seems very much unrelated.
> I didn't think so, but looks like you're right :)
>> This is from iptables:
>>     enum xtables_exittype {
>>         OTHER_PROBLEM = 1,
>>         PARAMETER_PROBLEM,
>>         VERSION_PROBLEM,
>>         RESOURCE_PROBLEM,
>>         XTF_ONLY_ONCE,
>>         XTF_NO_INVERT,
>>         XTF_BAD_VALUE,
>>         XTF_ONE_ACTION,
>>     };
>> So it looks like iptables is returning RESOURCE_PROBLEM (which could explain why it's intermittent).
> That makes a lot more sense then, yes. When scripting a bunch of parallel iptables rules adds (followed by waits), I do once in awhile get
>     iptables: Resource temporarily unavailable.
> (and sometimes a -EINVAL message)
> though 'wait' still says status was 0. I've never gotten 4. The next run then succeeds.
> This still however sounds like src/util/iptables.c might ought to re-run the command if it gets 4.
Or do we need some serialization of commands at a higher level perhaps?
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

participants (4): Daniel P. Berrange, Eric Blake, Jim Paris, Serge Hallyn