On 12/17/2014 08:06 AM, Martin Kletzander wrote:
> On Wed, Dec 17, 2014 at 12:00:36AM -0700, Eric Blake wrote:
>> On 12/16/2014 11:51 PM, Eric Blake wrote:
>>> On 12/15/2014 12:58 AM, Martin Kletzander wrote:
>>>> Instead of setting the value of cpuset.mems once when the domain starts
>>>> and then re-calculating the value every time we need to change the child
>>>> cgroup values, leave the cgroup alone and rather set the child data
>>>> every time there is new cgroup created. We don't leave any task in the
>>>> parent group anyway. This will ease both current and future code.
>>>>
>>>> Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
>>>> ---
>>>> src/qemu/qemu_cgroup.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++--
>>>> src/qemu/qemu_driver.c | 59 +++++++++++++++-----------------------------
>>>> 2 files changed, 85 insertions(+), 41 deletions(-)
>>>
>>> This patch causes libvirtd to segfault on startup:
>>
>> More particularly, I'm doing an upgrade from older libvirtd to this
>> version, while leaving a transient domain running that was started by
>> the older libvirtd. Hope that helps you narrow in on the problem.

Weird. When I made the report, I ran 'git bisect' and reliably
reproduced the crash across multiple libvirtd restarts (one restart for
each new build while narrowing down the culprit commit), as long as
I left that transient domain running.

>>
> I tried that and it works for me. And I tried various domains, both
> heavily cgroup dependent and simple ones.

But now that I've rebooted and done incremental builds from both
before and after the patch, I can't reproduce the crash. Although I
created the transient domain the same way yesterday and today, the
version of libvirtd running when I created it may have differed, and
that difference may have affected the domain XML that libvirtd reads
back on each restart.

>> Reverting 86759e and af2a1f0 was sufficient to get me going again
>> locally, but I'm not going to push my reversions until you've first had
>> a chance to address the regression.
>>
>>> #2 0x00007ffff7405673 in virCgroupHasEmptyTasks (cgroup=0x0, controller=2)
>>>     at util/vircgroup.c:3935
>
> From this line it looks like priv->cgroup was not initialized. I did
> not add a check for that, so that may be the cause. I'll send a patch
> soon.
>
> But I wonder how did you manage to do that, is that a session libvirtd
> you restarted? Otherwise how come virCgroupNewDetectMachine() didn't
> fill the cgroup?

I don't see anything obvious explaining why it wasn't initialized when
I reproduced the crash, nor why I can't reproduce it now. Hopefully
it's not a lurking time bomb.

If it happens again, I'll definitely report it. For now, without a
reliable reproducer, it could easily have been something on my end:
this is my dev machine, so a half-baked patch I was testing when I
created the transient domain may have caused it, and a build with only
upstream patches wouldn't hit the issue. So don't worry too hard if you
can't find it either.
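
For illustration only, here is a minimal C sketch of the kind of NULL
check Martin mentions. Every type and function in it is a hypothetical
stand-in, not libvirt's virCgroup API and not his actual fix. It shows a
caller refusing to pass a NULL cgroup into a helper that dereferences its
argument unconditionally, which is what the cgroup=0x0 frame in the
backtrace suggests happened.

```c
#include <stddef.h>

/* Hypothetical cgroup handle; NOT libvirt's virCgroup. */
struct cgroup {
    int ntasks; /* number of tasks currently attached to this group */
};

/* Hypothetical per-domain private data. The cgroup pointer stays NULL
 * if detection on daemon restart never filled it in. */
struct domain_priv {
    struct cgroup *cgroup;
};

/* Hypothetical helper: assumes a non-NULL handle and dereferences it
 * unconditionally, so passing NULL here would segfault. */
static int cgroup_has_empty_tasks(const struct cgroup *cg)
{
    return cg->ntasks == 0;
}

/* Hypothetical caller-side guard. Returns 1 if the domain's cgroup
 * exists and has tasks, 0 if it is empty or was never set up. The NULL
 * check runs first so the helper never sees a NULL handle. */
static int domain_cgroup_has_tasks(const struct domain_priv *priv)
{
    if (priv->cgroup == NULL)
        return 0; /* no cgroup: treat as nothing to do */
    return !cgroup_has_empty_tasks(priv->cgroup);
}
```

With a domain_priv whose cgroup points at a group with three tasks,
domain_cgroup_has_tasks() returns 1; with cgroup left NULL it returns 0
instead of crashing. Whether a missing cgroup should mean "nothing to
do" or should be reported as an error is a judgment call for the real
patch; this sketch just picks the former.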
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org