On 09/13/2018 07:29 AM, Wang Yechao wrote:
numa_nodes_ptr is a global variable in libnuma.so. It is freed after
the main thread exits. If we have many running VMs and restart the
libvirtd service continuously at intervals of a few seconds, the main
thread may exit before the qemuProcessReconnect thread, and a segfault
occurs. Backtrace as follows:
#0  0x00007f40e3d2dd72 in numa_bitmask_isbitset () from /lib64/libnuma.so.1
#1  0x00007f40e4d14c55 in virNumaNodeIsAvailable (node=node@entry=0) at util/virnuma.c:396
#2  0x00007f40e4d16010 in virNumaGetHostMemoryNodeset () at util/virnuma.c:1011
#3  0x00007f40b94ced90 in qemuRestoreCgroupState (vm=0x7f407c39df00, vm=0x7f407c39df00) at qemu/qemu_cgroup.c:877
#4  qemuConnectCgroup (driver=driver@entry=0x7f407c21fe80, vm=0x7f407c39df00) at qemu/qemu_cgroup.c:969
#5  0x00007f40b94eef93 in qemuProcessReconnect (opaque=<optimized out>) at qemu/qemu_process.c:3531
#6  0x00007f40e4d34bd2 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#7  0x00007f40e214ee25 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f40e1e7c36d in clone () from /lib64/libc.so.6
Signed-off-by: Wang Yechao <wang.yechao255(a)zte.com.cn>
---
src/util/virnuma.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Not quite sure what the difference is between the v1 posted here:
https://www.redhat.com/archives/libvir-list/2018-September/msg00504.html
and this v2 posted 1 day later. Please be patient as patches are many
and reviewers are few... Reviewers also have their own work to complete.
If after a week there's been no response, then use reply-all to ping on
the original posting. Assuming you've set up your email client and posted
the patches properly (see https://libvirt.org/hacking.html for 'git
send-email' suggestions), your reply will bump the message and hopefully
someone will have a chance to look. Usually within a week or two you'll
have an answer.
Anyway, as for this particular patch...
I'm wondering if libvirt is the "right place" for this patch. From your
description and trace, I'm thinking this is perhaps more of a libnuma
problem with the numa_bitmask_isbitset API, whose man page says:
int numa_bitmask_isbitset(const struct bitmask *bmp, unsigned int n);
...
numa_bitmask_isbitset() returns the value of a specified bit in a bit
mask. If the n value is greater than the size of the bit map, 0 is returned.
The fact that we pass a global from libnuma and it's NULL is
"problematic", since previous calls didn't find it to be NULL; otherwise,
domain startup would have failed fairly ungracefully. If the "frequent
and continuous" libvirtd restarts are the problem, then that should be
reproducible without libvirtd with some amount of coding effort.
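For what it's worth, here's a rough standalone sketch of such a
reproducer. It only assumes what your commit message describes - that
libnuma frees numa_nodes_ptr from an exit-time destructor when the main
thread exits - and the window is small, so it may take several runs (or
valgrind) to actually catch the use-after-free:

/* Hypothetical reproducer sketch, not libvirt code.  A thread keeps
 * poking numa_nodes_ptr after main() returns and libnuma's exit-time
 * cleanup has freed it.
 * Build with: gcc repro.c -lnuma -lpthread */
#include <numa.h>
#include <pthread.h>
#include <stdio.h>

static void *
worker(void *opaque)
{
    unsigned long hits = 0;
    (void) opaque;

    /* same call that virNumaNodeIsAvailable() makes */
    for (;;)
        hits += numa_bitmask_isbitset(numa_nodes_ptr, 0);

    return NULL;
}

int
main(void)
{
    pthread_t thr;

    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this host\n");
        return 1;
    }

    pthread_create(&thr, NULL, worker, NULL);

    /* main returns immediately; exit() runs libnuma's destructor,
     * which frees numa_nodes_ptr while the worker still uses it */
    return 0;
}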
I also don't consider frequent and continuous libvirtd restarts to be a
"common problem" - it's nice for "testing", but not really a "real world"
type of activity.
BTW: A concern with returning false now, after having returned true
before the reconnect, is the impact on the various callers that at one
time thought NUMA was available but now find that it isn't. They error
out somewhat quietly, so in essence the domain could die "for some
reason", and finding *that* needle in the haystack of code won't be easy.
What if the first few calls succeed, then the pointer goes away, and
libvirtd goes to use it?
Since this "could" happen, I believe we need to save the fact that the
original connection knew that NUMA was available, so that reconnect can
check whether it's still available and handle the failure appropriately.
Whether that's an error-and-die or some mechanism to somehow force
libnuma to fill that pointer back in, I'm not sure yet... I still lean
towards a libnuma fix over a libvirt fix.
If we were to do something, what I have in mind in addition to what
you've posted is...
In qemuProcessPrepareHost we can "check and save" the result of
virNumaIsAvailable in qemuDomainObjPrivatePtr in such a way that it gets
written out to the live domain status file. That covers regular startup
*and* the migration destination. We could also do it in qemuProcessLaunch,
or wherever we first need to know whether it's available. I think it's
best to fetch it early, but another reviewer may say to fill it in at a
later time.
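Roughly along these lines (a sketch only - the numaAvailable field name
is made up, and the qemuDomainObjPrivateXMLFormat/Parse plumbing needed
to persist it in the status XML is elided):

/* hypothetical new field in qemuDomainObjPrivate (qemu_domain.h) */
bool numaAvailable;     /* result of virNumaIsAvailable() at startup */

/* in qemuProcessPrepareHost() (or qemuProcessLaunch()) */
priv->numaAvailable = virNumaIsAvailable();

/* plus formatting/parsing of the field in the live status XML so the
 * value survives a libvirtd restart */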
Then qemuProcessReconnect, prior to calling qemuConnectCgroup, could
check whether we originally determined that NUMA was available and check
again via virNumaIsAvailable. If it's not available now, then we could
generate an error.
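Something like this (again just a sketch; numaAvailable is the
hypothetical field from the previous sketch):

    /* in qemuProcessReconnect(), before the qemuConnectCgroup() call */
    if (priv->numaAvailable && !virNumaIsAvailable()) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("NUMA was available when the domain started, "
                         "but is not available after reconnect"));
        goto error;
    }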
But again, I think this is a timing bug in libnuma, so ideally libvirt
would do nothing and let the failure point at libnuma instead.
Maybe others have a different opinion, so I'd wait a bit before acting
upon the suggestions above.
John
diff --git a/src/util/virnuma.c b/src/util/virnuma.c
index 67e6c86..f06f6b3 100644
--- a/src/util/virnuma.c
+++ b/src/util/virnuma.c
@@ -381,7 +381,10 @@ virNumaGetMaxCPUs(void)
 bool
 virNumaNodeIsAvailable(int node)
 {
-    return numa_bitmask_isbitset(numa_nodes_ptr, node);
+    if (numa_nodes_ptr)
+        return numa_bitmask_isbitset(numa_nodes_ptr, node);
+    else
+        return false;
 }